Tuesday, January 23, 2018

CFour 2.0beta: MPI vs. OpenMP parallelism in SCF calculations

Preface: In CFour, MPI parallelism is significantly faster than OpenMP parallelism for the xvscf module. With the OpenMP-compiled binary, setting $OMP_NUM_CORES only parallelizes xvscf to about 200%. The drawback of MPI is that the number of scratch directories (rank000~rank###) and the total size of the files under them grow in proportion to the number of processors requested, ### (set via $CFOUR_NUM_CORES).

Purpose: Generate converged SCF orbitals with a fast Open MPI calculation, then switch to OpenMP for the CCSD calculation, restarting with GUESS=MOREAD (which reads the orbitals from the file OLDMOS) together with the touch JFSGUESS trick.
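
The hand-off between the two stages could be sketched in shell roughly as follows. This is a minimal sketch rather than the exact procedure used here: the function name stage_restart and the two directory arguments are hypothetical, and it assumes the MPI run has left its converged orbitals in a file named OLDMOS in its working directory. The ZMAT for the second stage still has to carry GUESS=MOREAD.

```shell
#!/bin/sh
# Sketch: stage the files needed to restart from converged SCF orbitals.
# Usage: stage_restart SCF_DIR CC_DIR   (both directory names are hypothetical)
stage_restart() {
    scf_dir=$1
    cc_dir=$2
    # Refuse to continue if the SCF run left no (or an empty) OLDMOS behind.
    if [ ! -s "$scf_dir/OLDMOS" ]; then
        echo "no OLDMOS in $scf_dir" >&2
        return 1
    fi
    mkdir -p "$cc_dir" || return 1
    cp "$scf_dir/OLDMOS" "$cc_dir/OLDMOS" || return 1
    # The "touch JFSGUESS trick": create an empty JFSGUESS file next to OLDMOS.
    touch "$cc_dir/JFSGUESS"
}
```

For example, stage_restart scf_mpi ccsd_omp would prepare ccsd_omp; ZMAT and GENBAS are then copied there and xcfour from the OpenMP build is started in that directory.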

Binary: ccuf1, /pkg1/chem/c4/cfour_v2b_ompi3-i18/bin, compiled with Intel compiler 18 and Open MPI 3.0.0.

The calculation can be run by hand.

Test molecule: NHC radical intermediate [7.8], open-shell singlet geometry optimized at M06-2X/6-31+G*.


export PATH="/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games"

. /pkg1/intel/compilers_and_libraries_2018.1.163/linux/bin/compilervars.sh intel64

export PATH="/pkg1/chem/c4/cfour_v2b_ompi3-i18/bin:/pkg1/local/openmpi-3.0.0-i18/bin:$PATH"

Prepare the ZMAT and GENBAS files (and GENECP, if needed). Then run by hand:
xcfour >& output.stdout &

Monitor the system with top. Once all of the xvmol processes have disappeared, the SCF calculation begins and many xvscf processes appear.
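
Instead of watching top, the switch from xvmol to xvscf could also be followed with a small script. The sketch below is only an illustration: the helper names count_name and stage_report are hypothetical, and it assumes that ps -e -o comm= lists the processes under the plain names xvmol and xvscf.

```shell
#!/bin/sh
# count_name NAME: count the lines on stdin that are exactly NAME.
count_name() {
    # grep -c prints the count but exits 1 when the count is 0, hence "|| true".
    grep -c -x -F -- "$1" || true
}

# stage_report: print how many xvmol and xvscf processes are currently running.
stage_report() {
    procs=$(ps -e -o comm=)
    n_mol=$(printf '%s\n' "$procs" | count_name xvmol)
    n_scf=$(printf '%s\n' "$procs" | count_name xvscf)
    echo "xvmol=$n_mol xvscf=$n_scf"
}
```

A loop such as while :; do stage_report; sleep 5; done prints one line every five seconds; the SCF stage has started once the xvmol count drops to zero and the xvscf count is nonzero.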


  1. The MP2 module is not implemented with MPI parallelism. When using the MPI-parallelized binaries for MP2-related calculations, add TREAT_PERTURBATION=SEQUENTIAL to the ZMAT input.
  2. Adding export MKL_NUM_CORES=2 might give a further speedup from MKL threading.
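
Before launching an MP2-related job with the MPI binaries, the keyword from note 1 could be checked with a short pre-flight test. This is a hedged sketch: the function names zmat_has and check_mp2_input are hypothetical, and the check only matches the keyword when it is written out in full exactly as in note 1.

```shell
#!/bin/sh
# zmat_has FILE SETTING: succeed if FILE contains SETTING as a literal string.
zmat_has() {
    grep -q -F -- "$2" "$1"
}

# check_mp2_input [FILE]: report whether the keyword from note 1 is present.
# FILE defaults to ZMAT in the current directory.
check_mp2_input() {
    zmat=${1:-ZMAT}
    if zmat_has "$zmat" "TREAT_PERTURBATION=SEQUENTIAL"; then
        echo "$zmat: TREAT_PERTURBATION=SEQUENTIAL found"
    else
        echo "$zmat: TREAT_PERTURBATION=SEQUENTIAL missing" >&2
        return 1
    fi
}
```

Run as check_mp2_input && xcfour >& output.stdout & so that the job only starts when the keyword is present.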

