Tuesday, January 23, 2018

CFour 2.0beta, MPI vs. OpenMP parallelism in SCF calculations

Preface: For the xvscf module, MPI parallelism in CFour is significantly faster than OpenMP parallelism; with the OpenMP-compiled binary, setting $OMP_NUM_THREADS yields only about a 2x (200%) speedup of xvscf. The drawback of MPI is that the number of scratch directories (rank000~rank###) and the file sizes under them grow in proportion to the number of processors requested, ### (set via $CFOUR_NUM_CORES).
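
To keep an eye on that scratch footprint during an MPI run, a minimal check can be run in the job's working directory (a sketch; it assumes the rank000~rank### subdirectories are named as above):
du -sh rank*                 # per-rank scratch usage
du -csh rank* | tail -n 1    # combined total over all ranks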

Purpose: Generate converged SCF orbitals with a fast MPI (Open MPI) calculation, then switch to the OpenMP binary to carry out the CCSD calculation, restarted via GUESS=MOREAD (reading the file OLDMOS) together with the touch JFSGUESS trick.

Binary: ccuf1, /pkg1/chem/c4/cfour_v2b_ompi3-i18/bin, compiled with the Intel 18 compiler and Open MPI 3.0.0.

The calculation can be performed manually as follows.

Test molecule: NHC radical intermediate [7.8], open-shell singlet, geometry optimized at M06-2X/6-31+G*.

export CFOUR_NUM_CORES=28

export PATH="/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games"

. /pkg1/intel/compilers_and_libraries_2018.1.163/linux/bin/compilervars.sh intel64

export PATH="/pkg1/chem/c4/cfour_v2b_ompi3-i18/bin:/pkg1/local/openmpi-3.0.0-i18/bin:$PATH"

Prepare the ZMAT and GENBAS (and, if needed, GENECP) files. Then run by hand:
xcfour >& output.stdout &

Monitor the system status with top; after all of the xvmol processes disappear, the SCF calculation begins, showing up as many xvscf processes.
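
Once xvscf has converged, the second stage of the workflow in the Purpose section can be set up by hand. A rough sketch; the directory names scf_mpi and ccsd_omp are hypothetical, and NEWMOS is only assumed to be the file holding the converged orbitals from the first run:
cp scf_mpi/NEWMOS ccsd_omp/OLDMOS    # reuse the converged SCF orbitals (file name assumed)
cd ccsd_omp                          # ZMAT, GENBAS (and GENECP) must already be here
touch JFSGUESS                       # the "touch JFSGUESS" trick mentioned above
export OMP_NUM_THREADS=28            # thread count for the OpenMP-compiled binary
export PATH="/path/to/openmp-compiled-cfour/bin:$PATH"   # OpenMP build; its path is not given in this post
xcfour >& output.ccsd.stdout &
The ZMAT of this second run must carry GUESS=MOREAD so that xvscf reads OLDMOS instead of converging from scratch.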

Notes:

  1. The MP2 module is not MPI-parallelized. Add TREAT_PERTURBATION=SEQUENTIAL to the ZMAT input when using the MPI-parallelized binaries for MP2-related calculations (keyword placement is sketched after these notes).
  2. Adding export MKL_NUM_THREADS=2 might give a further speedup from MKL threaded parallelism.
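
For reference, a minimal ZMAT sketch showing where GUESS=MOREAD and TREAT_PERTURBATION=SEQUENTIAL sit in the *CFOUR(...) keyword block; the water geometry and the PVDZ basis are generic placeholders, not the actual NHC input:
cat > ZMAT << 'EOF'
placeholder molecule (stand-in for the real system)
O
H 1 R
H 1 R 2 A

R=0.96
A=104.5

*CFOUR(CALC=CCSD,BASIS=PVDZ
GUESS=MOREAD
TREAT_PERTURBATION=SEQUENTIAL)

EOF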


PS.