Compiling and Running CESM

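CESM (Community Earth System Model) is a new-generation Earth system model released in 2010 by the US National Center for Atmospheric Research (NCAR), and it is one of the most advanced and widely used Earth system models today. CESM uses a coupler to coordinate component models for the atmosphere, ocean, land surface, sea ice, and others to run climate simulations. Because the application is fairly complex, building and running it is not easy. It was also one of the exercises in our supercomputing team's training, so here I record the steps I went through during this internal training session.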

  • Environment: the TH-2K supercomputing cluster

  • Dependencies were set up in advance with the package manager spack.

After downloading the CESM 1.2.2 source code, I followed the documentation and ran the following commands:

cd cesm1_2_2/scripts
./create_newcase -case test1 -res f45_g37 -compset X -mach userdefined

This produced the following error:

WARNING:
The perl module XML::LibXML is needed for XML parsing in the CESM script system.
Please contact your local systems administrators or IT staff and have them install it for
you, or install the module locally.

The Perl XML module has to be loaded here, so load it with spack load:

spack load perl-xml-libxml@2.0201

Run create_newcase again:

./create_newcase -case test1  -res f45_g37  -compset X  -mach userdefined

The output is as follows:

-------------------------------------------------------------------------------
For a list of potential issues in the current tag, please point your web browser to:
https://svn-ccsm-models.cgd.ucar.edu/cesm1/known_problems/
-------------------------------------------------------------------------------
grid longname is f45_g37
Component set: longname (shortname) (alias)
2000_XATM_XLND_XICE_XOCN_XROF_XGLC_XWAV (X) (X)
Component set Description:
XATM: XLND: Xrof: XICE: XOCN: XGLC: XWAV: present day:
Grid:
a%4x5_l%4x5_oi%gx3v7_r%r05_m%gx3v7_g%null_w%null (4x5_gx3v7)
ATM_GRID = 4x5 NX_ATM=72 NY_ATM=46
LND_GRID = 4x5 NX_LND=72 NX_LND=46
ICE_GRID = gx3v7 NX_ICE=100 NX_ICE=116
OCN_GRID = gx3v7 NX_OCN=100 NX_OCN=116
ROF_GRID = r05 NX_ROF=720 NX_ROF=360
GLC_GRID = 4x5 NX_GLC=72 NX_GLC=46
WAV_GRID = null NX_WAV=0 NX_WAV=0
Grid Description:
null is no grid: 4x5 is FV 4-deg grid: gx3v7 is Greenland pole v7 3-deg grid: r05 is 1/2 degree river routing grid:
Non-Default Options:
ATM_NCPL: 48
BUDGETS: FALSE
CCSM_CO2_PPMV: 379.000
COMP_ATM: xatm
COMP_GLC: xglc
COMP_ICE: xice
COMP_LND: xlnd
COMP_OCN: xocn
COMP_ROF: xrof
COMP_WAV: xwav
CPL_ALBAV: false
CPL_EPBAL: off
GLC_NEC: 10
OCN_NCPL: 1
OCN_TIGHT_COUPLING: FALSE
ROF_NCPL: $ATM_NCPL
SCIENCE_SUPPORT: NO

The PE layout for this case match these options:
CCSM_LCOMPSET = XATM|DATM.+CLM
GRID = a%4x5
Creating /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1
Created /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/env_case.xml
Created /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/env_mach_pes.xml
Created /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/env_build.xml
Created /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/env_run.xml
Locking file /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/env_case.xml
Successfully created the case for userdefined

Then run setup:

cd test1
./cesm_setup

The output is as follows:

ERROR: must set xml variable OS to generate Macros file
ERROR: must set xml variable EXEROOT to build the model
ERROR: must set xml variable MPILIB to build the model
ERROR: must set xml variable NTASKS_LND to build the model
ERROR: must set xml variable NTASKS_ROF to build the model
ERROR: must set xml variable COMPILER to build the model
ERROR: must set xml variable NTASKS_CPL to build the model
ERROR: must set xml variable DIN_LOC_ROOT to build the model
ERROR: must set xml variable NTASKS_ICE to build the model
ERROR: must set xml variable RUNDIR to build the model
ERROR: must set xml variable NTASKS_GLC to build the model
ERROR: must set xml variable NTASKS_OCN to build the model
ERROR: must set xml variable MAX_TASKS_PER_NODE to build the model
ERROR: must set xml variable NTASKS_ATM to build the model
ERROR: must set xml variable NTASKS_WAV to build the model
ERROR: must set xml variable CESMSCRATCHROOT to build the model
Correct above and issue cesm_setup again
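With this many errors it is easy to lose track of which variables still need values. As a rough sketch, the variable names can be pulled out of the messages with sed; the function name missing_xml_vars is hypothetical, and the pattern simply assumes the message wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: read cesm_setup output on stdin and print the unique
# names of the xml variables it reports as unset, one per line, sorted.
missing_xml_vars() {
    sed -n 's/^ERROR: must set xml variable \([A-Za-z0-9_]*\) .*/\1/p' | sort -u
}

# Example using two lines copied from the output above
printf '%s\n' \
    'ERROR: must set xml variable OS to generate Macros file' \
    'ERROR: must set xml variable EXEROOT to build the model' | missing_xml_vars
```

In a case directory this could be used as `./cesm_setup 2>&1 | missing_xml_vars`, which captures the messages whether they go to standard output or standard error.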

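Descriptions of these XML environment variables can be found in the CESM documentation. I changed the variables file by file as follows:

env_build.xml

The variables to change in this file are:

OS

Set to LINUX.

CESMSCRATCHROOT

Set to the current path, which here is ~/zhb/cesm1_2_2/scripts/test1

MPILIB

We intended to use Intel MPI, but this XML file offers no Intel MPI option. Since Intel MPI is very similar to mpich, the mpich build and run settings can be used with Intel MPI, so I set this value to mpich.

EXEROOT

This variable is the directory for the executable, which we have to create ourselves. I chose ~/zhb/cesm1_2_2/scripts/test1/exe.

COMPILER

Set to intel.

env_mach_pes.xml

MAX_TASKS_PER_NODE

The Intel KNL CPU in use has 36 cores, so I set this variable to 36.

NTASKS_*

There are 8 of these variables. I did not know how many tasks to assign, so I set them all to 1 for now.

env_run.xml

RUNDIR

Create a run directory under EXEROOT and enter ~/zhb/cesm1_2_2/scripts/test1/exe/run here.

DIN_LOC_ROOT

Set this to the path where the downloaded input data is stored.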
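The edits above were made by hand. For repeated setups they could be scripted; what follows is only a sketch under stated assumptions. The helper set_xml_entry is hypothetical, and it assumes that each variable sits on a single line of the form `<entry id="NAME" value="..." />`, which should be checked against the actual env_*.xml files first. CESM case directories normally also provide an xmlchange script, which is probably the more conventional route.

```shell
#!/bin/sh
# Hypothetical helper: set value="..." on the <entry id="ID" ...> line of FILE.
# Assumes one entry per line with id placed before value.
# The new value must not contain the characters | or &.
set_xml_entry() {
    file=$1 id=$2 val=$3
    sed "s|\(<entry id=\"$id\"[[:space:]]*value=\"\)[^\"]*\"|\1$val\"|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Demonstration on a throwaway file; PLACEHOLDER is an illustrative value only
demo=$(mktemp)
printf '%s\n' '<entry id="OS"   value="PLACEHOLDER"  />' > "$demo"
set_xml_entry "$demo" OS LINUX
cat "$demo"
rm -f "$demo"
```

For example, `set_xml_entry env_build.xml COMPILER intel` would mirror one of the manual edits above, provided the file matches the assumed layout.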

Running cesm_setup again then produced a rather absurd error:

Creating Macros file for userdefined
/GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/ccsm_utils/Machines/config_compilers.xml intel userdefined
Creating batch script test1.run
Locking file env_mach_pes.xml
Creating user_nl_xxx files for components and cpl
Running preview_namelist script
syntax error at /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/models/drv/bld/build-namelist line 784, near "$model qw(cpl atm lnd ice ocn glc rof wav)"
Can't redeclare "my" in "my" at /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/models/drv/bld/build-namelist line 787, near "my"
Global symbol "$model" requires explicit package name (did you forget to declare "my $model"?) at /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/models/drv/bld/build-namelist line 787.
Execution of /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/models/drv/bld/build-namelist aborted due to compilation errors.
ERROR: cpl.buildnml.csh failed
ERROR: /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/preview_namelists failed: 25344

Following the error message, I located the offending code:

foreach my $model qw(cpl atm lnd ice ocn glc rof wav) 

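Perl 5.18 and later no longer accept a bare qw(...) list in place of the parenthesized list of a foreach loop, so wrap the qw(...) after $model in a pair of parentheses, like this: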

foreach my $model (qw(cpl atm lnd ice ocn glc rof wav)) {
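The same change can be expressed as a sed substitution, which is handy if other scripts turn out to use the same construct. This is a sketch rather than a vetted patch: the function name fix_foreach_qw is hypothetical, it only handles the `foreach my $var qw(...)` shape seen above, and it writes the result to standard output so the original file is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with a bare qw(...) list that directly follows
# "foreach my $var" wrapped in parentheses. Lines already fixed do not match.
fix_foreach_qw() {
    sed 's/\(foreach my \$[A-Za-z_][A-Za-z0-9_]* \)\(qw([^)]*)\)/\1(\2)/' "$1"
}

# Demonstration on a throwaway file containing the line quoted above
demo=$(mktemp)
printf '%s\n' 'foreach my $model qw(cpl atm lnd ice ocn glc rof wav) {' > "$demo"
fix_foreach_qw "$demo"
rm -f "$demo"
```

Reviewing the printed result with diff against the original before overwriting anything is the safer way to use it.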

Running setup again finally succeeds:

Macros script already created ...skipping
Machine/Decomp/Pes configuration has already been done ...skipping
Running preview_namelist script
infile is /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/Buildconf/cplconf/cesm_namelist
See ./CaseDoc for component namelists
If an old case build already exists, might want to run test1.clean_build before building

Edit the Macros file:

Append -lnetcdff to SLIBS, change MPICC to mpiicc, MPICXX to mpiicpc, and MPIFC to mpiifort. Then set MPI_PATH and NETCDF_PATH to the corresponding installation paths.
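Since the Macros file now names the Intel MPI wrappers, a quick check that they are actually on PATH can save a failed build. The sketch below assumes the wrappers are meant to be found through PATH (for example after loading the Intel MPI environment); the function name missing_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that cannot be found on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return $status
}

missing_commands mpiicc mpiicpc mpiifort ||
    echo "the commands listed above were not found; load the Intel MPI environment first"
```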

Run test1.build:

-------------------------------------------------------------------------
CESM BUILDNML SCRIPT STARTING
- To prestage restarts, untar a restart.tar file into /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/run
infile is /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/Buildconf/cplconf/cesm_namelist
CESM BUILDNML SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
-------------------------------------------------------------------------
CESM PRESTAGE SCRIPT STARTING
- Case input data directory, DIN_LOC_ROOT, is /GPUFS/sysu_hpcedu_302/wyf/2021-train/CESM/inputdata
- Checking the existence of input datasets in DIN_LOC_ROOT
CESM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
-------------------------------------------------------------------------
CESM BUILDEXE SCRIPT STARTING
rm: No match.
COMPILER is intel
- Build Libraries: mct gptl pio csm_share
Sat Apr 3 21:42:20 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/intel/mpich/nodebug/nothreads/mct.bldlog.210403-214216
Sat Apr 3 21:43:10 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/intel/mpich/nodebug/nothreads/gptl.bldlog.210403-214216
Sat Apr 3 21:43:13 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/intel/mpich/nodebug/nothreads/pio.bldlog.210403-214216
Sat Apr 3 21:43:54 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/intel/mpich/nodebug/nothreads/csm_share.bldlog.210403-214216
Sat Apr 3 21:44:30 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/atm.bldlog.210403-214216
Sat Apr 3 21:44:31 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/lnd.bldlog.210403-214216
Sat Apr 3 21:44:32 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/ice.bldlog.210403-214216
Sat Apr 3 21:44:32 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/ocn.bldlog.210403-214216
Sat Apr 3 21:44:33 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/glc.bldlog.210403-214216
Sat Apr 3 21:44:34 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/wav.bldlog.210403-214216
Sat Apr 3 21:44:35 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/rof.bldlog.210403-214216
Sat Apr 3 21:44:35 CST 2021 /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/cesm.bldlog.210403-214216
- Locking file env_build.xml
CESM BUILDEXE SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------

Run test1.run; the output is as follows:

-------------------------------------------------------------------------
CESM BUILDNML SCRIPT STARTING
- To prestage restarts, untar a restart.tar file into /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/exe/run
infile is /GPUFS/sysu_hpcedu_302/zhb/cesm1_2_2/scripts/test1/Buildconf/cplconf/cesm_namelist
CESM BUILDNML SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
-------------------------------------------------------------------------
CESM PRESTAGE SCRIPT STARTING
- Case input data directory, DIN_LOC_ROOT, is /GPUFS/sysu_hpcedu_302/wyf/2021-train/CESM/inputdata
- Checking the existence of input datasets in DIN_LOC_ROOT
CESM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------------------
Sat Apr 3 21:49:31 CST 2021 -- CSM EXECUTION BEGINS HERE
Sat Apr 3 21:49:31 CST 2021 -- CSM EXECUTION HAS FINISHED
(seq_mct_drv): =============== SUCCESSFUL TERMINATION OF CPL7-CCSM ===============
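The last line is the one to look for: the SUCCESSFUL TERMINATION message. When the output is captured to a file, the check can be automated. This is a minimal sketch; the function name run_succeeded is hypothetical, and where the output ends up (a redirected file or a log in the run directory) depends on how the job was launched.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if FILE contains the coupler's success marker.
run_succeeded() {
    grep -q 'SUCCESSFUL TERMINATION OF CPL7' "$1"
}

# Demonstration on a throwaway file holding the final line of the output above
log=$(mktemp)
printf '%s\n' '(seq_mct_drv): =============== SUCCESSFUL TERMINATION OF CPL7-CCSM ===============' > "$log"
if run_succeeded "$log"; then
    echo "run finished cleanly"
else
    echo "no success marker found"
fi
rm -f "$log"
```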