Questions and remarks about code_saturne usage
Rodolphe
Posts: 18 Joined: Sun Mar 14, 2021 12:59 pm
Post by Rodolphe » Wed Jun 16, 2021 2:38 pm
Hi,
I am working on a cluster with SLURM as the batch system. I am computing the coupling between Code Saturne and Syrthès from the 'Three 2D disks' tutorial on the CS website. I first ran the tutorial for each code separately, but when I try to run the coupled case I get an error message related to MPI. The following message is displayed in the output file:
Code:
mpiexec noticed that process rank 0 with PID 38883 on node lm3-w007 exited on signal 11 (Segmentation fault).
I don't really know what it means. I guess it is linked to the multi-process calculation. I have attached the compile.log from Saturne and Syrthès as well as the output files (error and out), the setup files from Saturne and Syrthès, runcase_coupling, summary, the coupling parameters and the meshes in the zip archive.
Note that I have already tried a parallel run with Code Saturne alone, and it worked when using Metis rather than Scotch for partitioning.
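For reference, the partitioning algorithm can be forced from cs_user_performance_tuning.c roughly as in the sketch below, based on the cs_user_performance_tuning-partition.c reference example shipped with code_saturne 6.0; the enum names and argument list should be checked against that example in your installation.
Code:
/* Sketch: force the METIS graph partitioner for the main partitioning
 * stage instead of the library selected by default. Based on the
 * cs_user_performance_tuning-partition.c example of code_saturne 6.0. */

#include "cs_defs.h"
#include "cs_partition.h"
#include "cs_prototypes.h"

void
cs_user_partition(void)
{
  cs_partition_set_algorithm(CS_PARTITION_MAIN,   /* partitioning stage */
                             CS_PARTITION_METIS,  /* graph partitioner */
                             1,                   /* rank step */
                             false);              /* do not ignore periodicity */
}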
Thanks for your help!
Rodolphe
Attachments
FILES.zip
(1.28 MiB) Downloaded 292 times
Yvan Fournier
Posts: 4173 Joined: Mon Feb 20, 2012 3:25 pm
Post by Yvan Fournier » Wed Jun 16, 2021 4:15 pm
Hello,
Do you also have run_solver.log for the fluid domain and syrthes.log (or the listing file, I am not sure) for the solid domain?
That would help determine where the issue appears.
Best regards,
Yvan
Rodolphe
Posts: 18 Joined: Sun Mar 14, 2021 12:59 pm
Post by Rodolphe » Thu Jun 17, 2021 3:02 pm
Hello,
I did find a run_solver file, but I don't know if it is the one you were talking about (see attached file).
The listing file for the solid domain is empty.
Best regards,
Rodolphe
Attachments
run_solver.txt
(2.14 KiB) Downloaded 255 times
Yvan Fournier
Posts: 4173 Joined: Mon Feb 20, 2012 3:25 pm
Post by Yvan Fournier » Thu Jun 17, 2021 10:30 pm
Hello,
If you did not find a run_solver.log file, it means the computation crashed before creating it, at initialization.
I suspect an installation issue, probably with code_saturne and Syrthes using different MPI libraries.
Do you have any other messages in the terminal (or, in the case of a batch system, in the job log file)?
Otherwise, could you run and post the output of "ldd SOLID/syrthes" and "ldd FLUID/cs_solver", after loading the modules listed in run_solver.txt?
Best regards,
Yvan
Rodolphe
Posts: 18 Joined: Sun Mar 14, 2021 12:59 pm
Post by Rodolphe » Fri Jun 18, 2021 1:06 pm
Hello,
Since I installed Code Saturne with the semi-automatic installation script, I did not specify the path to the MPI libraries (I kept the default options), whereas during the installation of Syrthès I had to specify it in the setup file. How can I check which MPI libraries are used by Saturne? (Sorry, I'm quite a beginner in this domain.)
I have attached the job log files (one for the error messages and one for the standard output).
ldd FLUID/cs_solver gives:
Code:
linux-vdso.so.1 => (0x00002aaaaaacd000)
libcs_solver-6.0.so => /home/ucl/tfl/rvanco/Code_Saturne/6.0.6/code_saturne-6.0.6/arch/Linux_x86_64/lib/libcs_solver-6.0.so (0x00002aaaaaccf000)
libsaturne-6.0.so => /home/ucl/tfl/rvanco/Code_Saturne/6.0.6/code_saturne-6.0.6/arch/Linux_x86_64/lib/libsaturne-6.0.so (0x00002aaaaaed5000)
libple.so.2 => /home/ucl/tfl/rvanco/Code_Saturne/6.0.6/code_saturne-6.0.6/arch/Linux_x86_64/lib/libple.so.2 (0x00002aaaac749000)
libcgns.so.3.3 => /home/ucl/tfl/rvanco/cgns/lib/libcgns.so.3.3 (0x00002aaaac95b000)
libmedC.so.11 => /home/ucl/tfl/rvanco/Code_Saturne/6.0.6/med-4.0.0/arch/Linux_x86_64/lib/libmedC.so.11 (0x00002aaaacf8e000)
libhdf5.so.100 => /opt/cecisw/arch/easybuild/2016b/software/HDF5/1.10.0-patch1-foss-2016b/lib/libhdf5.so.100 (0x00002aaaad2b6000)
libmetis.so => /opt/cecisw/arch/easybuild/2016b/software/METIS/5.1.0-foss-2016b/lib/libmetis.so (0x00002aaaaaad6000)
libmpi.so.12 => /opt/cecisw/arch/easybuild/2016b/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/lib/libmpi.so.12 (0x00002aaaad60b000)
libz.so.1 => /opt/cecisw/arch/easybuild/2016b/software/zlib/1.2.8-foss-2016b/lib/libz.so.1 (0x00002aaaaab56000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x00002aaaad980000)
libgfortran.so.3 => /opt/cecisw/arch/easybuild/2016b/software/GCCcore/5.4.0/lib64/../lib64/libgfortran.so.3 (0x00002aaaaab7e000)
libquadmath.so.0 => /opt/cecisw/arch/easybuild/2016b/software/GCCcore/5.4.0/lib64/../lib64/libquadmath.so.0 (0x00002aaaadb84000)
libm.so.6 => /usr/lib64/libm.so.6 (0x00002aaaadbc3000)
libgomp.so.1 => /opt/cecisw/arch/easybuild/2016b/software/GCCcore/5.4.0/lib64/../lib64/libgomp.so.1 (0x00002aaaaaca0000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00002aaaadec5000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00002aaaae0e1000)
libhdf5.so.103 => /home/ucl/tfl/rvanco/Code_Saturne/6.0.6/hdf5-1.10.6/arch/Linux_x86_64/lib/libhdf5.so.103 (0x00002aaaae4af000)
libstdc++.so.6 => /opt/cecisw/arch/easybuild/2016b/software/GCCcore/5.4.0/lib64/../lib64/libstdc++.so.6 (0x00002aaaaea74000)
libgcc_s.so.1 => /opt/cecisw/arch/easybuild/2016b/software/GCCcore/5.4.0/lib64/../lib64/libgcc_s.so.1 (0x00002aaaaebfb000)
libsz.so.2 => /opt/cecisw/arch/easybuild/2016b/software/Szip/2.1-foss-2016b/lib/libsz.so.2 (0x00002aaaaec12000)
/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00002aaaaec25000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00002aaaaee3c000)
libpsm2.so.2 => /usr/lib64/libpsm2.so.2 (0x00002aaaaf055000)
libfabric.so.1 => /usr/lib64/libfabric.so.1 (0x00002aaaaf2bb000)
libopen-rte.so.12 => /opt/cecisw/arch/easybuild/2016b/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/lib/libopen-rte.so.12 (0x00002aaaaf617000)
libopen-pal.so.13 => /opt/cecisw/arch/easybuild/2016b/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/lib/libopen-pal.so.13 (0x00002aaaaf710000)
libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00002aaaaf7c0000)
libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00002aaaaf9c6000)
librt.so.1 => /usr/lib64/librt.so.1 (0x00002aaaafbde000)
libutil.so.1 => /usr/lib64/libutil.so.1 (0x00002aaaafde6000)
libhwloc.so.5 => /opt/cecisw/arch/easybuild/2016b/software/hwloc/1.11.3-GCC-5.4.0-2.26/lib/libhwloc.so.5 (0x00002aaaaffe9000)
libnuma.so.1 => /opt/cecisw/arch/easybuild/2016b/software/numactl/2.0.11-GCC-5.4.0-2.26/lib/libnuma.so.1 (0x00002aaab0022000)
libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00002aaab002d000)
libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00002aaab029a000)
libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00002aaab04bb000)
libslurmfull.so => /usr/lib64/slurm/libslurmfull.so (0x00002aaab0711000)
libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00002aaab0adb000)
libuuid.so.1 => /usr/lib64/libuuid.so.1 (0x00002aaab0cea000)
ldd SOLID/syrthes gives:
Code:
linux-vdso.so.1 => (0x00002aaaaaacd000)
libm.so.6 => /usr/lib64/libm.so.6 (0x00002aaaaaccf000)
libple.so.2 => /home/ucl/tfl/rvanco/usr/local/lib/libple.so.2 (0x00002aaaaaae5000)
libmpi.so.40 => /opt/cecisw/arch/easybuild/2018b/software/OpenMPI/3.1.1-GCC-7.3.0-2.30/lib/libmpi.so.40 (0x00002aaaaaaf8000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00002aaaaafd1000)
/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
libmpi.so.12 => /opt/cecisw/arch/easybuild/2016b/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/lib/libmpi.so.12 (0x00002aaaab39f000)
libopen-rte.so.40 => /opt/cecisw/arch/easybuild/2018b/software/OpenMPI/3.1.1-GCC-7.3.0-2.30/lib/libopen-rte.so.40 (0x00002aaaaac05000)
libopen-pal.so.40 => /opt/cecisw/arch/easybuild/2018b/software/OpenMPI/3.1.1-GCC-7.3.0-2.30/lib/libopen-pal.so.40 (0x00002aaaab714000)
librt.so.1 => /usr/lib64/librt.so.1 (0x00002aaaab7de000)
libutil.so.1 => /usr/lib64/libutil.so.1 (0x00002aaaab9e6000)
libhwloc.so.5 => /opt/cecisw/arch/easybuild/2016b/software/hwloc/1.11.3-GCC-5.4.0-2.26/lib/libhwloc.so.5 (0x00002aaaabbe9000)
libnuma.so.1 => /opt/cecisw/arch/easybuild/2016b/software/numactl/2.0.11-GCC-5.4.0-2.26/lib/libnuma.so.1 (0x00002aaaaacc0000)
libpciaccess.so.0 => /opt/cecisw/arch/easybuild/2016b/software/X11/20160819-foss-2016b/lib/libpciaccess.so.0 (0x00002aaaabc22000)
libxml2.so.2 => /opt/cecisw/arch/easybuild/2016b/software/libxml2/2.9.4-foss-2016b/lib/libxml2.so.2 (0x00002aaaabc2b000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x00002aaaabd93000)
libz.so.1 => /opt/cecisw/arch/easybuild/2016b/software/zlib/1.2.8-foss-2016b/lib/libz.so.1 (0x00002aaaabf97000)
liblzma.so.5 => /opt/cecisw/arch/easybuild/2016b/software/XZ/5.2.2-foss-2016b/lib/liblzma.so.5 (0x00002aaaabfad000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00002aaaabfd3000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00002aaaac1f0000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00002aaaac407000)
libpsm2.so.2 => /usr/lib64/libpsm2.so.2 (0x00002aaaac620000)
libfabric.so.1 => /usr/lib64/libfabric.so.1 (0x00002aaaac887000)
libopen-rte.so.12 => /opt/cecisw/arch/easybuild/2016b/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/lib/libopen-rte.so.12 (0x00002aaaacbe3000)
libopen-pal.so.13 => /opt/cecisw/arch/easybuild/2016b/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/lib/libopen-pal.so.13 (0x00002aaaaccdc000)
libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00002aaaacd8d000)
libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00002aaaacf93000)
libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00002aaaad1ac000)
libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00002aaaad419000)
libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00002aaaad63a000)
libgcc_s.so.1 => /opt/cecisw/arch/easybuild/2016b/software/GCCcore/5.4.0/lib64/libgcc_s.so.1 (0x00002aaaad891000)
libslurmfull.so => /usr/lib64/slurm/libslurmfull.so (0x00002aaaad8a8000)
libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00002aaaadc73000)
libuuid.so.1 => /usr/lib64/libuuid.so.1 (0x00002aaaade82000)
Best regards,
Rodolphe
Attachments
job_69955655.out.log
(2.84 KiB) Downloaded 276 times
job_69955655.err.log
(46.47 KiB) Downloaded 258 times
Rodolphe
Posts: 18 Joined: Sun Mar 14, 2021 12:59 pm
Post by Rodolphe » Fri Jun 18, 2021 3:54 pm
Hello,
As you suspected, the problem was that the MPI libraries were different. I reinstalled Syrthès with the same MPI libraries as Saturne, and the run_solver.log file now appears in the RESU_COUPLING directory when I launch a computation.
However, the computation still stops, now with a different error:
Code:
/home/users/r/v/rvanco/ceci/code_saturne-6.0.6/libple/src/ple_locator.c:2882: Fatal error.
Locator trying to use distant space dimension 3
with local space dimension 2
Call stack:
1: 0x2aaaac74c5c0 <ple_locator_extend_search+0x250> (libple.so.2)
2: 0x2aaaac753f3e <ple_locator_set_mesh+0x29e> (libple.so.2)
3: 0x2aaaab085132 <+0x1b0132> (libsaturne-6.0.so)
4: 0x2aaaab086a41 <cs_syr4_coupling_init_mesh+0x51> (libsaturne-6.0.so)
5: 0x2aaaab08a292 <cs_syr_coupling_init_meshes+0x22> (libsaturne-6.0.so)
6: 0x2aaaaacd3942 <cs_run+0x5e2> (libcs_solver-6.0.so)
7: 0x2aaaaacd3225 <main+0x175> (libcs_solver-6.0.so)
8: 0x2aaaae103555 <__libc_start_main+0xf5> (libc.so.6)
9: 0x4017d9 <> (cs_solver)
End of stack
I have attached the listing files for the fluid and solid domains as well as the run_solver.log file.
Best regards,
Rodolphe
Attachments
listing_fluid.txt
(16.57 KiB) Downloaded 274 times
run_solver.log
(16.57 KiB) Downloaded 273 times
listing_solid.txt
(6.44 KiB) Downloaded 276 times
Yvan Fournier
Posts: 4173 Joined: Mon Feb 20, 2012 3:25 pm
Post by Yvan Fournier » Mon Jun 21, 2021 9:47 am
Hello,
If the mesh on the Syrthes side is 3D and not 2D, do not force a projection in the coupling definition on the code_saturne side. The tutorial uses this projection because the solid mesh is 2D.
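For example, if the coupling is defined in cs_user_coupling.c (the equivalent projection-axis option also exists in the GUI), passing a blank ' ' as the projection axis keeps the standard 3D coupling, while 'x', 'y' or 'z' forces a 2D projection. The sketch below is based on the cs_user_coupling-syrthes.c example of version 6.0; the SYRTHES instance name and face selection criteria are placeholders, and the exact argument list should be checked against cs_syr_coupling.h in your installation.
Code:
/* Sketch of cs_user_syrthes_coupling() without projection, based on the
 * cs_user_coupling-syrthes.c example of code_saturne 6.0. The instance
 * name and selection criteria below are placeholders. */

#include "cs_defs.h"
#include "cs_syr_coupling.h"
#include "cs_prototypes.h"

void
cs_user_syrthes_coupling(void)
{
  cs_syr_coupling_define("SYRTHES",   /* SYRTHES instance name (placeholder) */
                         "3 or 4",    /* boundary face selection criteria (placeholder) */
                         NULL,        /* volume criteria (surface coupling only) */
                         ' ',         /* projection axis: ' ' = 3D, no projection */
                         false,       /* do not allow nonmatching meshes */
                         0.5,         /* geometric tolerance */
                         0,           /* verbosity */
                         1);          /* visualization output */
}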
Best regards,
Yvan