I am working with a Fortran program (this repository), which I compile with the newest Intel LLVM compiler. Everything works fine when I don't supply any extra flags, but when I configure with -DCMAKE_BUILD_TYPE=DEBUG, the resulting binary segfaults.
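For reference, the failing build is configured roughly like this (FC=ifx for compiler selection is an assumption and just stands in for the Intel LLVM compiler; the relevant part is the build type):

FC=ifx cmake -B build -DCMAKE_BUILD_TYPE=DEBUG
cmake --build build

Running the resulting bin/DAMASK_grid then aborts with: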
MemorySanitizer:DEADLYSIGNAL
==3432==ERROR: MemorySanitizer: SEGV on unknown address 0x000000000416 (pc 0x000000000416 bp 0x7ffd5d255845 sp 0x7ffd5d255798 T3432)
==3432==Hint: pc points to the zero page.
==3432==The signal is caused by a READ memory access.
==3432==Hint: address points to the zero page.
MemorySanitizer:DEADLYSIGNAL
MemorySanitizer: nested bug in the same thread, aborting.
With gcc or the legacy Intel compiler, everything works fine. gdb shows that the crash is related to the Intel MPI initialization:
(gdb) run
Starting program: /DAMASK/bin/DAMASK_grid
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000416 in ?? ()
(gdb) backtrace
#0 0x0000000000000416 in ?? ()
#1 0x00007f6f3290db4f in ucm_bistro_apply_patch () from /lib/x86_64-linux-gnu/libucm.so.0
#2 0x00007f6f3290e191 in ucm_bistro_patch () from /lib/x86_64-linux-gnu/libucm.so.0
#3 0x00007f6f3290e3ea in ucm_mmap_install () from /lib/x86_64-linux-gnu/libucm.so.0
#4 0x00007f6f3290e792 in ucm_library_init () from /lib/x86_64-linux-gnu/libucm.so.0
#5 0x00007f6f316b2b13 in ?? () from /lib/x86_64-linux-gnu/libucs.so.0
#6 0x00007f6f4aa8e47e in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7ffcb20ecc58, env=env@entry=0x716000000000) at ./elf/dl-init.c:70
#7 0x00007f6f4aa8e568 in call_init (env=0x716000000000, argv=0x7ffcb20ecc58, argc=1, l=<optimized out>) at ./elf/dl-init.c:33
#8 _dl_init (main_map=0x719000004b00, argc=1, argv=0x7ffcb20ecc58, env=0x716000000000) at ./elf/dl-init.c:117
#9 0x00007f6f32efbaf5 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:182
#10 0x00007f6f4aa95ff6 in dl_open_worker (a=0x7ffcb20e3d50) at ./elf/dl-open.c:808
#11 dl_open_worker (a=a@entry=0x7ffcb20e3d50) at ./elf/dl-open.c:771
#12 0x00007f6f32efba98 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
#13 0x00007f6f4aa9634e in _dl_open (file=<optimized out>, mode=-2147483646, caller_dlopen=0x44ed61 <__interceptor_dlopen+401>, nsid=-2, argc=1, argv=<optimized out>,
env=0x716000000000) at ./elf/dl-open.c:883
#14 0x00007f6f32e1763c in dlopen_doit (a=a@entry=0x7ffcb20e3fc0) at ./dlfcn/dlopen.c:56
#15 0x00007f6f32efba98 in __GI__dl_catch_exception (exception=exception@entry=0x7ffcb20e3f20, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
#16 0x00007f6f32efbb63 in __GI__dl_catch_error (objname=0x7ffcb20e3f78, errstring=0x7ffcb20e3f80, mallocedp=0x7ffcb20e3f77, operate=<optimized out>, args=<optimized out>)
at ./elf/dl-error-skeleton.c:227
#17 0x00007f6f32e1712e in _dlerror_run (operate=operate@entry=0x7f6f32e175e0 <dlopen_doit>, args=args@entry=0x7ffcb20e3fc0) at ./dlfcn/dlerror.c:138
#18 0x00007f6f32e176c8 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>) at ./dlfcn/dlopen.c:71
#19 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
#20 0x000000000044ed61 in __interceptor_dlopen ()
#21 0x00007f6f3080a044 in ofi_reg_dl_prov () from /opt/intel/oneapi/mpi/2021.13/opt/mpi/libfabric/lib/libfabric.so.1
#22 0x00007f6f3080a9e7 in fi_ini () from /opt/intel/oneapi/mpi/2021.13/opt/mpi/libfabric/lib/libfabric.so.1
#23 0x00007f6f3080b3ac in fi_getinfo@@FABRIC_1.7 () from /opt/intel/oneapi/mpi/2021.13/opt/mpi/libfabric/lib/libfabric.so.1
#24 0x00007f6f30811f09 in fi_getinfo@FABRIC_1.3 () from /opt/intel/oneapi/mpi/2021.13/opt/mpi/libfabric/lib/libfabric.so.1
#25 0x00007f6f34fed908 in find_provider (hints=0x708000000300) at ../../src/mpid/ch4/netmod/ofi/ofi_init.c:2904
#26 open_fabric () at ../../src/mpid/ch4/netmod/ofi/ofi_init.c:2725
#27 MPIDI_OFI_mpi_init_hook (rank=rank@entry=0, size=size@entry=1, appnum=appnum@entry=-1, tag_bits=tag_bits@entry=0x7ffcb20e5470,
init_comm=init_comm@entry=0x7f6f3e0e86a0 <MPIR_Comm_direct>) at ../../src/mpid/ch4/netmod/ofi/ofi_init.c:1624
#28 0x00007f6f34cef375 in MPID_Init (requested=<optimized out>, provided=provided@entry=0x7f6f3e10a9a8 <MPIR_ThreadInfo>) at ../../src/mpid/ch4/src/ch4_init.c:1663
#29 0x00007f6f34f0edca in MPIR_Init_thread (argc=argc@entry=0x0, argv=argv@entry=0x0, user_required=<optimized out>, provided=provided@entry=0x7ffcb20e5a0c)
at ../../src/mpi/init/initthread.c:191
#30 0x00007f6f34f0ea7d in PMPI_Init (argc=argc@entry=0x0, argv=argv@entry=0x0) at ../../src/mpi/init/init.c:143
#31 0x00007f6f3eaffecf in pmpi_init_ (ierr=0x7ffcb20e7dd8) at ../../src/binding/fortran/mpif_h/initf.c:275
#32 0x0000000000b31454 in parallelization::parallelization_init () at /DAMASK/src/parallelization.f90:77
#33 0x0000000000a5144d in materialpoint::materialpoint_initall () at /DAMASK/src/materialpoint.f90:48
#34 0x0000000001cbf2eb in damask_grid () at /DAMASK/src/grid/DAMASK_grid.f90:124
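Frame #32 is the MPI initialization in parallelization.f90, so, stripped to its essence, the failing call is just the Fortran MPI_Init (a minimal sketch, not the actual DAMASK source; the binding is assumed):

program mpi_init_only
  use mpi                ! binding assumed; DAMASK may use mpi_f08 instead
  implicit none
  integer :: err
  call MPI_Init(err)     ! the segfault happens inside this call, while libfabric
                         ! dlopen()s its provider libraries during fi_getinfo()
  call MPI_Finalize(err)
end program mpi_init_only

Building such a stand-alone program with the same compiler and flags should show whether the crash is independent of DAMASK itself.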
When I run the program under valgrind, it only prints the following warnings and is then killed:
# valgrind bin/DAMASK_grid
==2232== Memcheck, a memory error detector
==2232== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2232== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2232== Command: bin/DAMASK_grid
==2232==
==2232== Warning: set address range perms: large range [0x10000000000, 0x100000000000) (defined)
==2232== Warning: set address range perms: large range [0x100000000000, 0x110000000000) (noaccess)
==2232== Warning: set address range perms: large range [0x110000000000, 0x200000000000) (defined)
Killed
However, when I run valgrind --tool=massif ./bin/DAMASK_grid, the program executes fine.
What could be causing these large memory ranges to be mapped? Is it safe to assume that this is the reason behind the subsequent segfault?