
I am working with a Fortran program (this repository), which I compile using the newest Intel LLVM compiler. This works fine when I don't supply any flags, but when I configure with -DCMAKE_BUILD_TYPE=DEBUG, I get a segfault:

MemorySanitizer:DEADLYSIGNAL
==3432==ERROR: MemorySanitizer: SEGV on unknown address 0x000000000416 (pc 0x000000000416 bp 0x7ffd5d255845 sp 0x7ffd5d255798 T3432)
==3432==Hint: pc points to the zero page.
==3432==The signal is caused by a READ memory access.
==3432==Hint: address points to the zero page.
MemorySanitizer:DEADLYSIGNAL
MemorySanitizer: nested bug in the same thread, aborting.
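
For reference, the configure and build steps are roughly the following (a sketch from memory; the repository's actual CMake options may differ):

cmake -B build -DCMAKE_BUILD_TYPE=DEBUG -DCMAKE_Fortran_COMPILER=ifx
cmake --build build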

With GCC or the legacy Intel compiler, everything works fine. gdb shows that the crash occurs during the Intel MPI initialization:

(gdb) run
Starting program: /DAMASK/bin/DAMASK_grid
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000416 in ?? ()
(gdb) backtrace
#0  0x0000000000000416 in ?? ()
#1  0x00007f6f3290db4f in ucm_bistro_apply_patch () from /lib/x86_64-linux-gnu/libucm.so.0
#2  0x00007f6f3290e191 in ucm_bistro_patch () from /lib/x86_64-linux-gnu/libucm.so.0
#3  0x00007f6f3290e3ea in ucm_mmap_install () from /lib/x86_64-linux-gnu/libucm.so.0
#4  0x00007f6f3290e792 in ucm_library_init () from /lib/x86_64-linux-gnu/libucm.so.0
#5  0x00007f6f316b2b13 in ?? () from /lib/x86_64-linux-gnu/libucs.so.0
#6  0x00007f6f4aa8e47e in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7ffcb20ecc58, env=env@entry=0x716000000000) at ./elf/dl-init.c:70
#7  0x00007f6f4aa8e568 in call_init (env=0x716000000000, argv=0x7ffcb20ecc58, argc=1, l=<optimized out>) at ./elf/dl-init.c:33
#8  _dl_init (main_map=0x719000004b00, argc=1, argv=0x7ffcb20ecc58, env=0x716000000000) at ./elf/dl-init.c:117
#9  0x00007f6f32efbaf5 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:182
#10 0x00007f6f4aa95ff6 in dl_open_worker (a=0x7ffcb20e3d50) at ./elf/dl-open.c:808
#11 dl_open_worker (a=a@entry=0x7ffcb20e3d50) at ./elf/dl-open.c:771
#12 0x00007f6f32efba98 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
#13 0x00007f6f4aa9634e in _dl_open (file=<optimized out>, mode=-2147483646, caller_dlopen=0x44ed61 <__interceptor_dlopen+401>, nsid=-2, argc=1, argv=<optimized out>,
    env=0x716000000000) at ./elf/dl-open.c:883
#14 0x00007f6f32e1763c in dlopen_doit (a=a@entry=0x7ffcb20e3fc0) at ./dlfcn/dlopen.c:56
#15 0x00007f6f32efba98 in __GI__dl_catch_exception (exception=exception@entry=0x7ffcb20e3f20, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
#16 0x00007f6f32efbb63 in __GI__dl_catch_error (objname=0x7ffcb20e3f78, errstring=0x7ffcb20e3f80, mallocedp=0x7ffcb20e3f77, operate=<optimized out>, args=<optimized out>)
    at ./elf/dl-error-skeleton.c:227
#17 0x00007f6f32e1712e in _dlerror_run (operate=operate@entry=0x7f6f32e175e0 <dlopen_doit>, args=args@entry=0x7ffcb20e3fc0) at ./dlfcn/dlerror.c:138
#18 0x00007f6f32e176c8 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>) at ./dlfcn/dlopen.c:71
#19 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
#20 0x000000000044ed61 in __interceptor_dlopen ()
#21 0x00007f6f3080a044 in ofi_reg_dl_prov () from /opt/intel/oneapi/mpi/2021.13/opt/mpi/libfabric/lib/libfabric.so.1
#22 0x00007f6f3080a9e7 in fi_ini () from /opt/intel/oneapi/mpi/2021.13/opt/mpi/libfabric/lib/libfabric.so.1
#23 0x00007f6f3080b3ac in fi_getinfo@@FABRIC_1.7 () from /opt/intel/oneapi/mpi/2021.13/opt/mpi/libfabric/lib/libfabric.so.1
#24 0x00007f6f30811f09 in fi_getinfo@FABRIC_1.3 () from /opt/intel/oneapi/mpi/2021.13/opt/mpi/libfabric/lib/libfabric.so.1
#25 0x00007f6f34fed908 in find_provider (hints=0x708000000300) at ../../src/mpid/ch4/netmod/ofi/ofi_init.c:2904
#26 open_fabric () at ../../src/mpid/ch4/netmod/ofi/ofi_init.c:2725
#27 MPIDI_OFI_mpi_init_hook (rank=rank@entry=0, size=size@entry=1, appnum=appnum@entry=-1, tag_bits=tag_bits@entry=0x7ffcb20e5470,
    init_comm=init_comm@entry=0x7f6f3e0e86a0 <MPIR_Comm_direct>) at ../../src/mpid/ch4/netmod/ofi/ofi_init.c:1624
#28 0x00007f6f34cef375 in MPID_Init (requested=<optimized out>, provided=provided@entry=0x7f6f3e10a9a8 <MPIR_ThreadInfo>) at ../../src/mpid/ch4/src/ch4_init.c:1663
#29 0x00007f6f34f0edca in MPIR_Init_thread (argc=argc@entry=0x0, argv=argv@entry=0x0, user_required=<optimized out>, provided=provided@entry=0x7ffcb20e5a0c)
    at ../../src/mpi/init/initthread.c:191
#30 0x00007f6f34f0ea7d in PMPI_Init (argc=argc@entry=0x0, argv=argv@entry=0x0) at ../../src/mpi/init/init.c:143
#31 0x00007f6f3eaffecf in pmpi_init_ (ierr=0x7ffcb20e7dd8) at ../../src/binding/fortran/mpif_h/initf.c:275
#32 0x0000000000b31454 in parallelization::parallelization_init () at /DAMASK/src/parallelization.f90:77
#33 0x0000000000a5144d in materialpoint::materialpoint_initall () at /DAMASK/src/materialpoint.f90:48
#34 0x0000000001cbf2eb in damask_grid () at /DAMASK/src/grid/DAMASK_grid.f90:124
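
Every frame above parallelization_init is inside Intel MPI and libfabric, so I assume a program that does nothing but initialize MPI would reproduce this. A minimal sketch I would use to isolate it (file name, wrapper, and sanitizer flag are my guesses at what the DEBUG build adds):

cat > mpi_min.f90 <<'EOF'
program mpi_min
  use mpi_f08
  implicit none
  call MPI_Init()
  print *, 'MPI_Init OK'
  call MPI_Finalize()
end program mpi_min
EOF
mpiifx -g -fsanitize=memory mpi_min.f90 -o mpi_min
./mpi_min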

When I use valgrind, it just prints the following warnings before the process is killed:

# valgrind bin/DAMASK_grid
==2232== Memcheck, a memory error detector
==2232== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2232== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2232== Command: bin/DAMASK_grid
==2232==
==2232== Warning: set address range perms: large range [0x10000000000, 0x100000000000) (defined)
==2232== Warning: set address range perms: large range [0x100000000000, 0x110000000000) (noaccess)
==2232== Warning: set address range perms: large range [0x110000000000, 0x200000000000) (defined)
Killed

However, when I use valgrind --tool=massif ./bin/DAMASK_grid, the program executes fine. What could be causing these large address-range mappings? Is it safe to assume that this is the reason behind the subsequent segfault?

1 Answer


It's likely that you are simply running out of memory and getting OOM-killed; a bare "Killed" with no diagnostic is the typical symptom. Valgrind substantially increases the amount of memory a program needs. Consider using a machine with more RAM or increasing swap.
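
If that is what happened, the kernel log will confirm it (standard commands, nothing specific to this program):

dmesg | grep -iE 'out of memory|killed process'
free -h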

The Valgrind warnings are just warnings: it is seeing unusually large mmaps and telling you about them. Those ranges very likely correspond to the shadow-memory reservations made by MemorySanitizer, which your DEBUG build enables. Tracking the state of such huge "defined" ranges is expensive for Memcheck, which may also explain why it runs out of memory while massif (which does not track per-byte memory state) completes.

Try a more recent Valgrind.
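
For example, a recent release can be built from the official tarballs (3.22.0 below is just an illustration; pick the latest version listed on the site):

wget https://sourceware.org/pub/valgrind/valgrind-3.22.0.tar.bz2
tar xf valgrind-3.22.0.tar.bz2
cd valgrind-3.22.0
./configure --prefix=$HOME/valgrind && make && make install
$HOME/valgrind/bin/valgrind ./bin/DAMASK_grid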

Otherwise, please file a bug at https://bugs.kde.org, where more Valgrind developers will see it.
