Environment:
- Hardware: BlueField-2, model MBF2H516A-CEEOT
- OS: Linux version 5.15.0-1060-bluefield (buildd@bos03-arm64-114)
- DOCA SDK: 2.10.0087
Description:
I'm trying to run the doca_switch sample application as described in the official documentation:
https://docs.nvidia.com/doca/sdk/doca+switch+application+guide/index.html
I successfully launched the program using the following command:
./doca_switch -- -p 03:00.0 -r vf[0-1] -l 60
Once inside the application's CLI, I ran the following commands (Command Set 1):
create fwd type=port,port_id=0xffff
create pipe port_id=0,name=p0_to_vf1,root_enable=1,fwd=1
create fwd type=port,port_id=1
add entry pipe_queue=0,fwd=1,pipe_id=1012
query entry_id=3415432416174012788
The program crashes with a segmentation fault at the query entry_id step.
Debugging with gdb shows that the crash occurs inside doca_flow_resource_query_entry(), in /opt/mellanox/doca/lib/aarch64-linux-gnu/libdoca_flow.so.2.
The function takes two arguments: entry and stats.
In gdb, print stats and print *stats both succeed, so stats points to valid, readable memory.
entry also holds a valid memory address, so I suspect the crash is caused by a NULL or improperly initialized field inside entry.
To test this, I tried a more complete setup using Command Set 2:
create fwd type=port,port_id=0xffff
create pipe port_id=0,name=p0_to_vf1,root_enable=1,fwd=1
create entry_match outer.eth.src_mac=11:22:33:44:55:66,outer.eth.dst_mac=66:55:44:33:22:11
create actions encap_src_ip_type=ipv4
create monitor flags=0x3,cir=100,cbs=100
create fwd type=port,port_id=1
add entry pipe_id=16057732822350369178,pipe_queue=0,monitor=1,fwd=1
query entry_id=3415432416174012788
This also resulted in a segmentation fault in the same function.
In gdb, I printed actions and found that all of its fields are zero, i.e. it was never populated.
This was the case in both command sets, which may be expected since I didn't assign values to actions.
However, monitor appears to be passed correctly: it contains different values in each case, which suggests it was successfully associated with the entry.
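The gdb inspection described above can be reproduced with a session along these lines. This is a sketch, not a captured log: the frame number, and whether entry, stats, actions and monitor are visible as arguments or locals, are assumptions that depend on which frames have debug symbols.

```
# Start the application under gdb with the same arguments as above
gdb --args ./doca_switch -- -p 03:00.0 -r vf[0-1] -l 60
run
# Enter the commands from Command Set 1 or 2 at the CLI until SIGSEGV is reported

# Show the call chain leading into doca_flow_resource_query_entry()
bt
# Confirm which libdoca_flow.so.2 was actually loaded
info sharedlibrary
# Show the faulting instruction and register state
x/i $pc
info registers

# Move to the calling frame (assumed here to be frame 1) and inspect its arguments
frame 1
info args
info locals
print entry
print stats
print *stats
```

The output of bt together with x/i $pc and info registers shows which address was being dereferenced at the time of the crash, which helps distinguish a bad entry pointer from a NULL field inside a valid entry.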
Questions:
- Is this segmentation fault caused by incorrect command usage on my side?
- Or is this a bug in the library?
- How can I resolve or debug this further?