We are using FPGA cards with PCI Express drivers to move data around with DMA engines. This works fine for a single card in a machine, but fails with two cards. As an initial investigation, I have narrowed the error down to the add_timer call that sets up the polling mechanism. When insmod loads the driver module, a stack trace is produced, apparently because the poll_timer routine is the same for both instances. The code has been reduced to:

static int  dat_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
    struct timer_list * timer = &poll_timer;
    int i;

    /* Start polling routine */
    log_normal(KERN_INFO "DEBUG ADD TIMER: Starting poll routine with %x\n", pdev);
    init_timer(timer);

    // random number added so that expires value is different for both instances of timer
    get_random_bytes(&i, 1);
    timer->expires=jiffies+HZ+i;
    timer->data=(unsigned long) pdev;
    timer->function = poll_routine;

    log_verbose("DEBUG ADD TIMER: Timer expires %x\n", timer->expires);
    log_verbose("DEBUG ADD TIMER: Timer data %x\n", timer->data);
    log_verbose("DEBUG ADD TIMER: Timer function %x\n", timer->function);

    // ***** THIS IS WHERE STACK TRACE OCCURS (WHEN CALLED FOR SECOND TIME)
    add_timer(timer);

    log_verbose("DEBUG ADD TIMER: Value of HZ is %d\n", HZ);
    log_verbose("DEBUG ADD TIMER: End of probe\n");

    return 0;
}

The stack trace produces two warnings:

list_add corruption. prev->next should be next (ffffffff81f76228), but was (null). (prev=ffffffffa050a3c0).

and

list_add double add: new=ffffffffa050a3c0, prev=ffffffffa050a3c0, next=ffffffff81f76228.

Looking at the printk statements, it is clear that the add_timer is trying to add the same routine to the linked list. Is this correct?

DEBUG ADD TIMER: Timer expires fffd9cd3
DEBUG ADD TIMER: Timer data 6c0ac000
DEBUG ADD TIMER: Timer function a0508150
DEBUG ADD TIMER: Value of HZ is 1000
DEBUG ADD TIMER: End of probe
DEBUG ADD TIMER: Starting poll routine with 6c0ad000
DEBUG ADD TIMER: Timer expires fffd9c7d
DEBUG ADD TIMER: Timer data 6c0ad000
DEBUG ADD TIMER: Timer function a0508150

So my question is: how should I configure the timer for multiple instantiations of the same driver? (I assume that is what happens when multiple boards are inserted into the machine.)

Full stack trace:

DEBUG ADD TIMER: Inserting driver into kernel.
DEBUG ADD TIMER: Starting poll routine with 6c0ac000
DEBUG ADD TIMER: Timer expires fffd9cd3
DEBUG ADD TIMER: Timer data 6c0ac000
DEBUG ADD TIMER: Timer function a0508150
DEBUG ADD TIMER: Value of HZ is 1000
DEBUG ADD TIMER: End of probe
DEBUG ADD TIMER: Starting poll routine with 6c0ad000
DEBUG ADD TIMER: Timer expires fffd9c7d
DEBUG ADD TIMER: Timer data 6c0ad000
DEBUG ADD TIMER: Timer function a0508150
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2201 at lib/list_debug.c:33 __list_add+0xa0/0xd0()
list_add corruption. prev->next should be next (ffffffff81f76228), but was           (null). (prev=ffffffffa050a3c0).
Modules linked in: xdma_v7(POE+) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller crc32c_intel eeepc_wmi ghash_clmulni_intel asus_wmi ftdi_sio iTCO_wdt snd_hda_codec sparse_keymap raid0 iTCO_vendor_support
 snd_hda_core rfkill sb_edac ipmi_ssif video mxm_wmi edac_core snd_hwdep mei_me snd_seq snd_seq_device ipmi_msghandler snd_pcm mei acpi_pad tpm_infineon lpc_ich mfd_core snd_timer tpm_tis shpchp tpm snd soundcore i2c_i801 wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm igb serio_raw ptp pps_core dca i2c_algo_bit
CPU: 0 PID: 2201 Comm: insmod Tainted: P           OE   4.1.8-100.fc21.x86_64 #1
Hardware name: ASUSTeK COMPUTER INC. Z10PE-D8 WS/Z10PE-D8 WS, BIOS 1001 03/17/2015
 0000000000000000 00000000ec73155d ffff880457123928 ffffffff81792065
 0000000000000000 ffff880457123980 ffff880457123968 ffffffff810a163a
 0000000000000246 ffffffffa050a3c0 ffffffff81f76228 ffffffffa050a3c0
Call Trace:
 [<ffffffff81792065>] dump_stack+0x45/0x57
 [<ffffffff810a163a>] warn_slowpath_common+0x8a/0xc0
 [<ffffffff810a16c5>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff810f8250>] ? vprintk_emit+0x3b0/0x560
 [<ffffffff813c7c30>] __list_add+0xa0/0xd0
 [<ffffffff81108412>] __internal_add_timer+0xb2/0x130
 [<ffffffff811084bf>] internal_add_timer+0x2f/0xb0
 [<ffffffff8110a1ca>] mod_timer+0x12a/0x210
 [<ffffffff8110a2c8>] add_timer+0x18/0x30
 [<ffffffffa050810f>] dat_probe+0xbf/0x100 [xdma_v7]
 [<ffffffff813f6da5>] local_pci_probe+0x45/0xa0
 [<ffffffff812a8da2>] ? sysfs_do_create_link_sd.isra.2+0x72/0xc0
 [<ffffffff813f8109>] pci_device_probe+0xf9/0x150
 [<ffffffff814e7e59>] driver_probe_device+0x209/0x4b0
 [<ffffffff814e81db>] __driver_attach+0x9b/0xa0
 [<ffffffff814e8140>] ? __device_attach+0x40/0x40
 [<ffffffff814e5973>] bus_for_each_dev+0x73/0xc0
 [<ffffffff814e772e>] driver_attach+0x1e/0x20
 [<ffffffff814e72e0>] bus_add_driver+0x180/0x250
 [<ffffffffa000a000>] ? 0xffffffffa000a000
 [<ffffffff814e89d4>] driver_register+0x64/0xf0
 [<ffffffff813f662c>] __pci_register_driver+0x4c/0x50
 [<ffffffffa000a02c>] dat_init+0x2c/0x1000 [xdma_v7]
 [<ffffffff81002148>] do_one_initcall+0xd8/0x210
 [<ffffffff812094f9>] ? kmem_cache_alloc_trace+0x1a9/0x230
 [<ffffffff817911bc>] ? do_init_module+0x28/0x1cc
 [<ffffffff817911f5>] do_init_module+0x61/0x1cc
 [<ffffffff811270bb>] load_module+0x20db/0x2550
 [<ffffffff81122990>] ? store_uevent+0x70/0x70
 [<ffffffff8122e860>] ? kernel_read+0x50/0x80
 [<ffffffff81127766>] SyS_finit_module+0xa6/0xe0
 [<ffffffff8179892e>] system_call_fastpath+0x12/0x71
---[ end trace 340e5d7ba2d89081 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2201 at lib/list_debug.c:36 __list_add+0xcb/0xd0()
list_add double add: new=ffffffffa050a3c0, prev=ffffffffa050a3c0, next=ffffffff81f76228.
Modules linked in: xdma_v7(POE+) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller crc32c_intel eeepc_wmi ghash_clmulni_intel asus_wmi ftdi_sio iTCO_wdt snd_hda_codec sparse_keymap raid0 iTCO_vendor_support
 snd_hda_core rfkill sb_edac ipmi_ssif video mxm_wmi edac_core snd_hwdep mei_me snd_seq snd_seq_device ipmi_msghandler snd_pcm mei acpi_pad tpm_infineon lpc_ich mfd_core snd_timer tpm_tis shpchp tpm snd soundcore i2c_i801 wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm igb serio_raw ptp pps_core dca i2c_algo_bit
CPU: 0 PID: 2201 Comm: insmod Tainted: P        W  OE   4.1.8-100.fc21.x86_64 #1
Hardware name: ASUSTeK COMPUTER INC. Z10PE-D8 WS/Z10PE-D8 WS, BIOS 1001 03/17/2015
 0000000000000000 00000000ec73155d ffff880457123928 ffffffff81792065
 0000000000000000 ffff880457123980 ffff880457123968 ffffffff810a163a
 0000000000000246 ffffffffa050a3c0 ffffffff81f76228 ffffffffa050a3c0
Call Trace:
 [<ffffffff81792065>] dump_stack+0x45/0x57
 [<ffffffff810a163a>] warn_slowpath_common+0x8a/0xc0
 [<ffffffff810a16c5>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff810f8250>] ? vprintk_emit+0x3b0/0x560
 [<ffffffff813c7c5b>] __list_add+0xcb/0xd0
 [<ffffffff81108412>] __internal_add_timer+0xb2/0x130
 [<ffffffff811084bf>] internal_add_timer+0x2f/0xb0
 [<ffffffff8110a1ca>] mod_timer+0x12a/0x210
 [<ffffffff8110a2c8>] add_timer+0x18/0x30
 [<ffffffffa050810f>] dat_probe+0xbf/0x100 [xdma_v7]
 [<ffffffff813f6da5>] local_pci_probe+0x45/0xa0
 [<ffffffff812a8da2>] ? sysfs_do_create_link_sd.isra.2+0x72/0xc0
 [<ffffffff813f8109>] pci_device_probe+0xf9/0x150
 [<ffffffff814e7e59>] driver_probe_device+0x209/0x4b0
 [<ffffffff814e81db>] __driver_attach+0x9b/0xa0
 [<ffffffff814e8140>] ? __device_attach+0x40/0x40
 [<ffffffff814e5973>] bus_for_each_dev+0x73/0xc0
 [<ffffffff814e772e>] driver_attach+0x1e/0x20
 [<ffffffff814e72e0>] bus_add_driver+0x180/0x250
 [<ffffffffa000a000>] ? 0xffffffffa000a000
 [<ffffffff814e89d4>] driver_register+0x64/0xf0
 [<ffffffff813f662c>] __pci_register_driver+0x4c/0x50
 [<ffffffffa000a02c>] dat_init+0x2c/0x1000 [xdma_v7]
 [<ffffffff81002148>] do_one_initcall+0xd8/0x210
 [<ffffffff812094f9>] ? kmem_cache_alloc_trace+0x1a9/0x230
 [<ffffffff817911bc>] ? do_init_module+0x28/0x1cc
 [<ffffffff817911f5>] do_init_module+0x61/0x1cc
 [<ffffffff811270bb>] load_module+0x20db/0x2550
 [<ffffffff81122990>] ? store_uevent+0x70/0x70
 [<ffffffff8122e860>] ? kernel_read+0x50/0x80
 [<ffffffff81127766>] SyS_finit_module+0xa6/0xe0
 [<ffffffff8179892e>] system_call_fastpath+0x12/0x71
---[ end trace 340e5d7ba2d89082 ]---
DEBUG ADD TIMER: Value of HZ is 1000
DEBUG ADD TIMER: End of probe
  • init_timer, like many other initialization functions, is not thread-safe (it initializes the lock object that the other, thread-safe timer operations use). Move init_timer to the module's initialization code, then use mod_timer instead of add_timer to modify the timeout. Commented Feb 19, 2016 at 13:47
  • Many thanks, however after modifying my code and trying this out, I am not sure it is the solution. It became apparent that __init is only called once when the module is inserted, but we need two instances of the timer, one for each of the active modules. Commented Feb 22, 2016 at 11:39
  • So you need a separate instance of the timer for each module, as @Ian Abbott's answer says. Commented Feb 22, 2016 at 20:24
  • Yes.. Somehow his post got overlooked when posting my initial reply - and I didn't get any emails for the updates. Commented Feb 23, 2016 at 16:49

1 Answer

The problem is that the second call to dat_probe is clobbering the poll_timer variable that was initialized and queued by the first call to dat_probe. You are clobbering the pointers in the kernel's timer list.
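
The clobbering can be reproduced in a small userspace model. This is only a sketch, not kernel code: struct node stands in for the list linkage inside struct timer_list, model_probe is a hypothetical stand-in for the timer part of dat_probe, and it assumes that init_timer resets the entry's next pointer (which is consistent with the "(null)" in the warning). The checks are modelled on the messages printed from lib/list_debug.c; the kernel warns and inserts anyway, whereas this model simply refuses the insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the list linkage embedded in a timer. */
struct node {
    struct node *next, *prev;
};

/* Checks modelled on the list_add warnings in the trace. */
static bool add_is_valid(const struct node *new, const struct node *prev,
                         const struct node *next)
{
    if (next->prev != prev)
        return false;
    if (prev->next != next)             /* "list_add corruption" */
        return false;
    if (new == prev || new == next)     /* "list_add double add" */
        return false;
    return true;
}

/* Link new between prev and next. */
static void list_add_between(struct node *new, struct node *prev,
                             struct node *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

/*
 * Hypothetical model of one probe call: re-initialise the entry, then
 * queue it at the tail of the list. Returns false where the kernel
 * would have printed a list_add warning.
 */
static bool model_probe(struct node *head, struct node *entry)
{
    struct node *prev;

    assert(head != NULL && entry != NULL);
    prev = head->prev;                  /* current tail of the list */
    entry->next = NULL;                 /* assumed effect of init_timer() */
    if (!add_is_valid(entry, prev, head))
        return false;
    list_add_between(entry, prev, head);
    return true;
}
```

With a single shared entry, the second model_probe call finds that the list tail is the entry itself and that its next pointer has just been cleared, which corresponds to both warnings in the trace. With one entry per device, both calls succeed.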

You need to get rid of the poll_timer variable and give each device its own dynamically allocated private data structure containing its own struct timer_list member. Call pci_set_drvdata to set the private data pointer for the PCI device. The other PCI driver functions can call pci_get_drvdata to retrieve that pointer.
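
The shape of that change can be sketched in userspace as well. Everything here is a model under stated assumptions: struct fake_pci_dev, struct dat_priv, model_probe and model_remove are hypothetical names, and the comments mark where a real driver would call kzalloc, pci_set_drvdata, pci_get_drvdata and del_timer_sync instead of the stand-ins used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the list linkage embedded in a timer. */
struct node {
    struct node *next, *prev;
};

/* Stand-in for struct pci_dev; only models the driver-data pointer. */
struct fake_pci_dev {
    void *drvdata;
};

/* Hypothetical per-device state: each card owns its own timer entry. */
struct dat_priv {
    struct fake_pci_dev *pdev;
    struct node poll_timer;     /* a struct timer_list in the driver */
};

/* Append entry at the tail of the list headed by head. */
static void queue_tail(struct node *head, struct node *entry)
{
    struct node *prev = head->prev;

    /* A freshly allocated entry can never already be the list tail. */
    assert(prev != entry && prev->next == head);
    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

/* Remove entry from whatever list it is on. */
static void unlink_entry(struct node *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

/* Model of probe: allocate state per device instead of using a global. */
static int model_probe(struct node *timers, struct fake_pci_dev *pdev)
{
    struct dat_priv *priv = calloc(1, sizeof(*priv)); /* kzalloc() in a driver */

    if (!priv)
        return -1;              /* a driver would return -ENOMEM */
    priv->pdev = pdev;
    pdev->drvdata = priv;       /* pci_set_drvdata(pdev, priv) in a driver */
    queue_tail(timers, &priv->poll_timer);
    return 0;
}

/* Model of remove: stop this device's timer, then free its state. */
static void model_remove(struct fake_pci_dev *pdev)
{
    struct dat_priv *priv = pdev->drvdata;  /* pci_get_drvdata(pdev) */

    unlink_entry(&priv->poll_timer);        /* del_timer_sync() in a driver */
    pdev->drvdata = NULL;
    free(priv);
}

/* Number of queued entries, used to check the model. */
static size_t count_entries(const struct node *head)
{
    size_t n = 0;

    for (const struct node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

In the driver itself, the timer's data field would presumably carry the private structure (or the pdev, as the question's code already does) so that poll_routine can tell which card it is polling.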
