BeagleBone Black freezes


We are currently developing an application for the BeagleBone Black (using the standard Angstrom distro). It runs quite happily for a while (5-10 minutes) under GDB (controlled remotely from NetBeans), but at some relatively random point it freezes: the heartbeat LEDs stop flickering and a complete reboot is required.

One possibility is that the sheer number of USB devices is causing this. The board is connected by an FTDI serial link to my development PC (a client application there talks to my BBB server). There is a 4-way FTDI hub with multiple devices (3 currently) on it, plus a further single FTDI connection with another piece of hardware attached. There are also two I2C devices, and a mouse and keyboard.

Of course, I have no evidence other than hearsay that USB is causing the problem. My software is not raising any signals, and the log file tells me very little more. I have run the system monitor application to see whether I'm leaking memory, but memory use seems well behaved and stable (though CPU usage did creep up). I'd like to find a way to get to the bottom of what's failing, and would appreciate some assistance.
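
One way to collect evidence that survives a hard freeze is to log memory and load figures to disk at a fixed interval and force each line out with fsync, so the last entries before the hang are still there after the reboot. The following is only a minimal sketch: it assumes Python is available on the board, and the function names, the log path and the 5-second interval are made up for illustration. It reads the standard Linux /proc/meminfo and /proc/loadavg files.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def snapshot():
    """Return (wall-clock time, free memory in kB, 1-minute load average)."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    return time.time(), mem.get("MemFree"), load1


def log_forever(path, interval_s=5.0):
    """Append one line per interval and fsync it so it survives a freeze."""
    with open(path, "a") as out:
        while True:
            ts, free_kb, load1 = snapshot()
            out.write("%.1f MemFree=%s kB load1=%.2f\n" % (ts, free_kb, load1))
            out.flush()
            os.fsync(out.fileno())  # push the line to storage, not just the page cache
            time.sleep(interval_s)


# Hypothetical usage on the board (runs until killed or until the freeze):
# log_forever("/home/root/freeze-trace.log")
```

After a freeze and reboot, the tail of the file shows how memory and load looked in the last few seconds before the hang, which at least helps rule a leak or a runaway process in or out.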


There are 2 answers

Julian Gold (BEST ANSWER)

Finally, the bottom of the rabbit-hole:

http://e2e.ti.com/support/arm/sitara_arm/f/791/t/308549

It would appear that there is a problem in the TI silicon, specifically the interrupt controller, that causes a "babble" interrupt to fire when the USB gets overly busy. This causes an attempted reset of the host, and the application dies as a result. This explains why the issue exists in both Angstrom and Debian: it's not a stack or driver issue at all, but a problem with the TI chip. Ouch! We are probably going to have to drop the BBB as our platform of choice because of this.

The output from the debug serial console confirms this to be the case for our application:

_handle_irq+0x39/0x58)
[  466.343796] [<c0008551>] (omap3_intc_handle_irq+0x39/0x58) from [<c045b95b>] (__irq_svc+0x3b/0x5c)
[  466.359334] Exception stack(0xd2759cf8 to 0xd2759d40)
[  466.368332] 9ce0:                                                       00000000 c0849ac0
[  466.382735] 9d00: 00000000 00000000 c07a2080 00000000 d2758000 00000002 d2759db0 00000003
[  466.397178] 9d20: c0812610 d2758000 b405025a d2759d40 c0031241 c0030f4e 40000133 ffffffff
[  466.411686] [<c045b95b>] (__irq_svc+0x3b/0x5c) from [<c0030f4e>] (__do_softirq+0x46/0x174)
[  466.426346] [<c0030f4e>] (__do_softirq+0x46/0x174) from [<c0031241>] (irq_exit+0x29/0x50)
[  466.440833] [<c0031241>] (irq_exit+0x29/0x50) from [<c000c8cf>] (handle_IRQ+0x3f/0x5c)
[  466.454864] [<c000c8cf>] (handle_IRQ+0x3f/0x5c) from [<c0008551>] (omap3_intc_handle_irq+0x39/0x58)
[  466.470777] [<c0008551>] (omap3_intc_handle_irq+0x39/0x58) from [<c045b95b>] (__irq_svc+0x3b/0x5c)
[  466.486319] Exception stack(0xd2759db0 to 0xd2759df8)
[  466.495351] 9da0:                                     00000002 00000000 00007d00 00000000
[  466.509782] 9dc0: c07c81d0 c07c81d0 c07c75dc 00007d02 0000007d 00000003 c0812610 de5f4b40
[  466.524147] 9de0: 00000100 d2759df8 c0025b2d c0025bea 00000133 ffffffff
[  466.536019] [<c045b95b>] (__irq_svc+0x3b/0x5c) from [<c0025bea>] (omap3_noncore_dpll_set_rate+0x1f2/0x330)
[  466.553005] [<c0025bea>] (omap3_noncore_dpll_set_rate+0x1f2/0x330) from [<c0383273>] (clk_change_rate+0x1b/0x52)
[  466.570813] [<c0383273>] (clk_change_rate+0x1b/0x52) from [<c03832fb>] (clk_set_rate+0x51/0x72)
[  466.586199] [<c03832fb>] (clk_set_rate+0x51/0x72) from [<c034ba29>] (cpu0_set_target+0xf9/0x198)
[  466.601754] [<c034ba29>] (cpu0_set_target+0xf9/0x198) from [<c0348c5d>] (__cpufreq_driver_target+0x4d/0x70)
[  466.618890] [<c0348c5d>] (__cpufreq_driver_target+0x4d/0x70) from [<c034b33b>] (dbs_check_cpu+0x123/0x134)
[  466.635897] [<c034b33b>] (dbs_check_cpu+0x123/0x134) from [<c034ad31>] (od_dbs_timer+0x4d/0xb0)
[  466.651283] [<c034ad31>] (od_dbs_timer+0x4d/0xb0) from [<c003c8c5>] (process_one_work+0x1b5/0x2c0)
[  466.667088] [<c003c8c5>] (process_one_work+0x1b5/0x2c0) from [<c003cca3>] (worker_thread+0x19b/0x258)
[  466.683355] [<c003cca3>] (worker_thread+0x19b/0x258) from [<c003fb8f>] (kthread+0x67/0x74)
[  466.698026] [<c003fb8f>] (kthread+0x67/0x74) from [<c000c0dd>] (ret_from_fork+0x11/0x34)
[  466.712148] drm_kms_helper: panic occurred, switching back to text console
[  407.924892] CAUTION: musb: Babble Interrupt Occurred
[  407.965570] CAUTION: musb: Babble Interrupt Occurred
[  408.026994]  gadget: high-speed config #1: Multifunction with RNDIS
[  413.918684] musb_g_ep0_irq 710: SetupEnd came in a wrong ep0stage wait
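
For anyone wanting to check their own console capture for the same symptom, the babble messages are easy to pick out mechanically. This is a minimal sketch, not a definitive tool: the function name and regular expression are my own, and the sample text is simply the last four lines of the log above. It assumes the kernel prints the message in exactly the form shown here.

```python
import re

# Matches kernel log lines of the form seen above, e.g.
# [  407.924892] CAUTION: musb: Babble Interrupt Occurred
BABBLE_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+CAUTION: musb: Babble Interrupt Occurred"
)


def find_babble_events(log_text):
    """Return the kernel timestamps (seconds since boot) of babble interrupts."""
    times = []
    for line in log_text.splitlines():
        match = BABBLE_RE.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return times


# Sample input: the tail of the console output quoted in this answer.
SAMPLE = """\
[  407.924892] CAUTION: musb: Babble Interrupt Occurred
[  407.965570] CAUTION: musb: Babble Interrupt Occurred
[  408.026994]  gadget: high-speed config #1: Multifunction with RNDIS
[  413.918684] musb_g_ep0_irq 710: SetupEnd came in a wrong ep0stage wait
"""

events = find_babble_events(SAMPLE)
print("babble interrupts found:", len(events))
for t in events:
    print("  at %.6f s after boot" % t)
```

Running the same function over a saved serial console log gives the number of babble interrupts and when they occurred, which can then be compared against the time of each freeze.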
Julian Gold

So it looks like plugging a mouse into a USB hub attached to a BBB can cause this problem if other devices on the hub are doing I/O. A colleague informs me there are similar issues on the Raspberry Pi too. With the mouse unplugged, the software ran for well over an hour with no freeze. With it plugged back in, there was a freeze after about 10 minutes. With the mouse removed again, it has now been running for half an hour with no issue.