I’ve been hit by the “periodic boot failure” issue of the Beaglebone Black (aka BBB) reported by quite a few people on the net. For most users this is an annoyance, but for people like me, who use the platform in embedded applications, it is a serious stability problem for the whole system, because a 100% reliable boot is not achievable.
During development I had seen the board intermittently fail to boot at power on, which gave me a hunch that something was unstable. The net offered little more help than “try the recommended power supply” (which, by the way, I can’t use because I live in a country where the mains sockets are non-US, and even a bit non-standard for Europe), so I decided to run a systematic test to get the basic facts straight.
I settled on establishing some reasonable statistics about the failure frequency on plain BBBs, to serve as a reference when testing a theory put forward on the mailing list (here and here). The theory is that the failure is caused by the uboot bootloader being confused by noise on UART0_RXD (pin E15) of the AM3358, which is interpreted as valid data (see near U15 on page 4 of the BBB REV B schematics).
Results
These are the results (the detailed test report is below). I’m posting a writeup on the Beagleboard mailing list (Edit: my post here), so hopefully you’ll find further discussion of this issue there soon.
Failure rates
Plain BBB (DUT#2+3), Element14 & mbest branded
- DUT#2: Element14 branded: 4 fails / 120 boots = 1/30 = 3.33%
- DUT#3: mbest branded: 2 fails / 40 boots = 1/20 = 5.00%
- Overall: 6 fails / 160 boots = 3/80 = 3.75%
Modified BBB (DUT#1), CircuitCo branded
- U15 removed, pull down on UART0_RXD: 0 fails / 40 boots = 0.00%
- U15 removed, no pull down on UART0_RXD: 0 fails / 40 boots = 0.00%
- Overall: 0 fails / 80 boots = 0.00%
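As a quick check of the arithmetic in the list above, here is a minimal Python sketch that recomputes the rates from the raw counts. The dictionary keys and helper names are made up for this illustration. Only the fail and boot counts come from the test report.

```python
from fractions import Fraction

# (fails, boots) copied from the failure rate list above.
# The labels are illustrative names, not anything printed on the boards.
results = {
    "DUT#2 (Element14)": (4, 120),
    "DUT#3 (mbest)": (2, 40),
    "DUT#1 (U15 removed)": (0, 80),
}


def failure_rate(fails, boots):
    """Return the failure rate as an exact fraction and as a percentage."""
    frac = Fraction(fails, boots)
    return frac, float(frac) * 100


def pooled(names):
    """Add up the fail and boot counts for several DUTs."""
    fails = sum(results[name][0] for name in names)
    boots = sum(results[name][1] for name in names)
    return fails, boots


for name, (fails, boots) in results.items():
    frac, pct = failure_rate(fails, boots)
    print(f"{name}: {fails}/{boots} = {frac} = {pct:.2f}%")

plain_fails, plain_boots = pooled(["DUT#2 (Element14)", "DUT#3 (mbest)"])
frac, pct = failure_rate(plain_fails, plain_boots)
print(f"Plain boards overall: {plain_fails}/{plain_boots} = {frac} = {pct:.2f}%")
```

Using `Fraction` keeps the reduced forms (1/30, 1/20, 3/80) exact, so they can be compared directly with the figures in the list.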
Interpretation
The two unmodified boards I have access to (one Element14 branded, one mbest branded, both PCB REV B6) exhibit somewhat different boot failure rates. Overall, they fail to boot in almost 4 of every 100 boots.
Investigating the theory about noise on UART0_RXD seems to have paid off. I first removed U15 (SN74LVC2G241: Dual Buffer/Driver With 3-State Outputs) in order to add a pull down on its pad 6 (which is connected to UART0_RXD), and that eliminated the problem altogether. However, when I then removed the pull down again and repeated the test, the board still booted every time, which showed that removing U15 is by itself enough to make the boot succeed.
Unfortunately, in hindsight, I was too quick to grab the soldering iron: I should have verified and quantified the occurrence of the failure on the actual board before modifying it. It’s a shame that didn’t occur to me beforehand, but I’m more than willing to remove U15 on DUT#2, which has had the most failures, if discussions show that this is a reasonable theory of the root cause. That is why I continued testing DUT#2 through to 120 boots: to get more samples and better statistics in case I pull U15 from it later.
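To put a number on how much 80 clean boots can say, here is a small Python sketch. It rests on two assumptions that the tests do not verify: that each boot fails independently with a constant probability, and that DUT#1 before modification would have failed at the same 6/160 rate as the plain boards. The function names are made up for this illustration.

```python
def prob_no_failures(p_fail, boots):
    """Probability of seeing zero failures in `boots` independent boots
    when each boot fails with probability `p_fail`."""
    return (1.0 - p_fail) ** boots


def upper_bound_zero_failures(boots, confidence=0.95):
    """Largest per-boot failure rate that is still compatible, at the given
    confidence level, with observing zero failures in `boots` boots."""
    return 1.0 - (1.0 - confidence) ** (1.0 / boots)


plain_rate = 6 / 160  # pooled failure rate of the unmodified boards

print(f"P(0 fails in 40 boots) = {prob_no_failures(plain_rate, 40):.3f}")
print(f"P(0 fails in 80 boots) = {prob_no_failures(plain_rate, 80):.3f}")
print(f"95% upper bound after 80 clean boots = "
      f"{upper_bound_zero_failures(80) * 100:.2f}%")
```

Under these assumptions the chance of 80 clean boots happening by luck is roughly 5%, and the 95% upper bound on the failure rate after 80 clean boots is a little under 4%. So the result is encouraging, but more boots on a board with a known failure history would make the case stronger.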
Effect of removing U15
The success of removing U15 could be caused by the now floating AM3358 input UART0_RXD (pin E15) being stabilized by a weak internal pull up/down, which it presumably has by default. The AM335x TRM says the reset value is pad-dependent (register conf_uart0_rxd in Table 9.7 p. 1366 and Section 9.3.1.50 p. 1420), and I haven’t yet figured out exactly what that means.
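To make the pull up/down question more concrete, here is a hedged Python sketch that decodes a pad control register value such as conf_uart0_rxd. The bit positions follow the layout commonly used for AM335x pinmux settings (mux mode in bits 2:0, pull disable in bit 3, pull type in bit 4, receiver enable in bit 5, slew rate in bit 6). Treat them as an assumption to check against the TRM. The value 0x30 is only an example, not the reset value of this pad, and the function name is made up for this illustration.

```python
# Assumed bit layout of an AM335x pad control register (conf_<module>_<pin>).
# Check these positions against the TRM before relying on them.
MUXMODE_MASK = 0x07   # bits 2:0, pin function select
PULL_DISABLE = 1 << 3  # 1 = internal pull disabled, 0 = enabled
PULL_UP = 1 << 4       # 1 = pull-up selected, 0 = pull-down selected
RX_ACTIVE = 1 << 5     # 1 = input receiver enabled
SLOW_SLEW = 1 << 6     # 1 = slow slew rate, 0 = fast


def decode_pad_conf(value):
    """Decode a pad control register value into its individual settings."""
    if value & PULL_DISABLE:
        pull = "none"
    elif value & PULL_UP:
        pull = "pull-up"
    else:
        pull = "pull-down"
    return {
        "mux_mode": value & MUXMODE_MASK,
        "pull": pull,
        "rx_active": bool(value & RX_ACTIVE),
        "slow_slew": bool(value & SLOW_SLEW),
    }


# Example value only: receiver on, pull-up enabled, mux mode 0.
print(decode_pad_conf(0x30))
```

Reading the actual register on a running board and feeding the value to a decoder like this would show whether the pin is pulled up, pulled down or left floating once U15 is gone.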
When U15 is in place, some activity is apparently present on its output 1Y (pin 6), probably because the output is not stable, and thus has an erratic state, during the first moments of the chip’s power up sequence. In unfortunate circumstances this erratic behaviour can be interpreted by the UART RXD circuitry as valid bits and resulting bytes, which can then be latched into the UART RX FIFO, where they wait for uboot to read them when its code checks for a user interrupt.
I’ll add the disclaimer to this thesis that I haven’t yet studied U15 in detail, but it is advertised as offering level conversion, ESD protection and power live-insertion/partial-power-down support, which suggests that it does something in reaction to its power condition.
Also, the recommendation on page 1 of its datasheet, “To ensure the high-impedance state during power up or power down, OE (active low) should be tied to VCC through a pullup resistor, and OE (active high) should be tied to GND through a pulldown resistor”, does not seem to be followed in the BBB circuitry, as the OEs are hardwired to be always active (the opposite of what is recommended for the power up/down condition). Whether this is actually a problem will need further analysis to establish.
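The suspected mechanism, where a power-up glitch on the RXD line is decoded as a valid byte and later read by the bootloader, can be illustrated with a toy model in Python. This is a simplified sketch, not a model of the real AM335x UART: it takes one sample per bit time (real UARTs oversample), assumes 8N1 framing, and the function names are made up for this illustration. The "any waiting byte stops autoboot" check is likewise an assumption about how the countdown behaves.

```python
def decode_8n1(levels):
    """Decode line levels (one sample per bit time, 1 = idle/high) as
    8N1 UART frames and return the bytes that had a valid stop bit."""
    received = []
    i = 0
    while i < len(levels):
        if levels[i] == 1:
            i += 1  # line is idle, keep looking for a start bit
            continue
        # A low level on an idle line is taken as the start bit.
        frame = levels[i + 1:i + 10]  # 8 data bits followed by the stop bit
        if len(frame) < 9:
            break  # not enough samples left for a complete frame
        data_bits, stop_bit = frame[:8], frame[8]
        if stop_bit == 1:
            # Data bits arrive least significant bit first.
            received.append(sum(bit << n for n, bit in enumerate(data_bits)))
        # With stop_bit == 0 a real UART would flag a framing error;
        # this model simply drops the frame.
        i += 10
    return received


def autoboot_interrupted(rx_fifo):
    """Toy model: the countdown is aborted if any byte is waiting."""
    return len(rx_fifo) > 0


quiet_line = [1] * 16
one_bit_glitch = [1, 1, 0] + [1] * 12  # line dips low for one bit time

print(decode_8n1(quiet_line), autoboot_interrupted(decode_8n1(quiet_line)))
print(decode_8n1(one_bit_glitch), autoboot_interrupted(decode_8n1(one_bit_glitch)))
```

In this model a single low pulse lasting one bit time is enough to produce a well-formed frame, because the line returning to its idle high level supplies both the data bits and a valid stop bit.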
Pictures
- All the DUTs.
- Jentec PSU used (from D-Link USB hub).
- U15 removed, pull down added.
- Closeup of U15 removed, pull down added.
Detailed test report
(formatted in nice emacs org-mode)
* BBB boot lockup test report
** Device under test #1:
Modified Beaglebone Black produced by CircuitCo (PCB REV B6, serial 007142901445, marked
"beaglebone"+ beagle logo and "beagleboard.org").
Modified by adding a hard pulldown resistor on TI AM3358 pin E15 (uart0 rx). Specifically
U15 was removed and terminal pin 6 (1Y=UART0_RX) was connected to J1 pin 1 through an 82k5
ohm resistor.
** Device under test #2:
Unmodified Beaglebone Black (BBB) produced by Element 14 (PCB REV B6, serial
EM-400524+XA6001961, marked "Element 14").
** Device under test #3:
Unmodified Beaglebone Black (BBB) produced by mbest (PCB REV B6, serial
EM-400441+XA3001688).
** Power supply
Jentec Technologies CF1805-E, output 5V 3A. Danish plug.
Sourced from a D-Link DUB-H7 USB 2.0 HUB.
** Test 1 procedure
Each Beaglebone was tested by repeatedly applying power by inserting the
plug into the mains socket while keeping the DC barrel connector inserted,
verifying that the power LED lit up, and then noting whether boot from SD-card
succeeded or failed. The PSU was then removed from the mains socket, and after
waiting 5 seconds the cycle was repeated.
The power supply and SD-card used were the same for all three DUTs.
Results can be seen in section Test 1 results.
** Test 2 procedure
After a short analysis of test 1 results I decided to try to remove the resistor,
to see if the boot failures returned.
Otherwise test procedure was identical to test 1.
Results can be seen in section Test 2 results.
** Test 1 results (pull down and reference boards)
| Boot no | DUT#1 | DUT#2 | DUT#3 | Note |
| 1 | boot | boot | boot | |
| 2 | boot | boot | boot | |
| 3 | boot | boot | boot | |
| 4 | boot | boot | boot | |
| 5 | boot | boot | boot | |
| 6 | boot | boot | boot | |
| 7 | boot | boot | boot | |
| 8 | boot | boot | boot | |
| 9 | boot | boot | boot | |
| 10 | boot | boot | boot | |
| 11 | boot | no boot | boot | DUT#2: pwr sw=lock, rst sw=boot |
| 12 | boot | boot | boot | |
| 13 | boot | boot | boot | |
| 14 | boot | boot | boot | |
| 15 | boot | boot | boot | |
| 16 | boot | boot | boot | |
| 17 | boot | boot | boot | |
| 18 | boot | boot | boot | |
| 19 | boot | boot | boot | |
| 20 | boot | no boot | boot | DUT#2: pwr sw=no boot, rst sw=boot |
| 21 | boot | boot | boot | |
| 22 | boot | boot | boot | |
| 23 | boot | boot | boot | |
| 24 | boot | boot | boot | |
| 25 | boot | boot | boot | |
| 26 | boot | boot | boot | |
| 27 | boot | boot | boot | |
| 28 | boot | boot | boot | |
| 29 | boot | boot | boot | |
| 30 | boot | boot | boot | DUT#3: pause before commencing test 31 |
| 31 | boot | boot | no boot | DUT#3: pwr sw=no boot, rst sw=boot |
| 32 | boot | no boot | boot | DUT#2: pwr sw=no boot, rst sw=boot |
| 33 | boot | boot | boot | |
| 34 | boot | boot | boot | |
| 35 | boot | boot | boot | |
| 36 | boot | boot | no boot | DUT#3: pwr sw=no boot, rst sw=boot |
| 37 | boot | boot | boot | |
| 38 | boot | boot | boot | |
| 39 | boot | boot | boot | |
| 40 | boot | boot | boot | |
| 41 | | boot | | |
| 42 | | boot | | |
| 43 | | boot | | |
| 44 | | boot | | |
| 45 | | boot | | |
| 46 | | boot | | |
| 47 | | boot | | |
| 48 | | boot | | |
| 49 | | boot | | |
| 50 | | boot | | |
| 51 | | boot | | |
| 52 | | boot | | |
| 53 | | boot | | |
| 53 | | boot | | |
| 54 | | boot | | |
| 55 | | boot | | |
| 56 | | boot | | |
| 57 | | boot | | |
| 58 | | boot | | |
| 59 | | boot | | |
| 60 | | boot | | |
| 61 | | boot | | |
| 62 | | boot | | |
| 63 | | boot | | |
| 64 | | boot | | |
| 65 | | boot | | |
| 66 | | boot | | |
| 67 | | boot | | |
| 68 | | boot | | |
| 69 | | boot | | |
| 70 | | boot | | |
| 71 | | boot | | |
| 72 | | boot | | |
| 73 | | boot | | |
| 74 | | boot | | |
| 75 | | boot | | |
| 76 | | boot | | |
| 77 | | boot | | |
| 78 | | boot | | |
| 79 | | boot | | |
| 80 | | boot | | |
| 81 | | boot | | |
| 82 | | boot | | |
| 83 | | boot | | |
| 84 | | boot | | |
| 85 | | boot | | |
| 86 | | boot | | |
| 87 | | boot | | |
| 88 | | boot | | |
| 89 | | boot | | |
| 90 | | boot | | |
| 91 | | boot | | |
| 92 | | boot | | |
| 93 | | boot | | |
| 94 | | boot | | |
| 95 | | boot | | |
| 96 | | boot | | |
| 97 | | boot | | |
| 98 | | boot | | |
| 99 | | boot | | |
| 100 | | boot | | |
| 101 | | boot | | |
| 102 | | boot | | |
| 103 | | boot | | |
| 104 | | boot | | |
| 105 | | boot | | |
| 106 | | no boot | | |
| 107 | | boot | | |
| 108 | | boot | | |
| 109 | | boot | | |
| 110 | | boot | | |
| 111 | | boot | | |
| 112 | | boot | | |
| 113 | | boot | | |
| 114 | | boot | | |
| 115 | | boot | | |
| 116 | | boot | | |
| 117 | | boot | | |
| 118 | | boot | | |
| 119 | | boot | | |
| 120 | | boot | | |
General DUT#3 behaviour: slower boot, pause after power on, and visible delay while
lighting USRLED1-3 until SD-card boots. Might be caused by a different uboot edition
than DUT#1 and DUT#2.
** Test 2 results (DUT#1 pulldown removed)
| Boot no. | DUT#1 |
| 1 | boot |
| 2 | boot |
| 3 | boot |
| 4 | boot |
| 5 | boot |
| 6 | boot |
| 7 | boot |
| 8 | boot |
| 9 | boot |
| 10 | boot |
| 11 | boot |
| 12 | boot |
| 13 | boot |
| 14 | boot |
| 15 | boot |
| 16 | boot |
| 17 | boot |
| 18 | boot |
| 19 | boot |
| 20 | boot |
| 21 | boot |
| 22 | boot |
| 23 | boot |
| 24 | boot |
| 25 | boot |
| 26 | boot |
| 27 | boot |
| 28 | boot |
| 29 | boot |
| 30 | boot |
| 31 | boot |
| 32 | boot |
| 33 | boot |
| 34 | boot |
| 35 | boot |
| 36 | boot |
| 37 | boot |
| 38 | boot |
| 39 | boot |
| 40 | boot |