boot process hangs

Started by meo, September 27, 2023, 03:05:15 PM


meo

hello!

We have strange boot problems with A13-SOM-512 (Rev. G) boards. Some boards seem to be more affected than others.
The boot process hangs at the point where the kernel should start (serial logs attached).

Do you have an idea how we can track down the cause of this failure?

best regards

U-Boot SPL 2021.04+olimex-1-20211223.094223 (Dec 23 2021 - 09:43:25 +0000)
DRAM: 512 MiB
CPU: 1008000000Hz, AXI/AHB/APB: 3/2/2
A13 Board no eeprom found!
 PMIC not found board is A13-SOM-512
Trying to boot from MMC1


U-Boot 2021.04-00021-g618581895a-dirty (Feb 19 2023 - 12:06:19 +0100) Allwinner Technology

CPU:   Allwinner A13 (SUN5I)
ID:    A13-SOM-512 Rev.AI2C:   ready
DRAM:  512 MiB
MMC:   mmc@1c0f000: 0
Loading Environment from EXT4... *** Warning - bad CRC, using default environment

Loading Environment from FAT... ** No device specified **
In:    serial
Out:   serial
Err:   serial
Boot-delay: 2 sec
Net:   No ethernet found.
starting USB...
No working controllers found
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found U-Boot script /boot/boot.scr
2022 bytes read in 2 ms (987.3 KiB/s)
## Executing script at 43100000
Checking for /uEnv.txt...
Checking for /boot/uEnv.txt...
1105 bytes read in 2 ms (539.1 KiB/s)
Loaded environment from /boot/uEnv.txt
Loading FIT image...
13768363 bytes read in 1088 ms (12.1 MiB/s)
## Loading kernel from FIT Image at 58000000 ...
   Using 'config-4788' configuration
   Trying 'kernel-1' kernel subimage
     Description:  Linux kernel 5.10.105-olimex
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0x580000d4
     Data Size:    4497976 Bytes = 4.3 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: 0x40080000
     Entry Point:  0x40080000
     Hash algo:    crc32
     Hash value:   dde0af52
     Hash algo:    sha1
     Hash value:   6d550f71f5d8830984cf401ae797bd0ac8118a94
   Verifying Hash Integrity ... crc32+ sha1+ OK
## Loading ramdisk from FIT Image at 58000000 ...
   Using 'config-4788' configuration
   Trying 'ramdisk-1' ramdisk subimage
     Description:  Ramdisk for kernel 5.10.105-olimex
     Type:         RAMDisk Image
     Compression:  Unknown Compression
     Data Start:   0x5844a440
     Data Size:    9206714 Bytes = 8.8 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: 0x4fe00000
     Entry Point:  0x4fe00000
     Hash algo:    crc32
     Hash value:   d37fb8b3
     Hash algo:    sha1
     Hash value:   fb50ef60d567af417a0ce925a4712ad7a1f6c48d
   Verifying Hash Integrity ... crc32+ sha1+ OK
   Loading ramdisk from 0x5844a440 to 0x4fe00000
WARNING: 'compression' nodes for ramdisks are deprecated, please fix your .its file!
## Loading fdt from FIT Image at 58000000 ...
   Using 'config-4788' configuration
   Trying 'fdt-3' fdt subimage
     Description:  unavailable
     Type:         Flat Device Tree
     Compression:  uncompressed
     Data Start:   0x58d1b774
     Data Size:    22680 Bytes = 22.1 KiB
     Architecture: ARM
     Load Address: 0x4fa00000
     Hash algo:    crc32
     Hash value:   d8f1ecd7
     Hash algo:    sha1
     Hash value:   52893f540ea34ff11a7e2386f6bec286aa9cad85
   Verifying Hash Integrity ... crc32+ sha1+ OK
   Loading fdt from 0x58d1b774 to 0x4fa00000
   Booting using the fdt blob at 0x4fa00000
   Loading Kernel Image
   Loading Ramdisk to 49738000, end 49fffbba ... OK
   Loading Device Tree to 4972f000, end 49737897 ... OK
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/act-led.dtbo'...
889 bytes read in 4 ms (216.8 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/meo.dtbo'...
2466 bytes read in 5 ms (481.4 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/mmc-led.dtbo'...
885 bytes read in 4 ms (215.8 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/rgb-led.dtbo'...
1797 bytes read in 4 ms (438.5 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/spi0-spidev.dtbo'...
408 bytes read in 3 ms (132.8 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/spi2_2cs.dtbo'...
1204 bytes read in 4 ms (293.9 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/sun5i-a13-spi0.dtbo'...
997 bytes read in 4 ms (243.2 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/sun5i-a13-uart2_cts_rst.dtbo'...
643 bytes read in 4 ms (156.3 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/sun5i-a13-uart3.dtbo'...
549 bytes read in 4 ms (133.8 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/w1-gpio.dtbo'...
765 bytes read in 4 ms (186.5 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/w5500-spi0.dtbo'...
1167 bytes read in 4 ms (284.2 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/w5500-spi2.dtbo'...
1259 bytes read in 4 ms (306.6 KiB/s)
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/meo-watchdog.dtbo'...
410 bytes read in 4 ms (99.6 KiB/s)

Starting kernel ...



LubOlimex

When a board stops suddenly without any error message, I usually suspect insufficient power. First, double- and triple-check the power supply and the ground line; maybe try a more powerful or better-quality supply, and make sure the ground connection is solid. Cabling and connections can affect the power available to the board: if the contact area is insufficient or the cabling is bent, the power throughput might be limited. The main power line is especially important if you power other hardware through the A13-SOM board.

Note that some hardware might cause problems on the power line. Some boards powered from the A13-SOM might draw high current, and other devices can influence the board as a whole. For example, I've seen cases where boards wouldn't boot because of a poor USB-serial cable. Maybe try disconnecting one or more boards from your setup when testing a board that hangs on boot. Now I have a few specific questions:

1. Did it happen previously with older board revisions, or did it first appear with the revision G boards?

2. Does it happen every time with some of the boards? That is, if you try to boot 10 times, will certain boards fail to boot all 10 times? What is the percentage of occurrence?

3. Did you build your own kernel? If you did, do you have LPAE enabled (it should be disabled)?

4. Can you identify a few boards that experience the issue more frequently and try to replicate it with the latest unmodified Olimage image? You can get it from here:

https://images.olimex.com/release/a13/

For this test, disconnect the peripherals and see how it goes with just booting the kernel. Repeat the boot multiple times on boards that previously stopped frequently when using your image with the peripherals attached. Keep a list of boards and tries, and let us know the results of this test.
Technical support and documentation manager at Olimex
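
Editor's note: the list of boards and tries requested above could be kept with a small script. The following is a minimal sketch, not an Olimex tool. It assumes each boot attempt's serial output has been captured as text, and that a healthy boot prints kernel messages after the `Starting kernel ...` line (if the kernel console is not routed to the serial port, that assumption does not hold). The function names and the sample board IDs are hypothetical.

```python
from collections import Counter


def classify_boot_log(log_text):
    """Classify one captured serial log using markers seen in this thread."""
    if "crc32 error!" in log_text or "Bad Data Hash" in log_text:
        return "hash_error"
    marker = "Starting kernel ..."
    pos = log_text.rfind(marker)
    if pos == -1:
        return "no_kernel_start"
    # Assumption: a healthy boot prints kernel messages after the marker.
    tail = log_text[pos + len(marker):]
    return "booted" if tail.strip() else "hang"


def tally(trials):
    """trials: iterable of (board_id, log_text) -> {board_id: Counter of outcomes}."""
    result = {}
    for board_id, log_text in trials:
        result.setdefault(board_id, Counter())[classify_boot_log(log_text)] += 1
    return result


def failure_rate(outcomes):
    """Fraction of attempts that did not end in 'booted'."""
    total = sum(outcomes.values())
    return (total - outcomes["booted"]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample data; replace with your own captured logs.
    sample = [
        ("board-01", "Starting kernel ...\n\n\n"),
        ("board-01", "Starting kernel ...\n\n[    0.000000] Booting Linux\n"),
        ("board-02", "Verifying Hash Integrity ... crc32 error!\nBad Data Hash\n"),
    ]
    for board, outcomes in sorted(tally(sample).items()):
        print(board, dict(outcomes), f"failure rate {failure_rate(outcomes):.0%}")
```

Keeping the hang and hash-error cases separate per board makes it easier to see whether both symptoms cluster on the same boards.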

meo

Hi!
Thanks for your response!

1. We only have Rev. G boards.
2. No, there are no boards that always fail. The failure affects about 30-40% of the boards, and those fail roughly once every 2 to 10 boots.
3. No, we did not modify the kernel. We created our image from A13-OLinuXino-bullseye-base-20230515-130040.img
4. OK, we will repeat the tests with an unmodified image.

meo

We also sometimes get a crc32 hash error:

Checking for /boot/uEnv.txt...
1210 bytes read in 3 ms (393.6 KiB/s)
Loaded environment from /boot/uEnv.txt
Loading FIT image...
13768363 bytes read in 1169 ms (11.2 MiB/s)
## Loading kernel from FIT Image at 58000000 ...
   Using 'config-4788' configuration
   Trying 'kernel-1' kernel subimage
     Description:  Linux kernel 5.10.105-olimex
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0x580000d4
     Data Size:    4497976 Bytes = 4.3 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: 0x40080000
     Entry Point:  0x40080000
     Hash algo:    crc32
     Hash value:   dde0af52
     Hash algo:    sha1
     Hash value:   6d550f71f5d8830984cf401ae797bd0ac8118a94
   Verifying Hash Integrity ... crc32 error!
Bad hash value for 'hash-1' hash node in 'kernel-1' image node
Bad Data Hash
ERROR: can't get kernel image!
SCRIPT FAILED: continuing...
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
Scanning disk mmc@1c0f000.blk...
Found 2 disks
No EFI system partition
Applying overlay: '/usr/lib/olinuxino-overlays/sun5i-a13/act-led.dtbo'...
889 bytes read in 4 ms (216.8 KiB/s)
No FDT memory address configured. Please configure
the FDT address via "fdt addr <address>" command.
Aborting!
Failed to apply overlay.
Restoring the original FDT blob...
BootOrder not defined
EFI boot manager: Cannot load any image

## Loading kernel from FIT Image at 58000000 ...
   Using 'config-4788' configuration
   Trying 'kernel-1' kernel subimage
     Description:  Linux kernel 5.10.105-olimex
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0x580000d4
     Data Size:    4497976 Bytes = 4.3 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: 0x40080000
     Entry Point:  0x40080000
     Hash algo:    crc32
     Hash value:   dde0af52
     Hash algo:    sha1
     Hash value:   6d550f71f5d8830984cf401ae797bd0ac8118a94
   Verifying Hash Integrity ... crc32 error!

   Using 'config-4788' configuration
   Trying 'kernel-1' kernel subimage
     Description:  Linux kernel 5.10.105-olimex
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0x580000d4
     Data Size:    4497976 Bytes = 4.3 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: 0x40080000
     Entry Point:  0x40080000
     Hash algo:    crc32
     Hash value:   dde0af52
     Hash algo:    sha1
     Hash value:   6d550f71f5d8830984cf401ae797bd0ac8118a94
   Verifying Hash Integrity ... crc32+ sha1+ OK

LubOlimex

Hmm, I found this about the integrity check (too many packages in the rootfs):

https://forum.openwrt.org/t/bpi-r3-bad-hash-value-for-hash-1-hash-node-in-rootfs-1-image-node/156393/14

If it is that, the problem should not appear if you use Olimex-made images.

But it can also be card corruption. Do you disconnect the power supply suddenly? Does your setup use a backup battery to prevent hardware shutdowns?
Technical support and documentation manager at Olimex
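
Editor's note: in the logs above, the same FIT image reports the same expected hashes (crc32 `dde0af52`) but verification sometimes passes and sometimes fails. One way to tell a corrupted file on the card from a transient read or memory error is to hash the kernel subimage on a PC and compare it with the values U-Boot prints. The sketch below assumes the FIT image stores its data inline, so that the file offset equals `Data Start` minus the FIT load address (0x580000d4 - 0x58000000 = 0xd4 in the log). The function name is hypothetical, and the FIT file path is not shown in the log, so it is left to the reader.

```python
import hashlib
import zlib


def subimage_hashes(fit_bytes, fit_load_addr, data_start, data_size):
    """Return (crc32_hex, sha1_hex) of one subimage inside a FIT image.

    Assumes inline data: file offset = data_start - fit_load_addr.
    """
    offset = data_start - fit_load_addr
    if offset < 0 or offset + data_size > len(fit_bytes):
        raise ValueError("subimage range lies outside the FIT file")
    data = fit_bytes[offset:offset + data_size]
    crc_hex = f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"
    return crc_hex, hashlib.sha1(data).hexdigest()


# Usage with the values from the log (path is a placeholder):
#   with open("/path/to/your/fit-image", "rb") as f:
#       crc, sha = subimage_hashes(f.read(), 0x58000000, 0x580000D4, 4497976)
#   print(crc == "dde0af52",
#         sha == "6d550f71f5d8830984cf401ae797bd0ac8118a94")
```

If the file on the card hashes correctly on a PC but U-Boot still reports a crc32 error on some boots, the file itself is probably intact and the corruption happens while reading or holding it in memory.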

meo

You are right, the problem does not occur with the unmodified Olimex image! It must be a problem with our U-Boot loader.
We will try to create a new U-Boot config from scratch.
Thank you!

LubOlimex

It is more likely a matter of sizes: check them, maybe you are loading too much into a small partition!
Technical support and documentation manager at Olimex
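
Editor's note: one related size check that can be done directly from the serial log is whether the loaded blobs overlap each other in RAM or run past the end of DRAM. This is a minimal sketch under two assumptions: that "sizes" can also be read as memory regions (the reply above speaks of a partition), and that DRAM on this SoC starts at 0x40000000, which is consistent with the load addresses in the log. The region list is taken from the first log in this thread; the function names are hypothetical. The kernel's runtime footprint beyond the image size is not modelled.

```python
DRAM_BASE = 0x40000000         # assumption: sun5i DRAM base address
DRAM_SIZE = 512 * 1024 * 1024  # "DRAM: 512 MiB" from the log

# (name, start, end) with end = start + size, values from the first log.
REGIONS = [
    ("boot.scr", 0x43100000, 0x43100000 + 2022),
    ("FIT image", 0x58000000, 0x58000000 + 13768363),
    ("kernel", 0x40080000, 0x40080000 + 4497976),
    ("ramdisk", 0x49738000, 0x49FFFBBA),      # relocated copy
    ("device tree", 0x4972F000, 0x49737897),  # relocated copy
]


def find_overlaps(regions):
    """Return pairs of region names whose address ranges intersect."""
    overlaps = []
    for i, (name_a, start_a, end_a) in enumerate(regions):
        for name_b, start_b, end_b in regions[i + 1:]:
            if start_a < end_b and start_b < end_a:
                overlaps.append((name_a, name_b))
    return overlaps


def outside_dram(regions, base=DRAM_BASE, size=DRAM_SIZE):
    """Return names of regions that are not fully inside DRAM."""
    return [name for name, start, end in regions
            if start < base or end > base + size]


if __name__ == "__main__":
    print("overlaps:", find_overlaps(REGIONS))
    print("outside DRAM:", outside_dram(REGIONS))
```

For the values in the first log this check reports no overlaps and nothing outside DRAM, so it does not by itself explain the failures; it is mainly useful after changing load addresses or growing the ramdisk in a custom image.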