My name is Philipp C. Heckel and I write about nerdy things.

USB disk causes blinking cursor at boot; how to “fix” the MBR bootstrap code


Linux, Virtualization


Have you ever rebooted your computer only to see a black screen with a blinking cursor? If you have a USB drive attached, chances are the blinking cursor is caused by invalid bootstrap code in the Master Boot Record (MBR) on that drive which has caused the normal boot execution to stop without returning control to the BIOS. If you have physical access to the machine, simply remove the USB drive and/or change the boot order to pick the OS disk first.

If you have no physical access, things are a bit more tricky: This exact thing happened to me at work the other day. Unfortunately, it didn’t happen to my computer, but to a few dozen of our customer backup appliances during their scheduled upgrade/reboot. Now, while dozens out of over 60k isn’t that much, our customers rely on these devices, so it’s not acceptable to have them not boot properly.

In this short post, I’ll demonstrate how to reproduce the blinking cursor problem, and how to “fix” the MBR to ensure the computer still boots, regardless of the boot order.



1. Diagnosing a blinking cursor

Assuming that your system is booting via BIOS (not UEFI), you’ve surely encountered the infamous blinking cursor at least once:

We ran into it a few months ago, when a few of our customer devices refused to reboot, and we were a bit puzzled at first. How could one possibly diagnose something like that without physical access to the device itself? With lots of on-site customer help, we quickly narrowed things down to an attached USB flash drive, and we immediately suspected that the BIOS was trying to boot from that drive.

But how was that possible? None of the partitions in the MBR on that drive was marked bootable, and there wasn’t even a bootable OS on the USB drive. How does the BIOS decide which partition to boot from anyway? Won’t it just look at the MBR partition table and pick the first bootable partition?

No. Apparently not.

As we found out quickly, the BIOS in fact doesn’t really care about the partition table at all. All it does is list all attached drives and go through them one by one according to the boot order. If it finds a drive with the MBR signature 0x55aa at offset 0x1fe (see image below in green), it simply begins executing whatever code resides at offset 0x00 of the disk (in red):

Wow, isn’t that surprising? It surely was to me. I thought the BIOS was supposed to be a bit smarter than that. What that means is that really any disk in a boot position before your OS drive can render your computer unbootable if its MBR bootstrap code is not properly programmed, or has been overwritten with garbage.

In our case, we found that certain tools are apparently under the impression that the first few bytes of a disk are “unused” because they don’t contain the partition table, so they use this area to store their own metadata.

Since these first 446 bytes do in fact contain the MBR bootstrap code, they are incredibly important to the BIOS during the boot process. If whatever is stored there is not valid x86 code, the BIOS will fail to execute it and drop to the blinking cursor of doom.
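To make the layout concrete, here is a minimal sketch of the check described above, run against a scratch file rather than a real disk. The function name has_mbr_signature and the file name sector.img are made up for illustration; the offsets are the ones from the text (bootstrap code in bytes 0 to 445, signature at 0x1fe).

```shell
# Layout of the first sector of an MBR disk:
#   bytes   0-445  bootstrap code (executed by the BIOS)
#   bytes 446-509  partition table (4 entries of 16 bytes)
#   bytes 510-511  signature 0x55 0xaa (offset 0x1fe)

# Return success if the image or device in $1 carries the MBR signature.
has_mbr_signature() {
    sig=$(od -An -tx1 -j510 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Demo on a 512-byte scratch file (hypothetical name).
truncate -s 512 sector.img
has_mbr_signature sector.img || echo "no signature yet"

# Write 0x55 0xaa (octal 125 252) at offset 510 without truncating the file.
printf '\125\252' | dd of=sector.img bs=1 seek=510 conv=notrunc 2>/dev/null
has_mbr_signature sector.img && echo "signature present"
```

Any drive for which this check succeeds is a candidate the BIOS may jump into, no matter what its partition table says.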

2. Reproducing the problem

Now that we knew what the problem was, we needed a quick way to reproduce it, so that we could be certain whatever fix we developed would work. The easiest way to do this is by using a virtual machine in KVM/QEMU, a live Linux ISO and a raw disk image with “broken” or non-existent MBR bootstrap code.

First, download a live Linux distro. Something like Tiny Core Linux is more than enough for now:

Then, we’ll create our “broken” USB disk: To do that, we’ll create a 1 MB sparse file using truncate -s 1M and we’ll create an MBR with a single partition using fdisk. I was a bit lazy here and shortened the fdisk command by using it non-interactively like that; but you can do all of this via the interactive console: o creates the MBR, np11 creates the partition, and w writes the changes.
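The two steps could look roughly like the sketch below. The image name disk.img and the exact answers piped into fdisk are assumptions reconstructed from the description above (o, then n, p, 1, 1, the default last sector, then w). Prompts can differ between fdisk versions, so run it interactively if in doubt.

```shell
# Create a 1 MB sparse file to act as the "USB disk" (hypothetical name).
truncate -s 1M disk.img

# Feed fdisk its answers non-interactively:
#   o          new empty MBR (DOS) partition table
#   n p 1 1    new primary partition number 1, starting at sector 1
#   (empty)    accept the default last sector
#   w          write the changes and exit
printf 'o\nn\np\n1\n1\n\nw\n' | fdisk disk.img \
    || echo "fdisk failed or is not installed" >&2
```

The result is an image with a valid MBR signature and partition table, but nothing useful in the bootstrap code area.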

That’s really it to reproduce the problem. If you now attach the Tiny Core ISO as a CD-ROM and the image file as a USB device, you’ll see that KVM won’t boot the Tiny Core live Linux, even though the disk with the MBR has no bootable partitions whatsoever.
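A possible QEMU invocation for this setup is sketched below. It is wrapped in a function (boot_vm, a made-up name) so nothing starts until you call it. The memory size and the drive id are arbitrary choices, and bootindex=0 is meant to put the USB disk first in the boot order. This is not necessarily the exact command we used, and -enable-kvm requires a host with KVM available.

```shell
# Boot a VM with the live ISO as CD-ROM and the raw image as a USB stick.
# $1 = path to the live ISO, $2 = path to the raw disk image.
boot_vm() {
    qemu-system-x86_64 \
        -enable-kvm \
        -m 512 \
        -cdrom "$1" \
        -usb \
        -drive if=none,id=usbstick,format=raw,file="$2" \
        -device usb-storage,drive=usbstick,bootindex=0
}
```

Call it as boot_vm path/to/tinycore.iso disk.img, with the ISO path adjusted to wherever you saved the download.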

3. Interrupt 18h to the rescue!

At first we thought we couldn’t really do anything about this case. We could now detect it programmatically, but what could we possibly do about it?

Well, it turns out that at the very end of the BIOS Boot Specification in Appendix D.2 (yes, I read the entire 46 pages …), it says:

If an O/S is either not present, or otherwise not able to load, execute an INT 18h instruction so that control can be returned to the BIOS. Currently, hard drive boot sectors do this, but floppy diskette boot sectors execute an INT 19h instead of INT 18h. The BIOS Boot Specification defines INT 18h as the recovery vector for failed boot attempts.

Hurray! Exactly what we need. We need to tell the BIOS to try the next disk. The equivalent of “nothing to see here, move along”.

Well, let’s try it then. Let’s write an incredibly complex assembly program containing one instruction int 18h (to call interrupt 18h) and write it to the beginning of the disk:
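A sketch of what that can look like is below. The file name skipdisk.asm is the one used later in this post; the contents, the output name skipdisk.bin and the image name disk.img are a reconstruction, not a verbatim copy of the original listing. The assembly step only runs if nasm is installed.

```shell
# One-instruction "boot loader": hand control straight back to the BIOS.
cat > skipdisk.asm <<'EOF'
bits 16     ; MBR bootstrap code runs in 16-bit real mode
int 18h     ; tell the BIOS this boot attempt failed
EOF

if command -v nasm >/dev/null 2>&1; then
    # Assemble to a flat binary (two bytes) ...
    nasm -f bin -o skipdisk.bin skipdisk.asm
    # ... and copy it over the start of the image; conv=notrunc keeps
    # the rest of the image (partition table, signature) intact.
    dd if=skipdisk.bin of=disk.img conv=notrunc
else
    echo "nasm not installed; skipping assembly" >&2
fi
```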

That’s it already. This essentially nukes the bootability of any MBR-based disk, which is exactly what we needed. If you look at the disk in a hex editor like dhex, you’ll see the interrupt call at the very beginning as 0xcd18:
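If you don’t have a hex editor at hand, od does the job too. The sketch below is self-contained: it uses a scratch image (scratch.img, a made-up name) and writes the two opcode bytes directly with printf as a stand-in for the assembled output, so it doesn’t depend on nasm or touch a real disk.

```shell
# Scratch image so the sketch does not touch a real disk.
truncate -s 1M scratch.img

# 0xcd 0x18 (octal 315 030) is the encoding of "int 18h".
printf '\315\030' | dd of=scratch.img bs=1 conv=notrunc 2>/dev/null

# Dump the first 16 bytes; the first two should read "cd 18".
od -An -tx1 -N16 scratch.img
```

Writing the bytes by hand like this is a shortcut for experimenting. For anything longer than one instruction, an assembler is the saner route.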

Now you can try to boot the VM again, using the exact same parameters, and you’ll see that this time it skips the attached USB disk, and it’ll boot the live Linux instead:

If everything worked as expected, you should see the Tiny Core Linux boot loader screen:

X. Bonus

If you’re like me and the whole world of MBR bootstrap code is new to you, you’ll probably start experimenting with the skipdisk.asm file I provided. If you really want to, you can replace the entire bootstrap code area and do all sorts of things in there.

There is a wonderful wikibook on x86 bootloaders in assembly that I can recommend.

I also found Dan Luedtke’s boot loader on GitHub to be a good starting point. I modified his boot loader, ran nasm, and then used dd bs=446 count=1 conv=notrunc to replace the first 446 bytes of my disk. A great learning exercise.
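The reason for bs=446 count=1 is that it copies exactly the bootstrap code area and leaves the partition table and signature alone. Here is a small sketch that demonstrates this on scratch files. The names bonus.img and loader.bin are made up, and loader.bin is just 512 bytes of 0x90 standing in for a real assembled boot sector.

```shell
# Scratch image with only the MBR signature set at offset 510.
truncate -s 1M bonus.img
printf '\125\252' | dd of=bonus.img bs=1 seek=510 conv=notrunc 2>/dev/null

# Stand-in for an assembled boot sector: 512 bytes of 0x90 (octal 220).
head -c 512 /dev/zero | tr '\0' '\220' > loader.bin

# Copy only the first 446 bytes of the loader over the image.
dd if=loader.bin of=bonus.img bs=446 count=1 conv=notrunc 2>/dev/null

# Bytes 444-445 are now loader bytes, 446-447 still belong to the
# untouched partition table, and the signature at 510-511 survives.
od -An -tx1 -j444 -N4 bonus.img
od -An -tx1 -j510 -N2 bonus.img
```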
