8. Recovery Strategies

Data Recovery

On Ubuntu, lost partitions can be recovered with a tool called TestDisk. It is licensed under the GNU General Public License and is therefore open source software.
TestDisk is designed not only to recover lost partitions but also to make non-booting disks bootable again when the problem is caused by faulty software or by human error, such as accidentally erasing the partition table. (Sams 2008, p.246)

TestDisk queries the BIOS or the operating system to find the hard disks and their characteristics, especially LBA size and CHS geometry. It then does a quick check of the disk structure and compares it with the partition table to find entry errors. If the partition table has entry errors, TestDisk can repair them. If partitions are missing or the partition table is completely empty, TestDisk can search for partitions and create a new table, or even a new MBR if necessary. (Marshall 2001, p.36)
It is up to the user to look over the list of partitions that TestDisk finds and to select those that were in use just before the drive failed to boot or the partitions were lost. Especially after a detailed search for lost partitions, TestDisk may also show partition data that comes from small remnants of a partition that was deleted and overwritten long ago.
TestDisk can also be used to collect detailed information about a non-booting drive for further analysis. It has features for both novice and expert users, and expert users may find it a handy tool for on-site recovery. (John 2004, p.485)
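To make the idea of "partition table entries" concrete, here is a minimal Python sketch that reads the four entries of a classic MBR sector. It is an illustration of the on-disk layout only, not TestDisk's own code or method; the function name parse_mbr and the sample values are assumptions made for this example.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446        # the four 16-byte partition entries start here
ENTRY_SIZE = 16
SIGNATURE = b"\x55\xaa"   # boot signature stored in bytes 510 and 511


def parse_mbr(sector):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR sector is exactly 512 bytes")
    if sector[510:512] != SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for slot in range(4):
        start = TABLE_OFFSET + slot * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # Fields: status, CHS start (3 bytes), type, CHS end (3 bytes),
        # first LBA sector, number of sectors (little-endian).
        status, _, ptype, _, first_lba, count = struct.unpack("<B3sB3sII", raw)
        if ptype != 0:  # type 0 marks an unused slot
            entries.append({
                "slot": slot,
                "bootable": status == 0x80,
                "type": ptype,
                "first_lba": first_lba,
                "sectors": count,
            })
    return entries


# Hypothetical sample: one bootable Linux (type 0x83) partition.
sample = bytearray(SECTOR_SIZE)
sample[446:462] = struct.pack("<B3sB3sII", 0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 1000000)
sample[510:512] = SIGNATURE
print(parse_mbr(bytes(sample)))
```

A sector in which all four slots are unused would give an empty list, which corresponds to the "completely empty partition table" case described above.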

Checking a hard drive in Linux

In DOS, you can run a Surface Scan in ScanDisk. Linux has nothing called Surface Scan; the equivalent operation is called checking for bad blocks.

What is a block? It is hard to define exactly, but it can be described as a fixed-size chunk of data on the hard drive. Suppose we have a 40 GB partition and divide it into a large number of indexed blocks of, say, 4096 bytes each. Block 0 is the first 4096 bytes, block 1 the second 4096 bytes, and so on. An important point is that blocks are part of the file system. A block size is chosen for the file system when it is formatted. The partition itself does not have a block size, but the file system does. (Charles 2006, p.45)
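The block numbering can be worked through with a few lines of Python. This is only an arithmetic sketch; the helper names block_of and block_range are made up for the example, and the partition size is assumed to be 40 GiB (40 x 1024^3 bytes).

```python
BLOCK_SIZE = 4096  # bytes per block, chosen when the file system is created


def block_of(byte_offset, block_size=BLOCK_SIZE):
    """Number of the block that contains the given byte offset."""
    return byte_offset // block_size


def block_range(block_number, block_size=BLOCK_SIZE):
    """First and last byte offset covered by a block."""
    start = block_number * block_size
    return start, start + block_size - 1


partition_bytes = 40 * 1024**3               # assumed size: 40 GiB
total_blocks = partition_bytes // BLOCK_SIZE

print(block_of(0), block_of(4095), block_of(4096))  # 0 0 1
print(block_range(1))                               # (4096, 8191)
print(total_blocks)                                 # 10485760
```

With a different block size the same byte on the disk gets a different block number, which is why the block size has to be known before a block number means anything.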

If part of your hard drive is damaged, the block or blocks that contain the damaged area should be marked bad. Basically, this means the block number is added to a list of bad blocks. The list is then given to the file system on the partition, which stores it and remembers not to use those blocks. If you use e2fsck, the process of giving the list to the file system is automated. That is preferable since it prevents errors. (Check Hard Drive - Linux 2007)

The two general ways to find bad blocks

The first way is to try to read every block. If one of the reads causes the hard drive to report an error, the block in question is marked bad (Stevens 2008, p.106). However, this is not the most thorough way, because a damaged part of the disk does not always produce an error when it is read.
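A read-only scan can be sketched in Python as follows. This is a simplified illustration, not how badblocks is implemented: read_scan is a hypothetical helper, and FlakyDisk is a made-up in-memory stand-in for a drive that reports an I/O error for certain blocks.

```python
import io


def read_scan(dev, block_size, total_blocks):
    """Return the numbers of the blocks that cannot be read completely."""
    bad = []
    for n in range(total_blocks):
        try:
            dev.seek(n * block_size)
            data = dev.read(block_size)
            if len(data) != block_size:  # a short read also counts as a failure
                bad.append(n)
        except OSError:                  # the drive reported an error
            bad.append(n)
    return bad


class FlakyDisk(io.BytesIO):
    """Hypothetical test double: reads that start in a listed block fail."""

    def __init__(self, data, block_size, failing):
        super().__init__(data)
        self.block_size = block_size
        self.failing = set(failing)

    def read(self, size=-1):
        if self.tell() // self.block_size in self.failing:
            raise OSError("simulated I/O error")
        return super().read(size)


disk = FlakyDisk(bytes(8 * 512), 512, failing={2, 5})
print(read_scan(disk, 512, 8))  # [2, 5]
```

The sketch only catches blocks that raise an error, so it shares the weakness described above: a block that returns wrong data without an error passes unnoticed.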

The second way is to write data to every block on the hard disk and check that the same data comes back when it is read. It is possible to do this without erasing the data on the partition, but the scan then takes longer. This is the method used by the DOS Surface Scan.
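The write-and-verify idea, in its non-destructive form, can be sketched as below. This is an illustration under stated assumptions rather than the algorithm of any particular tool: the two test patterns, the helper write_verify_scan and the in-memory StuckDisk stand-in (a drive whose listed blocks silently ignore writes) are all made up for the example. A real scanner would also have to bypass the operating system's caches, which the sketch ignores.

```python
import io

PATTERNS = (0xAA, 0x55)  # assumed test patterns: alternating bits and their inverse


def write_verify_scan(dev, block_size, total_blocks):
    """Write patterns to each block, read them back, then restore the data."""
    bad = []
    for n in range(total_blocks):
        offset = n * block_size
        dev.seek(offset)
        original = dev.read(block_size)      # save the existing contents
        ok = True
        for value in PATTERNS:
            pattern = bytes([value]) * block_size
            dev.seek(offset)
            dev.write(pattern)
            dev.seek(offset)
            if dev.read(block_size) != pattern:
                ok = False                   # the block did not keep what was written
        dev.seek(offset)
        dev.write(original)                  # put the saved contents back
        if not ok:
            bad.append(n)
    return bad


class StuckDisk(io.BytesIO):
    """Hypothetical test double: writes to listed blocks are silently dropped."""

    def __init__(self, data, block_size, stuck):
        super().__init__(data)
        self.block_size = block_size
        self.stuck = set(stuck)

    def write(self, data):
        if self.tell() // self.block_size in self.stuck:
            self.seek(len(data), io.SEEK_CUR)  # pretend the write happened
            return len(data)
        return super().write(data)


disk = StuckDisk(bytes(range(256)) * 8, 512, stuck={1})
before = disk.getvalue()
print(write_verify_scan(disk, 512, 4))  # [1]
print(disk.getvalue() == before)        # True
```

Block 1 is caught here even though it never raises an error, which is exactly the case a read-only scan misses. The extra read and the restoring write for every block are the reason the non-destructive variant takes longer.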

Usable programs

Linux has one main program for checking a disk for bad blocks. It is called badblocks. You should run it directly only when you are checking a blank partition or a file system other than ext2 or ext3. When checking an ext2 or ext3 partition, you should use e2fsck, which runs badblocks for you when it is given the -c option. (Sams 2008, p.319)

Use of e2fsck

You should use e2fsck when checking an ext2 or ext3 file system. Both of the methods below automatically save the bad blocks they find into the file system, so that those parts of the hard drive are no longer used.

Read-only method

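e2fsck -c -C 0 /dev/hda1

or

e2fsck -c -C 0 -y /dev/hda1

The -y option answers yes to all questions, so the check is sure to finish by itself. The -C option takes a file descriptor as its argument; with 0, e2fsck displays a progress bar.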

Non-destructive read/write method

e2fsck -c -c -C 0 /dev/hda1

or

e2fsck -c -c -C 0 -y /dev/hda1

Giving -c twice selects the non-destructive read/write test. The -y option answers yes to all questions, so the check is sure to finish by itself.

Note: The file system must NOT be mounted. You therefore have to boot from a rescue CD if you need to check the root file system. (John 2004, p.342)
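Before starting a check it is worth confirming that the partition really is unmounted. The following Python sketch looks for a device name in text laid out like /proc/mounts, where the first field of each line is the device. The function name is_mounted and the sample lines are assumptions for the example, and the check is deliberately simple: a file system mounted under another name (for example through a device mapper path) would not be detected.

```python
def is_mounted(device, mounts_text):
    """Return True if the device appears as the first field of any line."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


# Hypothetical sample in the layout of /proc/mounts:
# device, mount point, file system type, options, dump, pass
sample_mounts = """\
/dev/hda2 / ext3 rw,relatime 0 0
proc /proc proc rw 0 0
/dev/hda3 /home ext3 rw 0 0
"""

print(is_mounted("/dev/hda1", sample_mounts))  # False
print(is_mounted("/dev/hda3", sample_mounts))  # True
```

On a running Linux system the text could be obtained by reading the file /proc/mounts and passing its contents as the second argument.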

Using badblocks

You should use badblocks when checking a blank partition. You can also use it on a partition with a file system other than ext2 or ext3, although there may be an equivalent of e2fsck for your file system that you could try first. When you use badblocks, the list of bad blocks for your partition is not saved in the file system automatically. It is possible to save the list to a file and then have the file system read it in. The catch is that you must set the block size in badblocks to the block size that the file system has or will have. Otherwise the block numbers will not correspond to the blocks in that file system. Importing the block list into the file system is not described here; the man pages contain that information. (William 2004, p.172)
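Why the block sizes must match can be seen from a short calculation. The Python sketch below maps block numbers recorded with one block size onto the numbers the same bytes would have with another block size. It is an illustration only, with a made-up helper name, and it is not a substitute for running badblocks with the correct -b value in the first place.

```python
def convert_block_numbers(blocks, from_size, to_size):
    """Map block numbers from one block size to another.

    Every target block that overlaps a source block is included, so a
    conversion to smaller blocks produces several numbers per source block.
    """
    converted = set()
    for n in blocks:
        first = (n * from_size) // to_size
        last = ((n + 1) * from_size - 1) // to_size
        converted.update(range(first, last + 1))
    return sorted(converted)


# Bad blocks found with 1024-byte blocks, file system uses 4096-byte blocks.
print(convert_block_numbers([1000, 1001, 2000], 1024, 4096))  # [250, 500]

# The opposite direction: one 4096-byte block covers four 1024-byte blocks.
print(convert_block_numbers([250], 4096, 1024))  # [1000, 1001, 1002, 1003]
```

A list that says "block 1000 is bad" would therefore point at a completely different area of the disk if the file system read it with a 4096-byte block size.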

Destructive read/write method

badblocks -b 4096 -p 4 -c 16384 -w -s /dev/hda1

The number after -b is the block size. 4096 means 4096 bytes. You don't need to change this unless you're using the bad blocks list for something.

The number after -p is the number of passes it should run on the hard drive. The 4 means it will stop testing after it has tested the entire hard drive 4 times in a row without the bad blocks list changing. So if it finds new bad blocks on the third pass, and none after that, it will have done 7 passes altogether. If you don't want to do multiple passes, you can skip this switch to save time.
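The pass counting can be checked with a small model in Python. This models the behaviour as described in the paragraph above and is not taken from the program's source; the function name and its arguments are assumptions for the example.

```python
def total_passes(passes_with_new_bad, p):
    """Count passes until p consecutive passes find no new bad blocks.

    passes_with_new_bad: set of pass numbers (starting at 1) on which new
    bad blocks turn up; every other pass is assumed to find nothing new.
    """
    clean_streak = 0
    n = 0
    while True:
        n += 1
        if n in passes_with_new_bad:
            clean_streak = 0      # the list changed, start counting again
        else:
            clean_streak += 1
        if clean_streak >= p:
            return n


print(total_passes({3}, 4))    # 7
print(total_passes(set(), 4))  # 4
print(total_passes(set(), 0))  # 1
```

The first line reproduces the example from the text: new bad blocks on pass 3 reset the count, so four more clean passes are needed and the scan ends after pass 7.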
The number after -c is the number of blocks it tests at a time. The default was 16 in older versions of badblocks and is 64 in current ones. The -b number * the -c number * 2 equals the number of bytes of RAM it will use. You should probably use as much of your available memory as possible to save time, but make sure you don't use too much, because you certainly wouldn't want this data to be swapped. If you run out of physical and swap memory, the program will fail. The above settings use 128 MB of RAM. (Destructive Read - Random Access Memory System 1998)
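The memory figure follows directly from the formula in the text. The short Python sketch below reproduces the calculation; the function name is made up, and the factor of 2 is the one stated above for this test mode.

```python
def badblocks_buffer_bytes(block_size, blocks_at_once, factor=2):
    """Bytes of RAM used: -b value * -c value * factor (2, as stated in the text)."""
    return block_size * blocks_at_once * factor


used = badblocks_buffer_bytes(4096, 16384)
print(used)             # 134217728
print(used // 2**20)    # 128
```

Halving the -c value to 8192 would halve the memory use to 64 MB, at the cost of testing fewer blocks per step.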