Ubuntu server checking disk for errors. Checking and repairing disks in Linux

Any computer is a complex device that consists of many components and no one is immune from failures of any of them. In this article, we will look at how to promptly recognize one of the serious problems with information storage devices, be it a hard drive or a flash drive, and how a disk is checked for bad sectors in Linux.

Any drive consists of many small blocks (sectors) that store information in the form of zeros or ones (bits). If, for some reason, the operating system cannot write a bit of information to a certain sector, then it can be considered “broken”.

A sector can become damaged for various reasons:

  • Manufacturing defects.
  • Loss of power while data is being written.
  • Physical wear and tear of the drive.

A small number of bad sectors are found on almost any drive. But it is worth paying attention if their number increases over time. This may indicate the imminent physical death of the drive and it’s time for you to think about replacing it.

Let's look at which Linux utilities we can use to check a disk for bad sectors.

Checking the drive for bad sectors using badblocks.

Badblocks is a standard Linux utility for checking for bad sectors. It is installed by default in almost any distribution and can be used to check both a hard drive and an external drive.

First, let's look at what drives are connected to our system and what partitions they have. To do this, we need another standard Linux utility - fdisk.

Naturally, the command must be run with superuser rights:

$ sudo fdisk -l

The -l parameter tells fdisk to list the partitions and exit.

Now that we know what partitions we have, we can check them for bad sectors. To do this we will use the badblocks utility as follows:

$ sudo badblocks -v /dev/sda1 > badsectors.txt

To check, we specify the following parameters:

  • -v - verbose output with details about the test results.
  • /dev/sda1 - the partition that we want to check for bad sectors.
  • > badsectors.txt - redirect the output of the command to the badsectors.txt file.
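As a small follow-up to the command above, the report file can be summarised with a helper like the one below. It is only a sketch: the function name count_bad_blocks is made up for illustration, and it relies on badblocks writing one block number per line to standard output.

```shell
# Count the bad blocks recorded in a badblocks report file.
# Assumption: the file holds one block number per line, which is what
# badblocks writes to standard output (count_bad_blocks is a hypothetical name).
count_bad_blocks() {
    # $1: path to the report file, e.g. badsectors.txt
    # grep -c prints the number of lines containing a digit; "|| true"
    # keeps the exit status at zero when that number is 0.
    grep -c '[0-9]' "$1" || true
}

# Example (not run here; needs root and a real device):
#   sudo badblocks -v /dev/sda1 > badsectors.txt
#   echo "Bad blocks found: $(count_bad_blocks badsectors.txt)"
```

An empty report gives a count of 0, which means no unreadable blocks were found during the scan.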

If, as a result, bad sectors were found, then we need to instruct the operating system not to write information to them in the future. To do this, we need Linux utilities for working with file systems:

  • e2fsck, if we are repairing a partition with an ext2, ext3 or ext4 file system.
  • fsck, if we are repairing a file system other than ext.

Enter the following commands:

$ sudo e2fsck -l badsectors.txt /dev/sda1

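Or, if our file system is not ext, run the generic fsck front end, which calls the checker for the detected file system:

$ sudo fsck /dev/sda1

With e2fsck, the -l parameter tells the utility to add the blocks listed in badsectors.txt, which we obtained earlier with badblocks, to the file system's list of bad blocks. This parameter is specific to e2fsck: the generic fsck treats -l as a request to lock the device, so a bad-block list cannot be passed to other checkers this way. The block numbers in the list are only meaningful if badblocks was run with the same block size as the file system, which is why the e2fsprogs documentation recommends e2fsck -c over feeding it a separately generated list.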
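The choice between the two utilities described above can be written down as a tiny helper. This is a sketch under assumptions: checker_for is a hypothetical name, and the usage comment assumes lsblk from util-linux is installed.

```shell
# Pick the checking tool for a file system type:
# e2fsck for ext2/ext3/ext4, the generic fsck front end for anything else.
# (checker_for is a hypothetical helper name used for illustration.)
checker_for() {
    case "$1" in
        ext2|ext3|ext4) echo e2fsck ;;
        *)              echo fsck ;;
    esac
}

# Example (assumes lsblk is available; not run here):
#   fstype=$(lsblk -no FSTYPE /dev/sda1)
#   echo "Use: $(checker_for "$fstype")"
```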

Checking a drive for bad sectors in Linux in smartmontools

Now let's look at a more modern and reliable way to check a disk for bad sectors in Linux. Modern ATA/SATA and SCSI/SAS drives, including SSDs, have a built-in self-monitoring system called S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), which tracks drive parameters and helps detect their deterioration at an early stage. To work with S.M.A.R.T. in Linux, there is the smartmontools package.

Let's install it first. If your distribution is based on Debian/Ubuntu, then enter:

$ sudo apt install smartmontools

If you have a distribution based on RHEL/CentOS, then enter:

$ sudo yum install smartmontools
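If you are not sure which of the two commands applies, a script can check which package manager is present. The following is a minimal sketch; install_cmd is a hypothetical name and it only covers the two cases mentioned above.

```shell
# Print the smartmontools install command that fits this system.
# Only apt and yum are considered, matching the two cases in the text;
# install_cmd is a hypothetical helper name.
install_cmd() {
    if command -v apt >/dev/null 2>&1; then
        echo "sudo apt install smartmontools"
    elif command -v yum >/dev/null 2>&1; then
        echo "sudo yum install smartmontools"
    else
        echo "unsupported"
    fi
}

# Example: show the command instead of running it blindly.
#   install_cmd
```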

A computer is a device whose operation is based on the interaction of many components. Over time, they can cause malfunctions. One of the common reasons for the machine to not work properly is bad sectors on the disk, so it needs to be tested periodically. Linux provides all the possibilities for this.

What are broken blocks and why do they appear?

A block (sector) is a small disk cell on which information is stored in the form of bits (0 and 1). When the system fails to write the next bit into a cell, it is called a bad sector. There can be several reasons for the occurrence of such blocks:

  • manufacturing defects;
  • turning off the power while recording information;
  • physical wear of the disk.

Almost all media contain a few bad sectors from the start. Over time their number may grow, which indicates that the device will soon fail. In Linux, there are several ways to test a disk for errors.

Linux disk check

Many operating systems are built on the Linux kernel, including Ubuntu and Debian. The disk check procedure is universal and suitable for each of them. It is time to test the media when the disk subsystem is under heavy load, when read and write speeds have dropped significantly, or when these operations start to cause errors.

Many people are familiar with Victoria HDD, a disk-testing program for Windows. Comparable tools exist for Linux.

Badblocks

Badblocks is a disk utility available by default on Ubuntu and other Linux distributions. The program allows you to test both the hard drive and external drives.

Important! All terminal commands given in this article require superuser rights, so run them with sudo.

Before testing a disk in Linux, check which drives are connected to the system with the fdisk -l command. It will also show the partitions available on them.

Now you can begin testing for bad sectors. Badblocks is run as follows:

badblocks -v /dev/sdk1 > bsector.txt

The following options and operands are used in the command:

  • -v – displays a detailed report on the scan performed;
  • /dev/sdk1 – the partition to be checked;
  • > bsector.txt – writes the results to a text file.

If bad blocks are found when checking the disk, you need to run the fsck or e2fsck utility, depending on the file system used, so that no data is written to the non-working sectors. For ext2, ext3, or ext4 file systems, run the following command:

e2fsck -l bsector.txt /dev/sdk1

Otherwise, run the generic checker:

fsck /dev/sdk1

The -l parameter tells e2fsck that bad blocks are listed in the bsector.txt file and that these are the ones to exclude. It is an e2fsck option; the generic fsck command interprets -l differently (as a device lock), so the list cannot be passed to it in the same way.

GParted

GParted checks a file system through a graphical interface, without resorting to the terminal.

The tool is not installed by default in most distributions, so it must be installed by running the command:

apt-get install gparted

The main application window displays the available drives. An exclamation mark next to a partition's name indicates that it is time to test it. Select the required partition first, then start the check from the “Check” item in the “Partition” menu on the top panel. Once the scan is complete, the utility will display the result.

Checking HDDs and other storage devices with GParted is available to users of Ubuntu, CentOS, Debian and other Linux distributions, as well as FreeBSD.

Smartmontools

This tool allows you to check the drive itself with greater reliability. Modern hard drives have a built-in self-monitoring module, S.M.A.R.T., which analyzes drive data and helps identify a malfunction at an early stage. Smartmontools is designed to work with this module.

The installation is started via the terminal:

  • apt install smartmontools – for Ubuntu/Debian;
  • yum install smartmontools – for CentOS.

To view information about the status of the hard drive, enter the following line:

smartctl -H /dev/sdk1

Checking for errors takes varying amounts of time, depending on the disk size. Upon completion, the program will display a result about the presence of bad sectors or their absence.

The utility has other options, such as -a (--all) and -x (--xall). For more information, call up the built-in help with smartctl -h.

Safecopy

When the need arises to test a hard drive in Linux, you should be prepared for any result.

The Safecopy application copies data from a damaged device to a working one. The source can be a hard drive or removable media. The tool keeps working through I/O errors, read errors, and bad blocks instead of stopping at them, and runs as fast as the hardware allows.

Note! The utility is not intended to recover deleted files. It retrieves as much as possible of the information stored in bad sectors.

To install Safecopy on Linux, enter the following line into the terminal:

The scan is started with the command:

safecopy /dev/sdk1 /home/files/sdk1.img

Here the first path denotes the damaged device and the second the file in which the recovered data will be saved (the name sdk1.img is only an example).

In this way the program is able to create an image of the file system of an unstable storage device.

What to do if an error is detected in the Ubuntu system program

Installing new software or changing system settings may cause a "System program error detected" message. Many people ignore it because it does not affect their overall work.

The problem is usually encountered by users of Ubuntu version 16.04. In this case, there is no need to test the HDD, since the problem is most likely a software failure. The message notifies you of an unexpected termination of the program and prompts you to send a report to the developers. If you agree, a browser window will open where you need to fill out a 4-step form. This option causes difficulties and does not guarantee that the error will disappear.

The second method prevents the message from appearing again only when it is triggered by the same program. To use it, tick the “Do not show again for this program” option the next time the notification appears.

The third method is to disable the Apport utility, which is responsible for collecting crash information and sending reports in Ubuntu. This approach eliminates the error pop-ups completely. It is also possible to disable only the display of notifications while leaving the collection service running. To do this, run:

gsettings set com.ubuntu.update-notifier show-apport-crashes false

Data will continue to be collected in the /var/crash folder. Clean it out periodically so that the reports do not fill up the disk.
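One possible way to keep /var/crash tidy is to list reports older than a given number of days before deleting them. This is a sketch, not the article's original command: old_crash_reports is a hypothetical name, the 7-day default is arbitrary, and it assumes Apport's usual *.crash file names and a find that supports -maxdepth.

```shell
# List Apport crash reports older than a given number of days.
# Assumptions: reports end in .crash, find supports -maxdepth (GNU/BSD),
# and old_crash_reports is a hypothetical helper name.
old_crash_reports() {
    # $1: directory to scan (default /var/crash)
    # $2: minimum age in days (default 7)
    find "${1:-/var/crash}" -maxdepth 1 -type f -name '*.crash' -mtime +"${2:-7}"
}

# Example (not run here): review the list first, then delete.
#   old_crash_reports /var/crash 7
#   old_crash_reports /var/crash 7 | sudo xargs -r rm -f
```

Listing before deleting makes it easy to check that nothing still needed for a bug report is removed.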

To disable the Apport service completely, open its configuration file from the terminal:

gksu gedit /etc/default/apport

In the file that opens, change the value of the enabled field from 1 to 0. To enable the service again later, restore the default value.
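The same edit can be made without opening an editor. The sketch below assumes GNU sed (for the -i option) and an enabled=... line in the file; set_apport_enabled is a hypothetical helper name.

```shell
# Set the "enabled" field in the Apport defaults file to 0 or 1.
# Assumptions: GNU sed (-i edits the file in place) and a line of the
# form enabled=N in the file; set_apport_enabled is a hypothetical name.
set_apport_enabled() {
    # $1: new value (0 or 1)
    # $2: configuration file (default /etc/default/apport)
    sed -i "s/^enabled=.*/enabled=$1/" "${2:-/etc/default/apport}"
}

# Example (not run here; the real file needs root, e.g. via sudo sh -c):
#   set_apport_enabled 0 /etc/default/apport
```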

Conclusion

To prevent file loss, it is recommended to periodically test your hard drive and removable media. Linux offers several approaches to solving the problem. You can choose from a list of utilities that identify bad sectors and ensure the transfer of information to a normally functioning device.

If there's one thing you really don't want to encounter in your operating system, it's definitely the unexpected failure of your hard drives. With backup and RAID storage technology, you can get all your data back in place very quickly, but losing a hardware device can have a big impact on your budget, especially if you didn't plan for it.

To avoid such problems you can use smartmontools. It is a software package for managing and monitoring storage devices using Self-Monitoring Analysis and Reporting Technology or simply SMART.

Most modern ATA / SATA, SCSI / SAS storage devices provide a SMART interface. The purpose of SMART is to monitor the reliability of the hard drive to identify various errors and respond in a timely manner to their occurrence. Smartmontools consists of two utilities - smartctl and smartd. Together they provide a powerful monitoring and warning system for possible HDD failures in Linux. Next we will look at checking a Linux hard drive in detail.

The smartmontools package is available in the official repositories of most Linux distributions, so installation is reduced to executing one command. On Debian and Debian-based systems, run:

aptitude install smartmontools

And for Red Hat:

yum install smartmontools

Now you can proceed to diagnosing your Linux hard drive.

Checking the hard drive in smartctl

First, find out what hard drives are connected to your system:

ls -l /dev | grep -E "sd|hd"

The output will be something like this:

Here sdX is the name of an HDD device connected to the computer.
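The grep pattern above also matches partitions (sda1, sdb2) and any other name containing "sd" or "hd". A stricter filter that keeps only whole-disk names could look like this. It is a sketch: whole_disks is a hypothetical name, and NVMe devices (nvme0n1 and similar) are deliberately not covered.

```shell
# Keep only whole-disk names such as sda, sdb or hdc from a list of
# device names, dropping partitions (sda1) and unrelated entries.
# whole_disks is a hypothetical helper; NVMe names are not handled.
whole_disks() {
    grep -E '^(sd|hd)[a-z]+$'
}

# Example (not run here):
#   ls /dev | whole_disks
```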

To display information about a specific hard drive (device model, serial number, firmware version, ATA version, SMART availability), run smartctl with the --info option and the device name. For example, for /dev/sda:

smartctl --info /dev/sda

While you may not pay attention to the ATA version, it is one of the most important factors when looking for a replacement device. Each new version of ATA is compatible with previous ones. For example, old ATA-1 and ATA-2 devices will work fine on ATA-6 and ATA-7 interfaces, but not vice versa. When the ATA versions of the device and interface do not match, the hardware's capabilities will not be fully realized. In this case, it is best to choose an ATA-7 hard drive for replacement.

You can run a check of the hard drive in Ubuntu with the command:

smartctl -s on -a /dev/sda

Here the -s on option enables SMART on the specified device. You can omit it if SMART support is already enabled. The disk information is divided into several sections. The READ SMART DATA section contains general information about the health of the hard drive.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

This test is either passed (PASSED) or not (FAILED). In the latter case a failure is imminent, so start backing up the data from this disk right away.
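For scripts, the verdict can be pulled out of the smartctl output. The sketch below assumes the ATA-style result line shown above (SCSI drives word it differently); smart_health is a hypothetical helper name.

```shell
# Extract the overall health verdict (PASSED or FAILED) from the output
# of "smartctl -H", read on standard input.
# Assumption: the ATA-style line "SMART overall-health self-assessment
# test result: ..." is present; smart_health is a hypothetical name.
smart_health() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Example (not run here; needs root and a real drive):
#   if [ "$(sudo smartctl -H /dev/sda | smart_health)" != "PASSED" ]; then
#       echo "Back up /dev/sda now"
#   fi
```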

The next thing you can look at when you need HDD diagnostics in Linux is the SMART attribute table.

The SMART table records the parameters defined for a specific disk by the manufacturer, as well as the failure threshold for each of them. The table is filled in and updated automatically by the disk firmware.

  • ID# - attribute ID, usually a decimal number between 1 and 255;
  • ATTRIBUTE_NAME - attribute name;
  • FLAG - attribute handling flag;
  • VALUE - the normalized value for the state of this attribute, in the range from 1 to 253, where 253 is the best state and 1 is the worst. Depending on the attribute, the initial value can be from 100 to 200;
  • WORST - the worst value recorded over the drive's lifetime;
  • THRESH - the threshold; once VALUE drops to it or below, the disk should be reported as unfit for use;
  • TYPE - attribute type, either Pre-fail or Old_age. Pre-fail attributes are critical: if the disk fails the check for one of them, it is considered FAILED. Old_age attributes are not critical;
  • UPDATED - shows how often the attribute is updated;
  • WHEN_FAILED - set to FAILING_NOW if the attribute value is less than or equal to THRESH, or to "-" if it is higher. In the case of FAILING_NOW, it is better to back up as soon as possible, especially if the attribute type is Pre-fail;
  • RAW_VALUE - a raw value whose meaning is defined by the manufacturer.
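The comparison of VALUE with THRESH described in the list can be automated with awk. This is a sketch under assumptions: it expects the ten-column attribute table printed by smartctl -A on standard input, and failing_attributes is a hypothetical helper name.

```shell
# Print the name and type of every SMART attribute whose normalized
# VALUE is less than or equal to its THRESH.
# Assumption: input is the attribute table from "smartctl -A", with the
# columns ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
# failing_attributes is a hypothetical helper name.
failing_attributes() {
    # Attribute rows are the ones whose first field is a number;
    # "+ 0" forces a numeric comparison of fields such as 010 and 036.
    awk '$1 ~ /^[0-9]+$/ && ($4 + 0) <= ($6 + 0) { print $2, $7 }'
}

# Example (not run here; needs root and a real drive):
#   sudo smartctl -A /dev/sda | failing_attributes
```

No output means that no attribute has reached its threshold.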

You may now be thinking: smartctl is a good tool, but I cannot run it by hand every time. It would be nice to automate it so that the program runs periodically and reports the results. This is possible with smartd.

Setting up smartd and smartctl for real-time diagnostics and monitoring

Real-time HDD diagnostics in Linux are very easy to set up. First open the smartd configuration file, /etc/smartd.conf:

nano /etc/smartd.conf

Add the following line, substituting your own address:

/dev/sda -m <email address> -M test

  • -m - email address for sending check results. This can be a local user address, the superuser address, or an external address if the server is configured to send email;
  • -M - controls when messages are sent. once - send only one message about each disk problem. daily - send a message every day while a problem persists. diminishing - send reminders at growing intervals (after one day, then two, then four, and so on). test - send a test message when smartd starts. exec - run the specified program instead of the default mail command.

Save changes and restart smartd. You should receive an email with the following content:

You can also run tests on a schedule. To do this, use the -s option with a regular expression of the form "T/MM/DD/DN/HH", where T is the test type:

  • L - long test;
  • S - short test;
  • C - conveyance test (ATA only);
  • O - offline test.

The remaining characters determine the date and time of the test:

  • MM - month of the year;
  • DD - day of the month;
  • DN - day of the week (from 1 for Monday to 7 for Sunday);
  • HH - hour of the day;
  • MM, DD and HH are written with two decimal digits.

A dot matches any single character, an expression in parentheses such as (A|B|C) means one of the listed options, and an expression in square brackets such as [1-5] means a range (from 1 to 5).

For example, to perform a long test of your Linux hard drive every weekday at 1 pm, add the following line to smartd.conf:

DEVICESCAN -s (L/../../[1-5]/13)
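To get a feel for how such a schedule expression behaves, you can try it against sample time stamps with grep -E, which also uses extended regular expressions. This is only an approximation of what smartd does internally; schedule_matches is a hypothetical helper name, and the pattern L/../../[1-5]/13 (long test, any month and day, Monday to Friday, hour 13) is used as the example.

```shell
# Check whether a T/MM/DD/DN/HH time stamp matches a smartd -s pattern.
# This mimics the matching with grep -E; it is an illustration, not
# smartd itself. schedule_matches is a hypothetical helper name.
schedule_matches() {
    # $1: schedule pattern, e.g. 'L/../../[1-5]/13'
    # $2: time stamp, e.g. 'L/03/15/2/13'
    printf '%s\n' "$2" | grep -Eq "^($1)\$"
}

# Example: Tuesday at 13:00 matches, Sunday at 13:00 does not.
if schedule_matches 'L/../../[1-5]/13' 'L/03/15/2/13'; then
    echo "long test would run"
fi
if ! schedule_matches 'L/../../[1-5]/13' 'L/03/15/7/13'; then
    echo "no test on Sunday"
fi
```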

Conclusions

If you want to quickly check the mechanical operation of a hard drive, view its physical condition, or perform a more or less complete scan of the disk surface, use smartmontools. Don't forget to scan regularly; you'll thank yourself later.


From time to time you need to check your hard drive. I believe there is nothing more valuable than the information on a hard drive (apart from our lives, of course), and it would be a great shame if your family photos, videos, notes, work reports, passwords and other important data disappeared. So how do you check a hard drive in Linux, in our case Ubuntu, and what programs exist for testing our helpers and saviors, the hard drives?

You should check the hard drive not from the system installed on it, but from a LiveCD/USB. One valuable build for this is Parted Magic, although an Ubuntu CD/USB will also do. It is a complete toolkit for working with hard drives: GParted for resizing HDD partitions (an analogue of Acronis Disk Director), CloneZilla for creating exact copies of your system disks or partitions for later recovery, GSmartControl for reporting on the status of your disk, and much more. So let's begin the review of programs for checking the hard drive in Ubuntu.

Console program Badblocks.

To find out how your hard drive or drives are partitioned and select a partition to check, run the command:

sudo fdisk -l

To start scanning for bad sectors, just run the command in the Terminal:

sudo badblocks -sv /dev/sdb1

Where:

/dev/sdb1 - the partition being checked,

-s - shows the progress of the scan as a percentage,

-v - displays detailed information about the check.

If you need to get a text report, then you need to run the following command:

sudo badblocks -s /dev/sdb1 > errors.txt

Instead of /dev/sdb1, specify the desired partition of your hard drive. A text file errors.txt with the report will appear in the current directory (your home directory, if you have just opened the Terminal). If bad sectors are found, it is advisable to mark them so that the system does not access them while working with the disk. To mark bad sectors, run the command:

sudo e2fsck -l errors.txt /dev/sdb1

The -l key tells the program to use the errors.txt file as the list of bad sectors. But you can avoid the two commands above and run just one:

sudo e2fsck -ct /dev/sdb1

The e2fsck program is part of the e2fsprogs software package, which also includes badblocks, and the -c key makes e2fsck call the badblocks utility to search for bad sectors.

To check an ext2/ext3/ext4 file system, run the following command, where /dev/sdXN stands for your partition (or the whole disk, if the file system occupies it directly):

e2fsck -y /dev/sdXN

The -y key tells the utility to answer yes to all questions.

Other commonly used options:

-p, -a automatically “repair” the file system without asking any questions.
-f forces the check. The check will run in any case, even if the file system is marked as clean.
-c launches the badblocks program to find and mark “bad” sectors on the disk.
-v displays detailed information about the check.

fsck can be used instead of e2fsck; everyone is free to choose whichever is more convenient.
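When these checks are run from a script, the result is reported through the exit status, which fsck and e2fsck build as a sum of flag values. The decoder below is a sketch; explain_fsck_status is a hypothetical name, and the flag meanings are the ones listed in the fsck manual page.

```shell
# Turn an fsck/e2fsck exit status into readable text.
# The status is a bit mask; the meanings below follow the fsck manual.
# explain_fsck_status is a hypothetical helper name.
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    msg=""
    # Each line of the list is "bit:meaning"; matching bits are joined with "; ".
    while IFS=: read -r bit text; do
        if [ $((status & bit)) -ne 0 ]; then
            msg="${msg:+$msg; }$text"
        fi
    done <<'EOF'
1:errors corrected
2:system should be rebooted
4:errors left uncorrected
8:operational error
16:usage or syntax error
32:cancelled by user request
128:shared library error
EOF
    echo "$msg"
}

# Example (not run here; needs root and an unmounted partition):
#   sudo e2fsck -p /dev/sdb1
#   explain_fsck_status $?
```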

Disks program.

Ubuntu has a great program, Disks, which provides information on all devices connected to the system (hard drives, flash drives, CD/DVD drives, etc.). By running it, you can view the S.M.A.R.T. data for the disk of interest.

Program GSmartControl

And finally, I want to recommend the program GSmartControl, which is a graphical shell (GUI) for the console program - smartctl. You can find it in the Ubuntu Application Center, or install it through the Terminal with the command:

sudo apt-get install gsmartcontrol

The program shows complete S.M.A.R.T. data for the drive.

Well, you’ve learned how to check a hard drive in Linux. May this information serve you well! Good luck!

I also had to face this problem. A friend of mine, who has Ubuntu installed on an old ASUS laptop and who does not like to think things through for himself, came to me with it. His laptop runs the new Ubuntu 12.10, and very often the system simply refuses to boot, dropping to a black screen or freezing on a purple background. Recently a message also started popping up, something like “The operating system was unable to boot. Select the desired key for further actions...”, followed by a description of what to press. I don't remember exactly which keys the system suggests, but the idea is that one key starts automatic error correction, another starts manual recovery, and a third ignores the message. Automatic error correction did not help, and the operating system never finished loading. So I decided to try the famous fsck command.

First you need to boot from either a bootable USB flash drive with Ubuntu (Lubuntu, Xubuntu, Kubuntu, etc.) or from an Ubuntu Live CD. Now we need to find out which Ubuntu partition we need to scan to fix the file system. Launch Terminal (Ctrl-Alt-T) and execute the command:

sudo fdisk -l

This command will show us all the disks and flash drives that are connected to the system. I'll give an example from my personal computer rather than my friend's laptop. Here's what I got:

ubuntu@ubuntu:~$ sudo fdisk -l

Disk /dev/sda: 640.1 GB, 640135028736 bytes
255 heads, 63 sectors/track, 77825 cylinders, total 1250263728 sectors



Disk identifier: 0x0009d6f7


/dev/sda1 * 2048 61442047 30720000 83 Linux
/dev/sda2 61442048 73730031 6143992 82 Linux swap / Solaris
/dev/sda3 73730048 1250263039 588266496 83 Linux

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xb9ff6f01

Device Boot Start End Blocks Id System
/dev/sdb1 * 16065 100197404 50090670 83 Linux
/dev/sdb2 105322201 976771071 435724435+ 5 Extended
/dev/sdb3 100197405 105322139 2562367+ 82 Linux swap / Solaris
/dev/sdb5 105322203 832110591 363394194+ 7 HPFS/NTFS/exFAT
/dev/sdb6 832112640 860755218 14321289+ 83 Linux
/dev/sdb7 860758016 862613503 927744 82 Linux swap / Solaris
/dev/sdb8 862615552 976771071 57077760 83 Linux

Partition table entries are not in disk order

Disk /dev/sdc: 8115 MB, 8115978240 bytes
250 heads, 62 sectors/track, 1022 cylinders, total 15851520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xc3072e18

Device Boot Start End Blocks Id System
/dev/sdc1 * 32 15847625 7923797 b W95 FAT32

As you can see from the output of sudo fdisk -l, I have two hard drives, sda (640 GB) and sdb (500 GB), as well as an 8 GB flash drive (sdc), from which I actually booted. I know that my main system with Ubuntu 12.04 is located on the sda disk, and the partition with the operating system is sda1.

Now that we know the partition that needs to be scanned, we can actually start checking it. In the Terminal:

sudo fsck -y -f -c /dev/sda1

If you see an error, you most likely need to unmount this partition:

sudo umount /dev/sda1
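Before running fsck, you can check whether the partition is still mounted instead of waiting for the error. The helper below is a sketch: is_mounted is a hypothetical name, and it assumes the device appears under the same path in the first column of /proc/mounts.

```shell
# Succeed if the given device is listed in the mount table.
# Assumption: the device appears under the same path in the first column
# of /proc/mounts; is_mounted is a hypothetical helper name.
is_mounted() {
    # $1: device, e.g. /dev/sda1
    # $2: mount table to read (default /proc/mounts)
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example (not run here; needs root):
#   if is_mounted /dev/sda1; then sudo umount /dev/sda1; fi
#   sudo fsck -y -f -c /dev/sda1
```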

Keys and parameters of the fsck command:

-y - always answer yes to all questions (there is an alternative: the -p key starts the check in fully automatic mode);

-f - force a check of the file system (even if the file system is marked as clean);

-c - look for bad blocks and mark them accordingly;

/dev/sda1 - the device or partition that needs to be checked.

The command may also look different. For example:

sudo fsck -p /dev/sda1

In this case, only the -p key is used. Now that you have read about the fsck keys, add exactly the ones you need. To find out about all the program's capabilities, enter in the Terminal:

man fsck

This is what the Terminal produced after checking:

ubuntu@ubuntu:~$ sudo fsck -y -f -c /dev/sda1
fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
Checking for bad blocks (read-only test): done
/dev/sda1: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information