The 3TB HDD is a Servant of Chaos ~ Reviving a Dead 3TB HDD by Repairing Bad Sectors ~


Hello, this is Munou.
I was steeped in despair in the previous article, but I'm not a man who gives up easily.
Summer isn't over. Not until my homework is done...

How It Happened

Yesterday my SSH session suddenly dropped and the GNU/Linux machine seemed to have hung, so I rebooted it and checked the syslog.

Aug 27 04:03:55 localhost kernel: [2768521.366336] EXT4-fs (sdc1): error count since last fsck: 52  
Aug 27 21:19:27 haturatu kernel: [   28.915465] EXT4-fs (sdc1): warning: mounting fs with errors, running e2fsck is recommended  
Aug 27 21:24:30 haturatu kernel: [  332.803482] EXT4-fs (sdc1): error count since last fsck: 53  
Aug 27 22:28:45 haturatu kernel: [    5.694598] EXT4-fs (sdc1): warning: mounting fs with errors, running e2fsck is recommended  
Aug 27 22:33:55 haturatu kernel: [  316.412146] EXT4-fs (sdc1): error count since last fsck: 53  
Aug 28 08:19:17 haturatu kernel: [    5.710007] EXT4-fs (sdc1): warning: mounting fs with errors, running e2fsck is recommended  
Aug 28 08:24:39 haturatu kernel: [  316.427213] EXT4-fs (sdc1): error count since last fsck: 53  

Huh?

Oh?

Hmm?

The HDD, not returning for Obon

Attempting fsck

For now, I connected the drive over USB.

$ sudo fsck -f -y /dev/sdb  

After confirming the device name with lsblk, I ran fsck for the time being.

727810) +(77856769--77859749) +(77987841--77990269) +(78118913--78121629) +(78249985--78252434) +(78381057--78383008) +(78512129--78515588) +(78643201--78644782) +(78774273--78776000) +(78905345--78907598) +(79036417--79038955) +(79167489--79169769) +(79298561--79300360) +(79429633--79431425) +(79560705--79562857) +(79691777--79695179) +(79822849--79857556) +(79953921--79960927) +(80084993--80113787) +140247041  
Fix? yes  
  
Padding at end of inode bitmap is not set. Fix? yes  
  
Error reading block 32768 (入力/出力エラーです).  Ignore error? yes  
  
Force rewrite? yes  
  
Error writing block 32768 (入力/出力エラーです).  Ignore error? yes  
  
Error reading block 98304 (入力/出力エラーです).  Ignore error? yes  

I knew it!!!!!
Yes, it's the despair-inducing I/O error I've seen before (入力/出力エラーです is the Japanese locale's "Input/output error").

Checking with dmesg

I'll omit some details, but I checked the relevant drive entries with grep.

$ sudo dmesg | grep ' I/O error,' | grep "sdb"  
[227123.296980] I/O error, dev sdb, sector 62916608 op 0x0:(READ) flags 0x83700 phys_seg 1 prio class 0  
[227175.622027] I/O error, dev sdb, sector 78880 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0  
[227175.622087] I/O error, dev sdb, sector 62916624 op 0x0:(READ) flags 0x83700 phys_seg 14 prio class 0  
[227459.286465] I/O error, dev sdb, sector 373295104 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0  
[227490.240900] I/O error, dev sdb, sector 377489408 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0  

Ahhh
My heart is about to break, Patrasche.

Googling

I'll rely on the wisdom of past engineers.

[I tried repairing bad sectors that occurred on a Linux HDD](https://web.archive.org/web/20240828125146/https://neocat.hatenablog.com/entry/2019/10/21/061645)

I read through it and vaguely understood the mechanism.

In such cases, if you write to this sector, the HDD's firmware should replace it with an alternate sector.

Apparently so. Anyway, in a panic, I kept feeding dd the sectors that appeared in the dmesg output.

$ cat secter.sh   
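#!/bin/bash  
# dmesg sector numbers are in 512-byte units; with bs=4k one dd block is 8 sectors, so seek = sector / 8.  
# WARNING: each line overwrites 4 KiB of the target device with zeros.  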
sudo dd if=/dev/zero of=/dev/sdc bs=4k seek=$((62916608 / 8)) count=1  
sudo dd if=/dev/zero of=/dev/sdc bs=4k seek=$((62916624 / 8)) count=1  
sudo dd if=/dev/zero of=/dev/sdc bs=4k seek=$((373295104 / 8)) count=1  
sudo dd if=/dev/zero of=/dev/sdc bs=4k seek=$((406849536 / 8)) count=1  
sudo dd if=/dev/zero of=/dev/sdc bs=4k seek=$((411043840 / 8)) count=1  
sudo dd if=/dev/zero of=/dev/sdc bs=4k seek=$((415238144 / 8)) count=1  
sudo dd if=/dev/zero of=/dev/sdc bs=4k seek=$((419432448 / 8)) count=1  
sudo dd if=/dev/zero of=/dev/sdc bs=4k seek=$((423626752 / 8)) count=1  

I'm omitting some details here too, but note that the device name changed from sdb to sdc to sdd, because the I/O errors kept forcing me to unplug and replug the drive between fsck attempts.
For now, I ran a script containing a dd command for every sector that appeared in dmesg.
I'm sure there's a more concise way to do it, but at times like these, efficiency is the last thing on my mind. Even if it's a messy approach, I'd rather do it with desperate determination than risk accidentally wiping all the drive data.
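For reference, one way to write the same thing as a loop. This is only a sketch and deliberately a dry run: it prints the dd commands instead of executing them, so the device name and offsets can be checked before anything is overwritten. The function name `zero_cmds` is hypothetical.

```shell
#!/bin/bash
# zero_cmds DEV SECTOR...: print (do not run) one dd command per sector.
# Each printed command would overwrite the 4 KiB block containing that
# 512-byte sector, destroying whatever data was stored there.
zero_cmds() {
    local dev="$1" sector
    shift
    for sector in "$@"; do
        # 4096 / 512 = 8 sectors per dd block; integer division rounds
        # down to the block that contains the sector.
        printf 'dd if=/dev/zero of=%s bs=4k seek=%d count=1\n' "$dev" "$((sector / 8))"
    done
}
```

For example, `zero_cmds /dev/sdc 62916608 62916624` prints two dd lines with seek=7864576 and seek=7864578. Once the output looks right, the lines can be run with sudo.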

And then, to fsck...

I ended up running it on the same sectors several times.
Bad sectors playing hide and seek on my hard drive
I had tried smartctl earlier, since it is supposed to report bad sectors, but at first it didn't work. After a few rounds of the messy battle above it started responding, so I ran sudo smartctl -a /dev/sda and sudo smartctl -t short /dev/sda successfully, and then repeated the messy process again...
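Once smartctl responds, the two values most relevant to this kind of repair are the reallocated and pending sector counts. A rough sketch for picking them out, assuming the usual ATA attribute table printed by `smartctl -A` (attribute name in column 2, raw value in column 10) and the common attribute names `Reallocated_Sector_Ct` and `Current_Pending_Sector`. Not every drive reports them under these names, and the function name `sector_health` is invented here.

```shell
#!/bin/bash
# sector_health: read `smartctl -A` output on stdin and print the raw
# values of the reallocated and pending sector attributes as NAME=VALUE.
# Assumes the standard 10-column ATA attribute table; prints nothing if
# the drive uses different attribute names.
sector_health() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2 "=" $10 }'
}
```

It would be used as `sudo smartctl -A /dev/sdX | sector_health`, where /dev/sdX is a placeholder for the drive in question. If the firmware really is remapping sectors, the pending count should fall after each round of writes.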

$ sudo fsck -f -y /dev/sdd1  
fsck from util-linux 2.40.2  
e2fsck 1.47.1 (20-May-2024)  
Pass 1: Checking inodes, blocks, and sizes  
Pass 2: Checking directory structure  
Pass 3: Checking directory connectivity  
Pass 4: Checking reference counts  
Pass 5: Checking group summary information  
Block bitmap differences:  +(7864320--7872543) +(7929856--7946227) +(7946240--7962599) +(46661632--46669855) +(50855936--50864159) +(51380224--51388447) +(51388796--51404769) +(51404800--51412989) +(51904512--51928992) +(51929026--51929058) +(51929088--51937274) +(52428800--52453339) +(52453376--52461542) +(52953088--52962248) +(52964143--52977649) +(52977664--52985847)  

Fix? yes

Waaaaahhhhhhhhhhhhhh!!!!!

Revived!!!!

I haven't fully checked which files are corrupted yet, but most of the 1TB of data has returned.
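To narrow down which files were hit, each zeroed device sector can be converted to an ext4 block number and then looked up with debugfs. Below is a minimal sketch of the conversion, assuming the partition starts at sector 2048 and the filesystem uses 4096-byte blocks. Both values are assumptions on my part and should be checked with `fdisk -l` and `tune2fs -l`. The helper name `sector_to_fsblock` is hypothetical.

```shell
#!/bin/bash
# sector_to_fsblock SECTOR [PART_START] [FS_BLOCK_SIZE]
# Convert a whole-device sector number (512-byte units, as printed by
# dmesg) into a block number inside the filesystem on the partition.
# PART_START (default 2048) and FS_BLOCK_SIZE (default 4096) are assumed
# values, not read from the disk.
sector_to_fsblock() {
    local sector="$1" part_start="${2:-2048}" fs_block_size="${3:-4096}"
    echo $(( (sector - part_start) / (fs_block_size / 512) ))
}
```

With those assumed values, `sector_to_fsblock 62916608` gives 7864320, which matches the first block in the fsck bitmap list above. The resulting block number can then be passed to `debugfs -R "icheck <block>" /dev/sdd1` to find the owning inode, and `debugfs -R "ncheck <inode>" /dev/sdd1` to turn that inode into a path.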

And so, it was a day when I keenly realized the greatness of the wisdom of our predecessors.
I never expected to find my HDD dead in the morning and then manage to revive it the very same day.
That's all for now.
See you again.
