Odi's astoundingly incomplete notes

ata failed command: FLUSH CACHE

I got bitten by this problem on Linux kernels 2.6.31 and 2.6.32:
Apr 28 16:21:53 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 28 16:21:53 kernel: ata2.00: failed command: FLUSH CACHE
Apr 28 16:21:53 kernel: ata2.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Apr 28 16:21:53 kernel: res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Apr 28 16:21:53 kernel: ata2.00: status: { DRDY }
Apr 28 16:21:55 kernel: ata2: soft resetting link
Apr 28 16:21:55 kernel: ata2: soft resetting link
Apr 28 16:21:55 kernel: ata2: nv_mode_filter: 0x3f39f&0x3f39f->0x3f39f, BIOS=0x3f000 (0xc700c6c0) ACPI=0x3f01f (20:60:0x1f)
Apr 28 16:21:55 kernel: ata2: nv_mode_filter: 0x739f&0x739f->0x739f, BIOS=0x7000 (0xc700c6c0) ACPI=0x701f (20:60:0x1f)
Apr 28 16:21:55 kernel: ata2.00: configured for UDMA/100
Apr 28 16:21:55 kernel: ata2.00: configured for UDMA/100
Apr 28 16:21:55 kernel: ata2.01: configured for UDMA/33
Apr 28 16:21:55 kernel: ata2.01: configured for UDMA/33
Apr 28 16:21:55 kernel: ata2.00: device reported invalid CHS sector 0
Apr 28 16:21:55 kernel: ata2.00: device reported invalid CHS sector 0
Apr 28 16:21:55 kernel: end_request: I/O error, dev sdb, sector 58604962
Apr 28 16:21:55 kernel: md: super_written gets error=-5, uptodate=0
Apr 28 16:21:55 kernel: md: super_written gets error=-5, uptodate=0
Apr 28 16:21:55 kernel: raid1: Disk failure on sdb3, disabling device.
Apr 28 16:21:55 kernel: raid1: Operation continuing on 1 devices.
Apr 28 16:21:55 kernel: ata2: EH complete
Apr 28 16:21:55 kernel: ata2: EH complete
Apr 28 16:21:55 kernel: RAID1 conf printout:
Apr 28 16:21:55 kernel: RAID1 conf printout:
Apr 28 16:21:55 kernel: --- wd:1 rd:2
Apr 28 16:21:55 kernel: --- wd:1 rd:2
Apr 28 16:21:55 kernel: disk 0, wo:0, o:1, dev:sda3
Apr 28 16:21:55 kernel: disk 0, wo:0, o:1, dev:sda3
Apr 28 16:21:55 kernel: disk 1, wo:1, o:0, dev:sdb3
Apr 28 16:21:55 kernel: disk 1, wo:1, o:0, dev:sdb3
Apr 28 16:21:55 kernel: RAID1 conf printout:
Apr 28 16:21:55 kernel: RAID1 conf printout:
Apr 28 16:21:55 kernel: --- wd:1 rd:2
Apr 28 16:21:55 kernel: --- wd:1 rd:2
Apr 28 16:21:55 kernel: disk 0, wo:0, o:1, dev:sda3
Apr 28 16:21:55 kernel: disk 0, wo:0, o:1, dev:sda3
Apr 28 16:21:55 mdadm[5734]: Fail event detected on md device /dev/md1, component device /dev/sdb3
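Apparently a cache flush timed out. libata then reset the link, the md superblock write to sdb failed with an I/O error (error=-5), and md kicked sdb3 out of the RAID-1, which kept running degraded on sda3. Removing the disk and re-adding it with mdadm fixes the array again, of course. But this has happened a couple of times now, and it is annoying.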
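
For the record, the remove and re-add step can be sketched in a few lines of Python. This is only a rough sketch: the function name `readd_commands` is made up for this note, the regular expression simply mirrors the mdadm monitor line in the log above, and the script only builds the `mdadm --remove` / `mdadm --add` command lines without running anything.

```python
import re

# Mirrors the mdadm monitor line from the log above (an assumption about
# the message format, based only on that one line).
FAIL_EVENT = re.compile(
    r"Fail event detected on md device (?P<array>\S+), "
    r"component device (?P<component>\S+)"
)


def readd_commands(log_lines):
    """Return mdadm command lines that would re-add each failed component.

    Hypothetical helper: it only builds the argument lists and never
    executes them.
    """
    commands = []
    for line in log_lines:
        match = FAIL_EVENT.search(line)
        if match is None:
            continue
        array = match.group("array")
        component = match.group("component")
        # Drop the faulty member first, then add it back so md resyncs it.
        commands.append(["mdadm", array, "--remove", component])
        commands.append(["mdadm", array, "--add", component])
    return commands


if __name__ == "__main__":
    sample = [
        "Apr 28 16:21:53 kernel: ata2.00: failed command: FLUSH CACHE",
        "Apr 28 16:21:55 mdadm[5734]: Fail event detected on md device "
        "/dev/md1, component device /dev/sdb3",
    ]
    for command in readd_commands(sample):
        print(" ".join(command))
```

I would rather look at the printed commands and run them by hand as root than let a script do it, since a disk that keeps dropping out may really be dying.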

The machine is a media PC with an nVidia Corporation nForce2 chipset. Two IDE disks are attached to the PATA bus and running as md RAID-1 volumes. The disks are really old and may not be the best quality (thus the RAID-1...).

There is a patch that may address exactly this problem by simply retrying the FLUSH CACHE command. It is in 2.6.34 and will be in 2.6.33.4 and 2.6.32.13.
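
A quick way to check whether a given kernel release should carry the patch, going only by the three version numbers above. This is a sketch under assumptions: the helper `has_flush_retry_patch` is hypothetical, it knows nothing about distribution backports, and it treats any series not listed above and older than 2.6.34 as unpatched.

```python
import re

# First stable release per series that carries the patch, as stated above.
FIRST_FIXED_STABLE = {(2, 6, 32): 13, (2, 6, 33): 4}
# From this mainline release onwards the patch is assumed to be present.
FIRST_FIXED_MAINLINE = (2, 6, 34)


def has_flush_retry_patch(release):
    """Guess whether a kernel release string includes the retry patch.

    Hypothetical helper: compares version numbers only and ignores any
    vendor suffix such as "-gentoo-r1".
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    series = tuple(int(part) for part in match.group(1, 2, 3))
    # A missing fourth number means the base release of the series.
    stable = int(match.group(4) or 0)
    if series >= FIRST_FIXED_MAINLINE:
        return True
    if series in FIRST_FIXED_STABLE:
        return stable >= FIRST_FIXED_STABLE[series]
    return False
```

Feeding it `platform.release()` from the standard library checks the running kernel.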

posted on 2010-05-11 08:40 CEST in Code