File system corrupt after re-adding software RAID 1 member after test. Why?

A colleague and I set up a software RAID 1 with mdadm consisting of two physical disks, with two partitions on the virtual device. The setup went fine, and booting directly from one of the RAID disks yielded:



# cat /proc/mdstat 
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 sda1[0] sdb1[1]
92094464 blocks super 1.2 [2/2] [UU]

md1 : active (auto-read-only) raid1 sda2[0] sdb2[2]
4069376 blocks super 1.2 [2/2] [UU]

unused devices: <none>


To test our setup, we then shut the machine down, disconnected one of the disks, and restarted. The system came up fine, naturally in a degraded state:



Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md127 : active raid1 sda1[1]
92094464 blocks super 1.2 [2/1] [_U]

md1 : active (auto-read-only) raid1 sda2[2]
4069376 blocks super 1.2 [2/1] [_U]

unused devices: <none>


Next, we shut the machine down again, reconnected the disconnected disk, and disconnected the other disk. Again, everything went fine, with the following expected state:



Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md127 : active raid1 sda1[0]
92094464 blocks super 1.2 [2/1] [U_]

md1 : active (auto-read-only) raid1 sda2[0]
4069376 blocks super 1.2 [2/1] [U_]

unused devices: <none>


Finally, we shut down one last time, reconnected everything, and booted. This is what we got:



Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md1 : active (auto-read-only) raid1 sdb2[2] sda2[0]
4069376 blocks super 1.2 [2/2] [UU]

md127 : active raid1 sdb1[1]
92094464 blocks super 1.2 [2/1] [_U]

unused devices: <none>


As you can see, the first partition (second entry, they were swapped for some reason) is in a degraded state (the second is not, but that's just a swap partition). We weren't particularly worried by this. After all, it's expected that the two partitions aren't exactly equal any more after the simulated alternating failure of the disks. We added the missing partition like this:



# mdadm --manage /dev/md127 --add /dev/sda1
mdadm: re-added /dev/sda1


We expected the partition on /dev/sda to sync with (be overwritten by) the one on /dev/sdb. Instead, we ended up with a corrupt file system (numerous errors within seconds).





After this experience, I rebooted from a third disk, reinitialised the file system on /dev/md127 (with the -c option to mkfs.ext4 for good measure), and rebooted back into the now-functioning RAID. Then, once more, we shut down, disconnected one disk, booted, shut down again, reconnected the disk (this time leaving the other disk connected), and booted. Now we got this:



Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md127 : active raid1 sda1[0]
92094464 blocks super 1.2 [2/1] [U_]

md1 : active (auto-read-only) raid1 sdb2[2] sda2[0]
4069376 blocks super 1.2 [2/2] [UU]

unused devices: <none>


Now we're afraid the same thing will happen again if we just use the --add option as above.



I have two questions:




  1. What caused the file system corruption after simulating the alternating failure? My guess is that it has something to do with both disks diverging from the state just before the first disconnection, and this somehow tricked mdadm --add into not doing a resync. What would have been the correct sequence of commands to tell mdadm to use the mounted state as authoritative and sync the added disk to it?

  2. In our current situation (one simulated failure and then a reconnect, i.e. only one of the disks diverged from the state just before disconnection), what is the proper way to re-add the missing device? Can I just use the add command as above, and will it resync? Why didn't it resync automatically?


If it helps, here is the current output from mdadm --examine:



# mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 726d9204:889a4c89:b7a1bdb9:a77d8130
Name : testhost:0 (local to host testhost)
Creation Time : Mon Feb 4 14:39:21 2019
Raid Level : raid1
Raid Devices : 2

Avail Dev Size : 184188928 (87.83 GiB 94.30 GB)
Array Size : 92094464 (87.83 GiB 94.30 GB)
Data Offset : 131072 sectors
Super Offset : 8 sectors
Unused Space : before=130984 sectors, after=0 sectors
State : clean
Device UUID : 46077734:6a094293:96f92dc3:0a09706e

Update Time : Tue Feb 5 13:36:59 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 139d1d09 - correct
Events : 974


Device Role : Active device 0
Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 726d9204:889a4c89:b7a1bdb9:a77d8130
Name : testhost:0 (local to host testhost)
Creation Time : Mon Feb 4 14:39:21 2019
Raid Level : raid1
Raid Devices : 2

Avail Dev Size : 184188928 (87.83 GiB 94.30 GB)
Array Size : 92094464 (87.83 GiB 94.30 GB)
Data Offset : 131072 sectors
Super Offset : 8 sectors
Unused Space : before=130984 sectors, after=0 sectors
State : clean
Device UUID : dcffbed3:147347dc:b64ebb8d:97ab5956

Update Time : Tue Feb 5 10:47:41 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : e774af76 - correct
Events : 142


Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)









hard-drive software-raid raid-1 mdadm

asked Feb 5 at 12:56 – Alexander Klauer

  • Did you allow a significant amount of time between each test to allow the RAID to rebuild itself?

    – Ramhound
    Feb 5 at 13:02

  • @Ramhound No. Between the 1st and 2nd test, there was nothing to rebuild. Should we have waited after the 2nd test before executing --add? OTOH, would a member that is no longer considered part of the array by the kernel even be rebuilt?

    – Alexander Klauer
    Feb 5 at 13:07

1 Answer

I found out what went wrong. The mdadm documentation says:




When a device is added to an active array, mdadm checks to see if it has metadata on it which suggests that it was recently a member of the array. If it does, it tries to "re-add" the device. If there have been no changes since the device was removed, or if the array has a write-intent bitmap which has recorded whatever changes there were, then the device will immediately become a full member of the array and those differences recorded in the bitmap will be resolved.


(emphasis mine)



Since both devices had diverged from the state at the time of the initial disconnection, their recorded changes were mutually incompatible, and re-adding one without a full resync shredded the filesystem.
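Divergence like this shows up directly in the Events counters that mdadm --examine reports (in the output quoted in the question, /dev/sda1 is at 974 events while /dev/sdb1 is at 142). A minimal way to compare the two members before deciding how to proceed, assuming both devices are visible and using the device names from the question:

# The member with the higher event count holds the more recent data
mdadm --examine /dev/sda1 /dev/sdb1 | grep -E '^/dev|Events'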



The solution for such a case is to call mdadm --zero-superblock on the missing device before adding it. This will force a clean rebuild.
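In command form, the sequence for the fully diverged case would have looked roughly like this. It is only a sketch, using the device names from the question, and it assumes the array is running degraded on the member whose data you want to keep (/dev/sdb1 here):

# Wipe the stale md superblock so mdadm cannot attempt a bare re-add
mdadm --zero-superblock /dev/sda1
# Add the now-blank partition back; a full resync from the surviving member follows
mdadm --manage /dev/md127 --add /dev/sda1
# Watch the rebuild progress
cat /proc/mdstat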



The second case, where only one of the devices diverged, was probably harmless, though I haven't tried it. In the case of an actual failure, where you have to replace the physical drive, you should also be fine, since the replacement drive carries no md metadata in the first place.
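As a preventive measure for future tests, the write-intent bitmap mentioned in the quoted documentation can be added to an existing array. For the normal case, where a single member disappears and later comes back unchanged, a re-add then only resyncs the blocks that were written while it was gone. A minimal sketch, assuming the array name from the question:

# Add an internal write-intent bitmap to the running array
mdadm --grow --bitmap=internal /dev/md127
# Verify it is active (a "bitmap:" line also appears in /proc/mdstat)
mdadm --detail /dev/md127 | grep -i bitmap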






answered Feb 5 at 15:32 – Alexander Klauer