Could you elaborate on this a bit? To my understanding, hard links already are the same file; it is only the inode that’s different.
Reflinks on Btrfs and ZFS start off behaving much like hard links, since both files point at the same data on disk, but modifying one reflinked file doesn’t change the other. Hard links share the same inode, so an edit made through one path necessarily shows up through the other.
Example:
File A: 10GB
File B: reflinked to File A
10GB total space used, 10GB shared between A and B.
Modify 1GB of File A.
1GB of new blocks created, 11GB total now used. 9GB still shared between A and B.
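The arithmetic in that example can be checked with a small toy model. This is only a sketch: it assumes one block per GB to keep the numbers readable, and the `usage` helper and block IDs are made up for illustration, not anything Btrfs or ZFS exposes.

```python
# Toy model: a file is a list of block IDs, one block per GB (an assumption
# made only to keep the numbers readable; real block sizes are far smaller).

def usage(*files):
    """Return (total_unique_blocks, blocks_shared_by_all_files)."""
    total = set().union(*files)
    shared = set(files[0]).intersection(*files[1:])
    return len(total), len(shared)

file_a = list(range(10))   # File A: 10 blocks of 1GB each
file_b = list(file_a)      # File B: reflink, points at the same block IDs

before = usage(file_a, file_b)

# Modify 1GB of File A: the changed data goes to a newly allocated block,
# and only A's pointer moves. B still references the original block 0.
next_free_block = 10
file_a[0] = next_free_block

after = usage(file_a, file_b)
print(before)  # (10, 10)
print(after)   # (11, 9)
```

File B never changes in this model, which is the whole point: only the file being written gets new blocks.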
The way I like to think of it: hard links are the same file with two valid paths. Reflinked files are two different files that share the same data blocks only for as long as those blocks stay identical.
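The hard link half of that is easy to see for yourself. Here is a minimal sketch using only the Python standard library on a POSIX system; the file names are arbitrary, and it only demonstrates hard links, since creating a reflink depends on the filesystem underneath.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.txt")
    path_b = os.path.join(tmp, "b.txt")

    with open(path_a, "w") as f:
        f.write("original")

    os.link(path_a, path_b)  # hard link: a second path to the same inode

    same_inode = os.stat(path_a).st_ino == os.stat(path_b).st_ino
    link_count = os.stat(path_a).st_nlink

    # Write through one path, then read through the other.
    with open(path_a, "w") as f:
        f.write("edited")
    with open(path_b) as f:
        seen_via_b = f.read()

print(same_inode, link_count, seen_via_b)  # True 2 edited
```

A reflinked copy (for example one made with `cp --reflink=always` on a filesystem that supports it) would instead have its own inode, and reading it after the edit would still return "original".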
But…hasn’t ZFS always had copy-on-write functionality? How is this new feature different?
Edit: I think I found my answer here: https://www.ithands-on.com/2020/09/linux-101-zfs-filesystem-cow-system.html. I think I have fundamentally misunderstood what CoW meant for years. Huh.
CoW refers to how data is written to disk. With CoW, data is never written back over the block it came from; it is always written to a new block, and the old block is freed afterwards. As a result, data moves around frequently on a CoW filesystem. This notably makes the filesystem very resilient against power loss, since existing data can’t be corrupted by a partial write: the old data is only released once the new write has completed successfully. Reflinks build on the same mechanism, letting two files reference the same blocks until one of them is changed.
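To make the ordering concrete, here is a toy sketch of that write path. The `CowStore` class and its `crash_before_commit` flag are invented for illustration and have nothing to do with how ZFS is actually implemented; the point is only the sequence of write new block, switch pointer, free old block.

```python
class CowStore:
    """Toy copy-on-write store: one logical file, many physical blocks."""

    def __init__(self, data):
        self.blocks = {0: data}  # physical block number -> contents
        self.pointer = 0         # block the file currently points at
        self.next_block = 1

    def write(self, data, crash_before_commit=False):
        new = self.next_block
        self.next_block += 1
        self.blocks[new] = data  # 1. write the new data to a fresh block
        if crash_before_commit:
            return               # simulated power loss: pointer never moved
        old, self.pointer = self.pointer, new  # 2. switch the pointer
        del self.blocks[old]     # 3. only now free the old block

    def read(self):
        return self.blocks[self.pointer]


store = CowStore("v1")
store.write("v2", crash_before_commit=True)
after_crash = store.read()   # the interrupted write left the old data intact
store.write("v3")
after_commit = store.read()
print(after_crash, after_commit)  # v1 v3
```

The interrupted write leaves an orphaned block behind in this toy, which a real filesystem would simply treat as free space, but the file itself never points at half-written data.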
Ha! Neat. Thank you!