
During the 6.6 merge window, the multi-grain timestamp feature was introduced but ultimately reverted after it caused regressions in tools like make and rsync. With its reappearance in the 6.13 merge window, it is worth examining what the feature entails, the problems it aims to solve, and the challenges it encountered in the earlier 6.6 attempt.
Inode Timestamps
Linux filesystems maintain a set of timestamps to track key events for each file:
Access Time (atime): Updated when the file is read.
Modification Time (mtime): Updated when the file’s contents are modified.
Change Time (ctime): Updated when the file’s metadata (such as permissions or ownership) changes.
These timestamps are stored in the file’s inode. One notable behavior is that any mtime update also implicitly triggers a ctime update. However, the resolution of these timestamps is coarse, typically the granularity of a jiffy (between 1 and 10 milliseconds, depending on the kernel's HZ setting). While this level of precision suffices for most applications, it presents challenges for certain filesystems and use cases.
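From user space, these three timestamps can be read with stat(2). The following is a minimal sketch, assuming a Linux/glibc system where struct stat exposes the nanosecond fields st_atim, st_mtim, and st_ctim; the struct file_times type and the read_file_times() helper are names invented for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Illustrative container for the three timestamps of one file. */
struct file_times {
    struct timespec access;  /* atime */
    struct timespec modify;  /* mtime */
    struct timespec change;  /* ctime */
};

/* Fill *out from stat(2). Returns 0 on success, -1 if stat() fails. */
int read_file_times(const char *path, struct file_times *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;

    out->access = st.st_atim;
    out->modify = st.st_mtim;
    out->change = st.st_ctim;
    return 0;
}
```

The tv_nsec fields are always present in the interface, but with coarse-grained timestamps their values only advance in jiffy-sized steps, so two updates inside one jiffy can report identical values.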
Problems with existing coarse-grained timestamps
Network File System (NFS), especially versions like NFSv3, is one area where this limitation becomes problematic. When a file on the server is updated more than once within a single jiffy, the updates can carry the same timestamp, so the client cannot reliably determine whether its cached file contents have become stale. Modern NFS implementations aim to cache file contents more aggressively to improve performance, but this requires accurate mechanisms to invalidate stale data. Since NFS clients rely on mtime and ctime comparisons to detect changes on the server side, coarse-grained timestamps hinder their effectiveness. Similar issues affect backup applications like rsync, which also depend on precise timestamps to detect file modifications.
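The stale-cache problem can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: time is a plain nanosecond counter, the jiffy is assumed to be 4 ms (TICK_NS), and all function names are hypothetical rather than taken from the NFS client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NS 4000000ULL /* assumed jiffy length: 4 ms */

/* Coarse timestamp: the clock reading rounded down to its tick. */
uint64_t coarse_stamp(uint64_t now_ns)
{
    return now_ns - (now_ns % TICK_NS);
}

/* Client-side rule: the cache is trusted while the server's mtime is unchanged. */
bool cache_looks_valid(uint64_t cached_mtime, uint64_t server_mtime)
{
    return cached_mtime == server_mtime;
}

/*
 * The server writes at t_write1, the client then caches data and mtime,
 * and the server writes again at t_write2. Returns true when the client
 * still believes its cache is valid, i.e. the second write is missed.
 */
bool second_write_goes_unnoticed(uint64_t t_write1, uint64_t t_write2)
{
    assert(t_write1 <= t_write2);

    uint64_t cached_mtime = coarse_stamp(t_write1);
    uint64_t server_mtime = coarse_stamp(t_write2);

    return cache_looks_valid(cached_mtime, server_mtime);
}
```

With these assumptions, two writes at 8.0001 ms and 8.9 ms land in the same tick and the second one is invisible to the client, while a write in a later tick is detected.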
A natural question arises: why not simply switch to higher-resolution timestamps across the board?
The answer lies in filesystem performance. Updates to mtime and ctime involve changes to inode metadata. If every read or write operation produced a new fine-grained timestamp, the volume of metadata writes would increase significantly. These metadata updates are often journalled to ensure filesystem integrity, adding further overhead. This tradeoff explains the kernel's reliance on coarse-grained timestamps by default: they strike a balance between functionality and performance.
For example, the on-disk structure of ext4 inodes (struct ext4_inode) reflects this design, with dedicated fields for atime, mtime, and ctime timestamps.
/*
 * Structure of an inode on the disk
 */
struct ext4_inode {
    __le16  i_mode;     /* File mode */
    __le16  i_uid;      /* Low 16 bits of Owner Uid */
    __le32  i_size_lo;  /* Size in bytes */
    __le32  i_atime;    /* Access time */
    __le32  i_ctime;    /* Inode Change time */
    __le32  i_mtime;    /* Modification time */
    <...>
};
Coarse-grained timestamps keep metadata updates infrequent and therefore cheap. However, as discussed above, NFS and certain applications need finer-grained timestamps. This is where multi-grain timestamps help.
Multi-grain Timestamp
The feature addresses this limitation by dynamically adjusting the resolution of timestamps. When an inode’s attributes are being actively observed via ->getattr(), the kernel uses a higher-resolution timestamp for the next mtime and ctime update. For inodes that are not being actively monitored, coarse-grained timestamps remain in use. This adaptive approach provides finer granularity where needed while preserving the performance benefits of coarser timestamps for other cases.
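The selection logic described above can be sketched in user space as follows. This is a simplified model, not the kernel's code: struct toy_inode, toy_getattr(), and toy_update_time() are hypothetical names, the 4 ms tick is an assumption, and the sketch additionally assumes that a fine-grained value is only taken when the coarse clock would not already produce a visibly newer timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_NS 4000000ULL /* assumed jiffy length: 4 ms */

/* Toy inode: one timestamp plus a "someone has looked at it" flag. */
struct toy_inode {
    uint64_t ctime_ns;
    bool queried;
};

/* Coarse clock: the reading rounded down to the start of its tick. */
uint64_t coarse_clock(uint64_t now_ns)
{
    return now_ns - (now_ns % TICK_NS);
}

/* getattr-like read: report the timestamp and remember it was observed. */
uint64_t toy_getattr(struct toy_inode *inode)
{
    assert(inode != NULL);
    inode->queried = true;
    return inode->ctime_ns;
}

/* Timestamp update at time now_ns, choosing the grain per inode. */
void toy_update_time(struct toy_inode *inode, uint64_t now_ns)
{
    assert(inode != NULL);

    uint64_t stamp = coarse_clock(now_ns);

    /*
     * Pay for a fine-grained stamp only if the old value was observed
     * and the coarse stamp would not look newer to that observer.
     */
    if (inode->queried && stamp <= inode->ctime_ns)
        stamp = now_ns;

    inode->ctime_ns = stamp;
    inode->queried = false;
}
```

An inode that nobody queries only ever receives coarse stamps in this model, which is the case the feature is meant to keep cheap.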
Problems with the 6.6 multi-grain implementation
The initial implementation, however, exposed a significant problem. Suppose two files, f1 and f2, are modified in close succession, f1 first. If f1 receives a fine-grained timestamp while f2 is stamped with a coarse-grained one, f2's timestamp can be earlier than f1's, implying that f2 was modified before f1 and violating the VFS ordering guarantees. This issue was one of the primary reasons the feature was reverted during the 6.6 development cycle.
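A few lines of arithmetic make the inversion concrete. The sketch below assumes a 4 ms tick and plain nanosecond counters; ordering_inverted() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NS 4000000ULL /* assumed jiffy length: 4 ms */

/* Coarse clock: the reading rounded down to the start of its tick. */
uint64_t coarse_clock(uint64_t now_ns)
{
    return now_ns - (now_ns % TICK_NS);
}

/*
 * f1 is modified at t1_ns and stamped with the fine-grained clock.
 * f2 is modified afterwards, at t2_ns, and stamped with the coarse clock.
 * Returns true when f2 ends up with the earlier timestamp.
 */
bool ordering_inverted(uint64_t t1_ns, uint64_t t2_ns)
{
    assert(t1_ns <= t2_ns);

    uint64_t f1_mtime = t1_ns;               /* fine-grained */
    uint64_t f2_mtime = coarse_clock(t2_ns); /* coarse-grained */

    return f2_mtime < f1_mtime;
}
```

For example, with f1 modified at 8.5 ms and f2 at 8.9 ms, f2 is stamped 8.0 ms and appears older than f1. Once f2's modification falls into a later tick, the order is preserved again.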
6.13 multi-grain timestamp fix and filesystem documentation
After discussions at LSFMM 2024, Jeff Layton revisited the feature and proposed a rather simple fix to the problem. Christian Brauner described the fix in the pull request as:
To prevent this, a floor value is maintained for multigrain timestamps. Whenever a fine-grained timestamp is handed out, record it, and when later coarse-grained stamps are handed out, ensure they are not earlier than that value. If the coarse-grained timestamp is earlier than the fine-grained floor, return the floor value instead.
This approach preserves the integrity of VFS ordering guarantees while allowing the kernel to use multi-grain timestamps effectively. Jeff Layton has since converted most major filesystems to support the feature.
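The floor mechanism from the quoted description can be sketched in user space like this. It is an illustration only: struct mg_clock, fine_stamp(), and coarse_stamp() are hypothetical names, the 4 ms tick is an assumption, and the sketch ignores the concurrency handling a real implementation needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_NS 4000000ULL /* assumed jiffy length: 4 ms */

/* Toy clock state: the latest fine-grained value handed out so far. */
struct mg_clock {
    uint64_t floor_ns;
};

/* Hand out a fine-grained stamp and record it as the new floor. */
uint64_t fine_stamp(struct mg_clock *clk, uint64_t now_ns)
{
    assert(clk != NULL);
    if (now_ns > clk->floor_ns)
        clk->floor_ns = now_ns;
    return clk->floor_ns;
}

/* Hand out a coarse stamp, but never one earlier than the floor. */
uint64_t coarse_stamp(const struct mg_clock *clk, uint64_t now_ns)
{
    assert(clk != NULL);
    uint64_t coarse = now_ns - (now_ns % TICK_NS);
    return coarse < clk->floor_ns ? clk->floor_ns : coarse;
}
```

With a fine-grained stamp handed out at 8.5 ms, a coarse request at 8.9 ms returns 8.5 ms instead of 8.0 ms, so a later modification can no longer appear older than an earlier one.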
The documentation outlines how filesystems can opt into multi-grain timestamps with minimal changes:
For most filesystems, it's sufficient to just set the FS_MGTIME flag in the fstype->fs_flags in order to opt-in, providing the ctime is only ever set via inode_set_ctime_current(). If the filesystem has a ->getattr routine that doesn't call generic_fillattr, then it should call fill_mg_cmtime() to fill those values. For setattr, it should use setattr_copy() to update the timestamps, or otherwise mimic its behavior.
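The shape of that opt-in can be mocked up in user space. Everything below is a stand-in so that the sketch compiles outside the kernel: the struct is a cut-down imitation of the kernel's struct file_system_type, the flag values are illustrative rather than copied from the kernel headers, and "examplefs" and fs_wants_mgtime() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for illustration; the real ones live in the kernel headers. */
#define FS_REQUIRES_DEV 1
#define FS_MGTIME       64

/* Cut-down stand-in for the kernel's struct file_system_type. */
struct file_system_type {
    const char *name;
    int fs_flags;
};

/* Hypothetical filesystem that opts in by adding FS_MGTIME to its flags. */
struct file_system_type examplefs_type = {
    .name     = "examplefs",
    .fs_flags = FS_REQUIRES_DEV | FS_MGTIME,
};

/* Hypothetical helper: has this filesystem type opted in? */
bool fs_wants_mgtime(const struct file_system_type *fstype)
{
    assert(fstype != NULL);
    return (fstype->fs_flags & FS_MGTIME) != 0;
}
```

In a real filesystem the flag alone is only sufficient under the conditions quoted above, in particular that ctime is only ever set through inode_set_ctime_current().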
Conclusion
The inclusion of multi-grain timestamps in kernel 6.13 is a significant milestone for the Linux filesystem community. For filesystems like NFS, this feature will enable more efficient caching behavior, improving performance and addressing long-standing issues with timestamp granularity.
References:
[1]: https://lore.kernel.org/all/20241115-vfs-mgtime-1dd54cc6d322@brauner/
[2]: https://lwn.net/Articles/975863/
[3]: https://lwn.net/Articles/946394/
[4]: https://lore.kernel.org/all/20240711-mgtime-v5-0-37bb5b465feb@kernel.org/
[5]: https://www.kernel.org/doc/html/next/filesystems/multigrain-ts.htm