Log-structured File System
Intro
LFS is optimized for writing: with large main memories, most reads are served from the file cache, so disk traffic becomes dominated by writes.
A log-structured file system writes all new information to disk in a sequential structure called the log.
Workload
Small-file accesses.
Sync / Async
UFS writes file data blocks asynchronously, but metadata structures such as directories and inodes are written synchronously. For small-file workloads, disk traffic is dominated by these synchronous metadata writes.
Core Idea
LFS improves write performance by buffering a sequence of file system changes in the file cache and then writing all the changes to disk sequentially in a single disk write operation. It converts many small synchronous random writes of traditional file systems into large asynchronous sequential transfers.
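A minimal sketch of the buffering idea (all names here are illustrative, not Sprite LFS's actual code): dirty blocks accumulate in memory and are flushed to the log in one large sequential transfer once a segment's worth has accumulated.

```python
# Sketch: many small writes become one large sequential log append.
class WriteBuffer:
    def __init__(self, segment_size=512 * 1024):
        self.segment_size = segment_size
        self.pending = []        # dirty blocks waiting to reach disk
        self.pending_bytes = 0
        self.log = bytearray()   # stands in for the on-disk log

    def write(self, block: bytes):
        self.pending.append(block)
        self.pending_bytes += len(block)
        if self.pending_bytes >= self.segment_size:
            self.flush()

    def flush(self):
        # One sequential transfer instead of many small random ones.
        self.log.extend(b"".join(self.pending))
        self.pending.clear()
        self.pending_bytes = 0
```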
Inode map
Unlike FFS, LFS does not place inodes at fixed positions on disk; inodes are written to the log, and an inode map maintains the current location of each inode.
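Conceptually the inode map is just a mapping from inode number to the inode's most recent address in the log; a toy version (hypothetical names, not Sprite LFS's on-disk format):

```python
# Toy inode map: inode number -> current log address of that inode.
inode_map = {}

def write_inode(inum: int, log_address: int):
    # Each time an inode is appended to the log, record its new location.
    inode_map[inum] = log_address

def lookup_inode(inum: int) -> int:
    # Reads consult the map instead of a fixed disk location (as FFS does).
    return inode_map[inum]

write_inode(7, log_address=4096)  # inode 7 written at log offset 4096
write_inode(7, log_address=8192)  # a later version supersedes the old one
assert lookup_inode(7) == 8192
```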
Free space management: segments
As files are deleted or overwritten, free space becomes fragmented into many small extents.
There are two ways to deal with this:
- Threading: Leave the live data in place and thread the log through the free extents.
- Copying: Copy live data out of the log in order to leave large free extents for writing.
LFS combines the two. The disk is divided into large fixed-size extents called segments. Any given segment is always written sequentially from its beginning to its end, and all live data must be copied out of a segment before the segment can be rewritten.
Segment cleaning
Mechanism
Read a number of segments into memory, identify the live data, and write the live data back to a smaller number of clean segments.
To identify which blocks of each segment are live, a segment summary block is written as part of each segment. For each file data block, the summary records the owning file's inode number and the block's offset within the file; a block is live if and only if the file's inode still points to this copy of it.
No free-block list or bitmap is needed in Sprite LFS.
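The liveness check can be sketched as follows, assuming toy structures (not Sprite LFS's actual record layout):

```python
from dataclasses import dataclass

@dataclass
class SummaryEntry:
    inum: int          # owning file's inode number
    block_offset: int  # offset of the block within that file
    log_address: int   # where this copy of the block sits in the log

def is_live(entry: SummaryEntry, inodes: dict) -> bool:
    # Live iff the inode's current pointer for this offset is still this copy.
    inode = inodes.get(entry.inum)
    if inode is None:              # the file was deleted
        return False
    return inode.get(entry.block_offset) == entry.log_address

inodes = {3: {0: 9000}}            # inode 3 maps file offset 0 -> address 9000
assert is_live(SummaryEntry(3, 0, 9000), inodes)      # current copy: live
assert not is_live(SummaryEntry(3, 0, 5000), inodes)  # stale copy: dead
```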
Policy
- When should the segment cleaner execute?
- How many segments should it clean at a time?
- Which segments should be cleaned?
- How should the live blocks be grouped when they are written out?
We use write cost to compare cleaning policies: the average number of bytes moved to and from the disk (including cleaning traffic) per byte of new data written. A write cost of 1.0 is perfect; 10 means that only 1/10 of the disk's maximum bandwidth is actually used for writing new data.
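To write new data into a segment whose live fraction (utilization) is $u$, the cleaner reads the whole segment, writes back the $u$ live bytes, and fills the freed $1-u$ fraction with new data, which gives the paper's write cost of $2/(1-u)$. A quick check in a toy function:

```python
def write_cost(u: float) -> float:
    # Bytes moved per byte of new data: read 1 segment, write back u live
    # bytes, and write (1 - u) bytes of new data into the freed space.
    assert 0 <= u < 1
    if u == 0:
        return 1.0  # an empty segment is reused without being read
    return 2 / (1 - u)

for u in (0.0, 0.5, 0.8, 0.9):
    print(f"utilization {u:.1f} -> write cost {write_cost(u):.1f}")
# 0.0 -> 1.0, 0.5 -> 4.0, 0.8 -> 10.0, 0.9 -> 20.0
```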
This trade-off between performance and space utilization is not unique to log-structured file systems.
Greedy policy
Always choose the least-utilized segments to clean. When writing out live data, the cleaner does not attempt to reorganize it.
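Given per-segment utilizations, greedy selection is a one-line sort (toy representation):

```python
# segment id -> fraction of live bytes (utilization)
segments = {0: 0.95, 1: 0.10, 2: 0.60, 3: 0.05}

def greedy_pick(segments: dict, n: int) -> list:
    # Clean the n least-utilized segments; live data is not reorganized.
    return sorted(segments, key=segments.get)[:n]

assert greedy_pick(segments, 2) == [3, 1]
```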
Greedy policy + age sort
Live blocks are sorted by age before being written out again, so long-lived (cold) blocks tend to be segregated into different segments from short-lived (hot) blocks.
Benefit-cost policy:
$\frac{benefit}{cost} = \frac{\text{free space generated} \times \text{age of data}}{cost} = \frac{(1-u) \times age}{1+u}$, where $u$ is the segment's utilization: cleaning frees $1-u$ of a segment, at a cost of $1+u$ (one unit to read the segment plus $u$ to write back its live data).
The rationale is that hot and cold segments must be treated differently by the cleaner. Free space in a cold segment is more valuable than free space in a hot segment, because once a cold segment has been cleaned it takes a long time to re-accumulate unusable free space.
In order to support this cost-benefit cleaning policy, LFS maintains a data structure called the segment usage table. For each segment, the table records the number of live bytes in the segment and the most recent modified time of any block in the segment.
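With the usage table's two fields per segment, the cost-benefit score can be sketched as below (hypothetical structures; the $1+u$ cost covers reading the segment and writing back its live bytes):

```python
import time
from dataclasses import dataclass

@dataclass
class SegmentUsage:
    live_bytes: int        # bytes of live data in the segment
    last_modified: float   # most recent modified time of any block

SEGMENT_BYTES = 512 * 1024

def benefit_cost(usage: SegmentUsage, now: float) -> float:
    u = usage.live_bytes / SEGMENT_BYTES   # utilization
    age = now - usage.last_modified        # stand-in for "age of data"
    return ((1 - u) * age) / (1 + u)       # (free space * age) / cost

def pick_segments(table: dict, n: int) -> list:
    now = time.time()
    return sorted(table, key=lambda s: benefit_cost(table[s], now),
                  reverse=True)[:n]

# A colder, fuller segment can outrank a hotter, emptier one.
table = {
    "cold": SegmentUsage(300 * 1024, time.time() - 86400),  # idle for a day
    "hot":  SegmentUsage(100 * 1024, time.time() - 60),     # written recently
}
assert pick_segments(table, 1) == ["cold"]
```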
Crash recovery
Checkpoints
A checkpoint is a position in the log at which all of the file system structures are consistent and complete.
LFS uses a two-phase process to create a checkpoint. First, it writes all modified information to the log. Second, it writes a checkpoint region to a special fixed position on disk. The checkpoint region contains the addresses of all the blocks in the inode map and segment usage table, plus the current time and a pointer to the last segment written.
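A toy version of the two phases (mock structures; Sprite LFS actually alternates between two checkpoint regions and writes a timestamp so a half-written checkpoint can be detected, a detail elided here):

```python
import time
from dataclasses import dataclass

@dataclass
class CheckpointRegion:
    inode_map_addrs: list    # where the inode map blocks live in the log
    usage_table_addrs: list  # where the segment usage table blocks live
    timestamp: float         # identifies the most recent checkpoint
    last_segment: int        # pointer to the last segment written

class ToyLFS:
    def __init__(self):
        self.log = []                  # stands in for the on-disk log
        self.dirty = []                # modified blocks not yet on disk
        self.checkpoint_region = None  # the "special fixed position"

    def checkpoint(self):
        # Phase 1: write all modified information to the log.
        self.log.extend(self.dirty)
        self.dirty.clear()
        # Phase 2: write the checkpoint region at its fixed position.
        tail = len(self.log) - 1
        self.checkpoint_region = CheckpointRegion(
            inode_map_addrs=[tail],    # toy addresses
            usage_table_addrs=[tail],
            timestamp=time.time(),
            last_segment=tail,
        )

fs = ToyLFS()
fs.dirty = ["data block", "inode 7", "inode map block"]
fs.checkpoint()
```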
Roll-forward
After a crash, LFS reads the latest checkpoint and then scans the log segments written after it, using their segment summary blocks to recover data written since the checkpoint. This is called roll-forward.
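Roll-forward can be sketched as a scan that uses each later segment's summary block to bring the checkpointed inode map back up to date (toy record formats, not Sprite LFS's actual ones):

```python
def roll_forward(checkpoint_inode_map: dict, later_segments: list) -> dict:
    inode_map = dict(checkpoint_inode_map)
    for segment in later_segments:
        for entry in segment["summary"]:
            if entry["kind"] == "inode":
                # A newer inode in the log supersedes the checkpointed one.
                inode_map[entry["inum"]] = entry["log_address"]
    return inode_map

later_segments = [
    {"summary": [{"kind": "data",  "inum": 7, "log_address": 9000},
                 {"kind": "inode", "inum": 7, "log_address": 9216}]},
]
recovered = roll_forward({7: 4096}, later_segments)
assert recovered[7] == 9216   # inode 7's newer location is recovered
```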