ref: 36e61f8fc87848e3ec686f1a6f5fd649c94dc8e1
dir: /sys/doc/fs/p6/
.SH Cache/WORM Driver .PP The cache/WORM (cw) driver is by far the largest and most complicated device driver in the file server. There are four devices involved in the cw driver. It implements a read/write pseudo-device (the cw-device) and a read-only pseudo-device (the dump device) by performing operations on its two constituent devices the read-write c-device and the write-once-read-many w-device. The block numbers on the four devices are distinct, although the .I cw addresses, dump addresses, and the .I w addresses are highly correlated. .PP The cw-driver uses the w-device as the stable storage of the file system at the time of the last dump. All newly written and a large number of recently used exact copies of blocks of the w-device are kept on the c-device. The c-device is much smaller than the w-device and so the subset of w-blocks that are kept on the c-device are mapped through a hash table kept on a partition of the c-device. .PP The map portion of the c-device consists of blocks of buckets of entries. The declarations follow. .Ex enum { BKPERBLK = 10, CEPERBK = (BUFSIZE - BKPERBLK*sizeof(Off)) / (sizeof(Centry)*BKPERBLK), }; .Ee .Ex typedef struct { ushort age; short state; Off waddr; } Centry; .Ee .Ex typedef struct { long agegen; Centry entry[CEPERBK]; } Bucket; .Ee .Ex Bucket bucket[BKPERBLK]; .Ee There is exactly one entry structure for each block in the data partition of the c-device. A bucket contains all of the w-addresses that have the same hash code. There are as many buckets as will fit in a block and enough blocks to have the required number of entries. The entries in the bucket are maintained in FIFO order with an age variable and an incrementing age generator. When the age generator is about to overflow, all of the ages in the bucket are rescaled from zero. .PP The following steps go into converting a w-address into a c-address. The bucket is found by .Ex bucket_number = w-address % total_buckets; getbuf(c-device, bucket_offset + bucket_number/BKPERBLK); .Ee After the desired bucket is found, the desired entry is found by a linear search within the bucket for the entry with the desired .CW waddr . .PP The state variable in the entry is one of the following. .Ex enum { Cnone = 0, Cdirty, Cdump, Cread, Cwrite, Cdump1, }; .Ee Every w-address has a state. Blocks that are not in the c-device have the implied state .CW Cnone . The .CW Cread state is for blocks that have the same data as the corresponding block in the w-device. Since the c-device is much faster than the w-device, .CW Cread blocks are kept as long as possible and used in preference to reading the w-device. .CW Cread blocks may be discarded from the c-device when the space is needed for newer data. The .CW Cwrite state is when the c-device contains newer data than the corresponding block on the w-device. This happens when a .CW Cnone , .CW Cread , or .CW Cwrite block is written. The .CW Cdirty state is when the c-device contains new data and the corresponding block on the w-device has never been written. This happens when a new block has been allocated from the free space on the w-device. .PP The .CW Cwrite and .CW Cdirty blocks are created and never removed. Unless something is done to convert these blocks, the c-device will gradually fill up and stop functioning. Once a day, or by command, a .I dump of the cw-device is taken. The purpose of a dump is to queue the writes that have been shunted to the c-device to be written to the w-device. Since the w-device is a WORM, blocks cannot be rewritten. Blocks that have already been written to the WORM must be relocated to the unused portion of the w-device. These are precisely the blocks with .CW Cwrite state. .PP The dump algorithm is as follows: .IP a) The tree on the cw-device is walked as long as the blocks visited have been modified since the last dump. These are the blocks with state .CW Cwrite and .CW Cdirty . It is possible to restrict the search to within these blocks since the directory containing a modified file must have been accessed to modify the file and accessing a directory will set its modified time thus causing the block containing it to be written. The directory containing that directory must be modified for the same reason. The tree walk is thus drastically restrained and the tree walk does not take much time. .IP b) All .CW Cwrite blocks found in the tree search are relocated to new blank blocks on the w-device and converted to .CW Cdump state. All .CW Cdirty blocks are converted to .CW Cdump state without relocation. At this point, all modified blocks in the cw-device have w-addresses that point to unwritten WORM blocks. These blocks are marked for later writing to the w-device with the state .CW Cdump . .IP c) All open files that were pointing to modified blocks are reopened to point at the corresponding reallocated blocks. This causes the directories leading to the open files to be modified. Thus the invariant discussed in a) is maintained. .IP d) The background dumping process will slowly go through the map of the c-device and write out all blocks with .CW Cdump state. .PP The dump takes a few minutes to walk the tree and mark the blocks. It can take hours to write the marked blocks to the WORM. If a marked block is rewritten before the old copy has been written to the WORM, it must be forced to the WORM before it is rewritten. There is no problem if another dump is taken before the first one is finished. The newly marked blocks are just added to the marked blocks left from the first dump. .PP If there is an error writing a marked block to the WORM then the .CW dump state is converted to .CW Cdump1 and manual intervention is needed. (See the .CW cwcmd .CW mvstate command in .I fs (8)). These blocks can be disposed of by converting their state back to .CW Cdump so that they will be written again. They can also be converted to .CW Cwrite state so that they will be allocated new addresses at the next dump. In most other respects, a .CW Cdump1 block behaves like a .CW Cwrite block.