In this lab you will add large files and mmap to the xv6 file system.
Large files
In this assignment you'll increase the maximum size of an xv6
file. Currently xv6 files are limited to 268 blocks, or 268*BSIZE
bytes (BSIZE is 1024 in xv6). This limit comes from the fact that an
xv6 inode contains 12 "direct" block numbers and one "singly-indirect"
block number, which refers to a block that holds up to 256 more block
numbers, for a total of 12+256=268. You'll change the xv6 file system
code to support a "doubly-indirect" block in each inode, containing
256 addresses of singly-indirect blocks, each of which can contain up
to 256 addresses of data blocks. The result will be that a file will
be able to consist of up to 256*256+256+11 blocks (11 instead of 12,
because we will sacrifice one of the direct block numbers for the
doubly-indirect block).
Preliminaries
Modify your Makefile's CPUS definition so that it reads:
CPUS := 1
Add
QEMUEXTRA = -snapshot
right before
QEMUOPTS
The above two steps speed up qemu tremendously when xv6
creates large files.
mkfs initializes the file system to have fewer
than 1000 free data blocks, too few to show off the changes
you'll make. Modify param.h to
set FSSIZE to:
#define FSSIZE 20000 // size of file system in blocks
Download big.c into your xv6 directory,
add it to the UPROGS list, start up xv6, and run big.
It creates as big a file as xv6 will let
it, and reports the resulting size. It should say 140 sectors.
What to Look At
The format of an on-disk inode is defined by struct dinode
in fs.h. You're particularly interested in NDIRECT,
NINDIRECT, MAXFILE, and the addrs[] element
of struct dinode. Look at Figure 7.3 in the xv6 text for a
diagram of the standard xv6 inode.
The code that finds a file's data on disk is in bmap()
in fs.c. Have a look at it and make sure you understand
what it's doing. bmap() is called both when reading and
writing a file. When writing, bmap() allocates new
blocks as needed to hold file content, as well as allocating
an indirect block if needed to hold block addresses.
bmap() deals with two kinds of block numbers. The bn
argument is a "logical block" -- a block number relative to the start
of the file. The block numbers in ip->addrs[], and the
argument to bread(), are disk block numbers.
You can view bmap() as mapping a file's logical
block numbers into disk block numbers.
Your Job
Modify bmap() so that it implements a doubly-indirect
block, in addition to direct blocks and a singly-indirect block.
You'll have to reduce the number of direct blocks to 11, rather than 12,
to make room for your new doubly-indirect block; you're
not allowed to change the size of an on-disk inode.
The first 11 elements of ip->addrs[] should be
direct blocks; the 12th should be a singly-indirect block
(just like the current one); the 13th should be your new
doubly-indirect block.
You don't have to modify xv6 to handle deletion of files with
doubly-indirect blocks.
If all goes well, big will now report that it can write many
more sectors than before. It will take big a while
to finish.
Hints
Make sure you understand bmap(). Write out a diagram of the
relationships between ip->addrs[], the indirect block, the
doubly-indirect block and the singly-indirect blocks it points to, and
data blocks. Make sure you understand why adding a doubly-indirect
block increases the maximum file size by 256*256 blocks (really
256*256 - 1, since you have to decrease the number of direct blocks
by one).
Think about how you'll index the doubly-indirect block, and
the indirect blocks it points to, with the logical block
number.
If you change the definition of NDIRECT, you'll
probably have to change the size of addrs[]
in struct inode in file.h. Make sure that
struct inode and struct dinode have the
same number of elements in their addrs[] arrays.
If you change the definition of NDIRECT, make sure to create a
new fs.img, since mkfs also uses NDIRECT to build the
initial file system. If you delete fs.img, make on Unix (not
xv6) will build a new one for you.
If your file system gets into a bad state, perhaps by crashing,
delete fs.img (do this from Unix, not xv6). make will build a
new clean file system image for you.
Don't forget to brelse() each block that you
bread().
You should allocate indirect blocks and doubly-indirect
blocks only as needed, like the original bmap().
Memory-mapped files
In this assignment you will implement the core of the system
calls mmap and munmap; see the man pages for an
explanation of what they do (run man 2 mmap in your terminal).
The test program mmaptest tells you what should work.
Here are some hints about how you might go about this assignment:
Start by adding the two system calls to the kernel, as you
did for other system calls (e.g., sigalarm), but
don't implement them yet; just return an
error. Run mmaptest to observe the error.
Keep track, for each process, of what mmap has mapped.
You will need to allocate a struct vma to record the
address, length, permissions, etc. for each virtual memory area
(VMA) that maps a file. Since the xv6 kernel doesn't have a
general-purpose memory allocator, you can use the same approach as
for struct file: keep a global array of struct
vmas and give each process a fixed-size array of VMAs
(like the file descriptor array).
Implement mmap: allocate a VMA, add it to the process's
table of VMAs, fill in the VMA, and find a hole in the process's
address space where you will map the file. You can assume that no
file will be bigger than 1GB. The VMA will contain a pointer to
a struct file for the file being mapped; you will need to
increase the file's reference count so that the structure doesn't
disappear when the file is closed (hint:
see filedup). You don't have to worry about overlapping
VMAs. Run mmaptest: the first mmap should
succeed, but the first access to the mmap-ed memory will fail,
because you haven't updated the page fault handler.
Modify the page-fault handler from the lazy-allocation and COW
labs to call a VMA function that handles page faults in VMAs.
This function allocates a page, reads 4 KB from the mmap-ed
file into the page, and maps the page into the address space of
the process. To read the page, you can use readi,
which allows you to specify an offset from where to read in the
file (but you will have to lock/unlock the inode passed
to readi). Don't forget to set the permissions correctly
on the page. Run mmaptest; you should get to the
first munmap.
Implement munmap: find the struct vma for
the address and unmap the specified pages (hint:
use uvmunmap). If munmap removes all pages
from a VMA, you will have to free the VMA (don't forget to
decrement the reference count of the VMA's struct
file); otherwise, you may have to shrink the VMA. You can
assume that munmap will not split a VMA into two VMAs;
that is, we don't unmap a few pages in the middle of a VMA. If
an unmapped page has been modified and the file is
mapped MAP_SHARED, you will have to write the page back
to the file. RISC-V has a dirty bit (D) in a PTE to
record whether a page has ever been written to; add a
declaration for it (PTE_D) to kernel/riscv.h and use it. Modify exit
to call munmap for the process's open VMAs.
Run mmaptest; you should pass mmaptest, but
probably not forktest.
Modify fork to copy VMAs from parent to child. Don't
forget to increment the reference count for a VMA's struct
file. In the page fault handler of the child, it is OK to
allocate a new page instead of sharing the page with the
parent. The latter would be cooler, but it would require more
implementation work. Run mmaptest; make sure you pass
both mmaptest and forktest.
Run usertests to make sure you didn't break anything.
Optional challenges:
If two processes have the same file mmap-ed (as
in forktest), share their physical pages. You will need
reference counts on physical pages.
The solution above allocates a new physical page for each page
read from the mmap-ed file, even though the data is also in kernel
memory in the buffer cache. Modify your implementation to mmap
that memory, instead of allocating a new page. This requires that
file blocks be the same size as pages (set BSIZE to
4096). You will need to pin mmap-ed blocks into the buffer cache.
You will need to worry about reference counts.
Remove redundancy between your implementation for lazy
allocation and your implementation of mmap-ed files. (Hint:
create a VMA for the lazy allocation area.)
Modify exec to use a VMA for different sections of
the binary so that you get on-demand-paged executables. This will
make starting programs faster, because exec will not have
to read any data from the file system.
Implement on-demand paging: don't keep a process in memory,
but let the kernel move some parts of processes to disk when
physical memory is low. Then, page in the paged-out memory when
the process references it.