Lab: mmap

In this lab you will use mmap on Linux to demand-page a very large table and add memory-mapped files to xv6.

Using mmap on Linux

This assignment will make you more familiar with how to manage virtual memory in user programs using the Unix system call interface. You can do this assignment on any operating system that supports the Unix API (a Linux Athena machine, your laptop with Linux or macOS, etc.).

Download the mmap homework assignment and look it over. The program maintains a very large table of square root values in virtual memory. However, the table is too large to fit in physical RAM. Instead, the square root values should be computed on demand in response to page faults that occur in the table's address range. Your job is to implement the demand faulting mechanism using a signal handler and UNIX memory mapping system calls. To stay within the physical RAM limit, we suggest using the simple strategy of unmapping the previously mapped page whenever a new page is faulted in.

To compile mmap.c, you need a C compiler, such as gcc. On Athena, you can type:

$ add gnu

Once you have gcc, you can compile mmap.c as follows:

$ gcc mmap.c -lm -o mmap

This produces an executable named mmap, which you can run:

$ ./mmap
page_size is 4096
Validating square root table contents...
oops got SIGSEGV at 0x7f6bf7fd7f18

When the process accesses the square root table, the mapping does not exist and the kernel passes control to the signal handler code in handle_sigsegv(). Modify the code in handle_sigsegv() to map in a page at the faulting address, unmap a previous page to stay within the physical memory limit, and initialize the new page with the correct square root values. Use the function calculate_sqrts() to compute the values. The program includes test logic that verifies if the contents of the square root table are correct. When you have completed your task successfully, the process will print “All tests passed!”.

You may find that the man pages for mmap() and munmap() are helpful references.

$ man mmap
$ man munmap
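
The demand-faulting mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the assignment's mmap.c: the names table, fill_page(), on_segv(), demo_setup(), and demo_lookup() are hypothetical, fill_page() is only a stand-in for calculate_sqrts(), and the sketch assumes Linux, where a fault in an unmapped range is delivered as SIGSEGV with the faulting address in si_addr.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile double *table;  /* hypothetical: base of the table's address range */
static size_t table_bytes;      /* size of that range in bytes */
static size_t page_size;
static void *last_page;         /* page mapped by the previous fault, or NULL */

/* Stand-in for calculate_sqrts(): fills entries with a cheap function. */
static void fill_page(double *page, size_t first_index, size_t count) {
  for (size_t i = 0; i < count; i++)
    page[i] = (double)(first_index + i) * 0.5;
}

/* SIGSEGV handler: map the faulting page, unmap the previous one. */
static void on_segv(int sig, siginfo_t *si, void *ctx) {
  (void)sig;
  (void)ctx;
  uintptr_t addr = (uintptr_t)si->si_addr;
  uintptr_t base = (uintptr_t)table;
  if (addr < base || addr >= base + table_bytes)
    _exit(1);  /* a genuine crash, not a table access */

  /* Round the faulting address down to a page boundary. */
  void *page = (void *)(addr & ~(uintptr_t)(page_size - 1));

  if (last_page != NULL)
    munmap(last_page, page_size);  /* at most one table page stays mapped */
  if (mmap(page, page_size, PROT_READ | PROT_WRITE,
           MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
    _exit(2);

  fill_page(page, ((uintptr_t)page - base) / sizeof(double),
            page_size / sizeof(double));
  last_page = page;
  /* Returning from the handler retries the faulting instruction. */
}

/* Pick an address range of npages pages and install the handler. */
void demo_setup(size_t npages) {
  page_size = (size_t)sysconf(_SC_PAGESIZE);
  table_bytes = npages * page_size;
  void *p = mmap(NULL, table_bytes, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(p != MAP_FAILED);
  munmap(p, table_bytes);  /* keep only the address; accesses now fault */
  table = p;
  last_page = NULL;

  struct sigaction sa = {0};
  sa.sa_sigaction = on_segv;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  int rc = sigaction(SIGSEGV, &sa, NULL);
  assert(rc == 0);
  (void)rc;
}

/* Reading an entry may fault; the handler fills the page on demand. */
double demo_lookup(size_t index) {
  return table[index];
}
```

After demo_setup(8), a call such as demo_lookup(1000) faults, the handler maps and fills the page, and the load is retried. Looking up an entry on a page that was unmapped earlier simply faults again and recomputes that page.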

Implement memory-mapped files in xv6

In this assignment you will implement memory-mapped files in xv6. The test program mmaptest tells you what should work.

Here are some hints about how you might go about this assignment:

Start by adding the two system calls to the kernel, as you have done for other system calls (e.g., sigalarm), but don't implement them yet; just return an error. Run mmaptest to observe the error.
Keep track of what mmap has mapped for each process. You will need to allocate a struct vma to record the address, length, permissions, etc. for each virtual memory area (VMA) that maps a file. Since the xv6 kernel doesn't have a memory allocator, you can use the same approach as for struct file: have a global array of struct vmas and give each process a fixed-size array of VMAs (like the file descriptor array).
Implement mmap: allocate a VMA, add it to the process's table of VMAs, fill in the VMA, and find a hole in the process's address space where you will map the file. You can assume that no file will be bigger than 1GB. The VMA will contain a pointer to a struct file for the file being mapped; you will need to increase the file's reference count so that the structure doesn't disappear when the file is closed (hint: see filedup). You don't have to worry about overlapping VMAs. Run mmaptest: the first mmap should succeed, but the first access to the mmap-ed memory will fail, because you haven't updated the page fault handler.
Modify the page-fault handler from the lazy-allocation and COW labs to call a VMA function that handles page faults in VMAs. This function allocates a page, reads 4KB from the mmap-ed file into the page, and maps the page into the address space of the process. To read the page, you can use readi, which allows you to specify the offset in the file from which to read (but you will have to lock/unlock the inode passed to readi). Don't forget to set the permissions correctly on the page. Run mmaptest; you should get to the first munmap.
Implement munmap: find the struct vma for the address and unmap the specified pages (hint: use uvmunmap). If munmap removes all pages from a VMA, you will have to free the VMA (don't forget to decrement the reference count of the VMA's struct file); otherwise, you may have to shrink the VMA. You can assume that munmap will not split a VMA into two VMAs; that is, we don't unmap a few pages in the middle of a VMA. If an unmapped page has been modified and the file is mapped MAP_SHARED, you will have to write the page back to the file. RISC-V has a dirty bit (D) in a PTE to record whether a page has ever been written to; add the declaration to kernel/riscv.h and use it. Modify exit to call munmap for the process's open VMAs. Run mmaptest; you should pass mmaptest, but probably not forktest.
Modify fork to copy VMAs from parent to child. Don't forget to increment the reference count for a VMA's struct file. In the page fault handler of the child, it is OK to allocate a new page instead of sharing the page with the parent. The latter would be cooler, but it would require more implementation work. Run mmaptest; make sure you pass both mmaptest and forktest.
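
The VMA bookkeeping in the hints above can be modeled in plain C outside the kernel. The sketch below is one possible shape, not the reference solution: the field layout of struct vma, the NVMA limit, and the helpers vma_alloc(), vma_find(), vma_file_offset(), vma_unmap(), and needs_writeback() are all assumptions made for illustration, and MAP_SHARED_FLAG is a hypothetical stand-in for MAP_SHARED. The dirty bit is bit 7 of a RISC-V PTE. In xv6 the real work (uvmunmap, readi, writing pages back, fileclose) would happen where the comments indicate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096
#define NVMA 16               /* assumed per-process limit */
#define PTE_D (1L << 7)       /* RISC-V dirty bit in a PTE */
#define MAP_SHARED_FLAG 0x01  /* hypothetical stand-in for MAP_SHARED */

struct vma {
  int used;         /* slot is in use */
  uint64_t addr;    /* page-aligned start address */
  uint64_t length;  /* bytes, a multiple of PGSIZE */
  int prot;         /* PROT_* permissions requested by mmap */
  int flags;        /* MAP_SHARED or MAP_PRIVATE */
  uint64_t offset;  /* file offset that corresponds to addr */
  void *file;       /* stands in for struct file * */
};

/* Claim a free slot in a fixed-size table; NULL if the table is full. */
struct vma *vma_alloc(struct vma *table) {
  for (int i = 0; i < NVMA; i++) {
    if (!table[i].used) {
      table[i].used = 1;
      return &table[i];
    }
  }
  return NULL;
}

/* Find the VMA containing va, as a page-fault handler would. */
struct vma *vma_find(struct vma *table, uint64_t va) {
  for (int i = 0; i < NVMA; i++) {
    struct vma *v = &table[i];
    if (v->used && va >= v->addr && va < v->addr + v->length)
      return v;
  }
  return NULL;
}

/* File offset of the page containing va (the offset to pass to readi). */
uint64_t vma_file_offset(const struct vma *v, uint64_t va) {
  assert(va >= v->addr && va < v->addr + v->length);
  uint64_t page = va & ~(uint64_t)(PGSIZE - 1);
  return v->offset + (page - v->addr);
}

/* A page needs writing back only if it is dirty and the mapping is shared. */
int needs_writeback(uint64_t pte, int flags) {
  return (pte & PTE_D) && (flags & MAP_SHARED_FLAG);
}

/*
 * Shrink or free a VMA for munmap(addr, len).
 * Returns 1 if the VMA was freed, 0 if it was shrunk, and -1 if the
 * range is invalid or would punch a hole in the middle.
 */
int vma_unmap(struct vma *v, uint64_t addr, uint64_t len) {
  if (len == 0 || addr < v->addr || addr + len > v->addr + v->length)
    return -1;
  /* The kernel would write back dirty shared pages and call uvmunmap here. */
  if (addr == v->addr) {
    v->addr += len;    /* unmapping from the front moves the start... */
    v->offset += len;  /* ...and the file offset along with it */
    v->length -= len;
  } else if (addr + len == v->addr + v->length) {
    v->length -= len;  /* unmapping from the end only shortens the VMA */
  } else {
    return -1;
  }
  if (v->length == 0) {
    v->used = 0;       /* the kernel would also drop the file reference */
    return 1;
  }
  return 0;
}
```

Advancing offset together with addr when the front of a VMA is unmapped keeps vma_file_offset() correct for the pages that remain.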

Run usertests to make sure you didn't break anything.

Optional challenges:

If two processes have the same file mmap-ed (as in forktest), share their physical pages. You will need reference counts on physical pages.
The solution above allocates a new physical page for each page read from the mmap-ed file, even though the data is also in kernel memory in the buffer cache. Modify your implementation to mmap that memory, instead of allocating a new page. This requires that file blocks be the same size as pages (set BSIZE to 4096). You will need to pin mmap-ed blocks into the buffer cache. You will need to worry about reference counts.
Remove redundancy between your implementation for lazy allocation and your implementation of mmap-ed files. (Hint: create a VMA for the lazy allocation area.)
Modify exec to use a VMA for different sections of the binary so that you get on-demand-paged executables. This will make starting programs faster, because exec will not have to read any data from the file system.
Implement on-demand paging: don't keep an entire process in memory, but let the kernel move some parts of processes to disk when physical memory is low. Then, page in the paged-out memory when the process references it. Port your Linux program from the first assignment to xv6 and run it.
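
For the first challenge above, reference counts on physical pages could look roughly like the following user-space model. The array size NPAGES and the helpers page_ref_inc() and page_ref_dec() are hypothetical; a kernel version would also need a lock around the counters, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096
#define NPAGES 1024  /* assumed number of physical pages being tracked */

static int refcnt[NPAGES];  /* one counter per physical page */

/* Record one more mapping of the page at physical address pa. */
void page_ref_inc(uint64_t pa) {
  assert(pa / PGSIZE < NPAGES);
  refcnt[pa / PGSIZE]++;
}

/*
 * Drop one mapping of the page at pa.
 * Returns 1 when the last reference is gone, meaning the caller
 * should now free the page, and 0 otherwise.
 */
int page_ref_dec(uint64_t pa) {
  assert(pa / PGSIZE < NPAGES);
  assert(refcnt[pa / PGSIZE] > 0);
  return --refcnt[pa / PGSIZE] == 0;
}
```

With such counters, fork would increment the count for each shared page, and munmap or exit would free a page only when page_ref_dec() reports that no mapping remains.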