Pages Project
For this project you will use pseudo files from Linux’s /proc filesystem to determine the current virtual to physical page mapping for the stack and the heap of a process.
Development Environment
You will write 3 C programs for this project. While it’s possible to edit the C files within the host system and then copy them to the VM, it will be easier to write your code within an editor within the VM.
The first editor I used in class is called gedit, and you can access it in Ubuntu by clicking on the icon in the upper left and searching for “text editor”. You can launch the terminal, and any other applications, through this interface as well. Other Linux distributions will likely come with an editor as well.
You can clone the project directly into your VM with git. It’s likely that your VM came with git already installed. If not, you can install it from the terminal. In Ubuntu and other Debian-based distributions, you can install software with the command apt-get. You need to run apt-get as root (the admin user) to install software. You can run a command as root using the sudo command:
sudo apt-get install git
Other distributions will have similar means of installing applications.
You will also probably already have the GCC compiler installed. Try running gcc in the terminal. If it says the command is not found, you will need to install it as well. On Ubuntu and Debian you can install a bundle of development tools that includes GCC like so:
sudo apt-get install build-essential
Once you have everything installed you can edit in your editor and compile and run your programs in the terminal. Here is how to compile the program in stack_allocate.c to the executable stack_allocate:
gcc stack_allocate.c -o stack_allocate
You can use the up and down arrows in the terminal to cycle through your command history so you don’t have to type the entire command every time you want to compile. To then run the program stack_allocate:
./stack_allocate
To read documentation about different functions you can use the man command. To pull up documentation about fread(), do this:
man fread
To quit the man viewer, press q. To search through a man page, press /, type what you want to search for, and press enter. Skip to the next search result with n.
Of course you can refer to online documentation about these functions as well, but if you just need a quick lookup to remind yourself what the parameters for a specific function are, it’s often faster to use the command line.
/proc Files
Pseudo files that expose information about a specific process are stored in /proc/<pid>/ where <pid> is the process ID (PID) of the process we care about. For example, /proc/582/ contains information about the process with the PID 582.
The maps Pseudo File
Reading /proc/<pid>/maps gives you the current ranges of virtual addresses that are mapped for that process. Here is an example:
00400000-00401000 r-xp 00000000 00:30 29253 /home/WOOAD/nsommer/maps
00600000-00601000 r--p 00000000 00:30 29253 /home/WOOAD/nsommer/maps
00601000-00602000 rw-p 00001000 00:30 29253 /home/WOOAD/nsommer/maps
0216d000-0218e000 rw-p 00000000 00:00 0 [heap]
7f181bd65000-7f181bf03000 r-xp 00000000 00:20 245330 /lib64/libc-2.19.so
7f181bf03000-7f181c103000 ---p 0019e000 00:20 245330 /lib64/libc-2.19.so
7f181c103000-7f181c107000 r--p 0019e000 00:20 245330 /lib64/libc-2.19.so
7f181c107000-7f181c109000 rw-p 001a2000 00:20 245330 /lib64/libc-2.19.so
7f181c109000-7f181c10d000 rw-p 00000000 00:00 0
7f181c10d000-7f181c12e000 r-xp 00000000 00:20 245322 /lib64/ld-2.19.so
7f181c312000-7f181c315000 rw-p 00000000 00:00 0
7f181c32b000-7f181c32d000 rw-p 00000000 00:00 0
7f181c32d000-7f181c32e000 r--p 00020000 00:20 245322 /lib64/ld-2.19.so
7f181c32e000-7f181c32f000 rw-p 00021000 00:20 245322 /lib64/ld-2.19.so
7f181c32f000-7f181c330000 rw-p 00000000 00:00 0
7ffe195dd000-7ffe195fe000 rw-p 00000000 00:00 0 [stack]
7ffe195fe000-7ffe19600000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
The only lines that we care about for this project are the [stack] and [heap] lines. In the example above, the virtual address range 7ffe195dd000-7ffe195fe000 is currently reserved for the stack. The first address in the range is the first address of the stack, and the second address is the first byte past the end of the stack.
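As a rough illustration of pulling the range out of one of these lines, here is a minimal sketch. The function name parse_labeled_range is made up for this example, and it assumes each line starts with the start-end pair in hex, as in the listing above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 and fills *start and *end if `line`
 * contains `label` (for example "[stack]") and begins with a hex
 * address range of the form start-end. Returns 0 otherwise. */
int parse_labeled_range(const char *line, const char *label,
                        uint64_t *start, uint64_t *end)
{
    if (strstr(line, label) == NULL)
        return 0;

    /* SCNx64 is the portable scanf specifier for a hex uint64_t. */
    return sscanf(line, "%" SCNx64 "-%" SCNx64, start, end) == 2;
}
```

Given the [stack] line from the example, this would fill in 0x7ffe195dd000 and 0x7ffe195fe000. That range is 0x21000 bytes long, which is 33 pages if the page size is 4 KB.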
The pagemap Pseudo File
The pseudo file /proc/<pid>/pagemap is used to get the current virtual to physical page mapping for a specific page. The information for each virtual page is packed into 8 bytes (64 bits). To read the mapping information for a specific virtual page you must seek to byte virtual_page * 8. For example, to access the information about virtual page 256, you need to seek to position 256 * 8 in the file and then read the next 64 bits.
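The seek-and-read step might look something like the sketch below. The name read_pagemap_entry is invented for this example, and it assumes long is 64 bits wide (as on typical 64-bit Linux) so that the byte offset fits in the argument to fseek().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads the 64-bit entry for `virtual_page` from an
 * already opened pagemap stream. Returns 1 on success and 0 if the seek
 * or the read fails. */
int read_pagemap_entry(FILE *pagemap, uint64_t virtual_page, uint64_t *entry)
{
    /* Each entry is 8 bytes, so entry N starts at byte offset N * 8.
     * Assumption: long is 64 bits, so the offset does not overflow. */
    long offset = (long)(virtual_page * sizeof(uint64_t));

    if (fseek(pagemap, offset, SEEK_SET) != 0)
        return 0;

    return fread(entry, sizeof(*entry), 1, pagemap) == 1;
}
```

For virtual page 256 this seeks to byte 2048 and reads the 8 bytes that follow.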
For certain versions of the kernel this pseudo file must be read as root or it will always give you a physical page frame number of 0.
More information about the structure of the information is here: https://www.kernel.org/doc/Documentation/vm/pagemap.txt
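As I read the linked kernel documentation, bit 63 of each entry is the present flag, bit 62 is the swapped flag, and bits 0-54 hold the page frame number when the page is present. Here is a small sketch of decoding an entry under that reading. The macro and function names are my own, not anything the kernel provides.

```c
#include <stdint.h>

/* Bit positions as described in the kernel's pagemap documentation.
 * The names below are made up for this example. */
#define PAGEMAP_PRESENT  (UINT64_C(1) << 63)
#define PAGEMAP_SWAPPED  (UINT64_C(1) << 62)
#define PAGEMAP_PFN_MASK ((UINT64_C(1) << 55) - 1)  /* bits 0-54 */

/* Nonzero if the page is present in main memory. */
int entry_is_present(uint64_t entry)
{
    return (entry & PAGEMAP_PRESENT) != 0;
}

/* Nonzero if the page is swapped out. */
int entry_is_swapped(uint64_t entry)
{
    return (entry & PAGEMAP_SWAPPED) != 0;
}

/* Page frame number; only meaningful when the page is present. */
uint64_t entry_pfn(uint64_t entry)
{
    return entry & PAGEMAP_PFN_MASK;
}
```

Double check the bit positions against the documentation for the kernel version in your VM before relying on them.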
Memory Allocating Programs
To help test your pages.c program, write programs in stack_allocate.c and heap_allocate.c. Both of these take a single command line argument which is the number of pages worth of memory to allocate. For example, if your page size is 4 KB and you run stack_allocate like this:
./stack_allocate 2
Then the program must allocate 8 KB of memory on the stack by creating an array.
heap_allocate must work in a similar way, but allocate memory on the heap rather than on the stack. I have found that you need to malloc() page-by-page in order to get the results we expect. That is, if you run heap_allocate like this:
./heap_allocate 2
You should allocate 2 buffers, each getpagesize() bytes large.
Recall that pages have a “present” bit, which is 1 if the page is in main memory and the page table has a mapping for that page. The pages won’t have their present bits set until you access them. Have the program write something to the array or buffer. Write to at least 1 element per page so they are all present.
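To make the "write to at least 1 element per page" idea concrete, here is one possible sketch. Both function names are hypothetical, and the page size is passed in as a parameter; in your programs it would come from getpagesize().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: writes one byte at the start of each page-sized
 * chunk of buf so every page gets accessed. Returns the number of writes. */
size_t touch_pages(char *buf, size_t n_pages, size_t page_size)
{
    volatile char *p = buf;  /* volatile so the writes are not optimized out */
    size_t touched = 0;

    for (size_t i = 0; i < n_pages; i++) {
        p[i * page_size] = 1;
        touched++;
    }
    return touched;
}

/* Hypothetical helper: allocates n_pages separate page-sized buffers with
 * malloc() and writes to each one. Returns a malloc'd array of the
 * buffers, or NULL on failure. Assumes n_pages is positive. */
char **allocate_heap_pages(size_t n_pages, size_t page_size)
{
    char **bufs = calloc(n_pages, sizeof(*bufs));
    if (bufs == NULL)
        return NULL;

    for (size_t i = 0; i < n_pages; i++) {
        bufs[i] = malloc(page_size);
        if (bufs[i] == NULL) {
            /* Undo the allocations made so far. */
            while (i > 0)
                free(bufs[--i]);
            free(bufs);
            return NULL;
        }
        touch_pages(bufs[i], 1, page_size);
    }
    return bufs;
}
```

The same touch_pages() call works on a stack array of n * page_size bytes. Keep the buffers alive until the program exits so that the pages are still mapped when you inspect them.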
Have stack_allocate and heap_allocate print out their PID so that you can then run pages with that PID and ensure that you see the appropriate number of mapped pages. End both of these programs with a call to getchar() so that they wait until you press enter to exit.
The pages Program
This program must print the current virtual to physical page mappings for the stack and the heap of a process. If the program is run like this:
sudo ./pages
it must print out the mappings for itself. If given a PID like so:
sudo ./pages 5329
it must print out the mappings for the process with that PID. If given a PID that does not exist, exit gracefully.
For each virtual page number in each range, check to see if the present bit is set. If not, skip the page. If it is set, output the virtual page number and the physical page number, or "swapped" if the physical page is swapped out to disk.
Here is some example output:
Heap starting at 0x1b1f000, ending at 0x1b40000
0x1b1f -> 0x51e41
0x1b20 -> 0x51e3f
Stack starting at 0x7ffe5f3b4000, ending at 0x7ffe5f3d5000
0x7ffe5f3d3 -> 0x51e51
0x7ffe5f3d4 -> 0x86337
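The page numbers in the example output are the addresses divided by the page size, which appears to be 4 KB here (0x1b1f000 / 4096 = 0x1b1f). A small sketch of that arithmetic and of formatting one output line follows. The function names are invented for this example.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: a virtual page number is the virtual address
 * divided by the page size. */
uint64_t address_to_page(uint64_t address, uint64_t page_size)
{
    return address / page_size;
}

/* Hypothetical helper: formats one line such as "0x1b1f -> 0x51e41"
 * into out. Returns whatever snprintf() returns. */
int format_mapping(char *out, size_t out_size,
                   uint64_t virtual_page, uint64_t physical_page)
{
    return snprintf(out, out_size, "0x%" PRIx64 " -> 0x%" PRIx64,
                    virtual_page, physical_page);
}
```

With these, the heap range 0x1b1f000 to 0x1b40000 covers virtual pages 0x1b1f up to but not including 0x1b40, which is 33 pages to check.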
C Suggestions
- Use getpid() to get your current PID
- Use snprintf() to insert the PID into the paths to the /proc files
- Use getline() to read lines of /proc/<pid>/maps
- Use strstr() to see if [heap] or [stack] are in each line
- Use sscanf() to read the start and end of the address range on one of the lines. The format specifier %lx will work to read a hex representation into a uint64_t
- Open the pagemap file in "rb" mode since you’ll be reading binary data rather than text
- Use fseek() and fread() to read data from /proc/<pid>/pagemap
- Read the pagemap information into a uint64_t so you are sure you have a 64-bit variable. This requires stdint.h
- Use getpagesize() to get the page size of your system
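As one example of how a couple of these suggestions fit together, here is a sketch of building a /proc path with snprintf() and opening it. The helper names are hypothetical. A NULL result from fopen() is one way to notice that a PID does not exist so you can exit gracefully.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: builds "/proc/<pid>/<name>" into out.
 * Returns 1 if the whole path fit in the buffer and 0 otherwise. */
int build_proc_path(char *out, size_t out_size, pid_t pid, const char *name)
{
    int n = snprintf(out, out_size, "/proc/%ld/%s", (long)pid, name);
    return n > 0 && (size_t)n < out_size;
}

/* Hypothetical helper: opens one of a process's /proc files.
 * Returns NULL if the path does not fit or the file cannot be opened,
 * for example because no process has that PID. */
FILE *open_proc_file(pid_t pid, const char *name, const char *mode)
{
    char path[64];

    if (!build_proc_path(path, sizeof path, pid, name))
        return NULL;
    return fopen(path, mode);
}
```

You might call open_proc_file(pid, "maps", "r") for the text file and open_proc_file(pid, "pagemap", "rb") for the binary one, and print an error and return if either comes back NULL.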
Submission
Push all 3 of your programs to git-keeper. For full credit you must push some work on the project by the end of the day on Friday, November 10, and you must meet all of the following requirements:
stack_allocate.c
- Prints out its own PID
- Gets a positive integer n as a command line argument
- Allocates n * getpagesize() bytes on the stack and ensures all the pages are present
- Waits for input from the user before exiting
heap_allocate.c
- Prints out its own PID
- Gets a positive integer n as a command line argument
- Allocates n buffers on the heap, each with a size of getpagesize(), and ensures all the pages are present
- Waits for input from the user before exiting
pages.c
- Gets a PID from the command line, or uses its own PID if none is provided.
- For each range of addresses marked [heap] in the process’s maps file it prints out the address range and then prints out the virtual to physical page mapping for all pages from the range that are present in memory.
- For the range of addresses marked [stack] in the process’s maps file it prints out the address range and then prints out the virtual to physical page mapping for all pages from the range that are present in memory.