The TCLocks repository contains the source code of TCLocks and the benchmarks we used in the paper. This is the directory structure of the TCLocks repo.
TCLocks
├── src : TCLocks source code
|   ├── kernel
|   |   ├── linux-5.14.16 : Kernel with TCLocks
|   |   └── rcuht : Hash-table nano-benchmark
|   ├── userspace : Userspace implementation of TCLocks
|   |   └── litl
|   ├── defaults.sh : Default parameters used for benchmarks
|   └── benchmarks : Benchmark sets used in the paper
|       ├── will-it-scale
|       ├── fxmark
|       ├── vbench
|       └── leveldb-1.20
└── scripts : Scripts to run experiments
Different branches contain the source code of Linux 5.14.16 with different locks.
The experiments in this artifact are designed to run on a machine with multiple sockets and tens of CPUs. The results shown in the paper were evaluated on an 8-socket, 224-core machine equipped with Intel Xeon Platinum 8276L CPUs. The machine runs Ubuntu 20.04 with Linux 5.4.0, and hyperthreading is disabled.
The easiest way to reproduce TCLocks is to use QEMU with our ready-to-use disk image. In this guide, we show how to quickly set up an environment to run TCLocks using the disk image. There is also a section that guides you through creating the disk image from scratch. Unless otherwise noted, all steps are executed on the host machine.
Clone the TCLocks repo.
$ git clone https://github.com/rs3lab/TCLocks.git
$ cd TCLocks
Download the compressed disk image here. Once you finish downloading the file, uncompress it using the following commands and move it to the scripts directory.
$ wget -O tclocks-vm.img.gz "https://zenodo.org/record/7860633/files/tclocks-vm.img.gz?download=1"
$ sudo apt install pigz
$ unpigz tclocks-vm.img.gz
$ mv tclocks-vm.img TCLocks/scripts/
Install gcc-9. Add the following line to /etc/apt/sources.list:
deb [arch=amd64] http://archive.ubuntu.com/ubuntu focal main universe
Then install gcc using the following commands:
$ sudo apt update && sudo apt install gcc-9
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 40
Choose gcc-9 when running the following command:
$ sudo update-alternatives --config gcc
Check that the gcc version is correct:
$ gcc -v
Install tools required to build a kernel.
$ sudo apt-get install build-essential libncurses5 libncurses5-dev bin86 \
kernel-package libssl-dev bison flex libelf-dev
The following script builds the kernel bzImage for all four locks: stock, cna, shfllock, and tclocks.
$ cd TCLocks/scripts/
$ ./build-all-kernel.sh
Start QEMU with the script TCLocks/scripts/run-vm.sh. The command starts a virtual machine with 128G of memory and 8 NUMA sockets, each equipped with 28 cores, resulting in 224 cores in total. Adjust the number of cores and sockets based on the host machine; it is preferable to use the same configuration as the host machine. Update the path to the VM image for your environment. The script opens port 4444 for ssh and port 5555 for qmp. The guest will start with the default 5.14.16-stock kernel.
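For reference, run-vm.sh is launched without arguments; the same invocation (backgrounded) is used later in this guide when collecting results:
$ cd TCLocks/scripts
$ ./run-vm.sh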
The provided disk image contains one 50GB partition holding Ubuntu 20.04 and the TCLocks repo. There is a single user, ubuntu, with password ubuntu, who has sudo privileges.
Use port 4444 to ssh into the machine.
$ ssh ubuntu@localhost -p 4444
The home directory already contains the TCLocks repo.
The scripts provided for running experiments require passwordless SSH to the virtual machine. You can set it up using the following commands:
$ ssh-keygen -t ed25519
Assuming the public key is in ~/.ssh/id_ed25519.pub:
$ ssh-copy-id -i ~/.ssh/id_ed25519.pub -p 4444 ubuntu@localhost
Port 5555 is used for qmp, which allows us to observe NUMA effects with vCPUs by pinning each vCPU to a physical core. This step must be done before measuring numbers; it is done automatically by the included experiment script. Run the pin-vcpu.py script to pin the cores. Here, num_vm_cores is 224 for the above example.
Install qmp and psutils on the host machine to pin the vCPUs.
$ pip install qmp
$ pip install psutils
$ sudo ./TCLocks/scripts/pin-vcpu.py 5555 <num_vm_cores>
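Optionally, you can sanity-check the pinning by listing the CPU affinity of every QEMU thread (a sketch; adjust the process name if your QEMU binary is launched differently):
# Prints one "current affinity list" line per QEMU thread.
$ for pid in $(pgrep -f qemu-system-x86_64); do for tid in $(ls /proc/$pid/task); do sudo taskset -cp "$tid"; done; done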
Once you start the VM, let’s check you’re on the right kernel version.
(guest)$ uname -r
If you see 5.14.16-stock, you're all set and now it's time to use TCLocks!
SSH into the VM and update the defaults.sh file in the ~/TCLocks/src directory:
Set cores and mutex_cores to 1.
Set python_env_cores to 28.
Set runtime to 1.
Shut down the VM and execute on the host machine:
$ cd TCLocks/scripts
$ ./run-all.sh
The main scripts are under ./TCLocks/scripts/. You can run all of the steps below using:
$ cd TCLocks/scripts
$ ./run-all.sh
Before running the experiments, SSH into the VM and update the defaults.sh file in the ~/TCLocks/src directory.
Set cores and python_env_cores to a list of core counts up to the maximum number of CPUs in the VM. For example, if the VM has 28 cores:
cores=(1 2 4 8 12 16 20 28)
python_env_cores='[1,2,4,8,12,16,20,28]'
Set mutex_cores to a list of core counts up to 4x the maximum number of CPUs in the VM. For oversubscription (core counts greater than the number of CPUs), a few values are enough to validate. For example, if the VM has 28 cores:
mutex_cores=(1 2 4 8 12 16 20 28 56 84 112)
Set ncores to the maximum number of CPUs in the VM:
ncores=28
Set runtime to 30 seconds:
runtime=30
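Putting these settings together, the relevant lines of defaults.sh for a 28-core VM would look roughly like the sketch below (only the variables discussed above are shown; the real file may contain additional settings):
cores=(1 2 4 8 12 16 20 28)                  # core counts to evaluate
python_env_cores='[1,2,4,8,12,16,20,28]'     # the same counts as a Python-style list
mutex_cores=(1 2 4 8 12 16 20 28 56 84 112)  # includes oversubscribed counts (> 28)
ncores=28                                    # maximum number of CPUs in the VM
runtime=30                                   # run time in seconds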
Micro-benchmarks (Figure 6):
$ cd TCLocks/scripts
$ ./run-micro-benchmark.sh
Expected results: compare against Figure 6 in the paper.
Macro-benchmarks (Figure 7):
$ cd TCLocks/scripts
$ ./run-macro-benchmark.sh
Expected results: compare against Figure 7 in the paper.
Nano-benchmarks (Figure 8):
$ cd TCLocks/scripts
$ ./run-nano-benchmark.sh
Expected results: compare against Figure 8 in the paper.
Userspace benchmarks (Figure 9):
$ cd TCLocks/scripts
$ ./run-userspace-benchmark.sh
Expected results: compare against Figure 9 in the paper.
Copy the results folder (doc) from the VM to the host machine.
$ cd TCLocks/scripts
$ ./run-vm.sh &
$ sleep 60
$ scp -r -P 4444 ubuntu@localhost:~/TCLocks/doc ../
$ sudo pkill -9 qemu
Each lock design will have a folder in the will-it-scale directory.
$ cd TCLocks/doc/results/will-it-scale
$ ls
cna-224 shfllock-224 stock-224 tclocks-224
Each benchmark will have a log file.
$ cd stock-224
lock1.log mmap1.log
To compare lock1 performance, check the second column of the lock1.log file for each lock design. You can also use the following command for comparison:
$ vimdiff stock-224/lock1.log cna-224/lock1.log shfllock-224/lock1.log tclocks-224/lock1.log
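Alternatively, the following sketch prints only the second column of lock1.log for each design (run from the will-it-scale results directory; it assumes whitespace- or comma-separated columns, as described above):
$ for d in stock-224 cna-224 shfllock-224 tclocks-224; do echo "== $d =="; awk -F'[ ,\t]+' '{print $2}' "$d/lock1.log"; done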
For each lock design, change the tmpfs label in its output to the corresponding lock name and concatenate everything into a single fxmark.log file.
$ cd TCLocks/doc/results/fxmark
$ sed -i 's/tmpfs/stock/' stock/fxmark.log
$ sed -i 's/tmpfs/cna/' cna/fxmark.log
$ sed -i 's/tmpfs/shfllock/' shfllock/fxmark.log
$ sed -i 's/tmpfs/tclocks/' tclocks/fxmark.log
$ cat stock/fxmark.log > fxmark.log
$ cat cna/fxmark.log >> fxmark.log
$ cat shfllock/fxmark.log >> fxmark.log
$ cat tclocks/fxmark.log >> fxmark.log
Run the plotter script to extract the results.
$ cd TCLocks/src/benchmarks/fxmark/bin/
$ mkdir output
$ ./plotter.py --log=../../../../doc/results/fxmark/fxmark.log --ty=sc --out=./output
The output directory will contain the extracted results.
$ ls output
mem:cna:MRDM:directio.dat mem:stock:MRDM:directio.dat sc.gp
mem:cna:MWRM:directio.dat mem:stock:MWRM:directio.dat sc.pdf
mem:shfllock:MRDM:directio.dat mem:tclocks:MRDM:directio.dat
mem:shfllock:MWRM:directio.dat mem:tclocks:MWRM:directio.dat
To compare MRDM performance, you can use the following command:
$ cd output
$ vimdiff mem:stock:MRDM:directio.dat mem:cna:MRDM:directio.dat mem:shfllock:MRDM:directio.dat mem:tclocks:MRDM:directio.dat
Macro-benchmark results will be present in the VM. For comparison, SSH into the VM and check the vbench directory.
$ cd ~/TCLocks/src/benchmarks/vbench/
In the results directory, every folder is timestamped with the date and time, and each folder corresponds to one lock design and one benchmark. To figure out which lock design a folder belongs to, use the following command:
$ cat results/<DATE-TIME>/benchmark-psearchy/cores-1/HostInfo.host-10.0.2.15.uname
Linux cloudimg 5.14.16-stock #3 SMP Wed Apr 26 01:16:25 CEST 2023 x86_64 x86_64 x86_64 GNU/Linux
You can extract the results using the following script:
$ sudo ./graph results/20230426-051646/benchmark-psearchy
You can compare the results for the evaluated benchmarks (psearchy and metis) and the evaluated lock designs (stock, cna, shfllock, and tclocks).
$ cd TCLocks/scripts
$ ./run-parse-script.sh
Figure 7(a) Spinlock -> doc/results/results-spinlock-224cores-30seconds/1024buckets-4096entries/spinlock-100-percent-writes.csv
Figure 7(c) Mutex -> doc/results/results-mutex-56cores-30seconds/1024buckets-4096entries/mutex-100-percent-writes.csv
Figure 7(d) RwSem (1% writes) -> doc/results/results-rwsem-56cores-30seconds/1024buckets-4096entries/rwsem-1-percent-writes.csv
Figure 7(e) RwSem (20% writes) -> doc/results/results-rwsem-56cores-30seconds/1024buckets-4096entries/rwsem-20-percent-writes.csv
Figure 7(f) Optimization -> doc/results/results-spinlock-224cores-30seconds/1024buckets-4096entries/spinlock-optimization-100-percent-writes.csv
Figure 7(g) Prefetching -> doc/results/results-spinlock-224cores-30seconds/1024buckets-4096entries/spinlock-prefetch-100-percent-writes.csv
Figure 7(g) Batch Size -> doc/results/results-spinlock-224cores-30seconds/1024buckets-4096entries/spinlock-batch-size-100-percent-writes.csv
For each lock design, there will be a folder containing results for each core count. The last line in each file contains the results. For comparison, use the first value (micros/ops). Lower is better.
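To quickly pull out those summary lines, the following sketch prints the last line of each CSV listed above (run from the TCLocks directory; the glob assumes the layout shown in the mapping):
$ for f in doc/results/results-*/1024buckets-4096entries/*.csv; do echo "$f:"; tail -n 1 "$f"; done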
First, download an Ubuntu 20.04 LTS server image:
$ wget https://releases.ubuntu.com/20.04/ubuntu-20.04.4-live-server-amd64.iso
Create a storage image.
$ qemu-img create ubuntu-20.04.img 50G
Start QEMU and use the downloaded iso image as a booting disk.
$ ./qemu-system-x86_64 \
--enable-kvm \
-m 128G \
-cpu host \
-smp cores=28,threads=1,sockets=8 \
-numa node,nodeid=0,mem=16G,cpus=0-27 \
-numa node,nodeid=1,mem=16G,cpus=28-55 \
-numa node,nodeid=2,mem=16G,cpus=56-83 \
-numa node,nodeid=3,mem=16G,cpus=84-111 \
-numa node,nodeid=4,mem=16G,cpus=112-139 \
-numa node,nodeid=5,mem=16G,cpus=140-167 \
-numa node,nodeid=6,mem=16G,cpus=168-195 \
-numa node,nodeid=7,mem=16G,cpus=196-223 \
-drive file=/path/to/created/ubuntu-20.04.img,format=raw \
-cdrom /path/to/downloaded/ubuntu-20.04.4-live-server-amd64.iso
If an X11 connection is available, you'll see a QEMU GUI popup window to install Ubuntu. Install Ubuntu Server with the OpenSSH package and with LVM disabled. Then log in as the installed user. If you don't have an X11 connection, please refer to this link to set up the image.
Open /etc/default/grub and set GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0". This will print the initial boot messages to the console when the guest VM starts. Then run the following commands to apply the change and shut down the guest machine.
(guest)$ sudo update-grub
(guest)$ sudo shutdown -h now
Now you can start QEMU without the ISO file and without graphics.
$ ./qemu-system-x86_64 \
--enable-kvm \
-m 128G \
-cpu host \
-smp cores=28,threads=1,sockets=8 \
-numa node,nodeid=0,mem=16G,cpus=0-27 \
-numa node,nodeid=1,mem=16G,cpus=28-55 \
-numa node,nodeid=2,mem=16G,cpus=56-83 \
-numa node,nodeid=3,mem=16G,cpus=84-111 \
-numa node,nodeid=4,mem=16G,cpus=112-139 \
-numa node,nodeid=5,mem=16G,cpus=140-167 \
-numa node,nodeid=6,mem=16G,cpus=168-195 \
-numa node,nodeid=7,mem=16G,cpus=196-223 \
-drive file=/path/to/created/ubuntu-20.04.img,format=raw \
-nographic \
-overcommit mem-lock=off \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=pci.0,addr=0x3 \
-netdev user,id=hostnet0,hostfwd=tcp::4444-:22 \
-qmp tcp:127.0.0.1:5555,server,nowait
To use a custom kernel, you have two options. First, compile the kernel and install it within the guest VM. Or, compile the kernel on the host machine and pass it to QEMU via the -kernel option. The second option is more convenient for frequent kernel changes, but it still requires a one-time static install of the kernel modules. TCLocks has three lock implementations, all based on Linux kernel v5.14.16, so you can reuse the installed kernel modules across the three branches (stock, cna, shfllock).
To build and install a kernel inside the VM, first clone the TCLocks repo and resolve its dependencies.
(guest)$ git clone -b stock https://github.com/rs3lab/TCLocks.git
(guest)$ cd TCLocks
[Dependency] You may need the following packages to build Linux:
(guest)$ sudo apt-get install build-essential libncurses5 libncurses5-dev bin86 \
kernel-package libssl-dev bison flex libelf-dev
[Dependency] In addition, please make sure you’re using gcc-9.
(guest)$ gcc -v
...
gcc version 9.5.0
Before starting compilation, please make sure CONFIG_PARAVIRT_SPINLOCKS is not set in your .config file.
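One way to check and, if needed, disable it (a sketch using the kernel's scripts/config helper; run it from the same directory where you invoke make below):
(guest)$ grep PARAVIRT_SPINLOCKS .config
(guest)$ ./scripts/config --disable CONFIG_PARAVIRT_SPINLOCKS
(guest)$ make olddefconfig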
(guest)$ make -j <num_threads>
(guest)$ sudo make modules_install
(guest)$ sudo make install
(guest)$ sudo shutdown -h now # Install complete. Shut down guest machine
Now, on the host machine, you can choose the branch you want to use and then start QEMU with that lock implementation. Please make sure you're using gcc-9 here too.
$ gcc -v
...
gcc version 9.5.0
$ git clone https://github.com/rs3lab/TCLocks.git
$ cd ~/TCLocks
$ git checkout -t origin/stock # or select origin/cna, origin/shfllock, origin/master (For TCLocks)
$ make -j <num_threads>
$ ./qemu-system-x86_64 \
--enable-kvm \
-m 128G \
-cpu host \
-smp cores=28,threads=1,sockets=8 \
-numa node,nodeid=0,mem=16G,cpus=0-27 \
-numa node,nodeid=1,mem=16G,cpus=28-55 \
-numa node,nodeid=2,mem=16G,cpus=56-83 \
-numa node,nodeid=3,mem=16G,cpus=84-111 \
-numa node,nodeid=4,mem=16G,cpus=112-139 \
-numa node,nodeid=5,mem=16G,cpus=140-167 \
-numa node,nodeid=6,mem=16G,cpus=168-195 \
-numa node,nodeid=7,mem=16G,cpus=196-223 \
-drive file=/path/to/created/ubuntu-20.04.img,format=raw \
-nographic \
-overcommit mem-lock=off \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=pci.0,addr=0x3 \
-netdev user,id=hostnet0,hostfwd=tcp::4444-:22 \
-qmp tcp:127.0.0.1:5555,server,nowait \
-kernel ~/TCLocks/src/kernel/linux-5.14.16/arch/x86/boot/bzImage \
-append "root=/dev/sda2 console=ttyS0" \
The uname -r command confirms that the current guest VM is booted using the custom kernel.
(guest)$ uname -r
5.14.16-stock
Inside the guest VM, clone the TCLocks repo.
If you see an error like the one below:
qemu-system-x86_64: network backend 'user' is not compiled into this binary
you can recompile QEMU with the slirp library (link).
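A rough sketch of such a rebuild, assuming a Debian/Ubuntu host (the package name and configure flag may vary across QEMU versions; consult the QEMU build documentation):
$ sudo apt install libslirp-dev        # library for user-mode (slirp) networking
$ cd /path/to/qemu-source
$ ./configure --enable-slirp
$ make -j $(nproc)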