Online Judge: Sandboxing [1]

Introduction

I always like to start introductions with problem statements because it’s always a good idea to know why we want to know about this and I suck at making appealing introductions.

For my final year project at Nanyang Technological University Singapore, I decided to make an online judge for the IE2108 Data Structures and Algorithms course.

For more context, our DSA course requires students to write Python code on paper. It was exhausting, and I’m pretty sure nobody bothered to check if your algorithm actually works.

An online judge is a web-based system that automatically checks the correctness of a user-submitted code for a certain problem by running it (the code) against a set of testcases and verifying the output. Online judges are used in coding contests (e.g. IOI, ACM ICPC) and universities.

Shouldn’t it be simple? I’d imagine we can just receive their submission, run python submission.py (for C++ maybe compile first using g++ -o submission submission.cpp then execute the binary ./submission), collect the output, and verify it. Unfortunately, it’s not that straightforward. When there are hundreds of code submissions launched at your system for you to execute, you must pray that none of them sends over a fork bomb, or a code that deletes your files, or a bitcoin mining process. Point is, allowing user-submitted code to be executed at your machine without any precautions opens a big can of vulnerabilities.

Without further ado, let’s discuss what vulnerabilities can be exploited to fully understand our position and what solutions we can consider.

Grader

The grader is a component of an online judge system that is responsible for executing user submissions against a suite of test cases and producing a verdict. It is critical for graders to be accurate, fair, and secure. While we hope that the users play nice, we cannot assume good faith. Therefore, we must assume that attacks against the grader is to be expected and must be mitigated.

Possible Attacks

Forišek (2006) categorized attacks to belong into one or more of these categories: denial of service (DoS), privileges escalation, destructive attack, and covert channel.^[1]

Denial-of-Service Attacks

Mareš (2021) described DoS attacks as one which the attacker tries to consume as much resources as possible, reducing availability of the system to other users.^[2] Such DoS attacks can occur at both compile time and at execution time.

Example of DoS at compilation time

#include "/dev/zero"

int main() {
    return 0;
}

The #include directive will tell the preprocessor to read the contents of /dev/null and insert it word for word into the source code before compilation. /dev/null happens to be a special file that produces an infinite stream of null bytes \0. See how that will take a really long time?

Example of DoS at execution time

#include <bits/stdc++.h>
using namespace std;

int main() {
    long long x = 0;
    for(long long i = 0; i < 1e18; i++) {
        x++;
    }
    cout << x << "\n";

    return 0;
}

In competitive programming, I think the general rule is to assume that 1e9 operations take one second (at least that was what we were taught, we wasn’t taught specifics). So 1e18 operations is like 1e9 seconds, roughly 31.7 years.

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<char*> allocations;
    const size_t chunkSize = 1024 * 1024;

    while(true) {
        char *p = new(nothrow) char[chunkSize];
        if (!p) {
            break;
        }
        allocations.push_back(p);
    }

    return 0;
}

Allowing processes that will consume “infinite” resources is obviously not a good idea.

A fork bomb falls under this category.

Privileges Escalation

Privileges escalation occur when an attacker gains access to resources or actions beyond what they are permitted to.

For example, suppose you submit a correct solution to Two Sum (the hardest problem to have ever existed). Then I, a malicious user, submit an equally malicious code that locates where your submission is located and reads the file contents. If I am successful, I have just accessed data that I do not have the permissions to see. That would make you feel unfairly treated, no?

Destructive Attack

Forišek described destructive attacks to be attacks that “harm the system in some way, and thus disabling some of its functions or security measures”. Intentionally modifying or harming the test environment can be considered a destructive attack (think of a code that does rm -rf --no-preserve-root /). Another example is to generate a bunch of big files, causing the system to run out of disk space.

The paper also described the following scenario of modifying the test environment that isn’t destructive:

However, the attack does not always have to be destructive. We will give a concrete example of one attack that modified the testing environment in a clever way.

In the attacked system the contestant’s program was compiled and run on several test inputs. After each run, the output of the contestant’s program was compared to the correct output (using a file compare tool fc). If and only if for all outputs fc reported no differences, the submission was accepted. The attacker’s program did not produce any output. Instead, it created a new binary called fc. After the contestant’s program finished, the automated judge software called this binary instead of the original fc. The only things this binary did was to copy the correct output to the contestant’s output file, report that no differences were encountered, and delete itself. Note that at this moment the system was back in its original state, and all following submits were tested normally.

Covert Channel

A covert channel is an unintended way for two processes (or users) to communicate information in a system that is supposed to prevent that from happening.

I honestly cannot think of a way to exploit this on top of my head. Luckily, also in Forišek’s paper, an example code is given.

void sendInformation(int n) {
    // This abuses the memory size data returned by the grader
    malloc(1024 * (n & 31));

    // This abuses the verdict returned by the grader
    n >>= 5;
    if (n==0) exit(0); // exit 0
    if (n==1) assert(0); // abort
    if (n==2) while(1); // exceed the time limit
    if (n==3) { int a=3*n; a/=9-a; } // division by zero
}

The function above allows users to submit 7 bits of information by abusing the data returned by the grader. For example, if I want to send over 69 (which is 0b01000101 in binary), I will allocate 5120 (1024 * 0b00000101) bytes and abort using assert(0) (since n = 0b01).

The Solution

With those known possible exploits in mind, I propose a solution of having a sandboxed test environment that:

Runs processes as a non-root user.
Applies strict resource limits and accounting to processes.
Isolates files from other sandboxes.

Network is hardly ever required for solutions to programming problems. We can add a network lockdown to the sandbox.

My implementation for this can be seen in castletown.

I will dive deep into how I implemented this and discuss more core concepts (like user namespaces, overlayfs, etc.) which I find really interesting. Stay tuned for the next episode (?) of this blog series.

I’m happy if anyone is keen to contribute to the castletown project. Unfortunately, I cannot make it open source (yet) as it is part of my final year project. I’d love to hear your thoughts on it. Do let me know if there’s any fixes or improvements.

References

[1] Forišek, M. (2006). Security of programming contest systems. Information Technologies at School, 553-563.

[2] Mareš, M. (2021). Security of grading systems. Olympiads in Informatics, 15, 37–52. https://doi.org/10.15388/ioi.2021.04

Introduction#

Grader#

Possible Attacks#

Denial-of-Service Attacks#

Example of DoS at compilation time#

Example of DoS at execution time#

Privileges Escalation#

Destructive Attack#

Covert Channel#

The Solution#

References#