Wed 18 Nov 2020 19:00 - 19:20 at SPLASH-I - W-1 Chair(s): Patrick Lam, Julia Belyakova
Automatic software plagiarism detection tools are widely used in
educational settings to ensure that submitted work was not
copied. These tools have grown in use together with the rise in
enrollments in computer science programs and the widespread
availability of code online. Educators rely on the robustness of
plagiarism detection tools; the working assumption is that the effort
required to evade detection is as high as that required to actually do
the assigned work.
This paper shows this is not the case. It presents an entirely
automatic program transformation approach, MOSSAD, that defeats
popular software plagiarism detection tools.
MOSSAD comprises a framework that couples techniques inspired by
genetic programming with domain-specific knowledge to effectively
undermine plagiarism detectors. MOSSAD is effective at defeating
four plagiarism detectors, including Moss and JPlag. MOSSAD is both
fast and
effective: it can, in minutes, generate modified versions of programs
that are likely to escape detection. More insidiously, because of its
non-deterministic approach, MOSSAD can, from a single program,
generate dozens of variants, which are classified as no more
suspicious than legitimate assignments. A detailed study
of MOSSAD across a corpus of real student assignments
demonstrates its efficacy at evading detection. A user study shows
that graduate student assistants consistently
rate MOSSAD-generated code to be just as readable as authentic
student code. This work motivates the need for both research on more
robust plagiarism detection tools and greater integration of naturally
plagiarism-resistant methodologies like code review into computer
science education.