Mossad: Defeating Software Plagiarism Detection (SPLASH 2020 - OOPSLA)

Sun 15 - Sat 21 November 2020 Online Conference

Who

Breanna Devore-McDonald, Emery D. Berger

Track

SPLASH 2020 OOPSLA

Time Zone

Times below are shown in (GMT-06:00) Central Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 18 Nov 2020 07:00 - 07:20 at SPLASH-I - W-1 Chair(s): Karim Ali, Sophia Drossopoulou
Wed 18 Nov 2020 19:00 - 19:20 at SPLASH-I - W-1 Chair(s): Patrick Lam, Julia Belyakova

Abstract

Automatic software plagiarism detection tools are widely used in
educational settings to ensure that submitted work was not
copied. These tools have grown in use together with the rise in
enrollments in computer science programs and the widespread
availability of code online. Educators rely on the robustness of
plagiarism detection tools; the working assumption is that the effort
required to evade detection is as high as that required to actually do
the assigned work.

This paper shows this is not the case. It presents an entirely
automatic program transformation approach, MOSSAD, that defeats
popular software plagiarism detection tools. MOSSAD comprises a
framework that couples techniques inspired by genetic programming
with domain-specific knowledge to effectively undermine plagiarism
detectors. MOSSAD is effective at defeating four plagiarism
detectors, including Moss and JPlag. MOSSAD is both fast and
effective: it can, in minutes, generate modified versions of programs
that are likely to escape detection. More insidiously, because of its
non-deterministic approach, MOSSAD can, from a single program,
generate dozens of variants, which are classified as no more
suspicious than legitimate assignments. A detailed study of MOSSAD
across a corpus of real student assignments demonstrates its efficacy
at evading detection. A user study shows that graduate student
assistants consistently rate MOSSAD-generated code as just as
readable as authentic student code. This work motivates the need for
both research on more robust plagiarism detection tools and greater
integration of naturally plagiarism-resistant methodologies like code
review into computer science education.

Link to Publication

https://dl.acm.org/doi/pdf/10.1145/3428206

DOI

https://doi.org/10.1145/3428206

Breanna Devore-McDonald

University of Massachusetts at Amherst

Emery D. Berger