SPLASH 2020
Sun 15 - Sat 21 November 2020 Online Conference

Artifact Evaluation for OOPSLA 2020 is complete. Please see the results in the Chairs’ Report.

Help others to build upon the contributions of your paper!

The Artifact Evaluation process is a service provided by the community to help authors of accepted papers provide more substantial supplements to their papers so future researchers can more effectively build on and compare with previous work.

Authors of papers that pass Round 1 of PACMPL (OOPSLA) will be invited to submit an artifact that supports the conclusions of their paper. The Artifact Evaluation Committee (AEC) will read the paper and explore the artifact to give feedback on how well the artifact supports the paper and how easy it is for future researchers to use.

This submission is voluntary. Papers that go through the Artifact Evaluation process successfully will receive a seal of approval printed on the first page of the paper. Authors of papers with accepted artifacts are encouraged to make these materials publicly available upon publication of the proceedings, by including them as “source materials” in the ACM Digital Library.

See the Call for Artifacts section below for more information.

In an effort to reach a broader reviewing audience, we are also accepting self-nominations for artifact review. Please see the call for self-nominations below for more information.

Call for Artifacts

Important Dates

  • August 8: Authors of papers accepted in Phase 1 submit artifacts
  • August 15-18: Authors may respond to issues that reviewers encounter while following the kick-the-tires instructions
  • September 15: Artifact notifications sent out

Selection Criteria

The artifact is evaluated in relation to the expectations set by the paper. For an artifact to be accepted, it must support all the main claims made in the paper. Thus, in addition to just running the artifact, the evaluators will read the paper and may try to tweak provided inputs or otherwise slightly generalize the use of the artifact from the paper in order to test the artifact’s limits.

Artifacts should be:

  • consistent with the paper,
  • as complete as possible,
  • well documented, and
  • easy to reuse, facilitating further research.

The AEC strives to put itself in the shoes of such future researchers and to ask: how much would this artifact have helped me? See the Badges section below for further guidance on the possible outcomes of artifact evaluation.

Submission Process

All papers that pass phase 1 of OOPSLA reviewing are eligible to submit artifacts.

Your submission should consist of three pieces:

  1. an overview of your artifact,
  2. a URL pointing to either:
    • a single file containing the artifact (recommended), or
    • the address of a public source control repository, and
  3. a hash certifying the version of the artifact at submission time, either:
    • an md5 hash of the single file (use the md5 or md5sum command-line tool to generate the hash), or
    • the full commit hash of the submitted version of the repository (e.g., from git reflog --no-abbrev).
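If neither md5 nor md5sum is at hand, an equivalent digest can be computed with Python's standard hashlib module. The sketch below is one possible approach, not a required tool: it reads the file in chunks so that large VM images need not fit in memory, and prints the result in the same "hash  filename" layout that md5sum uses. The script name and archive name in the usage comment are placeholders.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large VM images are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage (placeholder names): python md5_artifact.py 123.tgz
    for name in sys.argv[1:]:
        # Same "hash  filename" layout that md5sum prints.
        print(f"{md5_of_file(name)}  {name}")
```

The digest printed for a file should match the one md5sum reports for the same file, so either can be quoted in the submission.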

The URL must be a Google Drive, Dropbox, GitHub, Bitbucket, or (public) GitLab URL, to help protect the anonymity of the reviewers. You may upload your artifact directly if it’s a single file less than 15 MB.

Artifacts do not need to be anonymous; reviewers will be aware of author identities.

Overview of the Artifact

Your overview should consist of two parts:

  • a Getting Started Guide and
  • Step-by-Step Instructions for how you propose to evaluate your artifact (with appropriate connections to the relevant sections of your paper).

The Getting Started Guide should contain setup instructions (including, for example, a pointer to the VM player software, its version, passwords if needed, etc.) and basic testing of your artifact that you expect a reviewer to be able to complete in 30 minutes. Reviewers will follow all the steps in the guide during an initial kick-the-tires phase. The Getting Started Guide should be as simple as possible, and yet it should stress the key elements of your artifact. Anyone who has followed the Getting Started Guide should have no technical difficulties with the rest of your artifact.

The Step-by-Step Instructions explain how to reproduce any experiments or other activities that support the conclusions in your paper. Write them for readers who have a deep interest in your work and are studying it to improve it or compare against it. If your artifact runs for more than a few minutes, point this out, note roughly how long it is expected to run, and explain how to run it on smaller inputs. Reviewers may choose to run on smaller or larger inputs depending on available hardware.

Where appropriate, include descriptions of and links to files (included in the archive) that represent expected outputs (e.g., the log files expected to be generated by your tool on the given inputs); if there are warnings that are safe to be ignored, explain which ones they are.
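One way to make expected outputs directly checkable is to ship a small comparison script next to them. The Python sketch below diffs a freshly generated log against the expected log after dropping lines that match warnings documented as safe to ignore. The warning pattern is a hypothetical placeholder; a real artifact would list its own safe warnings and may need further normalization (for example, stripping timestamps).

```python
import difflib
import re
from typing import Iterable, List

# Hypothetical pattern for a warning the documentation lists as safe to ignore.
SAFE_WARNINGS = [re.compile(r"^warning: unused variable")]


def relevant_lines(lines: Iterable[str]) -> List[str]:
    """Strip trailing whitespace and drop lines matching a known-safe warning."""
    kept = []
    for line in lines:
        line = line.rstrip()
        if not any(p.search(line) for p in SAFE_WARNINGS):
            kept.append(line)
    return kept


def compare_logs(expected: Iterable[str], actual: Iterable[str]) -> List[str]:
    """Return unified-diff lines; an empty list means the outputs agree."""
    return list(difflib.unified_diff(
        relevant_lines(expected), relevant_lines(actual),
        fromfile="expected", tofile="actual", lineterm=""))
```

A reviewer would pass in the lines of the shipped expected log and of their own run (e.g., from readlines()); any remaining diff lines point at genuine differences rather than known-harmless noise.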

The artifact’s documentation should include the following:

  • A list of claims from the paper supported by the artifact, and how/why.
  • A list of claims from the paper not supported by the artifact, and why not.

For example, performance claims may not be reproducible in a VM, or the authors may not be allowed to redistribute specific benchmarks. Artifact reviewers can then center their reviews and evaluation around these specific claims, though they will still consider whether the provided evidence is adequate to support the claim that the artifact works.

Packaging the Artifact

When packaging your artifact, please keep in mind: a) how accessible you are making your artifact to other researchers, and b) the fact that the AEC members will have a limited time in which to make an assessment of each artifact.

Your artifact can contain a bootable virtual machine image with all of the necessary libraries installed. This is recommended: a virtual machine provides an easily reproducible environment that is less susceptible to bit rot, and it also gives the AEC confidence that errors or other problems cannot harm their machines.

Submitting source code that must be compiled is permissible. A more automated and/or portable build, such as a Dockerfile or a build tool that manages all compilation and dependencies (e.g., Maven or Gradle), improves the odds that the AEC will not get stuck trying to make different versions of packages work (particularly different releases of programming languages).

Authors submitting machine-checked proof artifacts should consult Marianna Rapoport’s Proof Artifacts: Guidelines for Submission and Reviewing.

You should make your artifact available as a single archive file and use the naming convention <paper #>.<suffix>, where the appropriate suffix is used for the given archive format. Please use a widely available compressed archive format such as ZIP (.zip), tar and gzip (.tgz), or tar and bzip2 (.tbz2). Please use open formats for documents.
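As a minimal sketch of this naming convention, the Python function below uses the standard tarfile module to bundle an artifact directory into a gzip-compressed <paper #>.tgz archive. The directory name and paper number are hypothetical placeholders; a ZIP archive could be produced analogously with shutil.make_archive.

```python
import os
import tarfile


def package_artifact(src_dir: str, paper_number: int, out_dir: str = ".") -> str:
    """Bundle src_dir into <paper_number>.tgz inside out_dir; return its path."""
    archive_path = os.path.join(out_dir, f"{paper_number}.tgz")
    # "w:gz" writes a tar archive compressed with gzip (.tgz).
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's base name, not its absolute path,
        # so the archive unpacks into a single top-level folder.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return archive_path
```

For example, package_artifact("artifact", 123) would write ./123.tgz for a hypothetical paper #123. Unpacking the result on a fresh machine is a quick way to confirm nothing was left out.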

Based on the outcome of the OOPSLA 2019 AEC, the strongest recommendation we can give for ensuring quality packaging is to test your own directions on a fresh machine (or VM), following exactly the directions you have prepared.

While publicly available artifacts are often easier to review, and considered to be in the best interest of open science, artifacts are not required to be public and/or open source. Artifact reviewers will be instructed that the artifacts are for use only for artifact evaluation, that submitted versions of artifacts may not be made public by reviewers, and that copies of artifacts must not be kept beyond the review period. There is an additional badge specifically for making artifacts available in reliable locations (see below), and we strongly encourage authors of accepted artifacts to pursue it, but it is a separate process from evaluation of functionality, and it is not required.

Badges

The artifact evaluation committee evaluates each artifact for the awarding of one or two badges:

Functional: This is the basic “accepted” outcome for an artifact. An artifact can be awarded a functional badge if the artifact supports all claims made in the paper, possibly excluding some minor claims if there are very good reasons they cannot be supported. In the ideal case, an artifact with this designation includes all relevant code, dependencies, input data (e.g., benchmarks), and the artifact’s documentation is sufficient for reviewers to reproduce the exact results described in the paper. If the artifact claims to outperform a related system in some way (in time, accuracy, etc.) and the other system was used to generate new numbers for the paper (e.g., an existing tool was run on new benchmarks not considered by the corresponding publication), artifacts should include a version of that related system, and instructions for reproducing the numbers used for comparison as well. If the alternative tool crashes on a subset of the inputs, simply note this expected behavior.

Deviations from this ideal must be for good reason. A non-exclusive list of justifiable deviations includes:

  • Some benchmark code is subject to licensing or intellectual property restrictions and cannot legally be shared with reviewers (e.g., licensed benchmark suites like SPEC, or when a tool is applied to private proprietary code). In such cases, all available benchmarks should be included. If all benchmark data from the paper falls into this case, alternative data should be supplied: providing a tool with no meaningful inputs to evaluate on is not sufficient to justify claims that the artifact works.
  • Some of the results are performance data, and therefore exact numbers depend on the particular hardware. In this case, artifacts should explain how to recognize when experiments on other hardware reproduce the high-level results (e.g., that a certain optimization exhibits a particular trend, or that comparing two tools one outperforms the other in a certain class of cases).
  • In some cases, repeating the evaluation may take a long time. Reviewers may not reproduce the full results in such cases.
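When exact numbers depend on hardware, the documentation can state the high-level result in a form reviewers can check. As a minimal sketch, assuming the paper's claim is that one tool is faster than another on most benchmarks, the Python functions below test that qualitative claim from per-benchmark running times measured on the reviewer's own machine. The function names and the 0.8 threshold are illustrative assumptions, not values taken from any paper.

```python
from typing import Dict


def fraction_faster(ours: Dict[str, float], baseline: Dict[str, float]) -> float:
    """Fraction of shared benchmarks on which `ours` ran faster than `baseline`."""
    shared = sorted(set(ours) & set(baseline))
    if not shared:
        raise ValueError("no benchmarks in common")
    wins = sum(1 for name in shared if ours[name] < baseline[name])
    return wins / len(shared)


def trend_reproduced(ours: Dict[str, float], baseline: Dict[str, float],
                     threshold: float = 0.8) -> bool:
    """True if `ours` is faster on at least `threshold` of the shared benchmarks."""
    return fraction_faster(ours, baseline) >= threshold
```

Because both tools are timed on the same machine, the comparison stays meaningful on slower reviewer hardware even though the absolute numbers differ from the paper.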

In some cases, the artifact may require specialized hardware (e.g., a CPU with a particular new feature, a specific class of GPU, or a cluster of GPUs). For such cases, authors should contact the Artifact Evaluation Co-Chairs (Colin Gordon and Anders Møller) as soon as possible after the Round 1 notification to arrange how the artifact can be evaluated. In past years, one outcome was that the authors of an artifact requiring specialized hardware paid for a cloud instance with that hardware, which reviewers could access remotely.

Reusable: This badge may only be awarded to artifacts judged functional. A Reusable badge is given when reviewers feel the artifact is particularly well packaged, documented, designed, etc. to support future research that might build on the artifact. For example, if it seems relatively easy for others to reuse this directly as the basis of a follow-on project, the AEC may award a Reusable badge.

Artifacts given one or both of the Functional and Reusable badges are generally referred to as accepted.

After decisions on the Functional and Reusable badges have been made, the AEC Chairs can award an additional badge to those accepted artifacts that make their artifact durably available:

Available: This badge may only be awarded to artifacts judged functional. It is given to accepted artifacts that are made publicly available in an archival location. Last year, accepted artifacts whose authors uploaded the evaluated version to Zenodo and sent the AEC chairs the DOI (after acceptance) automatically received this badge. GitHub and similar hosting services are not adequate for receiving this badge (see FAQ).

Conflicts of Interest

Conflicts of interest involving AEC members are handled by the chairs. Conflicts of interest involving one of the two AEC chairs are handled by the other AEC chair, or by the PC of the conference if both chairs are conflicted. Artifacts involving an AEC chair must be unambiguously accepted (they may not be borderline), and they may not be considered for the distinguished artifact award.

FAQ

This list will be updated with useful questions as time goes on.

My artifact requires hundreds of GB of RAM / hundreds of CPU hours / a specialized GPU / etc., that the AEC members may not have access to. How can we submit an artifact?
If the tool can run on an average modern machine but may run extremely slowly compared to the hardware used for the paper's evaluation, please document the expected running time on your own hardware, and point to examples the AEC may be able to replicate in less time. If your system will simply not work at all without hundreds of GB of RAM, or other hardware that most typical graduate student machines do not have, please contact the AEC chairs in advance to make arrangements. In the past this has included options such as the authors paying for a cloud instance with the required hardware, to which reviewers are given anonymous access (the AEC chairs act as a proxy, letting the authors know when the instance can be switched off to save money). Submissions using cloud instances or similar that are not cleared with the AEC chairs in advance will be summarily rejected.
Can my artifact be accepted if some of the paper’s claims are not supported by the artifact, for example if some benchmarks are omitted or the artifact does not include tools we experimentally compare against in the paper?
In general yes (if good explanations are provided, as explained above), but if such claims are essential to the overall results of the paper, the artifact will be rejected. As an extreme example, an artifact consisting of a working tool submitted with no benchmarks (e.g., if all benchmarks have source that may not be redistributed) would be rejected.
Why do we need to use Zenodo for the Available badge? Why not Github?
Commercial repositories are unreliable, in that there is no guarantee the evaluated artifact will remain available indefinitely. Contrary to popular belief, it is possible to rewrite git commit history in a public repository (see docs on git rebase and the "--force" option to git push, and note that git tags are mutable). Users can delete public repositories, or their accounts. And in addition to universities deleting departmental URLs over time, hosting companies also sometimes simply delete data: Bidding farewell to Google Code (2015), Sunsetting Mercurial Support in Bitbucket (2019).
Reviewers identified things to fix in documentation or scripts for our artifact, and we'd prefer to publish the fixed version. Can't we submit the improved version for the Availability badge?
No, but you can get part of what you want. For availability, we want the evaluated version to be available. But Zenodo allows revisions to artifacts. When you do this, each version will receive its own DOI, a landing page will be created listing all versions, and when someone visits the page for the *evaluated* version, Zenodo will inform them that an updated version is available. For more information: https://help.zenodo.org/#versioning
Can I get the Available badge without submitting an artifact? I'm still making the thing available!
No. The Availability badge means an artifact known to support the paper's claims is available in an archival location. Making un-evaluated artifacts available is still good, but is outside the scope of what the AEC will consider.
Can I get the Available badge for an artifact that was not judged to be Functional? I'm still making the thing available!
No. The Availability badge vouches that the available artifact is known to support the paper's claims. Availability of an artifact where reviewers tried to use it to replicate the paper's results and failed is of uncertain value.

Contact

Please contact Colin Gordon and Anders Møller if you have any questions.

Call for Self-Nominations

This year the OOPSLA 2020 Artifact Evaluation Chairs are seeking (self!) nominations for the Artifact Evaluation Committee (AEC). If you are a senior PhD student or post-doc with expertise relevant to the kinds of artifacts submitted to OOPSLA, please read the rest of this message and apply: https://forms.gle/QHS5cn8uuJjzUcaA9

If you are not, but know someone who might be interested, please let them know about this.

Generally, the bar for “senior” PhD student has been authorship on one paper at a SIGPLAN conference or a related conference (e.g., ICSE, FSE, ASE, ISSTA, ECOOP, ESOP, etc.), though this should be interpreted as a rough guideline rather than a hard requirement on where you have published. Prior experience with artifact evaluation (as a submitter or reviewer) is a plus, but also not required.

The AEC’s work will occur between the phase 1 notifications for OOPSLA (July 1, 2020) and the due date for phase 2 revisions (August 14, 2020).

For more information on artifact reviewing, consult the 2020 call for artifacts: https://2020.splashcon.org/track/splash-2020-Artifacts#Call-for-Artifacts

If you have questions, don’t hesitate to contact the 2020 AEC chairs (Colin Gordon and Anders Møller).

Results Overview

For a total of 87 conditionally accepted OOPSLA papers, the authors expressed intent to submit artifacts. We received 67 initial submissions, one of which was withdrawn shortly after submission, leaving 66 artifacts for review (61% of the 109 conditionally accepted OOPSLA papers). Of those:

  • 17 were deemed non-functional. This is only an indication that the AEC was not able to reproduce all relevant claims to their satisfaction, and not an indictment of the corresponding paper.
  • 49 were accepted in some way (74% acceptance), broken down as:
    • 30 reusable (implying also functional), so 61% of accepted artifacts were found to be reusable
    • 19 functional

These percentages are similar to 2019. The overall number of submissions, however, increased substantially, from 44 last year to 66 this year (a 50% increase). This led to a last-minute scramble to grow the reviewer pool from 30 to 50 PhD students and post-docs, who together wrote 200 reviews.

Distinguished Artifacts

Distinguished Artifact Reviewers

  • Caroline Lemieux (UC Berkeley)
  • Aviral Goel (Northeastern)
  • Kaan Genc (Ohio State)
  • Maaz Bin Safeer Ahmad (University of Washington)
  • Aina Linn Georges (Aarhus University)

Recommendations for Future Artifact Evaluations

Artifact evaluation consisted of two phases: a kick-the-tires phase to debug installation and dependency issues, and a full review phase. Authors had 4 days to respond to problems encountered in the kick-the-tires phase. Common issues in the kick-the-tires phase included:

  • Overstating platform support. Several artifacts that claimed to require only a UNIX-like system failed severely under macOS, in particular those requiring 32-bit compilers, which newer macOS versions no longer include. We recommend that future artifacts scope their claimed support more narrowly. Generally, this could have been fixed by the authors providing a Dockerfile.
  • Missing dependencies, or poor documentation of dependencies.

As with last year, the single most effective way to avoid these sorts of issues ahead of time is to run the instructions independently on a fresh machine, VM, or Docker container.

Common issues found during the full review phase included:

  • Comparing against existing tools on new benchmarks, but not including ways to reproduce the other tools’ executions. This was explicitly mentioned in the call for artifacts.
  • Not explaining how to interpret results. Several artifacts ran successfully and produced the output that was the basis for the paper, but without any way for reviewers to compare these for consistency with the paper. Examples included generating a list of warnings without documenting which were true vs. false positives, and generating large tables of numbers that were presented graphically in the paper without providing a way to generate analogous visualizations.
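For the warning-list case mentioned above, one remedy is to ship a ground-truth file recording which warnings are true positives, together with a script that summarizes a run against it. The Python sketch below assumes a hypothetical format in which each warning is identified by a string such as "file:line:kind"; it counts true positives, false positives, and warnings missing from the ground truth, so that precision can be compared with the figure reported in a paper.

```python
from typing import Dict, Iterable


def classify_warnings(warnings: Iterable[str],
                      ground_truth: Dict[str, bool]) -> Dict[str, int]:
    """Count warnings labelled true/false positive, plus unlabelled ones.

    ground_truth maps a warning identifier (hypothetical format, e.g.
    "src/Foo.java:42:null-deref") to True for a real bug, False otherwise.
    """
    counts = {"true_positive": 0, "false_positive": 0, "unlabelled": 0}
    for w in set(warnings):  # ignore duplicate reports of the same warning
        if w not in ground_truth:
            counts["unlabelled"] += 1
        elif ground_truth[w]:
            counts["true_positive"] += 1
        else:
            counts["false_positive"] += 1
    return counts


def precision(counts: Dict[str, int]) -> float:
    """Precision over labelled warnings: TP / (TP + FP), or 0.0 if none."""
    labelled = counts["true_positive"] + counts["false_positive"]
    return counts["true_positive"] / labelled if labelled else 0.0
```

A nonzero "unlabelled" count tells the reviewer that the run produced warnings the authors never classified, which is itself useful when judging consistency with the paper.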

This year, as in the past several years, the timeline for artifact reviewing was intentionally limited to the period between OOPSLA Phase 1 notifications and OOPSLA Phase 2 submissions. This arrangement originated as an experiment to see whether it was feasible, purely in terms of timeline, for artifact evaluation to be a useful input to Phase 2 decisions. (To be clear: this was a study of timing feasibility, and artifact evaluation results have so far not been factored into Phase 2 decisions.) The past several years show that this is feasible, but it has costs: in practice, it leaves only 6 weeks for artifact reviewing, end to end.

We believe it is worth decoupling the Phase 2 deadline from artifact evaluation to permit more time for artifact reviewing. The extra time would give authors longer to prepare artifacts (instead of the week currently given), ease reviewer load, and allow an additional round of iteration with authors, which would be useful in some cases. Given that artifact submission is currently limited to one attempt (unlike paper submissions), it may be worth considering a different review model with more rounds of feedback, giving authors opportunities to correct or improve their artifacts in response to problems encountered even later in reviewing.

More concrete suggestions for next year include:

  • Review forms should be changed from accept/reject terminology to having two numeric scores indicating inclinations on functionality and separately reusability, with suitably clearer score text.
  • The guidelines for awarding the Reusable badge should be clearer, both to authors and to reviewers. There is value in leaving the reusability criteria open-ended, as reusability often means something very different for machine-checked proofs vs. proof-of-concept compilers vs. dynamic analysis tools. However, the current ACM criteria are so open-ended that it is hard for authors to know what to aim for.
  • To align better with the general ACM guidelines and other SIGPLAN conferences, we should allow artifacts to receive the Available badge without requiring that they meet the Functional requirements. This has the additional benefit of still rewarding artifacts which perhaps were “close” to achieving a Functional designation. This could proceed either by the AEC relaxing the requirements for Available badges (but still requiring the AEC to look at the artifacts), or by allowing Conference Publishing to handle artifact availability independently of the AEC (in which case it would be possible for papers to carry Available badges without ever being seen by the AEC).
  • We should seek funding for compute infrastructure for compute-intensive artifacts. This year and last saw artifact submissions requiring specific GPUs, small clusters, hundreds of GB of RAM on one machine, or dozens of cores. This leads to a scramble every year to recognize which artifacts have these significant requirements, and to try to rebalance them to reviewers with existing access to possibly-suitable systems. This year and last year some artifact authors rented cloud systems at their own expense for reviewers to use. However, this biases reviews in favor of those with the funding for that (which can easily run bills up to several thousand USD). While we should still permit and encourage this for artifacts whose authors have such resources, we should also solicit funds for cloud computing that the AEC can set up on its own as needed, based on the artifacts that arrive. It may be possible to apply through various providers’ research credits programs, though it might also be useful to include this as part of sponsorship requests for future editions of OOPSLA.