Learning Semantic Program Embeddings with Graph Interval Neural Network (SPLASH 2020 - OOPSLA)

Sun 15 - Sat 21 November 2020 Online Conference

Who

Yu Wang, Ke Wang, Fengjuan Gao, Linzhang Wang

Track

SPLASH 2020 OOPSLA

Time Zone

The program is currently displayed in (GMT-06:00) Central Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 20 Nov 2020 11:40 - 12:00 at SPLASH-III - F-3B Chair(s): Yaniv David, Francisco Ferreira
Fri 20 Nov 2020 23:40 - 00:00 at SPLASH-III - F-3B Chair(s): Dimi Racordon, Yulei Sui

Abstract

Learning distributed representations of source code has been a challenging task for machine learning models. Earlier works treated programs as text so that natural language methods could be readily applied. Unfortunately, such approaches do not capitalize on the rich structural information possessed by source code. More recently, Graph Neural Networks (GNNs) were proposed to learn embeddings of programs from their graph representations. Because their message-passing procedure is homogeneous (i.e., it does not take advantage of program-specific graph characteristics) and expensive (i.e., it requires heavy information exchange among nodes in the graph), GNNs can suffer from precision issues, especially when dealing with programs rendered into large graphs. In this paper, we present a new graph neural architecture, called Graph Interval Neural Network (GINN), to tackle the weaknesses of existing GNNs. Unlike the standard GNN, GINN generalizes from a curated graph representation obtained through an abstraction method designed to aid models in learning. In particular, GINN focuses exclusively on intervals (generally manifested in looping constructs) for mining the feature representation of a program. Furthermore, GINN operates on a hierarchy of intervals to scale the learning to large graphs.
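To make the notion of an interval concrete, here is a minimal Python sketch of the classical interval partition of a control-flow graph from the compiler literature. It is an illustration only: the abstract does not spell out how GINN computes intervals, so the function name `interval_partition`, the dictionary-based graph encoding, and the toy graph `cfg` are all assumptions made for this example.

```python
from collections import deque


def interval_partition(succs, entry):
    """Split a control-flow graph into intervals (hypothetical helper).

    succs maps every node to the list of its successors; entry is the
    start node. Returns a dict from each interval header to the list of
    nodes in that interval, header first.
    """
    # Build predecessor lists from the successor lists.
    preds = {n: [] for n in succs}
    for n, outs in succs.items():
        for m in outs:
            preds[m].append(n)

    owner = {}      # node -> header of the interval that contains it
    intervals = {}  # header -> members of its interval
    pending = deque([entry])
    while pending:
        header = pending.popleft()
        if header in owner:
            continue  # already placed by an earlier step
        members = [header]
        owner[header] = header
        grew = True
        while grew:
            grew = False
            for n in succs:
                if n in owner or n == entry:
                    continue
                # A node joins once ALL of its predecessors are inside.
                if preds[n] and all(owner.get(p) == header for p in preds[n]):
                    owner[n] = header
                    members.append(n)
                    grew = True
        intervals[header] = members
        # Unplaced nodes entered from this interval start new intervals.
        for n in succs:
            if n not in owner and any(owner.get(p) == header for p in preds[n]):
                pending.append(n)
    return intervals


# Toy graph for a single while loop (node names are made up).
cfg = {
    "entry": ["cond"],
    "cond": ["body", "exit"],
    "body": ["latch"],
    "latch": ["cond"],
    "exit": [],
}
result = interval_partition(cfg, "entry")
print(result)
# {'entry': ['entry'], 'cond': ['cond', 'body', 'latch', 'exit']}
```

In this toy graph the loop header `cond` heads an interval that contains the whole loop, which matches the abstract's remark that intervals generally show up as looping constructs. Treating each interval as a single node and partitioning the resulting graph again would give one plausible reading of the "hierarchy of intervals", though that step is not shown here.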

We evaluate GINN on two popular downstream applications: variable misuse prediction and method name prediction. Results show that in both cases GINN outperforms the state-of-the-art models by a comfortable margin. We have also created a neural bug detector based on GINN to catch null pointer dereference bugs in Java code. While learning from the same 9,000 methods extracted from 64 projects, the GINN-based bug detector significantly outperforms the GNN-based bug detector on 13 unseen test projects. Next, we deploy our trained GINN-based bug detector and Facebook Infer, arguably the state-of-the-art static analysis tool, to scan the codebases of 20 highly starred projects on GitHub. Through manual inspection, we confirm 38 bugs out of 102 warnings raised by the GINN-based bug detector, compared to 34 bugs out of 129 warnings for Facebook Infer. We have reported the 38 bugs GINN caught to developers, among which 11 have been fixed and 12 have been confirmed (fix pending). GINN has been shown to be a general, powerful deep neural network for learning precise, semantic program embeddings.
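The warning counts above can be turned into rates with a few lines of Python. This is only arithmetic on the figures quoted in the abstract; calling the ratio "precision" and the helper name `precision` are labels chosen for this sketch, not terms taken from the abstract.

```python
def precision(confirmed, warnings):
    """Fraction of raised warnings that were confirmed as real bugs."""
    return confirmed / warnings


# Counts quoted in the abstract.
ginn = precision(38, 102)    # GINN-based bug detector
infer = precision(34, 129)   # Facebook Infer

# Share of the 38 reported bugs that were fixed or confirmed (11 + 12).
acknowledged = (11 + 12) / 38

print(f"GINN: {ginn:.1%}, Infer: {infer:.1%}, acknowledged: {acknowledged:.1%}")
# GINN: 37.3%, Infer: 26.4%, acknowledged: 60.5%
```

On these numbers the GINN-based detector raises fewer warnings than Infer while confirming slightly more bugs, so a larger share of its warnings turned out to be real.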

Link to Publication

https://dl.acm.org/doi/pdf/10.1145/3428205

DOI

https://doi.org/10.1145/3428205

Yu Wang

Nanjing University

Ke Wang

Visa Research

Fengjuan Gao

Nanjing University

Linzhang Wang

Nanjing University

Session Program

Fri 20 Nov
Displayed time zone: Central Time (US & Canada)

11:00 - 12:20	F-3B OOPSLA at SPLASH-III +12h Chair(s): Yaniv David (Technion), Francisco Ferreira (Imperial College London)

11:00 20m Talk		Just-in-Time Learning for Bottom-Up Enumerative Synthesis (OOPSLA). Shraddha Barke (University of California at San Diego), Hila Peleg (University of California at San Diego), Nadia Polikarpova (University of California at San Diego)
11:20 20m Talk		Taming Type Annotations in Gradual Typing (OOPSLA). John Peter Campora (University of Louisiana at Lafayette), Sheng Chen (University of Louisiana at Lafayette)
11:40 20m Talk		Learning Semantic Program Embeddings with Graph Interval Neural Network (OOPSLA). Yu Wang (Nanjing University), Ke Wang (Visa Research), Fengjuan Gao (Nanjing University), Linzhang Wang (Nanjing University)
12:00 20m Talk		ιDOT: A DOT Calculus with Object Initialization (OOPSLA). Ifaz Kabir (University of Alberta), Yufeng Li (University of Waterloo), Ondřej Lhoták (University of Waterloo)

23:00 - 00:20	F-3B OOPSLA at SPLASH-III Chair(s): Dimi Racordon (University of Geneva, Switzerland), Yulei Sui (University of Technology Sydney)

23:00 20m Talk		Just-in-Time Learning for Bottom-Up Enumerative Synthesis (OOPSLA). Shraddha Barke (University of California at San Diego), Hila Peleg (University of California at San Diego), Nadia Polikarpova (University of California at San Diego)
23:20 20m Talk		Taming Type Annotations in Gradual Typing (OOPSLA). John Peter Campora (University of Louisiana at Lafayette), Sheng Chen (University of Louisiana at Lafayette)
23:40 20m Talk		Learning Semantic Program Embeddings with Graph Interval Neural Network (OOPSLA). Yu Wang (Nanjing University), Ke Wang (Visa Research), Fengjuan Gao (Nanjing University), Linzhang Wang (Nanjing University)
00:00 20m Talk		ιDOT: A DOT Calculus with Object Initialization (OOPSLA). Ifaz Kabir (University of Alberta), Yufeng Li (University of Waterloo), Ondřej Lhoták (University of Waterloo)