Learning Semantic Program Embeddings with Graph Interval Neural NetworkOOPSLA
Mon 16 Nov 2020 22:20 - 23:00 at SPLASH-VII - Posters Session 2 Mirror
Learning distributed representations of source code has been a challenging task for machine learning models. Earlier works treated programs as text so that natural language methods can be readily applied. Unfortunately, such approaches do not capitalize on the rich structural information possessed by source code. Of late, Graph Neural Network (GNN) was proposed to learn embeddings of programs from their graph representations. Due to the homogeneous (i.e. do not take advantage of the program-specific graph characteristics) and expensive (i.e. require heavy information exchange among nodes in the graph) message-passing procedure, GNN can suffer from precision issues, especially when dealing with programs rendered into large graphs. In this paper, we present a new graph neural architecture, called Graph Interval Neural Network (GINN), to tackle the weaknesses of the existing GNN. Unlike the standard GNN, GINN generalizes from a curated graph representation obtained through an abstraction method designed to aid models to learn. In particular, GINN focuses exclusively on intervals (generally manifested in looping construct) for mining the feature representation of a program, furthermore, GINN operates on a hierarchy of intervals for scaling the learning to large graphs.
We evaluate GINN for two popular downstream applications: variable misuse prediction and method name prediction. Results show in both cases GINN outperforms the state-of-the-art models by a comfortable margin. We have also created a neural bug detector based on GINN to catch null pointer deference bugs in Java code. While learning from the same 9,000 methods extracted from 64 projects, GINN-based bug detector significantly outperforms GNN-based bug detector on 13 unseen test projects. Next, we deploy our trained GINN-based bug detector and Facebook Infer, arguably the state-of-the-art static analysis tool, to scan the codebase of 20 highly starred projects on GitHub. Through our manual inspection, we confirm 38 bugs out of 102 warnings raised by GINN-based bug detector compared to 34 bugs out of 129 warnings for Facebook Infer. We have reported 38 bugs GINN caught to developers, among which 11 have been fixed and 12 have been confirmed (fix pending). GINN has shown to be a general, powerful deep neural network for learning precise, semantic program embeddings.
Mon 16 NovDisplayed time zone: Central Time (US & Canada) change
10:20 - 11:00 | |||
10:20 40mPoster | Resolution as Intersection Subtyping via Modus PonensOOPSLA Posters Koar Marntirosian KU Leuven, Tom Schrijvers KU Leuven, Bruno C. d. S. Oliveira University of Hong Kong, Georgios Karachalias Tweag | ||
10:20 40mPoster | Macros For Domain-Specific LanguagesOOPSLA Posters Michael Ballantyne Northeastern University, Alexis King Northwestern University, Matthias Felleisen Northeastern University | ||
10:20 40mPoster | Precise Inference of Expressive Units of Measurement TypesOOPSLA Posters Tongtong Xiang University of Waterloo, Jeff Y. Luo University of Waterloo, Werner Dietl University of Waterloo | ||
10:20 40mPoster | Guiding Dynamic Programing via Structural Probability for Accelerating Programming by ExampleOOPSLA Posters Ruyi Ji Peking University, Yican Sun Peking University, Yingfei Xiong Peking University, Zhenjiang Hu Peking University | ||
10:20 40mPoster | A Sparse Iteration Space Transformation Framework for Sparse Tensor AlgebraOOPSLA Posters Ryan Senanayake Reservoir Labs, Changwan Hong Massachusetts Institute of Technology, Ziheng Wang Massachusetts Institute of Technology, Amalee Wilson Stanford University, Stephen Chou Massachusetts Institute of Technology, Shoaib Kamil Adobe Research, Saman Amarasinghe Massachusetts Institute of Technology, Fredrik Kjolstad Stanford University DOI Pre-print Media Attached File Attached | ||
10:20 40mPoster | Learning Semantic Program Embeddings with Graph Interval Neural NetworkOOPSLA Posters Yu Wang Nanjing University, China, Ke Wang Visa Research, Fengjuan Gao Nanjing University, Linzhang Wang Nanjing University | ||
10:20 40mPoster | Statically Verified Refinements for Multiparty ProtocolsOOPSLA Posters Fangyi Zhou Imperial College London, Francisco Ferreira Imperial College London, Raymond Hu University of Hertfordshire, Rumyana Neykova Brunel University London, Nobuko Yoshida Imperial College London | ||
10:20 40mPoster | Towards a Formal Foundation of Intermittent ComputingOOPSLA Posters Milijana Surbatovich Carnegie Mellon University, Brandon Lucia Carnegie Mellon University, Limin Jia Carnegie Mellon University | ||
10:20 40mPoster | Incremental Predicate Analysis for Regression VerificationOOPSLA Posters | ||
10:20 40mPoster | StreamQL: A Query Language for Processing Streaming Time SeriesOOPSLA Posters | ||
10:20 40mPoster | SATUNE: Synthesizing Efficient SAT EncodersOOPSLA Posters Hamed Gorjiara University of California at Irvine, Guoqing Harry Xu University of California at Los Angeles, Brian Demsky University of California at Irvine | ||
10:20 40mPoster | Scaling Exact Inference for Discrete Probabilistic ProgramsOOPSLA Posters Steven Holtzen University of California at Los Angeles, Guy Van den Broeck University of California at Los Angeles, Todd Millstein University of California, Los Angeles | ||
10:20 40mPoster | Certified and Efficient Instruction SchedulingOOPSLA Posters Sylvain Boulmé Grenoble Alps University / CNRS / Grenoble INP / VERIMAG, Cyril Six Kalray / Grenoble Alps University / CNRS / Grenoble INP / VERIMAG, David Monniaux Grenoble Alps University / CNRS / Grenoble INP / VERIMAG | ||
10:20 40mPoster | Revisiting Iso-Recursive SubtypingOOPSLA Posters Yaoda Zhou University of Hong Kong, Bruno C. d. S. Oliveira University of Hong Kong, Jinxu Zhao University of Hong Kong | ||
10:20 40mPoster | Guided Linking: Dynamic Linking Without the CostsOOPSLA Posters Sean Bartell University of Illinois at Urbana-Champaign, Will Dietz University of Illinois at Urbana-Champaign, Vikram S. Adve University of Illinois at Urbana-Champaign | ||
10:20 40mPoster | Unifying Execution of Imperative Generators and Declarative SpecificationsOOPSLA Posters Pengyu Nie University of Texas at Austin, Marinela Parovic University of Texas at Austin, Zhiqiang Zang University of Texas at Austin, Sarfraz Khurshid University of Texas at Austin, Aleksandar Milicevic Microsoft, Milos Gligoric University of Texas at Austin | ||
10:20 40mPoster | CompCertELF: Verified Separate Compilation of C Programs into ELF Object FilesOOPSLA Posters Yuting Wang Shanghai Jiao Tong University, Xiangzhe Xu Nanjing University, Pierre Wilke CentraleSupélec, Zhong Shao Yale University |
22:20 - 23:00 | |||
22:20 40mPoster | Learning Semantic Program Embeddings with Graph Interval Neural NetworkOOPSLA Posters Yu Wang Nanjing University, China, Ke Wang Visa Research, Fengjuan Gao Nanjing University, Linzhang Wang Nanjing University | ||
22:20 40mPoster | Towards a Formal Foundation of Intermittent ComputingOOPSLA Posters Milijana Surbatovich Carnegie Mellon University, Brandon Lucia Carnegie Mellon University, Limin Jia Carnegie Mellon University | ||
22:20 40mPoster | StreamQL: A Query Language for Processing Streaming Time SeriesOOPSLA Posters | ||
22:20 40mPoster | Incremental Predicate Analysis for Regression VerificationOOPSLA Posters | ||
22:20 40mPoster | SATUNE: Synthesizing Efficient SAT EncodersOOPSLA Posters Hamed Gorjiara University of California at Irvine, Guoqing Harry Xu University of California at Los Angeles, Brian Demsky University of California at Irvine | ||
22:20 40mPoster | A Sparse Iteration Space Transformation Framework for Sparse Tensor AlgebraOOPSLA Posters Ryan Senanayake Reservoir Labs, Changwan Hong Massachusetts Institute of Technology, Ziheng Wang Massachusetts Institute of Technology, Amalee Wilson Stanford University, Stephen Chou Massachusetts Institute of Technology, Shoaib Kamil Adobe Research, Saman Amarasinghe Massachusetts Institute of Technology, Fredrik Kjolstad Stanford University DOI Pre-print Media Attached File Attached | ||
22:20 40mPoster | Unifying Execution of Imperative Generators and Declarative SpecificationsOOPSLA Posters Pengyu Nie University of Texas at Austin, Marinela Parovic University of Texas at Austin, Zhiqiang Zang University of Texas at Austin, Sarfraz Khurshid University of Texas at Austin, Aleksandar Milicevic Microsoft, Milos Gligoric University of Texas at Austin | ||
22:20 40mPoster | Statically Verified Refinements for Multiparty ProtocolsOOPSLA Posters Fangyi Zhou Imperial College London, Francisco Ferreira Imperial College London, Raymond Hu University of Hertfordshire, Rumyana Neykova Brunel University London, Nobuko Yoshida Imperial College London | ||
22:20 40mPoster | Guiding Dynamic Programing via Structural Probability for Accelerating Programming by ExampleOOPSLA Posters Ruyi Ji Peking University, Yican Sun Peking University, Yingfei Xiong Peking University, Zhenjiang Hu Peking University | ||
22:20 40mPoster | Revisiting Iso-Recursive SubtypingOOPSLA Posters Yaoda Zhou University of Hong Kong, Bruno C. d. S. Oliveira University of Hong Kong, Jinxu Zhao University of Hong Kong | ||
22:20 40mPoster | Macros For Domain-Specific LanguagesOOPSLA Posters Michael Ballantyne Northeastern University, Alexis King Northwestern University, Matthias Felleisen Northeastern University | ||
22:20 40mPoster | CompCertELF: Verified Separate Compilation of C Programs into ELF Object FilesOOPSLA Posters Yuting Wang Shanghai Jiao Tong University, Xiangzhe Xu Nanjing University, Pierre Wilke CentraleSupélec, Zhong Shao Yale University | ||
22:20 40mPoster | Certified and Efficient Instruction SchedulingOOPSLA Posters Sylvain Boulmé Grenoble Alps University / CNRS / Grenoble INP / VERIMAG, Cyril Six Kalray / Grenoble Alps University / CNRS / Grenoble INP / VERIMAG, David Monniaux Grenoble Alps University / CNRS / Grenoble INP / VERIMAG | ||
22:20 40mPoster | Resolution as Intersection Subtyping via Modus PonensOOPSLA Posters Koar Marntirosian KU Leuven, Tom Schrijvers KU Leuven, Bruno C. d. S. Oliveira University of Hong Kong, Georgios Karachalias Tweag | ||
22:20 40mPoster | Precise Inference of Expressive Units of Measurement TypesOOPSLA Posters Tongtong Xiang University of Waterloo, Jeff Y. Luo University of Waterloo, Werner Dietl University of Waterloo | ||
22:20 40mPoster | Guided Linking: Dynamic Linking Without the CostsOOPSLA Posters Sean Bartell University of Illinois at Urbana-Champaign, Will Dietz University of Illinois at Urbana-Champaign, Vikram S. Adve University of Illinois at Urbana-Champaign | ||
22:20 40mPoster | Scaling Exact Inference for Discrete Probabilistic ProgramsOOPSLA Posters Steven Holtzen University of California at Los Angeles, Guy Van den Broeck University of California at Los Angeles, Todd Millstein University of California, Los Angeles |