Shiftry: RNN Inference in 2KB of RAM (SPLASH 2020 - OOPSLA)

Sun 15 - Sat 21 November 2020 Online Conference

Who

Aayan Kumar, Vivek Seshadri, Rahul Sharma

Track

SPLASH 2020 OOPSLA

Time Zone

The program is currently displayed in (GMT-06:00) Central Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 17 Nov 2020 12:00 - 12:20 at SPLASH-I - T-3 Chair(s): Olivier Tardieu, Burcu Kulahcioglu Ozkan
Wed 18 Nov 2020 00:00 - 00:20 at SPLASH-I - T-3 Chair(s): Chengyu Zhang, Ting Cao

Abstract

Traditionally, IoT devices send collected sensor data to an intelligent cloud where machine learning (ML) inference happens. However, this course is rapidly changing, and there is a recent trend to run ML on the edge IoT devices themselves. An intelligent edge is attractive because it saves a network round trip (efficiency) and keeps user data at the source (privacy). However, IoT devices are much more resource constrained than the cloud, which makes running ML on them challenging. Specifically, consider the Arduino Uno, a commonly used board that has 2KB of RAM and 32KB of read-only Flash memory. Although recent breakthroughs in ML have created novel recurrent neural network (RNN) models that provide good accuracy with KB-sized models, deploying them on tiny devices with such hard memory constraints has remained elusive.

We present Shiftry, an automatic compiler from high-level floating-point ML models to fixed-point C programs with 8-bit and 16-bit integers, which have significantly lower memory requirements. For this conversion, Shiftry uses a data-driven float-to-fixed procedure and a RAM management mechanism. These techniques enable us to provide the first empirical evaluation of RNNs running on tiny edge devices. On simpler ML models that prior work could handle, Shiftry-generated code has lower latency and higher accuracy.
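
To make the idea of fixed-point code concrete, here is a minimal sketch in C of a 16-bit fixed-point dot product. It is not Shiftry's generated code or its float-to-fixed procedure: the names `to_fixed`, `to_float`, `fixed_dot`, and the constant `SCALE` are hypothetical, and the scale is fixed by hand here, whereas the abstract describes choosing the conversion in a data-driven way. The sketch only illustrates why 16-bit integers can stand in for floats at half the storage of a 32-bit `float`.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fractional bits; an assumed value chosen for illustration only. */
#define SCALE 12

/* Encode a real value as a 16-bit fixed-point integer: q = round(x * 2^SCALE),
   saturated to the int16_t range. */
static int16_t to_fixed(float x) {
    float scaled = x * (float)(1 << SCALE);
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;  /* round half away from zero */
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)scaled;                     /* truncation completes the rounding */
}

/* Decode a fixed-point value with SCALE fractional bits back to a float. */
static float to_float(int32_t q) {
    return (float)q / (float)(1 << SCALE);
}

/* Dot product of two fixed-point vectors. Each 16x16-bit product is widened
   to 32 bits (it carries 2*SCALE fractional bits) and rescaled back to SCALE
   fractional bits before it is accumulated. */
static int32_t fixed_dot(const int16_t *w, const int16_t *x, int n) {
    assert(n >= 0);
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        int32_t prod = (int32_t)w[i] * (int32_t)x[i];
        acc += prod / (1 << SCALE);             /* division truncates toward zero */
    }
    return acc;
}
```

For example, with weights {0.5, -0.25} and inputs {0.75, 0.5} encoded through `to_fixed`, `fixed_dot` returns 1024, which `to_float` decodes to 0.25, the same value as the floating-point dot product. In general the fixed-point result is only an approximation, and picking a scale that balances precision against overflow is the hard part that a compiler would need to automate.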

Link to Publication

https://dl.acm.org/doi/pdf/10.1145/3428250

DOI

https://doi.org/10.1145/3428250

Aayan Kumar

Microsoft Research

India

Vivek Seshadri

Microsoft Research

Rahul Sharma

Microsoft Research

Session Program

Tue 17 Nov
Displayed time zone: Central Time (US & Canada)

11:00 - 12:20	T-3 (OOPSLA) at SPLASH-I, +12h. Chair(s): Olivier Tardieu (IBM Research), Burcu Kulahcioglu Ozkan (MPI-SWS)

11:00 20m Talk	Koord: A Language for Programming and Verifying Distributed Robotics Application (OOPSLA). Ritwika Ghosh (University of Illinois at Urbana-Champaign), Chiao Hsieh (University of Illinois at Urbana-Champaign), Sasa Misailovic (University of Illinois at Urbana-Champaign), Sayan Mitra (University of Illinois at Urbana-Champaign)
11:20 20m Talk	Learning-Based Controlled Concurrency Testing (OOPSLA). Suvam Mukherjee (Microsoft Research), Pantazis Deligiannis (Microsoft Research), Arpita Biswas (IISc Bangalore), Akash Lal (Microsoft Research)
11:40 20m Talk	LiveDroid: Identifying and Preserving Mobile App State in Volatile Runtime Environments (OOPSLA). Umar Farooq (University of California at Riverside), Zhijia Zhao (University of California at Riverside), Manu Sridharan (University of California at Riverside), Iulian Neamtiu (New Jersey Institute of Technology)
12:00 20m Talk	Shiftry: RNN Inference in 2KB of RAM (OOPSLA). Aayan Kumar (Microsoft Research), Vivek Seshadri (Microsoft Research), Rahul Sharma (Microsoft Research)

23:00 - 00:20	T-3 (OOPSLA) at SPLASH-I. Chair(s): Chengyu Zhang (East China Normal University), Ting Cao (Microsoft Research)

23:00 20m Talk	Koord: A Language for Programming and Verifying Distributed Robotics Application (OOPSLA). Ritwika Ghosh (University of Illinois at Urbana-Champaign), Chiao Hsieh (University of Illinois at Urbana-Champaign), Sasa Misailovic (University of Illinois at Urbana-Champaign), Sayan Mitra (University of Illinois at Urbana-Champaign)
23:20 20m Talk	Learning-Based Controlled Concurrency Testing (OOPSLA). Suvam Mukherjee (Microsoft Research), Pantazis Deligiannis (Microsoft Research), Arpita Biswas (IISc Bangalore), Akash Lal (Microsoft Research)
23:40 20m Talk	LiveDroid: Identifying and Preserving Mobile App State in Volatile Runtime Environments (OOPSLA). Umar Farooq (University of California at Riverside), Zhijia Zhao (University of California at Riverside), Manu Sridharan (University of California at Riverside), Iulian Neamtiu (New Jersey Institute of Technology)
00:00 20m Talk	Shiftry: RNN Inference in 2KB of RAM (OOPSLA). Aayan Kumar (Microsoft Research), Vivek Seshadri (Microsoft Research), Rahul Sharma (Microsoft Research)