SPLASH 2020
Sun 15 - Sat 21 November 2020 Online Conference
Wed 18 Nov 2020 15:40 - 16:00 at SPLASH-I - W-5 Chair(s): Mohsen Lesani, Dan Barowy
Thu 19 Nov 2020 03:40 - 04:00 at SPLASH-I - W-5 Chair(s): Nengkun Yu, Filip Křikava

Data repositories often consist of text files in a wide variety of standard formats, in ad-hoc formats, and in mixtures of formats where data in one format is embedded in another. Parsing these files into a structured tabular form is therefore a significant challenge. It is also an important one, because this form is what enables any downstream data processing.

We present Unravel, an extensible framework for structure interpretation of ad-hoc formats. Unravel can automatically, with no user input, extract tabular data from a diverse range of standard, ad-hoc and mixed-format files. The framework is easily extended to support previously unseen formats. It also lets users guide the system interactively with examples when specialized data extraction is desired. Our key insight is to allow arbitrary combinations of extraction and parsing techniques through a concept called partial structures. Partial structures act as a common language through which different techniques can share and refine the file structure. This makes Unravel more powerful than applying the individual techniques in parallel or sequentially. Further, building on this rule-based, extensible approach, we introduce the novel notion of re-interpretation. Here the variety of techniques supported by our system is exploited to improve accuracy while optimizing for particular quality measures or restricted environments. On our benchmark of 617 text files gathered from a variety of sources, Unravel extracts the intended table in many more cases than state-of-the-art techniques.
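The abstract does not say how partial structures are represented. The following is therefore a hypothetical, much-simplified Python illustration of the two ideas, not Unravel's implementation. A partial structure is modelled as a tree of text regions, and a region without children is still uninterpreted. Two independent techniques refine the same tree: a line splitter produces records, and a delimiter splitter breaks those records into cells. A toy form of re-interpretation then builds one candidate per delimiter and keeps the candidate that scores best on an assumed quality measure, namely column-count consistency. All names (`Node`, `refine`, `split_on`, `consistency`, `extract`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """One region of the file in a (hypothetical) partial structure."""
    text: str
    children: Optional[List["Node"]] = None  # None = not yet interpreted


# A rule inspects a node and returns sub-regions, or None if it does not apply.
Rule = Callable[[Node], Optional[List[Node]]]


def split_lines(node: Node) -> Optional[List[Node]]:
    """Record-level technique: one child per non-blank line."""
    if "\n" not in node.text:
        return None
    return [Node(line) for line in node.text.splitlines() if line.strip()]


def split_on(delim: str) -> Rule:
    """Field-level technique: split a region on a fixed delimiter."""
    def rule(node: Node) -> Optional[List[Node]]:
        if delim not in node.text:
            return None
        return [Node(cell.strip()) for cell in node.text.split(delim)]
    return rule


def refine(node: Node, rules: List[Rule]) -> Node:
    """Let the first applicable rule refine this node, then recurse into the
    children it produced. Nodes that no rule applies to stay raw leaves."""
    for rule in rules:
        parts = rule(node)
        if parts is not None:
            node.children = [refine(child, rules) for child in parts]
            break
    return node


def to_table(root: Node) -> List[List[str]]:
    """Read rows off the first level of the tree and cells off the second."""
    rows = root.children or [root]
    return [[cell.text for cell in (row.children or [row])] for row in rows]


def consistency(table: List[List[str]]) -> float:
    """Hypothetical quality measure: the share of rows that have the most
    common width, or 0.0 when that width is a single column."""
    widths = [len(row) for row in table]
    if not widths:
        return 0.0
    common = max(set(widths), key=widths.count)
    return widths.count(common) / len(widths) if common > 1 else 0.0


def extract(text: str, delims=(",", ";", "\t")) -> List[List[str]]:
    """Toy re-interpretation: build one candidate interpretation per
    delimiter and keep the table that scores best on the quality measure."""
    candidates = [
        to_table(refine(Node(text), [split_lines, split_on(d)]))
        for d in delims
    ]
    return max(candidates, key=consistency)


if __name__ == "__main__":
    # Semicolon-separated data with comma decimal marks.
    print(extract("name;score\nann;3,5\nbob;4,0"))
```

For the sample input, the comma reading produces ragged rows (`["name;score"]` next to `["ann;3", "5"]`), so it scores lower than the semicolon reading. The sketch therefore prints `[['name', 'score'], ['ann', '3,5'], ['bob', '4,0']]`. Because every technique only reads and extends the shared tree, a new technique (for example, one that recognizes an embedded format inside a cell) could be added as another rule without changing the existing ones.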

Wed 18 Nov
Times are displayed in time zone: Central Time (US & Canada)

15:00 - 16:20: W-5 OOPSLA at SPLASH-I
Chair(s): Mohsen Lesani (University of California at Riverside, USA), Dan Barowy (Williams College)
15:00 - 15:20
Talk
OOPSLA
Thodoris Sotiropoulos (Athens University of Economics and Business), Stefanos Chaliasos (Athens University of Economics and Business), Dimitris Mitropoulos (Athens University of Economics and Business), Diomidis Spinellis (Athens University of Economics and Business)
15:20 - 15:40
Talk
OOPSLA
Azalea Raad (MPI-SWS / Imperial College London), Ori Lahav (Tel Aviv University), Viktor Vafeiadis (MPI-SWS)
15:40 - 16:00
Talk
OOPSLA
Sumit Gulwani (Microsoft), Vu Le (Microsoft), Arjun Radhakrishna (Microsoft), Ivan Radiček (Microsoft), Mohammad Raza (Microsoft)
16:00 - 16:20
Talk
OOPSLA
Fangyi Zhou (Imperial College London), Francisco Ferreira (Imperial College London), Raymond Hu (University of Hertfordshire), Rumyana Neykova (Brunel University London), Nobuko Yoshida (Imperial College London)

Thu 19 Nov

03:00 - 04:20: W-5 OOPSLA at SPLASH-I
Chair(s): Nengkun Yu (University of Technology Sydney), Filip Křikava (Czech Technical University)
03:00 - 03:20
Talk
OOPSLA
Thodoris Sotiropoulos (Athens University of Economics and Business), Stefanos Chaliasos (Athens University of Economics and Business), Dimitris Mitropoulos (Athens University of Economics and Business), Diomidis Spinellis (Athens University of Economics and Business)
03:20 - 03:40
Talk
OOPSLA
Azalea Raad (MPI-SWS / Imperial College London), Ori Lahav (Tel Aviv University), Viktor Vafeiadis (MPI-SWS)
03:40 - 04:00
Talk
OOPSLA
Sumit Gulwani (Microsoft), Vu Le (Microsoft), Arjun Radhakrishna (Microsoft), Ivan Radiček (Microsoft), Mohammad Raza (Microsoft)
04:00 - 04:20
Talk
OOPSLA
Fangyi Zhou (Imperial College London), Francisco Ferreira (Imperial College London), Raymond Hu (University of Hertfordshire), Rumyana Neykova (Brunel University London), Nobuko Yoshida (Imperial College London)