Automating the Art of Data Augmentation

Part I: Overview

Data augmentation is a de facto standard technique, used in nearly every state-of-the-art model for applications such as image and text classification. Heuristic augmentation schemes are often tuned manually by human experts with extensive domain knowledge, and may result in suboptimal augmentation policies. In this blog post, we provide a... [Read More]

Into the Wild: Machine Learning In Non-Euclidean Spaces

Is our comfortable, familiar Euclidean space, with its linear structure, always the right setting for machine learning? Recent research argues otherwise: as a wave of exciting work demonstrates, Euclidean structure is not always necessary and can sometimes be harmful. Starting with the notion of hyperbolic representations for hierarchical data two years... [Read More]

Powerful Abstractions for Programming Your Training Data

Using standard models (e.g., pretrained BERT) and minimal tuning, we leverage key abstractions for programmatically building and managing training data to achieve a state-of-the-art result on SuperGLUE, a newly curated benchmark with six tasks for evaluating “general-purpose language understanding technologies.” We also give updates on Snorkel’s use in the real... [Read More]

Butterflies Are All You Need: A Universal Building Block for Structured Linear Maps

We use a type of structured matrix known as a butterfly matrix to learn fast algorithms for discrete linear transforms such as the Discrete Fourier Transform. We further introduce a hierarchy of matrix families, built by composing butterfly matrices, that can efficiently represent any structured matrix (any matrix... [Read More]