How Foundation Models Changed Our Work

Chris Ré

Our group got excited about foundation models (FMs), and I was prodded to write up how FMs have changed our research outlook. We work both on the foundations of these new models and on how they are changing workflows, primarily around data.

Foundations of Foundation Models

We’ve been examining several lines of work to understand foundation models more deeply, so that we can improve them or extend them to new domains.

  • Performance. A key challenge is runtime performance. Motivated by this and led by Tri Dao, we developed FlashAttention, an IO-aware view of attention inspired by classical database ideas (a sketch of the tiling idea follows this list). FlashAttention greatly speeds up Transformer models, the de facto building block in FMs ranging from GPT-3 to DALL-E to Stable Diffusion. This package is now widely used as the fastest implementation of attention in Transformers, including in the MLPerf submissions from Microsoft, NVIDIA, and more.
    • Stay tuned for upcoming extensions and improvements from Dan Fu.
  • Long Range Correlations. The ability to handle long sequences could dramatically improve today’s models’ ability to take in more complex instructions and more demonstrations, and allow us to perform fundamentally new tasks. The leading benchmark in the area is the Long Range Arena benchmark from Google.
    • Using FlashAttention, we set a new best result for Transformer models on this benchmark, beating random chance for the first time!
    • Led by Albert Gu, we also developed a new state-space model called S4, which showed that recurrences and CNNs can be viewed in one framework (a sketch of this view also follows this list). This model set new state-of-the-art quality on Long Range Arena, solving the Path-X problem for the first time, all while being only linear in sequence length (Transformers are quadratic).
    • We continue to improve both lines of work with new ideas. Stay tuned!
  • Networks. Modern accelerators can obtain 50-80% efficiency on popular machine learning workloads, but scaling up to many machines often requires very expensive network connections. Building on a range of ideas about relaxing consistency from ancient history (Hogwild!), Ce Zhang and his team are leading the exploration of decentralized training. Here, we showed that we could fine-tune foundation models over slow networks, even across continents. More is coming very soon!
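
Below is a minimal sketch, in plain PyTorch, of the tiling and online-softmax idea behind IO-aware attention: keys and values are processed in blocks while running softmax statistics are maintained, so the full N x N score matrix is never materialized. It illustrates the idea only; it is not the FlashAttention CUDA kernel, and the block size and shapes are illustrative.

```python
import torch

def tiled_attention(q, k, v, block_size=128):
    """Single-head attention computed block-by-block over keys/values."""
    n, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)                      # unnormalized output accumulator
    row_max = torch.full((n, 1), float("-inf"))    # running max of scores per query
    row_sum = torch.zeros(n, 1)                    # running softmax denominator
    for start in range(0, n, block_size):
        kb = k[start:start + block_size]           # key block
        vb = v[start:start + block_size]           # value block
        s = (q @ kb.T) * scale                     # scores for this block only
        new_max = torch.maximum(row_max, s.max(dim=-1, keepdim=True).values)
        p = torch.exp(s - new_max)                 # unnormalized block probabilities
        correction = torch.exp(row_max - new_max)  # rescale earlier partial results
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum                           # normalize once at the end

# Agrees with ordinary softmax attention up to numerical error.
q, k, v = (torch.randn(512, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4)
```

The real kernel fuses these steps on-chip so the blocks stay in fast SRAM; the point here is only that exact attention can be computed without ever storing the full attention matrix.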
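
And here is a minimal sketch of the recurrence-versus-convolution view behind S4, using a random (not HiPPO-initialized) state matrix: the same discrete linear state-space model can be run step by step like an RNN or, equivalently, as a single causal convolution with a precomputed kernel.

```python
import numpy as np

def ssm_recurrent(A, B, C, u):
    """y_k = C x_k with x_k = A x_{k-1} + B u_k, computed sequentially (RNN-style)."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k
        ys.append(C @ x)
    return np.array(ys)

def ssm_convolutional(A, B, C, u):
    """The same map as a causal convolution with kernel K_j = C A^j B (CNN-style)."""
    L = len(u)
    K, A_power = [], np.eye(A.shape[0])
    for _ in range(L):
        K.append(C @ A_power @ B)
        A_power = A @ A_power
    return np.convolve(u, np.array(K))[:L]

rng = np.random.default_rng(0)
N, L = 4, 64
A = 0.3 * rng.normal(size=(N, N))   # small spectrum so the recurrence is stable
B, C = rng.normal(size=N), rng.normal(size=N)
u = rng.normal(size=L)
assert np.allclose(ssm_recurrent(A, B, C, u), ssm_convolutional(A, B, C, u))
```

Much of S4 goes into parameterizing the state matrix and computing this kernel efficiently and stably at long sequence lengths; the sketch only shows why the recurrent and convolutional views coincide.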

We’re also looking into time-series models and observational supervision for those models; these exciting new modalities are led by Michael Zhang and Khaled Saab. Our efforts on the generative side are led by Vishnu, and Megan is leading the charge on understanding multiple encoders and knowledge. We’re also examining how to apply FMs to privacy-sensitive data, led by Simran. Here, we’ve looked at how models with exciting in-context learning abilities change the way we think about prior privacy approaches, and at how to get these powerful in-context abilities from smaller LMs in AMA, as well as split QA.

Foundation Models for Data

Foundation models are driven by data, and the recent spate of work on instruction tuning and model alignment shows that we’re just starting to explore how to program these models. A few highlights:

  • Combining with Knowledge. Inspired by our work on Snorkel, we began examining the data needs of these foundation models. Led by Mayee and Fred, Liger used weak supervision to show how to build models faster and more cheaply with minimal labeled data (a sketch of the weak-supervision idea follows this list).
  • Data Exploration. Domino looked for underperforming slices, using foundation models to power data exploration. The team has recently turned to auditing models and is building an incredible system for exploring data with FMs, Meerkat, led by Sabri, Karan, and Arjun. This extends Karan’s work on Robustness Gym.
    • Led by Lingjiao and James, we’ve started to collect data sets of model predictions for use in this work!
  • Data Plumbing. Led by Ines, Laurel, and Avanika, we’ve examined how to handle the data cleaning and integration process. They’ve shown that foundation models can set the state of the art on these tasks, without ever being trained to do so and with minimal instruction (a prompting sketch follows this list)! I talked about this in my SIGMOD keynote.
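
Here is a minimal sketch of the weak-supervision idea in the Snorkel/Liger spirit: several noisy, programmatic labeling functions vote on unlabeled examples, and their votes are combined (here by a simple unweighted majority rather than a learned label model) into training labels without hand annotation. The labeling functions and data are illustrative, not from the papers.

```python
from collections import Counter

ABSTAIN = None

# Three noisy labeling functions for spam detection; each may abstain.
def lf_contains_link(text):
    return "spam" if "http://" in text else ABSTAIN

def lf_mentions_prize(text):
    return "spam" if "prize" in text.lower() else ABSTAIN

def lf_short_greeting(text):
    return "ham" if len(text.split()) < 6 and "hi" in text.lower() else ABSTAIN

def majority_label(text, lfs):
    """Combine the non-abstaining votes by unweighted majority."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

lfs = [lf_contains_link, lf_mentions_prize, lf_short_greeting]
unlabeled = [
    "Hi, lunch later today?",
    "You won a PRIZE! Claim it at http://example.com",
]
labels = [majority_label(t, lfs) for t in unlabeled]
print(labels)  # ['ham', 'spam'], produced without any hand annotation
```

Snorkel and Liger replace the unweighted vote with a label model that estimates each labeling function's accuracy, and Liger additionally uses foundation-model embeddings to extend the sources' coverage; the sketch only shows the basic programmatic-labeling workflow.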
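
And here is a minimal sketch of what "minimal instruction" can look like for one such task, entity matching: the foundation model gets a short instruction plus a couple of in-context demonstrations and is asked whether two records refer to the same entity. The call_llm helper is a hypothetical placeholder for whatever model or API you use, and the records and prompt wording are illustrative, not the prompts from the paper.

```python
def make_matching_prompt(record_a: str, record_b: str) -> str:
    """Build a few-shot prompt asking whether two product records match."""
    demonstrations = (
        "Product A: Apple iPhone 13, 128GB, Blue\n"
        "Product B: iPhone 13 128 GB (Blue) by Apple\n"
        "Do A and B refer to the same product? Yes\n\n"
        "Product A: Logitech MX Master 3 mouse\n"
        "Product B: Logitech K380 keyboard\n"
        "Do A and B refer to the same product? No\n\n"
    )
    return (
        demonstrations
        + f"Product A: {record_a}\n"
        + f"Product B: {record_b}\n"
        + "Do A and B refer to the same product?"
    )

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real foundation-model call here."""
    return "Yes"

def records_match(record_a: str, record_b: str) -> bool:
    answer = call_llm(make_matching_prompt(record_a, record_b))
    return answer.strip().lower().startswith("yes")

print(records_match("Sony WH-1000XM4 headphones", "Sony WH1000XM4 wireless headphones"))
```

No model parameters are updated here; the "programming" lives entirely in the prompt, which is what makes these models so attractive as general-purpose data tools.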

We continue to examine how data processing and understanding are changed by these new tools. To understand what really matters, we’re working on exciting applications in law and medicine, led by Neel, Sarah, and Maya. Much more is coming soon!