May 18, 2024
ECLAIR: A Treat for the Enterprise
👩‍💻 Death by 1,000 Clicks
Over 90% of jobs now require digital skills, with workers averaging 3 hrs/day doing repetitive digital tasks tangential to their core responsibilities -- a phenomenon referred to as "death by 1,000 clicks".
While tons of recent attention has focused on developing better personal assistants, automating these tedious enterprise workflows represents a much larger (but much harder!) opportunity.
McKinsey estimates that $4 trillion/year in productivity gains could be realized at places like hospitals, government agencies, and corporations with high administrative burden and knowledge-intensive tasks.
$4 trillion sounds like a lot -- so how do we get there?
⏰ The Promise of Robotic Process Automation (RPA)
Today, enterprises hoping to automate workflows typically purchase Robotic Process Automation (RPA) software from vendors like UiPath or Blue Prism.
In RPA, a bot is hard-coded to follow a set of predefined rules for completing a workflow -- it’s essentially a decision tree built using a low-code editor, as shown in Figure 1.
Narrowly scoped RPA deployments can have ROIs of 30-200% and double the speed of workflows. However, the adoption of RPA has been inhibited by three key failure modes that surfaced in our interviews with technology leaders at a hospital and a B2B enterprise (see our paper for full case studies):
- High set-up costs: It typically takes 12-18 months to go from project kickoff to deployment, and often requires trained specialists to map workflows, write automation scripts, and integrate with IT infrastructure.
- Unreliable execution: Since RPA relies on hard-coded rules, bots cannot adapt to slight variations in input. In our B2B case study, the RPA bot was initially only 60% accurate and took 6 months of improvements to reach 95%.
- Burdensome maintenance: Deployments require continued human oversight to validate the RPA bot’s outputs and fix edge cases. In our B2B case study, the bot required 2 full-time equivalents (FTEs) worth of continued monitoring.
The common cause of these shortcomings is the impossibility of enumerating all possible scenarios that a bot might encounter. Most enterprise knowledge is “tacit” -- i.e. hard to explicitly articulate and almost never written down -- which makes a rule-based system like RPA fundamentally limited.
Examples of such “tacit” knowledge include:
- Billing administrators at Stanford Hospital knowing that lapsed insurance coverage should never be deleted from a patient’s health record, but rather have its end date set to “Jan 1, 1901” for auditing purposes; or
- Customer support agents at a B2B enterprise knowing how far a conversation should go before offering a discount or escalating to a supervisor.
These small bits of knowledge are typically acquired through observation and on-the-job experience rather than written documentation.
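To make the brittleness concrete, here is a minimal Python sketch of a rule-based bot for the lapsed-insurance example above. It is purely illustrative and is not how any RPA vendor's product works: the `Coverage` record, the `rpa_bot` function, and the status strings are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Coverage:
    payer: str
    status: str                      # "active" or "lapsed" in this toy example
    end_date: Optional[date] = None


# Sentinel date from the billing example above (hypothetical constant name).
AUDIT_END_DATE = date(1901, 1, 1)


def rpa_bot(coverage: Coverage) -> str:
    """Hard-coded decision tree: every branch must be written out in advance."""
    if coverage.status == "active":
        return "keep"
    if coverage.status == "lapsed":
        # The tacit rule, made explicit: never delete, set the end date instead.
        coverage.end_date = AUDIT_END_DATE
        return "end_dated"
    # Anything the rule author did not anticipate falls through to an error.
    raise ValueError(f"no rule for status {coverage.status!r}")
```

The bot only handles the lapsed-coverage case because someone wrote that rule down. A slightly different input, such as a status of "Lapsed " with a capital letter and trailing space, matches no branch and fails. Every such variation has to be discovered and encoded by hand.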
Recent research suggests that multimodal foundation models (FMs) such as GPT-4 can sidestep the failure modes of traditional RPA, just as deep learning eclipsed rule-based approaches over the past decade. Multimodal FMs have been shown to automate simple web navigation, desktop, and mobile tasks by leveraging their visual understanding, real-time decision making, and generalized reasoning capabilities.
However, a large gap still exists between these proofs of concept and enterprise-ready solutions -- it’s one thing for GPT-4 to order you a burger on DoorDash, but would you trust it to coordinate your hip replacement surgery?
Thus, we ask:
⚙️ ECLAIR: Enterprise sCaLe AI for woRkflows
We take the first natural step and apply multimodal FMs across all three stages of traditional RPA through a system called ECLAIR (“Enterprise sCaLe AI for woRkflows”).
Critically, we show that it is possible to automate all three stages of the RPA pipeline (from task specification to task auditing) without any human oversight, as shown in Figure 2:
- Demonstrate: ECLAIR uses multimodal FMs to learn from human workflow expertise by watching video demonstrations and reading written documentation. This lowers set-up costs and technical barriers to entry.
- Execute: ECLAIR observes the state of the screen to plan actions by leveraging the reasoning and visual understanding abilities of multimodal FMs in conjunction with the task-specific knowledge learned in the Demonstrate step. This improves robustness over traditional methods that require rigid APIs or hard-coding of rules.
- Validate: ECLAIR uses multimodal FMs to self-monitor and error-correct its actions in real time. This reduces the need for human oversight.
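The Execute and Validate steps above amount to an observe-decide-act loop with a check after each action. The Python sketch below shows that control flow under heavy simplifying assumptions. It is not ECLAIR's actual implementation (see the code repository for that): the `Action` type, `run_workflow`, and the `ToyScreen` stand-in are hypothetical, and the multimodal FM is replaced by a trivial `decide` function that walks through the SOP.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    kind: str                        # e.g. "click" or "done"
    x: Optional[int] = None
    y: Optional[int] = None
    note: str = ""


def run_workflow(
    observe: Callable[[], str],
    decide: Callable[[str, List[str], List[Action]], Action],
    act: Callable[[Action], None],
    check: Callable[[str, str, Action], bool],
    sop: List[str],
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, pick an action, execute it, and validate the result."""
    history: List[Action] = []
    for _ in range(max_steps):
        before = observe()
        action = decide(before, sop, history)
        if action.kind == "done":
            break
        act(action)
        after = observe()
        # Action-level validation: only record actions that visibly took effect.
        # A failed action is not recorded, so decide() plans the same step again.
        if check(before, after, action):
            history.append(action)
    return history


class ToyScreen:
    """Stand-in for a real desktop: each registered click advances one step."""

    def __init__(self, dropped_clicks: int = 0):
        self.step = 0
        self.dropped_clicks = dropped_clicks

    def observe(self) -> str:
        return f"screen-{self.step}"

    def act(self, action: Action) -> None:
        if self.dropped_clicks > 0:
            self.dropped_clicks -= 1     # simulate a click that did not register
            return
        self.step += 1


def decide(screen: str, sop: List[str], history: List[Action]) -> Action:
    """Placeholder for the FM: perform the next SOP step that is not yet done."""
    i = len(history)
    if i >= len(sop):
        return Action("done")
    return Action("click", x=10 * i, y=0, note=sop[i])


def check(before: str, after: str, action: Action) -> bool:
    """Placeholder validator: the action worked if the screen changed."""
    return before != after


sop = ["Open the order form", "Select the order type", "Click submit"]
screen = ToyScreen(dropped_clicks=1)
trace = run_workflow(screen.observe, decide, screen.act, check, sop)
```

In this toy run the first click is silently dropped, the check notices that the screen did not change, and the loop retries the same step, so `trace` still ends with one recorded action per SOP step. A rule-based bot with no validation step would have carried on from the wrong screen.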
🏥 Real-World Healthcare Workflow Demo
Talk is cheap -- show me a demo!
We applied ECLAIR to a real-world enterprise workflow sourced from Stanford Hospital -- placing a patient telesitter order in Epic on behalf of a nurse. See a full recording of ECLAIR completing the workflow here.
(Epic is the most popular electronic health record software used by hospitals, and “placing a patient telesitter order” means that a nurse has requested that a patient be placed under continuous remote visual monitoring.)
- Demonstrate: Using ECLAIR, we record a nurse placing a patient telesitter order. Our script runs in the background and captures a full screen recording and telemetry data such as clicks, keystrokes, and scrolls (Figure 3). ECLAIR then generates screenshots at key frames (i.e. frames of the video in which an action was taken) and writes a standard operating procedure (“SOP”) that details every step of the workflow in natural language (Figure 4).
- Execute: At execution time, ECLAIR takes full control of the computer (Figure 5). Using a manually edited SOP and raw screenshots of the computer screen, ECLAIR decides the next action to take (e.g., “click on the submit button with location x=1241, y=74”). This action is then executed. ECLAIR repeats this process until it determines that the task is complete (Figure 6).
- Validate: Finally, ECLAIR validates that the workflow was successfully completed (Figure 7). It does so by processing the action trace it generated during the Execute step alongside the SOP generated in the Demonstrate step. ECLAIR also provides action-level and workflow-level validation during the Execute step to detect and correct mistakes.
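For the Demonstrate step, one simple way to pick key frames is to map each telemetry event's timestamp to a video frame index. The Python sketch below illustrates that idea and should not be read as ECLAIR's actual code: the `Event` record, the `key_frame_indices` function, and the rule of merging rapid bursts of the same event type (so that typing a word yields one key frame rather than one per keystroke) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    t: float          # seconds since the start of the recording
    kind: str         # "click", "keystroke", or "scroll"


def key_frame_indices(
    events: List[Event], fps: float, merge_window: float = 0.5
) -> List[int]:
    """Return one video frame index per action, merging bursts of the same kind."""
    frames: List[int] = []
    last: Optional[Event] = None
    for ev in sorted(events, key=lambda e: e.t):
        in_burst = (
            last is not None
            and ev.kind == last.kind
            and ev.t - last.t < merge_window
        )
        if not in_burst:
            # A new action starts here: keep the frame at which it occurred.
            frames.append(int(ev.t * fps))
        last = ev
    return frames


events = [
    Event(1.0, "click"),
    Event(2.0, "keystroke"),
    Event(2.25, "keystroke"),
    Event(2.5, "keystroke"),
    Event(4.0, "click"),
]
frames = key_frame_indices(events, fps=10)
```

The screenshots at these frame indices, paired with the matching telemetry events, are the kind of input a multimodal FM could then turn into the natural-language SOP.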
⏭️ To Infinity and Beyond
ECLAIR is a first step towards applying FMs to enterprise workflow automation.
There are significant opportunities for further research, such as better error handling and monitoring to raise task completion rates, as well as a better understanding of how to incorporate human-in-the-loop review for workflows that require human sign-off (e.g. a physician signing a medication order).
We encourage users to apply ECLAIR to their own workflows and use cases -- please see our code repository for more details.
We are currently looking for design partners to help us take ECLAIR to the next level! If you would like to see ECLAIR applied to your use case, please fill out this form to get 1-1 set-up support and customization for deploying ECLAIR: ECLAIR Waitlist Form
📚 Additional Resources
- How do you use ECLAIR? Check out our GitHub here for sample scripts: https://github.com/HazyResearch/eclair-agents
- Want to learn more? Read our paper here: https://arxiv.org/abs/2405.03710
Acknowledgements
This work would not have been possible without amazing collaborators at Stanford Healthcare who selflessly shared their time and expertise with us. In particular, we are extremely grateful for the support of Stanford Healthcare’s TDS (Nikesh Kotecha, Aditya Sharma, Nigam Shah), AI-Cares Initiative (Minh Nguyen), and Nursing Innovation and Informatics (Nerissa Ambers and Darren Batara) for their incredible help with this project!