Overview
Advances in artificial intelligence are opening new ways to uncover patterns in large cancer datasets, but only if the data is structured for both humans and machines. A central method is embedding: representing complex data as numeric vectors that capture key features and relationships, so similar items are positioned near each other in a shared space. Vector databases store and index these representations, enabling efficient search, comparison, and analysis at scale. Because embeddings put diverse data into a common format, they can help integrate lab results, imaging, and genomic data, and may support safer sharing by abstracting sensitive details. In health, where valuable data is also tightly protected, this could be transformative.
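To make the idea of "similar items positioned near each other" concrete, here is a minimal Python sketch of a similarity search over embeddings. Everything in it is an illustrative assumption: the record names, the three-dimensional vectors, and the helper functions `cosine_similarity` and `nearest` are invented for this example and do not come from any workshop dataset or specific vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for illustration. Real embeddings are
# produced by a trained model and typically have hundreds of dimensions.
index = {
    "pathology_report_A": [0.9, 0.1, 0.0],
    "imaging_study_B": [0.8, 0.2, 0.1],
    "genomic_profile_C": [0.0, 0.1, 0.9],
}

def nearest(query, index, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

query = [1.0, 0.0, 0.0]  # hypothetical embedding of a new record
print(nearest(query, index))
```

This sketch compares the query against every stored vector, which is only practical for small collections. Vector databases generally rely on specialized indexes, often approximate ones, so that the same kind of lookup stays fast at scale.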
The Embedding Databases Workshop convenes experts from research, clinical care, technology, and industry to identify a small set of high-impact challenges. The aim is to focus effort on key roadblocks and explore how these emerging tools can responsibly unlock data, ensuring that critical information reaches those who can use it to save lives.
The Challenge
Despite their promise, embedding-based approaches face significant challenges. Many in the cancer data science community are still unfamiliar with these methods, making it difficult to assess where they are most useful. The field is also evolving rapidly, with open questions about whether embeddings can sufficiently protect sensitive data to enable broader sharing. This workshop will address both the development of effective methods for creating and querying cancer data embeddings, including across multimodal data, and the associated privacy concerns.
Event Outcomes
The aim of the initial events (Parts 1 and 2) is to look beyond the current landscape of activity in this space and identify both the best embedding methods for cancer-related data and the most promising applications of such embeddings for advancing clinical cancer research. These two aspects are deeply intertwined. The ultimate goal is to bring new teams together around plausible approaches to some of the identified applications, and to support those teams in pursuing funding opportunities for pilot projects.
Workshop Series
This workshop unfolds in three distinct parts — Town Halls, a Convergence Session, and an Innovation Lab — each building on the last to move from open exploration to bold, actionable research. All parts are fully virtual and highly interactive. Each requires separate registration or application.
Part 1: Town Halls
Open to All | Registration Required
(Complete) Session A: Wednesday, April 8 - 12:30–2:00 PM ET
Session B: Thursday, April 9 - 2:00–3:30 PM ET
The Town Halls are your invitation to join the conversation. Designed for researchers, clinicians, technologists, and any other stakeholders, these sessions surface the most pressing questions and challenges in this space — and set the stage for everything that follows.
We're hosting two identical sessions to maximize access. Attend either one; no need to join both.
(Complete) Register For Session A: April 8, 12:30-2pm ET
👉 Register For Session B: April 9, 2-3:30pm ET
If you aren't able to make it to a Town Hall but still want to share your input, please fill out this form by April 15th.
Part 2: Convergence Session
Application Required | 75 Spots Available | Thursday, April 23 - 10:00 AM–5:00 PM ET | Applications due: Monday, April 13 by 11:59 PM PT
This is where broad thinking sharpens into focus. Drawing on insights from the Town Halls, a curated group of participants will work together to identify 3–5 high-priority scientific challenges where intervention is most needed. We're seeking a diverse mix of expertise — technical, clinical, and everything in between — to ensure the themes that emerge are both rigorous and relevant. One priority challenge will go on to anchor the Innovation Lab.
Part 3: Innovation Lab
5 days virtual, spread across ~3 weeks | Dates & Times TBD | Application Required | 25 Spots Available
The Innovation Lab brings together participants from diverse disciplines to generate bold, high-impact research proposals that address the priority challenges identified during the Convergence Session.
The Innovation Lab creates an immersive, highly collaborative environment that encourages creative, free-thinking engagement beyond everyday professional routines. Participants will work in multidisciplinary teams to develop a shared understanding of the selected challenges, refine specific research questions, and design novel, actionable research concepts.
By the end of the workshop, teams will have developed compelling, forward-looking proposals around the topic area selected during the Convergence Session.
Want to be the first to know when the Innovation Lab topic, dates, and application open?
Who Should Apply
We are looking for participants who bring deep expertise in at least one of several key areas:
- Cancer research and clinical oncology
- Data science and machine learning
- Privacy and data governance
- Health data infrastructure
We particularly value individuals who have experience working across disciplinary boundaries and are comfortable navigating problems that don't yet have clear solutions. Seniority in a given field is welcome, but just as important is a willingness to engage with unfamiliar perspectives and contribute to early-stage, collaborative thinking.
The ideal cohort will bring together a mix of specialists, such as:
- Cancer data scientists
- AI developers
- Embedding specialists
- Other data scientists
- Clinical oncologists
- Drug developers
- Cancer clinician scientists
- Pharma data scientists and other R&D scientists
- Complex systems modelers
- Cancer biologists