Protege
Licensed real-world data for AI builders.
Why people are saving it
Relevant for AI builders navigating data licensing, governance, and access to real-world training sets.
What they're building
Protege connects AI developers with licensed real-world datasets sourced from data holders, creating a governed marketplace for training data.
Foundation model usage
Protege depends on semantic retrieval or memory to turn messy context into usable model input.
NYC footprint
Protege is part of the New York City AI startup scene, with a profile focused on its market category, stage, and product signal.
Funding
Latest funding: Series A1 · $30M · 2026-01-08. Investors: Andreessen Horowitz (lead), Footwork, CRV, Bloomberg Beta.
Platform / OpenAI fit
Strong fit for embeddings, retrieval, extraction, document intelligence, and persistent context layers.
Notes
Protege is tagged "infra signal" based on buyer clarity, repeat workflow signal, public activity, and fit with the AI Atlas map.

