Data & Memory Layer · Series A · Infra signal · 50

Protege

Licensed real-world data for AI builders.

Why people are saving it

Relevant for AI builders navigating data licensing, governance, and access to real-world training sets.

What they're building

Protege connects AI developers with licensed real-world datasets sourced from data holders, creating a governed marketplace for training data.

Foundation model usage

Low intensity · Embeddings & semantic search

Protege depends on semantic retrieval or memory to turn messy context into usable model input.

NYC footprint

Protege is part of New York City's AI startup scene. This profile focuses on its market category, funding stage, and product signal.

Funding

Latest funding: Series A1 · $30M · 2026-01-08. Investors: Andreessen Horowitz (lead); Footwork; CRV; Bloomberg Beta.

Platform / OpenAI fit

Strong fit for embeddings, retrieval, extraction, document intelligence, and persistent context layers.

Notes

Protege is tagged "infra signal" based on buyer clarity, repeat workflow signal, public activity, and fit with the AI Atlas map.

Related NYC startups

Foresight

AI data infrastructure for private markets

Foresight brings unified data and AI workflows to private-market investing.

Data & Memory Layer · Seed · Workflow signal · 74 views

Recent Activity

Seed-stage company serving private-market workflows with data and AI infrastructure.

Aaru

Synthetic populations for market research

Aaru uses simulated populations to predict consumer, market, and political behavior.

Data & Memory Layer · Series A · Workflow signal · 87 views

Recent Activity

Series A company applying AI agents and simulation to research, polling, and market prediction.

Qualitate

AI-native primary intelligence platform that runs structured expert discussions and turns them into research data

Data & Memory Layer · Seed · Infra signal · 69 views

Recent Activity

Added: deepens the data, memory, and research infrastructure layer.