PuppyGraph is the first and only real-time, zero-ETL graph query engine on the market. It lets data teams query existing relational data stores as a unified graph model and can be deployed in under 10 minutes, bypassing the cost, latency, and maintenance hurdles of traditional graph databases. Capable of scaling to petabytes of data and executing complex 10-hop queries in seconds, PuppyGraph supports use cases ranging from enhancing LLMs with knowledge graphs to fraud detection and cybersecurity. It is trusted by industry leaders including Coinbase, AMD, Netskope, Palo Alto Networks, and eBay.

How does PuppyGraph compare to Neo4j?

Unlike Neo4j, which requires you to load and sync data into its proprietary graph store, PuppyGraph runs directly on your data sources—eliminating ETL, reducing TCO, and enabling faster time-to-value. PuppyGraph also integrates natively with Databricks Unity Catalog, Google BigQuery, and AlloyDB.

What are the performance benefits of PuppyGraph?

PuppyGraph delivers multi-hop traversals in seconds over billions of edges. Real customer stories cite 5-hop queries on 1B+ edges in under 3 seconds.

Does PuppyGraph support my cloud data stack?

Yes. PuppyGraph natively integrates with Databricks Unity Catalog, Google BigQuery, AlloyDB, and AWS, keeping a single governed copy of your data.

How does PuppyGraph handle data governance and security?

PuppyGraph leverages your existing catalog and security (Unity Catalog, BigQuery, AlloyDB), so all graph queries respect your current access controls.

Can PuppyGraph power AI and LLM applications (GraphRAG)?

Yes. PuppyGraph enables Graph-based Retrieval Augmented Generation (GraphRAG) directly on your governed data—providing explainable, multi-hop context for LLMs and enterprise AI.

New · A field report from PuppyGraph

Text-to-SQL agents work at 10 tables. They break at 100s.

What 50+ data team leaders told us about scaling agents to 1,000s of tables — and the architecture pattern that actually works in production.


15 sections · 4 min read · Field research from 50+ data teams

Why we're here

Text-to-SQL agents work at 10 tables. They break at 100s.

What we heard

12 months of conversations with 50+ data team leaders.

They shared that more tables lead to significantly higher error rates.

The pattern underneath

Ontology graphs encode rich semantics.

Text-to-SQL agents can only consult them as reference.

Most of the graph's structural power is unused.

What's actually working

A cybersecurity firm built a system across 1,500 tables using PuppyGraph.

Iceberg-native. Zero ETL. Customer-facing.

The field research

50+ conversations about blockers to production text-to-SQL agents

The data team leaders we've spoken with span financial services, security, networking, retail, and semiconductors.

Takeaway: every team hits the same ceiling. We've watched them climb it, then climb it again.

What everyone is trying

The common text-to-SQL architecture

User

Natural language question

LLM

Context engineering

SQL

Generated query

Lakehouse

Execution

Answer

Back to user

At small schema size: this works well.

At enterprise scale: context exceeds prompt limits, joins compound, and semantic ambiguity multiplies.
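To make "joins compound" concrete, here is a rough back-of-the-envelope sketch. It counts unordered table pairs as an upper bound on the join candidates an agent could pick between; the function name and the table counts are illustrative assumptions. Real schemas have far fewer valid joins, which is the point: the agent has to find the few meaningful ones in a much larger space.

```python
from math import comb

def candidate_join_pairs(table_count: int) -> int:
    """Upper bound on two-table join candidates: unordered pairs of tables."""
    return comb(table_count, 2)

# Table counts chosen to mirror the scales mentioned on this page.
for n in (10, 100, 1500):
    print(f"{n:>5} tables -> {candidate_join_pairs(n):>9,} possible table pairs")
```

Going from 10 to 1,500 tables multiplies the table count by 150 but the pair count by roughly 25,000, so the search space grows much faster than the schema itself.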

The journey teams take

Three workaround stages for more tables

Stage 01

Text-to-SQL on the Lakehouse

Start here: LLM writes SQL over Iceberg.

Workarounds teams try

Inject schema into the prompt
Add few-shot examples
Restrict to "safe" tables
Layer in ontology hints

Stage 02

Graph DB + Ontology Overlay

Pivot when SQL failures compound.

Workarounds teams try

GraphDB as ontology reference
Leverage ontology graph to generate better SQL

Stage 03

Prompt Engineering + Fine-Tuning

Double down on the model. Invest in eval infra.

Workarounds teams try

Encode institutional knowledge into prompts
Fine-tune on expert SQL logs
Build regression tests on golden query sets

The diagnosis

Reference knowledge vs. enforced knowledge

SQL assumes the analyst already knows which joins make sense. Replace the analyst with an agent, and that knowledge vanishes — every patch is an attempt to put it back.

Reference knowledge

Documentation. Prompts. Training data.

Lives OUTSIDE the query engine.

Can be ignored by the model
Can be overridden when locally convenient
Used by every SQL-paradigm patch — ontology overlays, semantic layers, fine-tuning

Enforced knowledge

Built into the query layer itself.

Lives INSIDE the query engine.

Cannot be ignored by the model
Wrong joins are structurally impossible to express
The railway only goes where the tracks go

The right architecture

Map vs. railway: reference vs. enforced ontology

A map is reference knowledge: useful, but ignorable. A railway is structural: the agent can only travel where the tracks go.

Map / Reference semantics: SQL + Ontology

Agent figures out its own path
Wrong turns are possible
Errors are silent — wrong queries still run
Slows down as schema grows

Railway / Enforced semantics: PuppyGraph

Agent only travels where track exists
Wrong destinations have no track
Errors tell the agent exactly where to reroute
Optimized for the queries agents actually run
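The railway idea can be sketched in a few lines of Python. This is only an illustration under assumed names: the `ONTOLOGY` table, the labels, and the `follow` helper are hypothetical and are not PuppyGraph's API. A traversal can only proceed along declared edges, and a failed hop reports which edges do exist from that point.

```python
from typing import Dict, List, Tuple

# Hypothetical ontology: (source label, edge type) -> target label.
ONTOLOGY: Dict[Tuple[str, str], str] = {
    ("Student", "ENROLLED_IN"): "Course",
    ("Course", "TAUGHT_BY"): "Teacher",
    ("Teacher", "HAS_SALARY"): "Salary",
}

class NoTrackError(Exception):
    """Raised when a hop uses an edge the ontology does not define."""

def follow(start: str, edge_types: List[str]) -> str:
    """Walk the edge types from a start label; fail at the first missing edge."""
    label = start
    for edge in edge_types:
        target = ONTOLOGY.get((label, edge))
        if target is None:
            # List the edges that do leave this label so the caller can reroute.
            allowed = sorted(e for (src, e) in ONTOLOGY if src == label)
            raise NoTrackError(
                f"No edge '{edge}' from '{label}'. Valid edges: {allowed}"
            )
        label = target
    return label

print(follow("Student", ["ENROLLED_IN", "TAUGHT_BY"]))  # reaches Teacher
try:
    follow("Student", ["HAS_SALARY"])
except NoTrackError as err:
    print(err)
```

The design choice worth noting is that the error carries the valid alternatives, which is what lets an agent correct course instead of guessing again.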

The agent-language fit

Why Cypher fits agent-generated queries

Cypher is concise where SQL is verbose

3-hop joins in SQL: ~15 lines.

Same in Cypher: 1 line.
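As a rough illustration of the size difference, the sketch below runs a 3-hop query in SQL against an in-memory SQLite database and places an equivalent Cypher pattern next to it. The schema (person, account, transfer, merchant, and three link tables) is a made-up example, and the Cypher string is shown for comparison only; it is not executed here. Exact line counts depend on formatting and on how relationships are modeled.

```python
import sqlite3

# Hypothetical schema: three relationships stored as link tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account  (id INTEGER PRIMARY KEY);
CREATE TABLE transfer (id INTEGER PRIMARY KEY);
CREATE TABLE merchant (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE owns     (person_id INTEGER, account_id INTEGER);
CREATE TABLE sent     (account_id INTEGER, transfer_id INTEGER);
CREATE TABLE paid_to  (transfer_id INTEGER, merchant_id INTEGER);
INSERT INTO person   VALUES (1, 'Ada');
INSERT INTO account  VALUES (10);
INSERT INTO transfer VALUES (100);
INSERT INTO merchant VALUES (1000, 'Bookshop');
INSERT INTO owns     VALUES (1, 10);
INSERT INTO sent     VALUES (10, 100);
INSERT INTO paid_to  VALUES (100, 1000);
""")

# Three hops in SQL: every hop costs a link-table join plus an entity join.
THREE_HOP_SQL = """
SELECT p.name, m.name
FROM person p
JOIN owns o     ON o.person_id = p.id
JOIN account a  ON a.id = o.account_id
JOIN sent s     ON s.account_id = a.id
JOIN transfer t ON t.id = s.transfer_id
JOIN paid_to pt ON pt.transfer_id = t.id
JOIN merchant m ON m.id = pt.merchant_id
WHERE p.name = 'Ada'
"""

# The same three hops as one Cypher pattern (not executed in this sketch).
THREE_HOP_CYPHER = (
    "MATCH (p:Person {name: 'Ada'})-[:OWNS]->(:Account)-[:SENT]->(:Transfer)"
    "-[:PAID_TO]->(m:Merchant) RETURN p.name, m.name"
)

rows = conn.execute(THREE_HOP_SQL).fetchall()
sql_lines = len(THREE_HOP_SQL.strip().splitlines())
print(rows, f"SQL lines: {sql_lines}")
print(THREE_HOP_CYPHER)
```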

LLMs are trained on over a decade of Cypher

Not a new language for the LLM.

It's a well-represented one.

Generated by the model, not authored by users

Agents write Cypher.

Humans ask in English.

For agent-generated queries on graph-shaped data, Cypher is the lower-friction path. Humans don't need to learn it — they don't write it.

The scary failure mode

One agent self-corrects. The other proceeds with bad data.

An agent attempts to retrieve student grades alongside teacher salary data, a join that violates business meaning. Production deployment becomes realistic only when agents recover autonomously instead of returning silent wrong answers.

SQL on Iceberg

SELECT s.name, s.grade, sal.salary
FROM students s
JOIN salaries sal ON s.id = sal.person_id;

Error returned

No error message is returned. Students suddenly have salaries they should not have.

What the agent learns

Nothing went wrong.

The query returns a plausible, non-empty result. The agent then reasons over incorrect data.
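The silent failure can be reproduced with a small, self-contained sketch. It uses SQLite in place of an Iceberg engine, and the sample rows are invented for illustration. The only assumption is that `salaries.person_id` refers to teachers whose ids happen to overlap with student ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# salaries.person_id is meant to reference teachers, not students;
# nothing in the SQL schema says so, and the id ranges happen to overlap.
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade TEXT);
CREATE TABLE salaries (person_id INTEGER, salary INTEGER);
INSERT INTO students VALUES (1, 'Ana', 'A'), (2, 'Ben', 'B');
INSERT INTO salaries VALUES (1, 52000), (3, 61000);
""")

# The join from the example above: syntactically valid, semantically wrong.
rows = conn.execute("""
SELECT s.name, s.grade, sal.salary
FROM students s
JOIN salaries sal ON s.id = sal.person_id
""").fetchall()

print(rows)  # a student paired with a teacher's salary, and no error raised
```

The engine has no way to object: the columns have compatible types and the ids match, so the query succeeds and the wrong row looks like a legitimate answer.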

PuppyGraph · Cypher on the same Iceberg data

MATCH (s:Student)-[:HAS_SALARY]->(sal:Salary)
RETURN s.name, s.grade, sal.salary

Error returned

No edge 'HAS_SALARY' exists between 'Student' and 'Salary'.

What the agent learns

Salary is semantically out of scope for Student.

Error message sent back to LLM for improved query generation.
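The feedback loop described here might look roughly like the sketch below. Everything in it is a stand-in: `execute` imitates a graph engine that rejects undefined edges, `scripted_model` imitates an LLM that changes its query after reading the error, and the edge set is hypothetical. It is not PuppyGraph's API or any real agent framework.

```python
from typing import Callable, List, Optional, Tuple

Query = Tuple[str, str, str]  # (source label, edge type, target label)

# Hypothetical ontology for illustration only.
EDGES = {("Student", "ENROLLED_IN", "Course"), ("Teacher", "HAS_SALARY", "Salary")}

class SchemaError(Exception):
    """Raised by the stand-in executor when an edge is not in the ontology."""

def execute(query: Query) -> str:
    src, edge, dst = query
    if query not in EDGES:
        raise SchemaError(f"No edge '{edge}' exists between '{src}' and '{dst}'.")
    return f"rows for ({src})-[:{edge}]->({dst})"

def run_with_feedback(
    propose: Callable[[Optional[str]], Query], max_attempts: int = 3
) -> str:
    """Ask for a query, run it, and feed any schema error into the next attempt."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        query = propose(feedback)
        try:
            return execute(query)
        except SchemaError as err:
            feedback = str(err)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {feedback}")

seen_feedback: List[Optional[str]] = []

def scripted_model(feedback: Optional[str]) -> Query:
    """Stand-in for an LLM: a bad first guess, then a corrected query."""
    seen_feedback.append(feedback)
    if feedback is None:
        return ("Student", "HAS_SALARY", "Salary")
    return ("Student", "ENROLLED_IN", "Course")

result = run_with_feedback(scripted_model)
print(result)
print(seen_feedback)
```

The attempt cap is a deliberate choice in this sketch: it turns a model that never converges into a loud failure rather than an endless loop.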

What it takes

What any production-ready agentic system has to meet

Agent harness / enforced ontology

The failure

Wrong joins. Silent wrong answers.

The requirement

Business rules built into the query structure. Wrong queries are structurally impossible to express, not just discouraged.

Process unlimited data

The failure

Enterprise data = massive & spread out.

The requirement

Scale and reach from a single engine:

Federation across data stores
Native distributed sharding & shuffling
Query in both SQL & Graph

Subsecond performance

The failure

Too slow for real-time agents.

The requirement

MPP architecture and vectorized execution. Subsecond response for multi-hop traversals — agents don't wait.

How PuppyGraph fits in

Graph queries directly on your Iceberg lakehouse — wherever it lives

AI Agents · Apps · Notebooks

↓

PuppyGraph

Federated Graph Query Engine · Enforced Ontology · Subsecond Performance

↓

Iceberg lakehouse

S3 Tables · Databricks · Polaris

Object storage

S3 · GCS · Azure · R2 · on-prem

Warehouses & OLTP

Redshift · Snowflake · Postgres · MySQL

Iceberg-native by design. Federate across whatever else your enterprise runs on.

Zero ETL.

Query Iceberg natively. Data never leaves your storage.

Federated, not siloed.

Your lakehouse can be the analytical core without forcing a full migration of operational data.

Proof in production — 01

Agentic IT Ops

Built a “Glean++” agent to cut IT support costs by reducing human-in-the-loop ticket resolution time. Using it as the blueprint for a company-wide AI overhaul.

Unified data sources

ServiceNow · IT tickets

Confluence · KB docs

Jira · bug reports

Slack · conversations

Iceberg tables · telemetry

GitHub · code issues

Example queries

Find all issues related to this component that were resolved in the past 3 months.
Which commits introduced the most recurring incidents?
If this service fails, what dependent systems will be affected?
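The last question is a multi-hop traversal: start at the failed service and follow dependency edges backwards until nothing new is reached. Below is a minimal sketch of that computation in plain Python. The service names and the `DEPENDS_ON` map are invented for illustration, and a graph engine would express the same thing as a variable-length path query rather than a hand-written loop.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical DEPENDS_ON edges: each key depends on the services in its list.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout": ["payments", "inventory"],
    "payments": ["auth"],
    "inventory": ["auth"],
    "reports": ["inventory"],
    "auth": [],
}

def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a service to its dependents.
    dependents: Dict[str, List[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    seen = {failed}
    queue = deque([failed])
    while queue:  # breadth-first search over dependents
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {failed}

print(sorted(affected_by("auth", DEPENDS_ON)))
print(sorted(affected_by("inventory", DEPENDS_ON)))
```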

“This work is a strong example of how we're operationalizing AI and data across the enterprise — building the foundation for more autonomous capabilities ahead.”

Hasmukh Ranjan · CIO @ AMD

PuppyGraph turning point