PuppyGraph is the first and only real-time, zero-ETL graph query engine on the market, empowering data teams to query existing relational data stores as a unified graph model that can be deployed in under 10 minutes, bypassing the cost, latency, and maintenance hurdles of traditional graph databases. Capable of scaling to petabytes of data and executing complex 10-hop queries in seconds, PuppyGraph supports use cases from enhancing LLMs with knowledge graphs to fraud detection, cybersecurity, and more. It is trusted by industry leaders including Coinbase, AMD, Netskope, Palo Alto Networks, eBay, and more.

How does PuppyGraph compare to Neo4j?

Unlike Neo4j, which requires you to load and sync data into its proprietary graph store, PuppyGraph runs directly on your data sources—eliminating ETL, reducing TCO, and enabling faster time-to-value. PuppyGraph also integrates natively with Databricks Unity Catalog, Google BigQuery, and AlloyDB.

What are the performance benefits of PuppyGraph?

PuppyGraph delivers multi-hop traversals in seconds over billions of edges. Real customer stories cite 5-hop queries on 1B+ edges in under 3 seconds.

Does PuppyGraph support my cloud data stack?

Yes. PuppyGraph natively integrates with Databricks Unity Catalog, Google BigQuery, AlloyDB, and AWS, keeping a single governed copy of your data.

How does PuppyGraph handle data governance and security?

PuppyGraph leverages your existing catalog and security (Unity Catalog, BigQuery, AlloyDB), so all graph queries respect your current access controls.

Can PuppyGraph power AI and LLM applications (GraphRAG)?

Yes. PuppyGraph enables Graph-based Retrieval Augmented Generation (GraphRAG) directly on your governed data—providing explainable, multi-hop context for LLMs and enterprise AI.


Apache AGE vs Neo4j: Key Differences and Comparison

Sa Wang

Software Engineer

June 30, 2026

Apache AGE and Neo4j both let you model data as a property graph and query it with Cypher, and the resemblance mostly ends there. Neo4j is a purpose-built native graph database: its storage, indexing, and execution are engineered around traversal, and it runs as its own server with its own operational model. Apache AGE is an extension that adds graph capabilities to PostgreSQL, so the graph lives inside a relational database you may already run, and one engine serves SQL tables, JSON documents, and graphs at once.

The decision between them is rarely about which executes a single Cypher query faster. It is about whether you want a dedicated graph system or graph features folded into an existing PostgreSQL deployment, and that choice pulls in licensing, scaling, ecosystem, and how far a graph workload can grow before the architecture pushes back. This post defines each system, lists what each does well, sets them side by side in a feature table, and works through when each one fits. It closes with a third architecture that sidesteps the storage question entirely, by querying graph data where it already lives.

What is Apache AGE?

Apache AGE (the name stands for A Graph Extension) is a PostgreSQL extension that adds graph database functionality on top of an existing relational database. Rather than standing up a separate graph engine, you install AGE into PostgreSQL and gain the ability to create graphs, store nodes and edges, and traverse them with graph queries, while the same database continues to serve ordinary SQL. The design goal, stated in the project documentation, is a single storage layer that handles both relational and graph data.

AGE is an Apache Software Foundation top-level project. It entered the Apache Incubator in April 2020 and graduated to a top-level project in May 2022, which puts its governance and release process under the ASF rather than any single vendor. Its lineage traces back to Bitnine’s AgensGraph, a multi-model fork of PostgreSQL, and AGE reworks that idea into a standard extension that installs against stock PostgreSQL releases (versions 11 through 18 in the current release). It is licensed under Apache 2.0.

The query side is openCypher, the open specification of the Cypher language. What makes AGE distinct is hybrid querying: because the graph lives inside PostgreSQL, a single statement can mix SQL and Cypher. Cypher is embedded through a cypher() function call that sits in the FROM clause of a SQL query, and the columns it returns are declared explicitly using AGE’s agtype data type:

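-- Once per session: load AGE and put its catalog on the search path.
-- Assumes a graph named 'social' already exists (created with create_graph).
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

SELECT *
FROM cypher('social', $$
    MATCH (a:Person)-[:KNOWS]->(b:Person)
    WHERE a.city = 'Berlin'
    RETURN a.name, b.name
$$) AS (a_name agtype, b_name agtype);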

Under the hood, graph entities are stored as agtype, a JSONB-based type (a superset of JSON) that represents scalars, lists, maps, and graph elements like vertices, edges, and paths. Because the storage is ultimately PostgreSQL’s relational model, AGE inherits the database underneath it, and the project’s FAQ is candid that the relational model’s limitations come along too: queries that translate into a large number of table joins, which is what deep multi-hop traversals become, carry the cost of those joins.
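To make the join cost concrete, here is a toy model, not AGE's actual execution path: an edge table is a plain list of (start_id, end_id) rows, and each hop of a traversal is one join of the current frontier against that table. The function name and data are hypothetical. A real PostgreSQL planner can swap the full scan for a B-tree index lookup per frontier vertex, which helps, but each hop is still a join the relational planner must plan and execute.

```python
def k_hop_via_joins(edge_rows, start, hops):
    """Expand `hops` levels out from `start`, one relational join per hop.

    `edge_rows` plays the role of an edge table of (start_id, end_id) rows.
    Each hop probes every edge row against the current frontier (a hash
    set), so the work per hop scales with the size of the edge table.
    """
    frontier = {start}
    rows_examined = 0
    for _ in range(hops):
        next_frontier = set()
        for src, dst in edge_rows:
            rows_examined += 1
            if src in frontier:  # join condition: edge.start_id = frontier.id
                next_frontier.add(dst)
        frontier = next_frontier
    return frontier, rows_examined


edges = [(1, 2), (2, 3), (3, 4), (10, 11), (11, 12)]
reached, work = k_hop_via_joins(edges, start=1, hops=3)
print(reached, work)  # {4} 15
```

Adding rows that the query never reaches still raises `rows_examined`, which is the behavior the FAQ warns about as traversals get deeper and tables get larger.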

Key features

Graph inside PostgreSQL. The defining feature is the absence of a second system. Graph, relational, and JSON document data share one database, one connection, one transaction boundary, and one backup. For a team already running PostgreSQL, adding a graph workload does not mean operating new infrastructure.

openCypher with hybrid SQL. AGE speaks openCypher, so graph practitioners write familiar MATCH patterns, and it lets SQL and Cypher combine in one statement. The interop has documented limits: you cannot write SQL directly inside a Cypher block, and only void or scalar user-defined functions can be called from within Cypher (set-returning functions are not supported), but Cypher can be wrapped inside a SQL CTE or subquery freely.
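Because Cypher can sit inside a CTE, application code often assembles these hybrid statements programmatically. The sketch below is a hypothetical helper, not part of AGE or its driver, that wraps a cypher() call in a CTE named `graph_rows` and declares every output column as agtype. It assumes the graph name is trusted (it is interpolated, not parameterized) and rejects Cypher text containing `$$`, since that text travels inside dollar quoting.

```python
def cypher_cte_query(graph, cypher_text, columns, outer_sql):
    """Wrap an Apache AGE cypher() call in a CTE named graph_rows.

    Hypothetical helper for illustration. `graph` is interpolated directly,
    so it must be a trusted graph name, never user input. Each name in
    `columns` is declared as agtype, and the count must match the number
    of items in the Cypher RETURN clause.
    """
    if "$$" in cypher_text:
        # The Cypher text travels inside $$ ... $$ dollar quoting.
        raise ValueError("Cypher text must not contain '$$'")
    if not columns:
        raise ValueError("declare at least one output column")
    col_defs = ", ".join(f"{name} agtype" for name in columns)
    return (
        "WITH graph_rows AS (\n"
        f"    SELECT * FROM cypher('{graph}', $$ {cypher_text} $$)\n"
        f"    AS ({col_defs})\n"
        ")\n"
        + outer_sql
    )


query = cypher_cte_query(
    "social",
    "MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN b.name",
    ["friend_name"],
    "SELECT count(*) FROM graph_rows;",
)
print(query)
```

In practice the outer SQL would join `graph_rows` against relational tables, casting agtype values as the comparison requires.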

Inherited PostgreSQL maturity. AGE rides on PostgreSQL’s indexing (B-tree, hash, and GIN indexes apply to AGE data), its transactional guarantees (ACID across relational, JSON, and graph), its security and encryption mechanisms, and its broad extension ecosystem. This is decades of operational tooling that AGE does not have to reinvent.

Vendor-neutral licensing and governance. Apache 2.0 with ASF governance is a permissive, single-license story, with no separate community and enterprise tiers to reconcile.

Tooling. The official Python driver is built on psycopg3, a standard PostgreSQL driver, and parses agtype into vertex, edge, and path objects. AGE Viewer is an official web-based tool for visualizing graphs stored in a PostgreSQL and AGE database.
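The official driver handles agtype parsing for you; the sketch below is only a hypothetical stand-in that shows the shape of the text AGE returns. It assumes a vertex prints as a JSON object with `id`, `label`, and `properties` followed by `::vertex`, and an edge adds `start_id` and `end_id` followed by `::edge`. The `Vertex` and `Edge` classes here are illustrative, not the driver's own types, and paths are deliberately left out.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    start_id: int
    end_id: int
    properties: dict = field(default_factory=dict)


# A trailing type annotation such as ::vertex, ::edge, or ::path.
_ANNOTATION = re.compile(r"::(vertex|edge|path)\s*$")


def parse_agtype(text):
    """Parse one agtype value from its text form.

    Toy parser: handles a single vertex, a single edge, or a JSON-compatible
    scalar/list/map. Paths and agtype-only forms (for example ::numeric)
    are deliberately out of scope.
    """
    text = text.strip()
    match = _ANNOTATION.search(text)
    kind = match.group(1) if match else None
    if kind == "path":
        raise NotImplementedError("paths are not handled in this sketch")
    body = json.loads(text[: match.start()] if match else text)
    if kind == "vertex":
        return Vertex(body["id"], body["label"], body.get("properties", {}))
    if kind == "edge":
        return Edge(body["id"], body["label"], body["start_id"],
                    body["end_id"], body.get("properties", {}))
    return body
```

For example, `parse_agtype('{"id": 844424930131969, "label": "Person", "properties": {"name": "Alice"}}::vertex')` yields a `Vertex` whose `properties["name"]` is `"Alice"`.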

Two honest limits round out the picture. AGE does not ship a graph data science library, so algorithms like PageRank or community detection are not bundled the way they are in some graph platforms. And horizontal sharding through Citus, the PostgreSQL extension teams usually reach for to shard a database across nodes, is not yet supported, because Citus does not distribute the inherited tables AGE relies on; scale-out follows PostgreSQL’s own path rather than a graph-native one.


What is Neo4j?

Neo4j is a native graph database. The word native is load-bearing: rather than mapping graphs onto another storage model, Neo4j stores and accesses data as a property graph directly, with nodes and relationships as first-class structures that each carry typed properties. Its central architectural property is index-free adjacency, meaning each node holds direct references to its neighbors, so traversing a relationship is a pointer hop rather than an index lookup. The practical consequence is that traversal cost depends on how much of the graph a query touches, not on the total size of the graph, which is the opposite of the join-cost behavior a relational store exhibits as data grows.
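A toy in-memory model helps show what index-free adjacency buys. This is not Neo4j's on-disk record format; it only assumes that each node holds direct references to its neighbors, so a hop is following a reference. The class and function names are hypothetical. Counting pointer hops shows that a large, disconnected part of the graph adds no work to a traversal that never reaches it.

```python
class Node:
    """A node holding direct references to its outgoing neighbors."""

    def __init__(self, name):
        self.name = name
        self.out = []  # following an entry is a pointer hop, no index lookup

    def connect(self, other):
        self.out.append(other)


def k_hop_pointer_chasing(start, hops):
    """Return the nodes reached after `hops` steps and the pointer hops taken."""
    frontier = {start}
    pointer_hops = 0
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for neighbor in node.out:
                pointer_hops += 1
                next_frontier.add(neighbor)
        frontier = next_frontier
    return frontier, pointer_hops


chain = [Node(i) for i in (1, 2, 3, 4)]
for a, b in zip(chain, chain[1:]):
    a.connect(b)
unrelated = [Node(i) for i in range(100, 1100)]  # a large, disconnected part
for a, b in zip(unrelated, unrelated[1:]):
    a.connect(b)

reached, work = k_hop_pointer_chasing(chain[0], 3)
print([n.name for n in reached], work)  # [4] 3
```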

Neo4j created Cypher, the property graph query language, and later opened it as openCypher. Cypher is now evolving toward GQL, the graph query language published as the ISO/IEC 39075 standard in 2024, a standardization effort Neo4j participated in from the start. On disk, Neo4j keeps data in native graph formats and offers several of them, each tuned for different scale and feature needs.

Neo4j ships in editions. The Community Edition is fully functional for a single instance and is open source under GPLv3. The Enterprise Edition is commercially licensed and adds the capabilities that production deployments at scale tend to need: clustering, online backup and restore, role-based access control, LDAP integration, multiple and composite databases, and a parallel Cypher runtime. AuraDB is Neo4j’s fully managed cloud service for teams that prefer not to operate the database themselves. Versioning moved to a calendar scheme in 2025 (releases now carry YYYY.MM numbers), alongside a versioned Cypher language line.

Key features

Native graph storage and traversal. Index-free adjacency and storage engineered for graphs are what make Neo4j fast at the workloads graph databases exist for: deep traversals, variable-length paths, and pattern matching across many hops.

A mature query and standards story. Cypher is widely known, well documented, and now aligning with the GQL ISO standard, which lowers the long-term risk of betting on a single-vendor language.

Graph Data Science library. Neo4j’s GDS library provides graph algorithms as callable procedures, spanning centrality, community detection, similarity, pathfinding, node embeddings, and link prediction, plus a Pregel API for custom algorithms. For analytical and machine-learning work on graphs, this is a substantial built-in toolkit.
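As a sense of what such a library packages up, here is textbook PageRank by power iteration in plain Python. It is not the GDS API or its implementation; GDS exposes PageRank as a callable procedure with its own configuration. The sketch assumes a small graph given as a dict of out-neighbor lists in which every target is also a key, and it spreads the rank of dangling nodes evenly.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of out-neighbor lists.

    Dangling nodes (no out-edges) spread their rank evenly over all nodes,
    so the scores keep summing to 1.
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = adjacency[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["c"]}
scores = pagerank(graph)
```

Here `d` has no incoming links, so it ends up with only the teleport share, `(1 - 0.85) / 4`, the lowest score in the graph.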

Procedure and integration ecosystem. APOC (Awesome Procedures On Cypher) extends Cypher with hundreds of utility procedures, and the broader ecosystem includes connectors and integrations built up over more than a decade.

Clustering and scale (Enterprise). Autonomous clustering, introduced in Neo4j 5, handles automated database placement and horizontal read scaling, and composite databases (which supersede the earlier Fabric feature) support sharded and federated queries across databases.

Access and visualization. Neo4j is reached over the Bolt protocol, with official drivers for .NET, Go, Java, JavaScript, and Python, and ships visualization through Neo4j Browser and Neo4j Bloom. Recent releases added native vector indexes for similarity search, bringing embedding-based retrieval into the same database as the graph.
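The query a vector index answers is "which stored embeddings are most similar to this one." The brute-force sketch below shows those semantics with cosine similarity over a hypothetical dict of embeddings. It is exact and scans everything; a vector index exists to answer the same question much faster, typically approximately, and this is not Neo4j's implementation.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(query, embeddings, k=2):
    """Exact top-k names by cosine similarity over a dict of name -> vector."""
    scored = ((cosine(query, vec), name) for name, vec in embeddings.items())
    return [name for _, name in heapq.nlargest(k, scored)]


embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(top_k_similar([1.0, 0.0], embeddings, k=2))  # ['a', 'b']
```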


Apache AGE vs Neo4j: feature comparison

The two systems answer the same question (how do I store and query a property graph) from opposite architectural starting points. The table below sets the differences that tend to drive a decision side by side.

Dimension	Apache AGE	Neo4j
Architecture	Graph capability as a PostgreSQL extension; graph, relational, and JSON share one engine	Purpose-built native graph database, running as its own server
Storage	Graph entities stored as `agtype` (a JSONB-based superset of JSON) on PostgreSQL's relational storage	Native graph storage with index-free adjacency, engineered for traversal
Traversal performance	Multi-hop traversals compile into table joins, so cost climbs with traversal depth	Index-free adjacency: traversal cost tracks the data a query touches, not total graph size
Query language	openCypher, plus hybrid queries that combine SQL and Cypher in one statement	Cypher (which Neo4j created), now aligning toward the GQL ISO standard
Multi-model	Native: SQL tables, JSON documents, and graphs in the same database and transaction	Graph-first; relational or document data typically lives in other systems
Graph algorithms	No bundled data-science library; traversals and paths expressed in Cypher	Graph Data Science library: centrality, community detection, pathfinding, embeddings
Scaling	Inherits PostgreSQL's operations and indexing; no Citus distribution yet	Autonomous clustering and composite databases in the Enterprise Edition
Licensing	Apache 2.0, vendor-neutral ASF governance, single license	Community Edition GPLv3 (single instance); Enterprise commercial; AuraDB managed
Ecosystem and tooling	Standard PostgreSQL drivers and tooling; psycopg3-based Python driver; AGE Viewer	Bolt protocol, official drivers in five languages, APOC, GDS, Browser and Bloom
Where it pushes back	A graph layer on a relational engine, not a system built for graphs	A separate system to run; graph data is a copy distinct from source systems

The table makes the trade-off concrete, and traversal performance sits at the heart of it. AGE optimizes for consolidation: keep one database, gain a graph, and pay for it with traversal performance that is bounded by the relational join cost underneath and a thinner graph-specific ecosystem. Neo4j optimizes for graph depth: a storage engine and algorithm library built specifically for traversal, so performance holds up as queries reach across many hops, paid for by running a dedicated system and maintaining graph data that is separate from wherever your operational or analytical data already lives. Neither is a flaw; each is a coherent answer to a different priority, and the right pick follows from which priority is yours.

When to choose Apache AGE vs Neo4j

Choose Apache AGE when PostgreSQL is already in the picture. If your data already lives in PostgreSQL and the graph is one workload among relational and JSON ones, AGE lets you add graph queries without standing up and operating a second database. The same cypher() call that walks relationships can sit alongside ordinary SQL in the same transaction, and you keep PostgreSQL’s backups, security, and indexing for the whole dataset. That hybrid querying, where one statement mixes Cypher and SQL so a graph traversal can join directly against relational tables, is something a dedicated graph database running as its own system cannot offer.

Choose Apache AGE when operational simplicity and licensing matter. One system is cheaper to run, monitor, and secure than two. For teams that want graph capability without a new piece of infrastructure, and that value a single permissive Apache 2.0 license with vendor-neutral governance, AGE is the lighter-weight path. This fits best when graph traversals are moderate in depth and the graph is not the dominant workload.

Choose Neo4j when the graph is the primary workload. If traversals are deep, patterns are complex, and graph performance is the thing you are optimizing for, a native graph engine is built for exactly that. The further your queries reach across hops, the more the architectural difference tells, because Neo4j’s traversal cost tracks the data touched rather than the joins generated.

Choose Neo4j when you need graph data science and mature tooling. The GDS library, APOC, Bloom and Browser visualization, drivers across five languages, and a large body of documentation and community knowledge are a real head start for analytics, machine learning on graphs, and exploratory work. Enterprise clustering and composite databases address high availability and scale-out when a single instance is not enough.

The decision is less a ranking than a fit to circumstances. AGE is the natural choice when graph is an addition to a PostgreSQL-centric stack; Neo4j is the natural choice when graph is the center of gravity and depth, algorithms, and dedicated tooling justify a system of its own.


Which one is right for you?

If the previous section reads as a menu, the choice collapses to a single question: is the graph the center of gravity, or one capability alongside a relational stack you already run? When traversal depth, algorithms, and graph-native tooling are what you are optimizing for, Neo4j earns its dedicated system; when the graph is an addition to a PostgreSQL-centric stack, AGE adds it without a second database to operate. Most decisions settle on that axis alone.

Both answers, though, share an assumption worth naming: the graph data lives in the graph system. With AGE, that means your data sits in PostgreSQL and is modeled as a graph there. With Neo4j, it means loading data into Neo4j’s native store, separate from the warehouses, lakehouses, and operational databases that may already hold it. That assumption is reasonable, and often it is exactly what you want. But it is not the only possible architecture, and when the data you want to query as a graph already lives across other systems, a different shape can fit better.

A third approach: a graph layer over the data you already have

The premise behind both AGE and Neo4j is that you choose where the graph is stored and then move data into it. PuppyGraph starts from the opposite premise: the data stays where it is, and the graph is a query layer placed over it. It is a distributed graph query engine that maps a graph schema onto existing tables in sources like PostgreSQL, Snowflake, and Apache Iceberg, then runs graph queries against those tables in place, with no ETL and no second copy of the data to keep in sync.
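The idea of mapping a graph schema onto existing tables can be sketched in a few lines. Everything below is hypothetical: the `EdgeMapping` type, the `crm.referrals` table, and its columns are illustrative and are not PuppyGraph's schema format. The point is that an edge label resolves to a table plus source and destination columns, and a hop reads those rows where they already are, without copying them into a graph store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeMapping:
    """Hypothetical mapping of an edge label onto an existing table."""
    table: str
    src_column: str
    dst_column: str


# Hypothetical schema: the KNOWS edge label is backed by an existing table.
SCHEMA = {"KNOWS": EdgeMapping("crm.referrals", "referrer_id", "referee_id")}


def neighbors(schema, tables, edge_label, start_id):
    """One hop over edge_label, read straight from the mapped source rows."""
    m = schema[edge_label]
    rows = tables[m.table]  # stands in for a projection + filter on the source
    return [row[m.dst_column] for row in rows if row[m.src_column] == start_id]


tables = {
    "crm.referrals": [
        {"referrer_id": 1, "referee_id": 2},
        {"referrer_id": 1, "referee_id": 3},
        {"referrer_id": 2, "referee_id": 4},
    ]
}
print(neighbors(SCHEMA, tables, "KNOWS", 1))  # [2, 3]
```

Vertex labels would map onto tables and ID columns in the same way.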

PuppyGraph is itself a graph query engine, not a SQL-translation layer. It compiles a graph query into a plan of node and edge operators that run inside its own distributed engine, rather than translating the query into one large SQL statement for the source’s relational planner to execute. When it reads from a SQL store, it issues only simple projection and filter queries; the multi-hop traversal work happens inside PuppyGraph. The engine is built for the shape of graph analytics: execution is columnar and vectorized, so a traversal touches only the attributes it needs; predicate pushdown and min/max statistics cut how much data is scanned in the first place; and execution is auto-sharded across executor nodes, so a workload scales horizontally by adding nodes. Because the query is represented as graph operators rather than relational ones, the engine optimizes for traversal and pattern matching directly instead of inheriting a relational planner’s choices. That is where its traversal performance comes from, while the data itself stays in the systems that already own it.
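Min/max statistics work the way columnar formats such as Parquet use them: each chunk of a column records its smallest and largest value, and a scan skips any chunk whose range proves no row can satisfy the predicate. The sketch below shows that generic technique for a `> threshold` filter; it is not PuppyGraph's code, and the chunk layout is an assumption. A `<` predicate would consult `min_value` instead.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """A chunk of one column plus min/max statistics, as columnar formats keep."""
    values: list
    min_value: int
    max_value: int


def make_chunk(values):
    return ColumnChunk(values, min(values), max(values))


def filter_greater_than(chunks, threshold):
    """Return values > threshold, skipping chunks whose statistics rule them out."""
    matches, chunks_read = [], 0
    for chunk in chunks:
        if chunk.max_value <= threshold:
            continue  # max says no value here can pass the predicate
        chunks_read += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, chunks_read


chunks = [make_chunk([1, 5, 9]), make_chunk([10, 12, 15]), make_chunk([20, 25, 30])]
print(filter_greater_than(chunks, 12))  # ([15, 20, 25, 30], 2)
```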

PuppyGraph speaks openCypher over the Bolt protocol, so for anyone coming from Neo4j it drops into the same drivers, applications, and BI tools, and query skills carry over from both AGE and Neo4j since the syntax is the same Cypher family. Gremlin is supported as well, for teams that prefer it. What it does not ask for is a separate graph store: the data is never loaded or copied, and the sources it maps a graph over are not limited to PostgreSQL. The fit is different rather than strictly better: AGE keeps graph and relational data together inside one PostgreSQL system, Neo4j gives a dedicated native graph engine, and PuppyGraph runs graph queries over data that already lives across your warehouse, lakehouse, and relational stores.


Conclusion

Apache AGE and Neo4j are both credible ways to run property graph queries with Cypher, and they diverge at the architecture. AGE folds graph capability into PostgreSQL, so a single system serves relational, document, and graph data under one permissive license, with traversal performance and graph tooling bounded by what the relational engine and its ecosystem provide. Neo4j is a native graph database with storage and a Graph Data Science library built specifically for traversal, paid for by operating a dedicated system and maintaining graph data separately from your other stores. The right choice follows from how central the graph workload is and how much operational surface you want to take on.

Both, however, assume the graph data has to live inside the graph system. When the data you want to traverse already spans warehouses, lakehouses, and operational databases, querying it in place can beat choosing a new home for it.

Try the forever-free PuppyGraph Developer Edition, or book a demo with the team, to see openCypher and Gremlin queries run directly over the warehouse, lakehouse, and relational tables you already have, with no graph-specific ETL.

Sa Wang

Software Engineer

Sa Wang is a Software Engineer with exceptional mathematical ability and strong coding skills. He holds a Bachelor's degree in Computer Science and a Master's degree in Philosophy from Fudan University, where he specialized in Mathematical Logic.
