PuppyGraph is the first and only real-time, zero-ETL graph query engine in the market, empowering data teams to query existing relational data stores as a unified graph model that can be deployed in under 10 minutes, bypassing traditional graph databases' cost, latency, and maintenance hurdles. Capable of scaling with petabytes of data and executing complex 10-hop queries in seconds, PuppyGraph supports use cases from enhancing LLMs with knowledge graphs to fraud detection, cybersecurity, and more. It is trusted by industry leaders including Coinbase, AMD, Netskope, Palo Alto Networks, and eBay.

How does PuppyGraph compare to Neo4j?

Unlike Neo4j, which requires you to load and sync data into its proprietary graph store, PuppyGraph runs directly on your data sources—eliminating ETL, reducing TCO, and enabling faster time-to-value. PuppyGraph also integrates natively with Databricks Unity Catalog, Google BigQuery, and AlloyDB.

What are the performance benefits of PuppyGraph?

PuppyGraph delivers multi-hop traversals in seconds over billions of edges. Real customer stories cite 5-hop queries on 1B+ edges in under 3 seconds.

Does PuppyGraph support my cloud data stack?

Yes. PuppyGraph natively integrates with Databricks Unity Catalog, Google BigQuery, AlloyDB, and AWS, keeping a single governed copy of your data.

How does PuppyGraph handle data governance and security?

PuppyGraph leverages your existing catalog and security (Unity Catalog, BigQuery, AlloyDB), so all graph queries respect your current access controls.

Can PuppyGraph power AI and LLM applications (GraphRAG)?

Yes. PuppyGraph enables Graph-based Retrieval Augmented Generation (GraphRAG) directly on your governed data—providing explainable, multi-hop context for LLMs and enterprise AI.

Nebula Graph vs Neo4j: Key Differences & Comparison

Matt Tanner

Head of Developer Relations

January 16, 2026

Choosing between Nebula Graph and Neo4j comes down to architecture. The two platforms are built differently, and those differences drive performance, operational effort, and the workloads each one handles best. Understanding how each platform will handle your use case is critical to picking the right option.

Nebula Graph and Neo4j represent two very different approaches to graph databases. Neo4j pioneered the native graph database category with its index-free adjacency model, which stores relationships as physical pointers between nodes, enabling predictable, low-latency traversal. Nebula Graph is an open-source distributed graph database that separates compute from storage, built for horizontal scaling across very large, multi-billion-scale graphs.

Performance depends heavily on workload type and scale. Neo4j excels at transactional queries with low hop counts on interconnected data. Use cases include fraud detection, identity resolution, and recommendation engines. Nebula Graph handles write-intensive workloads and graph analytics across large-scale datasets, including knowledge graphs spanning billions of entities, social network analysis, and distributed fraud detection systems.

This comparison examines the core technical differences between these popular graph databases. We'll cover how they store graph data, execute graph queries, scale, and what workloads each handles best. We'll also cover PuppyGraph, which runs graph analytics directly on existing data infrastructure without ETL or separate graph storage.

Get Started with PuppyGraph for FREE

What is Nebula Graph?

Nebula Graph is an open-source distributed graph database that separates compute from storage. It uses a shared-nothing architecture where the query layer (Graph Service), storage layer (Storage Service), and metadata layer (Meta Service) run as independent processes. Nebula Graph partitions data using vertex ID hashing and replicates across nodes using Raft consensus. The architecture can handle very large graphs with tens to hundreds of billions of vertices and edges in large clusters.

The architecture has three core services. The Graph Service (nebula-graphd) is stateless and handles query parsing, optimization, and execution. The Storage Service (nebula-storaged) manages data persistence using RocksDB as the underlying key-value store, with Raft providing distributed consensus. The Meta Service (nebula-metad) stores schema definitions, partition mappings, and cluster configuration. Since the services are independent, you can add more graphd (Nebula’s term for Graph Service nodes) instances for query capacity or more storaged (their term for Storage Service nodes) instances for storage capacity.

Like other property graphs, Nebula Graph stores vertices and edges with typed properties. Because partitioning is keyed on the vertex ID, a vertex and its outgoing edges land in the same partition, which minimizes cross-partition lookups during traversals.
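The placement rule above can be sketched in a few lines of Python. This is a minimal illustration, not Nebula Graph's implementation: the partition count, the function names, and the use of SHA-256 as the hash are all assumptions chosen to keep the example stable and self-contained.

```python
import hashlib

NUM_PARTITIONS = 10  # hypothetical partition count for one graph space


def partition_for(vertex_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a vertex ID to a partition using a stable hash (stand-in hash function)."""
    digest = hashlib.sha256(vertex_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_for_edge(src: str, dst: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Place an outgoing edge by its source vertex so the two are co-located."""
    return partition_for(src, num_partitions)


if __name__ == "__main__":
    for vid in ("alice", "bob", "carol"):
        print(vid, "-> partition", partition_for(vid))
    # The edge alice -> bob is stored in alice's partition, not bob's.
    print("edge alice->bob -> partition", partition_for_edge("alice", "bob"))
```

Expanding one hop from a vertex then touches a single partition. Following the edge to its destination may require a second partition, which is where the cross-partition network hops come from.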

Key Features

For those considering using Nebula, here are some of the key features that differentiate the platform:

nGQL Query Language: Nebula Graph offers nGQL, which has two syntax modes. The native nGQL mode uses imperative commands such as GO, FETCH, and LOOKUP, which feel closer to procedural programming. The openCypher mode provides declarative MATCH patterns in an openCypher-compatible dialect. You can pipe queries together Unix-style with |. Some teams prefer nGQL's explicitness for complex queries, while others prefer Cypher's pattern matching for simpler reads.

Distributed Architecture: Nebula Graph uses a shared-nothing architecture where storage nodes don't share memory or disk. Data is hash-partitioned across storage nodes by vertex ID. Each partition replicates to typically 3 nodes using Raft. This spreads both data and write load across the cluster. The tradeoff is operational complexity. You need to manage partition balancing, handle node additions/removals, and monitor Raft health. Queries that need data from many partitions require network hops between storage nodes.

Storage with RocksDB: Nebula Graph uses RocksDB as its local storage engine on each storage node. RocksDB is a log-structured merge tree (LSM) database optimized for write throughput. Nebula Graph translates graph operations (get neighbors, insert vertex) into RocksDB key-value operations. It uses custom key encoding where a vertex and its outgoing edges live under adjacent keys, so range scans retrieve them together. Each storage node runs its own RocksDB instance with separate write-ahead logs.

Graph Spaces: Nebula Graph lets you create multiple graph spaces (isolated graphs) in one cluster. Each space has its own schema, partition count, and replica factor. Spaces don't share data. They're physically separated at the storage layer. This is useful for multi-tenant deployments where different teams or applications need their own graphs without setting up separate clusters. You can't query across spaces, though. Each query runs against a single space.

Horizontal Scaling: Nebula Graph scales by adding more storage or graph service nodes. To add storage capacity, you add a new storaged node and run a balance command to redistribute partitions. This moves data across the network, so it's not instant. To add query capacity, you add graphd nodes and point clients at them. Since graphd is stateless, this is quick. The architecture supports large graphs, but scaling operations (adding nodes, rebalancing) require manual intervention and monitoring.

Snapshot-Based High Availability: Nebula Graph supports snapshots for backup and disaster recovery. You can take snapshots of the entire cluster or individual spaces. Raft replication handles normal operation, while snapshots provide point-in-time recovery for major failures or corruption.
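The key encoding described under Storage with RocksDB can be illustrated with a toy sorted key-value store. The sketch below is an assumption-laden stand-in: the tuple key layout, the `SortedStore` class, and its methods are hypothetical, and a sorted Python list plays the role of RocksDB's ordered keyspace. It only shows why keeping a vertex and its outgoing edges under adjacent keys lets a single range scan return them together.

```python
from bisect import bisect_left, insort


# Hypothetical key layout: (vertex_id, kind, edge_type, dst_id)
# kind 0 = the vertex itself, kind 1 = an outgoing edge of that vertex.
def vertex_key(vid):
    return (vid, 0, "", "")


def edge_key(src, edge_type, dst):
    return (src, 1, edge_type, dst)


class SortedStore:
    """Toy ordered key-value store standing in for an LSM engine."""

    def __init__(self):
        self._keys = []    # kept in sorted order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, vid):
        """Yield every entry whose key starts with vid, in key order."""
        i = bisect_left(self._keys, (vid,))
        while i < len(self._keys) and self._keys[i][0] == vid:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


if __name__ == "__main__":
    store = SortedStore()
    store.put(edge_key("a", "follows", "c"), {"since": 2021})
    store.put(vertex_key("b"), {"name": "Bo"})
    store.put(vertex_key("a"), {"name": "Ana"})
    store.put(edge_key("a", "follows", "b"), {"since": 2020})
    for key, value in store.scan_prefix("a"):
        print(key, value)
```

Insertion order does not matter here. The sort order of the keys is what groups vertex `a` with its edges.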

What is Neo4j?

Neo4j is a widely adopted graph database that stores relationships as physical pointers between nodes. Instead of using indexes to find connections, each relationship record contains direct references to its source node, target node, and neighboring relationships in a linked list structure. This means traversal cost scales with the number of relationships you explore, not with the total graph size. Neo4j runs on a single unified storage engine and uses Cypher as its graph query language.

In Neo4j's data model, nodes represent entities and can have multiple labels to indicate type. Relationships are directed edges with a single type, connecting a source node to a target node. Both nodes and relationships can have properties as key-value pairs. Unlike relational databases that spread graph data across multiple tables requiring joins, the database stores all of this in specialized store files. These include separate files for nodes, relationships, properties, and labels. Each record type has a fixed size, so Neo4j can calculate the disk location of any record directly from its ID.

Each relationship record stores pointers to its start node, end node, the relationship type, the property record, and the next relationship in the adjacency chain. Neo4j maintains these as doubly-linked lists for forward or backward traversal. When you query for a node's relationships, Neo4j reads the node record to get the first relationship pointer, then follows the linked list. No index lookups happen during traversal.
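To make the record layout concrete, here is a heavily simplified sketch of fixed-size records and a relationship chain. It is not Neo4j's on-disk format: the field choices, the `RecordStore` class, and the singly linked chain (Neo4j keeps doubly linked chains for both endpoints) are assumptions made to keep the example short. What it does show is the two ideas from the text: a record's offset is computed from its ID, and a node's relationships are found by following pointers rather than consulting an index.

```python
import struct

# Simplified, hypothetical record layouts (little-endian 32-bit ints).
NODE = struct.Struct("<i")     # first_rel_id, or -1 if the node has none
REL = struct.Struct("<iiii")   # start_node, end_node, type_id, next_rel_id


class RecordStore:
    def __init__(self):
        self.nodes = bytearray()
        self.rels = bytearray()

    def add_node(self) -> int:
        self.nodes += NODE.pack(-1)
        return len(self.nodes) // NODE.size - 1

    def add_rel(self, start: int, end: int, type_id: int) -> int:
        rel_id = len(self.rels) // REL.size
        # Offset of a record = id * record size; no lookup structure needed.
        (first,) = NODE.unpack_from(self.nodes, start * NODE.size)
        # New relationship becomes the head of the start node's chain.
        self.rels += REL.pack(start, end, type_id, first)
        NODE.pack_into(self.nodes, start * NODE.size, rel_id)
        return rel_id

    def neighbors(self, node_id: int):
        """Walk the node's relationship chain, yielding end nodes."""
        (rel_id,) = NODE.unpack_from(self.nodes, node_id * NODE.size)
        while rel_id != -1:
            _start, end, _type, next_rel = REL.unpack_from(self.rels, rel_id * REL.size)
            yield end
            rel_id = next_rel


if __name__ == "__main__":
    store = RecordStore()
    a, b, c = store.add_node(), store.add_node(), store.add_node()
    store.add_rel(a, b, type_id=0)
    store.add_rel(a, c, type_id=0)
    print(list(store.neighbors(a)))
```

The work done by `neighbors` depends only on how many relationships the node has, not on how many records the store holds overall.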

Key Features

As one of the earliest graph databases on the market, Neo4j pioneered many of the features below. Here are the most important ones:

Cypher Query Language: Cypher uses pattern syntax where ASCII art represents graph patterns. For example, (a)-[:KNOWS]->(b) describes nodes connected by a relationship. Queries are declarative, meaning you describe what you want rather than how to get it. The query planner decides the execution strategy, choosing between index lookups and traversals based on estimated costs. Cypher supports aggregations, path finding, and subqueries.

Index-Free Adjacency: Neo4j's traversal performance comes from direct pointer chasing rather than index lookups. When you traverse from node A to node B, Neo4j reads A's relationship pointer, follows it to the relationship record, and jumps to B. Each hop costs a small, fixed number of record reads. The time to traverse N relationships is O(N), regardless of total graph size. This works well for queries exploring local neighborhoods (1-5 hops) but can get expensive for deep traversals that touch millions of relationships.

ACID Transactions: Neo4j provides ACID transactions with read-committed isolation by default. Writes take locks: when you modify a node or relationship, Neo4j locks it until commit. Reads do not block on those locks and see only committed data, and stronger isolation requires taking explicit locks. If two transactions try to write the same node, the second waits for the first to commit. If transactions end up waiting on each other's locks, Neo4j detects the deadlock and aborts one of them with a deadlock error. This works well for low-contention workloads but can bottleneck under high write concurrency to the same nodes.

Ecosystem and Tooling: Neo4j has been around since 2007. The ecosystem includes official drivers for major programming languages (Python, Java, JavaScript, Go, .NET), graph visualization tools (Neo4j Bloom, Browser), and a desktop IDE. Community tools include APOC (a plugin library with hundreds of procedures), a graph data science library, and connectors for Spark and Kafka. The longer history means more Stack Overflow answers, tutorials, and third-party integrations than other graph databases.

Schema Flexibility: Neo4j doesn't require schema definitions upfront. You can create nodes with any labels and properties without declaring them first. Indexes and constraints are optional. Add them for query performance or data validation. Nodes with the same label can have different property sets. This lets you iterate quickly during development, but it also means you can accidentally create inconsistent data if you're not careful.

Clustering: Neo4j Enterprise Edition uses a Core + Read Replica architecture. The Core cluster runs Raft consensus with 3, 5, or 7 nodes. Odd numbers avoid split-brain. One Core is the leader and handles all writes. Followers replicate the transaction log and can take over if the leader fails. Read Replicas asynchronously poll Core servers for updates and serve read-only queries. Writes don't scale horizontally. All writes funnel through the single leader. Reads scale by adding more replicas. The Community Edition doesn't include clustering and runs on a single instance.
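The preference for 3, 5, or 7 Core members follows from majority-quorum arithmetic, which applies to any Raft group (including Nebula Graph's per-partition replicas). The short sketch below is generic Raft math rather than anything specific to either product, and the function names are illustrative.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft group with the given member count."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        print(f"{n} members: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 members raises the quorum from 2 to 3 without tolerating any additional failures, so the extra even-numbered member adds cost and replication traffic but no resilience.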

Nebula Graph vs Neo4j: Feature Comparison

Putting together the key features discussed for both platforms above, here is a more streamlined view of how they stack up against each other, feature-by-feature.

Feature	Nebula Graph	Neo4j
Architecture	Shared-nothing distributed with separate compute/storage services	Unified native graph with index-free adjacency
Query Language	nGQL (SQL-like) + openCypher-compatible	Cypher (openCypher compatible)
Storage Engine	RocksDB key-value store with graph abstraction layer	Native graph storage with pointer-based relationships
Scalability Model	Horizontal scaling for both reads and writes	Read scaling via replicas, single-primary writes
Write Performance	Distributed writes across partitions	Single-primary write coordination
Consistency Model	Raft-based distributed consistency per partition	ACID transactions with read-committed isolation by default
Schema Flexibility	Strong schema with defined tags and edge types	Schema-optional with flexible properties
High Availability	Raft replication with 3+ replicas per partition	Causal clustering with Core + Read Replica architecture
Deployment Options	Self-hosted, Nebula Graph Cloud (AWS, Azure)	Self-hosted, Neo4j Aura (managed cloud)
Graph Spaces	Multiple isolated graph spaces per cluster	Single graph per database instance
Query Compilation	Execution plan optimization	Cypher query planner with cost-based optimization
Backup/Recovery	Cluster snapshots for point-in-time recovery	Hot backups, incremental backups (Enterprise)
OLTP Performance	Millisecond latency for multi-hop queries at scale	Sub-millisecond to low-millisecond latency for localized queries
OLAP Capabilities	Designed for both OLTP and OLAP workloads	Primarily OLTP with integration to external OLAP tools
License	Apache 2.0 (open source)	GPLv3 (Community), Commercial (Enterprise)
Maturity	Active development since 2018	Mature platform since 2007
Ecosystem	Growing community, Spark/Flink connectors	Large tooling ecosystem, broad integrations
Data Import	Batch import via CSV, Spark Writer, streaming	LOAD CSV, Neo4j ETL tools, APOC procedures

When to Choose Nebula Graph vs Neo4j

Now comes the decision about which platform fits you best. The choice between Nebula Graph and Neo4j depends on your scale requirements, workload characteristics, and operational priorities. At a high level, here is when to choose each one and why:

For massive-scale distributed writes, Nebula Graph is the better fit. Its shared-nothing architecture provides horizontal scalability across nodes. When you need high-velocity writes across billions of vertices and edges (streaming fraud detection, real-time knowledge graph updates, large-scale social network analysis), Nebula Graph handles distributed writes without funneling through a single primary node. The system partitions data and distributes write load across storage nodes.

For transactional workloads with localized queries, Neo4j is the better choice. Its index-free adjacency delivers predictable sub-millisecond to low-millisecond latency for pattern matching within 1-5 hops. Applications like fraud detection, identity management, recommendation engines, and customer behavior analysis get consistent query performance and mature tooling. Flexible schema means rapid iteration without migration overhead.

The architectures of these platforms give them different advantages and disadvantages in terms of performance, query language support, and operational complexity. Longevity and ecosystem are also worth weighing.

Write performance differs. Nebula Graph partitions writes across distributed storage nodes for sustained high throughput. Neo4j concentrates writes on a primary server (or Core cluster), better suited for read-heavy workloads with moderate writes. For continuous high-volume write streams, Nebula Graph scales better.

Ecosystem maturity matters. Neo4j's longer history provides extensive integrations, visualization tools, driver libraries, and community resources. Cypher's wide adoption as an open standard reduces vendor lock-in. Nebula Graph's ecosystem is growing but less mature, with fewer pre-built integrations.

Query language matters. Neo4j's Cypher is intuitive, widely adopted, and has extensive documentation. Nebula Graph's nGQL offers SQL-like familiarity plus openCypher compatibility for migration flexibility.

Operational complexity matters. Neo4j's integrated architecture is simpler to deploy. Nebula Graph's disaggregated services need more expertise in cluster management and multi-service coordination. If operational simplicity beats scalability, Neo4j reduces DevOps burden.

The final question is, why do you need a separate graph database at all? This is where PuppyGraph enters the conversation, with a simpler, more performant approach than the above.

Why Consider PuppyGraph as an Alternative

If you're evaluating graph databases, both Nebula Graph and Neo4j need dedicated infrastructure and continuous data synchronization. You extract data from existing systems, transform it into a graph format, load it into a separate database, and maintain pipelines to keep it current. This ETL complexity creates operational overhead, data duplication, and latency between source updates and graph availability.

PuppyGraph is the first and only real-time, zero-ETL graph query engine in the market, empowering data teams to query existing relational data stores as a unified graph model that can be deployed in under 10 minutes, bypassing traditional graph databases' cost, latency, and maintenance hurdles.

It seamlessly integrates with data lakes like Apache Iceberg, Apache Hudi, and Delta Lake, as well as databases including MySQL, PostgreSQL, and DuckDB, so you can query across multiple sources simultaneously.

Figure: PuppyGraph Supported Data Sources

Figure: Example Architecture with PuppyGraph

Key PuppyGraph capabilities include:

Zero ETL: PuppyGraph runs as a query engine on your existing relational databases and lakes. Skip pipeline builds, reduce fragility, and start querying as a graph in minutes.

No Data Duplication: Query your data in place, eliminating the need to copy large datasets into a separate graph database. This ensures data consistency and leverages existing data access controls.

Real Time Analysis: By querying live source data, analyses reflect the current state of the environment, mitigating the problem of relying on static, potentially outdated graph snapshots. PuppyGraph users report 6-hop queries across billions of edges in less than 3 seconds.

Scalable Performance: PuppyGraph’s distributed compute engine scales with your cluster size. Run petabyte-scale workloads and deep traversals like 10-hop neighbors, and get answers back in seconds. This exceptional query performance is achieved through the use of parallel processing and vectorized evaluation technology.

Best of SQL and Graph: Because PuppyGraph queries your data in place, teams can use their existing SQL engines for tabular workloads and PuppyGraph for relationship-heavy analysis, all on the same source tables. No need to force every use case through a graph database or retrain teams on a new query language.

Lower Total Cost of Ownership: Graph databases make you pay twice — once for pipelines, duplicated storage, and parallel governance, and again for the high-memory hardware needed to make them fast. PuppyGraph removes both costs by querying your lake directly with zero ETL and no second system to maintain. No massive RAM bills, no duplicated ACLs, and no extra infrastructure to secure.

Flexible and Iterative Modeling: Using metadata-driven schemas allows creating multiple graph views from the same underlying data. Models can be iterated upon quickly without rebuilding data pipelines, supporting agile analysis workflows.

Standard Querying and Visualization: Support for standard graph query languages (openCypher, Gremlin) and integrated visualization tools helps analysts explore relationships intuitively and effectively.

Proven at Enterprise Scale: PuppyGraph is already used by half of the top 20 cybersecurity companies, as well as engineering-driven enterprises like AMD and Coinbase. Whether it’s multi-hop security reasoning, asset intelligence, or deep relationship queries across massive datasets, these teams trust PuppyGraph to replace slow ETL pipelines and complex graph stacks with a simpler, faster architecture.
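The general idea behind querying relational tables as a graph, without copying them, can be shown with a small generic sketch. This is not PuppyGraph's schema format, API, or execution engine: the `EDGE_MAPPING` structure, the table names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for an existing data store. It only illustrates how a metadata mapping lets graph-style traversals run against source tables in place.

```python
import sqlite3

# Stand-in for an existing relational store (hypothetical tables and data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT);
    CREATE TABLE transfers (src INTEGER, dst INTEGER, amount REAL);
    INSERT INTO accounts VALUES (1, 'ana'), (2, 'bo'), (3, 'cy');
    INSERT INTO transfers VALUES (1, 2, 50.0), (2, 3, 20.0);
    """
)

# Hypothetical metadata: which table and columns back each edge label.
EDGE_MAPPING = {
    "TRANSFERRED_TO": {"table": "transfers", "src": "src", "dst": "dst"},
}


def neighbors(conn, edge_label, vertex_id):
    """One hop: translate the edge mapping into a query on the source table."""
    m = EDGE_MAPPING[edge_label]
    sql = f'SELECT {m["dst"]} FROM {m["table"]} WHERE {m["src"]} = ?'
    return [row[0] for row in conn.execute(sql, (vertex_id,))]


def k_hop(conn, edge_label, start, hops):
    """Vertices reachable in exactly `hops` steps from `start`."""
    frontier = {start}
    for _ in range(hops):
        frontier = {n for v in frontier for n in neighbors(conn, edge_label, v)}
    return frontier


if __name__ == "__main__":
    print(k_hop(conn, "TRANSFERRED_TO", start=1, hops=2))
```

Changing the graph model here means editing the mapping, not reloading data, which is the property the Flexible and Iterative Modeling point describes.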

Figure: PuppyGraph in-production clients

Figure: What customers and partners are saying about PuppyGraph

As data grows more complex, the most valuable insights often lie in how entities relate. PuppyGraph brings those insights to the surface, whether you’re modeling organizational networks, social introductions, fraud and cybersecurity graphs, or GraphRAG pipelines that trace knowledge provenance.

Figure: Cloud Security Graph Use Case on PuppyGraph UI

Figure: Social Network Use Case on PuppyGraph UI

Figure: eCommerce Use Case on PuppyGraph UI

Figure: Architecture with graph database vs. with PuppyGraph

Deployment is simple: download the free Docker image, connect PuppyGraph to your existing data stores, define graph schemas, and start querying. PuppyGraph can be deployed via Docker, AWS AMI, GCP Marketplace, or within a VPC or data center for full data control.

Conclusion

Nebula Graph and Neo4j represent two distinct approaches to graph database architecture. Nebula Graph provides a shared-nothing distributed architecture built for horizontal scaling and write-intensive workloads. Neo4j delivers index-free adjacency and a mature ecosystem built for transactional queries and localized traversals.

Neo4j fits teams that want transactional consistency, development velocity, and mature tooling for applications with moderate scale and read-heavy workloads. Nebula Graph fits organizations building massive-scale graphs with distributed write requirements and infrastructure teams that can manage distributed systems.

Beyond traditional graph databases, PuppyGraph provides graph intelligence without requiring you to duplicate data or manage specialized infrastructure. This means you can establish graph capabilities directly on your existing databases and data lakes, without ETL or separate graph storage.

To see how it all works, get started with PuppyGraph's forever-free Developer edition. You can also book a demo today to talk with our graph experts.

Matt Tanner

Head of Developer Relations

Matt is a developer at heart with a passion for data, software architecture, and writing technical content. In the past, Matt worked at some of the largest finance and insurance companies in Canada before pivoting to working for fast-growing startups.
