Choosing Between ArangoDB and Dgraph: A Developer’s Guide

In general, graph databases excel in use cases where relationships between entities drive application logic. But differences in graph database capabilities and architectures fundamentally shape the development experience, operational overhead, and system performance. In short, not all graph databases are created equal, and no single system fits every workload.
That said, ArangoDB and Dgraph represent distinct philosophies for querying and storing graph data. ArangoDB unifies multiple data models, including document, key-value, and graph capabilities, under a single engine with AQL. Dgraph was built from the ground up around GraphQL, making GraphQL the primary database interface rather than a bolt-on API layer.
ArangoDB's self-contained cluster simplifies deployment but faces eventual scaling limits. Dgraph's distributed-first design scales horizontally through predicate-based sharding but requires understanding distributed coordination.
The differences between these two technologies run deep. In this article, we will examine both systems across architecture, query models, and operational characteristics, exploring practical trade-offs that matter in production. We'll also examine when PuppyGraph's zero-ETL approach might eliminate the need for data migration (and a separate graph database) entirely.
What is ArangoDB?

ArangoDB is a distributed, multi-model database that unifies document, graph, and key-value data under a single engine and query language. Rather than deploying specialized databases for different access patterns, teams can model interconnected data structures within a single system.
The architectural approach treats generalization as an advantage. One storage engine, one query planner, and one optimizer handle all data models. Developers query documents and traverse relationships using AQL (ArangoDB Query Language) without building ETL pipelines or managing separate engines.
It's important to note that starting with version 3.12 (released in 2024), ArangoDB shifted from Apache 2.0 to BSL 1.1 (Business Source License), restricting specific commercial uses without agreement. The license converts to Apache 2.0 after four years.
Architecture Overview
ArangoDB operates through a shared-nothing cluster with three specialized layers:
- Coordinators process incoming queries, breaking them into execution plans and routing work across shards. These stateless nodes scale horizontally for concurrent connections.
- DB-Servers store data shards and run query operations, with collections automatically distributed based on defined keys.
- Agents implement Raft consensus to coordinate cluster state, handle node failures, and ensure metadata consistency.
This separation enables distributed queries and writes with consistent performance characteristics across replicas.
SmartGraphs and Data Models
Community edition traversals can span shards, creating coordination costs as queries fetch from multiple DB-Servers. For high-performance distributed graphs, Enterprise SmartGraphs co-locate related vertices and edges on the same shards, reducing network hops during queries.
ArangoDB natively supports three data patterns: documents (JSON collections), key-value pairs (direct retrieval), and graphs (vertices and edges as document structures). Developers query across all three using AQL without synchronizing between separate systems.
AQL and Performance
AQL combines SQL-like syntax with graph-specific functions. A simple traversal query could look something like this:
FOR v, e IN 1..3 OUTBOUND "users/alex" GRAPH "social"
  RETURN v.name

In this example, the query returns all vertices reachable from the "users/alex" vertex within one to three hops. The query planner automatically optimizes execution across shards and indexes using rule-based optimization with cost-based pruning.
Under the hood, ArangoDB uses RocksDB for LSM-tree storage, providing efficient operations under high write throughput. Graph performance depends on data locality: within-shard traversals achieve near in-memory speeds, while cross-shard traversals incur coordination overhead. Overall, the system performs best on hybrid workloads that mix transactional updates with graph queries rather than ultra-deep traversals.
Key Features
Some of ArangoDB's most used and talked about features include:
- Multi-Model Flexibility: ArangoDB's unified engine eliminates the need to synchronize data between separate document stores and graph databases. Query documents as collections, traverse relationships as graphs, or perform direct key-value lookups, all within the same transaction.
- AQL Query Language: A declarative language that balances SQL familiarity with graph-specific capabilities. Supports complex joins, aggregations, and multi-hop traversals with consistent syntax.
- Built-in High Availability: Raft-based coordination between Agents and automatic replication between DB-Servers provides predictable failover. Replication is asynchronous by default, with configurable synchronous replication for critical collections.
- Enterprise SmartGraphs: Ensures related graph data resides on the same shard, dramatically improving traversal performance for distributed workloads. Available in the Enterprise Edition.
- Operational Simplicity: Self-contained cluster architecture with fewer moving parts than modular graph systems. The built-in backup service or ArangoGraph cloud platform simplifies management.
- Security and Access Control: Native role-based access control, encrypted communication through TLS, and audit logging for compliance requirements.
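To make the multi-model flexibility above concrete, here is a sketch of an AQL query that filters a document collection and traverses a graph in a single statement. The `users` collection, `social` graph, and attribute names are hypothetical, not part of any built-in schema:

```aql
// Find Berlin-based users in the document collection, then walk
// their connections in the "social" graph (names are illustrative)
FOR u IN users
  FILTER u.city == "Berlin"
  FOR v, e IN 1..2 OUTBOUND u._id GRAPH "social"
    RETURN { user: u.name, connection: v.name }
```

Because the document filter and the traversal run in one query, there is no application-side join between a document store and a graph store.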
What is Dgraph?

Dgraph is a horizontally scalable, distributed, and native graph database designed for real-time queries across connected datasets. Founded by Manish Rai Jain in 2015, Dgraph's defining characteristic is treating GraphQL as a first-class database interface rather than an API adaptation layer.
The architecture prioritizes distributed-first design with GraphQL at its core. Rather than translating GraphQL into another query language, Dgraph interprets GraphQL schemas directly and executes efficient graph operations from GraphQL requests out of the box.
Architecture Overview
Dgraph distributes responsibilities across specialized components:
- Dgraph Zero orchestrates cluster membership, server group assignments, and data placement.
- Dgraph Alpha nodes handle graph data storage and parallel query execution, with predicates sharded across the cluster. When queries touch multiple predicates, Alpha nodes coordinate retrieval and merge operations.
- Ratel offers an optional web interface for visualization and query testing.
Additionally, BadgerDB, a Go-based key-value store also created by Dgraph Labs, provides the storage layer. It is integrated as an embedded library, removing the need to manage external storage infrastructure.
Data sharding follows predicates (properties) rather than vertices, grouping all instances of each relationship type. This benefits query filtering on specific edges. Dgraph automatically redistributes the predicate "tablets" across Alpha groups for load balancing, though individual predicates can't be split, so capacity planning is required for hotspots.
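As a concrete sketch of what a predicate means here, each line in a DQL schema defines one predicate, and each predicate's data forms a tablet that Zero can assign to an Alpha group. The predicate names below are illustrative:

```
name: string @index(term) .
email: string @index(exact) .
friend: [uid] @reverse .
```

With this schema, for example, all `friend` edges in the cluster live together in the `friend` tablet, which is why filtering on a specific edge type is efficient, and why a single very hot predicate can become a capacity-planning concern.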
Raft consensus drives replication, delivering strong consistency. With 2N+1 replicas, up to N node failures won't disrupt operations.
Data Model and Query Languages
Dgraph uses the property graph model, where vertices and edges carry arbitrary key-value attributes. Two query interfaces are available:
GraphQL functions as the core API, integrated into the database engine. Schema definition automatically generates a working API with queries, mutations, and subscriptions, no middleware required. This means that users can issue GraphQL queries like the one below directly from their applications if desired:
query {
  queryUser(filter: { name: { eq: "Alex" } }) {
    name
    email
    friends {
      name
      location
    }
  }
}

Another key piece of Dgraph is DQL (Dgraph Query Language), which handles advanced scenarios that GraphQL can't express, such as recursive patterns, complex aggregations, and analytical queries with custom filters. In practice, teams typically route application APIs through GraphQL while using DQL for analytical processing.
Consistency, Transactions, and Performance
Dgraph provides distributed ACID transactions with configurable consistency levels. Optimistic concurrency control through MVCC enables high read concurrency while maintaining isolation. Transactions span the entire cluster, with linearizable reads ensuring clients see their own writes immediately.
The distributed architecture scales horizontally. Single-hop traversals benefit from predicate-based sharding. Multi-hop queries require cross-shard coordination, typically delivering millisecond-to-sub-second latency for moderate-depth queries in datasets with billions of edges.
Write throughput scales with cluster size as Alpha nodes process mutations in parallel. Raft replication ensures durability but adds latency since a quorum must acknowledge writes. The query planner automatically selects efficient execution strategies based on filter conditions and indexes.
Key Features
Although Dgraph's pace of innovation has slowed in recent years amid company changes, many core features keep users coming back, including:
- Native GraphQL Support: GraphQL is not a layer on top of Dgraph but the primary interface. Define your schema once and get a complete API without writing resolvers or middleware.
- Distributed Architecture: Predicate-based sharding enables horizontal scaling. Add Alpha nodes to distribute more predicates across the cluster and increase capacity.
- Strong Consistency: Raft-based replication provides ACID transactions with configurable consistency levels. Ensure critical operations maintain data integrity even in the face of node failures.
- DQL for Advanced Operations: When GraphQL's declarative model isn't enough, DQL provides procedural control for complex analytical queries and recursive traversals.
- Embedded Storage: BadgerDB is integrated as a library rather than a separate service. This simplifies deployment; there's no separate storage cluster to configure and maintain.
- Security and Access Control: TLS encryption, predicate-level access control lists, and audit logging in enterprise configurations. GraphQL compatibility means existing GraphQL tools and security patterns apply directly.
ArangoDB vs Dgraph: Feature Comparison
To make this comparison easier, here's a quick breakdown of core features and how the two technologies compare:
When to Choose ArangoDB vs Dgraph
From our review above, it's clear that the two databases are quite different. So which one is the best fit for your particular use case?
When to Choose ArangoDB
Choose ArangoDB when applications require multi-model capabilities without managing multiple specialized systems.
ArangoDB's single-engine architecture means documents, key-values, and graphs share one optimizer and transaction layer. This fits applications where relationships matter but don't dominate: analytics dashboards correlating metrics with service dependencies, supply chains managing inventory while tracking logistics networks, or product catalogs combining full-text search with recommendation graphs.
Eliminating data synchronization between separate systems simplifies application logic since everything uses AQL. Operationally, the self-contained cluster architecture is less complex than systems requiring distributed storage and indexing backends, making it suitable for smaller teams prioritizing stability.
Performance-wise, ArangoDB suits mid-scale graphs with hundreds of millions of vertices and relatively localized traversal patterns. As graphs approach billions of edges with extensively distributed queries, more horizontally-focused architectures become advantageous.
Choose ArangoDB if you need:
- Unified querying across documents and graph relationships
- SQL-familiar query syntax with graph extensions
- Self-contained cluster operations
- Mid-scale graphs with localized patterns
- Enterprise SmartGraphs for distributed optimization
When to Choose Dgraph
Choose Dgraph when GraphQL forms your application's core interface and translation layers create friction.
Native GraphQL integration means schema definition directly generates a complete API: no resolvers, no middleware, no synchronization between API and database layers. For teams already building on GraphQL frontends, this architectural alignment accelerates development. Your GraphQL schema becomes your database schema, with queries executing directly against the graph engine.
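For example, a schema sketch like the following (types and fields are hypothetical) is enough for Dgraph to generate queries such as queryUser, plus the corresponding add and update mutations, automatically:

```graphql
type User {
  name: String! @search(by: [term])
  email: String
  friends: [User]
}
```

The @search directive tells Dgraph which indexes to build, which in turn determines the filter operations exposed in the generated API.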
Raft-based distributed architecture provides ACID transaction guarantees critical for applications demanding guaranteed consistency and immediate read-after-write visibility.
As a graph-only system, Dgraph requires complementary solutions for document storage, full-text indexing, or key-value operations. BadgerDB's embedded design simplifies graph workload deployment but doesn't eliminate multi-system architectures when applications need diverse data models.
Dgraph fits GraphQL-first development, where data is naturally modeled as a graph and typical query patterns span moderate depths (1-5 hops) rather than deep analytical traversals.
Choose Dgraph if you:
- Want GraphQL without translation overhead
- Build applications where GraphQL is the standard
- Require strong consistency with distributed ACID
- Focus primarily on graph data versus mixed models
- Work with 1-5 hop query patterns
- Prefer Apache 2.0 open-source licensing
Which One is Right for You?
The choice between these systems depends on architectural requirements rather than feature comparisons.
Select ArangoDB for applications mixing data models, when you need SQL-familiar querying with multi-model support and straightforward cluster operations. The integrated architecture reduces system complexity but encounters horizontal scaling boundaries that enterprise SmartGraphs partially mitigate. Optimal for mid-scale scenarios where graph queries combine with document operations.
Select Dgraph when GraphQL drives your architecture, and consistency guarantees are non-negotiable. Native GraphQL removes impedance mismatches, accelerating GraphQL-centric development. The distributed design scales horizontally but remains graph-focused, necessitating supplementary systems for document or key-value requirements. Ideal for moderate-depth traversals (1-5 hops) in GraphQL-aligned architectures.
Critical factors to weigh include:
- data model scope (unified vs. graph-focused)
- traversal depth (deep analytics vs. moderate patterns)
- team background (SQL vs. GraphQL)
- growth trajectory (integrated vs. distributed)
- operational priorities (simplified vs. specialized)
Why Consider PuppyGraph as an Alternative
While ArangoDB and Dgraph each offer compelling architectures for managing connected data, both ultimately require moving information into their own storage engines and maintaining separate graph infrastructures.
When your analytics depend on deep, multi-hop traversals, both systems can show strain in different ways. Dgraph can be fast for moderate-depth queries, but predicate-based sharding can create hotspot bottlenecks, and deeper traversals often require more cross-group coordination, increasing latency. ArangoDB offers flexibility through its multi-model engine, but its integrated cluster faces eventual scaling limits, and traversal performance depends heavily on locality, so deep traversals that cross shards can degrade query performance.
For teams seeking graph insights without the operational cost, latency, and duplication that come with traditional databases, a different approach is emerging.

PuppyGraph is the first and only real-time, zero-ETL graph query engine in the market, empowering data teams to query existing relational data stores as a unified graph model that can be deployed in under 10 minutes, bypassing traditional graph databases' cost, latency, and maintenance hurdles.
It seamlessly integrates with data lakes like Apache Iceberg, Apache Hudi, and Delta Lake, as well as databases including MySQL, PostgreSQL, and DuckDB, so you can query across multiple sources simultaneously.


Key PuppyGraph capabilities include:
- Zero ETL: PuppyGraph runs as a query engine on your existing relational databases and lakes. Skip pipeline builds, reduce fragility, and start querying as a graph in minutes.
- No Data Duplication: Query your data in place, eliminating the need to copy large datasets into a separate graph database. This ensures data consistency and leverages existing data access controls.
- Real Time Analysis: By querying live source data, analyses reflect the current state of the environment, mitigating the problem of relying on static, potentially outdated graph snapshots. PuppyGraph users report 6-hop queries across billions of edges in less than 3 seconds.
- Scalable Performance: PuppyGraph’s distributed compute engine scales with your cluster size. Run petabyte-scale workloads and deep traversals like 10-hop neighbors, and get answers back in seconds. This exceptional query performance is achieved through the use of parallel processing and vectorized evaluation technology.
- Best of SQL and Graph: Because PuppyGraph queries your data in place, teams can use their existing SQL engines for tabular workloads and PuppyGraph for relationship-heavy analysis, all on the same source tables. No need to force every use case through a graph database or retrain teams on a new query language.
- Lower Total Cost of Ownership: Graph databases make you pay twice — once for pipelines, duplicated storage, and parallel governance, and again for the high-memory hardware needed to make them fast. PuppyGraph removes both costs by querying your lake directly with zero ETL and no second system to maintain. No massive RAM bills, no duplicated ACLs, and no extra infrastructure to secure.
- Flexible and Iterative Modeling: Using metadata driven schemas allows creating multiple graph views from the same underlying data. Models can be iterated upon quickly without rebuilding data pipelines, supporting agile analysis workflows.
- Standard Querying and Visualization: Support for standard graph query languages (openCypher, Gremlin) and integrated visualization tools helps analysts explore relationships intuitively and effectively.
- Proven at Enterprise Scale: PuppyGraph is already used by half of the top 20 cybersecurity companies, as well as engineering-driven enterprises like AMD and Coinbase. Whether it’s multi-hop security reasoning, asset intelligence, or deep relationship queries across massive datasets, these teams trust PuppyGraph to replace slow ETL pipelines and complex graph stacks with a simpler, faster architecture.
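As a sketch of what querying in place can look like, here is an openCypher query over a hypothetical fraud-graph schema mapped from existing account and transfer tables. The labels, relationship type, and properties are illustrative:

```
// Labels and properties assume a hypothetical fraud-graph schema
MATCH (a:Account {flagged: true})-[:TRANSFER*1..4]->(b:Account)
RETURN b.id, count(*) AS paths
ORDER BY paths DESC
LIMIT 10
```

The same underlying tables remain queryable with SQL; the graph view is just another lens over them.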


As data grows more complex, the most valuable insights often lie in how entities relate. PuppyGraph brings those insights to the surface, whether you’re modeling organizational networks, social introductions, fraud and cybersecurity graphs, or GraphRAG pipelines that trace knowledge provenance.

Deployment is simple: download the free Docker image, connect PuppyGraph to your existing data stores, define graph schemas, and start querying. PuppyGraph can be deployed via Docker, AWS AMI, GCP Marketplace, or within a VPC or data center for full data control.
Conclusion
ArangoDB delivers multi-model capabilities with SQL-like querying for teams managing hybrid workloads. Its self-contained architecture simplifies operations for mid-scale deployments. Dgraph provides native GraphQL integration with strong consistency guarantees, ideal for GraphQL-first applications requiring distributed transactions.
Both require migrating data into dedicated systems. PuppyGraph eliminates this overhead by querying your existing data infrastructure directly. No ETL pipelines, no data duplication, no separate graph database to maintain.
For graph analytics on your current data stores, download PuppyGraph's free Developer Edition or book a demo with our team.
Get started with PuppyGraph!
Developer Edition
- Forever free
- Single node
- Designed for proving your ideas
- Available via Docker install
Enterprise Edition
- 30-day free trial with full features
- Everything in developer edition & enterprise features
- Designed for production
- Available via AWS AMI & Docker install


