Choosing Between ArangoDB and Dgraph: A Developer’s Guide

In general, graph databases excel in use cases where relationships between entities drive application logic. But differences in graph database capabilities and architectures fundamentally shape the development experience, operational overhead, and system performance. In short, not all graph databases are created equal, and no single system fits every workload.
That said, ArangoDB and Dgraph represent distinct philosophies for querying and storing graph data. ArangoDB unifies multiple data models, including document, key-value, and graph capabilities, under a single engine with AQL. Dgraph was built from the ground up around GraphQL, making GraphQL the primary database interface rather than a bolt-on API layer.
ArangoDB's self-contained cluster simplifies deployment but faces eventual scaling limits. Dgraph's distributed-first design scales horizontally through predicate-based sharding but requires understanding distributed coordination.
The differences between these two technologies run deep. In this article, we will examine both systems across architecture, query models, and operational characteristics, exploring practical trade-offs that matter in production. We'll also examine when PuppyGraph's zero-ETL approach might eliminate the need for data migration (and a separate graph database) entirely.
What is ArangoDB?

ArangoDB is a distributed, multi-model database that unifies document, graph, and key-value data under a single engine and query language. Rather than deploying specialized databases for different access patterns, teams can model interconnected data structures within a single system.
The architectural approach treats generalization as an advantage. One storage engine, one query planner, and one optimizer handle all data models. Developers query documents and traverse relationships using AQL (ArangoDB Query Language) without building ETL pipelines or managing separate engines.
It's important to note that starting with version 3.12 (released in 2024), ArangoDB shifted from Apache 2.0 to BSL 1.1 (Business Source License), restricting specific commercial uses without agreement. The license converts to Apache 2.0 after four years.
Architecture Overview
ArangoDB operates through a shared-nothing cluster with three specialized layers:
- Coordinators process incoming queries, breaking them into execution plans and routing work across shards. These stateless nodes scale horizontally for concurrent connections.
- DB-Servers store data shards and run query operations, with collections automatically distributed based on defined keys.
- Agents implement Raft consensus to coordinate cluster state, handle node failures, and ensure metadata consistency.
This separation enables distributed queries and writes with consistent performance characteristics across replicas.
SmartGraphs and Data Models
Community edition traversals can span shards, creating coordination costs as queries fetch from multiple DB-Servers. For high-performance distributed graphs, Enterprise SmartGraphs co-locate related vertices and edges on the same shards, reducing network hops during queries.
ArangoDB natively supports three data patterns: documents (JSON collections), key-value pairs (direct retrieval), and graphs (vertices and edges as document structures). Developers query across all three using AQL without synchronizing between separate systems.
AQL and Performance
AQL combines SQL-like syntax with graph-specific functions. A simple traversal query could look something like this:
FOR v, e IN 1..3 OUTBOUND "users/alex" GRAPH "social"
  RETURN v.name

In this example, the query returns all vertices reachable from the "users/alex" vertex within one to three hops. The query planner automatically optimizes execution across shards and indexes using rule-based optimization with cost-based pruning.
Under the hood, ArangoDB uses RocksDB for LSM-tree storage, providing efficient operations under high write throughput. Graph performance depends on data locality: within-shard traversals achieve near in-memory speeds, while cross-shard traversals incur coordination overhead. Overall, the system performs best on hybrid workloads that mix transactional updates with graph queries rather than ultra-deep traversals.
Key Features
Some of ArangoDB's most used and talked about features include:
- Multi-Model Flexibility: ArangoDB's unified engine eliminates the need to synchronize data between separate document stores and graph databases. Query documents as collections, traverse relationships as graphs, or perform direct key-value lookups, all within the same transaction.
- AQL Query Language: A declarative language that balances SQL familiarity with graph-specific capabilities. Supports complex joins, aggregations, and multi-hop traversals with consistent syntax.
- Built-in High Availability: Raft-based coordination between Agents and automatic replication between DB-Servers provides predictable failover. Replication is asynchronous by default, with configurable synchronous replication for critical collections.
- Enterprise SmartGraphs: Ensures related graph data resides on the same shard, dramatically improving traversal performance for distributed workloads. Available in the Enterprise Edition.
- Operational Simplicity: Self-contained cluster architecture with fewer moving parts than modular graph systems. The built-in backup service or ArangoGraph cloud platform simplifies management.
- Security and Access Control: Native role-based access control, encrypted communication through TLS, and audit logging for compliance requirements.
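To make the multi-model flexibility above concrete, here is a sketch of an AQL query that filters a document collection and traverses a graph in a single statement. The `users` collection, `social` graph, and attribute names are hypothetical, not part of any built-in schema:

```aql
// Find Berlin-based users in the document collection, then walk
// their connections in the "social" graph (names are illustrative)
FOR u IN users
  FILTER u.city == "Berlin"
  FOR v, e IN 1..2 OUTBOUND u._id GRAPH "social"
    RETURN { user: u.name, connection: v.name }
```

Because the document filter and the traversal run in one query, there is no application-side join between a document store and a graph store.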
What is Dgraph?

Dgraph is a horizontally scalable, distributed, and native graph database designed for real-time queries across connected datasets. Founded by Manish Rai Jain in 2015, Dgraph's defining characteristic is treating GraphQL as a first-class database interface rather than an API adaptation layer.
The architecture prioritizes distributed-first design with GraphQL at its core. Rather than translating GraphQL into another query language, Dgraph interprets GraphQL schemas directly and executes efficient graph operations from GraphQL requests out of the box.
Architecture Overview
Dgraph distributes responsibilities across specialized components:
- Dgraph Zero orchestrates cluster membership, server group assignments, and data placement.
- Dgraph Alpha nodes handle graph data storage and parallel query execution, with predicates sharded across the cluster. When queries touch multiple predicates, Alpha nodes coordinate retrieval and merge operations.
- Ratel offers an optional web interface for visualization and query testing.
Additionally, BadgerDB, a Go-based key-value store also created by Dgraph Labs, provides the storage layer. It is integrated as an embedded library, removing the need to manage external storage infrastructure.
Data sharding follows predicates (properties) rather than vertices, grouping all instances of each relationship type. This benefits query filtering on specific edges. Dgraph automatically redistributes the predicate "tablets" across Alpha groups for load balancing, though individual predicates can't be split, so capacity planning is required for hotspots.
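As a concrete sketch of what a predicate means here, each line in a DQL schema defines one predicate, and each predicate's data forms a tablet that Zero can assign to an Alpha group. The predicate names below are illustrative:

```
name: string @index(term) .
email: string @index(exact) .
friend: [uid] @reverse .
```

With this schema, for example, all `friend` edges in the cluster live together in the `friend` tablet, which is why filtering on a specific edge type is efficient, and why a single very hot predicate can become a capacity-planning concern.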
Raft consensus drives replication, delivering strong consistency. With 2N+1 replicas, up to N node failures won't disrupt operations.
Data Model and Query Languages
Dgraph uses the property graph model, where vertices and edges carry arbitrary key-value attributes. Two query interfaces are available:
GraphQL functions as the core API, integrated into the database engine. Schema definition automatically generates a working API with queries, mutations, and subscriptions, no middleware required. This means that users can issue GraphQL queries like the one below directly from their applications if desired:
query {
  queryUser(filter: { name: { eq: "Alex" } }) {
    name
    email
    friends {
      name
      location
    }
  }
}

Another key piece of Dgraph is DQL (Dgraph Query Language), which handles advanced scenarios that GraphQL can't express, such as recursive patterns, complex aggregations, and analytical queries with custom filters. In practice, teams typically route application APIs through GraphQL while using DQL for analytical processing.
Consistency, Transactions, and Performance
Dgraph provides distributed ACID transactions with configurable consistency levels. Optimistic concurrency control through MVCC enables high read concurrency while maintaining isolation. Transactions span the entire cluster, with linearizable reads ensuring clients see their own writes immediately.
The distributed architecture scales horizontally. Single-hop traversals benefit from predicate-based sharding. Multi-hop queries require cross-shard coordination, typically delivering millisecond-to-sub-second latency for moderate-depth queries in datasets with billions of edges.
Write throughput scales with cluster size as Alpha nodes process mutations in parallel. Raft replication ensures durability but adds latency since a quorum must acknowledge writes. The query planner automatically selects efficient execution strategies based on filter conditions and indexes.
Key Features
Although Dgraph's pace of innovation has slowed in recent years amid company changes, many core features keep users coming back, including:
- Native GraphQL Support: GraphQL is not a layer on top of Dgraph but the primary interface. Define your schema once and get a complete API without writing resolvers or middleware.
- Distributed Architecture: Predicate-based sharding enables horizontal scaling. Add Alpha nodes to distribute more predicates across the cluster and increase capacity.
- Strong Consistency: Raft-based replication provides ACID transactions with configurable consistency levels. Ensure critical operations maintain data integrity even in the face of node failures.
- DQL for Advanced Operations: When GraphQL's declarative model isn't enough, DQL provides procedural control for complex analytical queries and recursive traversals.
- Embedded Storage: BadgerDB is integrated as a library rather than a separate service. This simplifies deployment; there's no separate storage cluster to configure and maintain.
- Security and Access Control: TLS encryption, predicate-level access control lists, and audit logging in enterprise configurations. GraphQL compatibility means existing GraphQL tools and security patterns apply directly.
ArangoDB vs Dgraph: Feature Comparison
To make this comparison easier, here's a quick breakdown of core features and how the two technologies compare:
When to Choose ArangoDB vs Dgraph
From our review above, it's clear that the two databases are quite different. So which one is the best fit for your particular use case?
When to Choose ArangoDB
Choose ArangoDB when applications require multi-model capabilities without managing multiple specialized systems.
ArangoDB's single-engine architecture means documents, key-values, and graphs share one optimizer and transaction layer. This fits applications where relationships matter but don't dominate: analytics dashboards correlating metrics with service dependencies, supply chains managing inventory while tracking logistics networks, or product catalogs combining full-text search with recommendation graphs.
Eliminating data synchronization between separate systems simplifies application logic since everything uses AQL. Operationally, the self-contained cluster architecture is less complex than systems requiring distributed storage and indexing backends, making it suitable for smaller teams prioritizing stability.
Performance-wise, ArangoDB suits mid-scale graphs with hundreds of millions of vertices and relatively localized traversal patterns. As graphs approach billions of edges with extensively distributed queries, more horizontally-focused architectures become advantageous.
Choose ArangoDB if you need:
- Unified querying across documents and graph relationships
- SQL-familiar query syntax with graph extensions
- Self-contained cluster operations
- Mid-scale graphs with localized patterns
- Enterprise SmartGraphs for distributed optimization
When to Choose Dgraph
Choose Dgraph when GraphQL forms your application's core interface and translation layers create friction.
Native GraphQL integration means schema definition directly generates a complete API: no resolvers, no middleware, no synchronization between API and database layers. For teams already building on GraphQL frontends, this architectural alignment accelerates development. Your GraphQL schema becomes your database schema, with queries executing directly against the graph engine.
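For example, a schema sketch like the following (types and fields are hypothetical) is enough for Dgraph to generate queries such as queryUser, plus the corresponding add and update mutations, automatically:

```graphql
type User {
  name: String! @search(by: [term])
  email: String
  friends: [User]
}
```

The @search directive tells Dgraph which indexes to build, which in turn determines the filter operations exposed in the generated API.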
Raft-based distributed architecture provides ACID transaction guarantees critical for applications demanding guaranteed consistency and immediate read-after-write visibility.
As a graph-only system, Dgraph requires complementary solutions for document storage, full-text indexing, or key-value operations. BadgerDB's embedded design simplifies graph workload deployment but doesn't eliminate multi-system architectures when applications need diverse data models.
Dgraph fits GraphQL-first development, where data is naturally modeled as a graph and typical query patterns span moderate depths (1-5 hops) rather than deep analytical traversals.
Choose Dgraph if you:
- Want GraphQL without translation overhead
- Build applications where GraphQL is the standard
- Require strong consistency with distributed ACID
- Focus primarily on graph data versus mixed models
- Work with 1-5 hop query patterns
- Prefer Apache 2.0 open-source licensing
Which One is Right for You?
The choice between these systems depends on architectural requirements rather than feature comparisons.
Select ArangoDB for applications mixing data models, when you need SQL-familiar querying with multi-model support and straightforward cluster operations. The integrated architecture reduces system complexity but encounters horizontal scaling boundaries that enterprise SmartGraphs partially mitigate. Optimal for mid-scale scenarios where graph queries combine with document operations.
Select Dgraph when GraphQL drives your architecture, and consistency guarantees are non-negotiable. Native GraphQL removes impedance mismatches, accelerating GraphQL-centric development. The distributed design scales horizontally but remains graph-focused, necessitating supplementary systems for document or key-value requirements. Ideal for moderate-depth traversals (1-5 hops) in GraphQL-aligned architectures.
Critical factors to weigh include:
- data model scope (unified vs. graph-focused)
- traversal depth (deep analytics vs. moderate patterns)
- team background (SQL vs. GraphQL)
- growth trajectory (integrated vs. distributed)
- operational priorities (simplified vs. specialized)
Why Consider PuppyGraph as an Alternative
While ArangoDB and Dgraph each offer compelling architectures for managing connected data, both ultimately require moving information into their own storage engines and maintaining separate graph infrastructures.
When your analytics depend on deep, multi-hop traversals, both systems can show strain in different ways. Dgraph can be fast for moderate-depth queries, but predicate-based sharding can create hotspot bottlenecks, and deeper traversals often require more cross-group coordination, increasing latency. ArangoDB offers flexibility through its multi-model engine, but its integrated cluster faces eventual scaling limits, and traversal performance depends heavily on locality, so deep traversals that cross shards can degrade query performance.
For teams seeking graph insights without the operational cost, latency, and duplication that come with traditional databases, a different approach is emerging.

PuppyGraph is the first and only real-time, zero-ETL graph query engine in the market, empowering data teams to query existing relational data stores as a unified graph model that can be deployed in under 10 minutes, bypassing traditional graph databases' cost, latency, and maintenance hurdles.
It seamlessly integrates with data lakes like Apache Iceberg, Apache Hudi, and Delta Lake, as well as databases including MySQL, PostgreSQL, and DuckDB, so you can query across multiple sources simultaneously.


Key PuppyGraph capabilities include:
- Zero ETL: PuppyGraph runs as a query engine on your existing relational databases and lakes. Skip pipeline builds, reduce fragility, and start querying as a graph in minutes.
- No Data Duplication: Query your data in place, eliminating the need to copy large datasets into a separate graph database. This ensures data consistency and leverages existing data access controls.
- Real Time Analysis: By querying live source data, analyses reflect the current state of the environment, mitigating the problem of relying on static, potentially outdated graph snapshots. PuppyGraph users report 6-hop queries across billions of edges in less than 3 seconds.
- Scalable Performance: PuppyGraph’s distributed compute engine scales with your cluster size. Run petabyte-scale workloads and deep traversals like 10-hop neighbors, and get answers back in seconds. This exceptional query performance is achieved through the use of parallel processing and vectorized evaluation technology.
- Best of SQL and Graph: Because PuppyGraph queries your data in place, teams can use their existing SQL engines for tabular workloads and PuppyGraph for relationship-heavy analysis, all on the same source tables. No need to force every use case through a graph database or retrain teams on a new query language.
- Lower Total Cost of Ownership: Graph databases make you pay twice — once for pipelines, duplicated storage, and parallel governance, and again for the high-memory hardware needed to make them fast. PuppyGraph removes both costs by querying your lake directly with zero ETL and no second system to maintain. No massive RAM bills, no duplicated ACLs, and no extra infrastructure to secure.
- Flexible and Iterative Modeling: Using metadata driven schemas allows creating multiple graph views from the same underlying data. Models can be iterated upon quickly without rebuilding data pipelines, supporting agile analysis workflows.
- Standard Querying and Visualization: Support for standard graph query languages (openCypher, Gremlin) and integrated visualization tools helps analysts explore relationships intuitively and effectively.
- Proven at Enterprise Scale: PuppyGraph is already used by half of the top 20 cybersecurity companies, as well as engineering-driven enterprises like AMD and Coinbase. Whether it’s multi-hop security reasoning, asset intelligence, or deep relationship queries across massive datasets, these teams trust PuppyGraph to replace slow ETL pipelines and complex graph stacks with a simpler, faster architecture.
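As a sketch of what querying in place can look like, here is an openCypher query over a hypothetical fraud-graph schema mapped from existing account and transfer tables. The labels, relationship type, and properties are illustrative:

```
// Labels and properties assume a hypothetical fraud-graph schema
MATCH (a:Account {flagged: true})-[:TRANSFER*1..4]->(b:Account)
RETURN b.id, count(*) AS paths
ORDER BY paths DESC
LIMIT 10
```

The same underlying tables remain queryable with SQL; the graph view is just another lens over them.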


As data grows more complex, the most valuable insights often lie in how entities relate. PuppyGraph brings those insights to the surface, whether you’re modeling organizational networks, social introductions, fraud and cybersecurity graphs, or GraphRAG pipelines that trace knowledge provenance.

Deployment is simple: download the free Docker image, connect PuppyGraph to your existing data stores, define graph schemas, and start querying. PuppyGraph can be deployed via Docker, AWS AMI, GCP Marketplace, or within a VPC or data center for full data control.
Conclusion
ArangoDB delivers multi-model capabilities with SQL-like querying for teams managing hybrid workloads. Its self-contained architecture simplifies operations for mid-scale deployments. Dgraph provides native GraphQL integration with strong consistency guarantees, ideal for GraphQL-first applications requiring distributed transactions.
Both require migrating data into dedicated systems. PuppyGraph eliminates this overhead by querying your existing data infrastructure directly. No ETL pipelines, no data duplication, no separate graph database to maintain.
For graph analytics on your current data stores, download PuppyGraph's free Developer Edition or book a demo with our team.
Get started with PuppyGraph!
Developer Edition
- Forever free
- Single node
- Designed for proving your ideas
- Available via Docker install
Enterprise Edition
- 30-day free trial with full features
- Everything in developer edition & enterprise features
- Designed for production
- Available via AWS AMI & Docker install


