Fast Analytics and Connected Insights: Graph Analytics on ClickHouse with PuppyGraph

Sa Wang
Software Engineer
Danfeng Xu
CTO & Co-Founder
November 6, 2025

Modern cybersecurity platforms depend on fast, scalable analytics to process massive streams of logs, alerts, and telemetry in real time. Security teams need to detect anomalies, monitor activity, and investigate incidents across ever-growing cloud environments. The scale and velocity of this data require analytical systems that can ingest, store, and query billions of records efficiently while supporting continuous monitoring and investigation.

Many cybersecurity companies already use ClickHouse for these workloads. Its columnar architecture, high compression ratio, and low-latency query performance make it ideal for handling security event data at scale. ClickHouse enables teams to run real-time dashboards, anomaly detection, and incident response workflows—all from a single analytical engine capable of handling multi-tenant datasets and long-term retention.

At the same time, cybersecurity data naturally forms a graph structure. Accounts connect to identities, identities open sessions, sessions generate events, and events operate on resources. Viewing this data as a graph enables analysts to trace paths, uncover relationships, and identify hidden patterns such as lateral movement or privilege escalation. PuppyGraph builds on top of ClickHouse to provide this graph perspective, enabling direct graph queries and analytics on existing ClickHouse data without requiring ETL processes or a dedicated graph database.

Why ClickHouse for Security Analytics

Security analytics entails processing vast quantities of structured and semi-structured data, including login events, API calls, alerts, audit trails, and network telemetry. These datasets arrive continuously and can grow to petabyte scale within days. Analysts need to run both real-time queries for monitoring and historical queries for investigation, often across multi-tenant environments containing data from many customers. Achieving this level of speed, scale, and flexibility requires a database engine optimized for analytical workloads rather than transactional ones.

ClickHouse, originally designed for large-scale analytical processing, fits this need perfectly. Its column-oriented architecture and vectorized execution engine enable high-throughput ingestion and sub-second query performance on billions of rows. Built for continuous ingest, ClickHouse manages high query concurrency, ensuring that dashboards, alerts, and investigative queries remain fast and responsive, even during intense workloads. Data is stored efficiently through advanced compression and partitioning, allowing security platforms to retain detailed records for long periods while keeping costs predictable. Because ClickHouse scales both horizontally and vertically, it can accommodate rapid data growth without compromising latency or performance. It also integrates smoothly with ingestion pipelines and visualization tools, enabling teams to analyze data directly from streams, message queues, or cloud object storage using standard SQL.

These capabilities have already made ClickHouse a trusted foundation for modern security platforms. Organizations such as Exabeam, Wallarm, and Harvey use ClickHouse to power their real-time detection and investigation pipelines, analyzing billions of daily events at scale. Its combination of performance, scalability, and efficiency allows security teams to continuously monitor threats, reduce response times, and maintain complete visibility across complex, multi-tenant environments.

Adding Graph Analytics to ClickHouse with PuppyGraph

Security analytics data is inherently interconnected. Accounts authenticate identities, identities open sessions, sessions generate events, and events act on resources. When analysts seek relationships among these entities, such as tracing the path from a compromised account to a critical asset, they shift from aggregation to connected analysis. Representing this data as a graph makes these investigations intuitive and efficient.

PuppyGraph provides a lightweight graph modeling layer on top of ClickHouse. Instead of building a separate graph database, users define a logical graph through a simple JSON-based schema. This schema describes how existing ClickHouse tables map to vertices and edges, creating a virtual graph layer on top of relational data. Because the graph is defined in metadata, not storage, teams can easily create multiple graph views over the same dataset to support different analytical needs without ETL or data duplication.

Graph queries can then be executed using openCypher or Gremlin, expressing complex, multi-hop relationships naturally. Because PuppyGraph operates directly on ClickHouse, it inherits ClickHouse’s performance, concurrency, and scalability. The virtual-graph approach is also cost-effective: no additional data copies, no separate cluster to maintain, and no latency from synchronization. Analysts can move seamlessly between event-level queries and graph-level exploration, uncovering how users, sessions, and resources interact in context.

Together, ClickHouse and PuppyGraph deliver a unified analytics environment where high-performance event processing and relationship-driven investigation coexist within a single, efficient stack.

How to Run Graph Queries on ClickHouse

In many cybersecurity platforms, ClickHouse serves as the real-time analytics engine or data warehouse for processing and exploring large volumes of security data collected from customer environments. For example, it may store tables such as accounts, identities, sessions, events, and resources. The accounts table contains customer or user account metadata; identities record authentication information; sessions capture user logins or service connections; events log specific actions or API calls; and resources represent the assets being accessed, such as storage buckets, databases, or compute instances. 

These tables are inherently connected. For example, an account owns one or more identities, an identity initiates sessions, a session records events, and each event operates on a specific resource. Together, these relationships form a natural graph structure. Representing the data as a graph makes these connections explicit and enables intuitive multi-hop queries—for instance, tracing how a single account interacted with multiple resources through a chain of sessions and events.

PuppyGraph sits directly above ClickHouse as a graph query layer. To integrate, users configure a ClickHouse catalog in PuppyGraph, specifying the JDBC connection URL, credentials, and the target database. PuppyGraph then introspects the ClickHouse schema, allowing all tables and views in that catalog to be referenced without copying or transforming data. 

After the connection is established, the graph model can be built and represented as a graph schema in PuppyGraph. The schema is a JSON file that includes the ClickHouse catalog configuration along with the mappings of vertices and edges derived from the underlying tables. For example, an Account vertex can be defined with its ID and attributes taken from the accounts table, while a HasIdentity edge can be defined from the identities table, connecting Account and Identity vertices. Together, these definitions and the catalog configuration form the complete graph model. PuppyGraph also provides a schema builder in its web UI to simplify this process through an interactive interface.
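
To make the mapping idea concrete, here is a minimal Python sketch that assembles a schema-like document and serializes it to JSON. It is only an illustration of the concept described above: the key names, the catalog name `security_ch`, the JDBC host and database, and the `identity_id` column are all assumptions, and this is not PuppyGraph's actual schema format. Consult the PuppyGraph documentation or the schema builder for the real structure.

```python
import json

# Hypothetical sketch: key names below are illustrative, NOT PuppyGraph's
# real schema keys. Host, database, and credentials are placeholders.
catalog = {
    "name": "security_ch",  # assumed catalog name
    "type": "clickhouse",
    "jdbc_url": "jdbc:clickhouse://localhost:8123/security",  # assumed host/db
    "username": "default",
    "password": "<secret>",
}


def table(name):
    """Point at an existing ClickHouse table inside the catalog."""
    return {"catalog": catalog["name"], "database": "security", "table": name}


# Vertices: each maps a table to a label and names its ID column.
vertices = [
    {"label": "Account", "source": table("accounts"), "id": "account_id"},
    {"label": "Identity", "source": table("identities"), "id": "identity_id"},
]

# Edges: HasIdentity is derived from the identities table, which is assumed
# to carry both the owning account_id and its own identity_id.
edges = [
    {
        "label": "HasIdentity",
        "source": table("identities"),
        "from": {"vertex": "Account", "column": "account_id"},
        "to": {"vertex": "Identity", "column": "identity_id"},
    },
]

schema = {"catalog": catalog, "vertices": vertices, "edges": edges}


def dangling_endpoints(schema):
    """Return edge endpoints that name a vertex label the schema does not define."""
    labels = {v["label"] for v in schema["vertices"]}
    return [
        (e["label"], end, e[end]["vertex"])
        for e in schema["edges"]
        for end in ("from", "to")
        if e[end]["vertex"] not in labels
    ]


schema_json = json.dumps(schema, indent=2)
```

Because the graph lives in metadata like this rather than in storage, a second graph view over the same tables would just be another document of the same shape. The small `dangling_endpoints` check is a convenience for this sketch, not a PuppyGraph feature.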

Once the graph model is defined, analysts can query ClickHouse data using openCypher and Gremlin, both standard graph query languages supported by PuppyGraph. openCypher provides a declarative syntax for describing patterns and relationships, while Gremlin offers a traversal-based style that expresses how to navigate through the graph. These languages make it easy to express multi-hop relationships intuitively. For example, the following openCypher query finds accounts that accessed an S3 bucket, returning up to 50 matches:

MATCH (a:Account)-[:HasIdentity]->(i:Identity)
  -[:HasSession]->(s:Session)
  -[:RecordsEvent]->(e:Event)
  -[:OperatesOn]->(r:Resource)
WHERE r.resource_type = 'S3Bucket'
RETURN a.account_id AS Account,
       r.resource_name AS Resource,
       e.event_id AS EventID
LIMIT 50
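
For comparison, the same four-hop pattern can be written as a chain of relational joins. The sketch below uses Python's built-in `sqlite3` module as a stand-in for ClickHouse so it runs anywhere; the join columns (`identity_id`, `session_id`, `resource_id`) and the sample rows are assumptions, and the SQL is not a claim about how PuppyGraph translates the query internally.

```python
import sqlite3

# In-memory SQLite stands in for ClickHouse; table layouts are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts   (account_id TEXT);
CREATE TABLE identities (identity_id TEXT, account_id TEXT);
CREATE TABLE sessions   (session_id TEXT, identity_id TEXT);
CREATE TABLE events     (event_id TEXT, session_id TEXT, resource_id TEXT);
CREATE TABLE resources  (resource_id TEXT, resource_name TEXT, resource_type TEXT);

INSERT INTO accounts   VALUES ('acct-1'), ('acct-2');
INSERT INTO identities VALUES ('id-1', 'acct-1'), ('id-2', 'acct-2');
INSERT INTO sessions   VALUES ('sess-1', 'id-1'), ('sess-2', 'id-2');
INSERT INTO events     VALUES ('evt-1', 'sess-1', 'res-1'),
                              ('evt-2', 'sess-2', 'res-2');
INSERT INTO resources  VALUES ('res-1', 'audit-logs', 'S3Bucket'),
                              ('res-2', 'web-01', 'EC2Instance');
""")

# Each graph hop in the MATCH pattern becomes one JOIN.
EQUIVALENT_SQL = """
SELECT a.account_id, r.resource_name, e.event_id
FROM accounts a
JOIN identities i ON i.account_id  = a.account_id   -- HasIdentity
JOIN sessions   s ON s.identity_id = i.identity_id  -- HasSession
JOIN events     e ON e.session_id  = s.session_id   -- RecordsEvent
JOIN resources  r ON r.resource_id = e.resource_id  -- OperatesOn
WHERE r.resource_type = 'S3Bucket'
LIMIT 50
"""

rows = conn.execute(EQUIVALENT_SQL).fetchall()
```

With this sample data, only `acct-1` reaches an S3 bucket. The graph form states the same chain as a single pattern, which tends to stay readable as the number of hops grows.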

The following query returns complete relationship paths, helping users trace activity chains. PuppyGraph's UI renders the results as an interactive graph visualization.

MATCH path = (a:Account)-[:HasIdentity]->(i:Identity)
  -[:HasSession]->(s:Session)
  -[:RecordsEvent]->(e:Event)
  -[:OperatesOn]->(r:Resource)
WHERE r.resource_type = 'EC2Instance'
RETURN path
LIMIT 25
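
To show what returning a full path means, the following Python sketch enumerates paths over a tiny in-memory edge list by following the same sequence of edge labels. The IDs and resource names are made-up sample data, and `match_paths` is a hypothetical helper written for this illustration; it is not how PuppyGraph or ClickHouse executes the query.

```python
# (edge label, source vertex, target vertex); all values are sample data.
EDGES = [
    ("HasIdentity", "acct-1", "id-1"),
    ("HasSession", "id-1", "sess-1"),
    ("HasSession", "id-1", "sess-2"),
    ("RecordsEvent", "sess-1", "evt-1"),
    ("RecordsEvent", "sess-2", "evt-2"),
    ("OperatesOn", "evt-1", "web-01"),
    ("OperatesOn", "evt-2", "audit-logs"),
]
RESOURCE_TYPE = {"web-01": "EC2Instance", "audit-logs": "S3Bucket"}
PATTERN = ["HasIdentity", "HasSession", "RecordsEvent", "OperatesOn"]


def match_paths(start, pattern, edges):
    """Return every vertex path from `start` that follows `pattern` hop by hop."""
    # Index targets by (label, source) so each hop is a dictionary lookup.
    index = {}
    for label, src, dst in edges:
        index.setdefault((label, src), []).append(dst)

    paths = [[start]]
    for label in pattern:
        # Extend every partial path by each matching outgoing edge.
        paths = [p + [nxt] for p in paths for nxt in index.get((label, p[-1]), [])]
    return paths


# Keep only paths that end at an EC2 instance, mirroring the WHERE clause.
paths = [
    p for p in match_paths("acct-1", PATTERN, EDGES)
    if RESOURCE_TYPE.get(p[-1]) == "EC2Instance"
]
```

Each result is the whole chain of vertices from account to resource, which is the information an analyst needs to see how an account reached a given asset.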

Bringing It All Together

ClickHouse and PuppyGraph together enable a new level of visibility in cybersecurity analytics. ClickHouse provides the speed, scalability, and efficiency needed to manage massive, real-time event streams, while PuppyGraph extends those capabilities with graph-based analytics. Without ETL or data duplication, analysts can move seamlessly from statistical summaries to connected investigations, exploring how accounts, sessions, events, and resources interact in context. PuppyGraph is already working with half of the top 20 cybersecurity companies.

Get started with ClickHouse and explore PuppyGraph’s ClickHouse integration to bring real-time graph analytics to your security data.

Danfeng Xu
CTO & Co-Founder

Danfeng Xu, CTO and Co-founder of PuppyGraph, is a passionate learner with extensive experience across online platforms, streaming services, big data, and developer productivity. He previously worked at LinkedIn, where he led a unified server platform strategy for thousands of microservices and modernized the engagement platform to deliver dynamic, personalized and engaging user experiences. He holds a Master's degree in Computer Science from UCLA.

Sa Wang
Software Engineer

Sa Wang is a Software Engineer with exceptional mathematical ability and strong coding skills. He holds a Bachelor's degree in Computer Science and a Master's degree in Philosophy from Fudan University, where he specialized in Mathematical Logic.
