Security Analytics with Graph Insights

Sa Wang, Software Engineer | June 6, 2025

Most security tools excel at spotting individual events: an unusual login, a flagged IP, a failed access attempt. However, real incidents rarely unfold in isolation, instead involving chains of activity across users, systems, and environments. Without context, it’s easy to miss what truly matters or to waste time chasing false positives.

Security analytics transforms cybersecurity into an intelligence-first discipline. By correlating data across users, devices, and systems, it reveals hidden patterns, anomalies, and early indicators of compromise. It helps security teams shift from passive reaction to proactive detection, prevention, and response.

In this post, we’ll break down how security analytics works, where it adds value, and how graph-based techniques help uncover threats hiding in the relationships between events.

What is Security Analytics?

Security analytics is the process of collecting, aggregating, normalizing, and analyzing security-related data to detect, investigate, and respond to threats. It transforms scattered signals from sources like network traffic, system logs, identity activity, and cloud events into structured, actionable insight.

Unlike tools that monitor events in isolation, security analytics builds a connected view of what’s happening across users, systems, and services. Consider a failed login attempt, an unusual data transfer, and a new administrator account created within minutes. Viewed separately, they might appear harmless. Taken together, they suggest a coordinated attack.

Modern security analytics platforms apply statistical models, behavior baselining, and sometimes machine learning or graph analytics to detect anomalies and surface relationships. They operate continuously and at scale, helping teams keep up with high-volume, high-velocity telemetry across hybrid environments. By maintaining context and correlation, security analytics supports faster detection, clearer investigations, and more informed decisions.

How Does Security Analytics Work?

Security analytics operates through a series of stages that transform raw telemetry into actionable insight. Each step builds context, reduces noise, and enables faster, more accurate threat detection. Understanding these stages helps teams evaluate or design an effective analytics workflow.

1. Data Collection Across the Attack Surface

The process starts with broad telemetry ingestion from endpoints, network devices, identity systems, cloud services, and applications. Data is gathered via agents, log shippers, APIs, and SIEM integrations. Without wide coverage, attackers can exploit blind spots. Real-time collection ensures visibility into fast-evolving environments.

2. Normalization and Enrichment

Raw security data is messy. Logs and telemetry from different systems arrive in varied formats (JSON from cloud providers, syslog from network devices, CSVs from endpoints), often with inconsistent field names, mismatched timestamp formats, or missing identifiers. Normalization is the process of converting this data into a consistent, structured format that makes it suitable for analysis. This includes standardizing field names (e.g., src_ip, source_ip, and ip.src all becoming source_ip), aligning timestamps, and unifying event types across sources. Normalization ensures that events from different systems can be compared, correlated, and queried uniformly. Without it, even basic cross-source detection becomes difficult or unreliable.
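To make the idea concrete, here is a minimal sketch of field and timestamp normalization. The alias table, the timestamp field names (ts, time), and the sample events are assumptions made for illustration; a real pipeline would follow whatever common schema your platform defines.

```python
from datetime import datetime, timezone

# Hypothetical alias table: source-specific field names -> canonical names.
FIELD_ALIASES = {
    "src_ip": "source_ip",
    "ip.src": "source_ip",
    "source_ip": "source_ip",
    "ts": "timestamp",
    "time": "timestamp",
    "timestamp": "timestamp",
}


def to_utc_iso(value):
    """Convert an epoch number or an ISO 8601 string to a UTC ISO 8601 string."""
    if isinstance(value, (int, float)):
        parsed = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        parsed = datetime.fromisoformat(value)
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


def normalize(event):
    """Rename known fields to canonical names and align the timestamp to UTC."""
    normalized = {FIELD_ALIASES.get(key, key): value for key, value in event.items()}
    if "timestamp" in normalized:
        normalized["timestamp"] = to_utc_iso(normalized["timestamp"])
    return normalized


# Two made-up events describing the same moment in different source formats.
cloud_event = {"ip.src": "203.0.113.7", "time": "2025-06-01T12:00:00+02:00"}
syslog_event = {"src_ip": "203.0.113.7", "ts": 1748772000}

print(normalize(cloud_event))
print(normalize(syslog_event))
```

After normalization, both events carry the same source_ip field and the same UTC timestamp, so they can be compared and joined directly.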

Once data is normalized, enrichment adds context to make it more meaningful. This means attaching additional information that wasn't in the original event but helps with interpretation or prioritization. Examples include:

  • Mapping IP addresses to geolocations
  • Linking device IDs to asset ownership
  • Annotating events with user roles or business unit tags
  • Cross-referencing with threat intelligence (e.g., marking IPs as known malicious)
  • Attaching vulnerability metadata to affected hosts

Enrichment helps analysts focus on what's important. Instead of chasing every failed login, they can prioritize failed logins that involve high-privilege accounts, target sensitive systems, or originate from risky geographies. In short, normalization cleans the data for analysis; enrichment makes that data useful for decision-making.
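The enrichment examples listed above can be sketched as a single function that attaches context to a normalized event. All lookup tables here are hypothetical stand-ins; in practice they would be backed by a GeoIP database, an asset inventory, an identity directory, and a threat intelligence feed.

```python
import ipaddress

# Hypothetical lookup tables used only for illustration.
GEO_BY_NETWORK = {
    ipaddress.ip_network("203.0.113.0/24"): "NL",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}
ASSET_OWNERS = {"laptop-042": "alice"}
USER_ROLES = {"alice": "domain-admin"}
KNOWN_BAD_IPS = {"203.0.113.7"}


def lookup_country(ip):
    """Return the country code of the first matching network, or None."""
    address = ipaddress.ip_address(ip)
    for network, country in GEO_BY_NETWORK.items():
        if address in network:
            return country
    return None


def enrich(event):
    """Return a copy of the event with geolocation, ownership, role, and intel tags."""
    enriched = dict(event)  # leave the original event untouched
    ip = event.get("source_ip")
    if ip is not None:
        enriched["geo_country"] = lookup_country(ip)
        enriched["threat_intel_match"] = ip in KNOWN_BAD_IPS
    enriched["asset_owner"] = ASSET_OWNERS.get(event.get("device_id"))
    enriched["user_role"] = USER_ROLES.get(event.get("user"))
    return enriched


event = {"source_ip": "203.0.113.7", "device_id": "laptop-042", "user": "alice"}
print(enrich(event))
```

The original event is copied rather than modified, which keeps the raw record available for audit while downstream detection works on the enriched version.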

3. Correlation, Detection, and Threat Modeling

With normalized, enriched data, the analytics engine applies a layered detection strategy. Rule-based correlation identifies known attack patterns and policy violations. Behavior baselining detects anomalies against historical user or device activity. Machine learning models spot deviations too subtle for manual rule creation.

Effective security analytics systems combine these techniques rather than relying on a single approach, balancing precision (low false positives) with breadth (catching novel threats). Moreover, graph-based correlation is increasingly common, linking related events across users, devices, and time windows to expose hidden attack paths.

4. Prioritization and Response Automation

Not every alert deserves the same level of urgency. Security analytics platforms evaluate the context of each event to prioritize what matters most. Factors like asset sensitivity, user privilege level, behavioral deviation, and threat intelligence indicators are used to assign risk scores. This helps separate critical incidents, such as unusual access to production systems by a privileged account, from low-risk anomalies that can wait for review.

High-risk events may trigger automated response actions through integrations with SOAR (Security Orchestration, Automation, and Response) tools. These actions might include disabling accounts, isolating endpoints, or blocking network access. Automating response to known or repeatable threats reduces investigation time and limits the attacker’s window of opportunity.

Lower-priority events are typically routed into investigation queues with full context attached: historical activity, related entities, and enrichment data, so that analysts can investigate efficiently and focus on detecting stealthier or emerging threats. By combining risk-based prioritization with automation, security analytics enables faster containment and more scalable security operations.
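As a rough illustration of risk-based prioritization, the sketch below combines the factors mentioned above into a weighted score and routes each alert accordingly. The weights, the threshold, and the AlertContext fields are assumptions for the example, not values used by any particular platform.

```python
from dataclasses import dataclass


@dataclass
class AlertContext:
    asset_sensitivity: float   # 0.0 (test box) .. 1.0 (production system)
    privilege_level: float     # 0.0 (guest) .. 1.0 (administrator)
    behavior_deviation: float  # 0.0 (typical) .. 1.0 (far outside baseline)
    threat_intel_hit: bool     # True if an indicator matched a threat feed


# Assumed weights and threshold; real values would be tuned per environment.
WEIGHTS = {"asset": 0.30, "privilege": 0.30, "deviation": 0.25, "intel": 0.15}
AUTO_RESPONSE_THRESHOLD = 0.75


def risk_score(ctx):
    """Weighted sum of the contextual factors, in the range 0.0 to 1.0."""
    return (
        WEIGHTS["asset"] * ctx.asset_sensitivity
        + WEIGHTS["privilege"] * ctx.privilege_level
        + WEIGHTS["deviation"] * ctx.behavior_deviation
        + WEIGHTS["intel"] * (1.0 if ctx.threat_intel_hit else 0.0)
    )


def route(ctx):
    """Send high-risk alerts to automation and the rest to an analyst queue."""
    if risk_score(ctx) >= AUTO_RESPONSE_THRESHOLD:
        return "automated_response"
    return "investigation_queue"


# Unusual production access by a privileged account vs. a minor anomaly.
privileged_prod_access = AlertContext(1.0, 1.0, 0.8, False)
minor_anomaly = AlertContext(0.2, 0.1, 0.3, False)

print(route(privileged_prod_access), route(minor_anomaly))
```

A linear score is easy to explain to analysts, which is one reason to start with it before moving to anything more elaborate.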

Types of Security Analytics

Security analytics relies on a combination of techniques to detect threats and uncover suspicious activity. Each approach analyzes data in different ways—some focus on matching known patterns, others look for deviations or structural relationships. Modern platforms often combine several methods to improve accuracy, reduce false positives, and capture both known and novel attacks.

Rule-Based Analytics

This is the most traditional form of detection, where rules are written to match specific conditions like five failed logins in a minute, a login from a foreign country, or an administrator accessing sensitive data outside business hours. While these detections are transparent and easy to tune, they are inherently reactive. This means they can only catch what is already known and often generate high volumes of alerts if not carefully managed.

Rule-based analytics is still widely used, particularly for compliance, access policy enforcement, and identifying repeatable threats. However, its limitations in detecting unknown attack patterns have led to broader use of complementary techniques.
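The "five failed logins in a minute" rule can be sketched with a sliding window per user. The event layout (timestamp in seconds, user, outcome) is an assumption for this example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_FAILURES = 5


def detect_bruteforce(events):
    """Return (user, timestamp) each time a user has MAX_FAILURES failed
    logins inside a WINDOW_SECONDS window."""
    recent = defaultdict(deque)  # user -> timestamps of recent failures
    alerts = []
    for ts, user, outcome in sorted(events):
        if outcome != "failure":
            continue
        window = recent[user]
        window.append(ts)
        # Drop failures that fell out of the time window.
        while ts - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_FAILURES:
            alerts.append((user, ts))
    return alerts


# Made-up events: bob fails five times in 40 seconds, alice fails slowly.
events = [(t, "bob", "failure") for t in (0, 10, 20, 30, 40)]
events += [(t, "alice", "failure") for t in (0, 100, 200, 300, 400)]
events.append((45, "bob", "success"))

print(detect_bruteforce(events))
```

The rule is transparent and cheap to evaluate, but it also shows the limitation described above: an attacker who spaces attempts more than a minute apart never triggers it.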

Behavioral Baseline Analytics

Rather than searching for known patterns, this approach models “normal” behavior for users, devices, or services. Once a baseline is established, deviations from it can be flagged: for example, a user logging in at unusual hours, transferring more data than usual, or accessing resources they rarely use.

This method is particularly effective for detecting insider threats, compromised credentials, or lateral movement, where behavior changes but doesn’t necessarily match a known signature. It’s also more adaptive to different environments, since baselines are shaped by actual usage patterns rather than hard-coded assumptions.
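One simple way to express a baseline is a z-score over a user's own history. The sketch below flags a daily data transfer volume that sits far from that user's mean; the threshold of three standard deviations and the sample numbers are assumptions, and production systems typically use more robust statistics.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` standard deviations
    away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Made-up daily upload volumes in MB for one user.
daily_upload_mb = [120, 135, 110, 128, 140, 115, 125]

print(is_anomalous(daily_upload_mb, 130))  # close to the usual range
print(is_anomalous(daily_upload_mb, 900))  # far outside the usual range
```

Because the baseline comes from each user's own activity, the same 900 MB transfer could be normal for a backup service account and anomalous for an individual employee.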

Machine Learning-Based Analytics

Machine learning brings statistical modeling and pattern recognition to security analytics. Unlike simple thresholds or baselines, ML can identify complex relationships and subtle outliers across large, high-dimensional datasets.

For example, anomaly detection algorithms might flag a combination of user location, device type, and access time that has never occurred before. Clustering models can group similar behaviors to highlight outliers. Some systems even use supervised learning to classify benign versus malicious activity based on labeled training data.

ML helps reduce noise by learning what’s typical and flagging only meaningful deviations. However, it requires careful tuning, and its results need to be interpretable, especially in security contexts where understanding why something was flagged is often as important as the alert itself.
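Real deployments would rely on trained models such as isolation forests or clustering. As a deliberately simple stand-in that uses only the standard library, the sketch below learns how often each combination of location, device type, and time of day occurs and scores new events by rarity. The class name, field names, and day/night bucketing are hypothetical.

```python
from collections import Counter


def hour_bucket(hour):
    """Coarse time-of-day feature: working hours vs. everything else."""
    return "day" if 6 <= hour < 18 else "night"


class ComboNoveltyDetector:
    """Frequency-based novelty scoring over (country, device, time bucket)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    @staticmethod
    def _key(event):
        return (event["country"], event["device_type"], hour_bucket(event["hour"]))

    def fit(self, events):
        """Count how often each feature combination appears in training data."""
        for event in events:
            self.counts[self._key(event)] += 1
            self.total += 1

    def score(self, event):
        """Return 1.0 for a never-seen combination, lower for common ones."""
        if self.total == 0:
            return 1.0
        return 1.0 - self.counts[self._key(event)] / self.total


# Made-up training data: ten daytime laptop logins from one country.
training = [{"country": "US", "device_type": "laptop", "hour": 9} for _ in range(10)]
detector = ComboNoveltyDetector()
detector.fit(training)

print(detector.score({"country": "US", "device_type": "laptop", "hour": 10}))
print(detector.score({"country": "BR", "device_type": "phone", "hour": 3}))
```

The score is easy to explain (the combination was seen N times out of M), which speaks to the interpretability concern raised above.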

Graph-Based Analytics

Graph analytics focuses on the structure of relationships between entities—such as users, IP addresses, processes, files, and cloud resources. Rather than analyzing events in isolation, it models how entities interact over time.

For instance, an attacker might access a low-privilege account, use it to move laterally to another machine, escalate privileges, and then access a sensitive database. These steps may not trigger alerts individually, but when viewed as a connected sequence in a graph, they reveal a clear attack path.

Graph-based approaches are particularly powerful for detecting multi-stage threats, privilege escalation, lateral movement, and policy misconfigurations. They provide a natural way to represent security environments where relationships matter as much as the events themselves.
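The attack path described above can be sketched as a breadth-first search over a small directed graph. The entities and relationship names below are invented for illustration; a graph engine would run the equivalent traversal as a query rather than in application code.

```python
from collections import deque

# Hypothetical edges: (source entity, relationship, target entity).
EDGES = [
    ("user:intern", "LOGGED_INTO", "host:workstation-7"),
    ("host:workstation-7", "CONNECTED_TO", "host:build-server"),
    ("host:build-server", "RUNS_AS", "user:svc-deploy"),
    ("user:svc-deploy", "CAN_ACCESS", "db:customer-records"),
    ("user:intern", "CAN_ACCESS", "share:public-docs"),
]


def build_adjacency(edges):
    """Index outgoing edges by source entity."""
    adjacency = {}
    for source, relationship, target in edges:
        adjacency.setdefault(source, []).append((relationship, target))
    return adjacency


def find_attack_path(edges, start, goal):
    """Breadth-first search; returns the shortest list of hops, or None."""
    adjacency = build_adjacency(edges)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relationship, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relationship, target)]))
    return None


path = find_attack_path(EDGES, "user:intern", "db:customer-records")
for source, relationship, target in path:
    print(f"{source} -[{relationship}]-> {target}")
```

None of the four hops is alarming on its own, yet together they connect a low-privilege account to a sensitive database, which is exactly the kind of relationship a per-event view does not show.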

Effective security analytics platforms don’t rely on just one of these techniques. Instead, they layer them by using rules for speed and clarity, baselines for behavioral shifts, machine learning for pattern recognition, and graph analytics for context and reachability. This layered strategy supports both high-confidence alerts and deeper investigation workflows.

Security Analytics vs. SIEM: What’s the Difference?

Security analytics and SIEM (Security Information and Event Management) are often mentioned together, but they serve different roles. Understanding the distinction between them is important for clarifying where modern detection and investigation capabilities originate.

SIEM as the Data Platform

A SIEM system collects, stores, and organizes security event data from across an organization. It centralizes logs from firewalls, endpoints, identity providers, cloud services, and more. SIEMs are essential for compliance, alerting, and having a unified view of security telemetry. However, traditional SIEMs are primarily focused on log aggregation and rule-based alerts. Their detection capabilities are often limited to what’s explicitly defined.

Security Analytics as the Intelligence Layer

Security analytics builds on top of that foundation. It enriches, correlates, and analyzes the data to detect threats that SIEM rules alone would miss. This includes applying behavioral models, machine learning, and graph-based techniques to find anomalies and trace attack paths. In many environments, security analytics tools ingest data from a SIEM and provide the analytical capabilities it lacks.

How They Work Together

Modern security architectures often combine both: the SIEM provides data collection and centralized visibility, while security analytics enhances detection, prioritization, and response. In some platforms, the line is blurring because SIEM vendors are incorporating advanced analytics and analytics platforms are adding ingestion and storage features. However, the conceptual difference remains.

Security analytics isn’t a replacement for SIEM. It’s an evolution—a necessary extension that brings intelligence and context to the data a SIEM collects.

Security Analytics Use Cases

Security analytics supports a wide range of use cases across different parts of the security workflow. By correlating events, identifying patterns, and surfacing hidden relationships, it enables faster detection, better triage, and deeper investigations. Below are some of the most common and impactful applications.

Threat Detection and Anomaly Identification

Security analytics can identify early indicators of compromise by spotting deviations from normal behavior. For example, if a user suddenly accesses sensitive systems at unusual hours from a new device, baseline analytics and anomaly detection flag it, even though no known signature is matched. This helps catch stealthy attacks that bypass traditional rules.

Insider Threat Detection

Insider risks often involve legitimate users doing unexpected things. Security analytics combines behavior modeling, access logs, and contextual data (like device ownership or privilege level) to detect when employees access data they shouldn’t or act outside established norms. This reduces reliance on static rules and enables more nuanced, risk-aware monitoring.

Incident Investigation and Root Cause Analysis

When an alert is triggered, security analytics tools can reconstruct the event sequence—showing what happened, when, and how it spread. This shortens investigation time by surfacing relevant activity across systems and users without manual log searching. Enriched, correlated data gives analysts a complete picture from a single place.

Attack Path Analysis and Lateral Movement Detection

Advanced threats often unfold over multiple steps. Security analytics (especially when combined with graph-based techniques) helps identify how attackers move through the environment: from an initial foothold to escalated privileges to access of critical systems. This connected view reveals the bigger picture that isolated alerts can’t.

Cloud and Identity Analytics

In hybrid and cloud-native environments, security analytics integrates telemetry from cloud providers, IAM systems, and SaaS platforms. It detects suspicious access patterns (like token reuse, privilege abuse, or geography mismatches) and ensures that identity-driven attacks don’t go unnoticed in fragmented infrastructure.

Graph-Powered Security Analytics with PuppyGraph

Detecting threats isn’t just about spotting isolated anomalies—it’s about understanding how different events relate to one another. Lateral movement, privilege escalation, and indirect access often span multiple users, systems, and timeframes. Without visibility into these relationships, traditional tools miss the larger picture.

PuppyGraph helps solve this by making relationship-based detection practical. PuppyGraph is a real-time, zero-ETL graph query engine that lets you model and analyze connected data directly from your existing SQL-based systems. Instead of building a separate pipeline into a graph database, PuppyGraph connects to the relational data source directly and treats that data as a graph. With PuppyGraph, you define a graph schema that maps how entities like users, sessions, and resources relate to each other. You can then query those relationships using familiar graph languages like openCypher or Gremlin.

This approach offers several advantages for security analytics:

  • Real-time, multi-hop investigation: PuppyGraph supports fast, recursive queries—such as tracing activity from a user to a session to a resource—making it easier to uncover hidden lateral movement and chained access.

  • Flexible graph modeling: You can define multiple graph views over the same data, allowing different teams to focus on what matters to them—whether it's identity relationships, API access paths, or infrastructure dependencies.

  • No data duplication: Because it queries directly against existing data, there's no need for separate storage systems or synchronization pipelines—reducing both complexity and overhead.

  • Built-in visualization: PuppyGraph includes a native interface for querying and exploring graphs visually, so analysts can investigate threats and relationships without building custom dashboards or exporting data.

In the next section, we’ll show how this model works in practice through a hands-on demo.

Demo

To ground the analysis in real-world context, this demo incorporates vulnerability data from the National Vulnerability Database (NVD), the U.S. government’s authoritative source for standardized CVE information. We use the JSON 2.0 CVE-2025 feed, which includes structured records of known software vulnerabilities.

We combine these CVE records with synthetic cloud infrastructure data to simulate how real vulnerabilities could affect virtual machines, subnets, and network interfaces in a cloud-like setting. Each simulated security finding is optionally linked to a real CVE, allowing us to construct meaningful relationships between vulnerabilities and the assets they impact.

We have uploaded the materials for the demo to GitHub. Please download them or clone the repository directly. The repository includes detailed instructions for the demo. We also recommend following the getting-started tutorial to become familiar with using PuppyGraph with PostgreSQL.

Prerequisites

  • Docker and Docker Compose
  • Python 3

Data preparation

1. Download and Unzip CVE Data

wget https://nvd.nist.gov/feeds/json/cve/2.0/nvdcve-2.0-2025.json.zip
unzip nvdcve-2.0-2025.json.zip -d cve_json

2. Convert CVE JSON to CSV

python3 cve_json_to_csv.py

3. Generate other CSV files

python3 gen_data.py

4. Copy cve.csv to csv_data Directory

cp cve.csv ./csv_data/

5. Copy csv_data to the postgres-init Directory

cp -r gen_data/csv_data ./postgres-init/

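Note that the CVE data feed is updated continuously and the synthetic cloud infrastructure data is generated randomly, so the CSV files will differ each time you prepare the data. As a result, the identifiers used in the example queries below (CVE, subnet, and instance IDs) may need to be replaced with values from your own data.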
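For the demo itself, use the cve_json_to_csv.py script from the repository. Purely as an illustration of what such a conversion might involve, the sketch below flattens a feed into CSV rows. It assumes the feed follows the NVD JSON 2.0 layout (a top-level vulnerabilities array whose items wrap a cve object); the function names and column names are hypothetical and may not match the repository's script.

```python
import csv
import io

# Hypothetical output columns; the demo repository may use different ones.
COLUMNS = ["id", "published", "last_modified", "description"]


def cve_rows(feed):
    """Yield one flat dict per CVE record in an NVD JSON 2.0 style feed."""
    for item in feed.get("vulnerabilities", []):
        cve = item.get("cve", {})
        # Prefer the English description when several languages are present.
        description = next(
            (d.get("value", "") for d in cve.get("descriptions", [])
             if d.get("lang") == "en"),
            "",
        )
        yield {
            "id": cve.get("id", ""),
            "published": cve.get("published", ""),
            "last_modified": cve.get("lastModified", ""),
            "description": description,
        }


def write_csv(feed, handle):
    """Write the flattened records to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in cve_rows(feed):
        writer.writerow(row)


# Placeholder record, not a real CVE entry.
sample_feed = {
    "vulnerabilities": [
        {"cve": {
            "id": "CVE-0000-0001",
            "published": "2025-01-01T00:00:00.000",
            "lastModified": "2025-01-02T00:00:00.000",
            "descriptions": [{"lang": "en", "value": "placeholder description"}],
        }}
    ]
}

buffer = io.StringIO()
write_csv(sample_feed, buffer)
print(buffer.getvalue())
```

With a real feed, you would load the unzipped file with json.load and pass an open file handle instead of the in-memory buffer.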

Deployment

1. Start the Postgres services and PuppyGraph by running:

docker compose up -d

2. Wait a few seconds for PostgreSQL to load the CSVs. Then open the PuppyGraph UI.

Modeling the Graph

1. Log into the PuppyGraph Web UI at http://localhost:8081 with the following credentials:
  • Username: puppygraph
  • Password: puppygraph123

2. Upload the schema:
  • Select the file schema.json in the Upload Graph Schema JSON section and click Upload.
  • Alternatively, upload the schema with the following command:

curl -XPOST -H "content-type: application/json" --data-binary @./schema.json --user "puppygraph:puppygraph123" localhost:8081/schema

Figure: The visualization of the graph model.

Querying the Graph using Cypher

Navigate to the Query panel on the left side. The Graph Query tab offers an interactive environment for querying the graph using Gremlin and Cypher.

After each query, remember to clear the graph panel before executing the next query to maintain a clean visualization. You can do this by clicking the Clear Canvas button located in the top-right corner of the page.

Here are some example Cypher queries:

1. Find Instances Affected by a CVE.

MATCH (c:CVE {id: "CVE-2025-5222"})<-[:HAS_VULNERABILITY]-(f:AWSInspectorFinding)
      <-[:HAS_FINDING]-(inst:EC2Instance)
RETURN inst.id               AS instance_id,
       inst.instancetype     AS instance_type,
       f.severity            AS finding_severity,
       f.last_observed_at    AS update_time
ORDER BY finding_severity DESC, inst.id;

Or return the path:

MATCH path = (c:CVE {id: "CVE-2025-5222"})<-[:HAS_VULNERABILITY]-(f:AWSInspectorFinding)
    <-[:HAS_FINDING]-(inst:EC2Instance)
RETURN path;

2. List Recent High/Critical Vulnerable Instances in a Subnet.

MATCH (sub:EC2Subnet {id: "subnet-04f9ff1508a54905"})
      <-[:PART_OF_SUBNET]-(ni:NetworkInterface)
      <-[:HAS_NETWORK_INTERFACE]-(inst:EC2Instance)
      -[:HAS_FINDING]->(f:AWSInspectorFinding)
      -[:HAS_VULNERABILITY]->(c:CVE)
WHERE f.severity IN ["HIGH","CRITICAL"] AND f.last_observed_at > datetime('2025-03-01')
RETURN inst.id               AS instance_id,
       inst.availabilityzone AS az,
       f.severity            AS finding_severity,
       c.id                  AS cve_id,
       f.last_observed_at    AS update_time
ORDER BY f.severity, f.last_observed_at DESC;

3. Find Neighbors of a Given Instance with Recent Vulnerability Risks.

MATCH path = (target:EC2Instance {id: "i-03308b76dd2349b6"})
      -[:HAS_NETWORK_INTERFACE]->(targetNi:NetworkInterface)
      -[:PART_OF_SUBNET]->(sub:EC2Subnet)
      <-[:PART_OF_SUBNET]-(ni:NetworkInterface)
      <-[:HAS_NETWORK_INTERFACE]-(peer:EC2Instance)
      -[:HAS_FINDING]->(f:AWSInspectorFinding)
      -[:HAS_VULNERABILITY]->(c:CVE)
WHERE peer.id <> target.id AND f.last_observed_at > datetime('2025-05-01')
RETURN path;

4. Count Affected Instances per CVE per Subnet Recently.

MATCH (c:CVE)<-[:HAS_VULNERABILITY]-(f:AWSInspectorFinding)
      <-[:HAS_FINDING]-(inst:EC2Instance)
      -[:HAS_NETWORK_INTERFACE]->(ni:NetworkInterface)
      -[:PART_OF_SUBNET]->(sub:EC2Subnet)
WHERE f.last_observed_at > datetime('2025-03-01')
RETURN sub.id                  AS subnet_id,
       c.id                    AS cve_id,
       COUNT(DISTINCT inst.id) AS affected_instances
ORDER BY affected_instances DESC, subnet_id
LIMIT 1000;

5. Count Critical Vulnerable Instances per Availability Zone.

MATCH (inst:EC2Instance)-[:HAS_FINDING]->(f:AWSInspectorFinding)
      -[:HAS_VULNERABILITY]->(c:CVE)
WHERE f.severity = "CRITICAL"
RETURN inst.availabilityzone   AS az,
       COUNT(DISTINCT inst.id) AS critical_count
ORDER BY critical_count DESC;

Cleanup and Teardown

To stop and remove the containers, networks, and volumes, run:

docker compose down -v

Conclusion

Security analytics gives defenders the ability to move beyond isolated alerts and fragmented logs. By correlating and analyzing activity across systems, users, and environments, it brings clarity to complex incidents and enables faster, more confident responses.

To fully understand modern threats, though, teams need more than event-based detection. They need a way to analyze how different parts of their infrastructure are connected. This includes understanding how access flows, how privilege accumulates, and how attackers move across systems. Graph analytics provides that structure, and PuppyGraph makes it accessible.

PuppyGraph models relationships directly on top of existing data, which avoids the need for ETL, duplication, or new infrastructure. This enables real-time, multi-hop threat investigation at scale and helps analysts uncover the signals that traditional tools overlook.

If you're ready to add relationship-based visibility to your security stack, try the forever-free PuppyGraph Developer Edition or book a demo with our team.

Sa Wang is a Software Engineer with exceptional math abilities and strong coding skills. He earned his Bachelor's degree in Computer Science from Fudan University and has been studying Mathematical Logic in the Philosophy Department at Fudan University, expecting to receive his Master's degree in Philosophy in June this year. He and his team won a gold medal in the Jilin regional competition of the China Collegiate Programming Contest and received a first-class award in the Shanghai regional competition of the National Student Math Competition.

