SplunkForge

A Splunk-focused SIEM security event simulator and SPL detection rule library for detection engineers, SOC analysts, and cybersecurity students.

SplunkForge generates realistic, multi-source security event datasets — authentication logs, firewall records, Sysmon telemetry, proxy logs, and DNS queries — organized into complete multi-stage attack scenarios. It ships a library of 30+ validated SPL detection queries mapped to the MITRE ATT&CK framework, three importable Splunk dashboard XML files, and a text-based ATT&CK coverage matrix for gap analysis.


Why This Exists

Security Information and Event Management (SIEM) platforms are only as good as the detection rules running inside them. Writing good detection rules requires:

  1. Understanding what attack activity looks like across multiple log sources simultaneously
  2. Having realistic test data to run rules against without needing a live attack
  3. Knowing which MITRE ATT&CK techniques your rules cover — and which ones you're blind to

SplunkForge solves all three. It generates complete, realistic attack datasets that look like actual SIEM telemetry, provides production-quality SPL detection queries for every major attack tactic, and shows you exactly where your detection coverage has gaps.

This is the toolkit I wish existed when learning detection engineering. Every event generated here reflects real attacker behavior studied from incident response reports, threat intelligence, and the ATT&CK knowledge base.


Features

Event Generators

  • Windows Event Log — Security EventIDs 4624, 4625, 4648, 4688, 4698, 4720 and System EventID 7045 (service installation), with correct field names and logon type codes
  • Linux auth.log / sshd — PAM authentication success and failure messages
  • Sysmon Telemetry — EID 1 (process create), 3 (network connect), 8 (remote thread), 10 (process access), 11 (file create), 12/13 (registry), 22 (DNS query)
  • Firewall Logs — Palo Alto PAN-OS style traffic logs with bytes, packets, session IDs
  • DNS Query Logs — Splunk Stream DNS format with TTL, record type, response data
  • Web Proxy Logs — Squid/Zscaler-style records in Apache Combined Log format with URL, user-agent, bytes
  • Web Server Access Logs — Apache and IIS W3C format with attack signature detection
  • Configurable timing — realistic jitter, spread intervals, events per minute
  • Benign baseline mixing — configurable signal-to-noise ratio for authentic datasets
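
The timing options above can be pictured with a small sketch. This is not SplunkForge's generator code: `jittered_timestamps` and its parameters are hypothetical names, used only to illustrate how an events-per-minute rate plus a jitter fraction might produce irregular timestamps.

```python
import random
from typing import List, Optional


def jittered_timestamps(start: float, count: int, epm: float,
                        jitter: float = 0.3,
                        seed: Optional[int] = None) -> List[float]:
    """Return `count` epoch timestamps averaging `epm` events per minute.

    Each gap is the base interval scaled by a random factor in
    [1 - jitter, 1 + jitter], so events never land on a fixed cadence.
    """
    rng = random.Random(seed)
    base = 60.0 / epm  # seconds between events at a perfectly even rate
    stamps, current = [], start
    for _ in range(count):
        stamps.append(current)
        current += base * (1 + rng.uniform(-jitter, jitter))
    return stamps


if __name__ == "__main__":
    for ts in jittered_timestamps(1710505800.0, 5, epm=100, seed=7):
        print(f"{ts:.3f}")
```

Passing a seed keeps a dataset reproducible, which helps when a detection rule is being tuned against the same events repeatedly.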

Attack Scenarios (The Showpiece)

Five complete, multi-stage attack kill chains with realistic timing, cross-source correlation, and MITRE ATT&CK metadata:

| Scenario | Phases | Event Types | ATT&CK Techniques |
|---|---|---|---|
| Brute Force | Failures → Success → Discovery → Persistence | WinEventLog, linux:auth, Sysmon, PowerShell | T1110.001, T1110.003, T1078, T1059.001, T1053.005 |
| Lateral Movement | Phishing → Recon → Cred Dump → Pivot → Persist | WinEventLog, Sysmon, Proxy, Firewall | T1566.001, T1003.001, T1021.002, T1543.003 |
| Data Exfiltration | Recon → Stage → Exfil → Cleanup | WinEventLog, DNS, Proxy, Sysmon | T1135, T1074.001, T1048.003, T1070.001 |
| Ransomware | Phishing → Download → Disable AV → Encrypt → Ransom Note | WinEventLog, Sysmon, Proxy, PowerShell | T1566.001, T1562.001, T1490, T1486 |
| Insider Threat | After-hours → Sensitive Shares → Bulk Copy → Cloud/USB | WinEventLog, Proxy, Sysmon | T1078, T1039, T1567.002, T1025 |

SPL Detection Rule Library

30+ production-quality detection queries across all major MITRE ATT&CK tactics:

| Rule ID | Name | Tactic | Severity |
|---|---|---|---|
| AUTH-001 | Multiple Failed Logins from Single Source | Credential Access | High |
| AUTH-002 | Password Spraying - Low Rate Across Many Accounts | Credential Access | High |
| AUTH-003 | Brute Force Followed by Successful Login | Credential Access | Critical |
| AUTH-004 | Pass-the-Hash Indicator (Explicit Credential Use) | Lateral Movement | High |
| AUTH-005 | Account Lockout Storm | Credential Access | Medium |
| AUTH-006 | New Local Admin Account Created | Persistence | High |
| EXEC-001 | PowerShell Encoded Command Execution | Execution | High |
| EXEC-002 | LOLBin Spawned by Office Application (Macro) | Execution | Critical |
| EXEC-003 | WMI Remote Command Execution | Execution | High |
| EXEC-004 | Certutil Download Cradle | Defense Evasion | High |
| EXEC-005 | PowerShell Script Block - Offensive Keywords | Execution | Critical |
| PERS-001 | Suspicious Registry Run Key Modification | Persistence | High |
| PERS-002 | Scheduled Task Created by Suspicious Process | Persistence | High |
| PERS-003 | Service Installed in Suspicious Location | Persistence | High |
| PERS-004 | WMI Event Subscription (Fileless Persistence) | Persistence | Critical |
| LAT-001 | Pass-the-Hash / Lateral Movement via SMB | Lateral Movement | High |
| LAT-002 | LSASS Memory Access (Credential Dumping) | Credential Access | Critical |
| LAT-003 | PsExec-Style Remote Service Creation | Lateral Movement | High |
| LAT-004 | Remote Thread Injection (Process Injection) | Defense Evasion | Critical |
| LAT-005 | WinRM / PSRemoting Lateral Movement | Lateral Movement | High |
| EXFIL-001 | DNS Tunneling - Long Subdomain Queries | Exfiltration | Critical |
| EXFIL-002 | Bulk Upload to Cloud Storage | Exfiltration | High |
| EXFIL-003 | Large Outbound Transfer to Non-Business IP | Exfiltration | High |
| EXFIL-004 | Staging Directory - Bulk File Copy to Temp | Collection | Medium |
| EXFIL-005 | USB Device Connected - Possible Data Theft | Exfiltration | Medium |
| C2-001 | C2 Beaconing - Regular Connection Intervals | C2 | High |
| C2-002 | Outbound Connection on Non-Standard Port | C2 | High |
| C2-003 | DNS over HTTPS to Non-Corporate Resolver | C2 | Medium |
| C2-004 | Protocol Tunneling - Suspicious POST Volume | C2 | High |
| C2-005 | Self-Signed Certificate on External HTTPS | C2 | High |

MITRE ATT&CK Coverage Matrix

A text-based coverage matrix showing which techniques have detection rules and which are gaps:

╔══════════════════════════════════════════════════════════════════════╗
║         SplunkForge — MITRE ATT&CK Enterprise Coverage Matrix        ║
╚══════════════════════════════════════════════════════════════════════╝

┌─ CREDENTIAL ACCESS
│  Coverage: [████████████░░░░░░░░] 4/6 (67%)
│
│  ✓  T1110.001     Brute Force: Password Guessing       [AUTH-001, AUTH-002, AUTH-003]
│  ✓  T1110.003     Brute Force: Password Spraying       [AUTH-002]
│  ✓  T1003.001     OS Credential Dumping: LSASS Memory  [LAT-002]
│  ✓  T1550.002     Pass the Hash                        [AUTH-004, LAT-001]
│  ✗  T1558         Steal Kerberos Tickets               [NO COVERAGE - GAP]
│  ✗  T1040         Network Sniffing                     [NO COVERAGE - GAP]
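
As an illustration of how a per-tactic summary line like the one above could be derived, here is a minimal sketch. It is not the code in `mitre_mapper.py`: `coverage_summary` is a hypothetical helper, and the way the bar fill is rounded is an assumption, so the number of filled cells may differ from the tool's output.

```python
from typing import Dict, List


def coverage_summary(techniques: Dict[str, List[str]], width: int = 20) -> str:
    """Render a one-line coverage bar for a single tactic.

    `techniques` maps a technique ID to the rule IDs that cover it;
    an empty list marks a detection gap.
    """
    total = len(techniques)
    covered = sum(1 for rules in techniques.values() if rules)
    ratio = covered / total if total else 0.0
    filled = round(width * ratio)  # rounding choice is an assumption
    bar = "█" * filled + "░" * (width - filled)
    return f"Coverage: [{bar}] {covered}/{total} ({round(100 * ratio)}%)"


credential_access = {
    "T1110.001": ["AUTH-001", "AUTH-002", "AUTH-003"],
    "T1110.003": ["AUTH-002"],
    "T1003.001": ["LAT-002"],
    "T1550.002": ["AUTH-004", "LAT-001"],
    "T1558": [],
    "T1040": [],
}

if __name__ == "__main__":
    print(coverage_summary(credential_access))
```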

Splunk Dashboards

Three importable Splunk dashboard XML files:

  • Security Overview — Event volume trends, severity distribution, top source IPs, auth failure heatmap, high-severity alert feed
  • Threat Hunting — Interactive investigation workspace with process chain explorer, lateral movement indicators, DNS anomaly detection, C2 beacon analysis
  • Attack Timeline — Kill chain phase progression, per-host activity summary, cross-sourcetype correlation, attacker IP drill-down

Output Formatters

| Format | Use Case |
|---|---|
| Splunk JSON | HTTP Event Collector (HEC) ingestion, one event per line |
| CSV | Splunk CSV sourcetype, spreadsheet analysis |
| Syslog (RFC 5424) | rsyslog, syslog-ng, universal SIEM input |
| CEF | ArcSight, QRadar, any CEF-compatible SIEM |

Installation

git clone https://github.com/marez8505/SplunkForge
cd SplunkForge
pip install -r requirements.txt
# or install as package
pip install -e .

Requirements: Python 3.8+, pyyaml, jinja2


Usage

Run an Attack Scenario

# Brute force attack → JSON output
python -m splunkforge scenario --type brute_force --output ./events/ --format json

# Ransomware with 100 encrypted files → CSV
python -m splunkforge scenario --type ransomware --target-count 100 --output ./events/ --format csv

# Lateral movement → syslog format (without background noise)
python -m splunkforge scenario --type lateral_movement --output ./events/ --format syslog --no-noise

# Data exfiltration → CEF format
python -m splunkforge scenario --type data_exfiltration --output ./events/ --format cef

# Insider threat scenario
python -m splunkforge scenario --type insider_threat --output ./events/ --format json

Generate Mixed Event Stream

# 1 hour of mixed events at 100/min with 5% attack ratio
python -m splunkforge generate --duration 60 --epm 100 --attack-ratio 0.05 --output ./events/

# 30 minutes, high volume, low noise
python -m splunkforge generate --duration 30 --epm 500 --attack-ratio 0.10 --format csv

Query Detection Rules

# List all rules
python -m splunkforge rules --list

# Rules for a specific MITRE technique
python -m splunkforge rules --technique T1110

# Rules for a tactic
python -m splunkforge rules --tactic "Credential Access"

# Full details for one rule
python -m splunkforge rules --id AUTH-001

# Filter by severity
python -m splunkforge rules --severity critical

# Export as Splunk savedsearches.conf
python -m splunkforge rules --export ./spl_rules.conf

ATT&CK Coverage Analysis

# Print matrix to terminal
python -m splunkforge coverage

# Write to file
python -m splunkforge coverage --output ./coverage_matrix.txt

# Show only gaps
python -m splunkforge coverage --gaps

# Full coverage report with all indicators
python -m splunkforge coverage --output ./coverage.txt --gaps

Demo Mode

# Run everything: all scenarios + rule library + coverage matrix
python -m splunkforge demo

# With custom output directory
python -m splunkforge demo --output ./demo_output/

Sample Output

Splunk JSON (HEC Format)

{"time": 1710505800.0, "host": "WKSTN-0042", "source": "splunkforge",
 "sourcetype": "WinEventLog:Security", "index": "main",
 "event": {"EventCode": 4625, "TargetUserName": "administrator",
           "IpAddress": "198.51.100.42", "LogonType": 3,
           "SubStatus": "0xC000006A", "FailureReason": "Wrong password"}}

Syslog (RFC 5424)

<131>1 2024-03-15T10:30:00.000Z WKSTN-0042 WinEvtSecurity 4248 WinEventLog_Security
[splunkforge@57032 EventCode="4625" src_ip="198.51.100.42" TargetUserName="administrator"]
EventID=4625 User=administrator Src=198.51.100.42 Severity=medium Malicious=true
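
The `<131>` at the start of the syslog sample is the RFC 5424 priority value, computed as facility × 8 + severity. The sketch below shows the arithmetic; the function names are illustrative and are not taken from SplunkForge's formatter.

```python
from typing import Tuple


def encode_pri(facility: int, severity: int) -> int:
    """Combine a syslog facility (0-23) and severity (0-7) into a PRI value."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def decode_pri(pri: int) -> Tuple[int, int]:
    """Split a PRI value back into (facility, severity)."""
    return divmod(pri, 8)


if __name__ == "__main__":
    facility, severity = decode_pri(131)
    print(facility, severity)  # facility 16 is local0, severity 3 is "error"
```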

CEF (Common Event Format)

<20>Mar 15 10:30:00 WKSTN-0042 CEF:0|Microsoft|Windows Security Event Log|1.0|4625|Windows Security Event|5|
rt=1710505800000 dhost=WKSTN-0042 src=198.51.100.42 duser=administrator
splunkforgeSeverity=medium splunkforgeMalicious=true
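
A CEF record is a pipe-delimited header followed by space-separated `key=value` extension pairs. The following sketch assembles one and applies the usual CEF escaping (backslash and pipe in header fields; backslash, equals sign, and newline in extension values). `to_cef` is a hypothetical helper written for this example, not SplunkForge's formatter, and it omits the syslog prefix shown in the sample.

```python
from typing import Mapping


def _escape_header(value: str) -> str:
    # Header fields: escape backslash first, then the pipe delimiter.
    return value.replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value: str) -> str:
    # Extension values: escape backslash, equals sign, and newlines.
    return value.replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")


def to_cef(vendor: str, product: str, version: str, signature_id: str,
           name: str, severity: int, extension: Mapping[str, str]) -> str:
    """Build a CEF:0 record from header fields and an extension mapping."""
    header = "|".join(
        _escape_header(str(part))
        for part in ("CEF:0", vendor, product, version, signature_id, name, severity)
    )
    ext = " ".join(f"{key}={_escape_extension(str(val))}"
                   for key, val in extension.items())
    return f"{header}|{ext}"


if __name__ == "__main__":
    print(to_cef("Microsoft", "Windows Security Event Log", "1.0", "4625",
                 "Windows Security Event", 5,
                 {"src": "198.51.100.42", "duser": "administrator"}))
```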

Attack Scenarios — Detailed Description

Brute Force Scenario

Duration: 30–45 minutes

Models a credential brute force attack against RDP or SSH:

  1. Phase 1 — Attacker generates 10–50 failed login attempts (Event 4625 / Failed password for...) from a single external IP, spread over a configurable window with realistic jitter
  2. Phase 2 — Successful login after the brute force (Event 4624 LogonType=10 for RDP, or an "Accepted password for ..." message for SSH)
  3. Phase 3 — Post-exploitation discovery: whoami, net user /domain, systeminfo, ipconfig /all, arp -a, netstat -ano, tasklist /v
  4. Phase 4 — Persistence installation via scheduled task (Event 4698) or registry Run key (Sysmon EID 13) plus optional PowerShell download cradle

Key detection opportunities: Spike in Event 4625 from one IP; brute force + success correlation; suspicious process parent-child chain; registry Run key modification in %TEMP%.
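
The "brute force + success" correlation can be prototyped outside Splunk before writing SPL. The sketch below assumes events are dictionaries carrying the `time`, `EventCode`, `IpAddress`, and `TargetUserName` fields seen in the sample output; the function name and the threshold of 10 are illustrative choices, not the logic of rule AUTH-003.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def brute_force_then_success(events: Iterable[Mapping],
                             threshold: int = 10) -> List[Tuple[str, str, int]]:
    """Return (ip, user, prior_failures) for logons that follow many failures."""
    failures: Dict[str, int] = defaultdict(int)
    hits = []
    for event in sorted(events, key=lambda e: e["time"]):
        ip = event["IpAddress"]
        if event["EventCode"] == 4625:        # failed logon
            failures[ip] += 1
        elif event["EventCode"] == 4624 and failures[ip] >= threshold:
            hits.append((ip, event["TargetUserName"], failures[ip]))
    return hits


if __name__ == "__main__":
    sample = [{"time": t, "EventCode": 4625, "IpAddress": "198.51.100.42",
               "TargetUserName": "administrator"} for t in range(12)]
    sample.append({"time": 12, "EventCode": 4624, "IpAddress": "198.51.100.42",
                   "TargetUserName": "administrator"})
    print(brute_force_then_success(sample))
```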


Lateral Movement Scenario

Duration: 60–120 minutes

Models an adversary who phishes in, dumps credentials, and moves through the environment:

  1. Phase 1 — Web proxy log shows download of malicious Office document; Word spawns mshta.exe (parent-child anomaly)
  2. Phase 2 — Local recon: whoami /groups, net group "Domain Admins" /domain, PowerShell LDAP query dumping user list to CSV
  3. Phase 3 — LSASS memory access (Sysmon EID 10, GrantedAccess=0x1010) simulating Mimikatz; process creation of suspicious tool
  4. Phase 4 — SMB connection to target hosts (firewall log), remote service creation (PSEXESVC pattern), network logon type 3 on pivot hosts
  5. Phase 5 — New service install on final pivot host; WMI event subscription for fileless persistence

Key detection opportunities: Office spawning LOLBin; LSASS access with suspicious grants; same user authenticating to 3+ hosts; PsExec service name pattern.
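
The "same user authenticating to 3+ hosts" idea reduces to counting distinct hosts per user. A minimal sketch, assuming logons have already been reduced to `(user, host)` pairs; `users_on_many_hosts` is a hypothetical helper, not part of the rule library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def users_on_many_hosts(logons: Iterable[Tuple[str, str]],
                        min_hosts: int = 3) -> Dict[str, List[str]]:
    """Map each user seen on at least `min_hosts` distinct hosts to those hosts."""
    hosts_by_user: Dict[str, Set[str]] = defaultdict(set)
    for user, host in logons:
        hosts_by_user[user].add(host)
    return {user: sorted(hosts)
            for user, hosts in hosts_by_user.items()
            if len(hosts) >= min_hosts}


if __name__ == "__main__":
    pairs = [("svc_backup", "SRV-01"), ("svc_backup", "SRV-02"),
             ("svc_backup", "SRV-03"), ("alice", "WKSTN-0042")]
    print(users_on_many_hosts(pairs))
```

In practice a time window would be applied first, since a user touching three hosts over a month is far less interesting than three hosts in ten minutes.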


Data Exfiltration Scenario

Duration: 90–180 minutes

Models internal recon, data staging, and exfiltration with cleanup:

  1. Phase 1 — net view /all /domain, PowerShell ping sweep, wmic /node: share get
  2. Phase 2 — SMB share access events (Event 5140) to Finance/HR/Legal shares not normally accessed; robocopy staging to C:\Temp; file archive creation (Sysmon EID 11, large file size)
  3. Phase 3 (DNS) — 50–100 DNS queries with 30–60 char encoded subdomains (stream:dns), DNS tunneling tool process creation
  4. Phase 3 (HTTP) — Large HTTPS POST to external IP (proxy log, bytes_out > 50MB), Sysmon network connection event
  5. Phase 4 — wevtutil.exe cl Security, wevtutil.exe cl System, vssadmin.exe delete shadows /all /quiet, staging directory deletion

Key detection opportunities: DNS query length > 50 chars, high query rate; large POST to unknown IP; wevtutil clearing event logs; vssadmin shadow deletion.
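
The query-length heuristic can be sketched as follows: count over-long query names and group them by parent domain so that a burst to one domain stands out. The helper name is hypothetical, and taking the last two labels as the parent domain is a simplification that mishandles suffixes such as `co.uk`.

```python
from collections import Counter
from typing import Iterable


def long_query_counts(queries: Iterable[str], min_length: int = 50) -> Counter:
    """Count DNS query names longer than `min_length`, keyed by parent domain."""
    counts: Counter = Counter()
    for query in queries:
        name = query.rstrip(".")
        if len(name) > min_length:
            # Naive parent domain: the last two labels of the name.
            parent = ".".join(name.split(".")[-2:])
            counts[parent] += 1
    return counts


if __name__ == "__main__":
    sample = ["a" * 40 + ".tunnel.example.net", "www.example.com"]
    print(long_query_counts(sample))
```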


Ransomware Scenario

Duration: 45–90 minutes

Models a Ryuk/LockBit-style attack from phishing to encryption:

  1. Phase 1 — Proxy log shows .docm download; Word spawns cmd.exe /c powershell.exe -enc <base64>
  2. Phase 2 — PowerShell encoded command event (EID 4104); HTTP GET for stage2.ps1; ransomware binary dropped (Sysmon EID 11)
  3. Phase 3 — Set-MpPreference -DisableRealtimeMonitoring $true; net stop WinDefend; vssadmin delete shadows /all /quiet; bcdedit /set {default} recoveryenabled no
  4. Phase 4 — Ransomware process spawned by cmd.exe; rapid Sysmon EID 11 file creation events (10–100+ files with encrypted extensions like .docx.a3f1b2)
  5. Phase 5 — Ransom note files (README_FOR_DECRYPT.txt) written to each directory; C2 callback (Sysmon network connect); registry Run key persistence

Key detection opportunities: Office macro spawning PowerShell with -enc; disabling Defender via PowerShell; mass file creation in rapid succession; vssadmin shadow deletion.
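
"Mass file creation in rapid succession" is a sliding-window count. The sketch below finds the largest number of file-create events that fall inside any window of a given length; it assumes the timestamps of Sysmon EID 11 events for one process have already been extracted, and `max_burst` is an illustrative name rather than part of SplunkForge.

```python
from collections import deque
from typing import Deque, Iterable


def max_burst(timestamps: Iterable[float], window: float = 60.0) -> int:
    """Return the largest number of events within any `window`-second span."""
    recent: Deque[float] = deque()
    best = 0
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop events that are older than the window relative to `ts`.
        while ts - recent[0] > window:
            recent.popleft()
        best = max(best, len(recent))
    return best


if __name__ == "__main__":
    print(max_burst([0, 1, 2, 3, 100], window=10))
```

A rule would then compare this count to a threshold chosen from benign baselines, since installers and backup tools also create files in bursts.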


Insider Threat Scenario

Duration: 3–6 hours (spread across evening hours)

Models a malicious insider using legitimate credentials:

  1. Phase 1 — Workstation logon at 10 PM (Event 4624 LogonType=2 at unusual hour); VPN connection from home IP
  2. Phase 2 — Event 5140 share access to Finance$, HR$, Legal$ shares not normally accessed by this user; dir /s /b on sensitive share
  3. Phase 3 — robocopy bulk copy from shares to Desktop; large proxy traffic to OneDrive/Dropbox/Box (multiple POST requests, bytes_out > 100MB)
  4. Phase 4 (cloud) — HTTP PUT to cloud storage API (bytes_out > 200MB); OneDrive sync process network connection; elevated cloud domain DNS queries
  5. Phase 4 (USB) — Explorer.exe opening E:\ (USB drive letter); robocopy Desktop to USB drive
  6. Logoff — Normal Event 4634 logoff

Key detection opportunities: After-hours logon; share access outside of user's normal scope; large outbound to cloud storage; USB device insertion with subsequent file copy.
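
The after-hours check is the simplest of these signals. A minimal sketch, assuming business hours of 08:00 to 18:00 in the timestamp's own time zone; both the hours and the function name are assumptions made for illustration.

```python
from datetime import datetime


def is_after_hours(ts: datetime, start_hour: int = 8, end_hour: int = 18) -> bool:
    """Return True if `ts` falls outside [start_hour, end_hour)."""
    return not (start_hour <= ts.hour < end_hour)


if __name__ == "__main__":
    print(is_after_hours(datetime(2024, 3, 15, 22, 0)))   # 10 PM logon
    print(is_after_hours(datetime(2024, 3, 15, 10, 30)))  # mid-morning logon
```

A production rule would usually compare against each user's own logon history rather than a fixed window, because shift workers and on-call staff log on at night legitimately.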


MITRE ATT&CK Coverage

The splunkforge coverage command generates a full coverage matrix. Current coverage by tactic:

| Tactic | Rules | Coverage |
|---|---|---|
| Initial Access | 2 | T1566.001 via scenario metadata |
| Execution | 5 | T1059.001, T1047, T1204.002, T1218 |
| Persistence | 4 | T1547.001, T1543.003, T1053.005, T1546.003 |
| Defense Evasion | 4 | T1055, T1070.001, T1218, T1562.001 |
| Credential Access | 6 | T1110.001/.003, T1003.001, T1550.002, T1136 |
| Lateral Movement | 5 | T1021.002, T1021.006, T1055, T1550.002 |
| Collection | 2 | T1074.001, T1039 |
| Exfiltration | 5 | T1048.003, T1567.002, T1048, T1052.001, T1074.001 |
| Command & Control | 5 | T1071.001, T1071.004, T1572, T1571, T1573 |

Detection gaps (identified via splunkforge coverage --gaps) represent opportunities to add new rules — this is the real value for detection engineering practice.


Splunk Integration

Importing Generated Events

Via HTTP Event Collector (HEC):

# Send JSON events via HEC
curl -k https://splunk-host:8088/services/collector/event \
  -H "Authorization: Splunk YOUR_HEC_TOKEN" \
  -d @./events/brute_force_20240315.json
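
The same request can be prepared from Python with only the standard library. This sketch builds the request object without sending it; `build_hec_request` is a hypothetical helper, and the host and token are the placeholders from the curl example. HEC accepts several event envelopes concatenated in one request body, which is what the newline join produces.

```python
import json
import urllib.request
from typing import Iterable, Mapping


def build_hec_request(url: str, token: str,
                      events: Iterable[Mapping]) -> urllib.request.Request:
    """Build a POST request carrying one JSON envelope per line."""
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_hec_request(
        "https://splunk-host:8088/services/collector/event",
        "YOUR_HEC_TOKEN",
        [{"sourcetype": "WinEventLog:Security", "event": {"EventCode": 4625}}],
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request), with TLS settings that
    # match your Splunk certificate setup.
```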

Via Splunk CLI (one-shot indexing):

/opt/splunk/bin/splunk add oneshot ./brute_force_events.json \
  -index main -sourcetype splunkforge_json

Via Splunk Web:

  1. Settings → Data Inputs → Files & Directories → Add New
  2. Upload your generated .json, .csv, or .syslog file
  3. Set sourcetype: _json for JSON, csv for CSV, syslog for syslog

Importing Dashboards

  1. In Splunk Web: Settings → User Interface → Views
  2. Click Create New View → Import
  3. Paste the XML from dashboards/security_overview.xml, threat_hunting.xml, or attack_timeline.xml
  4. Save and navigate to the dashboard

Importing Detection Rules

Export rules as a savedsearches.conf file:

python -m splunkforge rules --export ./spl_rules.conf

Then place the conf file in $SPLUNK_HOME/etc/apps/YOUR_APP/local/savedsearches.conf and restart Splunk, or import individual searches through Splunk Web → Search & Reporting → Activity → Searches, Reports, and Alerts.
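
For orientation, a saved search in savedsearches.conf is a stanza named after the search, with settings such as `search`, `cron_schedule`, and `enableSched`. The sketch below renders one such stanza. The exact layout that `rules --export` writes is not shown in this README, so the stanza name, schedule, and choice of settings here are assumptions.

```python
def to_stanza(name: str, spl: str, cron: str = "*/15 * * * *") -> str:
    """Render one savedsearches.conf stanza for a scheduled search."""
    # Collapse the query onto one line so no line continuations are needed.
    search = " ".join(spl.split())
    return (
        f"[{name}]\n"
        f"search = {search}\n"
        f"cron_schedule = {cron}\n"
        "enableSched = 1\n"
    )


if __name__ == "__main__":
    print(to_stanza(
        "AUTH-001 - Multiple Failed Logins from Single Source",
        "index=main EventCode=4625\n| stats count as failures by src_ip\n"
        "| where failures > 10",
    ))
```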


Detection Engineering Concepts

What is SPL?

Search Processing Language (SPL) is Splunk's query language. SplunkForge detection rules use core SPL constructs:

  • stats — Aggregate events: stats count as failures by src_ip counts failures grouped by source IP
  • where — Filter results: where failures > 10 applies the threshold
  • eval — Calculate fields: eval risk=case(failures>50,"HIGH", failures>10,"MEDIUM", true(),"LOW")
  • streamstats — Running calculations over time (used for beacon interval analysis)
  • timechart — Time-series aggregation for volume trending
  • transaction — Group related events: used for correlating brute-force + success
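
For readers more comfortable with Python than SPL, the `stats`, `where`, and `eval` examples above correspond roughly to the following. This is an analogy for learning purposes, not how Splunk executes a search; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def risk_label(failures: int) -> str:
    """eval risk=case(failures>50,"HIGH", failures>10,"MEDIUM", true(),"LOW")"""
    if failures > 50:
        return "HIGH"
    if failures > 10:
        return "MEDIUM"
    return "LOW"


def risk_by_source(events: Iterable[Mapping],
                   threshold: int = 10) -> Dict[str, Tuple[int, str]]:
    # stats count as failures by src_ip
    failures = Counter(event["src_ip"] for event in events)
    # where failures > 10, followed by the eval that labels each row
    return {ip: (count, risk_label(count))
            for ip, count in failures.items()
            if count > threshold}


if __name__ == "__main__":
    sample = ([{"src_ip": "198.51.100.42"}] * 60
              + [{"src_ip": "203.0.113.7"}] * 12
              + [{"src_ip": "10.0.0.5"}] * 3)
    print(risk_by_source(sample))
```

Note that once `where failures > 10` has run, the "LOW" branch of the `case` can no longer match, which is why the order of `where` and `eval` matters when both appear in one rule.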

Correlation vs. Alerting

Detection rules trigger on individual event patterns. Correlation rules join patterns across multiple events or sourcetypes (e.g., AUTH-003 joins failures + success from the same IP). SplunkForge's more advanced rules demonstrate multi-step correlation using join, transaction, and lookup-based enrichment.

Alert Thresholds and False Positives

Every rule includes false_positive_notes. Good detection engineering starts with understanding what legitimate activity looks like before tuning thresholds. The --attack-ratio flag in generate mode lets you tune the signal-to-noise ratio to test rule sensitivity.
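
To make the ratio concrete: at `--attack-ratio 0.05`, roughly 5 of every 100 events are attack events and the rest are benign. The sketch below shows one way such a mix could be assembled; it is not the implementation behind `generate`, and `mix_events` is a hypothetical helper.

```python
import random
from typing import List, Mapping, Optional, Sequence


def mix_events(attack: Sequence[Mapping], benign: Sequence[Mapping],
               attack_ratio: float, total: int,
               seed: Optional[int] = None) -> List[Mapping]:
    """Return `total` events with about `attack_ratio` drawn from `attack`."""
    rng = random.Random(seed)
    n_attack = round(total * attack_ratio)
    mixed = (rng.choices(attack, k=n_attack)
             + rng.choices(benign, k=total - n_attack))
    rng.shuffle(mixed)  # interleave so attack events are not clustered
    return mixed


if __name__ == "__main__":
    sample = mix_events([{"malicious": True}], [{"malicious": False}],
                        attack_ratio=0.05, total=100, seed=1)
    print(sum(event["malicious"] for event in sample), "attack events of", len(sample))
```

Lowering the ratio while keeping a rule's threshold fixed is a quick way to see at what point the rule stops firing.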

The Detection Gap Analysis Workflow

  1. Generate SplunkForge events for a scenario
  2. Run your detection rules against the events in Splunk
  3. Run splunkforge coverage to see which ATT&CK techniques have no rules
  4. Research the gap techniques and write new SPL rules
  5. Re-run the scenario and verify the new rules fire

This is the core loop of detection engineering.


Project Structure

SplunkForge/
├── splunkforge/
│   ├── main.py                    # CLI entry point
│   ├── utils.py                   # IP/hostname generators, constants
│   ├── generators/                # Event generators by log source
│   ├── scenarios/                 # Multi-stage attack simulators
│   ├── detection/
│   │   ├── spl_library.py         # Rule loading and query engine
│   │   ├── mitre_mapper.py        # ATT&CK coverage matrix generator
│   │   └── rules/                 # YAML rule definitions
│   └── formatters/                # JSON, CSV, syslog, CEF output
├── dashboards/                    # Splunk dashboard XML files
├── tests/                         # 145-test unittest suite
├── sample_output/                 # Output format documentation
├── requirements.txt
├── setup.py
└── LICENSE

Running Tests

pip install -r requirements.txt
python -m pytest tests/ -v

# Run specific test modules
python -m pytest tests/test_generators.py -v
python -m pytest tests/test_scenarios.py -v
python -m pytest tests/test_spl_rules.py -v
python -m pytest tests/test_formatters.py -v

145 tests covering event structure validation, scenario chronology, SPL rule YAML schema compliance, MITRE technique ID format validation, and formatter output correctness.


Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/new-scenario
  3. Write detection rules in the appropriate YAML file under splunkforge/detection/rules/
  4. Add corresponding tests in tests/
  5. Run python -m pytest tests/ — all tests must pass
  6. Submit a pull request

Adding Detection Rules

Rules follow a strict YAML schema (see detection/rules/authentication.yml for examples). Required fields: id, name, description, mitre_attack, severity, spl. The test suite validates schema compliance automatically.
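
Before opening a pull request, a rule can be sanity-checked locally. The sketch below checks the required fields listed above on a rule that has already been loaded into a dictionary (for example with `yaml.safe_load`), and validates the shape of an ATT&CK technique ID. The helper names and the regular expression are assumptions for illustration; the project's test suite remains the authority on the schema.

```python
import re
from typing import List, Mapping

REQUIRED_FIELDS = ("id", "name", "description", "mitre_attack", "severity", "spl")

# Technique IDs look like T1110 or T1110.001.
TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")


def missing_fields(rule: Mapping) -> List[str]:
    """Return the required fields that are absent or empty in `rule`."""
    return [field for field in REQUIRED_FIELDS if not rule.get(field)]


def is_technique_id(value: str) -> bool:
    """Return True if `value` has the form of an ATT&CK technique ID."""
    return TECHNIQUE_ID.fullmatch(value) is not None


if __name__ == "__main__":
    draft = {"id": "AUTH-007", "name": "Example rule", "severity": "high"}
    print(missing_fields(draft))
    print(is_technique_id("T1110.001"))
```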


License

MIT License — see LICENSE for full text.

