SplunkForge

A Splunk-focused SIEM security event simulator and SPL detection rule library for detection engineers, SOC analysts, and cybersecurity students.

SplunkForge generates realistic, multi-source security event datasets — authentication logs, firewall records, Sysmon telemetry, proxy logs, and DNS queries — organized into complete multi-stage attack scenarios. It ships a library of 30+ validated SPL detection queries mapped to the MITRE ATT&CK framework, three importable Splunk dashboard XML files, and a text-based ATT&CK coverage matrix for gap analysis.

Why This Exists

Security Information and Event Management (SIEM) platforms are only as good as the detection rules running inside them. Writing good detection rules requires:

Understanding what attack activity looks like across multiple log sources simultaneously
Having realistic test data to run rules against without needing a live attack
Knowing which MITRE ATT&CK techniques your rules cover — and which ones you're blind to

SplunkForge addresses all three. It generates complete attack datasets that resemble real SIEM telemetry, provides production-quality SPL detection queries across the major ATT&CK tactics, and shows where your detection coverage has gaps.

This is the toolkit I wish had existed when I was learning detection engineering. Every generated event is modeled on attacker behavior documented in incident response reports, threat intelligence, and the ATT&CK knowledge base.

Features

Event Generators

Windows Security Event Log — EventIDs 4624, 4625, 4648, 4688, 4698, 4720, 7045 with correct field names and logon type codes
Linux auth.log / sshd — PAM authentication success and failure messages
Sysmon Telemetry — EID 1 (process create), 3 (network connect), 8 (remote thread), 10 (process access), 11 (file create), 12/13 (registry), 22 (DNS query)
Firewall Logs — Palo Alto PAN-OS style traffic logs with bytes, packets, session IDs
DNS Query Logs — Splunk Stream DNS format with TTL, record type, response data
Web Proxy Logs — Squid/Zscaler Apache Combined Log format with URL, user-agent, bytes
Web Server Access Logs — Apache and IIS W3C format with attack signature detection
Configurable timing — realistic jitter, spread intervals, events per minute
Benign baseline mixing — configurable signal-to-noise ratio for authentic datasets
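
The timing options can be pictured with a short sketch. This is not SplunkForge's generator code: `jittered_timestamps` and its parameters are hypothetical names chosen for illustration. It spaces events evenly across a window and then shifts each one by a bounded random offset, so the stream does not look machine-regular.

```python
import random
from typing import List


def jittered_timestamps(start: float, count: int, window_s: float,
                        jitter_frac: float = 0.3, seed: int = 0) -> List[float]:
    """Spread `count` events over `window_s` seconds with bounded jitter.

    Hypothetical helper for illustration, not part of the SplunkForge API.
    """
    if not 0.0 <= jitter_frac <= 0.5:
        raise ValueError("jitter_frac must be between 0 and 0.5")
    if count <= 0:
        return []
    rng = random.Random(seed)      # seeded so a dataset can be regenerated
    step = window_s / count        # even spacing before jitter
    stamps = []
    for i in range(count):
        base = start + (i + 0.5) * step                      # centre of slot i
        offset = rng.uniform(-jitter_frac, jitter_frac) * step
        stamps.append(base + offset)
    return stamps


# 20 failed logins spread over a 10-minute window
timestamps = jittered_timestamps(1710505800.0, 20, 600.0)
```

Keeping `jitter_frac` at or below 0.5 means each event stays inside its own slot, so the output is already in chronological order and never leaves the window.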

Attack Scenarios (The Showpiece)

Five complete, multi-stage attack kill chains with realistic timing, cross-source correlation, and MITRE ATT&CK metadata:

Scenario	Phases	Event Types	ATT&CK Techniques
Brute Force	Failures → Success → Discovery → Persistence	WinEventLog, linux:auth, Sysmon, PowerShell	T1110.001, T1110.003, T1078, T1059.001, T1053.005
Lateral Movement	Phishing → Recon → Cred Dump → Pivot → Persist	WinEventLog, Sysmon, Proxy, Firewall	T1566.001, T1003.001, T1021.002, T1543.003
Data Exfiltration	Recon → Stage → Exfil → Cleanup	WinEventLog, DNS, Proxy, Sysmon	T1135, T1074.001, T1048.003, T1070.001
Ransomware	Phishing → Download → Disable AV → Encrypt → Ransom Note	WinEventLog, Sysmon, Proxy, PowerShell	T1566.001, T1562.001, T1490, T1486
Insider Threat	After-hours → Sensitive Shares → Bulk Copy → Cloud/USB	WinEventLog, Proxy, Sysmon	T1078, T1039, T1567.002, T1025

SPL Detection Rule Library

30+ production-quality detection queries across all major MITRE ATT&CK tactics:

Rule ID	Name	Tactic	Severity
AUTH-001	Multiple Failed Logins from Single Source	Credential Access	High
AUTH-002	Password Spraying - Low Rate Across Many Accounts	Credential Access	High
AUTH-003	Brute Force Followed by Successful Login	Credential Access	Critical
AUTH-004	Pass-the-Hash Indicator (Explicit Credential Use)	Lateral Movement	High
AUTH-005	Account Lockout Storm	Credential Access	Medium
AUTH-006	New Local Admin Account Created	Persistence	High
EXEC-001	PowerShell Encoded Command Execution	Execution	High
EXEC-002	LOLBin Spawned by Office Application (Macro)	Execution	Critical
EXEC-003	WMI Remote Command Execution	Execution	High
EXEC-004	Certutil Download Cradle	Defense Evasion	High
EXEC-005	PowerShell Script Block - Offensive Keywords	Execution	Critical
PERS-001	Suspicious Registry Run Key Modification	Persistence	High
PERS-002	Scheduled Task Created by Suspicious Process	Persistence	High
PERS-003	Service Installed in Suspicious Location	Persistence	High
PERS-004	WMI Event Subscription (Fileless Persistence)	Persistence	Critical
LAT-001	Pass-the-Hash / Lateral Movement via SMB	Lateral Movement	High
LAT-002	LSASS Memory Access (Credential Dumping)	Credential Access	Critical
LAT-003	PsExec-Style Remote Service Creation	Lateral Movement	High
LAT-004	Remote Thread Injection (Process Injection)	Defense Evasion	Critical
LAT-005	WinRM / PSRemoting Lateral Movement	Lateral Movement	High
EXFIL-001	DNS Tunneling - Long Subdomain Queries	Exfiltration	Critical
EXFIL-002	Bulk Upload to Cloud Storage	Exfiltration	High
EXFIL-003	Large Outbound Transfer to Non-Business IP	Exfiltration	High
EXFIL-004	Staging Directory - Bulk File Copy to Temp	Collection	Medium
EXFIL-005	USB Device Connected - Possible Data Theft	Exfiltration	Medium
C2-001	C2 Beaconing - Regular Connection Intervals	C2	High
C2-002	Outbound Connection on Non-Standard Port	C2	High
C2-003	DNS over HTTPS to Non-Corporate Resolver	C2	Medium
C2-004	Protocol Tunneling - Suspicious POST Volume	C2	High
C2-005	Self-Signed Certificate on External HTTPS	C2	High

MITRE ATT&CK Coverage Matrix

A text-based coverage matrix showing which techniques have detection rules and which are gaps:

╔══════════════════════════════════════════════════════════════════════╗
║         SplunkForge — MITRE ATT&CK Enterprise Coverage Matrix        ║
╚══════════════════════════════════════════════════════════════════════╝

┌─ CREDENTIAL ACCESS
│  Coverage: [████████████░░░░░░░░] 4/6 (67%)
│
│  ✓  T1110.001     Brute Force: Password Guessing       [AUTH-001, AUTH-002, AUTH-003]
│  ✓  T1110.003     Brute Force: Password Spraying       [AUTH-002]
│  ✓  T1003.001     OS Credential Dumping: LSASS Memory  [LAT-002]
│  ✓  T1550.002     Pass the Hash                        [AUTH-004, LAT-001]
│  ✗  T1558         Steal Kerberos Tickets               [NO COVERAGE - GAP]
│  ✗  T1040         Network Sniffing                     [NO COVERAGE - GAP]
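
The arithmetic behind a block like this is small enough to sketch. The function below is an illustration, not the code in mitre_mapper.py: `coverage_summary` is a hypothetical name, and the inputs are copied from the Credential Access block shown here.

```python
from typing import Dict, List, Tuple


def coverage_summary(techniques: List[str],
                     rule_map: Dict[str, List[str]]) -> Tuple[List[str], List[str], float]:
    """Split techniques into covered and gap lists and return the covered percentage."""
    covered = [t for t in techniques if rule_map.get(t)]
    gaps = [t for t in techniques if not rule_map.get(t)]
    pct = 100.0 * len(covered) / len(techniques) if techniques else 0.0
    return covered, gaps, pct


# Technique and rule IDs taken from the Credential Access block.
credential_access = ["T1110.001", "T1110.003", "T1003.001",
                     "T1550.002", "T1558", "T1040"]
rules = {
    "T1110.001": ["AUTH-001", "AUTH-002", "AUTH-003"],
    "T1110.003": ["AUTH-002"],
    "T1003.001": ["LAT-002"],
    "T1550.002": ["AUTH-004", "LAT-001"],
}

covered, gaps, pct = coverage_summary(credential_access, rules)
print(f"{len(covered)}/{len(credential_access)} ({pct:.0f}%) gaps={gaps}")
# prints: 4/6 (67%) gaps=['T1558', 'T1040']
```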

Splunk Dashboards

Three importable Splunk dashboard XML files:

Security Overview — Event volume trends, severity distribution, top source IPs, auth failure heatmap, high-severity alert feed
Threat Hunting — Interactive investigation workspace with process chain explorer, lateral movement indicators, DNS anomaly detection, C2 beacon analysis
Attack Timeline — Kill chain phase progression, per-host activity summary, cross-sourcetype correlation, attacker IP drill-down

Output Formatters

Format	Use Case
Splunk JSON	HTTP Event Collector (HEC) ingestion, one event per line
CSV	Splunk CSV sourcetype, spreadsheet analysis
Syslog (RFC 5424)	rsyslog, syslog-ng, universal SIEM input
CEF	ArcSight, QRadar, any CEF-compatible SIEM

Installation

git clone https://github.com/marez8505/SplunkForge
cd SplunkForge
pip install -r requirements.txt
# or install as package
pip install -e .

Requirements: Python 3.8+, pyyaml, jinja2

Usage

Run an Attack Scenario

# Brute force attack → JSON output
python -m splunkforge scenario --type brute_force --output ./events/ --format json

# Ransomware with 100 encrypted files → CSV
python -m splunkforge scenario --type ransomware --target-count 100 --output ./events/ --format csv

# Lateral movement → syslog format (without background noise)
python -m splunkforge scenario --type lateral_movement --output ./events/ --format syslog --no-noise

# Data exfiltration → CEF format
python -m splunkforge scenario --type data_exfiltration --output ./events/ --format cef

# Insider threat scenario
python -m splunkforge scenario --type insider_threat --output ./events/ --format json

Generate Mixed Event Stream

# 1 hour of mixed events at 100/min with 5% attack ratio
python -m splunkforge generate --duration 60 --epm 100 --attack-ratio 0.05 --output ./events/

# 30 minutes, high volume, low noise
python -m splunkforge generate --duration 30 --epm 500 --attack-ratio 0.10 --format csv

Query Detection Rules

# List all rules
python -m splunkforge rules --list

# Rules for a specific MITRE technique
python -m splunkforge rules --technique T1110

# Rules for a tactic
python -m splunkforge rules --tactic "Credential Access"

# Full details for one rule
python -m splunkforge rules --id AUTH-001

# Filter by severity
python -m splunkforge rules --severity critical

# Export as Splunk savedsearches.conf
python -m splunkforge rules --export ./spl_rules.conf

ATT&CK Coverage Analysis

# Print matrix to terminal
python -m splunkforge coverage

# Write to file
python -m splunkforge coverage --output ./coverage_matrix.txt

# Show only gaps
python -m splunkforge coverage --gaps

# Full coverage report with all indicators
python -m splunkforge coverage --output ./coverage.txt --gaps

Demo Mode

# Run everything: all scenarios + rule library + coverage matrix
python -m splunkforge demo

# With custom output directory
python -m splunkforge demo --output ./demo_output/

Sample Output

Splunk JSON (HEC Format)

{"time": 1710505800.0, "host": "WKSTN-0042", "source": "splunkforge",
 "sourcetype": "WinEventLog:Security", "index": "main",
 "event": {"EventCode": 4625, "TargetUserName": "administrator",
           "IpAddress": "198.51.100.42", "LogonType": 3,
           "SubStatus": "0xC000006A", "FailureReason": "Wrong password"}}

Syslog (RFC 5424)

<131>1 2024-03-15T10:30:00.000Z WKSTN-0042 WinEvtSecurity 4248 WinEventLog_Security
[splunkforge@57032 EventCode="4625" src_ip="198.51.100.42" TargetUserName="administrator"]
EventID=4625 User=administrator Src=198.51.100.42 Severity=medium Malicious=true

CEF (Common Event Format)

<20>Mar 15 10:30:00 WKSTN-0042 CEF:0|Microsoft|Windows Security Event Log|1.0|4625|Windows Security Event|5|
rt=1710505800000 dhost=WKSTN-0042 src=198.51.100.42 duser=administrator
splunkforgeSeverity=medium splunkforgeMalicious=true

Attack Scenarios — Detailed Description

Brute Force Scenario

Duration: 30–45 minutes

Models a credential brute force attack against RDP or SSH:

Phase 1 — Attacker generates 10–50 failed login attempts (Event 4625 on Windows, or "Failed password for ..." in sshd logs) from a single external IP, spread over a configurable window with realistic jitter
Phase 2 — Successful login after the brute force (Event 4624 with LogonType=10 for RDP, or "Accepted password for ..." in sshd logs)
Phase 3 — Post-exploitation discovery: whoami, net user /domain, systeminfo, ipconfig /all, arp -a, netstat -ano, tasklist /v
Phase 4 — Persistence installation via scheduled task (Event 4698) or registry Run key (Sysmon EID 13) plus optional PowerShell download cradle

Key detection opportunities: Spike in Event 4625 from one IP; brute force + success correlation; suspicious process parent-child chain; registry Run key pointing to a binary in %TEMP%.

Lateral Movement Scenario

Duration: 60–120 minutes

Models an adversary who phishes in, dumps credentials, and moves through the environment:

Phase 1 — Web proxy log shows download of malicious Office document; Word spawns mshta.exe (parent-child anomaly)
Phase 2 — Local recon: whoami /groups, net group "Domain Admins" /domain, PowerShell LDAP query dumping user list to CSV
Phase 3 — LSASS memory access (Sysmon EID 10, GrantedAccess=0x1010) simulating Mimikatz; process creation of suspicious tool
Phase 4 — SMB connection to target hosts (firewall log), remote service creation (PSEXESVC pattern), network logon type 3 on pivot hosts
Phase 5 — New service install on final pivot host; WMI event subscription for fileless persistence

Key detection opportunities: Office spawning LOLBin; LSASS access with suspicious grants; same user authenticating to 3+ hosts; PsExec service name pattern.

Data Exfiltration Scenario

Duration: 90–180 minutes

Models internal recon, data staging, and exfiltration with cleanup:

Phase 1 — net view /all /domain, PowerShell ping sweep, wmic /node: share get
Phase 2 — SMB share access events (Event 5140) to Finance/HR/Legal shares not normally accessed; robocopy staging to C:\Temp; file archive creation (Sysmon EID 11, large file size)
Phase 3 (DNS) — 50–100 DNS queries with 30–60 char encoded subdomains (stream:dns), DNS tunneling tool process creation
Phase 3 (HTTP) — Large HTTPS POST to external IP (proxy log, bytes_out > 50MB), Sysmon network connection event
Phase 4 — wevtutil.exe cl Security, wevtutil.exe cl System, vssadmin.exe delete shadows /all /quiet, staging directory deletion

Key detection opportunities: DNS query length > 50 chars, high query rate; large POST to unknown IP; wevtutil clearing event logs; vssadmin shadow deletion.
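
The DNS tunneling signal (long query names issued at a high rate by one host) can be prototyped outside Splunk before writing the SPL. The sketch below is an assumption-laden illustration, not the EXFIL-001 rule: `suspicious_dns_sources` is a hypothetical name, the 50-character cutoff comes from the text, and the minimum count of 20 is an arbitrary placeholder.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suspicious_dns_sources(queries: Iterable[Tuple[str, str]],
                           min_len: int = 50,
                           min_count: int = 20) -> List[str]:
    """Return source IPs with at least `min_count` queries longer than `min_len` chars.

    `queries` is an iterable of (src_ip, query_name) pairs.
    """
    long_queries = Counter(src for src, name in queries if len(name) > min_len)
    return sorted(src for src, n in long_queries.items() if n >= min_count)


# One host tunnelling data in long subdomains, one host browsing normally
sample = [("10.0.0.5", "a" * 45 + ".evil.example")] * 25
sample += [("10.0.0.9", "www.example.com")] * 25
flagged = suspicious_dns_sources(sample)
```

Both thresholds would need tuning against a benign baseline, since CDN and telemetry hostnames can also be long.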

Ransomware Scenario

Duration: 45–90 minutes

Models a Ryuk/LockBit-style attack from phishing to encryption:

Phase 1 — Proxy log shows .docm download; Word spawns cmd.exe /c powershell.exe -enc <base64>
Phase 2 — PowerShell encoded command event (EID 4104); HTTP GET for stage2.ps1; ransomware binary dropped (Sysmon EID 11)
Phase 3 — Set-MpPreference -DisableRealtimeMonitoring $true; net stop WinDefend; vssadmin delete shadows /all /quiet; bcdedit /set {default} recoveryenabled no
Phase 4 — Ransomware process spawned by cmd.exe; rapid Sysmon EID 11 file creation events (10–100+ files with encrypted extensions like .docx.a3f1b2)
Phase 5 — Ransom note files (README_FOR_DECRYPT.txt) written to each directory; C2 callback (Sysmon network connect); registry Run key persistence

Key detection opportunities: Office macro spawning PowerShell with -enc; disabling Defender via PowerShell; mass file creation in rapid succession; vssadmin shadow deletion.

Insider Threat Scenario

Duration: 3–6 hours (spread across evening hours)

Models a malicious insider using legitimate credentials:

Phase 1 — Workstation logon at 10 PM (Event 4624 LogonType=2 at unusual hour); VPN connection from home IP
Phase 2 — Event 5140 share access to Finance$, HR$, Legal$ shares not normally accessed by this user; dir /s /b on sensitive share
Phase 3 — robocopy bulk copy from shares to Desktop; large proxy traffic to OneDrive/Dropbox/Box (multiple POST requests, bytes_out > 100MB)
Phase 4 (cloud) — HTTP PUT to cloud storage API (bytes_out > 200MB); OneDrive sync process network connection; elevated cloud domain DNS queries
Phase 4 (USB) — Explorer.exe opening E:\ (USB drive letter); robocopy Desktop to USB drive
Logoff — Normal Event 4634 logoff

Key detection opportunities: After-hours logon; share access outside of user's normal scope; large outbound to cloud storage; USB device insertion with subsequent file copy.
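
The after-hours check is the simplest of these signals to prototype. The sketch below assumes business hours of 08:00 to 18:00 on weekdays, evaluated in UTC; those values and the name `is_after_hours` are illustrative assumptions, not settings taken from SplunkForge.

```python
from datetime import datetime, timezone, tzinfo


def is_after_hours(epoch: float, start_hour: int = 8, end_hour: int = 18,
                   tz: tzinfo = timezone.utc) -> bool:
    """True if the timestamp falls on a weekend or outside start_hour..end_hour."""
    local = datetime.fromtimestamp(epoch, tz)
    if local.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
        return True
    return not (start_hour <= local.hour < end_hour)


friday_midday = 1710505800.0          # 2024-03-15 12:30 UTC
friday_10pm = 1710540000.0            # 2024-03-15 22:00 UTC
saturday_midday = friday_midday + 86400
```

In practice the comparison should use the user's own time zone and usual working pattern, which is why the scenario pairs the late logon with other signals rather than relying on it alone.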

MITRE ATT&CK Coverage

The splunkforge coverage command generates a full coverage matrix. Current coverage by tactic:

Tactic	Rules	Coverage
Initial Access	2	T1566.001 via scenario metadata
Execution	5	T1059.001, T1047, T1204.002, T1218
Persistence	4	T1547.001, T1543.003, T1053.005, T1546.003
Defense Evasion	4	T1055, T1070.001, T1218, T1562.001
Credential Access	6	T1110.001/.003, T1003.001, T1550.002, T1136
Lateral Movement	5	T1021.002, T1021.006, T1055, T1550.002
Collection	2	T1074.001, T1039
Exfiltration	5	T1048.003, T1567.002, T1048, T1052.001, T1074.001
Command & Control	5	T1071.001, T1071.004, T1572, T1571, T1573

Detection gaps (identified via splunkforge coverage --gaps) represent opportunities to add new rules — this is the real value for detection engineering practice.

Splunk Integration

Importing Generated Events

Via HTTP Event Collector (HEC):

# Send JSON events via HEC
curl -k https://splunk-host:8088/services/collector/event \
  -H "Authorization: Splunk YOUR_HEC_TOKEN" \
  -d @./events/brute_force_20240315.json
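
The same HEC request can be built from Python with only the standard library. This is a sketch under stated assumptions: `hec_payload` and `build_hec_request` are hypothetical helpers, the host and token are the placeholders from the curl example, and the request is constructed but not sent.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable


def hec_payload(envelopes: Iterable[Dict[str, Any]]) -> bytes:
    """Serialize HEC envelopes one per line; HEC accepts several events per request."""
    return "\n".join(json.dumps(e) for e in envelopes).encode("utf-8")


def build_hec_request(url: str, token: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST to the HEC event endpoint."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Authorization": f"Splunk {token}",
                 "Content-Type": "application/json"},
    )


envelope = {
    "time": 1710505800.0, "host": "WKSTN-0042", "source": "splunkforge",
    "sourcetype": "WinEventLog:Security", "index": "main",
    "event": {"EventCode": 4625, "TargetUserName": "administrator"},
}
req = build_hec_request("https://splunk-host:8088/services/collector/event",
                        "YOUR_HEC_TOKEN", hec_payload([envelope]))
# To send: urllib.request.urlopen(req)
# That call needs a reachable Splunk instance and a TLS certificate Python trusts.
```

Note that curl's -k flag skips certificate verification; urllib verifies certificates by default, so a lab instance with a self-signed certificate would need an explicit SSL context.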

Via Splunk CLI (one-shot indexing):

/opt/splunk/bin/splunk add oneshot ./brute_force_events.json \
  -index main -sourcetype splunkforge_json

Via Splunk Web:

Settings → Data Inputs → Files & Directories → Add New
Upload your generated .json, .csv, or .syslog file
Set sourcetype: _json for JSON, csv for CSV, syslog for syslog

Importing Dashboards

In Splunk Web: Settings → User Interface → Views
Click Create New View → Import
Paste the XML from dashboards/security_overview.xml, threat_hunting.xml, or attack_timeline.xml
Save and navigate to the dashboard

Importing Detection Rules

Export rules as a savedsearches.conf file:

python -m splunkforge rules --export ./spl_rules.conf

Then place the conf file in $SPLUNK_HOME/etc/apps/YOUR_APP/local/savedsearches.conf and restart Splunk, or import individual searches through Splunk Web → Search & Reporting → Activity → Searches, Reports, and Alerts.
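
For orientation, a savedsearches.conf stanza is a bracketed name followed by key = value settings. The sketch below renders one using standard settings (search, enableSched, cron_schedule, dispatch.earliest_time, dispatch.latest_time). It illustrates the file format only: `saved_search_stanza`, the schedule, and the example SPL are assumptions, and the output of the --export command may differ.

```python
def saved_search_stanza(name: str, spl: str, cron: str = "*/15 * * * *",
                        earliest: str = "-15m", latest: str = "now") -> str:
    """Render one scheduled search as a savedsearches.conf stanza."""
    # Join a multi-line query into one line so it fits a single 'search =' setting.
    search = " ".join(part.strip() for part in spl.splitlines() if part.strip())
    lines = [
        f"[{name}]",
        f"search = {search}",
        "enableSched = 1",
        f"cron_schedule = {cron}",
        f"dispatch.earliest_time = {earliest}",
        f"dispatch.latest_time = {latest}",
    ]
    return "\n".join(lines) + "\n"


# Illustrative query only; not the SPL shipped with AUTH-001.
stanza = saved_search_stanza(
    "AUTH-001 Multiple Failed Logins from Single Source",
    "index=main EventCode=4625\n| stats count as failures by src_ip\n| where failures > 10",
)
```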

Detection Engineering Concepts

What is SPL?

Search Processing Language (SPL) is Splunk's query language. SplunkForge detection rules use core SPL constructs:

stats — Aggregate events: stats count as failures by src_ip counts failures grouped by source IP
where — Filter results: where failures > 10 applies the threshold
eval — Calculate fields: eval risk=case(failures>50,"HIGH", failures>10,"MEDIUM", true(),"LOW")
streamstats — Running calculations over time (used for beacon interval analysis)
timechart — Time-series aggregation for volume trending
transaction — Group related events: used for correlating brute-force + success
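
If SPL is new to you, it can help to see the stats, where, and eval examples from this list as ordinary code. The sketch below is a plain-Python analogue operating on a list of event dicts; the function names are hypothetical, and the `src_ip` field name follows the SPL examples.

```python
from collections import Counter
from typing import Dict, Iterable, List


def risk_for(failures: int) -> str:
    # eval risk=case(failures>50,"HIGH", failures>10,"MEDIUM", true(),"LOW")
    if failures > 50:
        return "HIGH"
    if failures > 10:
        return "MEDIUM"
    return "LOW"


def failed_logins_by_source(events: Iterable[Dict], threshold: int = 10) -> List[Dict]:
    # stats count as failures by src_ip
    counts = Counter(e["src_ip"] for e in events)
    # where failures > 10, then attach the eval result to each surviving row
    rows = [{"src_ip": ip, "failures": n, "risk": risk_for(n)}
            for ip, n in counts.items() if n > threshold]
    return sorted(rows, key=lambda r: r["failures"], reverse=True)


events = ([{"src_ip": "198.51.100.42"}] * 60
          + [{"src_ip": "203.0.113.7"}] * 12
          + [{"src_ip": "192.0.2.10"}] * 3)
results = failed_logins_by_source(events)
```

Because the where clause runs before eval in this pipeline, the "LOW" branch can never appear in the output; it only matters if the threshold filter is removed.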

Correlation vs. Alerting

Detection rules trigger on individual event patterns. Correlation rules join patterns across multiple events or sourcetypes (e.g., AUTH-003 joins failures + success from the same IP). SplunkForge's more advanced rules demonstrate multi-step correlation using join, transaction, and lookup-based enrichment.

Alert Thresholds and False Positives

Every rule includes false_positive_notes. Good detection engineering starts with understanding what legitimate activity looks like before tuning thresholds. The --attack-ratio flag in generate mode lets you tune the signal-to-noise ratio to test rule sensitivity.
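
Because generated events carry a malicious flag (see the Malicious field in the sample output), rule sensitivity can be scored once you know which events a rule matched. The sketch below computes precision and recall; the `id` and `malicious` keys and the function name are hypothetical and would need mapping to the real field names.

```python
from typing import Dict, Iterable, Set, Tuple


def precision_recall(events: Iterable[Dict], fired_ids: Set[int]) -> Tuple[float, float]:
    """Score a rule given labelled events and the IDs of events it matched."""
    tp = fp = fn = 0
    for ev in events:
        fired = ev["id"] in fired_ids
        if fired and ev["malicious"]:
            tp += 1          # rule fired on attack activity
        elif fired:
            fp += 1          # rule fired on benign noise
        elif ev["malicious"]:
            fn += 1          # attack activity the rule missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


labelled = [{"id": i, "malicious": i < 4} for i in range(10)]
precision, recall = precision_recall(labelled, fired_ids={0, 1, 2, 7})
```

Repeating the measurement at several attack ratios shows how quickly false positives grow as benign volume increases.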

The Detection Gap Analysis Workflow

Generate SplunkForge events for a scenario
Run your detection rules against the events in Splunk
Run splunkforge coverage to see which ATT&CK techniques have no rules
Research the gap techniques and write new SPL rules
Re-run the scenario and verify the new rules fire

This is the core loop of detection engineering.

Project Structure

SplunkForge/
├── splunkforge/
│   ├── main.py                    # CLI entry point
│   ├── utils.py                   # IP/hostname generators, constants
│   ├── generators/                # Event generators by log source
│   ├── scenarios/                 # Multi-stage attack simulators
│   ├── detection/
│   │   ├── spl_library.py         # Rule loading and query engine
│   │   ├── mitre_mapper.py        # ATT&CK coverage matrix generator
│   │   └── rules/                 # YAML rule definitions
│   └── formatters/                # JSON, CSV, syslog, CEF output
├── dashboards/                    # Splunk dashboard XML files
├── tests/                         # 145-test unittest suite
├── sample_output/                 # Output format documentation
├── requirements.txt
├── setup.py
└── LICENSE

Running Tests

pip install -r requirements.txt
python -m pytest tests/ -v

# Run specific test modules
python -m pytest tests/test_generators.py -v
python -m pytest tests/test_scenarios.py -v
python -m pytest tests/test_spl_rules.py -v
python -m pytest tests/test_formatters.py -v

145 tests covering event structure validation, scenario chronology, SPL rule YAML schema compliance, MITRE technique ID format validation, and formatter output correctness.

Contributing

Fork the repository
Create a feature branch: git checkout -b feature/new-scenario
Write detection rules in the appropriate YAML file under splunkforge/detection/rules/
Add corresponding tests in tests/
Run python -m pytest tests/ — all tests must pass
Submit a pull request

Adding Detection Rules

Rules follow a strict YAML schema (see splunkforge/detection/rules/authentication.yml for examples). Required fields: id, name, description, mitre_attack, severity, spl. The test suite validates schema compliance automatically.
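
Before opening a pull request, a rule can be sanity-checked with a few lines of Python once it is loaded into a dict. The sketch below is not the project's test code: it assumes `mitre_attack` is a list of technique IDs and that severity is one of low, medium, high, or critical, neither of which is confirmed by this README.

```python
import re
from typing import Any, Dict, List

REQUIRED_FIELDS = ("id", "name", "description", "mitre_attack", "severity", "spl")
TECHNIQUE_ID = re.compile(r"^T\d{4}(\.\d{3})?$")      # e.g. T1110 or T1110.001
SEVERITIES = {"low", "medium", "high", "critical"}    # assumed value set


def validate_rule(rule: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the rule looks well-formed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not rule.get(name)]
    for technique in rule.get("mitre_attack") or []:  # assumed to be a list of IDs
        if not TECHNIQUE_ID.match(str(technique)):
            errors.append(f"bad technique id: {technique}")
    severity = rule.get("severity")
    if severity and str(severity).lower() not in SEVERITIES:
        errors.append(f"unknown severity: {severity}")
    return errors


good_rule = {
    "id": "AUTH-001",
    "name": "Multiple Failed Logins from Single Source",
    "description": "Many failed logons from one source IP.",
    "mitre_attack": ["T1110.001"],
    "severity": "high",
    "spl": "stats count as failures by src_ip | where failures > 10",
}
bad_rule = {**good_rule, "mitre_attack": ["T11"], "spl": ""}
```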

License

MIT License — see LICENSE for full text.

References

MITRE ATT&CK Enterprise Matrix
Splunk SPL Documentation
Sysmon Event ID Reference
Windows Security Event Log Reference
LOLBAS Project
Sigma Rules — Additional detection rule inspiration
CEF Implementation Standard
