Privalyse goes beyond simple regex matching by building a semantic understanding of your code.
Privalyse parses your code (AST for Python, AST/Regex for JS) to build a graph where:
- Nodes represent variables, functions, API calls, and data sources.
- Edges represent data flow (assignments, function calls, return values).
This allows the scanner to trace data from a Source (e.g., user input) to a Sink (e.g., logging, external API).
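The graph idea above can be illustrated with a minimal sketch. It is not Privalyse's internal representation: the `FlowGraph` class and its method names are hypothetical, and nodes are plain strings for brevity. It only shows how a source-to-sink trace reduces to a path search over data-flow edges.

```python
from collections import defaultdict, deque


class FlowGraph:
    """Toy data-flow graph: nodes are names, edges point where data flows."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_flow(self, src, dst):
        """Record that data flows from src to dst."""
        self.edges[src].add(dst)

    def find_path(self, source, sink):
        """Breadth-first search; return the node path from source to sink, or None."""
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node == sink:
                return path
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None


graph = FlowGraph()
graph.add_flow("request.form['email']", "email")  # assignment
graph.add_flow("email", "log_msg")                # string interpolation
graph.add_flow("log_msg", "logging.info")         # function call argument

path = graph.find_path("request.form['email']", "logging.info")
print(" -> ".join(path))
```

Edges are directed, so searching from `log_msg` back to `email` finds no path; only forward flows count as a trace.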
The scanner marks variables that contain PII or secrets as "tainted", then propagates that taint through the graph: any value derived from a tainted variable is tainted as well.
Example:
- `email = request.form['email']` -> `email` is tainted (Source: User Input).
- `log_msg = f"User: {email}"` -> `log_msg` is tainted (Propagation).
- `logging.info(log_msg)` -> Leak Detected (Sink: Logging).
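As a rough illustration of the taint propagation in this example, the sketch below walks the three statements with Python's standard `ast` module (`ast.unparse` needs Python 3.9+). The names `SOURCE_PREFIXES`, `SINK_CALLS`, and `find_leaks` are assumptions made for this sketch, not Privalyse's API, and the rules are far simpler than a real scanner's.

```python
import ast

SOURCE_PREFIXES = ("request.form",)  # hypothetical source patterns
SINK_CALLS = {"logging.info"}        # hypothetical sink names


def names_in(node):
    """All variable names referenced anywhere inside an AST node."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def find_leaks(code):
    """Return (tainted variable names, leaks) for straight-line module code."""
    tainted, leaks = set(), []
    for stmt in ast.parse(code).body:
        if isinstance(stmt, ast.Assign):
            from_source = ast.unparse(stmt.value).startswith(SOURCE_PREFIXES)
            from_tainted = bool(names_in(stmt.value) & tainted)
            if from_source or from_tainted:
                # Taint every simple name on the left-hand side.
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        tainted.add(target.id)
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            sink = ast.unparse(call.func)
            if sink in SINK_CALLS:
                for arg in call.args:
                    hit = sorted(names_in(arg) & tainted)
                    if hit:
                        leaks.append((stmt.lineno, sink, hit))
    return tainted, leaks


sample = (
    "email = request.form['email']\n"
    'log_msg = f"User: {email}"\n'
    "logging.info(log_msg)\n"
)
tainted, leaks = find_leaks(sample)
print(tainted, leaks)
```

The sketch handles only top-level assignments and call statements; branches, loops, attribute targets, and sanitizers are deliberately left out.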
Privalyse resolves imports to track data flow across multiple files. If a function in utils.py returns PII, and main.py logs the result of that function, Privalyse detects the leak.
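The `utils.py` / `main.py` scenario can be sketched as follows. This is a simplified, flow-insensitive illustration under stated assumptions: module sources are held in an in-memory dict, the function names (`get_user_email`, `pii_returning_functions`, `cross_file_leaks`) are hypothetical, and "returns PII" is approximated by a return expression that reads `request.form`.

```python
import ast

# Hypothetical two-module project, kept in memory for the sketch.
MODULES = {
    "utils": (
        "def get_user_email(request):\n"
        "    return request.form['email']\n"
    ),
    "main": (
        "import logging\n"
        "from utils import get_user_email\n"
        "\n"
        "def handler(request):\n"
        "    address = get_user_email(request)\n"
        "    logging.info(address)\n"
    ),
}


def pii_returning_functions(source):
    """Names of functions whose return expression reads request.form (toy rule)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for child in ast.walk(node):
                if isinstance(child, ast.Return) and child.value is not None:
                    if "request.form" in ast.unparse(child.value):
                        found.add(node.name)
    return found


def cross_file_leaks(modules, entry):
    """Report (module, line, variable) where an imported PII value is logged."""
    tree = ast.parse(modules[entry])
    # Import resolution: local names bound to PII-returning functions elsewhere.
    tainted_calls = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module in modules:
            exported = pii_returning_functions(modules[node.module])
            for alias in node.names:
                if alias.name in exported:
                    tainted_calls.add(alias.asname or alias.name)
    # Propagation: variables assigned from those calls become tainted.
    tainted_vars = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            if ast.unparse(node.value.func) in tainted_calls:
                tainted_vars.update(
                    t.id for t in node.targets if isinstance(t, ast.Name)
                )
    # Sink check: tainted variables passed directly to logging.info.
    leaks = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and ast.unparse(node.func) == "logging.info":
            for arg in node.args:
                if isinstance(arg, ast.Name) and arg.id in tainted_vars:
                    leaks.append((entry, node.lineno, arg.id))
    return leaks


leaks = cross_file_leaks(MODULES, "main")
print(leaks)
```

The key step is the function summary: once `utils.get_user_email` is known to return PII, a call to it in another module can be treated like a source.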
- Discovery: Find all relevant files in the project.
- Import Resolution: Build a dependency graph of modules.
- Symbol Analysis: Index functions, classes, and variables (Global Symbol Table).
- Intra-file Analysis:
  - Parse code to AST.
  - Identify Sources (PII, Secrets).
  - Identify Sinks (APIs, Logs, DBs).
  - Track data flow within the file.
- Cross-file Propagation: Connect flows between modules using the Import Graph.
- Policy Check: Verify findings against configured policies (e.g., GDPR compliance).
- Reporting: Generate output in the requested format.
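The first three stages of the list above (Discovery, Import Resolution, Symbol Analysis) can be sketched with the standard library alone. The function names `discover`, `import_graph`, and `symbol_table` are hypothetical and chosen for this illustration; the real pipeline is likely more involved (packages, relative imports, and `__init__.py` handling are ignored here).

```python
import ast
import tempfile
from pathlib import Path


def discover(root):
    """Discovery: map dotted module name -> path for every .py file under root."""
    root = Path(root)
    return {
        ".".join(p.relative_to(root).with_suffix("").parts): p
        for p in sorted(root.rglob("*.py"))
    }


def import_graph(files):
    """Import Resolution: module -> set of project modules it imports."""
    graph = {}
    for name, path in files.items():
        deps = set()
        for node in ast.walk(ast.parse(path.read_text(encoding="utf-8"))):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                deps.add(node.module)
        # Keep only imports that resolve to files inside the project.
        graph[name] = {dep for dep in deps if dep in files}
    return graph


def symbol_table(files):
    """Symbol Analysis: (module, name) -> kind for top-level definitions."""
    table = {}
    for name, path in files.items():
        for stmt in ast.parse(path.read_text(encoding="utf-8")).body:
            if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
                table[(name, stmt.name)] = "function"
            elif isinstance(stmt, ast.ClassDef):
                table[(name, stmt.name)] = "class"
            elif isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        table[(name, target.id)] = "variable"
    return table


# Demo on a throwaway two-file project.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "utils.py").write_text(
        "def get_email(req):\n    return req.form['email']\n", encoding="utf-8"
    )
    Path(tmp, "main.py").write_text(
        "import logging\nfrom utils import get_email\nLEVEL = 1\n", encoding="utf-8"
    )
    files = discover(tmp)
    graph = import_graph(files)
    symbols = symbol_table(files)

print(graph)
print(symbols)
```

Standard-library imports such as `logging` drop out of the graph because they do not resolve to a project file, which keeps the later cross-file propagation focused on code the scanner can actually read.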
- Python: Full AST-based analysis with cross-file tracking.
- JavaScript/TypeScript: Hybrid analysis (Regex + partial AST) for detecting common patterns in React/Node.js apps.
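To give a feel for the regex side of the hybrid JavaScript analysis, here is a small Python sketch that flags sink calls whose arguments mention a PII-looking identifier. The patterns `PII_NAME` and `SINK_CALL` and the function `scan_js` are assumptions made for this example, not Privalyse's actual rules, and the matching is line-based and intentionally naive.

```python
import re

# Hypothetical patterns: identifiers that look like PII, and common JS sinks.
PII_NAME = re.compile(r"\b\w*(?:email|password|ssn|token)\w*\b", re.IGNORECASE)
SINK_CALL = re.compile(r"\b(console\.(?:log|info|warn|error)|fetch)\s*\(([^)]*)\)")


def scan_js(source):
    """Return (line number, sink, matched identifier) for suspicious sink calls."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for call in SINK_CALL.finditer(line):
            sink, args = call.group(1), call.group(2)
            hit = PII_NAME.search(args)
            if hit:
                findings.append((lineno, sink, hit.group(0)))
    return findings


js_sample = (
    "const userEmail = req.body.email;\n"
    'console.log("signup", userEmail);\n'
    'fetch("/api/ping");\n'
)
findings = scan_js(js_sample)
print(findings)
```

A pure regex pass like this cannot follow renamed variables or nested calls and will also match keywords inside string literals, which is one reason to combine it with partial AST analysis.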