Ziputil migration#231
Conversation
- Return cleanPath (not destPath) from sanitizeExtractPath so CodeQL's taint analysis sees the HasPrefix guard applied to the exact variable that is returned - Validate symlink targets on extraction: reject absolute targets and relative targets that resolve outside the extraction directory
Revert sanitizeExtractPath to (string, error): traversal entries are rejected outright rather than stripped and extracted. This pattern lets CodeQL's taint analysis see the HasPrefix guard applied to the exact variable used in file-system operations, resolving both go/zipslip and go/unsafe-unzip-symlink. extractEntry returns (true, nil) on traversal so UnZip continues processing remaining entries and returns the error at the end. Also updates function/doc comments to reflect actual behaviour.
CodeQL's taint tracking does not model custom sanitizer functions interprocedurally: it sees f.Name flowing into sanitizeExtractPath and treats the returned path as still tainted, regardless of the HasPrefix check inside the helper. Inlining the guard into extractEntry makes the check and all file-system operations visible in the same function scope, allowing CodeQL's intraprocedural analysis to confirm that destPath only reaches any sink after the HasPrefix guard passes. Remove the now-unused sanitizeExtractPath helper.
The (bool, error) skip-and-continue structure in extractEntry prevented CodeQL from verifying that file-system sinks are only reachable through a sanitized path: from UnZip's perspective, the loop kept executing even after a traversal was detected, so the taint flow from f.Name to the sinks appeared unbounded. Revert to the structure that CodeQL accepted: extractEntry returns error, sanitizeExtractPath rejects traversal entries with an error, and UnZip fails immediately on the first bad entry. CodeQL can then see that any execution path reaching a file-system operation must have passed through sanitizeExtractPath without error, meaning the HasPrefix guard held.
CodeQL's go/zipslip query explicitly recognises strings.Contains(name, "..") on the raw archive entry name as a taint sanitizer. Previous attempts using sanitizeExtractPath (a helper returning (string, error)) and the inlined HasPrefix guard both failed because CodeQL's latest analysis does not treat either pattern as sufficient at the interprocedural level. Checking strings.Contains(f.Name, "..") directly at the top of extractEntry, before any path derivation, provides a barrier that is visible to both intra- and inter-procedural analysis without going through a helper. The HasPrefix check remains as a belt-and-suspenders guard for edge cases the strings.Contains check does not cover.
The go/unsafe-unzip-symlink query requires filepath.EvalSymlinks to be called on the symlink destination path before creation. This catches chained-symlink attacks where a previously extracted entry makes the parent directory resolve outside intoDir even though the syntactic path appears safe. Both the extraction root (cleanDest) and the parent of the new symlink are resolved via EvalSymlinks so the containment check compares canonical paths. Resolving both sides is necessary on macOS where /tmp is itself a symlink to /private/tmp; without it the prefix check would fail for any extraction under /tmp. EvalSymlinks is added to OsProxy so it can be injected in future tests.
1. strings.Contains false positives: wrap the CodeQL-required outer check with slices.Contains on individual path components, so "module..framework" is accepted while ".." as a standalone component is still rejected. 2. ZipFile/ZipFiles symlink corruption: addFileToZip now detects symlinks via Lstat and routes them through addSymlinkToZip instead of following the link and storing the target's bytes as the symlink name. Matches v1 zip -y behaviour. 3. Partial zip on write failure: createZipFromDir, ZipDirs, and ZipFiles use named returns + defer to remove the destination file when any write error occurs, preventing a truncated-but-structurally-closed archive from being left behind (v1 relied on zip -T for this). 4. EvalSymlinks guard extended to file branch: the chained-symlink parent check (EvalSymlinks on cleanDest and filepath.Dir(destPath)) now runs before every regular file write, not only before symlink creation. 5. Integration test cleanup: remove TestUnZipFileCompat and TestUnZipDirectoryCompat (duplicate coverage of unit tests); rename remaining content-verification tests (TestZipFilesContents, TestZipDirsContents) and convert all t.Log/bare-block subtests to t.Run for clear failure attribution. 6. CodeQL guard comment: clarify that the strings.Contains check must remain inline on f.Name and must not be extracted into a helper function.
CodeQL's go/zipslip analysis traces f.Name from the loop in UnZip as the taint source; the strings.Contains check inside extractEntry is not visible at that level without interprocedural reasoning. Adding the same guard directly in the loop provides a barrier CodeQL can see at the site where taint originates.
The compound && guard in the UnZip loop created a code path where strings.Contains(f.Name, "..") evaluates true but execution continues to extractEntry (when the component check rules it non-traversal, e.g. module..framework). CodeQL sees f.Name containing ".." reaching file system sinks on that path and raises a new alert. The 0c32118 state — compound check only in extractEntry — had the alert fixed. Removing the loop-level duplicate restores that state.
Compound && guards break CodeQL: when strings.Contains(f.Name, "..") is true but the second operand is false (e.g. slices.Contains false for a name like module..framework), execution falls through to extractEntry with f.Name still containing "..". CodeQL sees that path as unsanitized. Use standalone strings.Contains(f.Name, "..") at both the UnZip loop and inside extractEntry. This is CodeQL's documented canonical sanitizer; it must appear at the loop source (UnZip) and must not be compounded or extracted into a helper. The HasPrefix check in extractEntry remains as the structural belt-and-suspenders guard. Remove the now-unused slices import and update TestUnZipTraversal to reflect that any entry containing ".." is rejected.
…omment 1. Standardise error messages: "dir (%s) not exist" and "directory (%s) not exist" both become "directory (%s) does not exist"; "file (%s) not exist" becomes "file (%s) does not exist". 2. Pre-compute cleanDest and realDest once in UnZip (after a MkdirAll to guarantee intoDir exists) and pass both to extractEntry. Eliminates the O(n) redundant EvalSymlinks(cleanDest) calls that previously ran on every symlink and regular-file entry. 3. Extend the strings.Contains comment in UnZip to state the tradeoff: entries whose name contains ".." as a substring (e.g. module..framework) are rejected; this is accepted because no iOS/Xcode artifact format uses such names.
Badlazzor
left a comment
There was a problem hiding this comment.
Pulled the branch, ran unit + integration tests on Go 1.26.3 — all green. The behavioral changes vs v1 are well-documented and the security posture (strict zip-slip reject, chained-symlink defense, deterministic merge semantics) is a solid upgrade. A few findings, mostly tests + small consistency nits — none blocking, listed roughly by impact:
1. Test gap on the security-critical paths. The PR body calls out the chained-symlink defense and the symlink target validation as the main hardening over v1, but no test currently exercises them:
extractEntryEvalSymlinksparent check (ziputil.go:324–330 for symlinks, :342–348 for regular files): a pre-existing symlink in the parent path should cause extraction to fail.filepath.IsAbs(targetStr)reject (ziputil.go:311–313).resolvedTargetescape via relative../../symlink target (ziputil.go:315).
These are the most attractive attack surfaces in an unzipper. Worth adding integration tests that craft archives with: (a) a benign first entry that creates evil -> /tmp/escape, then a second entry that writes through evil/foo; (b) a symlink entry with an absolute target; (c) a symlink entry with ../../etc/foo as target.
2. OsProxy interface is defined but never used for mocking. Every test calls NewZipManager(pathutil.NewPathChecker()) against the real OS, so the interface adds API surface with no current consumer. Two reasonable directions:
- Keep it and add mock-driven tests for OS error paths (e.g.,
Createfailure leaves no destination file;Symlinkfailure surfaces cleanly). - Drop it and revert to direct
os.*calls until a real second implementation exists.
I'd lean toward the second if there's no near-term plan to test OS errors — the current shape is interface-without-purpose.
3. Inconsistent abstraction in cleanup paths. ZipDirs, ZipFiles, and createZipFromDir all clean up partial output via raw os.Remove (ziputil.go:65, :115, :181) rather than z.osProxy. If osProxy is the OS boundary, this leaks — and prevents a mock from observing cleanup behavior on failure. Easy to route through osProxy.Remove (would need adding to the interface).
4. Stale comment on extractEntry (ziputil.go:288–292). "Belt-and-suspenders: UnZip pre-filters with strings.Contains before calling here, but this guard protects against any future direct callers." extractEntry is unexported, so the only possible "future direct callers" are within the same package — which is a much weaker argument. The real reason looks like defensive CodeQL-style hardening; worth saying that explicitly so future readers don't try to "simplify" by dropping the guard.
5. Tradeoff documented in the PR body isn't in godoc. The strings.Contains(f.Name, "..") guard (ziputil.go:163) rejects legitimate names like module..framework. PR body explicitly accepts the tradeoff for iOS/Xcode artifacts, but a user encountering the rejection later won't see that context. Worth a sentence in UnZip's godoc:
Entries whose name contains the substring
..anywhere (including names likefoo..bar) are rejected. This is broader than strict path-component traversal but is required for the static analysis used in this repo to recognise the sanitizer.
6. Tests are flat. Most fixtures are one-level directories with a handful of files. Worth one integration test with a deeper mix — nested subdirs, a symlink inside a subdir, an empty subdir — to stress addDirToZip + WalkDir + per-entry routing together. Right now the symlink + directory paths are tested, but not their combination.
Happy to send a follow-up PR with the test cases in #1 if useful.
Write the archive to a temp file and rename it over the destination only on success, so a pre-existing destination is left untouched on failure (matches v1's `zip -T` behaviour). Add Remove/Rename to OsProxy and consolidate the three zip-creation paths into a createZipFile helper, removing the direct os.Remove cleanup calls. Migrate the v1 zip-slip test assertion fix (check the parent of tmpDir) and add TestZipPreservesExistingDestinationOnFailure.
Document in UnZip's godoc that path-traversal rejection is by substring, so names like "foo..bar" are refused (broader than strict path-component traversal, required for the static analysis to recognise the sanitizer). Reword the duplicated guard comment in extractEntry to state the real reason it exists (keeping the sanitizer in the same function as the file-system sinks for CodeQL's go/zipslip query, plus defense-in-depth) instead of the weaker 'future direct callers' rationale.
Register ziputil's OsProxy in .mockery.yml and generate the testify mock under ziputil/mocks/osproxy, matching the fileutil convention. Add mock-driven tests for OS error paths that cannot be triggered deterministically against the real OS: temp-archive Create failure, mid-write cleanup, and final rename cleanup.
Cover the three previously-untested extraction guards: absolute symlink-target rejection, relative symlink-target escape rejection, and the chained-symlink parent check (EvalSymlinks) for both regular-file and symlink entries.
Exercise addDirToZip + WalkDir + per-entry routing together: nested subdirectories, a symlink inside a subdirectory, and an empty subdirectory, verifying the full structure and contents survive a ZipDir/UnZip round-trip.
Migrates the
ziputilpackage from v1 (shells out to/usr/bin/zipand/usr/bin/unzip) to a pure Go stdlib implementation backed byarchive/zip.PR 230 documents the v1 behavior in comments and tests to establish a clear behavioral contract before migration.
What changed
ZipDir,ZipDirs,ZipFile,ZipFiles: pure Go replacements for/usr/bin/zip -rTy/-Tyj. Directory modification times are preserved; symlinks are stored as symlinks (not followed).UnZip: pure Go replacement for/usr/bin/unzip. Restores file permissions and symlinks. Creates the destination directory if it does not exist (consistent withunzip -dbehavior).ZipManagerstruct withOsProxyinterface for testability without the real OS.Behavioral differences from v1
ZipFilesduplicate base names: v2 detects duplicates before creating the destination file and returns an error immediately, so no partial output is left behind. v1 relied on/usr/bin/zipto fail after the fact.ZipFilessymlinks: both v1 (zip -y) and v2 store symlinks as symlinks rather than following them. v2 achieves this by detectingModeSymlinkviaLstatand routing to the symlink-storage path.UnZippath-traversal (zip-slip): v1 used the systemunzip, which stripped../components, extracted the file to the sanitized path inside the destination directory, and returned a non-zero exit code. v2 rejects the first traversal entry and returns an error immediately; no further entries are extracted after detection. This is strictly safer.UnZipchained-symlink attack: v2 resolves pre-existing symlinks in the parent directory viaEvalSymlinksbefore placing each extracted file or symlink. This prevents a crafted archive from using a previously-extracted symlink to redirect an entry outside the destination. v1 had no such protection.ZipDirstemp-dir cleanup: v1 calledlog.Fatalif the temporary directory cleanup failed. v2 has no temporary directory.Tests
Every test scenario from the v1 clarification (PR 230) is carried over to v2 (some as unit tests, some as integration tests), adapted to the
ZipManagerAPI and to v2's behavioral differences (e.g. zip-slip is asserted as a hard rejection rather than strip-and-continue). This keeps the documented v1 behavioral contract under test through the migration.Unit tests cover all public functions. Integration tests (tagged
//go:build integration) verify content correctness, directory mtime preservation, same-basename merge semantics, the no-output guarantee on duplicate basenames, and zip-slip rejection.