Skip to content

Ziputil migration#231

Open
godrei wants to merge 22 commits into
masterfrom
ziputil-v2
Open

Ziputil migration#231
godrei wants to merge 22 commits into
masterfrom
ziputil-v2

Conversation

@godrei

@godrei godrei commented May 22, 2026

Copy link
Copy Markdown
Contributor

Migrates the ziputil package from v1 (shells out to /usr/bin/zip and /usr/bin/unzip) to a pure Go stdlib implementation backed by archive/zip.

PR 230 documents the v1 behavior in comments and tests to establish a clear behavioral contract before migration.

What changed

  • ZipDir, ZipDirs, ZipFile, ZipFiles: pure Go replacements for /usr/bin/zip -rTy / -Tyj. Directory modification times are preserved; symlinks are stored as symlinks (not followed).
  • UnZip: pure Go replacement for /usr/bin/unzip. Restores file permissions and symlinks. Creates the destination directory if it does not exist (consistent with unzip -d behavior).
  • ZipManager struct with OsProxy interface for testability without the real OS.
  • On a write failure, the destination is left in its prior state, same as v1 (v2 writes to a temp file and renames over the destination only on success).

Behavioral differences from v1

ZipFiles duplicate base names: v2 detects duplicates before creating the destination file and returns an error immediately, so no partial output is left behind. v1 relied on /usr/bin/zip to fail after the fact.

ZipFiles symlinks: both v1 (zip -y) and v2 store symlinks as symlinks rather than following them. v2 achieves this by detecting ModeSymlink via Lstat and routing to the symlink-storage path.

UnZip path-traversal (zip-slip): v1 used the system unzip, which stripped ../ components, extracted the file to the sanitized path inside the destination directory, and returned a non-zero exit code. v2 rejects the first traversal entry and returns an error immediately; no further entries are extracted after detection. This is strictly safer.

UnZip chained-symlink attack: v2 resolves pre-existing symlinks in the parent directory via EvalSymlinks before placing each extracted file or symlink. This prevents a crafted archive from using a previously-extracted symlink to redirect an entry outside the destination. v1 had no such protection.

ZipDirs temp-dir cleanup: v1 called log.Fatal if the temporary directory cleanup failed. v2 has no temporary directory.

Tests

Every test scenario from the v1 clarification (PR 230) is carried over to v2 (some as unit tests, some as integration tests), adapted to the ZipManager API and to v2's behavioral differences (e.g. zip-slip is asserted as a hard rejection rather than strip-and-continue). This keeps the documented v1 behavioral contract under test through the migration.

Unit tests cover all public functions. Integration tests (tagged //go:build integration) verify content correctness, directory mtime preservation, same-basename merge semantics, the no-output guarantee on duplicate basenames, and zip-slip rejection.

Comment thread ziputil/ziputil.go Fixed
Comment thread ziputil/ziputil.go Fixed
godrei added 4 commits May 22, 2026 11:11
- Return cleanPath (not destPath) from sanitizeExtractPath so CodeQL's
  taint analysis sees the HasPrefix guard applied to the exact variable
  that is returned
- Validate symlink targets on extraction: reject absolute targets and
  relative targets that resolve outside the extraction directory
Comment thread ziputil/ziputil.go Fixed
Comment thread ziputil/ziputil.go Fixed
Revert sanitizeExtractPath to (string, error): traversal entries are
rejected outright rather than stripped and extracted. This pattern lets
CodeQL's taint analysis see the HasPrefix guard applied to the exact
variable used in file-system operations, resolving both go/zipslip and
go/unsafe-unzip-symlink. extractEntry returns (true, nil) on traversal
so UnZip continues processing remaining entries and returns the error
at the end.

Also updates function/doc comments to reflect actual behaviour.
Comment thread ziputil/ziputil.go Fixed
Comment thread ziputil/ziputil.go Fixed
CodeQL's taint tracking does not model custom sanitizer functions
interprocedurally: it sees f.Name flowing into sanitizeExtractPath and
treats the returned path as still tainted, regardless of the HasPrefix
check inside the helper. Inlining the guard into extractEntry makes the
check and all file-system operations visible in the same function scope,
allowing CodeQL's intraprocedural analysis to confirm that destPath only
reaches any sink after the HasPrefix guard passes. Remove the now-unused
sanitizeExtractPath helper.
Comment thread ziputil/ziputil.go Fixed
godrei added 2 commits May 29, 2026 13:44
The (bool, error) skip-and-continue structure in extractEntry prevented
CodeQL from verifying that file-system sinks are only reachable through
a sanitized path: from UnZip's perspective, the loop kept executing even
after a traversal was detected, so the taint flow from f.Name to the
sinks appeared unbounded.

Revert to the structure that CodeQL accepted: extractEntry returns error,
sanitizeExtractPath rejects traversal entries with an error, and UnZip
fails immediately on the first bad entry. CodeQL can then see that any
execution path reaching a file-system operation must have passed through
sanitizeExtractPath without error, meaning the HasPrefix guard held.
CodeQL's go/zipslip query explicitly recognises strings.Contains(name, "..")
on the raw archive entry name as a taint sanitizer. Previous attempts using
sanitizeExtractPath (a helper returning (string, error)) and the inlined
HasPrefix guard both failed because CodeQL's latest analysis does not treat
either pattern as sufficient at the interprocedural level.

Checking strings.Contains(f.Name, "..") directly at the top of extractEntry,
before any path derivation, provides a barrier that is visible to both
intra- and inter-procedural analysis without going through a helper. The
HasPrefix check remains as a belt-and-suspenders guard for edge cases the
strings.Contains check does not cover.
Comment thread ziputil/ziputil.go Fixed
godrei added 3 commits May 29, 2026 13:58
The go/unsafe-unzip-symlink query requires filepath.EvalSymlinks to be
called on the symlink destination path before creation. This catches
chained-symlink attacks where a previously extracted entry makes the
parent directory resolve outside intoDir even though the syntactic path
appears safe.

Both the extraction root (cleanDest) and the parent of the new symlink
are resolved via EvalSymlinks so the containment check compares canonical
paths. Resolving both sides is necessary on macOS where /tmp is itself a
symlink to /private/tmp; without it the prefix check would fail for any
extraction under /tmp.

EvalSymlinks is added to OsProxy so it can be injected in future tests.
1. strings.Contains false positives: wrap the CodeQL-required outer check
   with slices.Contains on individual path components, so "module..framework"
   is accepted while ".." as a standalone component is still rejected.

2. ZipFile/ZipFiles symlink corruption: addFileToZip now detects symlinks via
   Lstat and routes them through addSymlinkToZip instead of following the link
   and storing the target's bytes as the symlink name. Matches v1 zip -y behaviour.

3. Partial zip on write failure: createZipFromDir, ZipDirs, and ZipFiles use
   named returns + defer to remove the destination file when any write error
   occurs, preventing a truncated-but-structurally-closed archive from being
   left behind (v1 relied on zip -T for this).

4. EvalSymlinks guard extended to file branch: the chained-symlink parent check
   (EvalSymlinks on cleanDest and filepath.Dir(destPath)) now runs before every
   regular file write, not only before symlink creation.

5. Integration test cleanup: remove TestUnZipFileCompat and
   TestUnZipDirectoryCompat (duplicate coverage of unit tests); rename remaining
   content-verification tests (TestZipFilesContents, TestZipDirsContents) and
   convert all t.Log/bare-block subtests to t.Run for clear failure attribution.

6. CodeQL guard comment: clarify that the strings.Contains check must remain
   inline on f.Name and must not be extracted into a helper function.
CodeQL's go/zipslip analysis traces f.Name from the loop in UnZip as the
taint source; the strings.Contains check inside extractEntry is not visible
at that level without interprocedural reasoning. Adding the same guard
directly in the loop provides a barrier CodeQL can see at the site where
taint originates.
Comment thread ziputil/ziputil.go Fixed
godrei and others added 4 commits May 29, 2026 15:08
The compound && guard in the UnZip loop created a code path where
strings.Contains(f.Name, "..") evaluates true but execution continues
to extractEntry (when the component check rules it non-traversal, e.g.
module..framework). CodeQL sees f.Name containing ".." reaching file
system sinks on that path and raises a new alert.

The 0c32118 state — compound check only in extractEntry — had the alert
fixed. Removing the loop-level duplicate restores that state.
Compound && guards break CodeQL: when strings.Contains(f.Name, "..") is
true but the second operand is false (e.g. slices.Contains false for a
name like module..framework), execution falls through to extractEntry with
f.Name still containing "..". CodeQL sees that path as unsanitized.

Use standalone strings.Contains(f.Name, "..") at both the UnZip loop and
inside extractEntry. This is CodeQL's documented canonical sanitizer; it
must appear at the loop source (UnZip) and must not be compounded or
extracted into a helper. The HasPrefix check in extractEntry remains as
the structural belt-and-suspenders guard.

Remove the now-unused slices import and update TestUnZipTraversal to
reflect that any entry containing ".." is rejected.
…omment

1. Standardise error messages: "dir (%s) not exist" and "directory (%s)
   not exist" both become "directory (%s) does not exist"; "file (%s) not
   exist" becomes "file (%s) does not exist".

2. Pre-compute cleanDest and realDest once in UnZip (after a MkdirAll to
   guarantee intoDir exists) and pass both to extractEntry. Eliminates the
   O(n) redundant EvalSymlinks(cleanDest) calls that previously ran on
   every symlink and regular-file entry.

3. Extend the strings.Contains comment in UnZip to state the tradeoff:
   entries whose name contains ".." as a substring (e.g. module..framework)
   are rejected; this is accepted because no iOS/Xcode artifact format uses
   such names.

@Badlazzor Badlazzor left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled the branch, ran unit + integration tests on Go 1.26.3 — all green. The behavioral changes vs v1 are well-documented and the security posture (strict zip-slip reject, chained-symlink defense, deterministic merge semantics) is a solid upgrade. A few findings, mostly tests + small consistency nits — none blocking, listed roughly by impact:

1. Test gap on the security-critical paths. The PR body calls out the chained-symlink defense and the symlink target validation as the main hardening over v1, but no test currently exercises them:

  • extractEntry EvalSymlinks parent check (ziputil.go:324–330 for symlinks, :342–348 for regular files): a pre-existing symlink in the parent path should cause extraction to fail.
  • filepath.IsAbs(targetStr) reject (ziputil.go:311–313).
  • resolvedTarget escape via relative ../../ symlink target (ziputil.go:315).

These are the most attractive attack surfaces in an unzipper. Worth adding integration tests that craft archives with: (a) a benign first entry that creates evil -> /tmp/escape, then a second entry that writes through evil/foo; (b) a symlink entry with an absolute target; (c) a symlink entry with ../../etc/foo as target.

2. OsProxy interface is defined but never used for mocking. Every test calls NewZipManager(pathutil.NewPathChecker()) against the real OS, so the interface adds API surface with no current consumer. Two reasonable directions:

  • Keep it and add mock-driven tests for OS error paths (e.g., Create failure leaves no destination file; Symlink failure surfaces cleanly).
  • Drop it and revert to direct os.* calls until a real second implementation exists.

I'd lean toward the second if there's no near-term plan to test OS errors — the current shape is interface-without-purpose.

3. Inconsistent abstraction in cleanup paths. ZipDirs, ZipFiles, and createZipFromDir all clean up partial output via raw os.Remove (ziputil.go:65, :115, :181) rather than z.osProxy. If osProxy is the OS boundary, this leaks — and prevents a mock from observing cleanup behavior on failure. Easy to route through osProxy.Remove (would need adding to the interface).

4. Stale comment on extractEntry (ziputil.go:288–292). "Belt-and-suspenders: UnZip pre-filters with strings.Contains before calling here, but this guard protects against any future direct callers." extractEntry is unexported, so the only possible "future direct callers" are within the same package — which is a much weaker argument. The real reason looks like defensive CodeQL-style hardening; worth saying that explicitly so future readers don't try to "simplify" by dropping the guard.

5. Tradeoff documented in the PR body isn't in godoc. The strings.Contains(f.Name, "..") guard (ziputil.go:163) rejects legitimate names like module..framework. PR body explicitly accepts the tradeoff for iOS/Xcode artifacts, but a user encountering the rejection later won't see that context. Worth a sentence in UnZip's godoc:

Entries whose name contains the substring .. anywhere (including names like foo..bar) are rejected. This is broader than strict path-component traversal but is required for the static analysis used in this repo to recognise the sanitizer.

6. Tests are flat. Most fixtures are one-level directories with a handful of files. Worth one integration test with a deeper mix — nested subdirs, a symlink inside a subdir, an empty subdir — to stress addDirToZip + WalkDir + per-entry routing together. Right now the symlink + directory paths are tested, but not their combination.

Happy to send a follow-up PR with the test cases in #1 if useful.

godrei and others added 5 commits June 12, 2026 15:02
Write the archive to a temp file and rename it over the destination
only on success, so a pre-existing destination is left untouched on
failure (matches v1's `zip -T` behaviour). Add Remove/Rename to OsProxy
and consolidate the three zip-creation paths into a createZipFile helper,
removing the direct os.Remove cleanup calls.

Migrate the v1 zip-slip test assertion fix (check the parent of tmpDir)
and add TestZipPreservesExistingDestinationOnFailure.
Document in UnZip's godoc that path-traversal rejection is by substring,
so names like "foo..bar" are refused (broader than strict path-component
traversal, required for the static analysis to recognise the sanitizer).

Reword the duplicated guard comment in extractEntry to state the real
reason it exists (keeping the sanitizer in the same function as the
file-system sinks for CodeQL's go/zipslip query, plus defense-in-depth)
instead of the weaker 'future direct callers' rationale.
Register ziputil's OsProxy in .mockery.yml and generate the testify mock
under ziputil/mocks/osproxy, matching the fileutil convention.

Add mock-driven tests for OS error paths that cannot be triggered
deterministically against the real OS: temp-archive Create failure,
mid-write cleanup, and final rename cleanup.
Cover the three previously-untested extraction guards: absolute
symlink-target rejection, relative symlink-target escape rejection, and
the chained-symlink parent check (EvalSymlinks) for both regular-file and
symlink entries.
@godrei godrei marked this pull request as ready for review June 12, 2026 13:57
Exercise addDirToZip + WalkDir + per-entry routing together: nested
subdirectories, a symlink inside a subdirectory, and an empty
subdirectory, verifying the full structure and contents survive a
ZipDir/UnZip round-trip.
@godrei

godrei commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@Badlazzor

  • Security-check tests added in 14555e7
  • OsProxy mocking introduced in 3025331
  • Raw os.Remove cleanup in 773c963
  • Stale extractEntry comment fixed and .. godoc tradeoff documented in ff8b5ae
  • Deep-tree test added in fa80291

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants