Ziputil migration by godrei · Pull Request #231 · bitrise-io/go-utils

godrei · 2026-05-22T09:07:44Z

Migrates the ziputil package from v1 (shells out to /usr/bin/zip and /usr/bin/unzip) to a pure Go stdlib implementation backed by archive/zip.

PR 230 documents the v1 behavior in comments and tests to establish a clear behavioral contract before migration.

What changed

ZipDir, ZipDirs, ZipFile, ZipFiles: pure Go replacements for /usr/bin/zip -rTy / -Tyj. Directory modification times are preserved; symlinks are stored as symlinks (not followed).
UnZip: pure Go replacement for /usr/bin/unzip. Restores file permissions and symlinks. Creates the destination directory if it does not exist (consistent with unzip -d behavior).
ZipManager struct with OsProxy interface for testability without the real OS.
On a write failure, the destination is left in its prior state, same as v1 (v2 writes to a temp file and renames over the destination only on success).

Behavioral differences from v1

ZipFiles duplicate base names: v2 detects duplicates before creating the destination file and returns an error immediately, so no partial output is left behind. v1 relied on /usr/bin/zip to fail after the fact.

ZipFiles symlinks: both v1 (zip -y) and v2 store symlinks as symlinks rather than following them. v2 achieves this by detecting ModeSymlink via Lstat and routing to the symlink-storage path.

UnZip path-traversal (zip-slip): v1 used the system unzip, which stripped ../ components, extracted the file to the sanitized path inside the destination directory, and returned a non-zero exit code. v2 rejects the first traversal entry and returns an error immediately; no further entries are extracted after detection. This is strictly safer.

UnZip chained-symlink attack: v2 resolves pre-existing symlinks in the parent directory via EvalSymlinks before placing each extracted file or symlink. This prevents a crafted archive from using a previously-extracted symlink to redirect an entry outside the destination. v1 had no such protection.

ZipDirs temp-dir cleanup: v1 called log.Fatal if the temporary directory cleanup failed. v2 has no temporary directory.

Tests

Every test scenario from the v1 clarification (PR 230) is carried over to v2 (some as unit tests, some as integration tests), adapted to the ZipManager API and to v2's behavioral differences (e.g. zip-slip is asserted as a hard rejection rather than strip-and-continue). This keeps the documented v1 behavioral contract under test through the migration.

Unit tests cover all public functions. Integration tests (tagged //go:build integration) verify content correctness, directory mtime preservation, same-basename merge semantics, the no-output guarantee on duplicate basenames, and zip-slip rejection.

- Return cleanPath (not destPath) from sanitizeExtractPath so CodeQL's taint analysis sees the HasPrefix guard applied to the exact variable that is returned - Validate symlink targets on extraction: reject absolute targets and relative targets that resolve outside the extraction directory

Revert sanitizeExtractPath to (string, error): traversal entries are rejected outright rather than stripped and extracted. This pattern lets CodeQL's taint analysis see the HasPrefix guard applied to the exact variable used in file-system operations, resolving both go/zipslip and go/unsafe-unzip-symlink. extractEntry returns (true, nil) on traversal so UnZip continues processing remaining entries and returns the error at the end. Also updates function/doc comments to reflect actual behaviour.

CodeQL's taint tracking does not model custom sanitizer functions interprocedurally: it sees f.Name flowing into sanitizeExtractPath and treats the returned path as still tainted, regardless of the HasPrefix check inside the helper. Inlining the guard into extractEntry makes the check and all file-system operations visible in the same function scope, allowing CodeQL's intraprocedural analysis to confirm that destPath only reaches any sink after the HasPrefix guard passes. Remove the now-unused sanitizeExtractPath helper.

The (bool, error) skip-and-continue structure in extractEntry prevented CodeQL from verifying that file-system sinks are only reachable through a sanitized path: from UnZip's perspective, the loop kept executing even after a traversal was detected, so the taint flow from f.Name to the sinks appeared unbounded. Revert to the structure that CodeQL accepted: extractEntry returns error, sanitizeExtractPath rejects traversal entries with an error, and UnZip fails immediately on the first bad entry. CodeQL can then see that any execution path reaching a file-system operation must have passed through sanitizeExtractPath without error, meaning the HasPrefix guard held.

CodeQL's go/zipslip query explicitly recognises strings.Contains(name, "..") on the raw archive entry name as a taint sanitizer. Previous attempts using sanitizeExtractPath (a helper returning (string, error)) and the inlined HasPrefix guard both failed because CodeQL's latest analysis does not treat either pattern as sufficient at the interprocedural level. Checking strings.Contains(f.Name, "..") directly at the top of extractEntry, before any path derivation, provides a barrier that is visible to both intra- and inter-procedural analysis without going through a helper. The HasPrefix check remains as a belt-and-suspenders guard for edge cases the strings.Contains check does not cover.

The go/unsafe-unzip-symlink query requires filepath.EvalSymlinks to be called on the symlink destination path before creation. This catches chained-symlink attacks where a previously extracted entry makes the parent directory resolve outside intoDir even though the syntactic path appears safe. Both the extraction root (cleanDest) and the parent of the new symlink are resolved via EvalSymlinks so the containment check compares canonical paths. Resolving both sides is necessary on macOS where /tmp is itself a symlink to /private/tmp; without it the prefix check would fail for any extraction under /tmp. EvalSymlinks is added to OsProxy so it can be injected in future tests.

1. strings.Contains false positives: wrap the CodeQL-required outer check with slices.Contains on individual path components, so "module..framework" is accepted while ".." as a standalone component is still rejected. 2. ZipFile/ZipFiles symlink corruption: addFileToZip now detects symlinks via Lstat and routes them through addSymlinkToZip instead of following the link and storing the target's bytes as the symlink name. Matches v1 zip -y behaviour. 3. Partial zip on write failure: createZipFromDir, ZipDirs, and ZipFiles use named returns + defer to remove the destination file when any write error occurs, preventing a truncated-but-structurally-closed archive from being left behind (v1 relied on zip -T for this). 4. EvalSymlinks guard extended to file branch: the chained-symlink parent check (EvalSymlinks on cleanDest and filepath.Dir(destPath)) now runs before every regular file write, not only before symlink creation. 5. Integration test cleanup: remove TestUnZipFileCompat and TestUnZipDirectoryCompat (duplicate coverage of unit tests); rename remaining content-verification tests (TestZipFilesContents, TestZipDirsContents) and convert all t.Log/bare-block subtests to t.Run for clear failure attribution. 6. CodeQL guard comment: clarify that the strings.Contains check must remain inline on f.Name and must not be extracted into a helper function.

CodeQL's go/zipslip analysis traces f.Name from the loop in UnZip as the taint source; the strings.Contains check inside extractEntry is not visible at that level without interprocedural reasoning. Adding the same guard directly in the loop provides a barrier CodeQL can see at the site where taint originates.

The compound && guard in the UnZip loop created a code path where strings.Contains(f.Name, "..") evaluates true but execution continues to extractEntry (when the component check rules it non-traversal, e.g. module..framework). CodeQL sees f.Name containing ".." reaching file system sinks on that path and raises a new alert. The 0c32118 state — compound check only in extractEntry — had the alert fixed. Removing the loop-level duplicate restores that state.

Compound && guards break CodeQL: when strings.Contains(f.Name, "..") is true but the second operand is false (e.g. slices.Contains false for a name like module..framework), execution falls through to extractEntry with f.Name still containing "..". CodeQL sees that path as unsanitized. Use standalone strings.Contains(f.Name, "..") at both the UnZip loop and inside extractEntry. This is CodeQL's documented canonical sanitizer; it must appear at the loop source (UnZip) and must not be compounded or extracted into a helper. The HasPrefix check in extractEntry remains as the structural belt-and-suspenders guard. Remove the now-unused slices import and update TestUnZipTraversal to reflect that any entry containing ".." is rejected.

…omment 1. Standardise error messages: "dir (%s) not exist" and "directory (%s) not exist" both become "directory (%s) does not exist"; "file (%s) not exist" becomes "file (%s) does not exist". 2. Pre-compute cleanDest and realDest once in UnZip (after a MkdirAll to guarantee intoDir exists) and pass both to extractEntry. Eliminates the O(n) redundant EvalSymlinks(cleanDest) calls that previously ran on every symlink and regular-file entry. 3. Extend the strings.Contains comment in UnZip to state the tradeoff: entries whose name contains ".." as a substring (e.g. module..framework) are rejected; this is accepted because no iOS/Xcode artifact format uses such names.

Badlazzor

Pulled the branch, ran unit + integration tests on Go 1.26.3 — all green. The behavioral changes vs v1 are well-documented and the security posture (strict zip-slip reject, chained-symlink defense, deterministic merge semantics) is a solid upgrade. A few findings, mostly tests + small consistency nits — none blocking, listed roughly by impact:

1. Test gap on the security-critical paths. The PR body calls out the chained-symlink defense and the symlink target validation as the main hardening over v1, but no test currently exercises them:

extractEntry EvalSymlinks parent check (ziputil.go:324–330 for symlinks, :342–348 for regular files): a pre-existing symlink in the parent path should cause extraction to fail.
filepath.IsAbs(targetStr) reject (ziputil.go:311–313).
resolvedTarget escape via relative ../../ symlink target (ziputil.go:315).

These are the most attractive attack surfaces in an unzipper. Worth adding integration tests that craft archives with: (a) a benign first entry that creates evil -> /tmp/escape, then a second entry that writes through evil/foo; (b) a symlink entry with an absolute target; (c) a symlink entry with ../../etc/foo as target.

2. OsProxy interface is defined but never used for mocking. Every test calls NewZipManager(pathutil.NewPathChecker()) against the real OS, so the interface adds API surface with no current consumer. Two reasonable directions:

Keep it and add mock-driven tests for OS error paths (e.g., Create failure leaves no destination file; Symlink failure surfaces cleanly).
Drop it and revert to direct os.* calls until a real second implementation exists.

I'd lean toward the second if there's no near-term plan to test OS errors — the current shape is interface-without-purpose.

3. Inconsistent abstraction in cleanup paths. ZipDirs, ZipFiles, and createZipFromDir all clean up partial output via raw os.Remove (ziputil.go:65, :115, :181) rather than z.osProxy. If osProxy is the OS boundary, this leaks — and prevents a mock from observing cleanup behavior on failure. Easy to route through osProxy.Remove (would need adding to the interface).

4. Stale comment on extractEntry (ziputil.go:288–292). "Belt-and-suspenders: UnZip pre-filters with strings.Contains before calling here, but this guard protects against any future direct callers." extractEntry is unexported, so the only possible "future direct callers" are within the same package — which is a much weaker argument. The real reason looks like defensive CodeQL-style hardening; worth saying that explicitly so future readers don't try to "simplify" by dropping the guard.

5. Tradeoff documented in the PR body isn't in godoc. The strings.Contains(f.Name, "..") guard (ziputil.go:163) rejects legitimate names like module..framework. PR body explicitly accepts the tradeoff for iOS/Xcode artifacts, but a user encountering the rejection later won't see that context. Worth a sentence in UnZip's godoc:

Entries whose name contains the substring .. anywhere (including names like foo..bar) are rejected. This is broader than strict path-component traversal but is required for the static analysis used in this repo to recognise the sanitizer.

6. Tests are flat. Most fixtures are one-level directories with a handful of files. Worth one integration test with a deeper mix — nested subdirs, a symlink inside a subdir, an empty subdir — to stress addDirToZip + WalkDir + per-entry routing together. Right now the symlink + directory paths are tested, but not their combination.

Happy to send a follow-up PR with the test cases in #1 if useful.

Write the archive to a temp file and rename it over the destination only on success, so a pre-existing destination is left untouched on failure (matches v1's `zip -T` behaviour). Add Remove/Rename to OsProxy and consolidate the three zip-creation paths into a createZipFile helper, removing the direct os.Remove cleanup calls. Migrate the v1 zip-slip test assertion fix (check the parent of tmpDir) and add TestZipPreservesExistingDestinationOnFailure.

Document in UnZip's godoc that path-traversal rejection is by substring, so names like "foo..bar" are refused (broader than strict path-component traversal, required for the static analysis to recognise the sanitizer). Reword the duplicated guard comment in extractEntry to state the real reason it exists (keeping the sanitizer in the same function as the file-system sinks for CodeQL's go/zipslip query, plus defense-in-depth) instead of the weaker 'future direct callers' rationale.

Register ziputil's OsProxy in .mockery.yml and generate the testify mock under ziputil/mocks/osproxy, matching the fileutil convention. Add mock-driven tests for OS error paths that cannot be triggered deterministically against the real OS: temp-archive Create failure, mid-write cleanup, and final rename cleanup.

Cover the three previously-untested extraction guards: absolute symlink-target rejection, relative symlink-target escape rejection, and the chained-symlink parent check (EvalSymlinks) for both regular-file and symlink entries.

Exercise addDirToZip + WalkDir + per-entry routing together: nested subdirectories, a symlink inside a subdirectory, and an empty subdirectory, verifying the full structure and contents survive a ZipDir/UnZip round-trip.

godrei · 2026-06-12T14:14:11Z

@Badlazzor

Security-check tests added in 14555e7
OsProxy mocking introduced in 3025331
Raw os.Remove cleanup in 773c963
Stale extractEntry comment fixed and .. godoc tradeoff documented in ff8b5ae
Deep-tree test added in fa80291

Ziputil migration

240cc24

github-advanced-security AI found potential problems May 22, 2026

View reviewed changes

Comment thread ziputil/ziputil.go Fixed

Comment thread ziputil/ziputil.go Fixed

godrei added 4 commits May 22, 2026 11:11

Fix golangci-lint revive warning on RealOS.OpenFile

bd2739b

Copy existing ziputil v1 integration tests

6f59954

Fix v2 integration tests

8fb85e6

github-advanced-security AI found potential problems May 22, 2026

View reviewed changes

Comment thread ziputil/ziputil.go Fixed

Comment thread ziputil/ziputil.go Fixed

github-advanced-security AI found potential problems May 27, 2026

View reviewed changes

Comment thread ziputil/ziputil.go Fixed

Comment thread ziputil/ziputil.go Fixed

github-advanced-security AI found potential problems May 27, 2026

View reviewed changes

Comment thread ziputil/ziputil.go Fixed

godrei added 2 commits May 29, 2026 13:44

github-advanced-security AI found potential problems May 29, 2026

View reviewed changes

Comment thread ziputil/ziputil.go Fixed

godrei added 3 commits May 29, 2026 13:58

github-advanced-security AI found potential problems May 29, 2026

View reviewed changes

Comment thread ziputil/ziputil.go Fixed

godrei and others added 4 commits May 29, 2026 15:08

Merge branch 'master' into ziputil-v2

9427c56

Badlazzor reviewed Jun 12, 2026

View reviewed changes

godrei and others added 5 commits June 12, 2026 15:02

ziputil: add unit tests for UnZip symlink security guards

14555e7

Cover the three previously-untested extraction guards: absolute symlink-target rejection, relative symlink-target escape rejection, and the chained-symlink parent check (EvalSymlinks) for both regular-file and symlink entries.

Merge branch 'master' into ziputil-v2

591b1bf

godrei marked this pull request as ready for review June 12, 2026 13:57

ziputil: add deep-tree round-trip integration test

fa80291

Exercise addDirToZip + WalkDir + per-entry routing together: nested subdirectories, a symlink inside a subdirectory, and an empty subdirectory, verifying the full structure and contents survive a ZipDir/UnZip round-trip.

Badlazzor approved these changes Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ziputil migration#231

Ziputil migration#231
godrei wants to merge 22 commits into
masterfrom
ziputil-v2

godrei commented May 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Badlazzor left a comment

Uh oh!

godrei commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

godrei commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed

Behavioral differences from v1

Tests

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Badlazzor left a comment

Choose a reason for hiding this comment

Uh oh!

godrei commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

godrei commented May 22, 2026 •

edited

Loading