zstd: Allow specifying expected decompressed stream output size. #1129
Replies: 5 comments 2 replies
-
Sounds like a solid analysis, and I could see that bug existing. I will take a look tomorrow!
-
@chrisd8088 I think you are mixing up "blocks" and "frames". A "frame" is a header for independent content. A frame can contain one or more blocks, and optionally a checksum for the content at the end. Within a frame, the last block must set a bit indicating it is the last block, so we know we cannot truncate inside a frame. But when you start appending multiple frames, there is no way to know that another frame should follow. So yes, you can truncate those, and the only way to safeguard against that is to not append independent decodes. That is why Flush flushes the current block but doesn't end the frame. At first I was worried that it was possible to truncate between blocks inside a frame, which would be an obvious bug.
-
That could certainly be the case, my apologies! I was trying to use the terminology as I understood it from this section of the documentation.

The larger issue for us is that we don't know exactly how a server might encode its data. I see from the RFC 8878 "Definitions" section that "Multiple frames can be appended into a single file or stream." Perhaps I'm still overlooking some details, but that suggests to me that a server might zstd-encode a file as a sequence of frames and serve that to a client, even if that's not what it ideally ought to do. Is there something in the specification, though, which states that's not valid behaviour?

I first encountered this issue while simulating a server which sends a too-large `Content-Length` header. Regardless, it would be easier for us if we consistently received the `io.ErrUnexpectedEOF` error.

We can work around this issue with something like the following, but it's a bit cumbersome. It would be simpler if it was possible instead to enable a "pass-through" mode for `io.ErrUnexpectedEOF` errors.

Workaround Approach:

```go
type LengthCounterReader struct {
	reader io.Reader
	length int64
}

func NewLengthCounterReader(r io.Reader) *LengthCounterReader {
	return &LengthCounterReader{r, 0}
}

func (r *LengthCounterReader) Read(b []byte) (int, error) {
	n, err := r.reader.Read(b)
	r.length += int64(n)
	return n, err
}

func (r *LengthCounterReader) Length() int64 {
	return r.length
}

type ExpectedLengthReader struct {
	reader              io.Reader
	lengthCounterReader *LengthCounterReader
	expectedLength      int64
}

func NewExpectedLengthReader(r io.Reader, lcr *LengthCounterReader, expected int64) *ExpectedLengthReader {
	return &ExpectedLengthReader{r, lcr, expected}
}

func (r *ExpectedLengthReader) Read(b []byte) (int, error) {
	n, err := r.reader.Read(b)
	if err == io.EOF && r.lengthCounterReader.Length() < r.expectedLength {
		return n, io.ErrUnexpectedEOF
	}
	return n, err
}
```

```go
lengthCounterReader := NewLengthCounterReader(response.Body)
zstdReader, _ := zstd.NewReader(lengthCounterReader)
defer zstdReader.Close()
reader := NewExpectedLengthReader(zstdReader, lengthCounterReader, response.ContentLength)
```

In any case, thanks very much for taking a look and explaining what's going on!
-
Yeah. The documentation "simplifies" the frame/block aspect. But good point, I will see if I can clarify that.
Correct. And there is no real solution for that, other than the server not sending multiple frames. By definition, if frames are independent, there is no way to know whether there should be more frames. "gzip" has similar functionality, where you can append multiple gzip files for concatenated output. I would say the "workaround" is just standard validation of the compressed output, which seems reasonable to leave to the caller. Let me know if I missed anything.
-
Converting to a feature request. Now that we have ResetWithOptions we could add a
-
Thank you very much for your `zstd` package! It's been a pleasure to start working with it, and if I've misunderstood something, I apologize!

I encountered the behaviour illustrated by the reproduction program below while I was developing some tests to simulate an HTTP server sending a zstd-encoded response which gets truncated before all the data can be sent to the client. I noticed that if the truncation happens to occur on a block boundary, then the zstd `Decoder` will convert an `io.ErrUnexpectedEOF` error into an `io.EOF` error. This makes it difficult for the caller to know if the full HTTP response has been received or not.

For context, I'm one of the maintainers of the Git LFS project, and I've been reviewing a PR to add zstd support to our client program, so that it could decode zstd-encoded HTTP responses.
Our client needs to handle interrupted downloads, and in these cases there is often a `Content-Length` header supplied by the server, so the total size of the encoded data is known in advance. When the HTTP response body's length is shorter than the value of the `Content-Length` header, Go's `net/http` package returns an `io.ErrUnexpectedEOF` error when reading from the `Body` field of the `http.Response` structure.

If we pass the `Body` field to a `zstd.Decoder` as an input stream, and then read from the decoder, we see the `io.ErrUnexpectedEOF` error, but only when it doesn't happen to fall at the end of an encoded block.

I wrote a short program to try to illustrate this behaviour, and it's the final example which best demonstrates the situation I encountered while writing tests for our test suite. If we insert an `io.ErrUnexpectedEOF` error after the first encoded block, when we read from the `zstd.Decoder` we don't receive that error, just a regular `io.EOF`, so we see the following:

but would prefer to see:
In the context of our client, this makes it difficult to know if we actually received a complete HTTP response or not. We can work around the problem by wrapping the `http.Response` structure's `Body` field with an `io.Reader` that counts the bytes read, then passing that to the `zstd.Decoder`, and then wrapping its output with another `io.Reader` which, when `io.EOF` is encountered, checks whether the length seen by the "counter" matches what we expect from the `Content-Length` header, and returns an `io.ErrUnexpectedEOF` error if too few bytes have been counted. But this is more complicated than if the `zstd.Decoder` would simply pass through the `io.ErrUnexpectedEOF` error in all cases.

Reproduction Program
Output from the reproduction program:
I think what's happening is that this case in the `frameDec.reset()` method converts both `io.EOF` and `io.ErrUnexpectedEOF` into `io.EOF`.

If I compile the `zstd` package with its internal debugging enabled and also uncomment the extra logging in our reproduction program, the final example outputs the following, with these two lines in particular:

Thank you again so much for this `zstd` package, and if I've made any mistakes in my description or have misunderstood something, please accept my apologies!

/cc PR git-lfs/git-lfs#6196