zstd: Allow specifying expected decompressed stream output size. #1129
Replies: 5 comments 2 replies
-
Sounds like a solid analysis, and I could see that bug existing. I will take a look tomorrow!
-
@chrisd8088 I think you are mixing up "blocks" and "frames". A "frame" is a header for independent content. A frame can contain one or more blocks, and optionally a checksum for the content at the end. Within a frame, the last block must set a bit indicating it is the last block, so we know we cannot truncate inside a frame. But when you start appending multiple frames, there is no way to know that another frame should follow. So yes, you can truncate those, and the only way to safeguard against that is to not append independent decodes. That is why Flush flushes the current block but doesn't end the frame. At first I was worried that it was possible to truncate between blocks inside a frame, which would be an obvious bug.
-
That could certainly be the case, my apologies! I was trying to use the terminology as I understood it from this section of the documentation.

The larger issue for us is that we don't know exactly how a server might encode its data. I see from the RFC 8878 "Definitions" section that "Multiple frames can be appended into a single file or stream." Perhaps I'm still overlooking some details, but that suggests to me that a server might zstd-encode a file as a sequence of frames and serve that to a client, even if that's not what it ideally ought to do. Is there something in the specification, though, which states that's not valid behaviour?

I first encountered this issue while simulating a server which sends a too-large `Content-Length` header. Regardless, it would be easier for us if we consistently received the `io.ErrUnexpectedEOF` error.

We can work around this issue with something like the following, but it's a bit cumbersome. It would be simpler if it was possible instead to enable a "pass-through" mode for `io.ErrUnexpectedEOF` errors.

Workaround Approach:

```go
type LengthCounterReader struct {
	reader io.Reader
	length int64
}

func NewLengthCounterReader(r io.Reader) *LengthCounterReader {
	return &LengthCounterReader{r, 0}
}

func (r *LengthCounterReader) Read(b []byte) (int, error) {
	n, err := r.reader.Read(b)
	r.length += int64(n)
	return n, err
}

func (r *LengthCounterReader) Length() int64 {
	return r.length
}

type ExpectedLengthReader struct {
	reader              io.Reader
	lengthCounterReader *LengthCounterReader
	expectedLength      int64
}

func NewExpectedLengthReader(r io.Reader, lcr *LengthCounterReader, expected int64) *ExpectedLengthReader {
	return &ExpectedLengthReader{r, lcr, expected}
}

func (r *ExpectedLengthReader) Read(b []byte) (int, error) {
	n, err := r.reader.Read(b)
	if err == io.EOF && r.lengthCounterReader.Length() < r.expectedLength {
		return n, io.ErrUnexpectedEOF
	}
	return n, err
}
```

```go
lengthCounterReader := NewLengthCounterReader(response.Body)
zstdReader, _ := zstd.NewReader(lengthCounterReader)
defer zstdReader.Close()
reader := NewExpectedLengthReader(zstdReader, lengthCounterReader, response.ContentLength)
```

In any case, thanks very much for taking a look and explaining what's going on!
-
Yeah. The documentation "simplifies" the frame/block aspect. But good point, I will see if I can clarify that.
Correct. And there is no real solution for that, other than the server not sending multiple frames. By definition, if frames are independent, there is no way to know whether there should be more frames. "gzip" has similar functionality, where you can append multiple gzip files for concatenated output. I would say the "workaround" is just standard validation of the compressed output, which seems reasonable to leave to the caller. Let me know if I missed anything.
-
Converting to a feature request. Now that we have ResetWithOptions we could add a
-
Thank you very much for your `zstd` package! It's been a pleasure to start working with it, and if I've misunderstood something, I apologize!

I encountered the behaviour illustrated by the reproduction program below while I was developing some tests to simulate an HTTP server sending a zstd-encoded response which gets truncated before all the data can be sent to the client. I noticed that if the truncation happens to occur on a block boundary, then the zstd `Decoder` will convert an `io.ErrUnexpectedEOF` error into an `io.EOF` error. This makes it difficult for the caller to know if the full HTTP response has been received or not.

For context, I'm one of the maintainers of the Git LFS project, and I've been reviewing a PR to add zstd support to our client program, so that it could decode zstd-encoded HTTP responses.
Our client needs to handle interrupted downloads, and in these cases there is often a `Content-Length` header supplied by the server, so the total size of the encoded data is known in advance. When the HTTP response body's length is shorter than the value of the `Content-Length` header, Go's `net/http` package returns an `io.ErrUnexpectedEOF` error when reading from the `Body` field of the `http.Response` structure.

If we pass the `Body` field to a `zstd.Decoder` as an input stream, and then read from the decoder, we see the `io.ErrUnexpectedEOF` error, but only when it doesn't happen to fall at the end of an encoded block.

I wrote a short program to try to illustrate this behaviour, and it's the final example which best demonstrates the situation I encountered while writing tests for our test suite. If we insert an `io.ErrUnexpectedEOF` error after the first encoded block, when we read from the `zstd.Decoder` we don't receive that error, just a regular `io.EOF`, so we see the following:

but would prefer to see:
In the context of our client, this makes it difficult to know if we actually received a complete HTTP response or not. We can work around the problem by wrapping the `http.Response` structure's `Body` field with an `io.Reader` that counts the bytes read, then passing that to the `zstd.Decoder`, and then wrapping its output with another `io.Reader` which, when `io.EOF` is encountered, checks whether the length seen by the "counter" matches what we expect from the `Content-Length` header, and returns an `io.ErrUnexpectedEOF` error if too few bytes have been counted. But this is more complicated than if the `zstd.Decoder` would simply pass through the `io.ErrUnexpectedEOF` error in all cases.

Reproduction Program
Output from the reproduction program:
I think what's happening is that this case in the `frameDec.reset()` method converts both `io.EOF` and `io.ErrUnexpectedEOF` into `io.EOF`.

If I compile the `zstd` package with its internal debugging enabled and also uncomment the extra logging in our reproduction program, the final example outputs the following, with these two lines in particular:

Thank you again so much for this `zstd` package, and if I've made any mistakes in my description or have misunderstood something, please accept my apologies!

/cc PR git-lfs/git-lfs#6196