Add HTTP body compression between pipeline-manager and pipelines, raise the limit #5626

Karakatiza666 wants to merge 3 commits into main
Conversation
Force-pushed from 0356b4e to 77ce2ee
This needs unit and integration tests. I'm also nervous about disabling compression on proxied streaming requests.

Compression was already disabled for streaming requests because it breaks on-demand data streaming: the stream does not send data as soon as it is available, and instead buffers until it has collected a complete encoding chunk.
```diff
 .headers()
 .into_iter()
-.filter(|(h, _)| *h != "connection")
+.filter(|(h, _)| *h != "connection" && *h != "accept-encoding")
```
This looks too subtle. I hope we find a more bullet-proof way not to compress stuff that shouldn't be compressed.
Well, for now we have abstracted the proxying to the pipeline process behind two functions: streaming and non-streaming.
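For context, a minimal sketch of how the header filter above might sit in the forwarding code; the helper name `forward_headers` and the request-builder plumbing are assumptions, only the filter line comes from the diff:

```rust
use actix_web::HttpRequest;

/// Hypothetical helper: copy client headers onto the upstream awc request,
/// dropping hop-by-hop and negotiation headers.
fn forward_headers(client: &awc::Client, req: &HttpRequest, url: &str) -> awc::ClientRequest {
    let mut upstream = client.request(req.method().clone(), url);
    for (name, value) in req
        .headers()
        .into_iter()
        // Stripping `accept-encoding` makes the pipeline reply with
        // `Content-Encoding: identity`, so streamed chunks are forwarded
        // as soon as they arrive instead of being buffered for compression.
        .filter(|(h, _)| *h != "connection" && *h != "accept-encoding")
    {
        upstream = upstream.insert_header((name.clone(), value.clone()));
    }
    upstream
}
```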
…r and pipelines, raise the limit
Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>

…ests
Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
Force-pushed from 77ce2ee to 2c83558

Based on the feedback in #5624: added unit and integration tests that confirm the compression and streaming behavior of the circuit_profile endpoints.
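For the record, a hedged sketch of the shape such a unit test might take with wiremock; the endpoint path, header values, and test name here are illustrative, not the actual test code:

```rust
use wiremock::matchers::{header, method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

/// Illustrative only: the mocked pipeline serves a gzip-encoded profile,
/// and the proxy is expected to pass the compressed bytes through untouched.
#[actix_web::test]
async fn circuit_profile_compression_passthrough() {
    let pipeline = MockServer::start().await;
    Mock::given(method("GET"))
        .and(path("/circuit_profile"))
        .and(header("accept-encoding", "gzip"))
        .respond_with(
            ResponseTemplate::new(200)
                .insert_header("content-encoding", "gzip")
                .set_body_bytes(b"<gzip bytes>".to_vec()),
        )
        .mount(&pipeline)
        .await;
    // Point streaming_proxy at pipeline.uri() and assert that the response
    // keeps `Content-Encoding: gzip` and the body bytes are unmodified.
}
```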
It would be nice to merge this, if it could get an approving review.
mythical-fred left a comment
The approach is sound: switching the profile endpoints to streaming_proxy sidesteps RESPONSE_SIZE_LIMIT entirely rather than raising an arbitrary threshold. The compression passthrough for JSON profiles and the forced identity encoding for streaming endpoints are the right calls. The unit tests with wiremock are thorough, and the Python integration tests add real end-to-end coverage. Good work overall.
Two things need fixing before this lands.
Commit subjects are truncated and too long.
Both author commits have subjects that exceed 72 characters, visible from the … in GitHub's display:
Enable compression for non-streaming requests between pipeline-manage…
Add tests to validate new streaming behavior for circuit_profile requ…
The commit body even starts with the truncated remainder (…r and pipelines, raise the limit / …ests), confirming the subject bled into the body. Subjects should be ≤50-72 chars. Use git rebase -i to rewrite. The second commit can be folded into the first, or kept as a clean standalone. Suggested subject for the first:
proxy: add compression and streaming for profile endpoints
PR description contradicts the code on RESPONSE_SIZE_LIMIT.
The description says: "Raised RESPONSE_SIZE_LIMIT from 20 MiB to 100 MiB" — but the constant is unchanged at 20 * 1024 * 1024 in the diff (see inline comment). The profile endpoints now bypass the limit entirely by using streaming_proxy instead of the buffered path. Please correct the description: something like "Profile endpoints now stream directly and no longer go through the buffered path, so RESPONSE_SIZE_LIMIT no longer applies to them." If other endpoints still use the buffered path and 20 MiB is too low there too, that should be addressed separately.
```rust
/// Max non-streaming decompressed HTTP response body size returned by the pipeline.
/// The awc default is 2MiB, which is not enough to, for example, retrieve
/// a large circuit profile.
const RESPONSE_SIZE_LIMIT: usize = 20 * 1024 * 1024;
```
The PR description says this was raised from 20 MiB to 100 MiB, but the constant is still 20 * 1024 * 1024 here — unchanged. The profile endpoints now bypass this constant entirely via streaming_proxy (no buffering at all), so the description is misleading. If there are still endpoints going through the buffered path (forward_http_request_to_pipeline_by_name) where 20 MiB is too low, that should be addressed separately and explicitly.
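For contrast, the buffered path presumably applies the constant through awc's body-size limiter, roughly like this; a sketch only, the surrounding error handling and plumbing are assumptions:

```rust
// Sketch of the buffered (non-streaming) path: awc caps the decompressed
// body at RESPONSE_SIZE_LIMIT and errors out instead of allocating more.
let mut response = client.get(url).send().await?;
let body = response.body().limit(RESPONSE_SIZE_LIMIT).await?;
```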
```rust
pub(crate) async fn streaming_proxy(
    client: &awc::Client,
    url: &str,
    pipeline_name: &str,
```
A bare bool named `compress` is opaque at every call site: callers pass a literal `true` or `false` with no indication of which means what, short of reading the function signature. Consider a small enum:
```rust
pub(crate) enum CompressionMode {
    /// Keep Accept-Encoding; let the upstream compress and pass bytes through.
    PassThrough,
    /// Strip Accept-Encoding; force Content-Encoding: identity on the response.
    ForceIdentity,
}
```

Then `compress=false` call sites become `CompressionMode::ForceIdentity`: self-documenting without needing to read the doc comment.
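The call sites would then read roughly like this, assuming `streaming_proxy` gains a `CompressionMode` parameter in place of the bool; the URL variables are hypothetical:

```rust
// Profile endpoints: let the pipeline compress, pass bytes through.
streaming_proxy(&client, &profile_url, name, CompressionMode::PassThrough).await?;
// Log/change streams: force identity so chunks are not buffered.
streaming_proxy(&client, &logs_url, name, CompressionMode::ForceIdentity).await?;
```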
```rust
    to_bytes(resp.into_body()).await.unwrap().to_vec()
}

/// Test: compress=true forwards Accept-Encoding and passes through
```
This `///` doc comment is not attached to any item: CompressAwareResponder is a blank line away and gets its own separate comment. The orphaned comment will produce a compiler warning under stricter lints and is confusing to read. Either remove it or attach it to the test_streaming_proxy_compress_passthrough function directly above, where it belongs.
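Concretely, the fix could look like this, with the comment attached to the test it describes (test body elided):

```rust
/// Test: compress=true forwards Accept-Encoding and passes through
/// the upstream Content-Encoding unchanged.
#[actix_web::test]
async fn test_streaming_proxy_compress_passthrough() {
    // ...
}
```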
The issue was caused by the artificial RESPONSE_SIZE_LIMIT on the uncompressed response body size for requests from the pipeline-manager to the pipeline server.
Fix #5624: [PROFILE] The circuit profile should be compressed
Testing:
With an artificially lowered RESPONSE_SIZE_LIMIT, I saw the same error when downloading the support bundle for a small pipeline.
With these changes, after recompiling the pipeline-manager and the pipeline, the Performance, Logs, and Changes Stream tabs that use proxied streaming requests worked as expected, and I was able to download the support bundle with valid contents.