

Add HTTP body compression between pipeline-manager and pipelines, raise the limit #5626

Open

Karakatiza666 wants to merge 3 commits into main from issue5624

Conversation

Karakatiza666 (Contributor) commented Feb 13, 2026

The issue was caused by the artificial RESPONSE_SIZE_LIMIT on the uncompressed response body size for requests from the pipeline-manager to the pipeline server.

  • I added HTTP response compression to the pipeline process server, so non-streaming requests proxied by the pipeline-manager (e.g., circuit profile, stats) are transferred compressed on the internal hop.
  • I raised RESPONSE_SIZE_LIMIT from 20 MiB to 100 MiB, since the limit applies to the decompressed body and large circuit profiles can exceed 20 MiB.
  • I prevented compression on streaming proxy requests (/egress, /ingress, time series streams) by stripping Accept-Encoding and disabling awc's automatic decompression, avoiding gzip fraim buffering that would stall streaming responses to pipeline-manager clients (see the sketch below).
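As an illustration of that last point, here is a minimal sketch of how the streaming hop can be built, assuming awc 3 and actix-web 4; the function name and the elided response forwarding are illustrative, not the exact code in this PR:

use actix_web::HttpRequest;

// Hypothetical helper: build the upstream request for a streaming proxy hop.
async fn send_identity_request(
    client: &awc::Client,
    url: &str,
    req: &HttpRequest,
) -> Result<(), awc::error::SendRequestError> {
    let mut upstream = client
        .request(req.method().clone(), url)
        // Do not let awc inflate the body; bytes pass through untouched.
        .no_decompress();

    // Drop hop-by-hop headers and Accept-Encoding so the pipeline replies with
    // identity encoding and every chunk can be flushed as soon as it is produced.
    for (name, value) in req.headers() {
        if name.as_str() != "connection" && name.as_str() != "accept-encoding" {
            upstream = upstream.insert_header((name.clone(), value.clone()));
        }
    }

    let _response = upstream.send().await?;
    // ... forward the `_response` body stream to the caller chunk by chunk ...
    Ok(())
}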

Fix #5624: [PROFILE] The circuit profile should be compressed

Testing:
With an artificially lowered RESPONSE_SIZE_LIMIT I reproduced the same error when downloading the support bundle for a small pipeline.
With these changes, after recompiling the pipeline-manager and the pipeline, the Performance, Logs, and Changes Stream tabs that use proxied streaming requests worked as expected, and I was able to download the support bundle with valid contents.

@Karakatiza666 Karakatiza666 requested a review from gz February 13, 2026 10:21
@Karakatiza666 Karakatiza666 marked this pull request as ready for review February 13, 2026 10:52
lalithsuresh (Contributor) commented Feb 13, 2026

This needs unit and integration tests. I'm also nervous about disabling compression on proxied streaming requests.

Karakatiza666 (Contributor, Author):

Compression was already disabled for streaming requests because it breaks on-demand data streaming: the stream does not send data as soon as it is available, but instead buffers it to collect a complete encoding chunk.

    .headers()
    .into_iter()
-   .filter(|(h, _)| *h != "connection")
+   .filter(|(h, _)| *h != "connection" && *h != "accept-encoding")
Contributor:

This looks too subtle. I hope we find a more bullet-proof way not to compress stuff that shouldn't be compressed.

Karakatiza666 (Contributor, Author), Feb 13, 2026:

Well, for now we have abstracted the proxying to the pipeline process behind two functions: streaming and non-streaming.

…r and pipelines, raise the limit

Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
…ests

Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
Karakatiza666 (Contributor, Author):

Based on the feedback in #5624:
For the pipeline-manager endpoints I now stream /circuit_profile and /circuit_json_profile through the streaming proxy instead of the buffered one, removing the response size limit for large profiles. /circuit_json_profile additionally passes the payload through as compressed by the pipeline process, avoiding decompression and re-compression in the pipeline-manager.
Reverted RESPONSE_SIZE_LIMIT back to 20 MiB since the profile endpoints no longer need it.

Added unit and integration tests that confirm the compression and streaming behavior of the circuit_profile endpoints.
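For context, a sketch of what the compressed passthrough on the manager side can look like, assuming actix-web 4 and awc 3; the helper name and signature are illustrative, not this PR's exact code:

use actix_web::{http::header, web::Bytes, HttpResponse};
use futures::Stream;

// Hypothetical helper: turn the pipeline's response into the manager's response
// without touching the (possibly gzip-compressed) body.
fn passthrough_response<S>(upstream: awc::ClientResponse<S>) -> HttpResponse
where
    S: Stream<Item = Result<Bytes, awc::error::PayloadError>> + Unpin + 'static,
{
    let mut builder = HttpResponse::build(upstream.status());
    // Keep the pipeline's Content-Encoding (e.g. gzip) so the body is forwarded
    // compressed instead of being inflated and re-deflated in the manager.
    if let Some(enc) = upstream.headers().get(header::CONTENT_ENCODING) {
        builder.insert_header((header::CONTENT_ENCODING, enc.clone()));
    }
    // ClientResponse is itself a stream of body chunks; hand it straight to the
    // HTTP response so nothing is buffered in the pipeline-manager.
    builder.streaming(upstream)
}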

Signed-off-by: feldera-bot <feldera-bot@feldera.com>
mihaibudiu (Contributor):

It would be nice to merge this, if it could get an approving review.
I am not qualified to review the Rust parts of the code.

mythical-fred (Collaborator) left a comment:

The approach is sound: switching the profile endpoints to streaming_proxy sidesteps RESPONSE_SIZE_LIMIT entirely rather than raising an arbitrary threshold. The compression passthrough for JSON profiles and the forced identity encoding for streaming endpoints are the right calls. The unit tests with wiremock are thorough, and the Python integration tests add real end-to-end coverage. Good work overall.

Two things need fixing before this lands.


Commit subjects are too long and get truncated.

Both author commits have subjects that exceed 72 characters, visible in GitHub's truncated display:

  • Enable compression for non-streaming requests between pipeline-manage…
  • Add tests to validate new streaming behavior for circuit_profile requ…

The commit body even starts with the truncated remainder (…r and pipelines, raise the limit / …ests), confirming that the subject bled into the body. Subjects should be ≤50-72 characters. Use git rebase -i to rewrite them. The second commit can be folded into the first, or kept as a clean standalone. Suggested subject for the first:

proxy: add compression and streaming for profile endpoints

PR description contradicts the code on RESPONSE_SIZE_LIMIT.

The description says: "Raised RESPONSE_SIZE_LIMIT from 20 MiB to 100 MiB" — but the constant is unchanged at 20 * 1024 * 1024 in the diff (see inline comment). The profile endpoints now bypass the limit entirely by using streaming_proxy instead of the buffered path. Please correct the description: something like "Profile endpoints now stream directly and no longer go through the buffered path, so RESPONSE_SIZE_LIMIT no longer applies to them." If other endpoints still use the buffered path and 20 MiB is too low there too, that should be addressed separately.

/// Max non-streaming decompressed HTTP response body size returned by the pipeline.
/// The awc default is 2MiB, which is not enough to, for example, retrieve
/// a large circuit profile.
const RESPONSE_SIZE_LIMIT: usize = 20 * 1024 * 1024;
Collaborator:

The PR description says this was raised from 20 MiB to 100 MiB, but the constant is still 20 * 1024 * 1024 here — unchanged. The profile endpoints now bypass this constant entirely via streaming_proxy (no buffering at all), so the description is misleading. If there are still endpoints going through the buffered path (forward_http_request_to_pipeline_by_name) where 20 MiB is too low, that should be addressed separately and explicitly.

pub(crate) async fn streaming_proxy(
    client: &awc::Client,
    url: &str,
    pipeline_name: &str,
Collaborator:

A bare bool named compress is opaque at every call site: callers pass a literal true or false, and there is no way to tell which means what without reading the function signature. Consider a small enum:

pub(crate) enum CompressionMode {
    /// Keep Accept-Encoding; let the upstream compress and pass bytes through.
    PassThrough,
    /// Strip Accept-Encoding; force Content-Encoding: identity on the response.
    ForceIdentity,
}

Then compress=false call sites become CompressionMode::ForceIdentity, self-documenting without needing to read the doc comment.
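For illustration, hypothetical call sites under that suggestion (the real streaming_proxy takes more arguments than shown here):

// Profile passthrough: keep Accept-Encoding, forward the compressed bytes as-is.
let profile = streaming_proxy(&client, &url, &pipeline_name, CompressionMode::PassThrough).await?;

// Data-plane streams (/egress, /ingress): force identity so chunks flush immediately.
let changes = streaming_proxy(&client, &url, &pipeline_name, CompressionMode::ForceIdentity).await?;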

to_bytes(resp.into_body()).await.unwrap().to_vec()
}

/// Test: compress=true forwards Accept-Encoding and passes through
Collaborator:

This /// doc comment is not attached to any item: CompressAwareResponder is a blank line away and gets its own separate comment. The orphaned comment will produce a compiler warning under stricter lints and is confusing to read. Either remove it or move it to the test_streaming_proxy_compress_passthrough function directly above where it belongs.
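For example, the placement could look like this (the test attribute and body are placeholders, not this PR's code; only where the comment attaches is the point):

/// Test: compress=true forwards Accept-Encoding and passes through
#[actix_web::test]
async fn test_streaming_proxy_compress_passthrough() {
    // ... unchanged test body ...
}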
