Proof of concept of a reader-writer split (#2761) #2795

Draft

bal-e wants to merge 34 commits into rustls:main from bal-e:split-reader-writer-api

Conversation

@bal-e

bal-e commented Dec 5, 2025

This is a proof of concept adding a module rustls::split, an alternative (currently buffered) API that splits a post-handshake TLS connection into a Reader and Writer. It demonstrates a possible API and implementation for #2761.

The API

The split() function can split a TLS 1.2/1.3 client/server connection with a completed handshake into (Reader, Writer). These types can be used independently, e.g. from different threads or asynchronous tasks. It is sometimes necessary to communicate between them (e.g. to send alerts to a misbehaving peer, or if the peer requests we update our encryption keys). But this happens quite rarely, and the user can implement that communication using their own synchronization mechanisms (e.g. mutexes or channels).

Reader implements the reader-side functionality of ConnectionCommon. It offers recv_tls() to read from the underlying stream (equivalent to read_tls() + process_new_packets()) and reader() to access buffered plaintext data. The most important difference is that recv_tls() can return a WriterAction, which the user must communicate to the Writer.

Writer implements the writer-side functionality of ConnectionCommon. It offers send_tls() (i.e. write_tls()) and writer(). In addition, it has an enact() method to consume a WriterAction from the reader.
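As a rough sketch of how this might look in use, the two halves can be driven from separate threads, with a channel carrying WriterActions from the Reader to the Writer. The signatures, the Received::writer_action() accessor, and the error handling below are assumptions inferred from the description above, not the PR's final API:

```rust
// Hedged sketch only: method signatures are assumptions, not the PR's final API.
use std::io::{Read, Write};
use std::net::TcpStream;
use std::sync::mpsc;
use std::thread;

fn run_split(conn: rustls::ClientConnection, sock: TcpStream) -> std::io::Result<()> {
    // Split the post-handshake connection into independently usable halves.
    let (mut reader, mut writer) = rustls::split::split(conn);
    let mut read_sock = sock.try_clone()?;
    let mut write_sock = sock;
    // Channel forwarding WriterActions from the Reader half to the Writer half.
    let (actions_tx, actions_rx) = mpsc::channel();

    let _read_side = thread::spawn(move || -> std::io::Result<()> {
        loop {
            // Equivalent to read_tls() + process_new_packets().
            let received = reader.recv_tls(&mut read_sock)?;
            // Any pending action (alert, key update, ...) must reach the Writer.
            if let Some(action) = received.writer_action() {
                let _ = actions_tx.send(action);
            }
            // Drain decrypted application data.
            let mut buf = [0u8; 4096];
            let n = reader.reader().read(&mut buf)?;
            println!("received {n} plaintext bytes");
        }
    });

    loop {
        // Apply actions produced by the Reader before sending more data.
        while let Ok(action) = actions_rx.try_recv() {
            writer.enact(action);
        }
        writer.writer().write_all(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")?;
        // Flush pending TLS records to the socket (i.e. write_tls()).
        writer.send_tls(&mut write_sock)?;
    }
}
```

The channel here is just one way to forward actions; as noted above, any synchronization mechanism the user already has (a mutex, a task wake-up, ...) would do.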

Justification

Splitting the reader and writer can make life a lot easier for request-response protocols. Even though TLS is an ordered stream protocol, the read and write halves of a connection are not ordered with respect to each other. Instead of sending a single request, waiting for a response, and then repeating, implementations can write as many requests as I/O allows and wait for the responses to arrive. In theory, this could significantly improve throughput, especially for connections and requests with high latency, but testing this with rustls has been difficult.

With this API, it is possible that WriterActions are somewhat delayed. Is this a correctness issue? I would argue not. A TLS peer's delayed response to a message could occur because of a delayed WriterAction, or simply delays in the network; the two are indistinguishable. As long as WriterActions are processed relatively quickly, and sent to the right Writer in the right order, it will be fine. The improvement in throughput (which can now be benchmarked, with this PR) is adequate justification in my eyes.

Defensive Measures

Firstly, I want to emphasize the low-level nature of this API. Even with the unbuffered API we have today, implementation errors on the user's part can lead to problems. The same is true here. We can still implement defensive measures to mitigate this, but an impossible-to-misuse API is neither necessary nor warranted.

It is important to ensure that all WriterActions are sent to the right Writer in the right order. I am ignoring timeliness.

  • I have already implemented some defensive measures with #[must_use] annotations on Received and WriterAction; these mitigate the possibility of the WriterAction being dropped or ignored.
  • To ensure ordering, WriterActions could include a simple monotonic counter.
  • Ensuring that they reach the right Writer is more difficult; one way would be to maintain an Arc internally (as the current code does), include it in the WriterAction, and test Arc::ptr_eq. Both the counter and the Arc check are sketched below.
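For illustration, here is a minimal sketch of what those two checks could look like inside enact(); the field and type names (ConnState, seq, next_seq, ActionError) are hypothetical stand-ins, not the PR's actual code:

```rust
use std::sync::Arc;

struct ConnState; // shared identity for one connection (contents irrelevant here)

pub struct WriterAction {
    conn: Arc<ConnState>, // which connection's Reader produced this action
    seq: u64,             // monotonic counter assigned by the Reader
    // ... plus the actual payload (alert, key update, ...)
}

pub struct Writer {
    conn: Arc<ConnState>,
    next_seq: u64, // sequence number of the next action we expect
    // ... plus the writer-side record layer state
}

#[derive(Debug)]
pub enum ActionError {
    WrongConnection,
    OutOfOrder,
}

impl Writer {
    pub fn enact(&mut self, action: WriterAction) -> Result<(), ActionError> {
        // Reject actions that belong to a different connection's Reader.
        if !Arc::ptr_eq(&self.conn, &action.conn) {
            return Err(ActionError::WrongConnection);
        }
        // Reject actions delivered out of order (or duplicated).
        if action.seq != self.next_seq {
            return Err(ActionError::OutOfOrder);
        }
        self.next_seq += 1;
        // ... actually apply the action to the writer-side state here
        Ok(())
    }
}
```

A duplicated or reordered action then fails loudly instead of silently corrupting the writer-side state.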

Implementation

The rustls::split module inlines the state and methods of ClientConnection / ServerConnection, ConnectionCommon, ConnectionCore, Box, CommonState, and RecordLayer. These types combine reader and writer state and could not be reused directly; I chose to duplicate their code rather than break the existing codebase. The only changes to them are to make some items pub(crate) so that I can access them.

This has resulted in a 1500-line module that combines all the important post-handshake TLS logic. It's not pretty, but it does have some benefits: every step in the process is visible at a glance, and it's easier to simplify control flow. IMO, in-place encryption/decryption would be much easier to implement on this than on the existing code; #2758 would not have been hindered the way it was.

It should be possible to refactor this module and split its data into smaller types again, though I suspect ConnectionCommon etc. are not the ideal layout. I don't want to put too much more effort into this right now; it's good enough to judge the API concretely and to benchmark. I don't want to waste time if you (the maintainers) think a different approach is necessary.

Testing

Some basic initial testing shows that this code works! I was able to modify simpleclient and I can happily talk to rust-lang.org. I'll try to find tests that focus on post-handshake state, and perhaps look at BoGo.

This code needs to be integrated with rustls' benchmarking suite. I need to investigate that and figure out how to measure the maximum throughput of a request-response connection, with and without the reader-writer split. Suggestions for alternative benchmarking approaches are welcome.

Future Direction

If this approach seems feasible, I would like to make this implementation unbuffered. Now that all of the different layers of TLS logic are in one place, it's much easier to implement in-place encryption/decryption and wipe out all allocations from the hot path.

The code is only applicable to connections that have finished their handshakes. The handshaking process is significantly more complicated and would be incredibly difficult to implement like this. The core idea behind this PR is that readers and writers don't communicate that much; but that's not true during a handshake. I don't know if or how ClientConnection etc. can be implemented in terms of this code. Still, this module is already useful for users of long-lived connections; they can trade extra handshaking work for better post-handshake performance.

The code currently lacks support for TLS 1.3 session tickets. This made my life way easier, as I could then define Reader and Writer as simple concrete types. Session ticket support will require distinguishing client and server types, as well as TLS 1.2 and TLS 1.3 use. It doesn't sound fun, but it will be necessary.

Commit History / Review

I've left the commit history as-is for now. I'm happy to try to clean it up if merging this PR is on the table. While the overall diff is +1508 -40, the sum of diffs across commits is +2712 -1234; a lot of refactoring happened in place. It may be possible to review the commits in order, but I'm sure that won't be fun. I've tried to note where functions come from (e.g. ConnectionCore vs CommonState), but it might not always be clear. A lot of QUIC- and handshake-related logic has also been removed.

Sorry about the huge chunk of code to review. I know y'all have limited time. I would suggest prioritizing the API and judging whether this design is feasible. We can talk further once I have some benchmarks.

arya dradjica added 30 commits December 5, 2025 11:17
This provides a simple, buffered API that uses locking internally.  The
goal is to incrementally strip away layers from the 'ClientConnection',
until the lock can be removed.
This introduces the reader-to-writer communication we expect, for
passing forward 'sendable_plaintext'.  At the moment, I haven't
implemented any defensive measures to ensure 'WriterActions' are
used.

NOTE: The code currently contains a logic error, similar to
<rustls#2758>.  If the traffic keys are
refreshed, the key update message is accidentally sent with the new
keys.  This will get resolved with enough destructuring.
Turns out it's not used in post-handshake 'State's.
This is a specialization of 'State::handle()' for 'conn::split'.
The common case (application data) is now handled outside the trait
object.  'Context::receive_payload()' is also inlined, making it easier
to eliminate it.
Now that the 'SideData' type isn't necessary, all the existing code is
adapted to work for both client and server sides.  This also required
splitting out the new 'State' methods into a separate trait.
'ConnectionInfo' holds the immutable parts of 'CommonState'.
In particular:
- 'has_seen_eof'
- 'has_received_close_notify'
- 'received_plaintext'
@djc
Member

djc commented Dec 8, 2025

Thanks for working on this. I haven't looked at it in much detail, but here are some initial responses:

  • I think we should support this use case with minimal overhead in a future release.
  • I don't think we want to add ~1500 LOC just to support this case.
  • This is a bit of an awkward moment as we're deep into extracting a smaller core from the library that would potentially make this kind of thing easier, but I don't think it's the right time yet to be thinking about this aspect.

We've considered the separate unbuffered API to be a mistake, since it adds a whole bunch of parallel API which makes it difficult/tedious to make sure it has the same level of quality as the "main" API. As such, we're working on refactoring that will make the core library more modular, which I think will make it easier to slot in a reader/writer split.

@ctz
Member

ctz commented Dec 9, 2025

Thanks for working on this.

Yes, thanks for sketching this out. I think the end result is substantially OK. I think the WriterAction is alright for low-assurance items, or places where the protocol allows some delay (like KeyUpdate). If we need anything with higher assurance, there are some other options (e.g. shared mutable state hidden within the objects, given care that there is no common-case contention). For alerts, I believe a single atomic would be sufficient and would allow us to meet the requirements of 6.2.
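For illustration, a minimal sketch of that single-atomic idea for alerts could look like the following; all names here are hypothetical, and close_notify is assumed to go through the normal write path rather than this flag:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const NO_ALERT: u8 = 0;

// State shared by the Reader and Writer halves; only touched when an alert
// actually needs to be sent, so there is no common-case contention.
struct Shared {
    // 0 = no alert pending, otherwise the alert description value to send.
    pending_alert: AtomicU8,
}

impl Shared {
    // Reader side: record that a fatal alert must be sent before closing.
    fn raise_alert(&self, desc: u8) {
        // Keep only the first alert; later ones are ignored.
        let _ = self.pending_alert.compare_exchange(
            NO_ALERT,
            desc,
            Ordering::Release,
            Ordering::Relaxed,
        );
    }

    // Writer side: check for (and clear) a pending alert before writing data.
    fn take_alert(&self) -> Option<u8> {
        match self.pending_alert.swap(NO_ALERT, Ordering::Acquire) {
            NO_ALERT => None,
            desc => Some(desc),
        }
    }
}
```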

@bal-e
Author

bal-e commented Dec 9, 2025

Thanks for having a look! I'm glad to hear that such a use case is considered important.

  • I don't think we want to add ~1500 LOC just to support this case.

I agree that the module can't be merged in anything close to its current state. But a nice way forward could be to implement Connection in terms of this module. That could be taken further, by removing all buffering from this module and also using it to implement UnbufferedConnection.

  • This is a bit of an awkward moment as we're deep into extracting a smaller core from the library that would potentially make this kind of thing easier, but I don't think it's the right time yet to be thinking about this aspect.

Ah, that's unfortunate. I think the ideal way to merge this API is to use it as the basis for the existing Connection / UnbufferedConnection APIs, but that would clearly require larger refactoring. I'm happy to try tackling that -- working on this has been fun, and I think I have an idea for how to approach it. However:

  1. I'm new and you have good reason not to rely on me for it.
  2. I don't know the progress of your refactoring work, and whether you want to land this API before it.
  3. Redundant work is a waste of time for everyone.

I can see three paths forward:

  1. We can postpone this work and you can continue the core refactoring. Hopefully this PR helps guide the design for the new core, so it is amenable to this addition in the future.

  2. I can clean up this PR and try to implement Connection in terms of it. AFAICT that would be the fastest acceptable way to get this API merged. But Connection has to deal with handshakes, and I'm worried that would cause difficulties.

  3. Depending on the progress of your refactoring, I can try to help out there, and make sure that this use case is nicely satisfied. I'm also happy to generally help with optimizations there, particularly towards trimming allocations and trait objects.

What would you like (me) to do?

@djc
Member

djc commented Dec 9, 2025

I can see three paths forward:

  1. We can postpone this work and you can continue the core refactoring. Hopefully this PR helps guide the design for the new core, so it is amenable to this addition in the future.

  2. I can clean up this PR and try to implement Connection in terms of it. AFAICT that would be the fastest acceptable way to get this API merged. But Connection has to deal with handshakes, and I'm worried that would cause difficulties.

  3. Depending on the progress of your refactoring, I can try to help out there, and make sure that this use case is nicely satisfied. I'm also happy to generally help with optimizations there, particularly towards trimming allocations and trait objects.

What would you like (me) to do?

I think either 1 or 3 could make sense depending on how much context transfer we think is needed to have you help out. @ctz I think you're currently working on this direction AIUI, do you think there's a meaningful chunk of work that @bal-e could shave off?

ctz mentioned this pull request Dec 15, 2025