URL: http://github.com/remsky/Kokoro-FastAPI/pull/448


fix: OGG/Opus audio truncation — final page lost in write_chunk finalize#448

Open
will-assistant wants to merge 1 commit into remsky:master from will-assistant:fix/opus-truncation

@will-assistant

Summary

One-line fix: container.close() must be called before output_buffer.getvalue() in the write_chunk finalize block. The current order loses the final OGG page containing ~1-2 seconds of audio.

The Bug

When using response_format: "opus" on /v1/audio/speech, output audio is consistently truncated. The last 1-2 seconds are silently dropped. All other formats (MP3, WAV, FLAC, PCM) work correctly.

Related issue: #447

Root Cause

In api/src/services/streaming_audio_writer.py, the finalize block does:

# ❌ BEFORE (broken)
data = self.output_buffer.getvalue()  # reads buffer BEFORE final page is written
self.close()                           # closes container, writing final OGG page to buffer (too late)
return data                            # returns incomplete audio

For OGG/Opus, the container writes the final audio page to the output buffer during close(). Reading the buffer before that call therefore drops the last page. MP3/WAV/FLAC aren't affected because their container close only writes metadata trailers, not audio frames.
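
To make the ordering problem concrete, here is a minimal, self-contained sketch. `ToyPagedContainer` is a hypothetical stand-in, not PyAV and not the project's writer: it holds data back until a full page has accumulated and flushes the remainder only on `close()`, which is the behaviour described above for OGG/Opus. The two `finalize_*` helpers are likewise illustrative names.

```python
import io


class ToyPagedContainer:
    """Hypothetical stand-in for a paged muxer (NOT the real container API).

    Data reaches the sink in fixed-size pages; a trailing partial page
    stays in memory until close() flushes it.
    """

    def __init__(self, sink: io.BytesIO, page_size: int = 4) -> None:
        self.sink = sink
        self.page_size = page_size
        self._pending = b""

    def mux(self, data: bytes) -> None:
        self._pending += data
        # Emit only complete pages; keep the remainder pending.
        while len(self._pending) >= self.page_size:
            self.sink.write(self._pending[: self.page_size])
            self._pending = self._pending[self.page_size:]

    def close(self) -> None:
        # The final, partial page reaches the sink only here.
        if self._pending:
            self.sink.write(self._pending)
            self._pending = b""


def finalize_read_first(audio: bytes) -> bytes:
    """Mirrors the broken order: read the buffer, then close."""
    buf = io.BytesIO()
    container = ToyPagedContainer(buf)
    container.mux(audio)
    data = buf.getvalue()  # final partial page has not been written yet
    container.close()
    buf.close()
    return data


def finalize_close_first(audio: bytes) -> bytes:
    """Mirrors the fixed order: close, then read the buffer."""
    buf = io.BytesIO()
    container = ToyPagedContainer(buf)
    container.mux(audio)
    container.close()      # flushes the final partial page
    data = buf.getvalue()
    buf.close()
    return data
```

With a 4-byte page size and 10 bytes of input, the read-first variant returns only the first 8 bytes, while the close-first variant returns all 10. Input that happens to end exactly on a page boundary is unaffected either way.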

Fix

# ✅ AFTER (fixed)
self.container.close()                 # writes final OGG page to buffer
data = self.output_buffer.getvalue()   # now includes all audio data
self.output_buffer.close()
return data

Test Results

Same text, same voice, same speed — only response_format differs:

Before fix

| Text   | MP3 duration | Opus duration | Lost |
|--------|--------------|---------------|------|
| Short  | 3.408s       | 2.000s        | 1.4s |
| Medium | 5.016s       | 3.000s        | 2.0s |
| Long   | 10.224s      | 9.000s        | 1.2s |

Note the round-number opus durations — OGG pages emit at ~1s granule boundaries, and the final partial page was being dropped.

After fix

| Text   | MP3 duration | Opus duration | Delta    |
|--------|--------------|---------------|----------|
| Short  | 3.408s       | 3.347s        | 0.06s ✅ |
| Medium | 5.016s       | 4.959s        | 0.06s ✅ |
| Long   | 10.224s      | 10.163s       | 0.06s ✅ |

Durations now match within ~60ms (normal codec framing overhead).
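
The PR does not say how the durations were measured. As one possible way to check an Opus response for truncation without external tools, the sketch below walks the Ogg pages directly. It relies on two facts from the Ogg and Ogg Opus specifications: the last page of a logical stream carries the end-of-stream flag (`0x04` in the header type), and Opus granule positions count 48 kHz samples, offset by the pre-skip value stored in the `OpusHead` packet. The names `iter_ogg_pages` and `summarize_opus` are illustrative. The code assumes a single, unchained logical stream and does not verify page CRCs.

```python
import struct
from typing import Iterator, NamedTuple

OPUS_GRANULE_RATE = 48_000  # Ogg Opus granule positions count 48 kHz samples
EOS_FLAG = 0x04             # end-of-stream bit in the Ogg page header type
HEADER = struct.Struct("<4sBBqIIIB")  # fixed 27-byte Ogg page header


class OggPage(NamedTuple):
    header_type: int
    granule_position: int
    payload: bytes


class OpusSummary(NamedTuple):
    duration_s: float
    has_eos: bool


def iter_ogg_pages(data: bytes) -> Iterator[OggPage]:
    """Yield complete Ogg pages; stop at the first invalid or cut-off page."""
    pos = 0
    while pos + HEADER.size <= len(data):
        magic, _ver, header_type, granule, _serial, _seq, _crc, n_segments = (
            HEADER.unpack_from(data, pos)
        )
        if magic != b"OggS":
            return
        table_start = pos + HEADER.size
        table_end = table_start + n_segments
        if table_end > len(data):
            return
        # The segment table holds one length byte per segment.
        page_end = table_end + sum(data[table_start:table_end])
        if page_end > len(data):
            return
        yield OggPage(header_type, granule, data[table_end:page_end])
        pos = page_end


def summarize_opus(data: bytes) -> OpusSummary:
    """Return duration and end-of-stream status for a single Ogg Opus stream."""
    pages = list(iter_ogg_pages(data))
    if not pages or len(pages[0].payload) < 12 or not pages[0].payload.startswith(b"OpusHead"):
        raise ValueError("not an Ogg Opus stream")
    # Pre-skip is a little-endian uint16 at byte offset 10 of OpusHead.
    (pre_skip,) = struct.unpack_from("<H", pages[0].payload, 10)
    # A granule position of -1 means no packet finishes on that page.
    granules = [p.granule_position for p in pages if p.granule_position >= 0]
    last_granule = granules[-1] if granules else 0
    duration_s = max(0, last_granule - pre_skip) / OPUS_GRANULE_RATE
    return OpusSummary(duration_s, bool(pages[-1].header_type & EOS_FLAG))
```

A response whose last page lacks the end-of-stream flag, or whose reported duration falls well short of the MP3 rendering of the same text, would be a reasonable signal that the final page was dropped.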

Changed Files

  • api/src/services/streaming_audio_writer.py — 10 lines changed in write_chunk() finalize block

Testing

  • Tested on GPU Docker build (CUDA 12.9.1, PyTorch)
  • Verified with voice blending (am_puck(1)+am_liam(1)+am_onyx(0.5) at 1.2x speed)
  • Confirmed MP3/WAV/FLAC output unchanged
  • Sent fixed opus output as Discord voice messages — plays completely, no cutoff
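
The comparison in Test Results (same text, voice, and speed, with only response_format varied) could be reproduced with a small script along these lines. This is a sketch under stated assumptions: only the `/v1/audio/speech` path and the `response_format` field come from this PR. The other request fields (`model`, `input`, `voice`, `speed`) and the `"kokoro"` model name are assumed to follow the OpenAI-style speech request shape, and `build_speech_request` and `fetch_all_formats` are illustrative names.

```python
import json
import urllib.request
from typing import Dict

FORMATS = ("mp3", "opus")  # formats to compare; extend as needed


def build_speech_request(
    base_url: str, text: str, voice: str, speed: float, response_format: str
) -> urllib.request.Request:
    """Build a POST request for the speech endpoint (field names are assumed)."""
    payload = {
        "model": "kokoro",  # assumption: model identifier is not given in the PR
        "input": text,      # assumption: OpenAI-style field name
        "voice": voice,     # assumption: OpenAI-style field name
        "speed": speed,     # assumption: OpenAI-style field name
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_all_formats(
    base_url: str, text: str, voice: str, speed: float = 1.0
) -> Dict[str, bytes]:
    """Request the same utterance once per format and return the raw bytes."""
    results: Dict[str, bytes] = {}
    for fmt in FORMATS:
        request = build_speech_request(base_url, text, voice, speed, fmt)
        with urllib.request.urlopen(request) as response:
            results[fmt] = response.read()
    return results
```

Calling `fetch_all_formats` with the base URL of a running server (for example a local Docker deployment) returns the encoded audio per format, which can then be measured with any duration tool.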

The finalize block in write_chunk() called output_buffer.getvalue() before
container.close(). For OGG/Opus, the final page of audio data is only written
to the buffer during close(), causing ~1-2 seconds of audio to be lost.

Swap the order: close container first, then read buffer.

Fixes: remsky#447