URL: http://github.com/speechbrain/speechbrain/pull/2695

Inference audio normalizer changes and use `load_audio` in more places by asumagic · Pull Request #2695 · speechbrain/speechbrain · GitHub
Inference audio normalizer changes and use load_audio in more places #2695

Draft
asumagic wants to merge 10 commits into speechbrain:develop from asumagic:interface-read-audio-fixes

asumagic (Collaborator) commented Sep 23, 2024

What does this PR do?

Work in progress: I still need to verify that this doesn't break anything. Behavior should be strictly identical, except where hparams are misconfigured.

Changes:

Fixes #2650

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

`load_audio` is preferred as it goes through common path handling code and normalization (for resampling and downmixing).
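
To make the two normalization steps concrete, here is a minimal, dependency-free sketch of downmixing and resampling. It is not SpeechBrain's implementation: the function names are hypothetical, audio is represented as plain lists of floats, and the resampler is a naive linear interpolation with no anti-aliasing filter, chosen only to keep the example short.

```python
def downmix_to_mono(channels):
    """Average equally long channels sample by sample (hypothetical helper)."""
    if not channels:
        raise ValueError("need at least one channel")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("channels must have equal length")
    return [sum(frame) / len(channels) for frame in zip(*channels)]


def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampling; illustration only, no filtering."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def normalize(channels, src_rate, dst_rate):
    """Downmix first, then resample to the model's expected rate."""
    return resample_linear(downmix_to_mono(channels), src_rate, dst_rate)


stereo = [[0.0, 2.0, 4.0, 6.0], [0.0, 0.0, 0.0, 0.0]]
mono_upsampled = normalize(stereo, src_rate=2, dst_rate=4)
```

Routing every interface through one such path means a file with the wrong channel count or sample rate is handled the same way everywhere, instead of each call site reading audio on its own.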

This relies on the audio normalizer being correctly configured for the model sample rate. Commit ccd0ed introduced functionality to infer the sample rate from `hparams.sample_rate` by default when the audio normalizer is not specified.
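
The fallback described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `SketchNormalizer` and `resolve_normalizer` are hypothetical names, hparams is modeled as a plain namespace, and raising an error when neither key is present is my assumption about a reasonable behavior.

```python
from types import SimpleNamespace


class SketchNormalizer:
    """Hypothetical stand-in for an audio normalizer; it only records the target rate."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate


def resolve_normalizer(hparams):
    """Return the configured normalizer, or build one from hparams.sample_rate."""
    explicit = getattr(hparams, "audio_normalizer", None)
    if explicit is not None:
        # An explicitly configured normalizer always wins, even if its rate
        # disagrees with hparams.sample_rate (the misconfiguration case).
        return explicit
    sample_rate = getattr(hparams, "sample_rate", None)
    if sample_rate is None:
        # Assumption: fail loudly rather than guess a rate.
        raise ValueError("hparams defines neither audio_normalizer nor sample_rate")
    return SketchNormalizer(sample_rate)


inferred = resolve_normalizer(SimpleNamespace(sample_rate=22050))
misconfigured = resolve_normalizer(
    SimpleNamespace(sample_rate=22050, audio_normalizer=SketchNormalizer(16000))
)
```

In the second call the explicit normalizer targets 16000 Hz while the model declares 22050 Hz, which is the kind of hparams misconfiguration that would make behavior differ once more code paths go through the normalizer.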

This should result in strictly identical behavior, except where the normalizer is misconfigured in hparams (and assuming this code actually uses the proper tensor format).

Whether this is technically superior or not doesn't really matter: the rest of the interfaces use the working directory, so this moves to that as well.

Additionally, v1.0.1 avoids creating symlinks, so this should avoid issues in the majority of cases anyway.

asumagic changed the title from "Refactor audio loading code in some interfaces" to "Inference audio normalizer changes and use load_audio in more places" on Sep 24, 2024