fix spectrogram drop mask starting position #3036

URL: http://github.com/speechbrain/speechbrain/pull/3036

Open

gfdb wants to merge 1 commit into speechbrain:develop from gfdb:fix-spectrogram-drop

Conversation

@gfdb (Contributor) commented Feb 17, 2026

What does this PR do?

In the implementation of SpectrogramDrop we get the bounds for the masks by first randomly sampling the mask starting position, i.e. the index of where the mask should begin. The correct way to set the upper bound for sampling this index is to take the length of the dimension we are applying the mask over, let's call it D, and subtract the mask's length from it. This is done so the mask doesn't go out of bounds.
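Schematically, with toy values (torch.randint and these shapes are an illustration, not the library's exact call):

import torch

batch_size, n_masks, D, mask_len = 4, 2, 100, 20  # toy values

# Drawing the start from [0, D - mask_len) guarantees that the mask
# [start, start + mask_len) never crosses the boundary at D.
mask_pos = torch.randint(0, D - mask_len, (batch_size, n_masks, 1))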

However, in the current implementation, instead of subtracting mask_len from D, we accidentally pass it as an argument to max() due to what is almost certainly a typo in the form of an extra comma. As a result, we sample from [0, D) instead of [0, D - mask_len).
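To see the shape of the typo with toy numbers (the max(1, ...) guard and the exact expression are my assumptions; the precise line is in the diff):

D = 100            # size of the masked dimension
max_mask_len = 20  # longest sampled mask length

# Buggy: the extra comma splits "D - max_mask_len" into two separate
# arguments, so max() compares 1, D, and -max_mask_len and always
# returns D.
upper_buggy = max(1, D, -max_mask_len)   # 100

# Comma removed: the subtraction happens as intended.
upper_fixed = max(1, D - max_mask_len)   # 80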

How does it not go out of bounds?

The final mask gets computed as

# Indices 0 .. D-1 along the masked dimension, shaped for broadcasting.
arange = torch.arange(D, device=spectrogram.device).view(1, 1, -1)
# True exactly for indices in [mask_pos, mask_pos + mask_len).
mask = (mask_pos <= arange) * (arange < (mask_pos + mask_len))

where arange only goes up to D - 1. Since we are comparing indices, even if mask_pos + mask_len exceeds D, the comparison simply truncates the mask at the boundary.
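A tiny self-contained example of the truncation (toy values):

import torch

D = 10
mask_pos = torch.tensor([[[8]]])  # start sampled near the end
mask_len = torch.tensor([[[4]]])  # would extend to index 11

arange = torch.arange(D).view(1, 1, -1)
mask = (mask_pos <= arange) * (arange < (mask_pos + mask_len))
print(mask.long())  # tensor([[[0, 0, 0, 0, 0, 0, 0, 0, 1, 1]]])
# Only 2 of the 4 requested frames end up masked.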

Why is this bad?

This is bad because the current implementation allows, for example, the last index of the slice to be sampled as the mask's starting position, in which case only a single frame gets masked. Essentially, any start drawn from above D - mask_len results in a partial mask.
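Concretely, with D = 100 and mask_len = 20, the intended start range tops out at 80, but the buggy code can draw any start up to 99: a start of 90 masks only 10 frames instead of 20, and a start of 99 masks a single frame.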

How does this affect augmentation strength in existing recipes?

This bug makes the augmentation weaker. The longer the utterances are, the less it matters, but for extremely short utterances it matters a lot. Existing recipes whose hparams were tuned against the current implementation may therefore carry inflated drop_count / drop_length upper and lower bounds. But again, the longer the utterances, the smaller the effect.

How to fix?

If we wanted to keep the existing augmentation strength, we could just set the upper bound directly to D instead of doing the max call that always yields D.

If we think the change in strength is not a big deal, the simplest way to correct the mistake is the current fix I have on the branch (just removing the comma), but this bases the maximum mask starting position on the longest mask we sampled, which is again not strictly correct but probably not a huge deal.

The most correct way to fix it would be to do something to the effect of:

mask_pos = (torch.rand(batch_size, n_masks, 1, device=spectrogram.device) * (D - mask_len))

so that each mask gets its own correct maximum starting position.
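Fleshed out with toy values (the .long() cast and the shapes are my assumptions, since torch.rand returns floats while mask positions are integer indices):

import torch

batch_size, n_masks, D = 4, 2, 100
mask_len = torch.randint(1, 21, (batch_size, n_masks, 1))  # per-mask lengths

# Scale a uniform [0, 1) draw by each mask's own headroom D - mask_len,
# then truncate to an integer start index, so every mask individually
# satisfies mask_pos + mask_len <= D.
mask_pos = (torch.rand(batch_size, n_masks, 1) * (D - mask_len)).long()
assert bool((mask_pos + mask_len <= D).all())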

The comma key and its consequences...

Fixes #<issue_number>

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review
