fix spectrogram drop mask starting position #3036
Open
gfdb wants to merge 1 commit into speechbrain:develop
Conversation
What does this PR do?
In the implementation of SpectrogramDrop, we get the bounds for the masks by first randomly sampling the mask starting position, i.e. the index at which the mask should begin. The correct way to set the upper bound for sampling this index is to take the length of the dimension we are applying the mask over, call it D, and subtract the mask's length from it. This is done so the mask doesn't go out of bounds.

However, in the current implementation, instead of subtracting mask_len from D, we accidentally pass it as an argument to max() due to what is almost certainly a typo in the form of an extra comma. As a result, we sample from [0, D[ instead of [0, D - mask_len[.

How does it not go out of bounds?
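The difference between the two readings can be checked with plain Python (the numbers here are hypothetical; `D` stands for the masked dimension's length and `mask_len` for a sampled mask length):

```python
# Hypothetical values: D is the length of the masked dimension,
# mask_len a sampled mask length.
D, mask_len = 100, 10

# Intended: subtract mask_len from D to get the sampling upper bound.
intended = max(1, D - mask_len)   # 90

# With the extra comma, mask_len becomes a third argument to max(),
# so the bound is just D whenever D is the largest value.
buggy = max(1, D, mask_len)       # 100
```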
The final mask is computed by comparing an arange over the masked dimension against the mask bounds, and that arange only goes up to D. Since we are comparing indices, even if mask_pos + mask_len exceeds D, the arange comparison simply truncates the mask.

Why is this bad?
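A plain-Python sketch of this truncation (a stand-in for the tensor version, which compares an arange against mask_pos and mask_pos + mask_len; the values below are illustrative):

```python
D = 8
mask_pos, mask_len = 6, 4          # mask would extend to index 10 > D

# The mask is built by comparing each index against the mask bounds.
arange = list(range(D))
mask = [mask_pos <= i < mask_pos + mask_len for i in arange]

# Indices beyond D don't exist in arange, so the mask silently
# truncates to [6, 8) instead of the intended length 4.
assert sum(mask) == 2              # only 2 of 4 positions are masked
```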
This is bad because the current implementation allows, for example, the last element in the slice to be sampled as the starting position for the mask, i.e. almost no masking. Essentially, any starting position drawn from ]D - mask_len, D[ will result in a partial mask.
How does this affect augmentation strength in existing recipes?
This bug makes the augmentation weaker. The longer the utterances, the less it matters, but for extremely short utterances it matters a lot. Existing recipes whose hparams were tuned against the current implementation may therefore have higher drop_count / drop_length upper and lower bounds than they otherwise would. But again, the longer the utterances, the less of a big deal it is.
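As a rough sketch of why short utterances suffer more: with the buggy bound, a uniformly drawn starting position lands in the truncating region with probability roughly mask_len / D (numbers below are illustrative):

```python
# Probability that a uniformly drawn start position in [0, D) produces
# a truncated (weaker-than-intended) mask is roughly mask_len / D.
mask_len = 20
for D in (1000, 50):               # long vs. very short utterance
    p_partial = mask_len / D
    print(f"D={D}: P(partial mask) ~ {p_partial:.2f}")
# D=1000 gives ~0.02; D=50 gives ~0.40
```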
How to fix?
If we wanted to keep the existing augmentation strength, we could just set the upper bound directly to D instead of doing the max call that always yields D.
If we think the change in strength is no big deal, the simplest way to correct the mistake is the current fix I have on the branch (just removing the comma), but this bases the maximum mask starting position on the longest mask we sampled, which is again not totally correct but probably not a huge deal.
The most correct way to fix it would be to do something to the effect of:
```python
mask_pos = (
    torch.rand(batch_size, n_masks, 1, device=spectrogram.device)
    * (D - mask_len)
)
```

so that each mask gets its own correct maximum starting position.
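A minimal standalone check of the per-mask approach (using Python's random as a stand-in for torch.rand; D and the mask lengths are made up):

```python
import random

random.seed(0)                     # just for reproducibility
D = 30                             # hypothetical dimension length
mask_lens = [5, 12, 20]            # illustrative sampled mask lengths

# Each mask gets its own upper bound D - mask_len, mirroring the
# torch.rand(...) * (D - mask_len) approach.
mask_pos = [int(random.random() * (D - L)) for L in mask_lens]

# Every mask now fits entirely inside the dimension.
assert all(pos + L <= D for pos, L in zip(mask_pos, mask_lens))
```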
The comma key and its consequences...
Fixes #<issue_number>