Multi-Window Multi-Head Attention implementation for ASR transformer #2675
NikolaiKyhne wants to merge 27 commits into speechbrain:develop from
Conversation
Hey guys! Hope you are doing great --- this is a very nice PR! I have turned this PR into a draft for now; please mark it ready when you think it is ready to be reviewed. You can ping me as well so that I can have a closer look as soon as possible :) Thanks for your contribution :) Best,
Hey @Adel-Moumen! Thanks for your comment. We have now finished the draft and marked it ready for review :) Best,
## Transformer

| Language | CV version | hyperparams file | LM | Val. CER | Val. WER | Test CER | Test WER | Hugging Face link | Model link | GPUs |
| -------- |:----------:|:----------------:|:--:|:--------:|:--------:|:--------:|:--------:|:-----------------:|:----------:|:----:|
| English | 16.1 | mwmha_transformer_large.yaml | No | 4.72 | 10.97 | 6.68 | 13.69 | - | [model](https://1drv.ms/f/c/039f8ffe91e06416/Et7KEbSlWNdJhkjLIi7_vGQBMVhGwRRBzCSljh6aA4sJSw?e=dXeuiY) | 1xL40 48GB |
Why is the Val WER so high? I think you swapped CER and WER, right?
No, that's right, I just double checked; it is the same for Conformer English on CV 16.1 :)
@Adel-Moumen Val WER for MWMHA (10.97) follows the same trend and is quite close to that of the Conformer model (10.48), so it is reported correctly; CER and WER are not swapped.
We've been waiting for a review for some time now. Any chance you can take a look at it soon? :)
Added a Multi-Window Multi-Head Attention (MWMHA) module for Transformer ASR (https://openreview.net/forum?id=Q53QLftNkA).
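For intuition only, here is a minimal sketch of the multi-window idea: a self-attention layer in which each head is restricted to its own local window size (with one head left global). This is an illustrative assumption, not the PR's implementation or the paper's exact formulation; the class name, window sizes, and dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WindowedMultiHeadAttention(nn.Module):
    """Illustrative multi-window self-attention: each head attends within
    its own local window (hypothetical sketch, not the actual PR code)."""

    def __init__(self, d_model=256, window_sizes=(4, 16, 64, None)):
        super().__init__()
        self.num_heads = len(window_sizes)
        assert d_model % self.num_heads == 0
        self.d_head = d_model // self.num_heads
        self.window_sizes = window_sizes  # None = full (global) attention
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Reshape to (B, heads, T, d_head)
        def split(t):
            return t.view(B, T, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (B, H, T, T)

        # Per-head band mask: head h only sees frames within its window size.
        idx = torch.arange(T, device=x.device)
        dist = (idx[None, :] - idx[:, None]).abs()  # (T, T) frame distances
        masks = []
        for w in self.window_sizes:
            if w is None:
                masks.append(torch.zeros(T, T, dtype=torch.bool, device=x.device))
            else:
                masks.append(dist > w)
        mask = torch.stack(masks)  # (H, T, T)
        scores = scores.masked_fill(mask[None], float("-inf"))

        attn = F.softmax(scores, dim=-1)
        ctx = attn @ v                              # (B, H, T, d_head)
        ctx = ctx.transpose(1, 2).reshape(B, T, -1)  # back to (B, T, d_model)
        return self.out(ctx)


if __name__ == "__main__":
    x = torch.randn(2, 50, 256)
    layer = WindowedMultiHeadAttention()
    print(layer(x).shape)  # torch.Size([2, 50, 256])
```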
In general, this contribution adds: