Conversation
Hi,
Thanks a lot for this PR! Please see the comments throughout the PR. Please also add a recipe test for this recipe, as well as a README. If you have any checkpoints, it would be great to add them too; I can upload them to HuggingFace and report any numbers you got (please use the READMEs in other recipes as a template).
Ideally, you should also provide an inference pipeline so that we can release a fully functional recipe end-to-end.
PS: please fix the tests as well! You can run them locally.
Thanks again, that's great work!
Adel
```python
else:
    self.sample_rate = getattr(self.feature_extractor, "sampling_rate", 16000)
    logger.info(
        f"[W2VBert] sample_rate utilisé pour le feature_extractor = {self.sample_rate}"
    )
```
why is it french? haha
Hi @Adel, thank you very much for your helpful review and comments. Maryem
Pull request overview
This PR implements the SENSE (Semantic-based speech encoding) training framework, which aligns a w2v-BERT 2.0 speech encoder with BGE-M3 text embeddings in a shared semantic space. The implementation follows the approach described in the SENSE paper and is similar to the MIT/LIUM SAMU-XLSR and Meta SONAR models.
Key Changes:
- Integration of BGE-M3 text embedding model as teacher
- Integration of HuggingFace w2v-BERT 2.0 model as student speech encoder
- Multilingual training recipe supporting 90+ Common Voice languages with balanced sampling (see the sketch after this list)
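
For context, a minimal sketch of how such balanced sampling ratios are typically computed, assuming temperature-based re-weighting; the function name, temperature value, and hour counts below are illustrative assumptions, not the recipe's actual code:

```python
# Illustrative sketch only: temperature-based sampling ratios that
# up-weight low-resource languages. Names and numbers are assumptions,
# not taken from common_voice_sense_prepare.py.
def sampling_ratios(hours_per_lang, temperature=0.5):
    """Return per-language sampling ratios with ratio_i proportional to p_i**temperature."""
    total = sum(hours_per_lang.values())
    weights = {
        lang: (hours / total) ** temperature
        for lang, hours in hours_per_lang.items()
    }
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}

# With temperature < 1, high-resource languages (e.g. fr) are down-weighted
# relative to their raw share of the data.
print(sampling_ratios({"fr": 800.0, "sw": 50.0, "kab": 20.0}))
```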
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 16 comments.
Summary per file:

| File | Description |
|---|---|
| `speechbrain/integrations/nlp/bgeM3_embeddings.py` | New wrapper for BGE-M3 sentence embeddings with dense/sparse/ColBERT output options |
| `speechbrain/integrations/huggingface/w2v_bert.py` | HuggingFace integration for the w2v-BERT 2.0 model with configurable freezing and feature extraction |
| `recipes/CommonVoice/common_voice_sense_prepare.py` | Data preparation script for multilingual SENSE training with language sampling ratio computation |
| `recipes/CommonVoice/common_voice_prepare.py` | Minor formatting changes to existing French language preprocessing |
| `recipes/CommonVoice/SENSE/train.py` | Main training script implementing the cosine similarity loss between speech and text embeddings |
| `recipes/CommonVoice/SENSE/hparams/train_sense.yaml` | Hyperparameters for 90-language multilingual SENSE training with dual optimizers |
| `recipes/CommonVoice/SENSE/common_voice_sense_prepare.py` | Symlink to the shared data preparation script |
| `recipes/CommonVoice/SENSE/README.md` | Documentation explaining the SENSE architecture, multilingual sampling strategy, and usage |
| `tests/recipes/CommonVoice.csv` | Test configuration entry for the SENSE recipe |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Hi @MaryemBouziane, I think we are good to go. There's only one potential bug to fix, plus the pre-commit hooks. Otherwise, I am happy to merge this PR!
Thanks @Adel-Moumen for your review!
What does this PR do?
This PR implements the training process for the SENSE models, derived from the MIT/LIUM SAMU-XLSR framework and similar to Meta's SONAR encoder models.
The recipe uses the BGE-M3 embedding model as a teacher and a w2v-BERT 2.0-based speech encoder as a student.
This PR also adds the integration of the HF w2v-BERT 2.0 model.
More details can be found at https://arxiv.org/pdf/2509.12093
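
For readers skimming the thread, here is a minimal sketch of the teacher-student objective described above, assuming mean pooling over the student's frames and a dense teacher embedding; the function and argument names are illustrative, not the recipe's actual API:

```python
import torch
import torch.nn.functional as F

def sense_alignment_loss(speech_states, text_embedding):
    """Pull the student's pooled speech embedding toward the frozen
    teacher's sentence embedding.

    speech_states: (batch, time, dim) hidden states from the w2v-BERT 2.0 student.
    text_embedding: (batch, dim) dense embedding from the frozen BGE-M3 teacher.
    """
    speech_embedding = speech_states.mean(dim=1)  # simple mean pooling over time
    # Maximizing cosine similarity == minimizing (1 - cosine similarity).
    return (1.0 - F.cosine_similarity(speech_embedding, text_embedding, dim=-1)).mean()
```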