URL: http://github.com/speechbrain/speechbrain/pull/2550

Add recipe for audio/speech LLM (ltu-as with llama3) by BenoitWang · Pull Request #2550 · speechbrain/speechbrain · GitHub

Add recipe for audio/speech LLM (ltu-as with llama3) #2550

Open
BenoitWang wants to merge 26 commits into speechbrain:develop from BenoitWang:speech_llm

Conversation

@BenoitWang
Collaborator

@BenoitWang BenoitWang commented May 16, 2024

Hi @mravanelli, here's the ltu-as PR as discussed. I am collecting several new datasets and will start a new round of training, but this may take time, so in the meantime I am opening this PR and will carry on little by little. @poonehmousavi, you are welcome to review the PR as well 😊.

What does this PR do?

  1. Add a recipe for training the LTU-AS model (an LLM that jointly understands audio and speech).
  2. Slight modifications to the LinearWarmupScheduler class.
  3. Adapt the multiwoz llama2 recipe to the latest changes.
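
For readers unfamiliar with the idea behind a linear warmup schedule (item 2), here is a minimal standalone sketch. It is not the SpeechBrain `LinearWarmupScheduler` code and does not reflect the changes in this PR; the function name `linear_warmup_lr` and its parameters are hypothetical, and the linear decay after warmup is an assumption chosen for illustration.

```python
def linear_warmup_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Illustrative schedule (hypothetical helper, not the SpeechBrain API).

    The learning rate rises linearly from 0 to base_lr during warmup,
    then decays linearly back to 0 at num_training_steps.
    """
    if step < num_warmup_steps:
        # Warmup phase: fraction of warmup completed so far.
        return base_lr * step / max(1, num_warmup_steps)
    # Decay phase: fraction of post-warmup steps still remaining.
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - num_warmup_steps)


if __name__ == "__main__":
    # Print the learning rate at a few points of a 100-step run.
    for s in (0, 5, 10, 55, 100):
        print(s, linear_warmup_lr(s, base_lr=1.0, num_warmup_steps=10, num_training_steps=100))
```

Starting from a small learning rate is a common way to keep the first optimizer updates of a large model from being too aggressive; the exact shape after warmup varies between implementations.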

To be done

  • For now, the model is trained with only half of the data used in the paper. Although access to certain datasets seems limited, a new training round needs to be carried out once more datasets have been collected.
  • Prepare downloadable JSON files to facilitate the data preparation stage.
  • Add a tiny validation set for stages 1 and 2.
  • Implement an evaluation at the end of stage 3 and prepare the evaluation data.
  • Upload training logs and prepare a Hugging Face interface.
  • Recipe tests.
  • Update results and training details in the README.

@mravanelli
Collaborator

Thank you @BenoitWang for this contribution. It looks like some tests are failing. Could you please take a look?

@mravanelli mravanelli self-requested a review June 17, 2024 16:24
@mravanelli mravanelli added the enhancement (New feature or request) label Jun 17, 2024

