How to use VitsModel with speaker embedding


I want to do text-to-speech (TTS) for German, and this code works perfectly:

from transformers import VitsModel, AutoTokenizer
import torch

model = VitsModel.from_pretrained("facebook/mms-tts-deu")
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-deu")

text = "some example text in the German, Standard language"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output = model(**inputs).waveform

However, I want the output to be in my own voice. Is there any way to pass a speaker embedding (something like speaker_embedding) to VitsModel?
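For context, here is a minimal sketch of how one might check whether a checkpoint exposes any speaker selection at all. As far as I can tell, VitsModel.forward accepts an integer speaker_id that indexes a learned speaker table, not an arbitrary embedding vector, and that table only exists when config.num_speakers is greater than 1. The helper names supports_speaker_selection and describe_checkpoint are my own and hypothetical, and whether "facebook/mms-tts-deu" has more than one speaker is something to verify by running it, not something assumed here.

```python
def supports_speaker_selection(config) -> bool:
    """Return True if a VITS-style config declares a multi-speaker table.

    Hypothetical helper: it only reads the `num_speakers` and
    `speaker_embedding_size` attributes, falling back to single-speaker
    defaults when they are missing.
    """
    num_speakers = getattr(config, "num_speakers", 1)
    embedding_size = getattr(config, "speaker_embedding_size", 0)
    return num_speakers > 1 and embedding_size > 0


def describe_checkpoint(name: str) -> bool:
    """Download the config for `name` and report its speaker settings."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import VitsConfig

    config = VitsConfig.from_pretrained(name)
    print("num_speakers:", config.num_speakers)
    print("speaker_embedding_size:", config.speaker_embedding_size)
    return supports_speaker_selection(config)
```

Calling describe_checkpoint("facebook/mms-tts-deu") prints the two config values. If it returns False, passing speaker_id to the model should have no effect, and getting a custom voice would presumably require fine-tuning or a separate voice-conversion step rather than an embedding argument.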
