XTTS-v2 Text to Speech Installation guide

This is a short guide to my experiments with TTS models, using XTTS-v2. My other TTS guides are: Bark Guide, Cloning anyone's voice tutorial, and Create a million dollar business with this simple pipeline.

After testing both models, I found that XTTS-v2 produces much better quality than Bark.

For example, one sample was made with Bark and the video below with XTTS-v2:

https://www.youtube.com/playlist?list=PLEST2cB76tXvU206gFPeIaCEbHMzTuk7q

Guide

1

git clone https://huggingface.co/coqui/XTTS-v2
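Hugging Face stores large model files with Git LFS, so if git-lfs is not set up on your machine the clone can finish quickly and leave small pointer files where the weights should be. Here is a minimal sketch to check for that. It assumes the clone landed in a folder called XTTS-v2 next to the script; the function name find_lfs_pointers is just my own helper, not part of any library.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return paths of files that are still Git LFS pointers, not real data."""
    pointers = []
    for path in sorted(Path(repo_dir).rglob("*")):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.parts:
            continue
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    # Assumption: "XTTS-v2" is the folder created by the git clone command.
    for pointer in find_lfs_pointers("XTTS-v2"):
        print(f"Still an LFS pointer, weights not downloaded: {pointer}")
```

If it prints nothing, no pointer files were found. If it lists files, installing git-lfs and pulling again inside the repository is the usual fix.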

2

python -m venv .env

3

Activate the virtual environment (see my guide to virtual environments)

# Activate the virtual environment
# On Windows:
.\.env\Scripts\activate
# On Linux/macOS:
source .env/bin/activate
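It is easy to forget the activation step and install everything into the system Python by accident. A quick way to confirm which interpreter you are using is the small check below. It relies only on the standard library; the helper name in_virtualenv is my own.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
```

Run it with the same python command you plan to use for pip. The interpreter path should point inside the .env folder.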

4

Create a requirements.txt with the dependencies listed at the end of this step, then install it:

pip install -r requirements.txt

Contents of requirements.txt:

absl-py==2.1.0
aiohappyeyeballs==2.4.3
aiohttp==3.11.2
aiosignal==1.3.1
annotated-types==0.7.0
anyascii==0.3.2
asttokens==2.4.1
attrs==24.2.0
audioread==3.0.1
babel==2.16.0
bangla==0.0.2
blinker==1.9.0
blis==0.7.11
bnnumerizer==0.0.2
bnunicodenormalizer==0.1.7
catalogue==2.0.10
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
cloudpathlib==0.20.0
colorama==0.4.6
confection==0.1.5
contourpy==1.3.1
coqpit==0.0.17
cycler==0.12.1
cymem==2.0.8
Cython==3.0.11
dateparser==1.1.8
decorator==5.1.1
docopt==0.6.2
einops==0.8.0
encodec==0.1.1
executing==2.1.0
filelock==3.13.1
Flask==3.1.0
fonttools==4.55.0
frozenlist==1.5.0
fsspec==2024.2.0
g2pkk==0.1.2
grpcio==1.68.0
gruut==2.2.3
gruut-ipa==0.13.0
gruut_lang_de==2.0.1
gruut_lang_en==2.0.1
gruut_lang_es==2.0.1
gruut_lang_fr==2.0.2
hangul-romanize==0.1.0
huggingface-hub==0.26.2
idna==3.10
inflect==7.4.0
ipython==8.29.0
itsdangerous==2.2.0
jamo==0.4.1
jedi==0.19.2
jieba==0.42.1
Jinja2==3.1.3
joblib==1.4.2
jsonlines==1.2.0
kiwisolver==1.4.7
langcodes==3.4.1
language_data==1.2.0
lazy_loader==0.4
librosa==0.10.2.post1
llvmlite==0.43.0
marisa-trie==1.2.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.2
matplotlib-inline==0.1.7
mdurl==0.1.2
more-itertools==10.5.0
mpmath==1.3.0
msgpack==1.1.0
multidict==6.1.0
murmurhash==1.0.10
networkx==2.8.8
nltk==3.9.1
num2words==0.5.13
numba==0.60.0
numpy==1.26.3
packaging==24.2
pandas==1.5.3
parso==0.8.4
pillow==10.2.0
platformdirs==4.3.6
pooch==1.8.2
preshed==3.0.9
prompt_toolkit==3.0.48
propcache==0.2.0
protobuf==5.28.3
psutil==6.1.0
pure_eval==0.2.3
pycparser==2.22
pydantic==2.9.2
pydantic_core==2.23.4
Pygments==2.18.0
pynndescent==0.5.13
pyparsing==3.2.0
pypinyin==0.53.0
pysbd==0.3.4
python-crfsuite==0.9.11
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
safetensors==0.4.5
scikit-learn==1.5.2
scipy==1.14.1
shellingham==1.5.4
six==1.16.0
smart-open==7.0.5
soundfile==0.12.1
soxr==0.5.0.post1
spacy==3.7.5
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.4.8
stack-data==0.6.3
SudachiDict-core==20241021
SudachiPy==0.6.8
sympy==1.13.1
tensorboard==2.18.0
tensorboard-data-server==0.7.2
thinc==8.2.5
threadpoolctl==3.5.0
tokenizers==0.20.3
torch==2.5.1+cu124
torchaudio==2.5.1+cu124
torchvision==0.20.1+cu124
tqdm==4.67.0
trainer==0.0.36
traitlets==5.14.3
transformers==4.46.2
typeguard==4.4.1
typer==0.13.0
typing_extensions==4.12.2
tzdata==2024.2
tzlocal==5.2
umap-learn==0.5.7
Unidecode==1.3.8
urllib3==2.2.3
wasabi==1.1.3
wcwidth==0.2.13
weasel==0.4.1
Werkzeug==3.1.3
wrapt==1.16.0
yarl==1.17.1
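With this many pins, it helps to confirm that what ended up in the environment matches the file. The sketch below is one way to do it with the standard library's importlib.metadata. It assumes the file is named requirements.txt in the current folder and that every line uses the name==version form; parse_pins and find_mismatches are hypothetical helper names of mine. Versions are compared as plain strings, so treat a reported mismatch as a hint to look closer, not as proof that something is broken.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(requirements_text):
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and anything that is not a strict pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for pins that are not met."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req_file = Path("requirements.txt")  # assumed file name and location
    if req_file.exists():
        for name, (pinned, installed) in find_mismatches(
            parse_pins(req_file.read_text())
        ).items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

An empty output means every pinned package is installed at the pinned version.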

5

The installation can take up to 40 minutes, so don’t worry if it is very slow.

pip install TTS==0.22.0 --no-deps

You are going to see a lot of output like this, and that’s okay. In the beginning I thought it was broken, but I went for a walk and when I came back it was fine.

Collecting transformers>=4.33.0 (from TTS)
  Obtaining dependency information for transformers>=4.33.0 from https://files.pythonhosted.org/packages/75/d5/294a09a62bdd88da9a1007a341d4f8fbfc43be520c101e6afb526000e9f4/transformers-4.46.1-py3-none-any.whl.metadata
  Using cached transformers-4.46.1-py3-none-any.whl.metadata (44 kB)
  Obtaining dependency information for transformers>=4.33.0 from https://files.pythonhosted.org/packages/f9/9d/030cc1b3e88172967e22ee1d012e0d5e0384eb70d2a098d1669d549aea29/transformers-4.45.2-py3-none-any.whl.metadata
  Using cached transformers-4.45.2-py3-none-any.whl.metadata (44 kB)

6

Visit this page and install PyTorch, picking the build that matches your system:

https://pytorch.org/get-started/locally/
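Once PyTorch is installed, a short check like the one below can tell you whether it imports and whether it can see a CUDA GPU. This is only a sketch: torch_status is a helper name I made up, and the import is wrapped in a try block so the script still reports something useful when PyTorch is missing.

```python
def torch_status():
    """Report whether PyTorch imports and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"installed": False, "version": None, "cuda": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_status().items():
        print(f"{key}: {value}")
```

If cuda comes back False on a machine with an NVIDIA card, the installed build is probably a CPU-only one, and the selector on the page above is the place to pick a CUDA build.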

