
42dot
VoxtLM: Unified Decoder-Only Models for Speech Tasks
Pages: 5
Time to read: 25 mins
Language: English

This research article presents VoxtLM, a decoder-only language model that performs four speech-related tasks: speech recognition, speech synthesis, text generation, and speech continuation. By integrating the text vocabulary with discrete speech tokens, VoxtLM demonstrates significant improvements in performance over traditional single-task models, particularly in speech synthesis. The model is trained on publicly available data, ensuring reproducibility and accessibility for furt