Vocapia Research SAS
Speaker Diarization System for Lecture Data
Pages
12
Time to read
23 mins
Language
English
This article presents the LIMSI speaker diarization system for lecture data, developed for the Rich Transcription 2006 Spring (RT-06S) meeting recognition evaluation. The system builds on a baseline diarization system originally designed for broadcast news data, which combines agglomerative clustering based on the Bayesian information criterion (BIC) with speaker identification techniques. The article describes the difficulties of adapting this baseline to lecture data, in particular the high missed speech error rate observed. To reduce that error, a new speech activity detection (SAD) approach based on the log-likelihood ratio was explored. The paper outlines the main components of the system, including feature extraction, initial segmentation, and clustering. Experimental results show that the adapted system achieved a diarization error of 20.2% on the RT-06S Multiple Distant Microphone data, indicating that the modifications were effective for lecture recordings.
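To make the BIC-based merge decision mentioned above concrete, here is a minimal sketch of the generic ΔBIC test often used in agglomerative speaker clustering. It is not the LIMSI implementation. It assumes each cluster is modeled by a single diagonal-covariance Gaussian, and the names `log_det_diag`, `delta_bic`, and `penalty_weight` are hypothetical. Under the sign convention used here, a negative value favours merging the two clusters.

```python
import math


def log_det_diag(frames):
    """Return log|Sigma| for a diagonal covariance estimated from the frames.

    frames: list of equal-length feature vectors (lists of floats).
    Uses maximum-likelihood variances; every dimension must have
    non-zero variance, otherwise math.log raises ValueError.
    """
    n = len(frames)
    total = 0.0
    for dim in zip(*frames):  # iterate over feature dimensions
        mean = sum(dim) / n
        var = sum((v - mean) ** 2 for v in dim) / n
        total += math.log(var)
    return total


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between modeling x and y separately versus merged.

    Assumption: one diagonal-covariance Gaussian per cluster, so each
    model has 2 * d free parameters (a mean and a variance per dimension).
    Negative return value: the merged model is preferred.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    d = len(x[0])
    n_params = 2 * d
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    # Likelihood loss incurred by replacing two models with one.
    loss = 0.5 * (
        n * log_det_diag(x + y)
        - n1 * log_det_diag(x)
        - n2 * log_det_diag(y)
    )
    return loss - penalty
```

In an agglomerative loop, one would typically compute this value for every cluster pair, merge the pair with the lowest value, and stop once no pair yields a negative ΔBIC. The `penalty_weight` factor is a tuning parameter in this sketch; the source does not give the value used in the actual system.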