This plugin lets Praat search the orthographic transcriptions of the NCHLT Speech Corpus and open the audio files that correspond to the search results.
The NCHLT Speech Corpus (National Centre for Human Language Technologies, Council for Scientific and Industrial Research, South Africa) contains orthographically transcribed broadband speech corpora for all of South Africa’s eleven official languages. At least one language of the corpus must be available on your machine before you use this plugin. The corpus can be obtained from the South African Centre for Digital Language Resources (SADiLaR) website.
After launching this plugin, you select one of your installed languages and specify a search pattern (simple pattern or regular expression). XML parsing and searching are done inside the Praat script; this is considerably slower, but more robust, than the earlier Python-based approach. You can view the results in a table (including orthographic transcription, speaker ID, age, gender, and location) and open the corresponding audio files either one after the other or individually for selected items in the results list. You can now also refine the search results using filters (age, gender, and location).
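To make the workflow above more concrete, here is a minimal sketch of a regular-expression search over transcriptions followed by speaker filters. It is written in Python purely for illustration and is not the plugin's code (the plugin does this inside a Praat script and does not require Python). The XML layout, the element and attribute names (`speaker`, `recording`, `orth`, `id`, `age`, `gender`, `location`, `audio`), the sample data, and the `search` function are all assumptions made for this example and may not match the actual corpus files.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data. The element and attribute names are
# assumptions for this sketch, not copied from the corpus files.
SAMPLE_XML = """
<corpus name="example">
  <speaker id="001" age="27" gender="female" location="Pretoria">
    <recording audio="audio/001/example_001_0001.wav">
      <orth>the quick brown fox</orth>
    </recording>
    <recording audio="audio/001/example_001_0002.wav">
      <orth>a slow green turtle</orth>
    </recording>
  </speaker>
  <speaker id="002" age="41" gender="male" location="Durban">
    <recording audio="audio/002/example_002_0001.wav">
      <orth>quick thinking helps</orth>
    </recording>
  </speaker>
</corpus>
"""


def search(xml_text, pattern, gender=None, min_age=None, max_age=None,
           location=None):
    """Return one dict per recording whose transcription matches `pattern`.

    The optional keyword arguments act as filters on speaker metadata.
    """
    regex = re.compile(pattern)
    root = ET.fromstring(xml_text)
    results = []
    for speaker in root.iter("speaker"):
        age = int(speaker.get("age"))
        # Apply the speaker-level filters before looking at recordings.
        if gender is not None and speaker.get("gender") != gender:
            continue
        if location is not None and speaker.get("location") != location:
            continue
        if min_age is not None and age < min_age:
            continue
        if max_age is not None and age > max_age:
            continue
        for recording in speaker.iter("recording"):
            orth = recording.findtext("orth", default="")
            if regex.search(orth):
                results.append({
                    "orth": orth,
                    "speaker": speaker.get("id"),
                    "age": age,
                    "gender": speaker.get("gender"),
                    "location": speaker.get("location"),
                    "audio": recording.get("audio"),
                })
    return results


if __name__ == "__main__":
    # Print every match for "quick" as a simple tab-separated table.
    for row in search(SAMPLE_XML, r"quick"):
        print(row["speaker"], row["age"], row["gender"],
              row["location"], row["orth"], row["audio"], sep="\t")
```

Each result row carries the same fields as the plugin's results table, plus the path of the audio file that would be opened. Filtering at the speaker level before matching avoids running the pattern against recordings that would be discarded anyway.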
- Praat 5.4.x or newer
- NCHLT Speech Corpus (at least one language)
- support for new SADiLaR folder naming pattern
- available on GitHub
- detailed manual added
- Python no longer required
- regular expression search
- filtering of search results
- usability improvements and bug fixes
- initial release