Category archive: projects

Project Jasper – speech output

  • default: espeak – low speech quality, very robotic (GUI tool: gespeak)
  • festival – higher quality. It even has plugins for very high-quality English speech, see =
  • mbrola – I don't know how to use it yet, but the quality is fairly good judging by the samples on the web.
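To compare the engines above from a script, a small wrapper can build the command line for each one. This is only a sketch under assumptions: `build_tts_command` and `speak` are hypothetical helper names of my own (not part of Jasper), it assumes the `espeak` and `festival` binaries are installed and on the PATH, and it assumes an espeak voice such as `sk` is available on the system. It relies only on `espeak -v <voice> <text>` and on `festival --tts` reading its text from standard input.

```python
import subprocess


def build_tts_command(engine, text, voice=None):
    """Return (argv, stdin_text) for a command-line TTS engine.

    Hypothetical helper: only espeak and festival are handled here.
    """
    if engine == "espeak":
        argv = ["espeak"]
        if voice:
            # e.g. "sk"; assumes this voice is installed for espeak
            argv += ["-v", voice]
        argv.append(text)
        return argv, None
    if engine == "festival":
        # festival --tts speaks the text it receives on standard input
        return ["festival", "--tts"], text
    raise ValueError(f"unsupported engine: {engine}")


def speak(engine, text, voice=None):
    """Run the chosen engine; raises CalledProcessError if it fails."""
    argv, stdin_text = build_tts_command(engine, text, voice)
    subprocess.run(argv, input=stdin_text, text=True, check=True)
```

Calling `speak("espeak", "Ahoj", voice="sk")` and `speak("festival", "Hello")` one after the other is a quick way to hear the quality difference. mbrola is left out because I have not worked out how to drive it.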

Big comparison of speech systems:


Research on using the Slovak language in Jasper:
CMUSphinx – Open Source Toolkit For Speech Recognition – Project by Carnegie Mellon University
VoxForge was set up to collect transcribed speech for use in Open Source Speech Recognition Engines ("SREs") such as ISIP, HTK, Julius and Sphinx. We will categorize and make available all submitted audio files (also called a "Speech Corpus") and Acoustic Models in GPL format.
Olympus is a complete framework for implementing spoken dialog systems. It was created at Carnegie Mellon University (CMU) during the late 2000s and benefits from ongoing improvements in functionality. Its main purpose is to help researchers interested in conversational agents to implement and test their ideas on complete systems, without having to build them on their own. To this end, Olympus incorporates the RavenClaw dialog manager, which supports mixed-initiative interaction, as well as components that handle speech recognition, understanding, generation and synthesis. Olympus uses a Galaxy message-passing layer to integrate its components and supports multi-modal interaction. The Olympus/RavenClaw distribution includes several example systems that demonstrate the operation of its various features.

The Olympus architecture incorporates modules developed by researchers at Carnegie Mellon and by others, in previous and ongoing research projects. These include:

  • Dialogue management is handled by RavenClaw, a task-independent dialogue engine based on the AGENDA dialog manager first introduced as part of the CMU Communicator system.
  • Low-level interaction management (e.g. exact timing of the start and end of utterances, handling of interruptions, etc.) is performed by the Apollo interaction manager.
  • For speech recognition, Olympus currently supports engines from the CMU Sphinx family (Sphinx 2, Sphinx 3, PocketSphinx), and provides an interface for supporting other engines.
  • Natural language understanding is done by Phoenix, a robust parser based on CFG-like grammars.
  • The Helios component integrates information from various levels and assigns a confidence measure to all user inputs.
  • Natural language generation uses the Rosetta template-based generation system.
  • Kalliope, the synthesis interface, currently allows the use of SAPI 5-compliant TTS engines, CMU's Flite, and the proprietary Cepstral Swift engine.
  • The communication between the different modules is handled by the MIT/MITRE Galaxy Communicator architecture.
Rosetta is a language generation system originally developed for the CMU Communicator system by Kevin Lenzo. Rosetta is an active template system written in Perl; it separates generic processing (such as interfaces to the dialog system) from domain-specific template processing.
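
To make the idea of template-based generation more concrete, here is a minimal sketch in Python. It is not Rosetta's actual code or API (Rosetta is written in Perl and uses active templates, which this sketch does not reproduce); the dialog act names, the `TEMPLATES` table and the `generate` function are all hypothetical. It only illustrates the separation described above: a generic function that knows nothing about the domain, and a domain-specific table of templates.

```python
from string import Template

# Domain-specific part: hypothetical templates keyed by dialog act.
TEMPLATES = {
    "greet": Template("Hello, $name."),
    "inform_time": Template("It is $hour:$minute."),
}


def generate(act, **slots):
    """Generic part: look up the template for a dialog act and fill its slots."""
    try:
        template = TEMPLATES[act]
    except KeyError:
        raise ValueError(f"no template for dialog act: {act}") from None
    # substitute() raises KeyError if a required slot is missing
    return template.substitute(**slots)
```

For example, `generate("greet", name="Jasper")` returns the string `Hello, Jasper.`; supporting a new domain, or another language such as Slovak, would only mean swapping the `TEMPLATES` table.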