Debian-Accessibility - Software

Speech Synthesis and related APIs

EFlite

A speech server for Emacspeak and yasr (or other screen readers) that allows them to interface with Festival Lite, a free text-to-speech engine developed at the CMU Speech Center as an off-shoot of Festival.

Due to limitations inherited from its backend, EFlite currently supports only the English language.

eSpeak

eSpeak is a software speech synthesizer for English and some other languages.

eSpeak produces good quality English speech. It uses a different synthesis method from other open source text-to-speech (TTS) engines: it does not rely on concatenative speech synthesis, which also gives it a very small footprint. As a result it sounds quite different. It's perhaps not as natural or smooth, but some find the articulation clearer and easier to listen to for long periods.

It can run as a command line program to speak text from a file or from stdin. It also works well as a Talker with the KDE text to speech system (KTTS), as an alternative to Festival for example. As such, it can speak text which has been selected into the clipboard, or directly from the Konqueror browser or the Kate editor.
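As a minimal sketch of the command-line usage described above, the following Python helper builds an `espeak` invocation and pipes text to it. It assumes the `espeak` binary is installed and on the PATH, and it relies on the `-f` (read a text file), `--stdin` and `-v` (select a voice) options of the eSpeak command-line program. The helper names `espeak_command` and `speak` are hypothetical and not part of eSpeak.

```python
import subprocess


def espeak_command(text_file=None, voice=None):
    """Build the argument list for an espeak call (hypothetical helper)."""
    argv = ["espeak"]
    if voice is not None:
        argv += ["-v", voice]      # select a voice, e.g. "en"
    if text_file is not None:
        argv += ["-f", text_file]  # speak the contents of a file
    else:
        argv.append("--stdin")     # speak whatever arrives on stdin
    return argv


def speak(text, voice=None):
    """Speak text by piping it to espeak; assumes espeak is installed."""
    subprocess.run(
        espeak_command(voice=voice),
        input=text.encode("utf-8"),
        check=True,
    )
```

Calling `speak("Hello from Debian")` would then read the sentence aloud, provided a working audio setup is available.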

eSpeak can also be used with GNOME-speech and Speech Dispatcher.

Festival Lite

A small fast run-time speech synthesis engine. It is the latest addition to the suite of free software synthesis tools including University of Edinburgh's Festival Speech Synthesis System and Carnegie Mellon University's FestVox project, tools, scripts and documentation for building synthetic voices. However, flite itself does not require either of these systems to run.

It currently only supports the English language.

Festival

A general multi-lingual speech synthesis system developed at the CSTR [Centre for Speech Technology Research] of University of Edinburgh.

Festival offers a full text-to-speech system with various APIs, as well as an environment for development and research of speech synthesis techniques. It is written in C++ with a Scheme-based command interpreter for general control.

Besides research into speech synthesis, festival is useful as a stand-alone speech synthesis program. It is capable of producing clearly understandable speech from text.

recite

Recite is a program to do speech synthesis. The quality of sound produced is not terribly good, but it should be adequate for reporting the occasional error message verbally.

Given some English text, recite will convert it to a series of phonemes, then convert the phonemes to a sequence of vocal tract parameters, and then synthesize the sound a vocal tract would make to say the sentence. Recite can also perform a subset of these operations, so it can be used to convert text into phonemes, or to produce an utterance based on vocal tract parameters computed by another program.

Speech Dispatcher

Provides a device independent layer for speech synthesis. It supports various software and hardware speech synthesizers as backends and provides a generic layer for synthesizing speech and playing back PCM data via those different backends to applications.

Various high level concepts like enqueueing vs. interrupting speech and application specific user configurations are implemented in a device independent way, therefore freeing the application programmer from having to yet again reinvent the wheel.
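The difference between enqueueing and interrupting speech can be illustrated with a toy model. This is only a sketch of the concept under simplified assumptions, not Speech Dispatcher's implementation or API; the class `ToySpeechQueue` and its methods are hypothetical names chosen for this example.

```python
from collections import deque


class ToySpeechQueue:
    """Toy model of queued versus interrupting speech (hypothetical)."""

    def __init__(self):
        self.current = None     # message being spoken right now
        self.pending = deque()  # messages waiting for their turn

    def enqueue(self, message):
        """Speak the message after everything already scheduled."""
        if self.current is None:
            self.current = message
        else:
            self.pending.append(message)

    def interrupt(self, message):
        """Drop whatever is spoken or waiting and speak this message now."""
        self.pending.clear()
        self.current = message

    def finish_current(self):
        """Mark the current message as spoken and advance to the next one."""
        finished = self.current
        self.current = self.pending.popleft() if self.pending else None
        return finished
```

An application built on a layer like this only chooses between `enqueue` and `interrupt`; it does not need to know which synthesizer eventually produces the sound.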

Internationalised Speech Synthesis

All the currently available free solutions for software based speech synthesis seem to share one common deficiency: They are mostly limited to English, providing only very marginal support for other languages, or in most cases none at all. Among all the free software speech synthesizers for Linux, only Festival supports more than one natural language. Festival can synthesize English, Spanish and Welsh. German is not supported. French is not supported. Russian is not supported. When internationalization and localization are the trends in software and web services, is it reasonable to require blind people interested in Linux to learn English just to understand their computer's output and to conduct all their correspondence in a foreign tongue?

Unfortunately, speech synthesis is not really Jane Hacker's favourite homebrew project. Creating an intelligible software speech synthesizer involves time-consuming tasks. Concatenative speech synthesis requires the careful creation of a phoneme database containing all the possible combinations of sounds for the target language. Rules that determine the transformation of the text representation into individual phonemes also need to be developed and fine-tuned, usually requiring the division of the stream of characters into logical groups such as sentences, phrases and words. Such lexical analysis requires a language-specific lexicon seldom released under a free license.
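The text analysis steps described above can be sketched in a few lines of Python. This is an illustration under strong assumptions: the two-word `TOY_LEXICON`, its phoneme symbols, and the function names are invented for the example, and spelling out unknown words is a crude stand-in for the letter-to-sound rules a real synthesizer would need.

```python
import re

# Hypothetical toy lexicon; a real one holds many thousands of entries.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def split_sentences(text):
    """Divide the character stream into sentences at ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_words(sentence):
    """Divide a sentence into lower-case words, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())


def to_phonemes(text, lexicon=TOY_LEXICON):
    """Return one phoneme list per sentence."""
    result = []
    for sentence in split_sentences(text):
        phonemes = []
        for word in split_words(sentence):
            # Unknown words are spelled letter by letter as a fallback.
            phonemes.extend(lexicon.get(word, list(word.upper())))
        result.append(phonemes)
    return result
```

Even in this toy form, the quality of the output depends almost entirely on the lexicon, which is why the licensing of lexica matters so much.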

One of the most promising speech synthesis systems is Mbrola, with phoneme databases for over ten different languages. Unfortunately, the license chosen by the project is very restrictive. Mbrola can only be distributed as a pre-built binary. In addition, the phoneme databases are for non-military and non-commercial use only. We contacted the project developers, but they were unable to change the licensing of their work due to the limitations set by various contributors. Unfortunately, given the restrictive licensing model of Mbrola, it cannot be used as a basis for further work in this direction, at least not in the context of the Debian Operating System.

Without a broadly multi-lingual software speech synthesizer, Linux cannot be accepted by assistive technology providers and people with visual disabilities. What can we do to improve this?

There are basically two approaches possible:

  1. Organize a group of people willing to help in this regard, and try to actively improve the situation. This might get a bit complicated, since a lot of specific knowledge about speech synthesis will be required, which is not easy to acquire through self-study. However, this should not discourage you. If you think you can motivate a group of people large enough to achieve some improvements, it would be worthwhile to do.
  2. Obtain funding and hire some institute which already has the know-how to create the necessary phoneme databases, lexica and transformation rules. This approach has the advantage that it has a better probability of generating quality results, and it should also achieve some improvements much earlier than the first approach. Of course, the license under which all resulting work would be released should be agreed on in advance, and it should pass the DFSG requirements. The ideal solution would of course be to convince some university to undertake such a project on their own dime, and contribute the results to the Free Software community.

Last but not least, it seems most of the commercially successful speech synthesis products nowadays no longer use concatenative speech synthesis, mainly because the sound databases consume a lot of disk space. This is not really desirable for small embedded products, like for instance speech on a mobile phone. Recently released Free software like eSpeak seems to try this approach, which might be very worthwhile to look at.

Screen review extensions for Emacs

Emacspeak

A speech output system that will allow someone who cannot see to work directly on a UNIX system. Once you start Emacs with Emacspeak loaded, you get spoken feedback for everything you do. Your mileage will vary depending on how well you can use Emacs. There is nothing that you cannot do inside Emacs :-). This package includes speech servers written in Tcl to support the DECtalk Express and DECtalk MultiVoice speech synthesizers. For other synthesizers, look for separate speech server packages such as Emacspeak-ss or eflite.

speechd-el

Emacs client to speech synthesizers, Braille displays and other alternative output interfaces. It provides a full speech and Braille output environment for Emacs. It is aimed primarily at visually impaired users who need non-visual communication with Emacs, but it can be used by anybody who needs sophisticated speech or other kinds of alternative output from Emacs.

Console (text-mode) screen readers

BRLTTY

A daemon which provides access to the Linux console for a blind person using a soft braille display. It drives the braille terminal and provides complete screen review functionality.

The following display models are currently (as of version 3.4.1-2) supported:

BRLTTY also provides a client/server based infrastructure for applications wishing to utilize a Braille display. The daemon process listens for incoming TCP/IP connections on a certain port. A shared object library for clients is provided in the package libbrlapi. A static library, header files and documentation are provided in the package libbrlapi-dev. This functionality is for instance used by Gnopernicus to provide support for display types which are not yet supported by Gnopernicus directly.
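A client of this infrastructure might look like the following sketch. It assumes that the Python bindings for BrlAPI (the `brlapi` module, which is packaged separately from the C library named above) are installed and that a BRLTTY daemon is running. The function `show_on_display` and its `connection` parameter are hypothetical; the parameter exists only so that the logic can be exercised without Braille hardware.

```python
def show_on_display(text, connection=None):
    """Write text to the Braille display and wait for one key press."""
    if connection is None:
        # Assumption: the Python BrlAPI bindings are available.
        import brlapi
        connection = brlapi.Connection()

    # Ask the daemon for control of the display on the current terminal.
    connection.enterTtyMode()
    try:
        connection.writeText(text)
        # Blocks until the user presses a key on the display.
        return connection.readKey()
    finally:
        # Always hand the display back to BRLTTY's screen review.
        connection.leaveTtyMode()
```

Releasing the terminal in a `finally` clause matters here, since a client that exits while still holding the display would leave the user without screen review until the connection is cleaned up.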

Screader

The background program screader reads the screen and passes the information on to a software text-to-speech package (like `festival') or a hardware speech synthesizer.

Yasr

A general-purpose console screen reader for GNU/Linux and other UNIX-like operating systems. The name yasr is an acronym that can stand for either Yet Another Screen Reader or Your All-purpose Screen Reader.

Currently, yasr attempts to support the Speak-out, DEC-talk, BNS, Apollo, and DoubleTalk hardware synthesizers. It is also able to communicate with Emacspeak speech servers and can thus be used with synthesizers not directly supported, such as Festival Lite (via eflite) or FreeTTS.

Yasr works by opening a pseudo-terminal and running a shell, intercepting all input and output. It looks at the escape sequences being sent and maintains a virtual window containing what it believes to be on the screen. It thus does not use any features specific to Linux and can be ported to other UNIX-like operating systems without too much trouble.
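The pseudo-terminal technique can be sketched with Python's standard `pty` module. This is not yasr's code; it is a minimal illustration of running a program on a pseudo-terminal and intercepting what it writes, and the function name `run_in_pty` is hypothetical. A screen reader would additionally forward keyboard input and interpret the escape sequences in the captured stream.

```python
import os
import pty
import subprocess


def run_in_pty(argv):
    """Run argv on a pseudo-terminal and return everything it printed."""
    master_fd, slave_fd = pty.openpty()
    # The child sees the slave end as its terminal.
    proc = subprocess.Popen(
        argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
    )
    os.close(slave_fd)  # the parent only needs the master end

    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            # Linux reports an I/O error once the child has closed the slave.
            break
        if not data:
            break
        chunks.append(data)

    proc.wait()
    os.close(master_fd)
    return b"".join(chunks)
```

Because only the POSIX pseudo-terminal interface is used, the same approach works on other UNIX-like systems, which matches the portability argument made above.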

Graphical User Interfaces

Accessibility of graphical user interfaces on UNIX platforms has only recently received a significant upswing with the various development efforts around the GNOME Desktop, especially the GNOME Accessibility Project.

GNOME Accessibility Software

Assistive Technology Service Provider Interface

This package contains the core components of GNOME Accessibility. It allows assistive technology providers such as screen readers to query all applications running on the desktop for accessibility-related information, and it provides bridging mechanisms to support toolkits other than GTK.

Bindings to the Python language are provided in package python-at-spi.

The ATK accessibility toolkit

ATK is a toolkit providing accessibility interfaces for applications or other toolkits. By implementing these interfaces, those other toolkits or applications can be used with tools such as screen readers, magnifiers, and other alternative input devices.

The runtime part of ATK, needed to run applications built with it, is available in package libatk1.0-0. Development files for ATK, needed for compilation of programs or toolkits which use it, are provided by package libatk1.0-dev. Ruby language bindings are provided by package libatk1-ruby.

gnome-accessibility-themes

The gnome-accessibility-themes package contains some high accessibility themes for the GNOME desktop environment, designed for the visually impaired.

A total of 7 themes are provided, offering combinations of high, low or inverse contrast, as well as enlarged text and icons.

gnome-speech

The GNOME Speech library gives programs a simple yet general API for converting text into speech, as well as for speech input.

Multiple backends are supported, but currently only the Festival backend is enabled in this package; the other backends require either Java or proprietary software.

Gnopernicus

Gnopernicus is designed to allow users with limited or no vision to access GNOME applications. It provides a number of features, including magnification, focus tracking, braille output, and more.

gnome-orca

Orca is a flexible and extensible screen reader that provides access to the graphical desktop via user-customizable combinations of speech, braille, and/or magnification. Under development by the Sun Microsystems, Inc., Accessibility Program Office since 2004, Orca has been created with early input from and continued engagement with its end users.

Orca can use GNOME-speech (the default) and Speech Dispatcher for delivering speech output to the user. BRLTTY is used for braille display support (and for seamless console and GUI braille review integration).

KDE Accessibility Software

kmag

Magnify a part of the screen just as you would use a lens to magnify the fine print in a newspaper or a photograph. This application is useful for a variety of people: from researchers to artists to web designers to people with low vision.

Non-standard input methods

Dasher

Dasher is an information-efficient text-entry interface, driven by natural continuous pointing gestures. Dasher is a competitive text-entry system wherever a full-size keyboard cannot be used - for example,

The eyetracking version of Dasher allows an experienced user to write text as fast as normal handwriting - 25 words per minute; using a mouse, experienced users can write at 39 words per minute.

Dasher uses a more advanced prediction algorithm than the T9(tm) system often used in mobile phones, making it sensitive to surrounding context.

GOK

GOK [GNOME Onscreen Keyboard] is a dynamic onscreen keyboard for UNIX and UNIX-like operating systems. It features Direct Selection, Dwell Selection, Automatic Scanning and Inverse Scanning access methods and includes word completion.

GOK includes an alphanumeric keyboard and a keyboard for launching applications. Keyboards are specified in XML enabling existing keyboards to be modified and new keyboards to be created. The access methods are also specified in XML providing the ability to modify existing access methods and create new ones.