GETTING VIAVOICE SPEECH RECOGNITION WORKING ON A MODERN LINUX DISTRO

This is a brief HOWTO document describing how to get the ViaVoice RPMs IBM produced for Red Hat Linux 6.2 working on a modern Linux distribution. I've tried to make the result as distribution- and platform-agnostic as possible, and the directory it is installed into can be changed reasonably easily.

[NOTE: this document is now pretty old, so much of it may no longer work. Be warned! --jm 20080221]

PREREQUISITES

- Using ALSA for sound in the Linux kernel is STRONGLY recommended. A few people have noted that ALSA isn't strictly required, but I haven't had much luck with the OSS drivers and ViaVoice, so I've always used ALSA. If you're using a 2.6.x kernel, you've got it anyway.

- Get a decent microphone headset. Speech recognition will basically NOT WORK AT ALL without one.
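
As a quick sanity check for the ALSA prerequisite, here is a minimal sketch that counts the sound cards listed in /proc/asound/cards, the file the kernel's ALSA core exposes. The function name is my own invention, and the file path is a parameter only so that it can be tried on a sample file.

```shell
# Count the sound cards ALSA has registered, by reading the kernel's card list.
# (alsa_card_count is a hypothetical helper name, not part of ALSA.)
alsa_card_count() (
    cards_file="${1:-/proc/asound/cards}"
    # No readable file at all means the ALSA core is not loaded.
    [ -r "$cards_file" ] || { echo 0; exit 0; }
    # Each card starts with a line like " 0 [PCH            ]: HDA-Intel - ..."
    # grep -c still prints 0 when nothing matches, so ignore its exit status.
    grep -c '^ *[0-9][0-9]* \[' "$cards_file" || true
)
```

Running it with no argument prints the number of cards found; 0 means ALSA is not loaded or has no sound card to offer:

      alsa_card_count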

INSTALLATION

- Download these files:

- the IBM JRE:

http://jmason.org/software/xvoice/vv-jre.tar.gz

- the scripts and directory framework that ViaVoice will be put into:

http://jmason.org/software/xvoice/vv-home.tar.gz

- some older versions of the system shared libraries that ViaVoice requires:

http://jmason.org/software/xvoice/vv-usrlib.tar.gz

- a compiled version of xvoice (this is optional!):

http://jmason.org/software/xvoice/vv-xvoice.tar.gz

- Extract at least 'vv-jre.tar.gz', 'vv-usrlib.tar.gz', and 'vv-home.tar.gz'.

      tar xvfz vv-jre.tar.gz
      tar xvfz vv-home.tar.gz
      tar xvfz vv-usrlib.tar.gz
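
If you'd rather not find out halfway through that a download is missing, the extraction step can be wrapped roughly as follows. This is only a sketch: the two function names are hypothetical, and it assumes all three tarballs sit in the same directory.

```shell
# Print the names of the required tarballs that are not present in DIR.
missing_tarballs() (
    dir="${1:-.}"
    for f in vv-jre.tar.gz vv-home.tar.gz vv-usrlib.tar.gz; do
        [ -f "$dir/$f" ] || echo "$f"
    done
)

# Extract the three required tarballs inside DIR, stopping at the first failure.
extract_required() (
    dir="${1:-.}"
    missing=$(missing_tarballs "$dir")
    if [ -n "$missing" ]; then
        echo "missing: $missing" >&2
        exit 1
    fi
    cd "$dir" || exit 1
    for f in vv-jre.tar.gz vv-home.tar.gz vv-usrlib.tar.gz; do
        tar xzf "$f" || exit 1
    done
)
```

From the download directory, that would be used as:

      extract_required .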

- mv the extracted "ViaVoice" directory somewhere like "/usr/local/ViaVoice".

      mv ViaVoice /usr/local/ViaVoice

- Create a 1-line config file pointing to that location, and source it:

      echo "VV_HOME=/usr/local/ViaVoice" > /etc/viavoice.conf
      . /etc/viavoice.conf

All the tools use this file to find the apps and libs at runtime.
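
To illustrate the idea, here is a sketch of how a wrapper script could read that file and refuse to continue if VV_HOME is unusable. It is not copied from the scripts in vv-home.tar.gz; the function name and the error messages are my own.

```shell
# Read VV_HOME from the config file and print it. Fails if the file is
# unreadable, VV_HOME is unset, or VV_HOME does not name a directory.
vv_home_from_conf() (
    conf="${1:-/etc/viavoice.conf}"
    # "." searches $PATH for names without a slash, so force a path.
    case "$conf" in */*) ;; *) conf="./$conf" ;; esac
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; exit 1; }
    VV_HOME=            # ignore any value inherited from the environment
    . "$conf"
    [ -n "$VV_HOME" ] || { echo "VV_HOME not set in $conf" >&2; exit 1; }
    [ -d "$VV_HOME" ] || { echo "$VV_HOME is not a directory" >&2; exit 1; }
    echo "$VV_HOME"
)
```

A script would then start with something like:

      VV_HOME=$(vv_home_from_conf) || exit 1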

- get the RPM: "ViaVoice_runtime-3.0-1.2.i386.rpm" is the one I used. Obtaining it is up to you.

- extract it using "rpm2cpio" and "cpio" like so (the commands assume the RPM has been copied into $VV_HOME first):

      cd $VV_HOME
      rpm2cpio ViaVoice_runtime-3.0-1.2.i386.rpm | cpio -ivd

This will create a "usr" directory in the current dir. Now do:

      mv usr/lib/*.so* lib/usr-lib
      mv usr/lib/ViaVoice lib/usr-lib-ViaVoice
      rm -rf usr

to move the required parts into place. The end result should look something like what's listed in http://jmason.org/software/xvoice/tree.txt .
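
If you can't reach tree.txt, a rough check of the three things the commands above should have produced can be sketched like this. The function name is hypothetical, and it only checks the paths named in this step, not the full tree.

```shell
# Report problems with the layout under a ViaVoice home directory.
# Prints one line per problem and returns non-zero if any were found.
check_vv_layout() (
    home="${1:-$VV_HOME}"
    status=0
    # At least one shared library should have been moved into lib/usr-lib.
    if ! ls "$home"/lib/usr-lib/*.so* >/dev/null 2>&1; then
        echo "no shared libraries in $home/lib/usr-lib"
        status=1
    fi
    if [ ! -d "$home/lib/usr-lib-ViaVoice" ]; then
        echo "missing directory $home/lib/usr-lib-ViaVoice"
        status=1
    fi
    # The temporary "usr" tree from the RPM should have been removed.
    if [ -e "$home/usr" ]; then
        echo "leftover directory $home/usr"
        status=1
    fi
    exit "$status"
)
```

With VV_HOME set, running it with no arguments prints nothing when all three checks pass:

      check_vv_layout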

- create a symbolic link at /usr/lib/ViaVoice pointing into $VV_HOME. The libs seem to assume that path exists and will core-dump otherwise.

      ln -s $VV_HOME/lib/usr-lib-ViaVoice /usr/lib/ViaVoice

- create symbolic links to the bin scripts, in a directory on your path:

      ln -s $VV_HOME/bin/* /usr/local/bin
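
Since a wrong or dangling link here shows up later as a core dump, it can be worth checking the links once they're made. The two helpers below are a sketch with made-up names; they rely on readlink and on find's -maxdepth, both of which are standard on Linux but not on every Unix.

```shell
# Succeed only if LINK is a symbolic link whose stored target is exactly TARGET.
link_points_to() (
    link="$1"
    target="$2"
    [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]
)

# Print every symbolic link directly inside DIR whose target does not exist.
dangling_links() (
    dir="$1"
    find "$dir" -maxdepth 1 -type l ! -exec test -e {} \; -print
)
```

For the links created above, that would be:

      link_points_to /usr/lib/ViaVoice "$VV_HOME/lib/usr-lib-ViaVoice" || echo "bad link"
      dangling_links /usr/local/bin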

TRAINING

- Check your mixer settings!

          alsamixergui

Ensure the Mic is enabled, Mic Boost is on (if you have it), and that the volumes are around the 75% mark.

You have to do this by hand -- the "vvstartaudiosetup" tool no longer seems to be capable of calibrating the Mic volume itself. (It was never very good at it anyway.)

attachment:mixer.png
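
If you prefer the command line, the volume part of this can probably be done with amixer from alsa-utils instead of the GUI. The sketch below is an assumption-heavy example: the control name "Mic" is a guess that varies from card to card (list yours with "amixer scontrols"), and the helper name is made up. It only sets a level; enabling the Mic for capture is still left to the mixer GUI.

```shell
# Set one ALSA mixer control to a percentage using amixer (from alsa-utils).
# CONTROL defaults to "Mic", which is an assumption: names vary by sound card.
# AMIXER can be pointed at another command, e.g. AMIXER=echo to preview the call.
set_mixer_level() {
    control="${1:-Mic}"
    pct="${2:-75}"
    ${AMIXER:-amixer} sset "$control" "${pct}%"
}
```

For example, to put the Mic control at the 75% mark:

      set_mixer_level Mic 75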

- Move aside any existing training in ~/viavoice.

(You may need to do this. I did -- my previous ViaVoice settings would no longer work on a new machine! If the 'vvstartuserguru' tool core-dumps when you try to run it, then you need to do this.)

          mv ~/viavoice ~/viavoice.OLD
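
One caveat with a plain mv: if ~/viavoice.OLD already exists as a directory, the old training gets moved inside it rather than alongside it. A slightly more careful version, sketched here with a hypothetical helper name, picks the first free name instead.

```shell
# Move DIR aside to DIR.OLD, or DIR.OLD.1, DIR.OLD.2, ... if that name is taken.
# Prints the name the directory was moved to; does nothing if DIR is absent.
move_aside() (
    dir="$1"
    [ -e "$dir" ] || exit 0
    dest="$dir.OLD"
    n=0
    while [ -e "$dest" ]; do
        n=$((n + 1))
        dest="$dir.OLD.$n"
    done
    mv "$dir" "$dest" && echo "$dest"
)
```

Used as:

      move_aside ~/viavoice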

- start the Enrollment Guru:

          vvstartenrollment

- It'll output:

  Please enter the ViaVoice UserName [default: ViaVoice User1]

- Type a username and hit return. It'll carry on with:

  Please wait, creating user [USERNAME]
  Starting ViaVoice Enrollment
  Java must be installed for ViaVoice Enrollment to work.
  Please wait while Java initializes...

The window should appear.

- Then hit 'Next >' on the initial page.

attachment:guru-initial.png

- Choose the first Story and hit 'Next >'.

- Hit 'Start' and try reading the text. You may need to tweak the mixer settings to ensure that:

  (a) nothing shows in the "Audio Level" meter when you're silent
  (b) it enters the green zone without hitting red when you're speaking

attachment:guru-starting.png

- One problem -- there's a bug! As you train the Guru, and it pages through the story, the Audio Level meter expands to fill the window. (This seems to be a bug in the Guru when run on modern JREs.)

attachment:guru-broken.png

To fix it, hit "Cancel" and exit out of the program; don't worry, your training won't be lost. Just wait for the command line to return, then re-run the guru with:

          vvstartenrollment

You may have to run this twice, since it seems to have a habit of leaving the mic busy for a few seconds after the first exit, but it'll work fine the second time around.

Then choose the story you were just reading, and it'll let you carry on from the same page.

attachment:guru-choose.png
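
The "run it twice" dance when restarting the Guru could in principle be automated with a small retry wrapper like the sketch below. Note the big assumption: it only helps if vvstartenrollment exits with a non-zero status when the mic is still busy, which I haven't verified. The helper name is my own.

```shell
# Run a command up to TRIES times, sleeping DELAY seconds between attempts.
# Succeeds as soon as the command succeeds; fails if every attempt fails.
retry() (
    tries="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        "$@" && exit 0
        [ "$n" -ge "$tries" ] && exit 1
        n=$((n + 1))
        sleep "$delay"
    done
)
```

For example, two attempts five seconds apart:

      retry 2 5 vvstartenrollment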

- Once it's done, it'll process your voice -- on modern machines that takes about a minute.

attachment:guru-finishing.png

- I recommend reading at least a couple of the shorter training "stories" before using xvoice.

USING THE PREBUILT XVOICE TARBALL

- The xvoice script in "bin" expects to find the xvoice binary in "$VV_HOME/lib/xvoice/xvoice".

- You may need to move some xvoice support files to /usr/local for it to work. The paths below are relative, so run the commands from $VV_HOME:

      mkdir -p /usr/local/share/xvoice
      mkdir -p /usr/local/share/pixmaps
      mv lib/xvoice/vimcmds /usr/local/share/xvoice/vimcmds
      mv lib/xvoice/xvoice.xml /usr/local/share/xvoice/xvoice.xml
      mv lib/xvoice/xvoice.png /usr/local/share/pixmaps/xvoice.png
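
If you'd rather leave the originals under $VV_HOME untouched, copying works as well as moving. Here is a sketch with a hypothetical helper name; the install prefix is a parameter so it can be tried in a scratch directory first, and "cp -R" is used for vimcmds because I'm not assuming whether it is a file or a directory.

```shell
# Copy the xvoice support files from SRC into PREFIX/share,
# leaving the originals in place. PREFIX defaults to /usr/local.
install_xvoice_support() (
    src="$1"
    prefix="${2:-/usr/local}"
    mkdir -p "$prefix/share/xvoice" "$prefix/share/pixmaps" || exit 1
    cp -R "$src/vimcmds" "$prefix/share/xvoice/" || exit 1
    cp "$src/xvoice.xml" "$prefix/share/xvoice/xvoice.xml" || exit 1
    cp "$src/xvoice.png" "$prefix/share/pixmaps/xvoice.png" || exit 1
)
```

Used as:

      install_xvoice_support "$VV_HOME/lib/xvoice"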

OTHER USEFUL STUFF

- My sawfishrc is available at http://jmason.org/software/xvoice/ . It contains two interesting sets of functions:

"corner.jl" allows windows to be moved into the screen corners with one keypress or voice command. Very useful if you segment your desktop into 2 or 4 logical areas and want a quick, mouse-free way to move windows.

"wclass.jl" allows applications to be controlled by name, instead of requiring the user to click on their windows. Each app has a pair of WCLASS properties that identify its windows. These functions let a user deiconify, raise, or iconify windows by application name.

- My xvoice.xml is available at http://jmason.org/software/xvoice/ if you're interested. It's mostly default.