Voice Recognition + Content Extraction + TTS = Innovative Web Browsing

An Interesting Opportunity

Voice recognition and text to speech technologies make a good combo for future user interfaces. Imagine browsing the web by voice and listening to blogs like you listen to podcasts. Your mobile phone will be able to provide these features in the near future. Voice web browsing is not only useful for the visually impaired, but also for users who wish to do other things while surfing the web. Voice recognition and text to speech feature prominently on the iPhone 4S, but voice web browsing is not incorporated.

This blog post includes code that extracts the main text content from a web page and converts it to a playable audio file. The process is triggered by simple voice commands such as “receive hacker news”, “read article 7” or “save article 19”. The resulting audio file can also be synchronized to services such as DropBox, Amazon Cloud Drive, and Apple iCloud to be played later. We use IKVM to run the boilerpipe library, which is written in Java, on .NET. We chose .NET because it comes with “batteries included”: ready-to-use voice recognition and text to speech capabilities.
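
To make the command flow concrete, here is a minimal sketch in Python 3 (an assumption; the post's own code is C#) of how a recognized phrase could be mapped to a command and back. The phrases come from this post and the voicewebbrowsing:// scheme mirrors the one used in the C# listing further down. The function names phrase_to_uri and parse_uri are hypothetical and exist only for illustration.

```python
import re
from urllib.parse import urlparse

# Phrases from the post; the URI scheme mirrors the queue entries in the C# listing.
SIMPLE_COMMANDS = {
    "receive hacker news": "voicewebbrowsing://receivehackernews",
    "stop": "voicewebbrowsing://stop",
}
ARTICLE_COMMAND = re.compile(r"^(read|save) article (\d+)$")
MAX_ARTICLE = 30  # the grammar in the C# listing accepts article numbers 1..30


def phrase_to_uri(phrase):
    """Return the command URI for a recognized phrase, or None if it is unknown."""
    phrase = phrase.strip().lower()
    if phrase in SIMPLE_COMMANDS:
        return SIMPLE_COMMANDS[phrase]
    match = ARTICLE_COMMAND.match(phrase)
    if match is None:
        return None
    verb, number = match.group(1), int(match.group(2))
    if not 1 <= number <= MAX_ARTICLE:
        return None
    return "voicewebbrowsing://%sarticle/%d" % (verb, number)


def parse_uri(uri):
    """Split a command URI into (action, article number or None)."""
    parts = urlparse(uri)
    if parts.scheme != "voicewebbrowsing":
        raise ValueError("unexpected scheme: %r" % parts.scheme)
    number = parts.path.strip("/")
    return parts.netloc, int(number) if number else None
```

Encoding each command as a URI keeps the queue a plain list of strings, and the consumer needs only a URL parser to recover the action and its argument.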

The present article demonstrates the application of main text content extraction. For methods of MTCE see our article: Extraction of Main Text Content Using the Google Reader NoAPI. For further exploration of voice recognition and text to speech .NET capabilities consult the following links:

Good voice recognition and text to speech systems are expensive and require training. Companies which provide these services do not offer a more granular way to access a web service or run a local engine. For example, AT&T Natural Voice for TTS sounds good, but their licensing terms are prohibitive for small companies and startups. On the voice recognition side of the equation, Nuance has been accumulating patents to strengthen their market position, making it difficult for others to compete. Although many other companies offer voice recognition, most of them actually use Nuance’s technology. See for example: Siri, Do You Use Nuance Technology? Siri: I’m Sorry, I Can’t Answer That. Voice recognition systems require training to improve their accuracy. SpinVox, a Nuance subsidiary, used “conversion experts”. They built teams which listened to audio messages and manually converted them to text. If you want to use this approach, you’ll need to wait for Amazon’s Mechanical Turk to offer micro jobs in real time.

What is missing here? None of the leading companies offers good quality voice recognition on a pay-per-use basis. Google seems to be actively researching voice recognition, and has achieved impressive results with “experimental” speech recognition technology. Sadly, Google’s voice recognition and text to speech APIs cannot be used to develop arbitrary desktop and server applications. Their use is restricted to Android phones, Chrome’s beta HTML5 support, and Chrome extensions. It would be nice if Google removed this restriction and included this service in its APIs Console.

Our code demonstrates that it is possible to use voice recognition and text to speech while avoiding the licensing, patent and API conundrum.

Using VR and TTS under .NET

Prerequisites

If you use our VoiceWebBrowsing code, also available on GitHub, you only need Microsoft Visual Studio 2010. However, if a newer version of boilerpipe is released, you will have to generate the boilerpipe .NET assemblies yourself as follows:

  1. Have Microsoft Visual Studio 2010
  2. Download boilerpipe from http://code.google.com/p/boilerpipe/
  3. Download and install IKVM from http://www.ikvm.net/
  4. Run the boilerpipe library and its dependencies through ikvmc: ikvmc -nojni -target:library boilerpipe-1.2.0.jar lib\nekohtml-1.9.13.jar lib\xerces-2.9.1.jar
  5. Use the resulting boilerpipe-1.2.0.dll .NET assembly from ikvmc.

Code

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Speech.AudioFormat;
using System.Net;
using System.IO;
using System.Xml;

namespace VoiceWebBrowsing
{
    public partial class MainForm : Form
    {
        #region Constants
        //const string _bogusRSSFeed = "<items><item><title>First title</title><link>http://</link></item><item><title>Second title</title><link>http://</link></item></items>";
        const string _bogusRSSFeed = null;
        const string downloadPath = @"..\..\..\..\Download";
        #endregion

        #region Private Variables
        SpeechRecognizer _speechRecognizer = new SpeechRecognizer();
        SpeechSynthesizer _ttsVoice = new SpeechSynthesizer();
        SpeechAudioFormatInfo formatInfo = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
        Queue<string> _queue = new Queue<string>();
        List<string> articleList = new List<string>();
        HashSet<SpeechSynthesizer> tts2FileTasks = new HashSet<SpeechSynthesizer>();
        #endregion

        #region Private Methods
        private void InitGrammar()
        {
            GrammarBuilder readGrammar = new Choices(new string[] { "read article" });
            Choices articleChoice = new Choices();
            for (int i = 1; i <= 30; i++)
            {
                articleChoice.Add(i.ToString());
            }
            readGrammar.Append(articleChoice);

            GrammarBuilder saveGrammar = new Choices(new string[] { "save article" });
            saveGrammar.Append(articleChoice);

            GrammarBuilder otherGrammar = new Choices(new string[] { "receive hacker news", "stop", "test" });

            //GrammarBuilder commands = new Choices(new string[] { "receive hacker news", "stop", "test" });
            Choices commands = new Choices();
            commands.Add(new Choices(new GrammarBuilder[] { readGrammar, saveGrammar, otherGrammar }));

            var grammar = new Grammar(commands);
            this._speechRecognizer.LoadGrammar(grammar);
        }

        private void Say(string text)
        {
            this._ttsVoice.SpeakAsync(text);
        }

        private void ReadHackerNewsFeed()
        {
            string hackerNewsRSSUrl = "http://news.ycombinator.com/rss";

            using (WebClient client = new WebClient()) // WebClient class inherits IDisposable
            {
                //
                // an app.config is added to suppress: The server committed a protocol violation. Section=ResponseStatusLine
                //
                string rssXmlStr = null;
                if (_bogusRSSFeed == null)
                    rssXmlStr = client.DownloadString(hackerNewsRSSUrl);
                else
                    rssXmlStr = _bogusRSSFeed;
                XmlDocument xmlDoc = new XmlDocument();
                xmlDoc.LoadXml(rssXmlStr);

                XmlNodeList items = xmlDoc.SelectNodes("//item");

                int counter = 1;
                articleList.Clear();
                foreach (XmlNode item in items)
                {
                    string title = item.SelectSingleNode("title").InnerText;
                    string link = item.SelectSingleNode("link").InnerText;
                    articleList.Add(link);
                    Say("article " + counter.ToString() + " " + title);
                    System.Diagnostics.Debug.WriteLine(title);

                    counter++;
                }
            }
        }

        private void ReceiveHackerNewsButton_Click(object sender, EventArgs e)
        {
            ReceiveHackerNewsCommand();
        }

        private void SaveArticle(string link, string article)
        {
            SpeechSynthesizer tts2File = new SpeechSynthesizer();
            tts2File.SpeakStarted += new EventHandler<SpeakStartedEventArgs>(tts2File_SpeakStarted);
            tts2File.SpeakCompleted += new EventHandler<SpeakCompletedEventArgs>(tts2File_SpeakCompleted);
            System.Security.Cryptography.SHA1Managed hashAlgorithm = new System.Security.Cryptography.SHA1Managed();
            hashAlgorithm.Initialize();
            byte[] buffer = Encoding.UTF8.GetBytes(link);
            byte[] hash = hashAlgorithm.ComputeHash(buffer);
            string fileName = BitConverter.ToString(hash).Replace("-", string.Empty) + ".wav";
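            // Location is the full path of the executable file itself, so the first ".." in
            // downloadPath steps from the file to its folder before climbing further.
            string executionPath = System.Reflection.Assembly.GetExecutingAssembly().Location;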
            string fullPath = Path.Combine(executionPath, downloadPath, fileName);
            hashAlgorithm.Clear();
            tts2File.SetOutputToWaveFile(fullPath, formatInfo);
            tts2File.SpeakAsync(article);
            this.tts2FileTasks.Add(tts2File);
        }

        #endregion

        #region Constructor
        public MainForm()
        {
            InitializeComponent();
            _speechRecognizer.Enabled = true;
            _speechRecognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(_speechRecognizer_SpeechRecognized);
        }
        #endregion

        #region Events
        void _ttsVoice_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
        {
            this._ttsVoice.SetOutputToNull(); // Needed for flushing file buffers.
        }

        void _speechRecognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            string command = e.Result.Text;
            CommandTextBox.Text = command;

            if (command == "test")
            {
                return;
            }

            if (command == "stop")
            {
                StopCommand();

                return;
            }

            if (command == "receive hacker news")
            {
                ReceiveHackerNewsCommand();

                return;
            }

            if (command.Contains("read article"))
            {
                string[] words = command.Split(' ');

                ReadArticleCommand(Decimal.Parse(words[2]));

                return;
            }

            if (command.Contains("save article"))
            {
                string[] words = command.Split(' ');

                SaveArticleCommand(Decimal.Parse(words[2]));

                return;
            }
        }

        private void MainForm_Load(object sender, EventArgs e)
        {
            InitGrammar();
            BackgroundWorker.RunWorkerAsync();
        }

        void tts2File_SpeakStarted(object sender, SpeakStartedEventArgs e)
        {
        }

        void tts2File_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
        {
            SpeechSynthesizer tts2File = (SpeechSynthesizer)sender;

            tts2File.SetOutputToNull();
            this.tts2FileTasks.Remove(tts2File);
        }

        private void StopButton_Click(object sender, EventArgs e)
        {
            StopCommand();
        }
        private void ReadArticleButton_Click(object sender, EventArgs e)
        {
            ReadArticleCommand(ArticleNumberUpDown.Value);
        }

        private void SaveArticleButton_Click(object sender, EventArgs e)
        {
            SaveArticleCommand(ArticleNumberUpDown.Value);
        }

        private void BackgroundWorker_DoWork(object sender, DoWorkEventArgs e)
        {
            while (true)
            {
                System.Threading.Thread.Sleep(125);

                // Take the next command while holding the lock, then release it before doing
                // any slow work, so the UI thread is never blocked while it enqueues commands.
                string cmd = null;
                lock (this)
                {
                    if (this._queue.Count > 0)
                    {
                        cmd = this._queue.Dequeue();
                    }
                }

                if (cmd == null)
                {
                    continue;
                }

                System.Uri uri = new Uri(cmd);
                if (uri.Scheme != "voicewebbrowsing")
                {
                    continue;
                }

                if (uri.Host == "receivehackernews")
                {
                    ReadHackerNewsFeed();
                }
                else if (uri.Host == "stop")
                {
                    this._ttsVoice.SpeakAsyncCancelAll();
                }
                else if (uri.Host == "readarticle" || uri.Host == "savearticle")
                {
                    string articleNumberStr = System.IO.Path.GetFileName(uri.AbsolutePath);
                    int articleNumber = int.Parse(articleNumberStr);

                    // Reject numbers outside the list, including 0 from the numeric control.
                    if (articleNumber < 1 || articleNumber > articleList.Count)
                    {
                        Say("please retrieve hacker news articles first");
                    }
                    else
                    {
                        articleNumber--; // 0-based index
                        string link = articleList[articleNumber];

                        java.net.URL url = new java.net.URL(link);
                        string article = de.l3s.boilerpipe.extractors.ArticleExtractor.INSTANCE.getText(url);
                        if (uri.Host == "readarticle")
                        {
                            Say(article);
                        }
                        else
                        {
                            SaveArticle(link, article);
                        }
                    }
                }
            }
        }
        #endregion

        #region Commands
        private void ReceiveHackerNewsCommand()
        {
            StopCommand();
            lock (this)
            {
                this._queue.Enqueue("voicewebbrowsing://receivehackernews");
            }
        }
        private void StopCommand()
        {
            lock (this)
            {
                this._queue.Enqueue("voicewebbrowsing://stop");
            }
        }

        private void ReadArticleCommand(Decimal article)
        {
            StopCommand();
            lock (this)
            {
                this._queue.Enqueue(String.Format("voicewebbrowsing://readarticle/{0}", article.ToString()));
            }
        }

        private void SaveArticleCommand(decimal article)
        {
            lock (this)
            {
                this._queue.Enqueue(String.Format("voicewebbrowsing://savearticle/{0}", article.ToString()));
            }
        }
        #endregion
    }
}
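
The file-naming step in SaveArticle is easy to reproduce elsewhere, for example in a script that looks up a saved article by its URL. The following is a small Python 3 sketch of the same idea: SHA-1 over the UTF-8 bytes of the link, rendered as upper-case hex without separators, plus the .wav extension. The function names and the download directory argument are hypothetical.

```python
import hashlib
import os


def audio_file_name(link):
    """Derive a stable .wav file name from an article URL.

    Mirrors the C# SaveArticle method: SHA-1 over the UTF-8 bytes of the
    link, written as upper-case hex digits with no separators.
    """
    digest = hashlib.sha1(link.encode("utf-8")).hexdigest().upper()
    return digest + ".wav"


def audio_file_path(download_dir, link):
    """Join a (hypothetical) download directory with the derived file name."""
    return os.path.join(download_dir, audio_file_name(link))
```

Hashing the link gives a fixed-length name that is safe on any file system, and saving the same article twice overwrites the earlier file instead of creating a duplicate.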

See Also

  1. Extraction of Main Text Content Using the Google Reader NoAPI

Where can you go from here?

  1. You can write a continuous Hacker News front page reader which constantly checks the feed and reads you new titles.
  2. You can write a voice oriented mobile web browser for Windows Phone, Google Android, and Google Chrome using their APIs. To write a mobile browser for iPhones, you will have to wait for an iOS Siri API.
  3. You can write a cloud service that converts web pages to podcasts, stores the audio files in the cloud, and automatically converts Google Reader’s starred items.
  4. Finally, you can research how to improve TTS and VR technologies.
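
As a starting point for the first idea, here is a hedged Python 3 sketch of the part that decides which titles are new. It assumes the feed has already been downloaded as a string and that each item carries a title and a link element, as in the C# listing. Fetching the feed and speaking the titles are left out, and the names parse_items and new_items are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Return (title, link) pairs for every <item> element in an RSS document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


def new_items(rss_xml, seen_links):
    """Return the items whose link is not yet in seen_links and mark them as seen."""
    fresh = []
    for title, link in parse_items(rss_xml):
        if link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh
```

A polling loop would call new_items with the same seen_links set after each download and pass only the returned titles to the speech synthesizer.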

Resources

  1. Dragon Speech Recognition Software
  2. Publications by Googlers in Speech Processing
  3. Patent case seeks to silence Nuance voice recognition
  4. Nuance Loses First Patent Fight with Vlingo, Others to Follow
  5. eSpeak: Open Source Text to Speech
  6. CMU Sphinx: Speech Recognition Open Source Toolkit
  7. SAM: The first commercial voice synthesis program for Commodore 64, Apple and Atari computers
  8. Siri
  9. Microsoft Tellme
  10. The Mobile Challenge: My Personal Rants
  11. List of speech recognition software
  12. What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?
  13. Siri for everyone, with Pioneer’s Zypr API
  14. Reverse Engineering and Cracking Apple Siri with SiriProxy

Automated Browserless OAuth Authentication for Twitter

Introduction

My first impression of the OAuth protocol was: bureaucracy meets the web. It’s understandable that users must approve a third party application’s access to their information, but if I want to access my own information from my own application, why do I need to complete all this “paperwork”?

Also, user experience suffers when you have to jump to the browser and return to your application as part of the workflow. Mobile and desktop apps need more alternatives to work around that. Twitter offers the xAuth API for desktop and mobile applications but you have to send a request with “plenty of details” and may have to wait a long time to get it.

This article describes how to use the 3-legged OAuth flow with a headless browser like HtmlUnit to get tokens from Twitter without user intervention.

The example uses HtmlUnit and Jython. If you want to use HtmlUnit under .NET I recommend looking at Using HtmlUnit on .NET for Headless Browser Automation (using IKVM). WP7 developers may also want to look at the .NET article to see if it could be applied to Silverlight.

Once you obtain the token you can keep it for use in future calls. Be aware that tokens may expire based on conditions such as time. Ethically, the automated application should still ask users whether to allow or deny the application access to Twitter.
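
Keeping the token can be as simple as writing it to a local file. The sketch below uses Python 3 and only the standard library (an assumption; the script in this article targets Jython 2.5). The file layout and the function names save_token and load_token are hypothetical. A caller would fall back to the full OAuth flow whenever load_token returns None or Twitter rejects the stored token.

```python
import json
import os


def save_token(path, key, secret):
    """Write the access token pair to a JSON file, then restrict it to the owner."""
    with open(path, "w") as handle:
        json.dump({"oauth_token": key, "oauth_token_secret": secret}, handle)
    os.chmod(path, 0o600)  # effective on POSIX systems; limited effect on Windows


def load_token(path):
    """Return (key, secret) from the file, or None if it is missing or malformed."""
    try:
        with open(path) as handle:
            data = json.load(handle)
        return data["oauth_token"], data["oauth_token_secret"]
    except (IOError, ValueError, KeyError):
        return None
```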

Prerequisites

  1. JRE or JDK
  2. Download and install the latest Jython version. Run the .jar and install it in your preferred directory (e.g., /opt/jython).
  3. Download and decompress setuptools-0.6c11.tar.gz
  4. Go to the setuptools directory. Install the package under Jython with: sudo /opt/jython/bin/jython setup.py install
  5. Download and decompress python-twitter-0.8.1.tar.gz
  6. Look at the required dependencies for python-twitter and install them with Jython:
    1. http://cheeseshop.python.org/pypi/simplejson
    2. http://code.google.com/p/httplib2/
    3. http://github.com/simplegeo/python-oauth2
    4. You’ll need to change the file oauth2/__init__.py for Jython 2.5 compatibility:
from urlparse import parse_qs, parse_qsl

to:

try:
  from urlparse import parse_qsl, parse_qs
except ImportError:
  from cgi import parse_qsl, parse_qs

  7. Under the python-twitter-0.8.1 directory download the HtmlUnit compiled binaries from http://sourceforge.net/projects/htmlunit/files/ (we are using HtmlUnit 2.8 for this example).
  8. Go to the python-twitter-0.8.1 directory and install the python-twitter package under Jython:
    1. sudo /opt/jython/bin/jython setup.py install
  9. Create a Twitter application for testing and get its key and secret.

Example

get_access_token.py

Changes

  1. Replace consumer_key and consumer_secret with your application key/secret.
  2. Add the following imports and get_pincode function:
import com.gargoylesoftware.htmlunit.WebClient as WebClient
import com.gargoylesoftware.htmlunit.BrowserVersion as BrowserVersion

def get_pincode(url, username, password):
  webclient = WebClient(BrowserVersion.FIREFOX_3_6)
  page = webclient.getPage(url)

  twitter_username_or_email = page.getByXPath("//input[@id='username_or_email']")[0]
  twitter_password = page.getByXPath("//input[@id='password']")[0]
  allow_button = page.getByXPath("//input[@id='allow']")[0]

  twitter_username_or_email.setValueAttribute(username)
  twitter_password.setValueAttribute(password)

  page = allow_button.click()

  code = page.getByXPath("//kbd/code")[0]

  return code.getTextContent()
  3. Replace:
pincode = raw_input('Pincode? ')

with:

  twitter_username = None # replace it with your twitter username
  twitter_password = None # replace it with your twitter password
  print "Getting pincode"
  pincode = get_pincode('%s?oauth_token=%s' % (AUTHORIZATION_URL, request_token['oauth_token']),  twitter_username, twitter_password)
  print "pincode =", pincode
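
The last step of get_pincode, reading the code element nested inside a kbd element, can be tried without HtmlUnit or a network connection. Below is a rough Python 3 sketch using only the standard library's html.parser. Only the kbd/code nesting comes from the XPath above; the class name, the function name, and any sample markup you feed it are hypothetical, and the real page layout may differ or change.

```python
from html.parser import HTMLParser


class PincodeParser(HTMLParser):
    """Collect the text of the first <code> element directly inside a <kbd>."""

    def __init__(self):
        super().__init__()
        self._stack = []     # currently open tags, innermost last
        self.pincode = None

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed tags such as <br>.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        inside_target = self._stack[-2:] == ["kbd", "code"]
        if inside_target and self.pincode is None and data.strip():
            self.pincode = data.strip()


def extract_pincode(html):
    """Return the pincode found in the page markup, or None if there is none."""
    parser = PincodeParser()
    parser.feed(html)
    return parser.pincode
```

Returning None instead of raising makes it easy to detect a failed login or a changed page, a case the HtmlUnit version above surfaces as an index error on the empty XPath result.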

 

run.sh

#!/bin/sh
/opt/jython/jython -J-classpath "htmlunit-2.8/lib/*" get_access_token.py

Complete source code

#!/usr/bin/python2.4
#
# Copyright 2007 The Python-Twitter Developers
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import sys

# parse_qsl moved to urlparse module in v2.6
try:
  from urlparse import parse_qsl
except ImportError:
  from cgi import parse_qsl

import oauth2 as oauth

# HTMLUnit related code
import com.gargoylesoftware.htmlunit.WebClient as WebClient
import com.gargoylesoftware.htmlunit.BrowserVersion as BrowserVersion

def get_pincode(url, username, password):
  webclient = WebClient(BrowserVersion.FIREFOX_3_6)
  page = webclient.getPage(url)

  twitter_username_or_email = page.getByXPath("//input[@id='username_or_email']")[0]
  twitter_password = page.getByXPath("//input[@id='password']")[0]
  allow_button = page.getByXPath("//input[@id='allow']")[0]

  twitter_username_or_email.setValueAttribute(username)
  #password.text = password
  #password.setText(password) # HtmlPasswordInput
  twitter_password.setValueAttribute(password)

  page = allow_button.click()

  code = page.getByXPath("//kbd/code")[0]

  return code.getTextContent()

REQUEST_TOKEN_URL = 'https://api.twitter.com/oauth/request_token'
ACCESS_TOKEN_URL  = 'https://api.twitter.com/oauth/access_token'
AUTHORIZATION_URL = 'https://api.twitter.com/oauth/authorize'
SIGNIN_URL        = 'https://api.twitter.com/oauth/authenticate'

consumer_key    = None
consumer_secret = None
twitter_username = None
twitter_password = None

if consumer_key is None or consumer_secret is None:
  print 'You need to edit this script and provide values for the'
  print 'consumer_key and also consumer_secret.'
  print ''
  print 'The values you need come from Twitter - you need to register'
  print 'as a developer your "application".  This is needed only until'
  print 'Twitter finishes the idea they have of a way to allow open-source'
  print 'based libraries to have a token that can be used to generate a'
  print 'one-time use key that will allow the library to make the request'
  print 'on your behalf.'
  print ''
  sys.exit(1)

signature_method_hmac_sha1 = oauth.SignatureMethod_HMAC_SHA1()
oauth_consumer             = oauth.Consumer(key=consumer_key, secret=consumer_secret)
oauth_client               = oauth.Client(oauth_consumer)

print 'Requesting temp token from Twitter'

resp, content = oauth_client.request(REQUEST_TOKEN_URL, 'GET')

if resp['status'] != '200':
  print 'Invalid response from Twitter requesting temp token: %s' % resp['status']
else:
  request_token = dict(parse_qsl(content))

  print ''
  print 'Please visit this Twitter page and retrieve the pincode to be used'
  print 'in the next step to obtaining an Authentication Token:'
  print ''
  print '%s?oauth_token=%s' % (AUTHORIZATION_URL, request_token['oauth_token'])
  print ''

  print "Getting pincode"
  pincode = get_pincode('%s?oauth_token=%s' % (AUTHORIZATION_URL, request_token['oauth_token']), twitter_username, twitter_password)
  print "pincode =", pincode

#  pincode = raw_input('Pincode? ')

  token = oauth.Token(request_token['oauth_token'], request_token['oauth_token_secret'])
  token.set_verifier(pincode)

  print ''
  print 'Generating and signing request for an access token'
  print ''

  oauth_client  = oauth.Client(oauth_consumer, token)
  resp, content = oauth_client.request(ACCESS_TOKEN_URL, method='POST', body='oauth_verifier=%s' % pincode)
  access_token  = dict(parse_qsl(content))

  if resp['status'] != '200':
    print 'The request for a Token did not succeed: %s' % resp['status']
    print access_token
  else:
    print 'Your Twitter Access Token key: %s' % access_token['oauth_token']
    print '          Access Token secret: %s' % access_token['oauth_token_secret']
    print ''
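
The two pieces of string handling in the script, parsing the form-encoded token response and building the authorization URL, can be tried in isolation. This is a Python 3 sketch (an assumption; the script above runs on Jython 2.5) that expects the response body to be already decoded to a string. The function names are hypothetical, and any token values you pass in are placeholders, not real credentials.

```python
from urllib.parse import parse_qsl, urlencode

AUTHORIZATION_URL = "https://api.twitter.com/oauth/authorize"  # same value as in the script


def parse_token_response(body):
    """Turn a form-encoded OAuth response body into a dictionary."""
    return dict(parse_qsl(body))


def authorization_url(request_token):
    """Build the URL that the user, or the headless browser, has to visit."""
    query = urlencode({"oauth_token": request_token["oauth_token"]})
    return "%s?%s" % (AUTHORIZATION_URL, query)
```

Using urlencode instead of plain string formatting also escapes any characters in the token that are not safe inside a URL.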

Conclusion

We have seen how to get OAuth tokens with a headless browser. This approach can be applied to other services such as Facebook and LinkedIn. A partial list of other services you can play with is available at: http://wiki.oauth.net/w/page/12238551/ServiceProviders

See our previous article Web Scraping Ajax and Javascript Sites for more information about setting up and using HtmlUnit and Jython.

Sadly, the prerequisites take some extra effort to set up, but once the development environment is in place it’s plain sailing.

Resources

  1. OAuth articles from Eran Hammer-Lahav
  2. OAuth 2.0 for Android Applications
  3. OAuth Will Murder Your Children
  4. Do Facebook Oauth 2.0 Access Tokens Expire?
  5. OAuth2 for iPhone and iPad applications
  6. Movistar BlueVia’s official API for SMS

Photo taken by mariachily