Most of the magic behind Siri happens remotely.
I want to create my OWN version of Siri… except I don’t care about having it on my phone. I want my entire house to be talking to me… more like Jarvis (from Iron Man).
I believe I have access to all the right resources to create this AI.
It breaks down into three major parts:
1) convert speech to text
2) query database populated with q&a
3) convert text to speech
Speech to Text
Most speech-to-text engines suck. Siri’s works exceptionally well because the engine isn’t on your phone… it’s remote. I suppose we could hack Siri by running a MITM attack on an iPhone, faking the SSL cert, and intercepting the Apple ID… OR we can do something much simpler. Google’s Chrome 11 browser includes a voice input function (which isn’t yet part of the HTML5 standard) that converts your speech into text. This guy discovered that the conversion happens remotely through an undocumented API call to Google. All we have to do is access the same API and we’ve got ourselves a free speech-to-text engine!
In case you don’t understand Perl, this is how you use the API:
POST params: (which should include the contents of a .flac encoding of your voice, recorded in mono at 16000 Hz or 8000 Hz)
(which should read “audio/x-flac; rate=16000”, or 8000 depending on your voice recording. This should also be mirrored in the Content-Type header of your request.)
Response: json text
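For anyone who would rather see that in Python than Perl, here is a minimal sketch of how such a request could be assembled with the standard library. The function name and `api_url` argument are placeholders of mine (pass in whatever endpoint you are targeting); only the FLAC body and the `audio/x-flac; rate=...` Content-Type come from the description above.

```python
import urllib.request


def build_speech_request(api_url, flac_bytes, sample_rate=16000):
    """Build (but do not send) a POST request carrying FLAC audio.

    api_url is a placeholder argument: supply the endpoint yourself.
    """
    if sample_rate not in (8000, 16000):
        raise ValueError("sample rate should be 8000 or 16000")
    # The rate in the Content-Type must match the rate of the recording.
    headers = {"Content-Type": f"audio/x-flac; rate={sample_rate}"}
    return urllib.request.Request(
        api_url, data=flac_bytes, headers=headers, method="POST"
    )
```

Sending it would then be a matter of passing the request to `urllib.request.urlopen` and feeding the response body to `json.loads`.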
I used ffmpeg to convert my audio into the desired format:
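A conversion like that can also be driven from Python by shelling out to ffmpeg. This is a minimal sketch and not necessarily the command used above: `-ac 1` and `-ar 16000` are standard ffmpeg options for mono audio and a 16000 Hz sample rate, and the function names and file names are made up for illustration.

```python
import subprocess


def ffmpeg_flac_args(src, dst, sample_rate=16000):
    """Return the ffmpeg argument list for a mono FLAC conversion."""
    # -y overwrites dst if it exists; the .flac extension on dst
    # lets ffmpeg pick the FLAC encoder.
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]


def convert_to_flac(src, dst, sample_rate=16000):
    """Run ffmpeg (assumed to be on PATH); raises if the conversion fails."""
    subprocess.run(ffmpeg_flac_args(src, dst, sample_rate), check=True)
```

Calling `convert_to_flac("question.m4a", "question.flac")` would produce a file suitable for posting with the matching `rate=16000` Content-Type.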
So I recorded my voice on my iPhone 3GS asking “what day is it today?”, converted it to the appropriate .flac format, posted it to Google’s API, and this is what I got in response:
Database populated with Q&A
This is probably the most difficult part to obtain. Building it from scratch would require tons of data and advanced algorithms to interpret sentences constructed in various ways. I read somewhere that Siri was using Wolfram Alpha’s database… so I checked out Wolfram Alpha, and they have an engine that answers your questions. Not only that, they also offer an API service. (If you query fewer than 2000 times a month, it’s free!) So I signed up for the API service and tested it out. I asked it some simple questions like “What day is it today?” and “Who is the president of the United States?”. It returns all answers in well-formed XML.
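Pulling an answer out of that XML is straightforward. The sketch below assumes the response is laid out as `pod` elements containing `subpod`/`plaintext` children, with the first pod echoing back how the question was interpreted; that layout is my assumption about the format, and `SAMPLE_XML` is a hand-written stand-in, not real output from the API.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an API response (NOT real output).
SAMPLE_XML = """<queryresult success="true">
  <pod title="Input interpretation">
    <subpod><plaintext>number of legs of a spider</plaintext></subpod>
  </pod>
  <pod title="Result">
    <subpod><plaintext>8</plaintext></subpod>
  </pod>
</queryresult>"""


def first_answer(xml_text):
    """Return the first non-empty plaintext that is not the echoed input."""
    root = ET.fromstring(xml_text)
    for pod in root.iter("pod"):
        if pod.get("title") == "Input interpretation":
            continue  # skip the pod that just restates the question
        text = pod.findtext("subpod/plaintext")
        if text and text.strip():
            return text.strip()
    return None  # nothing usable came back
```

Returning `None` when no pod has text gives the caller a clean way to fall back to an “I don’t know” response.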
Text to Speech
This part is easy… and Google makes it even easier with yet another undocumented API! It’s straightforward. A simple GET request to:
Just replace the parameter with any sentence and you can hear Google’s female robot voice say anything you want.
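The one thing to watch is that the sentence has to be URL-encoded before it goes into the query string. Here is a minimal sketch; `base_url` and `param_name` are placeholder arguments of mine, so plug in the endpoint and parameter name you are actually using.

```python
from urllib.parse import urlencode


def build_tts_url(base_url, param_name, sentence):
    """Append the sentence to base_url as a URL-encoded query parameter.

    base_url and param_name are placeholders supplied by the caller.
    """
    return base_url + "?" + urlencode({param_name: sentence})
```

`urlencode` turns spaces into `+` and escapes characters like `?`, so a sentence with punctuation survives the trip intact.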
I can either make my program run in a web browser or as a stand-alone app. Running it in a web browser is cool because I would then be able to run it from just about any machine. Unfortunately, HTML5 doesn’t have a means of recording voice. My options are a) only use Google Chrome, b) make a Flash app, c) make a Java applet.
Anywho… no big deal.
Putting It All Together
It responds with this answer. Good girl.
It’s still missing the voice input portion of the code. Currently, it just accepts a .flac file. I wrote three chunks of code and chained them together into one AI pipeline. The advantage of this over Siri is that I can intervene at any time. I can have it listen for particular questions such as “who is your master?” and respond appropriately… but more importantly, I can have it listen for “turn on my lights” or “turn on the TV” or “open the garage door” or “turn to channel 618”. Certain commands will have my bot send a signal to the appropriate Arduino-controlled light switch, garage switch, or IR blaster and respond with a “yes, master”. I’ll post videos when it’s done.
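That intervention step can be sketched as a small dispatcher that sits between the speech-to-text output and the Q&A lookup. This is only an illustration of the idea, not the code from my pipeline: `make_bot`, the `commands` table, and the `answer_question` callback are all hypothetical names, and the actions stand in for whatever actually signals the Arduino.

```python
def make_bot(commands, answer_question):
    """Return a handler that checks for house commands before doing Q&A.

    commands: dict mapping a lowercase phrase to a zero-argument action.
    answer_question: fallback called with the original text.
    """
    def handle(text):
        normalized = text.lower().strip(" ?!.")
        for phrase, action in commands.items():
            if phrase in normalized:
                action()  # e.g. signal a light switch or IR blaster
                return "yes, master"
        # Not a known command: hand the question to the Q&A engine.
        return answer_question(text)

    return handle
```

Whatever string `handle` returns is what gets passed on to the text-to-speech step, so commands and ordinary questions share the same output path.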
Here is a video of the prototype in action.
Updated to give you a link to a working demo. This version requires you to use the Chrome browser (thanks to Shiv Kokroo for generously providing hosting / wolfram app ID):
Click on the little microphone and try asking her a question like “how many legs does a spider have?” or “what is 15 + 11?” or “turn off the lights”. 🙂
Update: There is a follow-up to this post here.
Source code can be found on GitHub.