Background and Potential of Apple’s Siri
With the announcement of iOS 5 and the iPhone 4S, Apple kept a key piece of functionality out of the betas: Siri. Siri, Inc., a company Apple purchased in April 2010, was spun out of SRI International (formerly the Stanford Research Institute). Siri's technology, in turn, grew out of SRI's work on DARPA's PAL (Perceptive Assistant that Learns) program. From SRI's CALO (Cognitive Agent that Learns and Organizes) about page:
“The Defense Advanced Research Projects Agency (DARPA), under its Perceptive Assistant that Learns (PAL) program, has awarded SRI the first two phases of a five-year contract to develop an enduring personalized cognitive assistant. DARPA expects the PAL program to generate innovative ideas that result in new science, new and fundamental approaches to current problems, and new algorithms and tools, and to yield new technology of significant value to the military.
SRI has dubbed its new project CALO, for Cognitive Agent that Learns and Organizes. The name was inspired by the Latin word “calonis”, which means “soldier’s servant”. The goal of the project is to create cognitive software systems, that is, systems that can reason, learn from experience, be told what to do, explain what they are doing, reflect on their experience, and respond robustly to surprise.”
For folks who have been following Apple since before the iPod, it's no surprise that Apple would invest heavily in this space. From its early years, Apple has taken chances on more intuitive ways to use technology – away from the keyboard. If you haven't seen it, check out the 1987 Knowledge Navigator concept video for an example of where Apple thought this might head 24 years ago. Siri, however, is Apple's first strong step towards interacting with an intelligent assistant through voice. It is another example of hardware reaching the point – in computing strength, form factor, and battery life – where software can break out of the typical model.
Apple Siri Components
Apple's Siri works by capturing audio input through the iPhone's microphone or headset, forwarding that audio "fingerprint" to Apple's servers for Nuance-based voice recognition, and then passing the result to the Siri engine – "the semantic layer" – which replies and takes action back on the iPhone. Siri needs data connectivity to complete this round trip, so no offline mode is available.
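Conceptually, that round trip is a simple capture/send/act loop. The sketch below only models the architecture described above; every function name and payload is an illustrative assumption – Apple's actual protocol and APIs are private:

```python
# Illustrative sketch of Siri's capture/send/act round trip.
# None of these names correspond to a real Apple API; this only
# models the four stages described in the text.

def capture_audio() -> bytes:
    """Stage 1: record from the microphone or headset (stubbed here)."""
    return b"\x00\x01"  # stand-in for the compressed audio "fingerprint"

def recognize_speech(audio: bytes) -> str:
    """Stage 2: server-side speech recognition (runs in Apple's cloud)."""
    return "What were my sales today compared to yesterday?"

def semantic_layer(utterance: str) -> dict:
    """Stage 3: the Siri engine interprets the text and picks an action."""
    return {"action": "run_report", "report": "daily_sales_comparison"}

def handle_request() -> dict:
    audio = capture_audio()           # capture on the device
    text = recognize_speech(audio)    # recognize in the cloud
    action = semantic_layer(text)     # interpret in the cloud
    return action                     # stage 4: act back on the device

print(handle_request())
```

Because stages 2 and 3 happen server-side, they can be improved centrally without shipping a device update – the point the next paragraph makes.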
Today, Siri is closed off to 3rd party developers; it is available only to core functions of the OS. One major benefit of this capture/send/act architecture is that the brains of the voice recognition software, as well as the Siri engine itself, live in the cloud. Recognition, interpretations, and actions can be modified centrally to the benefit of all users. While this orchestrated engine requires device-based code and logic to execute, the quality of that execution can be improved as data accumulates over time. As users begin to explore Siri, Apple has the ability to add functionality, tweak recognition, and change rules centrally.
Apple Siri and the Enterprise
Propelics feels that Apple will begin to open this technology up to 3rd party developers next year – possibly with an announcement at WWDC in 2012. This comes with caveats, however. This is a technology that requires a seamless, integrated experience for the user, and iOS developers should expect limitations on their ability to manipulate the Siri engine. Apple is going to be cautious about ensuring this experience is not tarnished.
For example, within the enterprise we can see a scenario in the not-too-distant future of picking up a device and asking, "What were my sales today compared to yesterday?" For a typical voice-driven application, this isn't a horribly difficult situation: the application can parse the keywords "sales", "compared", "today", and "yesterday" to identify a pre-built report that matches the request. This is surely happening today in limited scenarios.
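Inside a single application, that keyword-to-report matching can be quite simple. A minimal sketch, assuming hypothetical report names and keyword sets – nothing here reflects a real product:

```python
# Hypothetical keyword matcher: map recognized text to a pre-built report.
# Report names and keyword sets are invented for illustration.
REPORTS = {
    "daily_sales_comparison": {"sales", "today", "yesterday"},
    "weekly_sales_summary": {"sales", "week"},
}

def match_report(utterance):
    """Return the report whose full keyword set appears in the utterance."""
    words = set(utterance.lower().replace("?", "").split())
    best, best_size = None, 0
    for name, keywords in REPORTS.items():
        # Require every keyword to be present; prefer the most specific match.
        if keywords <= words and len(keywords) > best_size:
            best, best_size = name, len(keywords)
    return best

print(match_report("What were my sales today compared to yesterday?"))
```

This works because the application supplies the context: "sales" can only mean the reports this one app knows about – exactly the assumption that breaks down in a Siri-style, application-agnostic setting.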
In a Siri context, however, we're one layer removed from that logic. What sense of the word "sales" are we using? Sales of what? For which company? Is this a stock market question? We lose the contextual information that comes from being inside a single application. On the positive side, the logic engine has other tools available – current location, time of day, user information, and so on – that may add context to the query being presented. The fundamental change, however, is that Siri's current integrations are universal (all functionality available to all users), while 3rd party integration will be heavily reliant on what's installed on each individual user's phone. That is quite a bit more challenging and complicated.
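Those extra signals – user identity, location, time of day – could be used to disambiguate a query before any application logic is involved. A hedged sketch of that idea, with invented field names and an invented disambiguation rule:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical context a Siri-like engine could attach to each query.
@dataclass
class QueryContext:
    user: str
    location: str
    timestamp: datetime

def disambiguate(utterance, ctx):
    """Resolve an ambiguous word like "sales" from context, not app state."""
    interpretation = {"utterance": utterance, "user": ctx.user}
    if "sales" in utterance.lower():
        # Invented rule: a corporate account probably means CRM figures,
        # while anyone else may be asking a stock market question.
        is_corp = ctx.user.endswith("@corp.example")
        interpretation["domain"] = "crm" if is_corp else "stocks"
    interpretation["as_of"] = ctx.timestamp.date().isoformat()
    return interpretation

ctx = QueryContext("rep@corp.example", "Boston", datetime(2011, 10, 14, 9, 30))
print(disambiguate("What were my sales today?", ctx))
```

The point is not the specific rule but the shift: the engine, not the application, decides what the question means.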
In addition, you can be sure that Apple is going to ensure the integrated experience is carried forward. To Apple, the end-user shouldn't need to worry about which application is delivering the information; it should be up to Siri to understand which applications carry the contextual information required to answer the question – whether that information lives in an installed application or in the cloud. This is an important distinction from other voice-enabled applications. Apple will not be content to deliver a voice-enabled interface to individual iOS applications; it will look to deliver an integrated interface to all available outcomes – even if that requires reducing the functionality available to 3rd party applications. Apple could enable Siri to target specific 3rd party applications, as in "Business Intelligence Application, what were my sales today compared to yesterday?", but that solution doesn't feel like one Apple would accept.
One thing is for sure: when your competition very quickly gets vocal discounting your product, you know you've developed something with potential. Propelics is just starting on this path. We expect Siri to go "pro" on iOS in 2012 with the release of the iPad 3 and iOS 5.1 or later, and we expect Siri to find its way into Mac OS X. We also expect continual advancement of the voice recognition and Siri contextual engines, as well as additional OS-level integrations.
In any case, even with limitations we look forward to bringing this technology into our Enterprise iPad applications. We’re already looking into potential opportunities in some of our solutions for banking, automotive, insurance, and manufacturing. Propelics builds iOS Apps to reduce the complexity of performing tasks, and adding Siri voice recognition to enterprise iPad apps is a great step in this direction.
So when the queries are "What clients have contracts ending in the next 30 days?", "Which of my clients has put in an urgent service request lately?", "I'm in Boston Tuesday – which of my clients can I visit near Logan airport?", "What's the status of this customer's recent order?", or "Show me all outstanding and received payments from this customer", we look forward to making them a reality.
Don't underestimate where Apple's Siri came from, or what the possibilities are, by judging only the functionality in this iOS 5 release. We feel this is a very small first step into a wide future of intelligent, voice-driven interaction in iOS – especially in the Enterprise.
This is a topic that will be covered often in the Propelics blog, and as part of our Enterprise Mobile Strategy Workshops going forward. If you would like to be notified about new articles on this topic and others, follow @propelics on Twitter, or sign up to have these posts delivered by email by filling out the form on the left.
Partner and Co-Founder at Propelics