Wednesday, December 5, 2012

We need app/device/websites interlinking

I find my current computer experience very frustrating. For example, when I want to check my Gmail I have to open a browser, click on some bookmark, and possibly log in (or re-log-in as another user if my spouse is currently logged in to her Gmail). Or when I am searching for something, I have to click link by link, open the links in new tabs, and then close some of them when there are too many tabs.

And the root of the problem, in my opinion, is the lack of interlinking. My biggest irritation is that nothing has really changed in the last 40 years: we still work with our computers through a flat screen using some kind of window-oriented solution, where each window represents an application, each application is generally unaware of the rest of the running applications, and we control the process within each application at a very minute level of detail by clicking the mouse or pressing something on the keyboard.

Taking my previous gripe as an example: my desire to check Gmail is called, in my head, simply a 'check-my-email' scenario, but when it comes to realizing it in practice I have to perform all the tiny actions involved: moving and clicking the mouse to open a browser or a new tab, typing my password, clicking the Login button, etc.

Furthermore, when in front of the computer I very seldom work with just one application exclusively (okay, except when playing Unreal Tournament or watching Netflix). Most of the time I am solving some kind of problem which involves more than one application or website; that is, I work in scenarios:
  • going out for a movie - check the rating; if the rating is more than 60%, then find a showtime around 7pm near my home (and possibly, though I very seldom do it, buy a ticket on Fandango and print it).
  • working in IntelliJ IDEA on Java code, then googling for a strange exception message and going through all the forums and mailing lists trying to figure out what is causing it and how to fix it. Note: in this particular scenario I am only interested in forums and sites dealing with Java and with the specific library (or even the specific version of the library) I am having a problem with. As a side note, it was after repeatedly doing these steps that I decided to create
  • researching a doctor I am going to visit by checking him on all the medical sites (note: in this scenario, when I am looking for 'Bob Smith' MD I do not want to see any managing directors, and I am interested in 'Bob Smith' only in my town and state, not some guy across the country), and then, when I think he is okay, finding directions to his office, checking office hours, adding him and his address to my contacts, and printing directions just in case.
By the way, search engines' attempts to be smart and return relevant results (when you search 'pizza Portland' you will get different localized results depending on whether you are in Portland, Oregon or Portland, Maine) stem from trying to deduce the end user's scenario from what he types and from his previous searches (a practice DuckDuckGo calls Google's 'filter bubble'). Currently the only way for the end user to communicate his scenario is via additional search keywords, which most of the time fails miserably and leads to lots of clicking on results; and even when the search engine guesses my scenario correctly, the output is just paginated search result links.

We need a new interactivity paradigm (let's code-name it 'Stream') which will shift the interaction model from application-action-centered (clicking on the UI or typing search queries) to user-scenario-centered, by interlinking devices, applications, and websites.

In a nutshell, Stream will consist of two things:
  1. Human Communication API: a very rapid request/response cycle, as close to the speed of thought as possible
  2. Abstraction API: the ability to expose high-level scenarios to the user and/or to other Stream devices/applications/websites
Let’s review each item in more detail.

Human Communication API
The Human Communication API drives my ability to execute a mental request which a target device supporting the Stream interface (a computer, my home phone, or my bedroom AC) would somehow recognize. The recognition could be implemented by:
  • reading my facial expression (via high resolution webcam)
  • reading my hand/finger gestures (via high resolution webcam or my cell phone’s camera looking at my hands)
  • reading my tactile gestures
  • voice commands
  • reading the movements of the eyes and correlating them to a pixel-precision location on the computer display (say there is a new type of desktop icon - a 'visiicon' - which triggers a command when I look at it for more than 1.5 seconds)
  • ideal solution - real brain-computer interface (hopefully the non-invasive kind)
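The visiicon idea above is concrete enough to sketch. Below is a hypothetical dwell-trigger in Java (all class names are invented; no real eye-tracking API is assumed): it consumes a stream of gaze samples and fires a command once the gaze has rested on the icon's screen region for the 1.5 seconds mentioned above.

```java
import java.awt.Rectangle;

// Hypothetical sketch of a dwell-based 'visiicon' trigger.
// Gaze samples would come from some eye-tracking driver; here we
// only model the dwell logic itself.
class DwellTrigger {
    private static final long DWELL_MILLIS = 1500; // 1.5 s, as in the post

    private final Rectangle iconBounds;  // screen region of the visiicon
    private final Runnable command;      // action to fire on sustained gaze
    private long dwellStartMillis = -1;  // -1 means gaze is not on the icon
    private boolean fired = false;       // fire at most once per dwell

    DwellTrigger(Rectangle iconBounds, Runnable command) {
        this.iconBounds = iconBounds;
        this.command = command;
    }

    /** Feed one gaze sample (screen coordinates plus a timestamp). */
    void onGazeSample(int x, int y, long timeMillis) {
        if (iconBounds.contains(x, y)) {
            if (dwellStartMillis < 0) {
                dwellStartMillis = timeMillis;  // gaze just entered the icon
            } else if (!fired && timeMillis - dwellStartMillis >= DWELL_MILLIS) {
                fired = true;
                command.run();                  // sustained gaze: trigger
            }
        } else {
            dwellStartMillis = -1;              // gaze left: reset the timer
            fired = false;
        }
    }
}
```

The same state machine would work for any of the recognition channels listed above; only the source of samples changes.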

Abstraction API: each device/application/website must provide a new kind of API which bridges scenarios (what is exposed to the outside) with the use cases, considerations, and sequences of actions (the internal implementation) that must be performed on the device/application/website to accomplish the scenario.

The Stream API is not an API in the typical sense. Instead of exposing a bunch of low-level methods like 'Email[] checkEmail()', the goal is to tell other Stream devices: 'I have a check-email scenario you can use; internally I will figure out whether I need extra input from the user via the Human Communication API, and these are the possible outcomes of this scenario.' This way the creator of the device/application/website (who generally has the best insight into its possible use cases and capabilities) puts together and exposes not just atomic actions but thought-through, implemented scenarios which the Stream can take and use to communicate with the user via the communication module (if user input is needed) and with other Stream devices.
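As a rough Java illustration of the contrast with 'Email[] checkEmail()' (every name here is invented, since no Stream API exists), a scenario-centric interface might look like this: the scenario declares its possible outcomes, and one of them signals that user input is still needed.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a high-level scenario a device/app/website exposes to the Stream. */
interface Scenario {
    String name();                    // e.g. "check-email"
    List<String> possibleOutcomes();  // outcomes other Stream nodes can react to

    /** Run the scenario; 'input' holds answers already gathered
        from the user via the Human Communication API. */
    String run(Map<String, String> input);
}

/** A mail application wrapping its internal steps into a single scenario. */
class CheckEmailScenario implements Scenario {
    public String name() { return "check-email"; }

    public List<String> possibleOutcomes() {
        return List.of("no-new-mail", "new-mail", "login-required");
    }

    public String run(Map<String, String> input) {
        // Internally the app decides the low-level steps (open a session,
        // re-login as the right user, fetch the inbox, ...). Here the
        // decision is only simulated.
        if (!input.containsKey("password")) {
            return "login-required";  // the Stream should ask the user for input
        }
        return "new-mail";            // stub: pretend mail arrived
    }
}
```

A desktop gateway would discover 'check-email' by name, notice the 'login-required' outcome, gather the password through whatever communication module is in use, and re-invoke the scenario - rather than driving the mail client's buttons one click at a time.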

Having a standard API will allow developers to compete in providing more than one kind of gateway to the rest of the Stream ecosystem: some might develop a new Siri-like voice interface, others might provide smart free-text desktop gadgets which try to figure out user scenarios on the fly, confirm them, and execute them, and yet others will provide visiicons and 3D-gesture support.

I am really waiting for the day when we raise the user experience for typical scenarios to a new level, where we communicate in abstractions (scenarios) rather than through a series of button and menu clicks against isolated apps/devices/websites.

What do you think?
