How to Build Your Own Alexa Service

Screenless Interfaces. Viget dabbles in such things. Read on to learn more.

With the recent introduction of Amazon and Google products that provide Iron Man-esque voice control, we've been wondering lately what this means for the future of human-computer interaction. Always on the lookout for emerging technology to get ahead of, we decided to put a project together to see what these little devices are capable of.

We had about two weeks before the three Viget offices assembled for an all-hands gathering, so we wanted to build something both fun and interactive. What we ended up with was an Alexa service that could figure out which Viget employee you were thinking about. We called it: The Know It All.

There are a couple pieces to this puzzle - a Rails backend, a React frontend, and an Alexa ... other frontend. I'll cover the Alexa aspect more in depth as that's what's new and interesting here, but you can find links to the other pieces down below. Enough chatter, let's get into how this thing actually works!

Making an Alexa Skill

Amazon has a Developer Console, which may take some hoop jumping to get into. But once you're in, all of the integration work takes place inside of an Alexa Skill. And more specifically, the Interaction Model of that Skill, which includes an Intent Schema, and Sample Utterances. Let's take a look at what that looked like for us:

Intent Schema

{
  "intents": [
    {
      "slots": [
        {
          "name": "answer",
          "type": "POSSIBLE_ANSWERS"
        }
      ],
      "intent": "Play"
    },
    {
      "intent": "AMAZON.YesIntent"
    },
    {
      "intent": "AMAZON.NoIntent"
    },
    {
      "intent": "Skip"
    }
  ]
}

Sample Utterances

Play begin
Play I want to play
Play {answer}
Play they are a {answer}
Skip i don't know

So, what's going on here? Sample Utterances are the entry point. When you say "Alexa, tell [the name of your service] to [do something]", it takes your [do something] and checks for a matching line. If there is a match, it invokes the associated Intent (identified by the first word in the line).

So when we say "Alexa, tell The Know It All that I want to play," the Play Intent is passed to the server endpoint we've configured.
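For reference, here's a trimmed sketch of the POST body Amazon sends for an interaction like "they are a woman." The identifiers below are made-up placeholders, and the real payload carries more metadata, but the `request` portion is the part the backend cares about:

```json
{
  "version": "1.0",
  "session": {
    "new": false,
    "sessionId": "amzn1.echo-api.session.example-id",
    "attributes": {}
  },
  "request": {
    "type": "IntentRequest",
    "intent": {
      "name": "Play",
      "slots": {
        "answer": { "name": "answer", "value": "woman" }
      }
    }
  }
}
```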

Another piece to be aware of is the "slots" key. That comes into play when you say something like They are a woman. The Play they are a {answer} line would match there, which fires the Play Intent with the term "woman" in the answer slot.

And lastly I'll point out the POSSIBLE_ANSWERS slot type associated with our "answer" slot. Amazon has a few built-in slot and intent types if you want to hook into a well-known data set (e.g. dates, sports, actors, etc.). For our purposes, we had a custom list of possible answers to our questions, so we defined our own slot type to be matched on.
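Defining a custom slot type in the console amounts to naming it and listing its values, one per line. Here's a hypothetical sketch (these values are illustrative, not our actual answer list):

```
POSSIBLE_ANSWERS

woman
man
developer
designer
strategist
```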

The Backend

As I mentioned before, you can configure an Alexa Skill to make its requests to an API endpoint. With that hooked up, Amazon will send a POST request whenever there is an interaction with your defined Skill, and the user's speech will be sent along according to the schema you've laid out. It also passes along a session object, which enables you to engage in a back-and-forth interaction with the user that you can continue or terminate at any point.
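As a sketch of what the backend does with that POST, here's a small Ruby helper that digs the intent name, slot values, and session attributes out of the request body. The method name and return shape are ours for illustration, not from the actual app:

```ruby
require "json"

# Sketch only: parse the JSON body Amazon POSTs to the endpoint and
# pull out the pieces the game logic would dispatch on.
def parse_alexa_request(body)
  payload = JSON.parse(body)
  request = payload.fetch("request")
  intent  = request["intent"] || {}
  slots   = (intent["slots"] || {}).transform_values { |slot| slot["value"] }

  {
    type:    request["type"],                            # e.g. "IntentRequest"
    intent:  intent["name"],                             # e.g. "Play"
    slots:   slots,                                      # e.g. { "answer" => "woman" }
    session: payload.dig("session", "attributes") || {}  # state carried between turns
  }
end
```

From there, dispatching is a matter of branching on `:intent` ("Play", "AMAZON.YesIntent", and so on) and reading any slot values.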

A major help for us here was the use of the alexa-rubykit gem. It assists you in building up the appropriate response to send back to Amazon, so you can easily define:

- what the Echo should say next
- what the Echo should say if it doesn't hear anything immediately
- whether the session should continue or close out
- any auxiliary parameters you'd like to track in the session
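To make those four bullets concrete, here's a hand-rolled sketch of the response shape the gem assembles for you. The field names follow the Alexa response format; the helper name and defaults are ours:

```ruby
require "json"

# Sketch of an Alexa response body. speech: what the Echo says next;
# reprompt: what it says if it hears nothing immediately; end_session:
# whether to close out; session: auxiliary parameters to carry forward.
def alexa_response(speech:, reprompt: nil, end_session: false, session: {})
  response = {
    "outputSpeech"     => { "type" => "PlainText", "text" => speech },
    "shouldEndSession" => end_session
  }
  if reprompt
    response["reprompt"] = {
      "outputSpeech" => { "type" => "PlainText", "text" => reprompt }
    }
  end

  {
    "version"           => "1.0",
    "sessionAttributes" => session,  # echoed back on the next request
    "response"          => response
  }.to_json
end
```

Leaving `shouldEndSession` false is what keeps the back-and-forth going; the Echo reopens the microphone and the next utterance comes back with your session attributes attached.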

As promised, here's a gist of the Alexa specific pieces of Ruby code powering the backend.

The Frontend

I won't discuss the intricacies of this too much; it was really just an excuse for me to dabble in React, Microcosm, and React Motion, all of which are excellent tools, by the way. The Microcosm app receives updates via Pusher (also excellent) and serves as a visual display for the current state of the active "game" being driven by Alexa.

There's also a fancy waiting page; let's just watch a nice gif of that.

[gif: the looping Know It All waiting page]

Wrapping Up

It's been fun and relatively straightforward to get our own custom Alexa Skill up and running. And if you ever happen to swing by our HQ office, definitely give The Know It All a spin yourself!

Eli Fatsi

Eli uses his mathematics degree from Carnegie Mellon to blur the lines between the digital and physical worlds. He codes for Shure, Volunteers of America, and other clients from our Boulder, CO, office.
