Dan Bond

Alexa meets Monzo

Feb 6, 2017

https://github.com/syscll/alexa-monzo

An unofficial Alexa Skill for interacting with the Monzo API.

Recently, I started to become very interested in voice assistants and home automation. I had a few IoT devices already and there were a few others, such as the Nest and Philips Hue, that I really wanted to purchase. I liked the idea of having them all connected. So, after a bit of shopping around I decided to buy an Amazon Echo as there isn’t much competition on the market for voice assistants (at least not in the UK).

Moreover, my main interest in home automation was the ability to make my own integrations with the apps I love and use the most — enter Monzo. I’ve been using Monzo for a while and one of their main selling points is the ability to access your account data through the use of their API.

So, as my first project I decided to build a custom Alexa Skill to use all of their banking goodness.

Tech Considerations

Firstly, Alexa Skills can only trigger a Lambda function on AWS or an HTTPS endpoint. I chose the recommended Lambda option; however, this limited my choice of programming language, as AWS Lambda only supports Python, Node, C# and Java. I did want to utilise the awesomeness of Golang, but I didn’t want to have to use external services for such a simple app.

As the skill is just a buffer between Alexa and the Monzo API, I decided to roll with Python for its sheer simplicity as the Lambda function only needs to make HTTP requests and format data into a JSON response.

There are a few prerequisites before you can start developing a custom skill. Firstly, you’ll need an AWS account. The free tier gives you more than enough resources to get the ball rolling as you’ll need access to Lambda in order to run the skill and CloudWatch to process any logs. Secondly, you’ll need access to the Amazon Developer Console in order to enable your skill for testing on your Amazon Echo (you can also publish your app to the Alexa Appstore if you want others to use it).

The code for the skill itself is rather straightforward. The basic flow is as follows:

  1. Alexa transforms speech into text and sends it to your Lambda function in the form of a request.
  2. Your application parses the request and uses the data to run the necessary functions.
  3. Upon completion, your application returns any data in the form of an Alexa response.

Simple.
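
As a rough sketch of that flow (not the skill's actual code), a Lambda handler in Python might look something like this. The name handle_intent and the spoken strings are placeholders made up for illustration; the request and response field names follow the Alexa JSON interface.

```python
def handle_intent(intent):
    """Step 2: run whatever logic belongs to the matched intent (stubbed)."""
    if intent["name"] == "GetTransactions":
        # Placeholder: the real skill would call the Monzo API here.
        return "This is where the transaction total would be spoken."
    return "Sorry, I didn't understand that."


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls with the Alexa request as `event`."""
    # Step 1: Alexa has already turned speech into a structured request.
    request = event["request"]

    if request["type"] == "LaunchRequest":
        text = "Welcome."  # placeholder greeting
    elif request["type"] == "IntentRequest":
        text = handle_intent(request["intent"])
    else:
        # For example a SessionEndedRequest, where nothing is spoken back.
        text = ""

    # Step 3: wrap the result in an Alexa response.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```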

Alexa automatically maps speech to a custom type called an intent. An intent represents a high-level action that fulfils a user’s spoken request. For example, the Monzo skill has an intent called GetTransactions which is triggered when a user’s spoken request matches a specific sentence, also known as an utterance.

Defining several utterances per intent will increase Alexa’s accuracy while mapping a spoken request. Utterances can optionally contain arguments called slots that collect additional information needed to fulfil the user’s request. One example of a slot is the TRANSACTION_CATEGORY used to define all of the different transaction categories in the Monzo skill.

If we combine all three (intent, utterance and slots), we get something like this: GetTransactions how much I've spent on {category} in the last {duration}. This allows us to ask Alexa to get the total amount spent on any given category in any given time period.
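
On the Lambda side, the slot values arrive inside the intent, keyed by slot name. Here is a minimal sketch of pulling them out, assuming the slot names category and duration from the utterance above. get_slot_values is a hypothetical helper, and the sample values are made up.

```python
def get_slot_values(intent):
    """Return a {slot_name: spoken_value} dict for an Alexa intent.

    A slot the user did not fill has no "value" key, so it maps to None.
    """
    slots = intent.get("slots") or {}
    return {name: slot.get("value") for name, slot in slots.items()}


# Illustrative intent, shaped like the "intent" object of an IntentRequest.
sample_intent = {
    "name": "GetTransactions",
    "slots": {
        "category": {"name": "category", "value": "groceries"},
        "duration": {"name": "duration", "value": "P7D"},
    },
}

values = get_slot_values(sample_intent)
```

Here values["category"] holds the spoken category and values["duration"] holds the duration as Alexa formatted it.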

Once the application has successfully parsed the request it can perform any business logic, which in our case is communicating with the Monzo API. (I won’t go into much detail about this; the docs can be found here.)
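
To give a feel for the business logic, here is a small sketch of totalling the spend for one category once the transactions have been fetched. It is not the skill's real code: total_spent is a hypothetical helper, the sample data is invented, and it assumes each transaction is a dict with a category and an amount in pence that is negative for money spent.

```python
def total_spent(transactions, category):
    """Return the total spent (in pence) on one category.

    Assumes `amount` is in minor units and negative for spending,
    so top-ups and refunds (positive amounts) are ignored.
    """
    return sum(
        -tx["amount"]
        for tx in transactions
        if tx.get("category") == category and tx["amount"] < 0
    )


# Invented sample data, for illustration only.
sample_transactions = [
    {"amount": -350, "category": "eating_out"},
    {"amount": -1200, "category": "groceries"},
    {"amount": -150, "category": "eating_out"},
    {"amount": 5000, "category": "general"},
]

pence = total_spent(sample_transactions, "eating_out")
```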

The final stage of the skill is to create and return a valid Alexa response. A response contains data that both Alexa and the companion app can use, as well as other information for making further requests such as session attributes. Alexa will say anything defined in the outputSpeech object and the companion app will display anything in the card object.
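
A response with both spoken output and a card could be assembled roughly like this. build_response and its parameter names are hypothetical; the keys in the returned dict (outputSpeech, card, sessionAttributes, shouldEndSession) are the ones Alexa expects.

```python
def build_response(speech_text, card_title, card_text,
                   session_attributes=None, end_session=True):
    """Build an Alexa response with plain-text speech and a simple card."""
    return {
        "version": "1.0",
        # Carried over to the next request in the same session.
        "sessionAttributes": session_attributes or {},
        "response": {
            # What Alexa says out loud.
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # What the companion app displays.
            "card": {
                "type": "Simple",
                "title": card_title,
                "content": card_text,
            },
            "shouldEndSession": end_session,
        },
    }
```

Returning the result of build_response from the Lambda handler is all that is needed; Lambda serialises the dict to JSON for you.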

The footprint of an Alexa skill is rather minimal. It doesn’t require a lot of code to get a basic application working on your Amazon Echo; most of the time is spent on configuration.

Headaches

There seem to be a few differences between the Alexa mobile and web apps. This became apparent when trying to link my Monzo account with Alexa. Even though both interfaces seem to be identical, the mobile app would not let me return to the app upon OAuth completion. There were no error messages on screen, just a continual redirect loop. I tried this on the web app and it worked perfectly first time. Account linked!

Amazon have created a few built-in slot types that make it easier to work with request data. For example, the AMAZON.DURATION slot converts words that indicate durations into an ISO-8601 duration format. This is extremely useful as you don’t have to spend time manually converting speech to datetime objects. However, my only gripe is that the original speech is not included in the Alexa request, only the formatted duration. This may seem trivial, but in my case I wanted Alexa to speak back the original speech input, so I had to manually convert each duration back to human text.
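
For illustration, converting a duration back to something speakable can be done with a regular expression. This is a sketch under my own assumptions rather than the skill's actual code: duration_to_text is a hypothetical helper and it only handles whole-number durations such as P7D or PT5H10M.

```python
import re

# Matches ISO-8601 durations such as P7D, P8W, PT10M or P2YT3H10M.
_DURATION_RE = re.compile(
    r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)
# Unit names, in the same order as the regex groups above.
_UNITS = ("year", "month", "week", "day", "hour", "minute", "second")


def duration_to_text(iso_duration):
    """Turn an ISO-8601 duration into words, e.g. 'P7D' -> '7 days'."""
    match = _DURATION_RE.match(iso_duration)
    if match is None:
        raise ValueError("Unsupported duration: {}".format(iso_duration))

    parts = []
    for unit, value in zip(_UNITS, match.groups()):
        if value is None:
            continue
        number = int(value)
        parts.append("{} {}{}".format(number, unit, "" if number == 1 else "s"))

    if not parts:
        # Strings like "P" or "PT" match the pattern but carry no value.
        raise ValueError("Empty duration: {}".format(iso_duration))
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + " and " + parts[-1]
```

The result can then be dropped straight into the spoken response, for example "in the last " + duration_to_text("P7D").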

Although this last issue isn’t critical, I found testing the skill a bit inconsistent. The developer console has a dedicated testing section that allows you to send custom or pre-generated JSON requests without the need to physically speak to your Echo device. This makes testing a lot faster, although I regularly received session timeouts and inconsistent error messages when sending the same request.

The End Result

A definite success! After a bit of trial and error, the Alexa skill works almost perfectly every time. The voice recognition is near perfect and the average end-to-end request/response time is ~1 second. See for yourself.

If you want to try the application on your own Echo, I wrote some detailed instructions for getting started with the Monzo Alexa Skill, and comprehensive documentation can also be found on the Amazon Developer website.