What does it take to build a chatbot? Let’s find out.

 

Without any delay, the image below shows what we are building:

[Image: a preview of the chatbot we are building]

The short answer to the question in the title, “What does it take to build a chatbot?”, is: not much.

I’m a web developer. It has been my desire to dig into this thrilling field for a long time. Unfortunately, I can’t say I have the knowledge in Natural Language Understanding (NLU) required to build a chatbot without help. The good news is that such help is available today.

Google’s Cloud Natural Language API, Microsoft’s Cognitive Services APIs, and IBM’s Watson Conversation provide commercial NLU services with generous free tiers. There are also completely free ones, at least for the moment. These include API.AI, which Google recently acquired, and Wit.ai, which Facebook owns.

From a web developer’s point of view, that’s all the help we need — an API which will remove the complexity for us.

Let’s start with the fun part

If you are eager to see the example live, here is the demo available on Heroku. The entire code for this example is available on GitHub.

For the purpose of this article, we’ll create a chatbot called TiBot to answer our questions about the date and time. We’ll use API.AI’s API to process these questions. I believe API.AI is more intuitive and easier to work with than Wit.ai.

At the back end, a simple Node.js server will handle requests sent from the front-end app via WebSockets. We’ll then fetch a response from the language processing API. Finally, we’ll send the answer back via WebSockets.

At the front end, we have a messenger-like app built on a single Angular component. Angular is built in TypeScript (a typed superset of JavaScript). If you are not familiar with either of them, you should still be able to understand the code.

I chose Angular because it inherently uses RxJS (the ReactiveX library for JavaScript). RxJS handles asynchronous data streams in an amazingly powerful yet simple manner.

API.AI setup

API.AI has a neat Docs section. First, we need to become familiar with some of the basic terms and concepts related to the API, and with NLU in general.

Once we create an account at API.AI, we need to create an agent to start our project. With each agent, we get API keys — client and developer access tokens. We use the client access token to access the API.

Agents are like projects or NLU modules. The important parts of an agent are intents, entities, and actions and parameters.

Intents are the responses the API returns or, according to API.AI, “a mapping between what a user says and what action should be taken by your software.” For example, if a user says, “I want to book a flight,” the result we receive should look like the following:

{ … "action": "book_flight" … }

Entities help extract data from what the user says. If a user says, “I want to book a flight to Paris,” we want to extract the information about Paris. We need that data passed to our logic so that we can book a flight to Paris for our user. The result should look like this:

{
...
"action": "book_flight",
"parameters": {
"destination": "Paris"
}
...
}
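As a minimal sketch of how our own code might consume such a result, the helper below pulls the action and the destination out of it. The function name describeBooking and the sample object are hypothetical and exist only for illustration; the only things taken from the response above are the "action" and "parameters" fields.

```javascript
// Hypothetical helper: turn a resolved result into a short summary.
// Only `action` and `parameters` come from the response format above.
function describeBooking(result) {
  const { action, parameters = {} } = result;

  if (action !== 'book_flight') {
    return null; // not an action this helper knows about
  }
  // The entity value arrives as a plain parameter on the result.
  return parameters.destination
    ? `Booking a flight to ${parameters.destination}`
    : 'Booking a flight (destination still unknown)';
}

// Sample result shaped like the JSON above, trimmed to the relevant fields.
const sampleResult = { action: 'book_flight', parameters: { destination: 'Paris' } };
console.log(describeBooking(sampleResult)); // prints "Booking a flight to Paris"
```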

Entities are parameter values, much like data types. The API.AI platform provides system-defined entities. Examples of these include @sys.date, @sys.color, and @sys.number. More complicated ones include @sys.phone-number, @sys.date-period, and @sys.unit-length-name.

We can also define our own entities, or pass them on the fly with each request. A good example of passing entities on the fly is that of users listening to their playlists. Users have a playlist entity in their request or a user session with all of the songs in the playlist. We would be able to respond to “Play Daydreaming” if the user is currently listening to Radiohead’s A Moon Shaped Pool playlist.

Actions and parameters: a request we send to the API should resolve to an action. But it may also resolve to something our chatbot doesn’t understand. We may choose to fall back to a default response in that case.

Parameters are the companion of actions. They complement and complete the action. In some cases, we don’t need parameters. But there are cases where actions only make sense with parameters. For example, we can’t book a flight without knowing the destination. That is something we need to think about before we even start creating the intents.

Finally, the following code is how the API’s response should appear for a resolved intent:

{
"id": "HEX_ID",
"timestamp": "TIMESTAMP",
"lang": "en",
"result": {
"source": "agent",
"resolvedQuery": "this is how the bot understood the input",
"action": "this.is.the.action",
"actionIncomplete": false,
"parameters": {
"par1": "value 1",
"par2": "value 2"
},
"contexts": [],
"metadata": {
"intentId": "HEX_ID",
"webhookUsed": "false",
"webhookForSlotFillingUsed": "false",
"intentName": "the.intent.name"
},
"fulfillment": {
"speech": "",
"messages": [
{
"type": 0,
"speech": ""
}
]
},
"score": 0.9700000286102295
},
"status": {
"code": 200,
"errorType": "success"
},
"sessionId": "SESSION_PROVIDED_ON_OUR_SIDE"
}

The most important part of the JSON is the “result” object with the “action” and “parameters” properties discussed above. The confidence for the resolved query (in the range of 0 to 1) is indicated with “score”. If “score” is zero, our query hasn’t been understood.
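A minimal sketch of how we might act on the score is shown below. The 0.5 cut-off and the function name isUnderstood are assumptions made for this example; the only thing stated above is that a score of zero means the query hasn’t been understood, so the right threshold is a judgment call for each app.

```javascript
// Assumed cut-off: the only known fixed point is that 0 means
// "not understood"; anything above that is our own choice.
const MIN_SCORE = 0.5;

// Returns true when the response carries an action with enough confidence.
function isUnderstood(response, minScore = MIN_SCORE) {
  const result = (response && response.result) || {};
  return Boolean(result.action) &&
    typeof result.score === 'number' &&
    result.score >= minScore;
}
```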

It’s worth noting that the “contexts” array contains information about unresolved intents that may need a follow-up response. For example, if a user says, “I want to book a flight,” we’d process the “book_flight” action (the context). But to get the required “destination”, we may respond with, “Ok, where would you like to go?” and process the “destination” within the following request.
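The follow-up flow could be sketched roughly as follows. This assumes the agent is configured to prompt for required parameters and that the prompt arrives in "fulfillment.speech" while "actionIncomplete" is true; the function name nextStep and the shape of its return value are hypothetical.

```javascript
// Decide what to do with a response: act on it, or ask a follow-up question.
// Reads `actionIncomplete` and `fulfillment.speech` from the response format
// above; the { type, ... } return shape is an assumption for this sketch.
function nextStep(response) {
  const { action, actionIncomplete, parameters, fulfillment = {} } = response.result;

  if (actionIncomplete) {
    // A required parameter is missing: relay the agent's prompt to the user.
    return { type: 'ask', text: fulfillment.speech || 'Could you tell me more?' };
  }
  return { type: 'act', action, parameters };
}
```

With a response whose "actionIncomplete" is true and whose speech is “Ok, where would you like to go?”, nextStep() returns an "ask" step carrying that text; once the destination arrives in the next request, it returns an "act" step.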

The back end

We are building a chat app. The communication between the client and the server will go through WebSockets. For that purpose, we’ll use a Node.js WebSocket library on the server. Our WebSockets module looks like this:

const server = require('./');
const processRequest = require('./intents');
const WebSocket = require('ws');
const uuidv4 = require('uuid/v4');

// Attach the WebSocket server to the existing HTTP server
const wss = new WebSocket.Server({server: server});

wss.on('connection', (ws) => {
  // Answer every incoming message with the bot's reply
  ws.on('message', (msg) =>
    processRequest(msg)
      .then(answer => ws.send(JSON.stringify({type: 'bot', msg: answer})))
  );
  // Issue a unique session ID as soon as the client connects
  ws.send(JSON.stringify({type: 'sessionId', msg: uuidv4()}));
});

module.exports = wss;

Each WebSocket message is string-encoded JSON with “type” and “msg” properties.

The “type” string is one of the following:

“bot”, for the bot’s answer to the user.

“user”, for the question the user asks the bot.

“sessionId”, for the unique session ID issued on connection.

The “msg” string contains, depending on the type, our chatbot’s answer sent back to the user, the user’s question, or the sessionId.
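To make the message format concrete, here is a small sketch of a pair of helpers that build and parse such messages. The names encodeMessage and decodeMessage are hypothetical and the validation rules are assumptions for this example, not part of the project’s code.

```javascript
const MESSAGE_TYPES = ['bot', 'user', 'sessionId'];

// Build the string-encoded JSON envelope described above.
function encodeMessage(type, msg) {
  if (!MESSAGE_TYPES.includes(type)) {
    throw new Error(`Unknown message type: ${type}`);
  }
  return JSON.stringify({ type, msg });
}

// Parse an incoming envelope; returns null for anything malformed.
function decodeMessage(raw) {
  try {
    const parsed = JSON.parse(raw);
    const valid = parsed !== null && typeof parsed === 'object' &&
      MESSAGE_TYPES.includes(parsed.type) && typeof parsed.msg === 'string';
    return valid ? parsed : null;
  } catch (err) {
    return null; // not valid JSON
  }
}
```

Rejecting malformed input at the edge keeps the rest of the server free of defensive checks.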

The processRequest(msg) function is the core of our server’s functionality. It first makes a request to the API:

// Send the user's question to API.AI
// (assumes `apiai` is a client already initialized with the client access token)
// text: user's question
// sessionId: user's session ID
// tz: user's time zone
const callApiAi = (text, sessionId, tz) => new Promise((resolve, reject) => {
  const request = apiai.textRequest(text, { sessionId: sessionId, timezone: tz });

  request.on('response', response => resolve(response));
  request.on('error', error => reject(error));
  request.end();
});

Then, it executes doIntent(), which runs the specific action for the user’s intent based on the response from the API:

// Process the action
// response: Response from the API
// tz: user's timezone
const doIntent = (response, tz) => {
  const { parameters, action, fulfillment } = response.result;

  return new Promise((resolve, reject) => {
    if (intents[action]) {
      return resolve(intents[action](parameters, tz));
    } else if (fulfillment.speech) {
      return resolve(fulfillment.speech);
    }
    return reject(handleUnknownAnswer());
  });
}

doIntent() checks to see if there is a function to handle the action in the response. If there is, it calls that function with the parameters of the response. If there is no function for the action, or the response is not resolved, it checks for a fallback response from the API. Failing that, it rejects with the result of handleUnknownAnswer().
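The body of processRequest() itself is not shown here, so the following is only a guess at how the two steps might be chained. The factory makeProcessRequest, its injected dependencies, and the fallback text are all hypothetical; passing the dependencies in simply keeps the sketch self-contained.

```javascript
// Hypothetical wiring of the two steps described above. The real module is
// not shown in the article, so the dependencies are passed in as arguments.
const makeProcessRequest = ({ callApiAi, doIntent, fallback }) => (rawMsg) => {
  let input;
  try {
    input = JSON.parse(rawMsg); // { type, sessionId, msg, tz } from the client
  } catch (err) {
    return Promise.resolve(fallback); // malformed message: answer anyway
  }
  return callApiAi(input.msg, input.sessionId, input.tz)
    .then(response => doIntent(response, input.tz))
    .catch(() => fallback); // API error or unknown intent: answer anyway
};
```

Catching the rejection at this level means the WebSocket handler always has something to send back, even when doIntent() rejects.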

The action handlers are in our intents module:

module.exports = {
"date.check": (pars, tz) => {
// ex: is it New Year in Korea
// ex: is it the 21st of February
const { location } = pars;
const utz = getTZ(tz);

return fetchTZ(location, utz)
.then(tz => {
const date = moment.tz(pars.date, tz);
const now = moment.tz(tz);

return now.format("MM-DD-YYYY") === date.format("MM-DD-YYYY")
? `
Yes, it's ${now.format("MMMM Do YYYY")}
${location ? 'in ' + getLocation(location) : ''}
`
: `
No, it's ${now.format("MMMM Do YYYY")}
${location ? 'in ' + getLocation(location) : ''}
`;
});
},

 // ex: I wonder which year are we in
"date.year.get": (pars, tz) => 'The current year is ' + moment().format('YYYY'),

"date.between": (pars, tz) => {
// ex: how many days between today and New Year
// ... entire action code goes here ...
},

"date.day_of_week": (pars, tz) => {
// ex: what day of the week is it today in London
// ... entire action code goes here ...
},

"date.day_of_week.check": (pars, tz) => {
// ex: is it Friday tomorrow in Moscow
// ... entire action code goes here ...
},

"date.get": (pars, tz) => {
// ex: what date is tomorrow
// ... entire action code goes here ...
},

"date.month.check": (pars, tz) => {
// ex: do you know if it's March now
// ... entire action code goes here ...
},

// ex: I'd like to know what month is it now
"date.month.get": (pars, tz) => "It's " + moment().format('MMMM'),

"date.since": (pars, tz) => {
// ex: how do I know how many minutes passed since the year 2013
// ... entire action code goes here ...
},

"date.until": (pars, tz) => {
// ex: how many months till New Year
// ... entire action code goes here ...
},

"date.year.check": (pars, tz) => {
// ex: now it's 1989 year right
// ... entire action code goes here ...
}
};

To each handler function, we pass the parameters from the API response. We also pass the user’s time zone that we receive from the client side. Because we are dealing with the date and time, it turns out that the time zone plays an important role in our logic. It has nothing to do with the API, or NLU in general, but only with our specific business logic.

For example, if a user in London on Friday at 8:50 pm asks our bot, “What day is it today?” the answer should be, “It’s Friday.”

But if that same user asks, “What day is it in Sydney?” the answer should be, “It’s Saturday in Sydney.”

Location is important to our business logic too. We want to detect the location the question refers to (Sydney, in this case), so that we can get the time zone for that location. For that, we can combine the Google Maps Geocoding API and the Time Zone API.
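The London and Sydney example can be checked with a short sketch. Note that it uses the built-in Intl.DateTimeFormat rather than the moment-timezone calls used in our handlers, and that the helper name weekdayIn and the chosen date are assumptions for illustration.

```javascript
// Weekday name for a given instant in a given IANA time zone.
// Uses the built-in Intl API instead of moment-timezone.
function weekdayIn(date, timeZone) {
  return new Intl.DateTimeFormat('en-US', { weekday: 'long', timeZone }).format(date);
}

// Friday 2 June 2017, 8:50 pm in London (British Summer Time, UTC+1).
const instant = new Date('2017-06-02T19:50:00Z');

console.log(weekdayIn(instant, 'Europe/London'));    // prints "Friday"
console.log(weekdayIn(instant, 'Australia/Sydney')); // prints "Saturday"
```

The same instant falls on two different days, which is why every handler receives the time zone along with the parameters.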

The front end

Our app is a single Angular component. The most important functionality is within the ngOnInit() method of the component:

ngOnInit() {
  //
  // The WebSocket Observable
  this.ws$ = Observable.webSocket(this.wsUrl);

  //
  // Get the sessionId on connecting to the WS:
  // we need to do this only once, and we are only
  // concerned about the messages containing the sessionId
  this.ws$.filter(r => r.type === 'sessionId')
    .takeUntil(this.ngUnsubscribe$).take(1)
    .subscribe(r => this.wsSessionId = r.msg);

  //
  // Get responses from the bot, and show them
  // (attempt to reconnect on connection fail, retry 3 times)
  this.ws$.takeUntil(this.ngUnsubscribe$)
    .filter(r => r.type === 'bot')
    .retryWhen(err$ =>
      Observable.zip(err$, Observable.range(1, 3), (e, n) => n)
        .mergeMap(retryCount => Observable.timer(1000 * retryCount))
    )
    .delayWhen(input => Observable.interval(100 + input.msg.length * 10))
    .subscribe(
      (msg) => this.pushMsg(msg)
    );
}

We first create the WebSocket (WS) connection to our server with a WS Observable. We then subscribe a couple of observers to it.

The first observer gets the sessionId when it connects to the WebSocket server. Thanks to the take(1) operator, it unsubscribes immediately afterwards.

The second subscription is the fun one:

this.ws$.takeUntil(this.ngUnsubscribe$)
  .filter(r => r.type === 'bot')
  .retryWhen(err$ =>
    Observable.zip(err$, Observable.range(1, 3), (e, n) => n)
      .mergeMap(retryCount => Observable.timer(1000 * retryCount))
  )
  .delayWhen(inp => Observable.interval(100 + inp.msg.length * 10))
  .subscribe(
    (msg) => this.pushMsg(msg)
  );

Here we want only the messages from the bot, hence the filter(r => r.type === 'bot') operator. The retryWhen(err$ => …) operator automatically re-connects to the WebSocket after it has been disconnected.
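For readers who are new to RxJS, the retry schedule that the zip/range/timer combination appears to produce can be sketched in plain JavaScript: three attempts, after waits of 1, 2, and 3 seconds. The functions retryDelays and retryWithBackoff are hypothetical, and this Promise-based loop is not a drop-in equivalent of retryWhen(). In particular, it throws once the retries run out, which is a choice made for this sketch.

```javascript
// Delays (in ms) before each reconnection attempt: baseMs * retryCount for
// retryCount = 1..maxRetries, mirroring range(1, 3) and timer(1000 * n).
function retryDelays(maxRetries = 3, baseMs = 1000) {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * (i + 1));
}

// Promise-based retry loop (not RxJS): try once, then try again after each
// delay, and give up when the delays are used up.
async function retryWithBackoff(connect, delays = retryDelays()) {
  for (const delay of [0, ...delays]) {
    if (delay > 0) {
      await new Promise(resolve => setTimeout(resolve, delay));
    }
    try {
      return await connect();
    } catch (err) {
      // ignore the error and move on to the next delay
    }
  }
  throw new Error('Connection failed after all retries');
}

console.log(retryDelays()); // prints [ 1000, 2000, 3000 ]
```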

The purpose of the delayWhen() operator is to achieve the “bot is typing” effect that messengers use. To do this, we delay the data for 100 + MSG_CHARACTERS_LENGTH * 10 milliseconds.

When the message gets through all the operators, we push it into our array of messages with (msg) => this.pushMsg(msg).
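The delay formula is easy to try on its own. The helper name typingDelayMs is hypothetical; the arithmetic is the one described above.

```javascript
// Delay before showing a bot message: 100 ms plus 10 ms per character.
function typingDelayMs(text) {
  return 100 + text.length * 10;
}

console.log(typingDelayMs("It's Friday.")); // prints 220
```

A 12-character answer is therefore held back for 220 milliseconds, and longer answers “take longer to type.”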

We use the component’s private pushMsg() method to add a message and to show it in the chat:

private pushMsg(msg: Object, clearUserMsg: boolean = false) {
  this.msgs.push(msg);
  this.botIsTyping = false;
  this.scrollChatToBottom();
  this.userMsg = clearUserMsg ? '' : this.userMsg;
}

If the message is from the user (the clearUserMsg flag), we clear the input box. We use this.botIsTyping to control the “bot is typing” effect, so here we set it to false.

We handle the user input with the onSubmit() method when the user hits Enter:

onSubmit() {
  const input = {
    type: 'user',
    sessionId: this.wsSessionId,
    msg: this.userMsg,
    tz: this.timezone
  };

  this.ws$.next(JSON.stringify(input));
  this.pushMsg(input, true);
  this.botIsTyping = true;
}

Along with the user’s message, we send the user’s sessionId and time zone. All of these go out in a single call, this.ws$.next(JSON.stringify(input)). To show the “bot is typing” effect, we also set this.botIsTyping to true.

The Angular component’s template, which serves as the UI of our app, consists of the following code:

<ul class="msgs" #chatMsgs>
<li class="bot">
⏰Hello there, TiBot here. Ask me something about date and time!
</li>
<li
*ngFor="let input of msgs"
[ngClass]="{'user': input.type === 'user', 'bot': input.type === 'bot'}"
>{{ input.msg }}</li>
<li class="bot" *ngIf="botIsTyping">
<span>.</span>
<span>.</span>
<span>.</span>
</li>
</ul>

<div class="chat-input">
<form (submit)="onSubmit()">
<input [(ngModel)]="userMsg" name="userMsg" class="chat-input-text" type="text" placeholder="Ask something...">
</form>
</div>

This is all we need for our app on the front end.

It’s amazing to see how elegant and clean this code turned out, thanks to RxJS. When using WebSockets, things tend to get complicated. Here we’ve opened the connection with a single line of code.

And having features like auto re-connecting — well, that’s a story on its own. But with RxJS, we handled that in a simple manner.

To conclude, I hope you understand why I answered, “It doesn’t take much,” to the question, “What does it take to build a chatbot?”

This doesn’t mean that building a chatbot is an easy task. These NLU services, as intelligent as they are, won’t solve all our problems. We still need to take care of our own business logic.

A couple of years ago, it was impossible for me to build something similar to this. But services like API.AI now make that power available to everyone.

API.AI also provides integrations with Facebook Messenger, Twitter, Viber, and Slack. But for this article, I thought it would be best to use their API to better understand how everything works.

I hope you’ve enjoyed this article and find it helpful in building your own chatbot.

Source: https://medium.freecodecamp.org/what-does-it-take-to-build-a-chatbot-lets-find-out-b4d009ea8cfd
