By Neha M
Greetings! When we go to a physical store and talk to the attendants, we tend to use our regional language rather than an official one, even when we know the official language, simply because it is easier to express ourselves. After looking at some chatbot conversations, I have come to realize that the same holds when users chat with a chatbot. However, most chatbots do not support multiple languages and default to English. In this blog, I’ll walk you through the steps to train your own conversational AI chatbot in any language you want.
No matter which language you are working with, you will carry out certain tasks anyway: tokenization, parsing, entity recognition, etc. Many modules offer these tasks for many languages. You may want to refer to my blog ‘NLP with Python’, where I have listed several modules (some of them Python-only) and their support for other languages. There are also tools created specifically for a particular language, like JapaneseTokenizer, Chinese-tokenizer, etc. However, for many languages there may not be any pre-trained models available. In that case, tools like spaCy let you train NLP models for your own language.
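To make the tokenization step concrete, here is a minimal sketch of a tokenizer that relies on Unicode character categories instead of English-specific rules. It uses only the Python standard library, and the function name tokenize is an illustrative assumption, not part of any of the tools mentioned above. It only works for languages that separate words with spaces or punctuation, so languages like Chinese or Japanese still need a dedicated tokenizer.

```python
import unicodedata


def tokenize(text):
    """Split text into word tokens using Unicode categories, not ASCII rules."""
    tokens, current = [], []
    for ch in text:
        # Letters (L*), combining marks (M*) and numbers (N*) are part of a word.
        # Keeping marks matters for scripts like Devanagari, where vowel signs
        # are separate code points attached to the preceding letter.
        if unicodedata.category(ch)[0] in ("L", "M", "N"):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("¿Dónde está mi pedido?"))
print(tokenize("मुझे एक पिज़्ज़ा चाहिए!"))
```

A library tokenizer will handle abbreviations, URLs and clitics far better; the sketch is only meant to show that a first pass over a new language does not need any pre-trained model.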
When you are training your own deep learning NLP models, choosing the best embedding model for your task is important. It is OK if the language you choose to build your bot in does not have any pre-trained language models available. Most language-model training algorithms are unsupervised: they need only raw text, not labels, so you can train a model in any language or set of languages you have enough text for. Once you have the language model ready for your language, you can fine-tune it to perform various tasks like text classification, next sentence prediction, answer generation, etc.
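To illustrate what "unsupervised" means here, the sketch below trains a tiny bigram language model from raw sentences with no labels at all. This is a deliberately simple, count-based stand-in and not one of the state-of-the-art algorithms; the function names and the three Spanish sentences are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count which token follows which, using raw unlabeled sentences only."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of each sentence.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts


def next_token_probability(model, prev, nxt):
    """Estimate P(nxt | prev) from the counts; 0.0 if prev was never seen."""
    total = sum(model[prev].values())
    return model[prev][nxt] / total if total else 0.0


# Hypothetical raw text in the target language (Spanish here).
corpus = [
    "quiero pedir una pizza",
    "quiero cancelar mi pedido",
    "quiero pedir un taxi",
]
model = train_bigram_model(corpus)
print(next_token_probability(model, "quiero", "pedir"))
```

Neural language models replace the counts with learned parameters, but the training signal is the same idea: predict text from surrounding text, which is why any language with enough raw text can be covered.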
OK, now that you have the prerequisites ready, let’s see how you can build your chatbot. In my blog ‘How are chatbots trained’, I have listed the steps to train a chatbot. Do refer to that blog for more details about the steps discussed here.
Of these steps, keeping track of the current context and selecting actions are language agnostic. The other three tasks, intent classification, entity recognition and query expansion, are the ones we are interested in.
Intent classification requires a multi-class and/or multi-label text classification model that can classify the user’s text into one or more intents (possibly depending on context). Our language model can either be fine-tuned for this task directly, or be used to produce embeddings that serve as features for a classifier built with other machine learning algorithms or frameworks.
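The second option, embeddings as features for a separate classifier, can be sketched as follows. Everything here is an assumption made for illustration: embed is a stand-in that returns character trigram counts where a real system would call its language model, the classifier is a simple nearest-centroid rule, and the intents and Spanish training phrases are invented.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Stand-in for a language-model embedding: character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(examples):
    """Sum the vectors of each intent's examples into one centroid per intent."""
    centroids = {}
    for text, intent in examples:
        centroids.setdefault(intent, Counter()).update(embed(text))
    return centroids


def classify(centroids, text):
    """Return the intent whose centroid is most similar to the query vector."""
    vec = embed(text)
    return max(centroids, key=lambda intent: cosine(centroids[intent], vec))


# Hypothetical labeled examples for two intents.
examples = [
    ("quiero pedir una pizza", "order"),
    ("me gustaría ordenar comida", "order"),
    ("dónde está mi pedido", "track"),
    ("cuándo llega mi pedido", "track"),
]
centroids = train(examples)
print(classify(centroids, "quiero pedir una hamburguesa"))
```

Swapping embed for real language-model vectors and the centroid rule for logistic regression or an SVM keeps the same shape: text in, vector out, classifier on top.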
An entity recognition system can be rule-based or built using deep learning techniques. To build a rule-based entity system, you need POS tags or dependency parser tags on which to write your entity rules, or you can simply apply regex rules to the raw queries. However, people are moving towards deep learning techniques to build NER systems. For all of these techniques, you can use tools like spaCy and Stanford CoreNLP, which we discussed in the earlier section.
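The simplest of these options, regex rules applied directly to the query, might look like the sketch below. The entity labels and formats (a two-letter, six-digit order ID and a dd/mm/yyyy date) are hypothetical and would be replaced with whatever your domain actually uses.

```python
import re

# Hypothetical entity formats for an order-tracking bot.
ENTITY_PATTERNS = {
    "order_id": re.compile(r"\b[A-Z]{2}\d{6}\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def extract_entities(query):
    """Return (label, text, start_offset) tuples in order of appearance."""
    entities = []
    for label, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(query):
            entities.append((label, match.group(), match.start()))
    return sorted(entities, key=lambda entity: entity[2])


print(extract_entities("Mi pedido AB123456 del 05/03/2024 no ha llegado"))
```

Rules like these are attractive for a new language because identifiers, dates and amounts often look the same regardless of the surrounding language. Entities expressed in free text, such as names of people or places, are where a trained NER model earns its keep.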
As discussed in our other blog, we need a Knowledge Graph that we can use for query expansion when retrieving results. This knowledge graph contains entities linked to one another by relations. How do we build it? Run your entity extraction or information extraction techniques over your domain data to build a simple Knowledge Graph. When a user queries the bot, match the entities identified in the query to nodes in your knowledge graph using vector similarity techniques, which is where your embeddings come in again. Recent Knowledge Graph frameworks have also introduced Knowledge Graph Embeddings, which could make this task much easier.
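Here is a minimal sketch of that matching step, under clearly stated assumptions: the triples are hand-written rather than extracted from domain data, embed is a character trigram stand-in for real embeddings, and the 0.3 similarity threshold is an arbitrary illustrative value. The names link_entity and expand_query are made up for this example.

```python
import math
from collections import Counter

# Hypothetical (subject, relation, object) triples for a food-ordering domain.
TRIPLES = [
    ("pizza margarita", "contiene", "queso mozzarella"),
    ("pizza margarita", "es_un", "plato vegetariano"),
    ("hamburguesa clásica", "contiene", "carne de res"),
]


def embed(text, n=3):
    """Stand-in for a real embedding: character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link_entity(mention, threshold=0.3):
    """Map a mention from the query to the most similar graph node, if any."""
    nodes = {s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES}
    vec = embed(mention)
    best = max(nodes, key=lambda node: cosine(vec, embed(node)))
    return best if cosine(vec, embed(best)) >= threshold else None


def expand_query(mention):
    """Return the neighbours of the linked node as extra query terms."""
    node = link_entity(mention)
    if node is None:
        return []
    outgoing = [o for s, _, o in TRIPLES if s == node]
    incoming = [s for s, _, o in TRIPLES if o == node]
    return outgoing + incoming


# The misspelled mention still links to the right node.
print(link_entity("piza margarita"))
print(expand_query("piza margarita"))
```

Because the match is done on vectors rather than exact strings, spelling variations in the user's query can still land on the right node, and the same code works for any language the embedding covers.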
After going through the above steps, it’s clear that we can create a conversational AI chatbot in any language we want. The key is to know which tools you have at hand that can handle your language or help you create NLP models for it. Have a language model for your language, and you are good to go!