How to test chatbots in Rasa framework? | by Joanna Trojak | Feb, 2021
Chatbots are the new apps. They are the new interface users can interact with to get a particular service. Much like in the case of mobile applications, user experience and quality should be the main goal for developers.
The tools available are not as abundant as for web and mobile interfaces, and many developers consider conversation testing one of the most difficult kinds of testing. The main problem is that conversations are messy. We cannot force users to follow the happy path we have designed.
The majority of chatbot platforms do not provide all the tools needed to improve the quality of the conversation. For instance, in DialogFlow we can assess the quality of intents in terms of whether they fail or succeed in the conversation, but that's it. We can't dive deeper to get the information needed to build a better conversational interface.
The Rasa framework is different because it provides more tools for developers and testers. We can test chatbots in many ways and understand how to improve them. We can write test stories, assess the NLU model and check its performance, and assess intent or entity performance. The aim of this article is to give a general overview of those tools.
The steps we should take to test a chatbot are illustrated in the following checklist:
Validating the data is the first step we should take before hitting the train button in Rasa. The command
rasa data validate
checks for inconsistencies in the stories or the NLU data. If there are any, the model may fail to train or may perform badly, so it is worth running this command before training to spare yourself unnecessary annoyance.
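The idea of validating before training can be sketched as a small shell function. This is only an illustration: the function name `validate_then_train` is made up for this example, and it relies on `rasa data validate` returning a non-zero exit code when it finds errors.

```shell
# Hypothetical helper (the name is an assumption, not part of Rasa):
# train only when `rasa data validate` finds no errors.
validate_then_train() {
  if rasa data validate; then
    rasa train
  else
    echo "Validation failed; fix the data before training." >&2
    return 1
  fi
}

# Usage, from the project folder:
#   validate_then_train
```

If you also want warnings to block training, `rasa data validate` accepts a `--fail-on-warnings` flag.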
Test stories are part of conversation-driven development, which I have discussed before in the article entitled I’m learning all the time — conversation-driven development in a chatbot for Erasmus students with Rasa framework. Test stories are written as example conversations to check whether the bot behaves as expected.
You can write them in the test stories file in the project folder or, if you have Rasa X deployed on an external server, you can generate stories while interacting with the chatbot.
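As a rough illustration, a test stories file could be created like this. The path `tests/test_stories.yml`, the `greet` intent, the `utter_greet` action and the `version: "2.0"` line are all assumptions made for this sketch; use the names and the format version of your own project.

```shell
# Write a minimal test story file.
# Path, intent and action names are hypothetical examples.
mkdir -p tests
cat > tests/test_stories.yml <<'EOF'
version: "2.0"
stories:
- story: greet happy path (hypothetical example)
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
EOF
```

Each story pairs a literal user message with the intent you expect the NLU model to predict, followed by the action you expect the bot to take.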
Once you have a good set of test cases, you can run them against your trained model.
You should make a habit of adding new stories as your chatbot grows and learns. It is not something you can do once and then forget about.
I’m planning to write an article on how to write good test cases.
Your NLU file contains all the training examples your chatbot is trained on. In real life, of course, the bot will encounter examples that are not in the training set. That's why you should split off part of your data for testing, to simulate that kind of situation.
rasa data split nlu
Once your data is split, you can check the prediction rate of your model.
rasa test nlu --nlu train_test_split/test_data.yml
Once testing has completed, you can find the results in the results folder of the project. If you want to test the NLU model more thoroughly, you can use cross-validation.
rasa test nlu --nlu data/nlu.yml --cross-validation
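The two NLU checks above can be wrapped into helper functions, as in the sketch below. The function names are made up for this example. The sketch also assumes that you retrain the NLU model on the training part of the split before testing (so the test examples are really unseen) and that the split writes `training_data.yml` next to `test_data.yml`.

```shell
# Hypothetical helpers; the function names are assumptions.

# Hold-out evaluation: split, train on the training part, test on the rest.
nlu_holdout_eval() {
  rasa data split nlu &&
  rasa train nlu --nlu train_test_split/training_data.yml &&
  rasa test nlu --nlu train_test_split/test_data.yml
}

# Cross-validation on the full NLU data; the fold count is the first argument.
nlu_cross_validate() {
  rasa test nlu --nlu data/nlu.yml --cross-validation --folds "${1:-5}"
}

# Usage:
#   nlu_holdout_eval
#   nlu_cross_validate 5
```

Cross-validation trains and tests several times on different slices of the data, so it takes longer than a single hold-out run but depends less on one particular split.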
You can check your dialogue model on a group of test stories.
rasa test core --stories test_stories.yml --out results
This command generates a report on the failed stories and a confusion matrix, regardless of whether any stories failed.
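If you run the story tests in an automated pipeline, you may want the command itself to fail when a story does. A minimal sketch, assuming the `--fail-on-prediction-errors` flag of `rasa test` and a made-up function name `check_stories`:

```shell
# Hypothetical helper; the function name is an assumption.
# Exits with an error if any test story is predicted incorrectly.
check_stories() {
  rasa test core --stories "${1:-test_stories.yml}" --out results \
    --fail-on-prediction-errors
}

# Usage:
#   check_stories                  # uses test_stories.yml
#   check_stories my_stories.yml   # hypothetical alternative file
```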
We can test the policy configuration in a similar manner. Policies are the components that decide which action the bot takes at each step of a conversation, and you select them in the config file. Assessing their performance will help you decide which configuration is best for your chatbot.
rasa train core -c config_1.yml config_2.yml --out comparison_models --runs 3 --percentages 0 5 25 50 70 95
You can create two or more config files to check which set of policies performs best.
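Training the comparison models is only half of the job: the trained models still need to be evaluated on a set of stories. A sketch of both steps together, where the function name `compare_policies` and the output folder `comparison_results` are assumptions for this example:

```shell
# Hypothetical helper; the function name and output folder are assumptions.
compare_policies() {
  # Train each configuration several times on growing shares of the data.
  rasa train core -c config_1.yml config_2.yml \
    --out comparison_models --runs 3 --percentages 0 5 25 50 70 95 &&
  # Evaluate every model in the folder on the test stories.
  rasa test core -m comparison_models --stories test_stories.yml \
    --out comparison_results --evaluate-model-directory
}

# Usage:
#   compare_policies
```

The `--evaluate-model-directory` flag tells `rasa test core` to treat `comparison_models` as a folder of models to compare rather than a single model.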
Rasa provides you with the necessary tools to assess the quality of the conversation. Thanks to that you can give the users a better experience while using your chatbot.