Are you familiar with Selenium or Appium? Then you will love Botium. We are introducing a new generation of test automation especially for chatbots. And it is open source!
About the talk
Testing is a crucial enabler for the success of chatbots and virtual assistants. Doing it manually requires an enormous amount of time and effort.
As DevOps, and increasingly AIOps, grow in importance, automated testing will remain critical to ensure that bots actually do what their designers intend. Unlike traditional software, where the application follows a predefined flow, a chatbot runs without any restrictions: talking to a bot has no barriers.
Combined with unpredictable user behavior, this makes it extremely difficult to verify the correctness of conversational AI. Training data and test sets are effectively unbounded. In fact, quantity plays a major role in quality assurance for bots, which makes manual testing impossible.
The main questions to answer are "Why are bots failing?", "What and how should you test?" and of course "How to automate?".
We will showcase the setup of a test automation pipeline for a Rasa-based chatbot that continuously checks conversation flows and NLP performance. And we will take it even further by adding full end-to-end testing, from API over web and mobile to voice.
Deep friendship with Florian Treml, even after playing in the same rock band for several years :). Starting the Botium journey together in 2018 was just the next logical step. Today Botium counts more than 65k users and can be considered the first choice for testing conversational AI.
Presented by Botium GmbH Co-founder and CEO Christoph Börner at the 2021 Rasa Summit. https://rasa.com/summit/
#ConversationalAI #NLP #aichatbot
- Learn more about Rasa: https://rasa.com
- Rasa documentation: http://rasa.com/docs
- Join the Rasa Community: https://forum.rasa.com
- Twitter: https://twitter.com/Rasa_HQ
- Facebook: https://www.facebook.com/RasaHQ
- LinkedIn: https://www.linkedin.com/company/rasa
We are a bit under time pressure, because I want to show you as much as possible today and there are just 25 minutes. So I'm going to talk about building an end-to-end test automation pipeline for conversational AI, meaning chatbots and virtual assistants. To sum up in one sentence what we are doing at Botium: we are trying to get the best out of your chatbot by raising its quality and your end-user satisfaction to the highest level.
To be honest, I only brought two slides, because I mainly want to show you hands-on experience. But one slide I had to add yesterday, because I got an email with the question: "What is the value of testing?" So I'll take two minutes on this topic. Then I have a slide about testing strategy, and then we will directly jump in and write some tests. As I put on the slide, all testing will be codeless: no scripting, no coding skills required. Everyone in your team should be able to do this. Finally, we will also set up a CI/CD pipeline. By the way, I'm assuming here that you know what continuous integration and continuous delivery are. The rest you will see. So let's get started.
First of all, the value of testing. I'm using a value chain to explain this, and I thought it's a good idea to show the bot development lifecycle. Going from left to right, there are six phases: planning, design, build, train, test, and deploy. Of course, this is more of a process, as we are improving our bots permanently: it doesn't end with deployment, we start again and again. Underneath, you see the different roles coming up through this lifecycle. On the left side there is the business, maybe the call center, where you need to do the analysis. Architects have to do an NLP selection. Somewhere in the build phase, data scientists need to take care of natural language processing and machine learning. Testers worry about testing, of course, and operations about production.
You might think now that testing only adds value in the test phase, but I'm going to surprise you: testing adds value throughout the whole bot lifecycle. For example, go back to the planning phase. We can provide a benchmark test, meaning you run your domain-specific test set against several NLP engines, for example Watson, and then compare the results. If you do this fairly, you will clearly see which option is best for you. So we can provide value in all phases of this cycle: mainly in testing and training, of course, and when we are in production, for the operations people, we can do permanent chatbot analytics and so on. This was important to me, as I got this question yesterday: the value of testing when you are creating bots is more than "trust and pray" in the testing phase.
The second slide is about how to create a holistic testing strategy for your bot. The ingredients are pretty clear: take all the different types of tests, combine them into a holistic approach, automate them, and integrate them into your CI/CD pipeline, ideally with every build of your training data or every build of the whole chatbot. I put five typical test types on the slide, starting on the left with conversation flow testing, which is about testing your dialogues. This is good for smoke testing, regression testing, and domain-specific testing. Then you are doing end-to-end testing, as I'm going to show you live in a few minutes. End-to-end means: wherever your bot operates, whether that is cross-browser, cross-platform, or cross-mobile if your bot runs on mobile devices, whether it is voice testing or testing against an IVR system, you have to cover all of it, and you can easily achieve this by using a test automation tool. Also of very big importance is, of course, the health of your NLP engine. Therefore we are offering NLP testing, with lots of statistics and metrics: what are your top intents, what are the weakest ones, and so on. And of course you also have to do performance testing, meaning load and stress testing, and security testing. GDPR is a very big thing in Europe, and you need to make sure that your bot really responds, also under high load. And finally, as mentioned before, you have to make sure you know what's going on in production.
This is our playground, where we can do all the stuff we want to do. If you remember our testing strategy, we start with smoke testing based on your conversation model. So I go to our start menu, select regression testing in our Quick Start wizard, and give the project a new name, maybe "Sara Smoke". The next thing is to choose the technology we are testing against. As you can see, we support all the big technologies here and also the smaller ones. The only thing I need now is an HTTP endpoint for Sara, so I'm entering Sara's endpoint. Then we can already check the connectivity by saying hello to Sara, and as we can see, Sara replies: "Hi there. My name is Sara." I'm copying this reply, because it will be our first test case.
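Connecting a test tool to a bot over HTTP boils down to sending a user message and reading back the bot's replies. The talk does not show Sara's actual endpoint, so the following is a minimal sketch assuming Sara is exposed through Rasa's REST channel (`POST /webhooks/rest/webhook` with a JSON body `{"sender": ..., "message": ...}`, answered by a JSON list of messages). The URL `SARA_URL` and the helper names are hypothetical; the network call is kept separate from the parsing so the parsing can be tried offline.

```python
import json
import urllib.request

# Hypothetical endpoint: Rasa's REST channel path on an assumed local host and port.
SARA_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_payload(sender_id: str, text: str) -> bytes:
    """Encode a user message in the Rasa REST channel request format."""
    return json.dumps({"sender": sender_id, "message": text}).encode("utf-8")


def extract_texts(response_body: bytes) -> list[str]:
    """Collect the text replies from a Rasa REST channel response (a JSON list)."""
    messages = json.loads(response_body)
    # Replies may also carry images or buttons; keep only the ones with text.
    return [m["text"] for m in messages if "text" in m]


def say(text: str, sender_id: str = "botium-demo", url: str = SARA_URL) -> list[str]:
    """Send one message to the bot and return its text replies (needs a running bot)."""
    request = urllib.request.Request(
        url,
        data=build_payload(sender_id, text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_texts(response.read())


if __name__ == "__main__":
    # Offline check of the parsing, using a response shaped like the REST channel output.
    sample = b'[{"recipient_id": "botium-demo", "text": "Hi there. My name is Sara."}]'
    print(extract_texts(sample))  # ['Hi there. My name is Sara.']
```

With a bot running at the assumed address, `say("hi")` would return the list of reply texts that a hello-world test compares against.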
So that was our first connection to Sara. The next step is a basic hello-world test. For this, I'm choosing the visual test designer and call the test "introduction". Everything here can be done without coding skills. I send "Hi" to Sara, and we are expecting "Hi there. My name is Sara." back. That's already it: I hit Next, and in the last step I save the test case. Then I go back to the dashboard and run the test for the first time. By the way, I expected this test to fail, but we were lucky: Sara replied with "Hi there. My name is Sara," and everything is green. So why was I expecting it to fail? Well, this is not a static response: Sara does not always reply with the same sentence, so we have to take care of the variations between responses. Therefore I will tweak the test we have created. I go to our visual test case editor, and instead of the static response, I define an utterance list that matches both variants. I checked up front: Sara will reply either "Hi, my name is Sara." or "Hi there. My name is Sara." We save the list, which means the test will always pass. We just have to go back to our conversation, replace the static reply with the utterance list, and save the whole thing. If we go back and start the test session, Sara's reply this time might already be different.
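The fix for the non-static greeting is to accept any one of several reply variants instead of a single fixed sentence. Here is a minimal Python sketch of that matching logic; it is not Botium's implementation, and the names `GREETINGS`, `normalize`, and `reply_matches` are hypothetical. Ignoring case and extra whitespace is an assumption about how strict the comparison should be.

```python
# Hypothetical utterance list: every reply Sara may legitimately give to "Hi".
GREETINGS = [
    "Hi, my name is Sara.",
    "Hi there. My name is Sara.",
]


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())


def reply_matches(reply: str, accepted: list[str]) -> bool:
    """Return True if the bot reply equals any accepted alternative after normalizing."""
    return normalize(reply) in {normalize(option) for option in accepted}


# A static expectation fails as soon as the bot varies its wording;
# checking against the utterance list accepts either variant.
print(reply_matches("Hi there. My name is Sara.", GREETINGS))  # True
print(reply_matches("hi,  my name is Sara.", GREETINGS))       # True
print(reply_matches("Sorry, I didn't get that.", GREETINGS))   # False
```

Keeping the alternatives in one list means a new greeting variant only has to be added in one place, not in every test case that expects a greeting.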
So this is how you compose your very first, very easy test, and you already have a result. So, what about continuous integration and delivery? The cool thing is: if we go to our test project and open the integration section, we can see code snippets automatically generated for us, which means we can take them and integrate them into our pipeline. You can use whatever tool you like; you can even build your own integration based on this description. For the moment, I will just use this curl command here and trigger the test from the pipeline in Jenkins. So let me go to Jenkins; this is pretty easy. By the way, this works with all orchestration tools, no matter whether you use Jenkins or any other pipeline service. So let's create a new freestyle project. I'll skip the description for now. We don't need source code management, because Botium Box always fetches the latest test versions itself. What we need is a build trigger. Usually I would say: run the smoke test whenever the training model has changed, for example after the training data was updated. But as I don't have that for Sara, I will say: build and test every day at 5:30 in the morning. For this, you can just use a cron expression: we go through minutes, hours, every day of the month, every month of the year, and as I want to run the test only from Monday till Friday, I say 1 to 5. So: every weekday at 5:30 in the morning, and that's it for the trigger. The only thing we have to add now is a build step, where we execute a shell command: paste the curl command, apply, save. Now let's say it's 5:30 in the morning. The build gets triggered by Jenkins, we can go into this build, have a look at the console, and see that the tests have been executed and finished with success, and we even have a link to the result.
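The trigger for weekdays at 5:30 in the morning corresponds to the cron expression `30 5 * * 1-5` (minute 30, hour 5, any day of the month, any month, Monday to Friday). Jenkins' "Build periodically" trigger accepts this cron-style syntax, although Jenkins also supports extra tokens such as `H` that are ignored here. To make the five fields concrete, here is a small Python matcher; it is an illustration that only handles `*`, single numbers, ranges, and comma lists, not a full cron implementation.

```python
from datetime import datetime


def field_matches(spec: str, value: int) -> bool:
    """Check one cron field: '*', a number, an 'a-b' range, or a comma-separated list."""
    for part in spec.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` fires the 5-field cron expression (minute hour dom month dow)."""
    minute, hour, dom, month, dow = expression.split()
    # Cron counts weekdays from Sunday=0; Python's weekday() starts at Monday=0.
    cron_weekday = (moment.weekday() + 1) % 7
    # Simplification: classic cron ORs day-of-month and day-of-week when both are
    # restricted; here all five fields are ANDed, which is exact for this schedule.
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(dom, moment.day)
        and field_matches(month, moment.month)
        and field_matches(dow, cron_weekday)
    )


SMOKE_TEST_SCHEDULE = "30 5 * * 1-5"  # weekdays at 05:30

print(cron_matches(SMOKE_TEST_SCHEDULE, datetime(2021, 3, 1, 5, 30)))  # Monday: True
print(cron_matches(SMOKE_TEST_SCHEDULE, datetime(2021, 3, 6, 5, 30)))  # Saturday: False
print(cron_matches(SMOKE_TEST_SCHEDULE, datetime(2021, 3, 1, 5, 31)))  # wrong minute: False
```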
So from absolutely zero, to connecting to your bot, doing the first hello-world conversation-based test, and integrating it into your whole pipeline, it took us more or less just a few minutes, and that's pretty cool. I'm coming from test automation for mobile apps and websites, and back in those days we had to create WebDriver scripts and so on. This was pretty quick to show you, and since Sara is able to talk, I decided to show you end-to-end voice testing as well. Same principle: we go to the start menu, start our Quick Start wizard, and call it maybe "Sara End-to-End". I don't have to enter credentials again, because we already connected to Sara before. So I just use the chat box to be sure and say hello to her again. Yes, she's there. And now the difference, as you can see: I can also play the answers she gave.
To make it a bit more interesting and show you other options for composing tests, I will go to our live chat recorder, because this may be the easiest way of composing your tests; it can be done by everyone in your team. It starts right after we save the test project and opens a live connection to Sara. So I can now talk to her here, say hi, and she will reply. And the cool thing is, at the end of our conversation we can just save the whole thing as a new test case. Because I promised you we do end-to-end, I'm not going to chat with her via text; I'm really going to talk to her. One thing she can answer is how to get started, so I just record the sentence by hitting the microphone here: "How to get started with Rasa?" As you can see, we have the recording here, and I can send it to Sara, and she will answer. Easy as that. Or, for example: "Can you tell me about the Enterprise Edition?" Sending this to her, she will start to tell us about Rasa Enterprise and so on. So you can really have a whole conversation with your chatbot here, record everything, and at the end save the whole thing as a new test case. And that's it.
If we go back to the dashboard for "Sara End-to-End", I can run the tests. This takes a bit longer: before, we were testing directly against the Rasa API; now we are testing end-to-end using voice. In a few seconds we have the result, and in our results you can see, and also hear, what we were sending to Sara and what Sara answered.
And the cool thing here is the same as before: going back to our test project, Botium Box creates the integration snippet out of the box for us, and we just need to integrate it into our pipeline. So I copy it, go back to Jenkins, open the Jenkins project, configure it again, and just add another build step. In this case, I want everything to run in the morning: first the smoke test, and if the smoke test passes, the end-to-end tests. Just add it, apply, save. Let's say again it's 5:30 in the morning: the build starts, we go into the build's console output, and we can see the tests are pending, meaning they are running right now. In a few seconds they are finished and we have the results. Again, I can click on the results and see the voice tests running. Easy as that. For those of you who are new to this whole DevOps topic: what this means is that you can execute these tests 100 times a day.
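Chaining the stages this way (run the smoke test first, and only run the end-to-end suite if the smoke test passes) is a simple fail-fast gate. A minimal Python sketch of that ordering logic follows; the stage names and the lambdas standing in for the actual test-trigger calls are hypothetical, since the talk triggers each stage with a generated curl snippet in a Jenkins build step.

```python
from typing import Callable

# Each stage is a (name, runner) pair; a runner returns True when its test set passes.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[tuple[str, str]]:
    """Run stages in order and skip everything after the first failing stage."""
    report = []
    failed = False
    for name, runner in stages:
        if failed:
            report.append((name, "skipped"))
            continue
        passed = runner()
        report.append((name, "passed" if passed else "failed"))
        failed = not passed
    return report


# Hypothetical stand-ins for the smoke test and the voice end-to-end test runs.
green = [("smoke", lambda: True), ("e2e-voice", lambda: True)]
red = [("smoke", lambda: False), ("e2e-voice", lambda: True)]

print(run_pipeline(green))  # [('smoke', 'passed'), ('e2e-voice', 'passed')]
print(run_pipeline(red))    # [('smoke', 'failed'), ('e2e-voice', 'skipped')]
```

Putting the cheap smoke test first means a broken bot is reported within seconds, and the slower voice run is never started against it.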
Say we have a big project with 50 developers working on the chatbot, and they are producing hundreds of commits a day. That means hundreds of times a day we run a smoke test, regression tests, and end-to-end tests. You only have to write your tests once, and the pipeline executes them for you.
I'm crazy enough to try to use these 25 minutes to also show you the other stuff, for example security testing. I could start the Quick Start again, talk to Sara, and compose the tests, but for the sake of time I will just take the smoke test we had before, "Sara Smoke". I go to the settings and turn on this switch here. It's just a small switch, but it has a deep impact: it enables security testing. Save. And believe it or not, now we have continuous security testing throughout our entire pipeline. If I go back to my dashboard and start the smoke test once again, you might not see a big difference immediately. In a few seconds we get the results again: a very simple conversation, we send "hi" to Sara, and Sara replies as expected. But if I now go to the security scan tab, you can see that a lot was happening in the background. The testing is done by putting a proxy between Botium and your conversational AI, in this case Sara. All communication that goes to Sara and everything that comes back goes through the security proxy and is tested for roughly three hundred vulnerabilities. And if this very short conversation already produces 9 errors, I think it's worth enabling continuous security testing. What we're doing here is, first of all, telling you what the problem is, for example "Incomplete or no cache-control and pragma HTTP header set". And we go even further and tell you about the possible solution, what you can do to fix the problem, and in most cases we can even show you a link with more information on what the problem is and how to fix it.
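A finding like "incomplete or no cache-control and pragma HTTP header set" comes from a passive check on the bot's HTTP responses. The sketch below shows roughly what such a check looks at; it is a simplified, hypothetical rule (the required directive set `no-cache`, `no-store`, `must-revalidate` plus `Pragma: no-cache` is an assumption modeled on common scanner guidance), not the security proxy's actual rule logic.

```python
# Directives that a response carrying chat content is assumed to need.
REQUIRED_CACHE_DIRECTIVES = {"no-cache", "no-store", "must-revalidate"}


def cache_header_findings(headers: dict[str, str]) -> list[str]:
    """Return human-readable findings for missing or incomplete caching headers."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    findings = []

    cache_control = lowered.get("cache-control")
    if cache_control is None:
        findings.append("Cache-Control header missing")
    else:
        directives = {d.strip().lower() for d in cache_control.split(",")}
        missing = REQUIRED_CACHE_DIRECTIVES - directives
        if missing:
            findings.append("Cache-Control incomplete, missing: " + ", ".join(sorted(missing)))

    if "no-cache" not in lowered.get("pragma", "").lower():
        findings.append("Pragma: no-cache missing")
    return findings


# A typical JSON reply from a bot endpoint without any caching headers.
print(cache_header_findings({"Content-Type": "application/json"}))
# ['Cache-Control header missing', 'Pragma: no-cache missing']
print(cache_header_findings({
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
}))
# []
```

A real scanner runs hundreds of such rules on every request and response passing through the proxy; this shows only one of them.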
As you can see, we are relying on OWASP here. And once again: go back to the test project, copy the integration snippet into your pipeline, and every time this test gets executed, you also have continuous security testing in place. The cool thing is that nobody in your team needs to be a security testing specialist.
I know we're moving pretty fast, but I really want to show you as much as possible. So let's do at least a quick performance test. This is a requirement that is coming up more and more from our customers: simulating thousands of parallel users to see how the bot behaves. So let's create a project, and once again we just take our Sara Rasa bot. Just to be sure, let's see if she's there. Yes, she is. Apart from composing your tests with the visual editor or the recorder, we can also select out-of-the-box test sets; I'm just using the one we created before for Sara, with this very easy conversation. At the end of the project wizard, I select stress and load testing. If we go back to the project, we have the performance tester here, and on this tab I can say I want to do load testing or stress testing, and there's even an advanced mode if I want to do a bit more. Going for load testing, you have four parameters to choose, and we are not limiting you in any way. You can say, I want to test the whole night, or I want to test for 10 minutes; what our customers typically do is test for 5 minutes with very high load, or something like that. I will do it now for 1 minute, so we don't waste too much time. Next, we have a test step multiplicator, which multiplies the conversations in the test set. Remember, we are using the smoke test we defined before, which just says hi and waits for Sara's reply. Finally, you define an acceptable error rate: for example, if the bot fails to reply to more than 6% of all conversations, we should end the performance test, otherwise it might run for a long time. By the way, down here you can always see the effect of changing your parameters, for example if I change it to 500,000 users. We are not limiting you here; the limiting factor is usually the server where you host your chatbot, or the AI on the other side. So finally, we just hit Start, and the test is running right now. Once again, easy as that, as I promised, without any scripting or coding knowledge. Within the first seconds we already see the response time measured. The count of processed conversations is already going down a bit after a few seconds, but we have no failed conversations yet, so this is good. The processing delay is also changing a bit. After one minute the test is finished and we will have the results.
One more thing to mention, as you have seen before, is load testing versus stress testing. In load testing, we fire a constant load against the chatbot to see its behavior over time, whereas in stress testing we increase the load in steps. Stress testing is very good for finding the boundary of the chatbot: you usually start with a low value, like, I don't know, a hundred, and walk up to whatever, 10,000. At some point you will see that your chatbot can't handle it anymore.
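To make the difference between the two modes concrete, here is a small Python sketch that builds a load profile (the same number of parallel conversations at every step) and a stress profile (a load that climbs in steps, here from 100 to 10,000 as in the example above), and that finds the first step whose share of failed conversations exceeds an acceptable error rate. The tenfold step size, the 6% threshold, and the simulated failure counts are illustrative assumptions, not Botium Box's parameters or defaults.

```python
def load_profile(parallel: int, steps: int) -> list[int]:
    """Load test: fire the same number of parallel conversations at every step."""
    return [parallel] * steps


def stress_profile(start: int, end: int, factor: int = 10) -> list[int]:
    """Stress test: multiply the load by `factor` each step until `end` is reached."""
    if start <= 0 or factor < 2:
        raise ValueError("start must be positive and factor at least 2")
    profile = []
    load = start
    while load < end:
        profile.append(load)
        load *= factor
    profile.append(end)
    return profile


def first_breaking_step(profile, failures_per_step, max_error_rate=0.06):
    """Return (index, load) of the first step whose error rate exceeds the limit, or None."""
    for index, (load, failed) in enumerate(zip(profile, failures_per_step)):
        if failed / load > max_error_rate:
            return index, load
    return None


print(load_profile(100, 3))         # [100, 100, 100]
print(stress_profile(100, 10_000))  # [100, 1000, 10000]

# Hypothetical measurements: the bot copes with 1,000 parallel conversations
# but fails 1,500 of 10,000 (15%) at the last step, so that is its boundary.
print(first_breaking_step([100, 1000, 10000], [0, 20, 1500]))  # (2, 10000)
```

The load profile answers "does the bot stay stable at this level over time?", while the stress profile answers "at which level does it break?".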
And we have the result of the load test. A stress test result won't look that much different; it's just that we are increasing the load permanently. And then I'm finally coming to NLP testing. This usually takes most of the time, which is why I put it at the end, but our goal is to make test automation fast and easy, and therefore NLP testing too. So we create a new test project and call it, once again, just like Sara here. A very good option is pre-selected here: for your first NLP test, use your conversation model. What happens is that Botium downloads the conversation model and generates test sets out of its training data. This is a very good initial step for NLP testing. The training data of Sara is pretty big, so this takes some time; therefore I ran it in the afternoon and can show you the outcome.
So this is a typical outcome of an NLP test run. If I give you a sneak peek, you can see the stuff is different. We have lots of statistics here showing you what is well trained, what your weakest intents are, the confidence of your intents, and the distributions. We have suggestions on how to fix problems with your training data, we have mismatch probability, alternative intents, and you get a full confusion matrix with all the intents your bot knows. In NLP testing, red is always bad. So if we hit this one, we see how many utterances here were predicted as the wrong intent, most probably due to low confidence. What we are doing is always giving you full insight: you can click on the data and see everything that is behind it, even things like n-grams. We show you utterances that belong to multiple intents, and the data scientist should be able to interpret this, for example when utterances are too close to each other. That's what we are doing with NLP performance testing. And the same thing as I have shown you before: you just take the snippet from this test project, add it to your CI/CD pipeline, and every time you want, every time you do a commit or whatever, you have these tests executed.
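The per-intent statistics and the confusion matrix of an NLP test run can be illustrated with a short Python sketch. It takes pairs of expected and predicted intents (for example, the utterances of a test set and the intents the NLU engine returned) and computes a sparse confusion matrix plus precision, recall, and F1 per intent. The intent names and sample predictions are made up; this shows the standard metrics, not Botium's exact report.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Count (expected, predicted) intent pairs; off-diagonal cells are misclassifications."""
    return Counter(pairs)


def per_intent_scores(pairs):
    """Return {intent: (precision, recall, f1)} computed from (expected, predicted) pairs."""
    counts = confusion_matrix(pairs)
    intents = sorted({intent for pair in pairs for intent in pair})
    scores = {}
    for intent in intents:
        true_pos = counts[(intent, intent)]
        predicted = sum(n for (exp, pred), n in counts.items() if pred == intent)
        expected = sum(n for (exp, pred), n in counts.items() if exp == intent)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / expected if expected else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[intent] = (precision, recall, f1)
    return scores


# Hypothetical test results: (intent in the test set, intent predicted by the NLU engine).
results = [
    ("greet", "greet"), ("greet", "greet"), ("greet", "greet"),
    ("get_started", "get_started"), ("get_started", "enterprise"),
    ("enterprise", "enterprise"), ("enterprise", "enterprise"),
]

matrix = confusion_matrix(results)
print(matrix[("get_started", "enterprise")])  # 1: one get_started utterance predicted as enterprise
for intent, (p, r, f1) in per_intent_scores(results).items():
    print(f"{intent:12} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

A low recall points to an intent whose utterances leak into other intents (here `get_started`), while a low precision points to an intent that attracts foreign utterances (here `enterprise`). Those are the cells that show up red in a confusion matrix.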