The last few years have seen an explosion in companies incorporating artificial intelligence (AI) and machine learning (ML) solutions into their business pipelines. Huge advances in applying this technology to customer experience are giving CX teams an edge. Natural language processing (NLP) has made it possible to automate tasks we wouldn’t have attempted just five years ago.
However, sometimes the utility of these methods can be lost in translation.
At Wootric, we have built a modern feedback management platform that helps companies measure and boost customer happiness. We use NLP to, among other things, surface themes in customer and employee feedback. We do this by using our text analytics model to “tag” each piece of feedback with the content themes it contains. So any one comment might have no tags, one tag, or multiple tags applied to it.
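For the technically curious, here is a minimal sketch of how multi-label tagging like this can be built, using scikit-learn’s one-vs-rest approach. The themes, training comments, and model choice are illustrative assumptions for this post, not Wootric’s production system.

```python
# A minimal sketch of multi-label theme tagging (illustrative only,
# NOT Wootric's actual model). Each theme gets its own binary
# classifier, so a comment can receive zero, one, or several tags.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training comments and their theme tags.
comments = [
    "Setup took forever and support never called back",
    "Great value for the price",
    "Love the product but shipping was slow",
    "Too expensive for what you get",
]
tags = [
    {"Onboarding", "Customer Support"},
    {"Price"},
    {"Product Quality", "Shipping"},
    {"Price"},
]

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(tags)  # one 0/1 column per theme

model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression()),
)
model.fit(comments, y)

# Tag a new comment: every theme whose classifier fires gets applied.
pred = model.predict(["Shipping cost too much"])
print(binarizer.inverse_transform(pred))
```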
Huge ROI for CX professionals
Machine learning has opened up a whole new avenue of insight into the feedback many companies are already collecting. To date, this data has been untapped because it is virtually impossible for humans to categorize hundreds of feedback comments quickly or cost-effectively.
It is common to collect unstructured feedback from thousands of customers via Net Promoter Score, customer satisfaction, or employee engagement surveys. Yet CX teams aren’t typically utilizing that data to understand the “why” behind the score. “It is easy for teams to get overly focused on the score, but the numbers alone don’t tell you enough to take action,” says David Yin, V.P. of Customer Insights at Ancestry and a Wootric customer. “But extracting insights from text data, especially with the high volume generated by modern survey technologies, has plagued the industry.”
Today many companies still dedicate resources to poring over feedback, trying to sort, tag, and quantify themes in Excel spreadsheets. However, once companies start getting hundreds or thousands of customer comments per month, that approach quickly becomes expensive and untenable.
So, text classification using NLP is a huge benefit to companies that are keen on making customer experience their competitive edge. Companies that get thousands of customer or employee survey feedback comments a month use software like Wootric CXInsight™ to understand what is driving customer/employee happiness (or dissatisfaction). They can then prioritize resources and projects that will really move the needle.
What’s up with accuracy?
Classification using machine learning isn’t perfect (neither is manual categorization, it turns out). The Wootric CXInsight dashboard is completely transparent, giving CX professionals the ability to see and modify tags at the comment level. Occasionally, our customers will see something like this:
This comment “Cost too much time on our end to set up” is about difficulty with onboarding (getting started). Any English-speaking human can see it isn’t about “Price.” Machine learning got this wrong!
When our customers take the time to reclassify a comment like this, it helps our algorithm learn and become more accurate over time.
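In broad strokes, the loop works like this (a simplified sketch of the general human-in-the-loop pattern, not necessarily our exact pipeline): corrected examples are folded back into the training data, and the model is refit on a schedule.

```python
# Sketch of a human-in-the-loop correction cycle (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Initial (hypothetical) training data: comment -> theme tag.
train_comments = ["Too expensive for what you get", "Setup was painless"]
train_tags = ["Price", "Onboarding"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_comments, train_tags)

# A customer corrects a tag the model got wrong: the corrected example
# joins the training set, and the next scheduled refit learns from it.
train_comments.append("Cost too much time on our end to set up")
train_tags.append("Onboarding")   # the model had said "Price"
model.fit(train_comments, train_tags)
```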
However, for those unfamiliar with the fundamentals of machine learning, seeing a misclassification can create doubt about the validity of the model and play into misconceptions about where its value lies.
Should machine learning models be 100% accurate?
In fact, a big misconception I’ve noticed is that machine learning classification should be close to 100% accurate, otherwise it isn’t useful. As a result, I have found myself frequently answering the question “If the AI got this piece of feedback wrong, why should I trust your product?”
I started to think a more structured answer would benefit both myself as a data scientist, and those asking the question. I wrote a longer version of this article on our engineering blog that shows the results of my time spent diving down the rabbit hole for a formal answer to this problem. So, if you want to dig into the math, head there.
In this post, I’ll share some key context about machine learning that I hope will give any layperson the confidence to see the occasional misclassification as a natural byproduct of the overall benefits of text analytics.
Understanding trade-offs in machine learning accuracy
First, some key information about how data scientists think about the accuracy of machine learning algorithms. There are two metrics at play here:
- Precision. This is about false positives, or, in the case of text classification: how many of the comments that got tagged were tagged inaccurately?
- Recall. This is about false negatives, or, in the case of text classification: how many comments that should have been tagged were missed?
There is an inherent trade-off between these two values. Using customer feedback classification as an example, tuning for higher precision minimizes inaccurate tags, but it results in far fewer comments being classified at all. This means missing out on the opportunity to classify a lot of comments pretty accurately.
On the other hand, a model that prioritizes recall would tag a much higher percentage of the comments, but more of them would be classified inaccurately, like the example above. The short sketch below makes this trade-off concrete.
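Here is a tiny, self-contained illustration with made-up confidence scores. A strict decision threshold tags fewer comments but more of them are right; a lenient threshold tags more comments at the cost of more mistakes.

```python
# Toy illustration of the precision/recall trade-off (made-up numbers).
# Each entry: the model's confidence that a comment is about "Price",
# and whether the comment truly is about price.
scores = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
          (0.60, False), (0.55, True), (0.40, True), (0.30, False)]

def precision_recall(threshold):
    tagged = [truth for score, truth in scores if score >= threshold]
    true_positives = sum(tagged)
    false_negatives = sum(truth for score, truth in scores
                          if score < threshold and truth)
    precision = true_positives / len(tagged) if tagged else 1.0
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

for t in (0.85, 0.50):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.85: precision=1.00, recall=0.40  (few tags, all correct)
# threshold=0.50: precision=0.67, recall=0.80  (many tags, some wrong)
```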
So, when it comes to using a machine learning product, you are looking for an application that is skilled at striking a balance between precision and recall. At Wootric, our goal is to provide meaningful insight that helps our customers achieve their business goals.
Here are two examples of high-value uses of machine learning that deliver time savings and insight, respectively, and where allowing for some amount of misclassification makes sense:
How text analytics empower CX professionals
1. Finding comments about a particular topic
Let’s say you are interested in what new features your software users are requesting. Having a text classification system in place will save you time, and keep you from the distraction of looking at tons of unrelated feedback.
For a concrete example, let’s say you want to read at least 100 feature requests from your software users. Now, assume that it takes, on average, 5 seconds to read a user comment. Looking at real-life data from one of our clients, we know that feature requests show up in about 6.8% of their overall feedback. So, without text analytics, our customer would have to read roughly 1,470 comments, spending a little over 2 hours, to find 100 feature-request-related comments!
Now, consider what happens when you use a classification model with a precision of 70% to speed this process up. If our customer only looks at feedback that the model tagged as “Feature Request”, she will see that around 70% of that feedback will actually be feature requests. This means that she would end up reading about 143 comments — and spending only 12 minutes — to find and read 100 comments that are in fact about feature requests.
In summary, using a classification system that still misclassifies 30% of tagged feedback saves her about 1 hour and 50 minutes of slogging through irrelevant data. Our customer can put that time savings toward further analysis and taking action. The short calculation below reproduces these numbers.
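For anyone who wants to check the arithmetic, this back-of-the-envelope calculation walks through the figures above.

```python
# Reproducing the back-of-the-envelope numbers from the example above.
seconds_per_comment = 5
target = 100        # feature-request comments we want to read
base_rate = 0.068   # share of all feedback that is a feature request
precision = 0.70    # share of "Feature Request"-tagged comments that truly are

# Without text analytics: wade through everything.
comments_unassisted = target / base_rate                            # ~1,471
minutes_unassisted = comments_unassisted * seconds_per_comment / 60 # ~123 min

# With the model: read only comments tagged "Feature Request".
comments_assisted = target / precision                              # ~143
minutes_assisted = comments_assisted * seconds_per_comment / 60     # ~12 min

print(f"unassisted: {comments_unassisted:.0f} comments, "
      f"{minutes_unassisted:.0f} min")
print(f"assisted:   {comments_assisted:.0f} comments, "
      f"{minutes_assisted:.0f} min")
print(f"time saved: {minutes_unassisted - minutes_assisted:.0f} min")  # ~110
```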
2. Identifying themes in customer feedback and their relative importance
Here is an example Wootric dashboard for a company that has given us a couple hundred pieces of feedback to get started with text analytics. In the dashboard image below you can see that the topic “Shipping – Packaging” is the #1 topic that customers are commenting about. Product Quality, Price, and Customer Suggestions are among the other themes surfaced by machine learning classification.
Now we know from our discussion of the trade-off between precision and recall that machine learning classification isn’t 100% accurate. Given the relatively large volume of comments on the topic, though, we can be quite certain that Shipping-Packaging is in fact the #1 thing these customers are talking about.
Because the number of comments classified under the other topics is so small (out of 111 comments, only 15 for Product Quality and 13 for Price, for example), we have lower confidence in the exact ranking of the rest of the themes shown here.* Nonetheless, out of the gate, the company knows what its customers are talking about and has a rough sense of relative importance. Boom.
And, as the volume of feedback we process for this customer increases, our confidence in the ranking of these themes will improve dramatically.
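For readers curious how such confidence can be quantified, here is one simple approach: compare the plausible range of each theme’s true rate using Beta posteriors and Monte Carlo sampling. This is a sketch of the general idea, not necessarily the calculation behind the exact figures quoted in the footnote below.

```python
# One way to estimate confidence that theme A truly outranks theme B,
# given small counts: compare Beta posteriors by Monte Carlo.
import random

def prob_a_beats_b(count_a, count_b, total, draws=100_000):
    wins = 0
    for _ in range(draws):
        # Uniform Beta(1, 1) prior on each theme's true rate.
        rate_a = random.betavariate(count_a + 1, total - count_a + 1)
        rate_b = random.betavariate(count_b + 1, total - count_b + 1)
        wins += rate_a > rate_b
    return wins / draws

# 15 Product Quality vs. 13 Price comments out of 111 total.
print(prob_a_beats_b(15, 13, 111))
# ~0.65 with this method: the ranking is genuinely uncertain at
# this volume, and firms up as more feedback arrives.
```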
Improving CX with text analytics
These are just two “low-hanging fruit” examples of the benefits of classifying customer comments using machine learning. Natural language processing is not yet 100% accurate, but it is worlds better than letting precious customer comments sit idle. With the help of machine learning, CX teams are finally “reading” all of the comments their customers and employees took the time to convey. As a result, they are gaining new insight and getting smarter about how to improve customer experience.
This is why so many customer experience professionals are adopting the use of text and sentiment analytics. Hopefully, I’ve inspired confidence in text classification even in the face of a few errors!
*In this case we are 99.998% confident that more comments concern “Shipping-Packaging” than “Product Quality.” With only 111 pieces of total feedback, we are 70.35% confident that more comments concern “Product Quality” than “Price.”
Learn more about how to improve CX with text analytics.