Humans started communicating with each other in words about a hundred thousand years ago. Now we’re communicating with machines almost as effectively as we do with other people—and getting better at it every day.
Smart assistants such as Alexa, Google Home, and Siri, along with other smart home devices, can respond to your commands much as a person would. All you have to do is ask a question, and they answer within moments. The main technology behind this whole interaction is natural language processing, or NLP.
NLP is a branch of artificial intelligence that combines rule-based text processing with mathematical and statistical modeling, enabling computers to understand human language. It’s what allows computers to interpret a text query efficiently and respond with a relevant result. Although machine-to-human interaction is one use of NLP, it also makes machine-to-machine interaction easier.
Understanding natural language processing
Natural language processing is a complex process with several key steps.
Reading text data
Processing can happen only after the data has been read, and regardless of the use case, NLP typically requires a large amount of data. Tech giants like Google, Amazon, and Microsoft collect data through their own services, but most other organizations gather it through customer surveys, by collecting data from websites like Reddit and Twitter, or by using publicly available datasets from sources like Kaggle and Google Dataset Search.
This data is typically a mix of text and numerical values, stored in databases, CSV files, or JSON files. Depending on the programming language, several libraries can help you read the data easily. For example, Python provides libraries like pandas and Dask for reading CSV files, and R offers the same functionality through its utils package (for example, the read.csv function).
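As a minimal sketch of this step, the snippet below reads a small CSV sample and a JSON sample with Python’s standard csv and json modules, so it runs without extra dependencies. The sample rows and the review/rating field names are made up for illustration; with pandas, pandas.read_csv and pandas.read_json would do the same job and return DataFrames instead of lists of dicts.

```python
import csv
import io
import json

# Hypothetical sample data. In practice these would be files on disk,
# e.g. open("reviews.csv", newline="") instead of io.StringIO.
CSV_TEXT = """review,rating
Great phone with a long battery life,5
The screen cracked after a week,1
"""

JSON_TEXT = '[{"review": "Fast delivery and works as described", "rating": 4}]'


def read_csv_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    # The csv module returns every value as a string, so convert the numeric column.
    return [{"review": row["review"], "rating": int(row["rating"])} for row in reader]


def read_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


records = read_csv_records(CSV_TEXT) + read_json_records(JSON_TEXT)
for record in records:
    print(record["rating"], record["review"])
```

Whatever the source format, normalizing everything into one list of records early keeps the later preprocessing steps independent of where the data came from.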
Preprocessing text data
In this step, the text that was read in the previous step is processed. Human-language data is rich and often verbose. While every word may make sense to a human reader, a computer works better with a simplified version that keeps only the words most important to the meaning of the sentence. Text processing can be a complex, multistep process, and exactly which steps are included varies by project. It often begins with removing stop words, which are common English words like as, will, do, and the. Regular expressions may be used to clean and manipulate strings, and the text is eventually converted to numerical vectors, because machine learning models operate on numbers rather than raw text. Text processing is commonly done in Python, using libraries such as NLTK.
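A minimal sketch of the preprocessing steps described above, using only the standard library: a regular expression splits text into lowercase tokens, stop words are dropped, and each document becomes a bag-of-words count vector. The short stop word list is a hypothetical stand-in; NLTK provides a much fuller English list through nltk.corpus.stopwords.words("english") once the data has been fetched with nltk.download("stopwords").

```python
import re
from collections import Counter

# A tiny, hypothetical stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "as", "do", "for", "in", "is", "of", "the", "to", "will", "with"}


def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Turn each document into a count vector over a shared, sorted vocabulary."""
    token_lists = [remove_stop_words(tokenize(d)) for d in docs]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["The phone will charge in an hour.", "Charge the phone, and the battery is full!"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['battery', 'charge', 'full', 'hour', 'phone']
print(vectors)     # [[0, 1, 0, 1, 1], [1, 1, 1, 0, 1]]
```

Bag-of-words throws away word order, which is one reason modern systems move on to learned vector representations, but it shows clearly how text ends up as numbers.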
Modeling text data
Once the data has been read and preprocessed, statistical and mathematical models are used to make sense of it. Increasingly, this is done with deep learning, a branch of machine learning that uses multilayer neural networks and currently achieves state-of-the-art results on many NLP tasks. Search.io uses a technology called Neuralsearch, which is discussed in more detail later.
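Deep learning models are too large to sketch here, but a classic statistical model shows the same basic idea of scoring text numerically. The example below uses TF-IDF weighting with cosine similarity, a swapped-in illustrative technique, not Neuralsearch. The documents and query are made up, and the smoothed IDF formula is one common variant among several.

```python
import math
from collections import Counter


def tf_idf_vectors(token_lists):
    """Weight each term by term frequency times a smoothed inverse document frequency."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * idf[term] for term, count in counts.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, already preprocessed documents.
docs = [
    ["unlocked", "pixel", "5", "phone"],
    ["pixel", "5", "phone", "case"],
    ["wireless", "charger"],
]
doc_vectors, idf = tf_idf_vectors(docs)

# Weight the query with the same IDF table; unknown terms are ignored.
query = ["unlocked", "pixel", "phone"]
query_counts = Counter(t for t in query if t in idf)
query_vector = {t: (c / len(query)) * idf[t] for t, c in query_counts.items()}

scores = [cosine(query_vector, d) for d in doc_vectors]
best = max(range(len(docs)), key=lambda i: scores[i])
print(scores, best)
```

Because “unlocked” appears in only one document, it receives a higher IDF weight than “pixel” or “phone”, so the first document outranks the second even though both share three query terms’ worth of vocabulary.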
Natural language processing use cases
The rest of this article will take a closer look at natural language search, including what it is, how it’s used, and how it uses NLP as its core component.
Natural language search
Today, almost any information you might want is easy to find through a search engine. Search engines provide an ever-growing number of features, like voice search and automatic query completion. If you’ve ever wondered how a machine originally built for calculation can now answer your questions, a big part of the answer is natural language processing. The technique that lets machines interpret everyday searches and respond as if you were talking to a person is known as natural language search.
When search engines were first built, they were driven primarily by text matching. For example, if you searched “when did Spain win their first FIFA”, the stop words would be removed, leaving only keywords like Spain, win, first, and FIFA. These remaining words would then be matched against entries in database tables, and if a match was found, the result would be returned to the user. However, this method can’t produce useful results until the computer has received the entire search query. As more data about how people search has accumulated, companies have found new ways to improve their search results. The result is NLP-based search engines that can sometimes “guess” your query before you’ve even finished typing it.
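The early keyword-matching approach can be sketched in a few lines: strip stop words from the query, then return only the entries that contain every remaining keyword. The stop word list and the in-memory list of entries are hypothetical stand-ins for a real database table.

```python
import re

# Hypothetical stop words and a tiny in-memory stand-in for a database table.
STOP_WORDS = {"when", "did", "their", "the", "a", "of", "in"}
ENTRIES = [
    "Spain win their first FIFA World Cup in 2010",
    "Spain win the UEFA European Championship in 1964",
    "Brazil win their first FIFA World Cup in 1958",
]


def keywords(text):
    """Lowercase, split into words, and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def keyword_search(query, entries):
    """Return entries containing every keyword in the query (exact matching only)."""
    query_terms = keywords(query)
    return [e for e in entries if query_terms <= keywords(e)]


print(keyword_search("when did Spain win their first FIFA", ENTRIES))
# A half-typed query finds nothing, because "fir" never matches "first" exactly.
print(keyword_search("when did Spain win their fir", ENTRIES))
```

The second call illustrates the limitation described above: exact matching has nothing to offer until the full words of the query have been typed.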
Use cases of NLP in search
Now that you understand what natural language processing and natural language search are, let’s investigate some of the other use cases of NLP in searches.
You may have wondered how searching for a product, like a mobile phone with a specific configuration, produces the options you see on e-commerce websites. Much of the answer is NLP. When you type the product name and specifications, NLP helps interpret the query so the site can return the most relevant results.
For example, if you search “Pixel 5 unlocked,” a traditional keyword search will return results that mention the Pixel 5, but they may not be limited to unlocked devices, or even to phones. With natural language search (NLS), the intent behind your search is understood, so the results presented to you are unlocked Pixel 5 phones.
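One simple way to approximate this behavior is to pull structured attributes out of the query and use them as filters, rather than treating every word as a loose keyword. The catalog, the Product fields, and the parsing rules below are all hypothetical; production systems typically learn these mappings from data instead of hard-coding them.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    model: str
    category: str
    unlocked: bool


# Hypothetical catalog for illustration.
CATALOG = [
    Product("Pixel 5 128GB Unlocked", "pixel 5", "phone", True),
    Product("Pixel 5 128GB Carrier Locked", "pixel 5", "phone", False),
    Product("Pixel 5 Clear Case", "pixel 5", "accessory", False),
]

KNOWN_MODELS = ["pixel 5"]


def parse_query(query):
    """Map a free-text query to structured filters (a simple rule-based sketch)."""
    q = query.lower()
    filters = {}
    for model in KNOWN_MODELS:
        if model in q:
            filters["model"] = model
    if "unlocked" in q:
        # "Unlocked" only makes sense for phones, so infer the category as well.
        filters["unlocked"] = True
        filters["category"] = "phone"
    return filters


def keyword_search(query, catalog):
    """Baseline: return any product whose title shares at least one word with the query."""
    words = query.lower().split()
    return [p for p in catalog if any(w in p.title.lower().split() for w in words)]


def structured_search(query, catalog):
    """Return only products whose attributes satisfy every inferred filter."""
    filters = parse_query(query)
    return [p for p in catalog if all(getattr(p, k) == v for k, v in filters.items())]


print([p.title for p in keyword_search("Pixel 5 unlocked", CATALOG)])
print([p.title for p in structured_search("Pixel 5 unlocked", CATALOG)])
```

The keyword baseline returns all three listings, including the locked phone and the case, while the structured version keeps only the unlocked phone.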
Understanding search intent
Understanding the intent of a search is a large part of what makes computers and smart devices “smart.” Search intent describes what the user expects to get when they run the search. For example, if you search for “best phone 2022”, a search engine can infer that you’re probably looking for phone reviews and comparisons, not for a site where you can buy a phone. This is where NLP shines.
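Real search engines learn intent from large amounts of query and click data, but a toy rule-based classifier makes the idea concrete. The intent labels and cue words below are hypothetical, chosen only to show how a query can be mapped to an intent before any results are fetched.

```python
import re

# Hypothetical cue words for each intent; real systems learn these from data.
INTENT_CUES = {
    "research": {"best", "review", "reviews", "vs", "compare", "comparison"},
    "purchase": {"buy", "price", "deal", "deals", "cheap", "order"},
    "navigation": {"login", "website", "homepage"},
}


def classify_intent(query, default="informational"):
    """Return the intent whose cue words overlap most with the query."""
    tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    scores = {intent: len(tokens & cues) for intent, cues in INTENT_CUES.items()}
    best_intent = max(scores, key=scores.get)
    # Fall back to a default when no cue word appears at all.
    return best_intent if scores[best_intent] > 0 else default


for q in ["best phone 2022", "buy pixel 5 cheap", "how does a phone battery work"]:
    print(q, "->", classify_intent(q))
```

Once an intent label exists, the search system can choose what kind of results to show, for example review pages for a research query and product listings for a purchase query.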
You can read more about how NLP is used to understand search intent here.
Voice search
You’ve probably encountered devices that respond to voice commands. If you ask them to schedule a meeting, they do it for you; if you ask them to play music, they start a playlist; if you ask them whether it’s raining, they tell you if you need to grab your umbrella. At the core of all these devices are NLP-based search techniques. Similarly, many applications on your phone and computer offer a voice search option. When you use it, your voice is converted to text, the search is run, and depending on your device settings, the result is either displayed on the screen or converted to speech so the device can read it out to you.
Understanding NLP in natural language search
Now let’s look at how NLS systems that use NLP as their core component work under the hood. When you search for something, your query is recorded and sent to the search system. If it’s a voice query, it’s first converted to text using speech-to-text algorithms. Then, text and voice queries alike are passed to the NLP system, which performs some combination of the steps below. Which steps are used depends on the project and its data; most projects won’t use all of them.
- Parsing: The sentence is first broken down into individual words, or tokens, which makes it easier for the system to establish the grammatical relationships among them.
- Stemming: Stemming reduces words to a base form, or stem, by stripping affixes (mostly suffixes) with simple rules. Because different forms of a word map to the same stem, the vocabulary shrinks, which saves memory and processing time. For example, “change” and “changing” are both reduced to “chang”.
- Lemmatization: Lemmatization is another way of reducing words to a base form, called a lemma. It takes the context and morphology of each word into account, usually with the help of a dictionary, so the result is a real word. For example, “changed” is converted to “change”. Because stemming and lemmatization serve the same purpose, most projects use one or the other. To better understand the difference between the two, you can take a look at this article.
- Named Entity Recognition: Named entity recognition, or NER, is the process of identifying key information in the text and classifying it into predefined categories, such as people’s names, locations, organizations, and events.
- Stop Words Removal: At this stage, common words that don’t add much meaning to the sentence are removed.
- Modeling: A machine-learning model and a traditional text-match mechanism are used together to get the results.
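To make the tokenization, stop word removal, stemming, and lemmatization steps concrete, here is a deliberately crude sketch. The suffix-stripping rules and the tiny lemma dictionary are simplified, hypothetical stand-ins, not the Porter stemmer or WordNet; for real work you would reach for NLTK’s PorterStemmer and WordNetLemmatizer (the latter needs the WordNet data downloaded first).

```python
import re

STOP_WORDS = {"the", "is", "a", "an", "and", "was", "were"}

# Crude suffix rules, tried in this order. Real stemmers such as Porter's
# apply ordered rule sets with conditions on the remaining stem.
SUFFIXES = ["ing", "ed", "es", "s", "e"]

# A tiny hypothetical lemma dictionary; real lemmatizers use a full lexicon
# plus each word's part of speech.
LEMMAS = {"changed": "change", "changing": "change", "changes": "change",
          "mice": "mouse", "rules": "rule", "better": "good"}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the lemma dictionary, falling back to the word itself."""
    return LEMMAS.get(word, word)


tokens = [t for t in tokenize("The mice changed and the rules were changing") if t not in STOP_WORDS]
print(tokens)                           # ['mice', 'changed', 'rules', 'changing']
print([crude_stem(t) for t in tokens])  # ['mic', 'chang', 'rul', 'chang']
print([lemmatize(t) for t in tokens])   # ['mouse', 'change', 'rule', 'change']
```

The output shows the trade-off: the stemmer needs no dictionary but produces non-words such as “mic” and “rul”, while the lemmatizer returns real words but only for entries it knows about.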
Once the NLP analysis is done and the intention behind the search is identified, databases are searched to find the results of the user’s query. The result is then returned to the user, either as a text result or through the use of a text-to-speech engine. Read more about query understanding.
How is NLP used in search?
Natural language processing is frequently used to improve the search experience.
- Spelling correction: Have you ever noticed that even when you make a typo, your search results are frequently still accurate? It’s not magic, but rather NLP techniques being used to make sense of incorrect spellings and return the right results.
- Brand name rewrites: This feature is closely related to spelling correction. Many brands and other organizations spell or style their names in specific ways, for example using a common word but spelling it with a K instead of a C. This feature automatically rewrites misspelled or incorrectly styled brand names to their correct forms.
- Properly redirecting searches: This feature depends entirely on search intent. Once the intent has been identified, your search results may look different, with search engines returning results like images, maps, products, and websites. For example, when you search “Austin to New York”, the search engine can predict that you’re probably looking for information about traveling between those cities. The results may then include information about flights, the distance between the cities, and a map with the driving route highlighted.
- Live, effective search preview: Some apps and web browsers now provide live search previews, showing products that match the typed search as you’re typing it, which allows users to get a better idea of what results their search is likely to return. These search previews don’t just show the products, but also provide other important details, such as pricing and color variations, helping users to filter their results more effectively.
- Voice search: This is a very common use of NLP, and works well with many languages. You’ve seen how this process works earlier in the article.
- Ranking products and popularity: This feature allows search engines to order results by relevance or by how popular they are among other users.
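Spelling correction and brand name rewrites can be approximated by snapping query words to the closest entry in a known vocabulary. This sketch uses Python’s difflib.get_close_matches, which scores candidates with a sequence-similarity ratio rather than the edit-distance or learned models a production engine might use. The vocabulary, the brand names, and the cutoff values are hypothetical.

```python
import difflib

# Hypothetical vocabulary, e.g. built from a product catalog.
VOCABULARY = ["wireless", "headphones", "keyboard", "charger", "phone"]

# Hypothetical brands: lowercase lookup key -> canonical spelling and styling.
BRANDS = {"kool kicks": "Kool Kicks", "krispy kettle": "Krispy Kettle"}


def correct_word(word, vocabulary, cutoff=0.75):
    """Replace a word with its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def rewrite_query(query):
    """Rewrite two-word brand names first, then spell-correct the remaining words."""
    words = query.lower().split()
    out = []
    i = 0
    while i < len(words):
        brand = []
        if i + 1 < len(words):
            pair = words[i] + " " + words[i + 1]
            brand = difflib.get_close_matches(pair, list(BRANDS), n=1, cutoff=0.8)
        if brand:
            out.append(BRANDS[brand[0]])
            i += 2
        else:
            out.append(correct_word(words[i], VOCABULARY))
            i += 1
    return " ".join(out)


print(rewrite_query("cool kicks wireles headphnes"))
```

Checking word pairs against the brand list before correcting single words keeps a brand like “Kool Kicks” from being “corrected” word by word into ordinary dictionary terms.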
How is Search.io improving NLS?
While natural language processing can do a lot to improve the search experience, there are still areas where it doesn’t work very well. This is especially true for synonyms. Many tools rely on the WordNet model, in which words are grouped into sets called synsets, collections of synonymous or semantically related words. These sets often need to be created manually, which takes a huge amount of time and requires constant updating and maintenance. This approach also tends to fail on multi-word concepts. For example, if someone searches for “dress shoes”, it’s obvious to a person that the expected result is footwear, but a machine that treats each word separately may also match dresses.
Another approach is a straightforward binary match on the words, where a document either contains a query word exactly or it doesn’t. This fails not only with synonyms, but also whenever a user misspells part of the query.
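Both approaches described above can be sketched together: a hand-maintained synonym table, standing in for WordNet-style synsets, expands each query word, and a document is kept only on an exact, binary word match. The synonym sets and documents are hypothetical; the point is to reproduce the failure modes, not to model WordNet itself.

```python
# Hypothetical, hand-maintained synonym sets in the spirit of WordNet synsets.
SYNSETS = [
    # "running shoes" can never match below, because queries are split into single words.
    {"sneakers", "trainers", "running shoes"},
    {"dress", "gown", "frock"},
    {"shoes", "footwear"},
]

DOCUMENTS = {
    "doc1": "black leather oxford footwear",
    "doc2": "red evening gown",
    "doc3": "white trainers",
}


def expand(word):
    """Return the word plus every synonym that shares a synset with it."""
    expanded = {word}
    for synset in SYNSETS:
        if word in synset:
            expanded |= synset
    return expanded


def binary_match_search(query):
    """Keep documents containing an exact match for any expanded query word."""
    results = []
    for doc_id, text in DOCUMENTS.items():
        doc_words = set(text.split())
        for word in query.split():
            if expand(word) & doc_words:
                results.append(doc_id)
                break
    return results


print(binary_match_search("dress shoes"))  # the evening gown sneaks in via "dress"
print(binary_match_search("sneekers"))     # a single typo finds nothing
```

Each word is expanded in isolation, so “dress shoes” pulls in the gown, and the misspelled “sneekers” matches no synset at all.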
To resolve these issues, Search.io has developed a semantic search solution called Neuralsearch. Neuralsearch largely eliminates the need to spend time creating lengthy lists of synonyms, and performs well where other search algorithms tend to struggle: longer queries, queries containing words with multiple meanings, queries that need more context than a few keywords provide, and other complex natural-language queries. It’s also very fast, and according to Search.io it doesn’t require any specialized hardware to return instant results that address the underlying meaning of the search, not just a handful of keywords.
You can learn more about Neuralsearch in Search.io’s whitepaper Neural Hashing: The Future of Search. Search.io isn’t the only company using a neural network-based approach, but according to the company, its solution is faster than other vector-based approaches and scales without requiring large clusters.
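This article doesn’t describe how Neuralsearch works internally. As a generic illustration of the broader idea of compressing vectors into compact binary codes, the sketch below uses random-hyperplane hashing, a well-known locality-sensitive hashing technique; it is not Search.io’s algorithm. The 8-dimensional “embeddings” are made-up stand-ins for vectors a neural encoder would produce, and the dimension, bit count, and seed are arbitrary choices.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side, else 0."""
    return [1 if sum(v * h for v, h in zip(vec, plane)) > 0 else 0 for plane in hyperplanes]


def hamming(a, b):
    """Number of differing bits; a small distance suggests similar directions."""
    return sum(x != y for x, y in zip(a, b))


DIM, N_BITS = 8, 16
planes = make_hyperplanes(DIM, N_BITS)

# Made-up "embeddings"; a real system would get these from a neural encoder.
doc_vectors = {
    "dress shoes": [0.9, 0.1, 0.8, 0.0, 0.2, 0.7, 0.1, 0.3],
    "evening gown": [0.1, 0.9, 0.0, 0.8, 0.6, 0.1, 0.7, 0.2],
    "garden hose": [-0.5, 0.2, -0.7, 0.1, -0.3, -0.6, 0.4, -0.8],
}
doc_codes = {name: hash_vector(vec, planes) for name, vec in doc_vectors.items()}

# A query pointing in the same direction as "dress shoes" (scaled by 2).
query = [2.0 * x for x in doc_vectors["dress shoes"]]
query_code = hash_vector(query, planes)

nearest = min(doc_codes, key=lambda name: hamming(query_code, doc_codes[name]))
print(nearest, {name: hamming(query_code, code) for name, code in doc_codes.items()})
```

The appeal of binary codes is cost: comparing two codes needs only bit comparisons, which hardware does very quickly, instead of floating-point dot products over full vectors.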
In this article, you’ve learned what natural language processing is and how it’s used in the real world. You’ve also been introduced to natural language search, which applies NLP to the search process. Although natural language search still has room for improvement, it’s getting better all the time, and searches get a little smarter every day.
If you’re looking to add a smarter search to your site, look at Search.io, a customized, AI-powered search engine that helps users find what they want, when they want it. It’s augmented with cutting-edge AI technologies to help your users find relevant content, and offers live search preview, spelling correction, product ranking, and many more features. Give Search.io a try, and find out for yourself how effective your search bar can be.
About the author
Gourav Singh Bais is an Applied Machine Learning Engineer at ValueMomentum Inc. He works with Machine Learning/Deep learning pipelines, retraining systems, and Data Science prototypes.