In order to understand the need for Natural Language Processing (NLP), let’s go through a scenario.
Many companies, especially in heavy industry, want to improve their operations and keep their employees safe. That is never easy, because machinery breaks and someone has to go fix it. These companies know that by analyzing their data better they can improve their operations, save money, and keep their employees safer.
Sadly, the challenge is that only 20% of the data they store for this analysis is in a structured format, such as spreadsheets or databases, which is the kind of data computers usually work with. The other 80% is text, like repair manuals, injury reports, and notes jotted down by technicians. This information is extremely valuable, but because of its volume and lack of structure it has largely been invisible to analytics teams.
Imagine you are searching a database of injury reports and you want to find injuries related to the lower body. What do you search for? Analysis from one large utility company showed that a search for ‘lower body injuries’ returned only a small number of results, far fewer than existed in reality. That’s because the search tool was looking for the exact keywords ‘lower body injuries’. When the analysts used more specific search terms, such as ‘foot’ for foot injuries, the search returned many more results, but in many of them ‘foot’ was used in the context of distance.
So while they got more results than they did by searching for ‘lower body injuries’, these weren’t the results they wanted. This was because ‘foot’ can be both a body part and a unit of measurement. Humans can determine context and tell the difference, but until recently computers were largely stumped. Thanks to the field of AI called Natural Language Processing (NLP), computers can now analyze and understand textual data.
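The search problem described above can be sketched in a few lines of Python. This is only an illustration: the four reports are invented for the example, and `search` is a hypothetical helper that does plain substring matching, not the utility company’s actual tool.

```python
# Invented example reports (not real data).
reports = [
    "Worker sprained an ankle stepping off the platform.",
    "Technician bruised left foot when a wrench fell.",
    "Fell from a ladder roughly six foot above the floor.",
    "Lower body injuries reported after conveyor jam.",
]

def search(reports, phrase):
    """Return the reports that contain the exact phrase, ignoring case."""
    needle = phrase.lower()
    return [r for r in reports if needle in r.lower()]

# Exact phrase: misses the ankle and foot injuries entirely.
exact = search(reports, "lower body injuries")

# More specific keyword: finds the foot injury, but also a report
# where "foot" is a unit of distance.
foot = search(reports, "foot")

print(len(exact), "result(s) for 'lower body injuries'")
print(len(foot), "result(s) for 'foot'")
```

Substring matching has no notion of context, so it cannot tell the bruised foot apart from the six foot ladder.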
So What is Natural Language Processing?
Natural Language Processing is, as its name implies, the processing of natural language by computers. Natural language means any human-generated language, in other words not a computer language like FORTRAN or Java, and not numbers or other mathematical notation.
With that, any system that takes natural language as input and does some processing on it, no matter what that processing is, is a natural language processing system.
How Does Natural Language Processing Work?
In case you are curious, here is how natural language processing works at a high level.
NLP algorithms can’t read text like we do, but they can look for patterns, and they find these patterns by turning huge amounts of text into matrices. When analyzing text, the algorithm might first remove words that don’t offer much value, such as ‘a’, ‘the’, ‘is’, and ‘are’. These are called stop words. Then the algorithm might split the sentences into groups of words. It counts how many times each group of words appears in each document, and how many of the documents being analyzed contain that group of words.
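Here is a minimal sketch of those steps in Python, using only the standard library. The stop-word list, the sample reports, and the function names (`tokenize`, `word_groups`, `count_groups`, `to_matrix`) are all assumptions made for illustration. Real systems use much longer stop-word lists and more careful text cleaning.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "the", "is", "are", "and", "of", "on", "to", "from"}

def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def word_groups(words, n=2):
    """Return every run of n consecutive words as a space-joined string."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def count_groups(documents, n=2):
    """Count each word group per document, and how many documents contain it."""
    per_doc = [Counter(word_groups(tokenize(doc), n)) for doc in documents]
    doc_freq = Counter()
    for counts in per_doc:
        doc_freq.update(counts.keys())  # each document counts once per group
    return per_doc, doc_freq

def to_matrix(per_doc, doc_freq):
    """Build a document-by-group count matrix (rows are documents)."""
    vocab = sorted(doc_freq)
    matrix = [[counts[group] for group in vocab] for counts in per_doc]
    return vocab, matrix

# Invented sample reports (not real data).
reports = [
    "The technician injured a foot on the ladder.",
    "The ladder is ten foot tall.",
    "A technician injured a hand.",
]

per_doc, doc_freq = count_groups(reports)
vocab, matrix = to_matrix(per_doc, doc_freq)

print(vocab)
for row in matrix:
    print(row)
```

Each row of `matrix` is one document and each column is one group of two words, so the text has become numbers that an algorithm can compare.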
Without knowing anything at all about the text, the algorithm can then tell how often a given word or phrase appears in a given document, and how many of the documents contain it. Tokens (the words or phrases being counted) that appear many times in many documents may not mean much. On the other hand, tokens that appear frequently in only a few documents tell us that something is going on.
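One common way to turn this idea into a score is TF-IDF (term frequency times inverse document frequency). The description above does not name a specific formula, so treat the following as one possible sketch rather than the definitive method. The `tf_idf` function and the sample token lists are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs_tokens):
    """Score each token in each document.

    Score = (share of the document made up of the token)
            * log(number of documents / documents containing the token)
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each document once per token

    scores = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            tok: (cnt / total) * math.log(n_docs / doc_freq[tok])
            for tok, cnt in counts.items()
        })
    return scores

# Invented, already-tokenized documents (not real data).
docs = [
    ["pump", "leak", "pump", "repair"],
    ["pump", "inspection", "repair"],
    ["pump", "valve", "valve", "repair"],
]

scores = tf_idf(docs)
for doc_scores in scores:
    print(doc_scores)
```

In this toy data, `pump` and `repair` appear in every document, so their score is zero. `valve` appears twice in a single document, so it gets the highest score, which matches the intuition that frequent tokens in only a few documents are the interesting ones.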