Query Relaxation And Scoping As Part Of Semantic Search
The right search query is a Goldilocks-style effort: not so specific that you get no results, and not so broad that you get too many.
Semantic search, meanwhile, is all about understanding what searchers throw into a search box.
In other words, with semantic search, we meet searchers where they are instead of requiring them to meet us where we are.
Enter query relaxation and query scoping.
Search engines get searchers to the right content right away through techniques like synonyms, query word removal, and query scoping.
We avoid missing out on relevant information that wouldn’t otherwise appear, and we leave out information that isn’t relevant.
Query relaxation and scoping are tied very closely with the concept of precision and recall.
Precision measures how many of the returned results are relevant, and recall measures how many of the relevant results are returned.
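To make the two measures concrete, here is a minimal sketch for a single query. The function name and the record IDs are made up for illustration, and it assumes you already know which records are relevant, which in practice takes human judgments or click data.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query, given sets of record IDs."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Precision: share of what we returned that is relevant.
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: share of what is relevant that we returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 results returned, 3 of them relevant,
# out of 6 relevant records in the whole index.
p, r = precision_recall(returned={1, 2, 3, 4}, relevant={2, 3, 4, 5, 6, 7})
print(p, r)  # 0.75 0.5
```

Returning more records can only hold or raise recall, but it often lowers precision, which is the trade-off the rest of this article works through.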
One way to increase recall specifically is through query expansion.
Query expansion is all about expanding what the query will match with the hope of having better results.
The main reason a search engine might apply query expansion is some indication that the “base” search results without query expansion would not be satisfactory for the searcher.
In this series, we have already seen some ways to expand queries.
Typo tolerance, plural ignoring, and stemming and lemmatization are all ways to increase the recall of searches.
We’ve already seen those query expansion methods among the bedrocks of search, but other query expansion methods are just as fundamental.
An article in Search Engine Journal from 2008 covers how Google performs query expansion!
The article discusses not just stemming and typo tolerance but also translations, word removals, and synonyms.
Synonyms And Alternatives
There’s a reason George Orwell introduced Newspeak in his novel 1984 and why it resonated in a story about life utterly controlled to the point of blandness.
Linguistic richness is driven by the ability to say the same thing, or nearly the same thing, with different words and phrases. “Great” can be “awesome,” and “low-cost” is a near neighbor to “cheap.”
Meanwhile, these different words can help us more precisely refer to items similar in all but the smallest ways.
These differences are sometimes so small that this precision instead breeds confusion and makes us less likely to find what we want.
A customer wanting a rocking chair may not know whether to search for “rockers,” “rocking chairs,” or simply “chairs.”
This is where synonyms and alternatives provide value.
They help us expand recall in search results.
Synonyms and alternatives are similar, but they are not the same.
(You could say that they are not synonyms.)
Synonyms refer to two words or phrases that mean the same thing.
Alternatives instead refer to words or phrases that are similar but differ to some degree.
Often, synonyms make their way into a search engine through synonym lists.
These lists can come from predefined lists, such as general ecommerce terms.
The problem with predefined lists is that synonyms for one company’s search engine won’t necessarily work for another.
Quick: What’s a console? You may immediately think of video games, but someone else might think of a car or music.
For that reason, many synonym lists are created in-house.
At the beginning of a search implementation process, internal subject matter experts think of all of the words that could be synonyms for other words and add them to the search engine configuration.
(This, in reality, is often an idealized view of what happens. Often the person creating the synonym list is not a subject matter expert, but instead, the person implementing the search engine.)
Generally, this initial list will provide a good starting point, but there are sure to be missing synonyms.
The only real way to discover which terms your searchers will use is to let them search.
Using Analytics To Discover Synonyms
In your analytics, you’ll very quickly see queries that could use new synonyms.
These queries return zero results, a sign that searchers are looking for something they can’t find.
Now, not all of these queries will give you a new synonym.
Sometimes, searchers are looking for items that you just don’t have.
Nonetheless, you’ll see queries where you think immediately, “oh, we have that one,” and “I didn’t know people asked for it like that.”
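As a rough sketch of what that report could look like, the snippet below counts zero-result queries in a search log. The log format (a query plus a result count) and the sample rows are assumptions for illustration, not the export format of any particular analytics tool.

```python
from collections import Counter

# Hypothetical search log: (query as typed, number of results returned)
search_log = [
    ("rockers", 0),
    ("rocking chairs", 12),
    ("rockers", 0),
    ("hoverboard", 0),
    ("Rockers ", 0),
]


def zero_result_queries(log):
    """Count how often each normalized query returned no results."""
    counts = Counter(
        query.strip().lower() for query, n_results in log if n_results == 0
    )
    return counts.most_common()  # most frequent first


print(zero_result_queries(search_log))
# [('rockers', 3), ('hoverboard', 1)]
```

Here “rockers” is a likely synonym candidate for “rocking chairs,” while “hoverboard” may simply be an item the store doesn’t carry. Telling the two apart still takes a human.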
There will also be times when a query returns results but not what the searcher wants.
These queries can also give you ideas for synonyms if you track “search refinements.”
A search refinement happens when a searcher searches and then searches again.
This implies that the searcher didn’t find what they wanted the first time and tried again to find something better.
Someone searching for “Dell laptop” and following it up with “Dell notebook” is saying that “laptop” and “notebook” are related, but the search results for “laptop” were insufficient.
While there’s nothing wrong with looking for those trends in your analytics manually (it can be a good activity to slowly ease into the work week), you’ll be a lot more productive if you have a system that proactively sources them for you.
Some systems may even apply synonyms on your behalf, but this isn’t always helpful.
A human can spot refinements that don’t show valid synonyms or may see that the system is suggesting an incorrect type of synonym.
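One simple way to source candidates for that human review is to look for consecutive queries in a session that differ by exactly one word. The sketch below assumes sessions are already grouped per searcher and lowercased. The data and the one-word-swap heuristic are illustrative, not how any specific product does it.

```python
from collections import Counter

# Hypothetical session logs: each list is one searcher's queries, in order.
sessions = [
    ["dell laptop", "dell notebook"],
    ["hp laptop", "hp notebook"],
    ["dell laptop", "dell laptop bag"],
]


def refinement_candidates(sessions):
    """Count single-word swaps between consecutive queries in a session."""
    swaps = Counter()
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            a, b = first.split(), second.split()
            if len(a) != len(b):
                continue  # only consider one-for-one word replacements
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                swaps[diffs[0]] += 1
    return swaps


candidates = refinement_candidates(sessions)
print(candidates.most_common())
# [(('laptop', 'notebook'), 2)]
```

The third session adds a word rather than swapping one, so it is ignored. A pair that shows up repeatedly is a stronger candidate than a one-off, but it is still only a suggestion to review.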
Types Of Synonyms
That’s right: There are different types of synonyms.
This concept may seem strange at first, but it’s probably not far from how most people think of them.
“Two-way” is the first type of synonym. These synonyms are direct replacements for each other.
“Small” and “mini” are two-way synonyms of each other.
The words don’t need to be perfect replacements but can be close enough that people might use one for the other.
For example, “rope” and “string” don’t describe the same thing, but they are close enough to be worthy two-way synonyms.
It can be useful to think of the query created through the use of synonyms.
If we take a query of “small cheese pizza” and expand that out, you can think of the query now as “(small or mini) and cheese and pizza.”
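That expansion can be sketched in a few lines. The synonym group and the textual “and/or” output are assumptions for illustration. A real engine would build an internal query object rather than a string.

```python
# Hypothetical two-way synonym groups; every term in a group matches the others.
SYNONYM_GROUPS = [{"small", "mini"}]


def expand_query(query):
    """Build a boolean expression with each word OR-ed with its synonyms."""
    clauses = []
    for word in query.lower().split():
        group = next((g for g in SYNONYM_GROUPS if word in g), None)
        if group:
            # Keep the typed word first, then its synonyms alphabetically.
            others = sorted(group - {word})
            clauses.append("(" + " or ".join([word] + others) + ")")
        else:
            clauses.append(word)
    return " and ".join(clauses)


print(expand_query("small cheese pizza"))
# (small or mini) and cheese and pizza
```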
“One-way” is the next type of synonym.
This type is often used for words that refer to an object that belongs to a larger category.
“PlayStation” is a type of video game “console,” but a “console” is not a type of “PlayStation.”
If you add a one-way synonym to the search configuration, you can have PlayStations show up whenever someone searches for “console.”
Why not a two-way synonym between these two terms?
Because two-way synonyms are transitive.
If term one and term two are two-way synonyms, and terms two and three are two-way synonyms, then terms one and three are two-way.
To make this concrete: if “PlayStation” and “console” are two-way synonyms, and “Xbox” and “console” are too, then “PlayStation” and “Xbox” become synonyms as well. Searchers would see PlayStations when searching for Xboxes, and vice versa.
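The difference is easy to see in a small sketch. It assumes, as described above, that two-way pairs sharing a term are merged into one group, while one-way synonyms are stored as a directed mapping. The function names and data are hypothetical.

```python
def merge_two_way(pairs):
    """Merge two-way synonym pairs into groups; pairs sharing a term join up."""
    groups = []
    for a, b in pairs:
        merged = {a, b}
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group  # shared term: absorb the existing group
            else:
                remaining.append(group)
        groups = remaining + [merged]
    return groups


# Both pairs share "console", so all three terms land in one group.
two_way = merge_two_way([("playstation", "console"), ("xbox", "console")])
print(two_way == [{"playstation", "console", "xbox"}])  # True

# One-way synonyms: a query term maps to extra terms, never the reverse.
ONE_WAY = {"console": {"playstation", "xbox"}}


def expand_one_way(term):
    """Return the term plus whatever it points to."""
    return {term} | ONE_WAY.get(term, set())


print(expand_one_way("xbox"))  # {'xbox'}
```

With the one-way mapping, a search for “console” still expands to both products, but a search for “xbox” stays a search for “xbox.”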
“Alternative corrections” is the final type.
These are used when the words aren’t precise replacements for each other, and you want the exact match to appear higher than the alternative.
For example, you might say that “pants” are an alternative to “shorts,” but when someone searches for “shorts,” shorts should generally appear higher than pants.
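A minimal sketch of that ranking rule follows. The catalog, the category-only matching, and the two-level rank are simplifying assumptions. A real engine would fold this into its broader relevance scoring.

```python
# Hypothetical alternatives: related terms that should rank below exact matches.
ALTERNATIVES = {"shorts": {"pants"}}

# Hypothetical catalog of (title, category) records.
CATALOG = [
    ("Slim chino pants", "pants"),
    ("Running shorts", "shorts"),
    ("Wool scarf", "scarves"),
    ("Cargo shorts", "shorts"),
]


def search_with_alternatives(term):
    """Return matching titles, exact category matches before alternatives."""
    scored = []
    for title, category in CATALOG:
        if category == term:
            scored.append((0, title))  # exact match: best rank
        elif category in ALTERNATIVES.get(term, set()):
            scored.append((1, title))  # alternative: shown, but lower
    # Sort by rank first, then alphabetically by title within a rank.
    return [title for _, title in sorted(scored)]


print(search_with_alternatives("shorts"))
# ['Cargo shorts', 'Running shorts', 'Slim chino pants']
```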
All synonym types, by their nature, expand recall.
However, the hit on precision should be minimal because these synonyms are “pointers” to similar concepts.
You would expect a better search experience for the end user.
Query Word Removal
Sometimes searchers will use a query that doesn’t return anything because the query was too specific or used a word that didn’t exist in any of the records.
Remove one word, or two words, from the query, and perfectly decent results would come back.
This is a great time to use query word removal.
Perhaps the most common query word removal step is removing “stop words.”
Stop words are very common words that provide meaning for communication but don’t help with retrieval. Words such as “the” or “an” can remove otherwise good matches.
This is more common in queries oriented toward natural language, such as voice search queries.
An example of this would be searching for “an orange shirt” on a product search engine.
If the search engine searches over the title, color, and category, there might be plenty of records that have “shirt” as a category and “orange” as a color, but none that include the word “an.”
Now, really, does the word “an” provide any useful information here?
No, it doesn’t, and the search engine can safely remove it without losing precision.
Unlike with synonyms, you generally do not want to create your own stop word lists; most search engines have them built in per language.
However, there are times when you will want to expand on the built-in list, such as if you have an industry term that is so common that it doesn’t provide any value to a query.
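Stop word removal itself is simple, as this sketch shows. The tiny word list stands in for an engine’s built-in list, and the `extra_stop_words` parameter is a hypothetical way to add industry terms. Keeping the original query when every word is a stop word is a design choice here, so that a search for “the” doesn’t become an empty query.

```python
# A tiny illustrative English stop word list; real engines ship fuller ones.
BUILT_IN_STOP_WORDS = {"a", "an", "the", "of", "for"}


def remove_stop_words(query, extra_stop_words=()):
    """Drop stop words, but keep the query intact if nothing would remain."""
    stop_words = BUILT_IN_STOP_WORDS | set(extra_stop_words)
    words = query.lower().split()
    kept = [w for w in words if w not in stop_words]
    return " ".join(kept) if kept else " ".join(words)


print(remove_stop_words("an orange shirt"))
# orange shirt

# Extending the list with a (hypothetical) industry term that adds no value.
print(remove_stop_words("orange shirt apparel", extra_stop_words={"apparel"}))
# orange shirt
```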
Removing Words If No Results
Then there are queries where all of the words bring value but, searched together, bring back no results.
Often searchers will be happy with less precise results in exchange for increased recall. In these situations, we want to remove words to put results in front of the user.
There are two main ways to do this: make all query words optional or remove words from the query.
If you make all of the query words optional when there are no results, you assume that records that match more words are more relevant, all else being equal.
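Here is a rough sketch of the optional-words approach, ranking records purely by the number of query words they contain. The records are made up, and a real engine would combine this count with its other ranking signals.

```python
# Hypothetical records, each reduced to its searchable text.
RECORDS = [
    "uniqlo v-neck sweater",
    "gucci leather belt",
    "uniqlo crew-neck sweater",
    "nike running shoes",
]


def search_optional_words(query):
    """Rank records by how many query words they contain (at least one)."""
    words = set(query.lower().split())
    scored = []
    for record in RECORDS:
        matches = len(words & set(record.split()))
        if matches:
            scored.append((matches, record))
    # list.sort is stable, so ties keep their original catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


print(search_optional_words("gucci v-neck sweater"))
# ['uniqlo v-neck sweater', 'gucci leather belt', 'uniqlo crew-neck sweater']
```

No record matches all three words, but the one matching two of them comes first.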
An alternative is to remove query words one-by-one until you find matching records or there are no more words left in the query.
You can start by removing the first words or the last words. Last word removal tends to be more common.
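The word-by-word removal loop can be sketched as follows. The records, the `remove_from` parameter, and the exact-word matching are assumptions for illustration.

```python
def match_all(words, records):
    """Return records that contain every one of the given words."""
    return [r for r in records if all(w in r.split() for w in words)]


def search_with_word_removal(query, records, remove_from="last"):
    """Drop one word at a time until something matches or no words remain."""
    words = query.lower().split()
    while words:
        results = match_all(words, records)
        if results:
            return words, results
        # Shorten the query from the chosen end and try again.
        words = words[:-1] if remove_from == "last" else words[1:]
    return [], []


records = ["gucci leather belt", "uniqlo v-neck sweater"]

print(search_with_word_removal("gucci v-neck sweater", records))
# (['gucci'], ['gucci leather belt'])

print(search_with_word_removal("gucci v-neck sweater", records, remove_from="first"))
# (['v-neck', 'sweater'], ['uniqlo v-neck sweater'])
```

The same query produces very different results depending on which end you trim, so the choice depends on where your searchers tend to put the words that matter most.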
Making all of the query words optional and then sorting by the number of matching words is generally the better approach, especially when paired with the removal of stop words.
This is, however, a less ideal approach when precision is important, and you want to show that, indeed, there were no results that matched all of the query words.
One person may be alright with seeing Uniqlo v-neck sweaters for a query of “Gucci v-neck sweaters,” while another sees those results as completely irrelevant.
Of course, another option is to know which words are actually providing the most value to the query and mark the remaining words as optional.
This is generally not seen in keyword-based search engines, but there have been some search engines that will take a similar approach for stop words.
For example, some search engines have experimented with discounting common words automatically without stop word lists, using inverse document frequency.
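Inverse document frequency (IDF) is easy to compute by hand. The sketch below uses the plain log(N / df) form on a made-up four-document index. Production engines typically use smoothed variants, and returning 0.0 for unseen terms is an assumption made here to keep the example short.

```python
import math

# Hypothetical index of four tiny documents.
DOCUMENTS = [
    "the orange shirt",
    "the blue shirt",
    "the orange hat",
    "the green scarf",
]


def inverse_document_frequency(term, documents):
    """log(N / df): terms found in fewer documents get higher weights."""
    df = sum(1 for doc in documents if term in doc.split())
    return math.log(len(documents) / df) if df else 0.0


for term in ["the", "orange", "scarf"]:
    print(term, round(inverse_document_frequency(term, DOCUMENTS), 3))
# the 0.0
# orange 0.693
# scarf 1.386
```

A word like “the” appears in every document, so its weight drops to zero without anyone having to put it on a list.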
As with synonyms, query word removal will expand recall, usually without a hit on precision. Because stop words don’t provide much value to the result, you won’t lose out on good results by not including them.
Similarly, removing words when there are no results has no precision to lessen because there are no results that could be precise.
Query Scoping
We’ve primarily looked at situations where a searcher is overly precise and the search engine needs to expand the query to improve recall.
There are, likewise, times when the search engine can understand the user intent, and query scoping can increase precision.
Search expert Daniel Tunkelang calls query scoping “one of the most effective ways to capture query intent.”
He identifies two major steps in query scoping. The first is query tagging, followed by the scoping itself.
Query tagging identifies the parts of a query with the attributes they likely belong to.
For example, “Marcia” will most likely map to a “name” attribute, while “The Brady Bunch” maps to a “show title” attribute.
Query scoping takes this mapping and restricts attribute searching for these query parts.
The search engine doesn’t search “Brady” inside of the “name” attribute or “Marcia” in the “show title” attribute.
This kind of query scoping reduces recall, as we won’t see results that have that text in other attributes.
However, the outcome should be higher precision because we aren’t searching in irrelevant attributes.
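The two steps can be sketched together. In this illustration the tagger is a plain lookup table standing in for whatever model a real engine would use, and the attribute names, records, and substring matching are all assumptions.

```python
# Hypothetical lookup tables standing in for a trained query tagger.
KNOWN_VALUES = {
    "name": {"marcia", "greg"},
    "show_title": {"the brady bunch"},
}


def find_attribute(phrase):
    """Return the attribute whose known values include the phrase, if any."""
    return next((a for a, vals in KNOWN_VALUES.items() if phrase in vals), None)


def tag_query(query):
    """Step 1, tagging: label the longest known phrases with an attribute."""
    words = query.lower().split()
    tags, i = [], 0
    while i < len(words):
        for end in range(len(words), i, -1):  # try the longest phrase first
            phrase = " ".join(words[i:end])
            attr = find_attribute(phrase)
            if attr:
                tags.append((attr, phrase))
                i = end
                break
        else:
            tags.append((None, words[i]))  # untagged: search every attribute
            i += 1
    return tags


def matches_scoped(record, tags):
    """Step 2, scoping: check each tagged phrase only in its own attribute."""
    for attr, phrase in tags:
        fields = [attr] if attr else list(record)
        if not any(phrase in record[f].lower() for f in fields):
            return False
    return True


tags = tag_query("Marcia The Brady Bunch")
print(tags)
# [('name', 'marcia'), ('show_title', 'the brady bunch')]

good = {"name": "Marcia Brady", "show_title": "The Brady Bunch"}
bad = {"name": "Jan Brady", "show_title": "Marcia, Marcia, Marcia"}
print(matches_scoped(good, tags), matches_scoped(bad, tags))
# True False
```

The second record contains “Marcia,” but only in the show title, so scoping leaves it out.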
We could increase precision even further by filtering results by known attribute values.
This doesn’t even require machine learning, as the search engine can do a simple match between facet values and text in a query.
This reduces recall heavily, so we can also find a nice balance where we instead boost results with matching values rather than filtering.
The boosted results will tend to be the best matching ones because the query-filter match gives you a signal that it is what the searcher wants.
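Here is a small sketch contrasting the two options. The products, the single “color” facet, and the choice to drop the matched facet word from the text query are all simplifying assumptions.

```python
# Hypothetical product records with a "color" facet.
PRODUCTS = [
    {"title": "Orange shirt dress", "color": "blue"},
    {"title": "Classic shirt", "color": "orange"},
    {"title": "Linen shirt", "color": "white"},
]
FACET_VALUES = {"color": {"orange", "blue", "white"}}


def detect_facets(query):
    """Match query words against known facet values with a plain lookup."""
    words = query.lower().split()
    return {
        facet: word
        for facet, values in FACET_VALUES.items()
        for word in words
        if word in values
    }


def search(query, mode="boost"):
    """Filter on, or boost by, facet values detected in the query."""
    facets = detect_facets(query)
    rest = [w for w in query.lower().split() if w not in facets.values()]
    results = []
    for product in PRODUCTS:
        if not all(w in product["title"].lower().split() for w in rest):
            continue
        facet_hit = all(product[f] == v for f, v in facets.items())
        if mode == "filter" and not facet_hit:
            continue  # filtering: drop anything without the facet value
        results.append((0 if facet_hit else 1, product["title"]))
    # Boosting: facet matches first, everything else after, in catalog order.
    return [title for _, title in sorted(results, key=lambda r: r[0])]


print(search("orange shirt", mode="filter"))
# ['Classic shirt']
print(search("orange shirt", mode="boost"))
# ['Classic shirt', 'Orange shirt dress', 'Linen shirt']
```

Filtering returns only the orange shirt. Boosting puts it first but still shows the other shirts, trading a little precision for recall.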
Through your analytics or hands-on experience, if you find that your search is missing user intent and requiring searches to be “just right,” then query expansion and query scoping are two ways to calibrate your precision and recall.
These approaches will let in results that should be there and leave out the ones that shouldn’t.