Bug: How to Fix Solr Exact String Matching with Sitecore
Sitecore with Solr is a definite powerhouse solution! Solr’s ease of setup and use, combined with Sitecore’s IQueryable, makes writing search queries a walk in the park. Recently I discovered that exact string matching with the == operator in a lambda expression does not work as expected. This article describes the issue and provides a solution to get you on the path to relevant search results!
Sitecore LINQ queries make life smooth sailing for us developers, don’t they? Before, you had to get down to low-level Lucene.Net manipulation and manually set up all the queries. Now things are much easier! All the confusing query-building stuff is abstracted away, and now even my little niece could probably write a search query or two. Unfortunately, with abstraction we have become more reliant on the high-level APIs, we take for granted some of the functionality that happens under the hood, and in some cases we may become a bit careless, just because it’s so easy to do.
This is what happened on a recent project, where we had to implement an exact search on a person’s full name. This sounded very straightforward, so I quickly wrote a simple search query as follows:
var results = searchContext.GetQueryable<Person>().Where(p => p.FullName == searchQuery).GetResults();
After a quick test it appeared to work properly, and I was getting people with the full name I searched for. I was about to wrap that one up and send it on to QA when one of the queries returned an unexpected result. Let’s say our query was “John Doe”, which returned the right person, but also “John Doe Jr”, and “John Smith”, and “Jack Doe”… well, you get the picture. Whoa! At first glance it looked like Solr executed a .Contains() instead of == for each word of the query.
The first thing I did was go “by the book” and check the Sitecore search logs to make sure Sitecore had constructed the query properly. What I found was fullname_t:(John Doe). From my Solr experience I knew that matching the whole phrase requires double quotes in the query, like this: fullname_t:("John Doe"), with or without parentheses. However, when I issued this query manually on the Solr Dashboard query page, John Doe Jr showed up along with John Doe. So the double quotes ensured whole-phrase matching, but Solr was still executing something like a .Contains() for the whole phrase rather than ==. At this point it seemed that some adjustments had to be made in Solr.
One of the developers we were working alongside, Michael, mentioned that he had found the same issue and had already filed a support ticket. A few days later he told us that support had come back with a fix.
How to fix Sitecore IQueryable Exact String Matching
The devil is always in the details. The _name field defined in the schema.xml generated by Sitecore is of type text_general. Solr handles this field type a bit differently from string, which is another field type for holding text: a text_general value is split into individual words when it is indexed, while a string value is indexed as a single, unmodified term. It turns out that for Solr to do an exact string match, and not a contains, the field type has to be string!
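To see why the two types behave differently, it helps to look at how they are usually declared. The fragment below is only a sketch modeled on the stock Solr example schema (where the tokenized text type is named text_general). It is not copied from a Sitecore-generated file, so the analyzer chain in your own schema.xml may well differ:

```xml
<!-- string: solr.StrField is not analyzed, so the whole value is indexed as a single term -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

<!-- text_general: solr.TextField runs an analyzer that splits the value into lowercased word tokens -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

With definitions like these, “John Doe Jr” ends up in the text field as the tokens john, doe and jr, so a phrase query for “John Doe” still matches it. In the string field the only indexed term is the full value “John Doe Jr”, which a query for “John Doe” does not match. Keep in mind that matching on a string field is also case-sensitive.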
Now, this is easy enough for custom computed fields, right? We simply set the returnType attribute to string. But what about the fields in the schema.xml generated by Sitecore?
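For the custom computed field case, the registration in the Sitecore ContentSearch index configuration looks roughly like the sketch below. The field name fullnameexact and the class MyProject.Search.FullNameExactField are hypothetical placeholders, and the surrounding configuration nodes are left out:

```xml
<fields hint="raw:AddComputedIndexField">
  <!-- returnType="string" tells Sitecore to index the computed value as a Solr string rather than tokenized text -->
  <!-- the type reference below is a made-up example, replace it with your own computed field class -->
  <field fieldName="fullnameexact" returnType="string">MyProject.Search.FullNameExactField, MyProject</field>
</fields>
```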
Of course, it’s not a good idea to just go in and change field types on Sitecore-generated fields, as we don’t know what effect that would have on the rest of the CMS’s functionality. At this point you may say “well, let’s just create another computed field for, let’s say, an item name, and give it a string returnType!” Well, this would be a good solution for a custom computed field, but there is a better way for fields that were placed into schema.xml by Sitecore. Fortunately, Solr allows us to simply copy the value from the _name field into a new field, let’s call it _nameexact, through a couple of lines of configuration in schema.xml!
First, let’s define our new field:
<field name="_nameexact" type="string" indexed="true" stored="true" />
Second, after a long comment that looks something like this:
<!-- copyField commands copy one field to another at the time a document is added to the index. It's used either to index the same field differently, or to add multiple fields to the same field for easier/faster searching. -->
<!-- Copy the price into a currency enabled field (default USD) -->
<!-- Text fields from SolrCell to search by default in our catch-all field -->
<!-- Create a string version of author for faceting -->
<!-- Above, multiple source fields are copied to the [text] field. Another way to map multiple source fields to the same destination field is to use the dynamic field syntax. copyField also supports a maxChars to copy setting. -->
<!-- <copyField source="*_t" dest="text" maxChars="3000"/> -->
<!-- copy name to alphaNameSort, a field designed for sorting by name -->
<!-- <copyField source="name" dest="alphaNameSort"/> -->
Right below it, let’s add our own copy statement:
<copyField source="_name" dest="_nameexact" />
That’s it! Now in our Person model we just need to change the IndexField attribute of the FullName property to use the “_nameexact” field instead, and we should be all set! Keep in mind that a schema change only takes effect once the Solr core has been reloaded and the index rebuilt. We could also create a separate property for the exact name, in case we later wanted to do partial name matching, for which the original text_general type of the FullName field would work better.
After making this change, we no longer got “John Doe Jr”, and only the exact name matches were returned. Thanks to Michael and Sitecore support for figuring that one out!
Hopefully this post will save many developers hours or days of troubleshooting. Please share this article, as many more developers will thank you. Also remember to comment if this worked for you, and give back by blogging, tweeting, and posting on Sitecore forums about any bugs and fixes you find yourself!