Bug: Sitecore Makes Duplicate Requests to Solr on Item Creation

Vasiliy Fomichev


Sitecore with Solr is a very powerful setup; however, it is not the kind of animal you can simply release into the wild and watch thrive. Solr needs close monitoring, especially in the early days, as it can become the cause of some strange performance-related website issues. This post describes an issue I discovered with Sitecore 7.2 rev. 140526 making duplicate calls to Solr on item creation.


Sitecore Makes Duplicate Calls to Solr

Starting with version 7, Sitecore has become more search-dependent, which makes search scalability a very important factor. Meet Solr – the scalability master of the open source search world. Although Solr is very easy to set up (check out my Solr Production-Ready Setup walkthrough article), it requires some tuning to work smoothly. Solr works like a charm “out of the box” with small data sets; however, once we get into the territory of millions of records, performance becomes much more of a concern.

During one of my recent projects, I was involved in investigating Solr performance issues. The server was running out of memory; however, due to internal client politics, a simple hardware upgrade (horizontal or vertical) would have taken a long time, so we started looking for other ways to optimize the search server’s performance. We optimized the Solr and Sitecore search configuration, refactored the code to reduce the use of search (where this was possible without affecting performance), and resorted to incremental search index updates, sacrificing the so-often-taken-for-granted sync strategy.

While looking in every corner for anything that could help us improve search speed, the network engineer in me wondered whether anything interesting was happening on the network (at one point I earned CCNA and Network+ certifications; that seems like a different lifetime now). The available network bandwidth seemed to be more than enough, and Sitecore requests were reaching the Solr server in a timely manner; however, I then noticed an unusually high number of calls from Sitecore to Solr for each item creation.

Next up was Fiddler, the tried-and-true tool that keeps saving me hours of guesswork! If you have not used Fiddler, I highly recommend it for debugging network requests. It comes in especially handy in situations like this one, where we want to investigate the HTTP communication happening between two applications.

Due to access restrictions on the client’s network and servers, I was not able to troubleshoot directly in their environment, so I replicated the setup locally, with Sitecore and Solr running on my development machine. There was one small problem I had to address in order to see the traffic between Sitecore and Solr. Fiddler runs under the account of the user who launches it (in this case, my Windows account on my laptop); however, the Sitecore application pool runs under a different identity (likely Network Service or ApplicationPoolIdentity), which prevents the sniffer from seeing that traffic. The easiest way I could think of to overcome this problem was to set up Fiddler as a reverse proxy sitting between Sitecore and Solr (setup steps are at the end of the article).

Once I could see the exact communication happening between Sitecore and Solr, I noticed that each item creation resulted in four requests, two of which were duplicates of the other two. Here is an example request record from Fiddler (notice that the last two requests repeat the first two):

POST http://localhost:8888/solr/sitecore_cancer_master/update?version=2.2 HTTP/1.1
Content-Type: text/xml; charset=utf-8
Host: localhost:8888
Content-Length: 3270
Expect: 100-continue
Accept-Encoding: gzip, deflate

<add><doc><field name="_indexname">sitecore_master_index</field>...<field name="_database">master</field></doc></add>
HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
Transfer-Encoding: chunked

POST http://localhost:8888/solr/sitecore/update?version=2.2 HTTP/1.1
Content-Type: text/xml; charset=utf-8
Host: localhost:8888
Content-Length: 10
Expect: 100-continue
Accept-Encoding: gzip, deflate

<commit />

POST http://localhost:8888/solr/sitecore_cancer_master/update?version=2.2 HTTP/1.1
Content-Type: text/xml; charset=utf-8
Host: localhost:8888
Content-Length: 3270
Expect: 100-continue
Accept-Encoding: gzip, deflate

<add><doc><field name="_indexname">sitecore_master_index</field>...<field name="_database">master</field></doc></add>
HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
Transfer-Encoding: chunked

POST http://localhost:8888/solr/sitecore/update?version=2.2 HTTP/1.1
Content-Type: text/xml; charset=utf-8
Host: localhost:8888
Content-Length: 10
Expect: 100-continue
Accept-Encoding: gzip, deflate

<commit />
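
Spotting the repeat by eye is easy with four requests, but not across a capture of a whole import. As a rough aid, here is a minimal Python sketch that counts how often each update payload was sent to each Solr URL and reports any payload sent more than once. It assumes you have already exported the captured sessions as (method, url, body) tuples; that export step and the `repeated_updates` name are hypothetical, not part of Fiddler. Bare `<commit />` requests are skipped because every commit looks identical, so repeats of them prove nothing by themselves.

```python
from collections import Counter


def repeated_updates(requests):
    """Return {(url, body): count} for update payloads sent more than once.

    `requests` is a list of (method, url, body) tuples in capture order.
    Bare <commit /> requests are skipped: every commit looks the same,
    so seeing several of them says nothing on its own.
    """
    counts = Counter()
    for method, url, body in requests:
        payload = (body or "").strip()
        if method == "POST" and payload != "<commit />":
            counts[(url, payload)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# The four requests from the capture above (the "..." is elided in the original too).
ADD_URL = "http://localhost:8888/solr/sitecore_cancer_master/update?version=2.2"
COMMIT_URL = "http://localhost:8888/solr/sitecore/update?version=2.2"
ADD_BODY = ('<add><doc><field name="_indexname">sitecore_master_index</field>...'
            '<field name="_database">master</field></doc></add>')
capture = [("POST", ADD_URL, ADD_BODY), ("POST", COMMIT_URL, "<commit />")] * 2

for (url, _), n in repeated_updates(capture).items():
    print(url, n)  # prints the sitecore_cancer_master update URL followed by 2
```

An item that is legitimately saved twice with identical content would also show up here, so treat hits as leads to investigate rather than proof of the bug.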


This discovery meant that the Solr server was taking double the load from the client during item creation. At first glance, this doesn’t seem like a huge problem, but it becomes one during mass data imports, which is exactly what we were doing on that project.
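
To put “double the load” in numbers, here is a back-of-the-envelope Python sketch based on the capture above: one `<add>` of 3,270 bytes and one 10-byte `<commit />` per item, sent twice with the bug and once after the fix. Treating every item as the same size as the captured one is a simplifying assumption; real payloads vary with the fields each item carries, and the function name is purely illustrative.

```python
def solr_update_traffic(items, copies, add_bytes=3270, commit_bytes=10):
    """Estimate (requests, request-body bytes) Solr receives for `items` creations.

    Assumes one <add> plus one <commit /> per item, each sent `copies` times
    (2 with the duplicate-request bug, 1 once it is fixed), and that every
    <add> body is as large as the captured one (Content-Length: 3270).
    """
    requests = items * 2 * copies
    body_bytes = items * (add_bytes + commit_bytes) * copies
    return requests, body_bytes


print(solr_update_traffic(100_000, copies=2))  # (400000, 656000000)
print(solr_update_traffic(100_000, copies=1))  # (200000, 328000000)
```

For a hypothetical 100,000-item import, that is 200,000 extra requests and roughly 328 MB of extra request bodies, and each extra `<commit />` is one more commit request Solr has to process.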


How to Fix the Issue with Duplicate Item Action Requests to Solr

Immediately after discovering the issue, I reported it to Sitecore Support. It took some time for the support folks to figure it out; however, as always, they came through and created a patch. Sitecore Support saves the day yet again!

This patch was created for Sitecore 7.2 rev. 140526, so if you happen to experience the same issue and applying the fix below does not resolve it, I suggest breaking out your decompiler of choice (.NET Reflector or similar) and taking a peek at the logic inside the patch library.

  1. Extract the attached Sitecore.Support.432821 library and place it in your bin folder.
  2. Change your “sitecore_master_index” and any other custom indexes that point to the master database so that they use the custom class:

<index id="sitecore_master_index" type="Sitecore.Support.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.Support.432821">
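
If you have many indexes to update, the edit in step 2 can be scripted. The following Python sketch re-types every `<index>` element whose configuration mentions the master database. It assumes that a master index can be recognized by a descendant `<Database>master</Database>` element, as in a typical crawler definition; the function name and the sample configuration are hypothetical and simplified, so check both against your own files. Note that `xml.etree.ElementTree` drops XML comments and may rename namespace prefixes such as `patch:`, so review the output before deploying it, or use the returned ids to write a Sitecore include patch file by hand instead.

```python
import xml.etree.ElementTree as ET

# Type string from the patch instructions above.
SUPPORT_TYPE = ("Sitecore.Support.ContentSearch.SolrProvider.SolrSearchIndex, "
                "Sitecore.Support.432821")


def point_master_indexes_to_patch(config_xml):
    """Return (new_xml, patched_ids) with every master-database index re-typed.

    Assumption: an index "points to master" when some descendant <Database>
    element contains the text "master". Adjust this test if your config differs.
    """
    root = ET.fromstring(config_xml)
    patched = []
    for index in root.iter("index"):
        databases = [(db.text or "").strip().lower() for db in index.iter("Database")]
        if "master" in databases:
            index.set("type", SUPPORT_TYPE)
            patched.append(index.get("id"))
    return ET.tostring(root, encoding="unicode"), patched


# Simplified, hypothetical index definitions for illustration only.
sample = """<indexes>
  <index id="sitecore_master_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
    <locations><crawler><Database>master</Database></crawler></locations>
  </index>
  <index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
    <locations><crawler><Database>web</Database></crawler></locations>
  </index>
</indexes>"""

new_xml, ids = point_master_indexes_to_patch(sample)
print(ids)  # ['sitecore_master_index']
```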

Now each item creation event should cause only a single set of requests to each core that points to the master database, and mass actions such as data imports should no longer put excessive load on the search server. Voilà! Performance during our data imports improved significantly!


How to Set Up Fiddler to Record Traffic between Sitecore and Solr

To set up Fiddler as a reverse proxy between Sitecore and Solr, follow these steps:

  1. Ensure Sitecore and Solr are up and working properly
  2. Launch Fiddler
  3. Navigate to Tools>Fiddler Options
  4. Click the Connections tab and make a note of the Fiddler port (the “Fiddler listens on port” field)
  5. Ensure that the “Allow remote computers to connect” checkbox is checked
  6. Close the Fiddler options window
  7. In Fiddler navigate to Rules>Customize Rules (Fiddler ScriptEditor will open in a new window)
  8. Scroll down to the OnBeforeRequest handler (static function OnBeforeRequest(oSession: Session))
  9. Add the following forwarding rule (I used localhost because all of the systems were running on my local machine; change the host names if Solr or Fiddler runs on a remote machine):

if (oSession.host.toLowerCase() == "localhost:{fiddler port}") oSession.host = "localhost:{solr port}";

  10. Save the rules file
  11. Update the Solr.ServiceBaseAddress setting to use the Fiddler address and port from step 4 (in my case, since Solr and Fiddler were both running on localhost, I simply updated the port)

Now Sitecore’s requests to Solr go to Fiddler, which forwards them to Solr. This way, Fiddler can listen in on all communication between Sitecore and Solr.
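
If Fiddler is not an option, the same reverse-proxy idea can be sketched with nothing but the Python standard library. This is a swapped-in alternative, not the setup used in this article: the sketch listens on a port, records each request’s method, path, and body, and forwards the request unchanged to a Solr base URL you supply. The `make_proxy` and `start_in_background` names are hypothetical; the sketch only handles GET and POST, assumes requests carry a Content-Length header (as in the capture above), buffers whole responses, and listens on 127.0.0.1 only, so it is meant for local debugging.

```python
import http.server
import threading
import urllib.error
import urllib.request


def make_proxy(target_base, listen_port=0):
    """Build a logging reverse proxy that forwards every request to target_base.

    Returns (server, captured); captured collects (method, path, body) tuples.
    listen_port=0 lets the OS pick a free port (see server.server_address).
    """
    captured = []
    # Ignore any HTTP(S)_PROXY environment variables when talking to Solr.
    upstream = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    class Handler(http.server.BaseHTTPRequestHandler):
        # HTTP/1.1 so the handler answers the "Expect: 100-continue" header seen above.
        protocol_version = "HTTP/1.1"

        def _forward(self):
            length = int(self.headers.get("Content-Length") or 0)  # no chunked uploads
            body = self.rfile.read(length) if length else None
            captured.append((self.command, self.path, body))
            req = urllib.request.Request(target_base + self.path, data=body,
                                         method=self.command)
            if self.headers.get("Content-Type"):
                req.add_header("Content-Type", self.headers["Content-Type"])
            try:
                with upstream.open(req) as resp:
                    status, ctype, payload = (resp.status,
                                              resp.headers.get("Content-Type"), resp.read())
            except urllib.error.HTTPError as err:  # relay Solr error responses unchanged
                status, ctype, payload = err.code, err.headers.get("Content-Type"), err.read()
            self.send_response(status)
            self.send_header("Content-Type", ctype or "application/octet-stream")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        do_GET = do_POST = _forward

        def log_message(self, *args):  # keep the console quiet; `captured` is the log
            pass

    server = http.server.ThreadingHTTPServer(("127.0.0.1", listen_port), Handler)
    return server, captured


def start_in_background(server):
    """Run server.serve_forever() on a daemon thread and return the thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

To use it in place of Fiddler, create the proxy with your Solr base URL and a free port (for example `make_proxy("http://localhost:8983", 8888)`, both hypothetical values), start it, and point Solr.ServiceBaseAddress at the proxy’s port the same way as for Fiddler; `captured` then fills up with every request Sitecore sends.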
