Hi there! Today we will look at one of the most interesting and useful aspects of the Apache Solr engine. In most real-world applications, data availability is a key concern for organisations that want to keep up with customer demands. Solr is particularly strong when it comes to data management and data availability, so it is no surprise that special care is taken over when data becomes available for search after it is indexed.
To support this, Solr has a concept called “Near Real Time Search”, more popularly referred to as NRT. Essentially, it means that documents become available for search shortly after they are indexed, and how “near” that is can be configured. It is also one of the main features of SolrCloud and is rarely attempted in master-slave configurations.
Now, let’s look at the related concept of “commits”. Document durability and searchability are controlled by commits. Commits are either “hard” or “soft” and can be issued by a client via a REST call or configured to occur automatically in solrconfig.xml. Typically in NRT applications, hard commits are configured with openSearcher=false, and soft commits are configured to make documents visible for search. When a commit occurs, various background tasks are initiated; however, these background tasks do not block additional updates to the index, nor do they delay the availability of the documents for search.
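As a rough illustration of a client-issued commit, the sketch below only builds the URL for an explicit commit against a core's /update handler; it does not send anything. The host and port (http://localhost:8983), the helper name commit_url, and the use of the “filmsdata” core from later in this post are assumptions made for the example.

```python
from urllib.parse import urlencode


def commit_url(base_url, core, soft=False, open_searcher=True):
    """Build (but do not send) the URL for an explicit Solr commit.

    commit_url is a hypothetical helper for this post; commit, softCommit
    and openSearcher are request parameters of Solr's update handler.
    """
    if soft:
        # Soft commit: make recent documents visible without flushing to disk.
        params = {"softCommit": "true"}
    else:
        # Hard commit: flush to disk; openSearcher controls visibility.
        params = {"commit": "true", "openSearcher": str(open_searcher).lower()}
    return f"{base_url}/solr/{core}/update?{urlencode(params)}"


# Assumed local instance and core name, for illustration only.
print(commit_url("http://localhost:8983", "filmsdata", soft=True))
print(commit_url("http://localhost:8983", "filmsdata", open_searcher=False))
```

The first call prints http://localhost:8983/solr/filmsdata/update?softCommit=true and the second prints http://localhost:8983/solr/filmsdata/update?commit=true&openSearcher=false. In practice, such client-side commits are usually left to the automatic settings in solrconfig.xml.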
Where to Use Near Real Time (NRT) Search?
Near Real Time (NRT) Search is useful in business applications of every size. For example:
- E-commerce applications
- Large scale BFSI sector applications
- Applications with structured/unstructured data back-end
- Applications handling legal services data
- Applications with insurance data
- Applications pertaining to citizen data, etc.
Commits and Searching
We commit documents to the Solr index in two ways: hard commit and soft commit. A hard commit, as the name suggests, flushes all changes made since the last commit to the index on disk. A soft commit, on the other hand, is faster because it does not write the changes to disk; it only makes them available for search.
Both hard and soft commits have two primary configuration parameters:
- maxDocs: Integer. Defines the number of documents to queue before pushing them to the index. It works in conjunction with the maxTime parameter: if either limit is reached, the documents are pushed to the index.
- maxTime: The number of milliseconds to wait before pushing documents to the index. It works in conjunction with the maxDocs parameter: if either limit is reached, the documents are pushed to the index.
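The way the document-count and time limits combine can be sketched as a small decision function. This is a simplified model written for this post, not Solr's internal code: the function name and arguments are hypothetical, and treating any non-positive limit as “disabled” is an assumption of the model.

```python
def should_auto_commit(pending_docs, ms_since_first_pending,
                       max_docs=-1, max_time_ms=-1):
    """Return True when either configured limit has been reached.

    Simplified model: a limit of -1 (or any non-positive value) is
    treated as disabled, and nothing happens with no pending documents.
    """
    if pending_docs <= 0:
        return False
    docs_limit_hit = max_docs > 0 and pending_docs >= max_docs
    time_limit_hit = max_time_ms > 0 and ms_since_first_pending >= max_time_ms
    return docs_limit_hit or time_limit_hit


# The document limit is reached first, so a commit is due.
print(should_auto_commit(10000, 500, max_docs=10000, max_time_ms=15000))  # True
# Only a few documents, but the time limit has passed.
print(should_auto_commit(3, 15000, max_docs=10000, max_time_ms=15000))    # True
# Neither limit reached yet.
print(should_auto_commit(3, 500, max_docs=10000, max_time_ms=15000))      # False
```

Whichever limit is hit first triggers the commit, which is why a time limit alone is often enough for steady indexing traffic.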
Another important aspect in this discussion is the transaction log (tlog). In essence, if enabled, a new tlog is started after every hard commit and acts as a record of the updates received since the last hard commit. Tlogs are important because they prevent data loss. To make this work, every indexing call is written to the tlog before success is returned to the client; hence, if Solr crashes, these updates are replayed on restart, preventing data loss.
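To make the interplay between soft commits, hard commits and the tlog more concrete, here is a toy in-memory model. It is only a teaching sketch under stated assumptions: the ToyIndex class and its methods are invented for this post, the tlog is assumed to survive a crash, and the model opens a searcher right after replay, which is a simplification rather than a description of Solr's internals.

```python
class ToyIndex:
    """Toy model of durability vs. visibility (not Solr's real implementation)."""

    def __init__(self):
        self.on_disk = []    # flushed by hard commits; survives a crash
        self.in_memory = []  # indexed but not yet flushed
        self.tlog = []       # updates since the last hard commit; assumed durable
        self.visible = []    # what the current searcher can see

    def add(self, doc):
        self.tlog.append(doc)       # log the update first
        self.in_memory.append(doc)  # then index it in memory

    def soft_commit(self):
        # Makes everything searchable without flushing to disk.
        self.visible = self.on_disk + self.in_memory

    def hard_commit(self, open_searcher=False):
        # Flushes pending documents to disk and starts a fresh tlog.
        self.on_disk = self.on_disk + self.in_memory
        self.in_memory = []
        self.tlog = []
        if open_searcher:
            self.visible = list(self.on_disk)

    def crash_and_restart(self):
        # Unflushed in-memory state is lost, then the tlog is replayed.
        self.in_memory = list(self.tlog)
        self.visible = self.on_disk + self.in_memory


idx = ToyIndex()
idx.add("doc1")
print(idx.visible)       # [] : indexed but not yet searchable
idx.soft_commit()
print(idx.visible)       # ['doc1'] : searchable, though not yet on disk
idx.crash_and_restart()
print(idx.visible)       # ['doc1'] : recovered by replaying the tlog
idx.hard_commit()        # openSearcher=false style: durable, tlog rolled over
print(idx.on_disk, idx.tlog)  # ['doc1'] []
```

The model shows why the two commit types are configured separately: the hard commit is about durability and keeping the tlog short, while the soft commit is about visibility.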
It is usually preferable to configure commits (both hard and soft) in solrconfig.xml and avoid sending commits from an external source. Use the steps below to configure hard/soft commit values as needed.
Step 1: Fire up a Solr instance
Run ./solr start from inside the “bin” folder.
Step 2: Select “Files” in the “filmsdata” core
Step 3: Modify the values for soft/hard commit
The time chosen for autoSoftCommit determines the maximum time after a document is sent to Solr before it becomes searchable, and it does not affect the transaction log. Choose as long an interval for this value as your application's requirements can tolerate.
Pro tip: For heavy bulk indexing, especially for the initial load when there is no searching, consider turning off autoSoftCommit by specifying a value of -1 for the maxTime parameter.
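The settings discussed above live inside the updateHandler section of solrconfig.xml. The sketch below renders a minimal fragment from a template so the structure is easy to see. The helper name render_commit_config and the default values (15000 ms and 1000 ms) are illustrative assumptions, not recommendations, and a real solrconfig.xml contains other elements that are left out here.

```python
import xml.etree.ElementTree as ET

# Minimal fragment of the updateHandler section; other elements are omitted.
FRAGMENT = """\
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>{hard_ms}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>{soft_ms}</maxTime>
  </autoSoftCommit>
</updateHandler>
"""


def render_commit_config(hard_ms=15000, soft_ms=1000):
    """Render the fragment with the given intervals in milliseconds.

    Hypothetical helper for this post; the defaults are placeholders.
    """
    xml_text = FRAGMENT.format(hard_ms=hard_ms, soft_ms=soft_ms)
    ET.fromstring(xml_text)  # raises ParseError if the fragment is malformed
    return xml_text


# Bulk-load variant from the pro tip: soft commits turned off with -1.
print(render_commit_config(soft_ms=-1))
```

Here the hard commit keeps openSearcher set to false, so it handles durability only, while the autoSoftCommit interval decides how quickly documents become searchable.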
NRT is an important and widely used aspect of Solr applications and is instrumental in handling bulk indexing operations. A correctly configured Solr instance can drastically improve data availability and searchability, and in turn have a significant impact on the business.