Hi there, and welcome to this blog. Today we’ll learn how to set up an Apache Solr server on an everyday desktop or laptop. This post covers all the steps needed to get a Solr instance up and running. So, without further ado, let’s begin!
Note: This post uses Apache Solr 7.4 installed on Ubuntu 18.04.3 LTS. I will cover Windows 10 in a similar post some time later.
Step – 1: Minimum Requirements –
Java run-time environment (JRE) version 1.8 or higher.
If you don’t have the required version, or if the java command is not found, download and install the necessary version from the Oracle website at http://www.oracle.com/technetwork/java/javase/downloads/index.html. Solr is tested on several versions of Linux, macOS, and Windows, so you can use whichever of these matches your operating system.
Step – 2: Installing Apache Solr –
Installation is very simple from the terminal, so in this post we will use the terminal to download and install Solr.
Step – 3: Understanding Solr architecture –
As described in step 2 above, once Solr’s tgz file is downloaded, right-click on the archive and click “Extract Here”. A new folder named solr-7.4.0 is created.
The extracted folder contains the following directories:
- “bin”: Contains the scripts used to start and stop a Solr instance.
- “contrib”: Includes add-on plugins for specialized features of Solr.
- “dist”: Contains the main Solr .jar files.
- “docs”: Includes a link to the online Javadocs for Solr.
- “example”: Includes several types of examples that demonstrate various Solr capabilities.
- “licenses”: Includes all the licenses for third-party libraries.
- “server”: The core of the Solr engine.
There are multiple ways to communicate with the Solr engine once it is set up on Linux. One can use REST clients, curl, native APIs or Postman. The operations performed are Index, Update, Delete and Search/Query. The Solr engine reads the managed-schema and solrconfig.xml files to decide how these operations are carried out.
Now, let’s assume we have to index some data, query the Solr engine and analyze everything that happens in between. All of this information is logged in the folders “/api/logs” and/or “/connector/logs”. However, before going further into logging, let’s first understand how to index and query data in a Solr core.
Note: To start Solr, go to the bin directory in the terminal and issue the command “./solr start”. Similarly, to stop Solr, issue the command “./solr stop”.
Step – 4: Starting solr and creating a core –
After running ./solr start in the terminal, open localhost:8983/ in a browser. Provided the Solr service started successfully and nothing was changed in the conf file, the Solr Admin screen appears.
Before we start creating a core, please read the following page to understand the basics of Solr schema design: https://lucene.apache.org/solr/guide/7_4/overview-of-documents-fields-and-schema-design.html
Just as a SQL database has rows and columns, Solr has the document as its basic unit of data. A document is in turn composed of field-value pairs. A collection or a core is a set of documents that the end user can query. This set of documents is called a collection if Solr is started in SolrCloud mode and a core if Solr is started in standalone mode. Since we have started Solr in standalone mode, we will refer to our document set as a core from here on.
Now, for this post, let’s index a set of XML files containing “films” data, which ships with Solr out of the box.
To do so, fire up the terminal and issue the following command:
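For readers who prefer to script this step, here is a minimal Python sketch of the same idea: it builds the HTTP request that sends an XML file to a core's /update handler. It assumes Solr is listening on the default localhost:8983, that a core named filmsdata already exists, and that the sample file sits at example/films/films.xml inside the Solr folder. The helper names `build_index_request` and `index_file` are made up for this illustration and are not part of Solr.

```python
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumption: default host and port


def build_index_request(core: str, xml_bytes: bytes) -> urllib.request.Request:
    """Build a POST to the core's /update handler.

    commit=true asks Solr to commit right away so the documents become searchable.
    """
    return urllib.request.Request(
        url=f"{SOLR_URL}/{core}/update?commit=true",
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def index_file(core: str, path: str) -> int:
    """Send the XML file at `path` to Solr and return the HTTP status code."""
    with open(path, "rb") as xml_file:
        request = build_index_request(core, xml_file.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `index_file("filmsdata", "example/films/films.xml")` from inside the solr-7.4.0 folder would then send the sample data, provided Solr is running.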
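Another important aspect of working with documents is keeping the “managed-schema” file in step with the data. To understand this further, let’s take a look at the format of the data in the “films” data set.
Each record has the fields id, name, directed_by, initial_release_date and genre, and the schema needs a definition for each of them before the data can be indexed into the core ‘filmsdata’. Although the data set ships with Solr, these fields are not predefined in the managed-schema file. With the default configuration, Solr’s schemaless mode guesses a type for each new field the first time it sees one, and the guess can be wrong: the first film is named “.45”, so the name field may be guessed as a number. This is why the Solr tutorial recommends defining name as a text field before indexing the films data.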
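If a field does need to be defined by hand before indexing, Solr's Schema API accepts an add-field command over HTTP. The sketch below only builds such a request; it assumes the default localhost:8983 address and the filmsdata core, and `build_add_field_request` is a hypothetical helper name used for illustration.

```python
import json
import urllib.request


def build_add_field_request(core: str, name: str, field_type: str) -> urllib.request.Request:
    """Build a Schema API request that adds one stored field to the core's schema."""
    payload = {"add-field": {"name": name, "type": field_type, "stored": True}}
    return urllib.request.Request(
        url=f"http://localhost:8983/solr/{core}/schema",  # assumption: default host and port
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: define "name" as general-purpose text rather than letting Solr guess its type.
request = build_add_field_request("filmsdata", "name", "text_general")
```

Passing `request` to `urllib.request.urlopen` would send it to a running Solr instance.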
For more information on schema design, refer to the following page: https://lucene.apache.org/solr/guide/7_4/field-type-definitions-and-properties.html
We can check whether the data was indexed correctly with the following URL: http://localhost:8983/solr/filmsdata/select?q=*:*. Once the data is indexed successfully, we can play around with it as we like.
Step – 5: Manipulating data –
Solr essentially can do the following operations on data.
- Query/Search data
- Update data
- Delete data
We will look at each aspect in detail below.
Searching, or querying, is one of the most common operations performed in Solr. To support it, Solr has multiple request handlers defined in the config file solrconfig.xml. Some are present out of the box, whereas others can be defined as needed. One such handler is the /select handler, which we will use in this example to query data from Solr.
Now, the query http://localhost:8983/solr/filmsdata/select?q=*:*&fq=name:0.45 returns the following records:
The response has two parts: the “responseHeader” key and the “response” key. The former carries status (0 if the query succeeded), QTime (the time taken to run the query, in milliseconds) and params (the parameters that were sent with the query). The latter carries numFound, the total number of documents matching the query, which can be larger than the number of documents returned in a single page, along with the documents themselves.
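When such a URL is assembled in code rather than typed into a browser, the q and fq values should be URL-encoded. Here is a small Python sketch, assuming the default localhost:8983 address; `select_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def select_url(core, q="*:*", fq=None):
    """Build a /select URL with the main query (q) and an optional filter query (fq)."""
    params = [("q", q)]
    if fq:
        params.append(("fq", fq))
    # urlencode percent-encodes characters such as '*' and ':' in the values.
    return f"http://localhost:8983/solr/{core}/select?" + urlencode(params)


print(select_url("filmsdata", fq="name:0.45"))
# http://localhost:8983/solr/filmsdata/select?q=%2A%3A%2A&fq=name%3A0.45
```

The encoded form is equivalent to the readable URL above; Solr decodes it back to q=*:* and fq=name:0.45.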
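To make the two parts concrete, here is a short Python sketch that pulls these keys out of a JSON response. The sample body is purely illustrative, with placeholder values; it is not the actual output of the query above.

```python
import json

# Illustrative response body only: the values are placeholders, not real Solr output.
SAMPLE_BODY = """
{
  "responseHeader": {"status": 0, "QTime": 1, "params": {"q": "*:*", "fq": "name:0.45"}},
  "response": {"numFound": 1, "start": 0, "docs": [{"id": "example-id"}]}
}
"""


def summarize(body):
    """Extract the header and result fields described above from a response body."""
    data = json.loads(body)
    header = data["responseHeader"]
    result = data["response"]
    return {
        "ok": header["status"] == 0,       # status 0 means the query succeeded
        "qtime_ms": header["QTime"],       # time Solr spent on the query
        "num_found": result["numFound"],   # total matching documents
        "returned": len(result["docs"]),   # documents in this page of results
    }


print(summarize(SAMPLE_BODY))
```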
Now, another operation used very frequently in solr applications is the update operation.
Let’s try changing a field in the above document by setting its “directed_by” value to Gerard Butler. To do so, we will prepare a file with the information we need to change, as shown below:
We use the “set” modifier to replace the value of an existing field and the “add” modifier to add a new value to a field of the document. Run the command below to update the document:
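As a rough illustration of what such an update looks like, the Python sketch below builds a JSON update body. It assumes the JSON update format rather than whichever format the original file used, and "example-id" is a placeholder for the real document id; `atomic_update` is a hypothetical helper name.

```python
import json


def atomic_update(doc_id, field, value, modifier="set"):
    """Build a one-document update body using the "set" or "add" modifier."""
    if modifier not in ("set", "add"):
        raise ValueError("this sketch only covers the 'set' and 'add' modifiers")
    # The id selects the document; the nested dict tells Solr how to change the field.
    return [{"id": doc_id, field: {modifier: value}}]


body = json.dumps(atomic_update("example-id", "directed_by", "Gerard Butler"))
print(body)
# [{"id": "example-id", "directed_by": {"set": "Gerard Butler"}}]
```

Sending this body to the core's /update handler with a Content-Type of application/json, followed by a commit, would apply the change.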
The output will be as below:
Lastly, there are several ways to delete documents in Solr. One way is to delete documents by id. For this, we list the IDs of the documents to be deleted, each in its own <id></id> element, between the <delete></delete> tags and save the result as an XML file:
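A delete-by-id file can be generated with a few lines of Python, as in the sketch below. The ids are placeholders and `delete_by_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(ids):
    """Build a <delete> message with one <id> element per document to remove."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


print(delete_by_ids(["example-id-1", "example-id-2"]))
# <delete><id>example-id-1</id><id>example-id-2</id></delete>
```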
In some cases, we need to delete documents that have a certain field-value pair. This can be achieved by specifying the name and value of the field within the <query></query> tag pair, placing that inside the <delete></delete> tags, and saving it as an XML file. This operation deletes all the documents that have that field-value pair.
Finally, in some cases we need to delete all documents from an index. To do that, we need to create an XML file as follows:
In conclusion, we have covered the most fundamental Solr operations needed for a basic Solr setup for an application.
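Both query-based deletes share one message shape, so a single helper can sketch them in Python. The field and value in the second call are illustrative and `delete_by_query` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def delete_by_query(query="*:*"):
    """Build a <delete> message for every document matching `query`.

    The default "*:*" matches all documents, so it empties the index.
    """
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_query())               # <delete><query>*:*</query></delete>
print(delete_by_query("genre:Drama"))  # <delete><query>genre:Drama</query></delete>
```

As with any update, the deletion only becomes visible to searches after a commit.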