• Home
  • /
  • Solr
  • /
  • The Standard Query Parser – Ultimate Solr Guide

The Standard Query Parser – Ultimate Solr Guide

Spread the love

Solr’s default Query Parser is also known as the “lucene” parser.

The key advantage of the standard query parser is that it supports a robust and fairly intuitive syntax allowing you to create a variety of structured queries. The largest disadvantage is that it’s very intolerant of syntax errors, as compared with something like the DisMax query parser which is designed to throw as few errors as possible.

Standard Query Parser Parameters

In addition to the Common Query ParametersFaceting ParametersHighlighting Parameters, and MoreLikeThis Parameters, the standard query parser supports the parameters described in the table below.qDefines a query using standard query syntax. This parameter is mandatory.q.opSpecifies the default operator for query expressions. Possible values are “AND” or “OR”.dfSpecifies a default searchable field.sowSplit on whitespace. If set to true, text analysis is invoked separately for each individual whitespace-separated term. The default is false; whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g., multi-word synonyms and shingles.

Default parameter values are specified in solrconfig.xml, or overridden by query-time values in the request.

Standard Query Parser Response

By default, the response from the standard query parser contains one <result> block, which is unnamed. If the debug parameter is used, then an additional <lst> block will be returned, using the name “debug”. This will contain useful debugging info, including the original query string, the parsed query string, and explain info for each document in the <result> block. If the explainOther parameter is also used, then additional explain info will be provided for all the documents matching that query.

Specifying Terms for the Standard Query Parser

A query to the standard query parser is broken up into terms and operators. There are two types of terms: single terms and phrases.

  • A single term is a single word such as “test” or “hello”
  • A phrase is a group of words surrounded by double quotes such as “hello dolly”

Term Modifiers

Solr supports a variety of term modifiers that add flexibility or precision, as needed, to searches. These modifiers include wildcard characters, characters for making a search “fuzzy” or more general, and so on. The sections below describe these modifiers in detail.

Wildcard Searches

Solr’s standard query parser supports single and multiple character wildcard searches within single terms. Wildcard characters can be applied to single terms, but not to search phrases.

Wildcard Search TypeSpecial CharacterExample
Single character (matches a single character)?The search string te?t would match both test and text.
Multiple characters (matches zero or more sequential characters)*The wildcard search: tes* would match test, testing, and tester. You can also use wildcard characters in the middle of a term. For example: te*t would match test and text. *est would match pest and test.

Fuzzy Searches

Solr’s standard query parser supports fuzzy searches based on the Damerau-Levenshtein Distance or Edit Distance algorithm. Fuzzy searches discover terms that are similar to a specified term without necessarily being an exact match. To perform a fuzzy search, use the tilde ~ symbol at the end of a single-word term. For example, to search for a term similar in spelling to “roam,” use the fuzzy search:

roam~

This search will match terms like roams, foam, & foams. It will also match the word “roam” itself.

An optional distance parameter specifies the maximum number of edits allowed, between 0 and 2, defaulting to 2. For example:

roam~1

This will match terms like roams & foam – but not foams since it has an edit distance of “2”

Proximity Searches

A proximity search looks for terms that are within a specific distance from one another.

To perform a proximity search, add the tilde character ~ and a numeric value to the end of a search phrase. For example, to search for a “apache” and “jakarta” within 10 words of each other in a document, use the search:

"jakarta apache"~10

The distance referred to here is the number of term movements needed to match the specified phrase. In the example above, if “apache” and “jakarta” were 10 spaces apart in a field, but “apache” appeared before “jakarta”, more than 10 term movements would be required to move the terms together and position “apache” to the right of “jakarta” with a space in between.

Existence Searches

An existence search for a field matches all documents where a value exists for that field. To query for a field existing, simply use a wildcard instead of a term in the search.

field:*

A field will be considered to “exist” if it has any value, even values which are often considered “not existent”. (e.g., NaN"", etc.)

Range Searches

A range search specifies a range of values for a field (a range with an upper bound and a lower bound). The query matches documents whose values for the specified field or fields fall within the range. Range queries can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically, except on numeric fields. For example, the range query below matches all documents whose popularity field has a value between 52 and 10,000, inclusive.

popularity:[52 TO 10000]

Range queries are not limited to date fields or even numerical fields. You could also use range queries with non-date fields:

title:{Aida TO Carmen}

This will find all documents whose titles are between Aida and Carmen, but not including Aida and Carmen.

The brackets around a query determine its inclusiveness.

  • Square brackets [ & ] denote an inclusive range query that matches values including the upper and lower bound.
  • Curly brackets { & } denote an exclusive range query that matches values between the upper and lower bounds, but excluding the upper and lower bounds themselves.
  • You can mix these types so one end of the range is inclusive and the other is exclusive. Here’s an example: count:{1 TO 10]

Wildcards, *, can also be used for either or both endpoints to specify an open-ended range query. This is a divergence from Lucene’s Classic Query Parser.

  • field:[* TO 100] finds all field values less than or equal to 100.
  • field:[100 TO *] finds all field values greater than or equal to 100.
  • field:[* TO *] finds any document with a value between the effective values of -Infinity and +Infinity for that field type.

Boosting a Term with “^”

Lucene/Solr provides the relevance level of matching documents based on the terms found. To boost a term use the caret symbol ^ with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.

Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for

“jakarta apache” and you want the term “jakarta” to be more relevant, you can boost it by adding the ^ symbol along with the boost factor immediately after the term. For example, you could type:

jakarta^4 apache

This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example:

"jakarta apache"^4 "Apache Lucene"

By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, it could be 0.2).

Constant Score with “^=”

Constant score queries are created with <query_clause>^=<score>, which sets the entire clause to the specified score for any documents matching that clause. This is desirable when you only care about matches for a particular clause and don’t want other relevancy factors such as term frequency (the number of times the term appears in the field) or inverse document frequency (a measure across the whole index for how rare a term is in a field).

Example:

(description:blue OR color:blue)^=1.0 text:shoes

Querying Specific Fields

Data indexed in Solr is organized in fields, which are defined in the Solr Schema. Searches can take advantage of fields to add precision to queries. For example, you can search for a term only in a specific field, such as a title field.

The Schema defines one field as a default field. If you do not specify a field in a query, Solr searches only the default field. Alternatively, you can specify a different field or a combination of fields in a query.

To specify a field, type the field name followed by a colon “:” and then the term you are searching for within the field.

For example, suppose an index contains two fields, title and text,and that text is the default field. If you want to find a document called “The Right Way” which contains the text “don’t go this way,” you could include either of the following terms in your search query:

title:"The Right Way" AND text:go

title:"Do it right" AND go

Since text is the default field, the field indicator is not required; hence the second query above omits it.

The field is only valid for the term that it directly precedes, so the query title:Do it right will find only “Do” in the title field. It will find “it” and “right” in the default field (in this case the text field).

Grouping Terms to Form Sub-Queries

Lucene/Solr supports using parentheses to group clauses to form sub-queries. This can be very useful if you want to control the Boolean logic for a query.

The query below searches for either “jakarta” or “apache” and “website”:

(jakarta OR apache) AND website

This adds precision to the query, requiring that the term “website” exist, along with either term “jakarta” and “apache.”

Grouping Clauses within a Field

To apply two or more Boolean operators to a single field in a search, group the Boolean clauses within parentheses. For example, the query below searches for a title field that contains both the word “return” and the phrase “pink panther”:

title:(+return +"pink panther")

Comments in Queries

C-Style comments are supported in query strings.

Example:

"jakarta apache" /* this is a comment in the middle of a normal query string */ OR jakarta

Comments may be nested.

Specifying Dates and Times

Queries against date based fields must use the appropriate date formating. Queries for exact date values will require quoting or escaping since : is the parser syntax used to denote a field query:

  • createdate:1976-03-06T23\:59\:59.999Z
  • createdate:"1976-03-06T23:59:59.999Z"
  • createdate:[1976-03-06T23:59:59.999Z TO *]
  • createdate:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]
  • timestamp:[* TO NOW]
  • pubdate:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]
  • createdate:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]
  • createdate:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]

So, this is it for today. Stay tuned for more such posts in future.