The DisMax Query Parser – Ultimate Solr Guide

The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).

In general, the DisMax query parser’s interface is more like that of Google than the interface of the ‘lucene’ Solr query parser. This similarity makes DisMax the appropriate query parser for many consumer applications. It accepts a simple syntax, and it rarely produces error messages.

The DisMax query parser supports an extremely simplified subset of the Lucene QueryParser syntax. As in Lucene, quotes can be used to group phrases, and + and - can be used to mark mandatory and prohibited clauses. All other Lucene query parser special characters (except AND and OR) are escaped to simplify the user experience. The DisMax query parser takes responsibility for building a good query from the user’s input, using Boolean clauses that contain DisMax queries across the fields and boosts specified by the user. It also lets the Solr administrator provide additional boosting queries, boosting functions, and filtering queries to influence the outcome of all searches. These options can all be specified as default parameters for the request handler in the solrconfig.xml file or overridden in the Solr query URL.

Interested in the technical concept behind the DisMax name? DisMax stands for Maximum Disjunction. Here’s a definition of a Maximum Disjunction or “DisMax” query:

A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.

Whether or not you remember this explanation, do remember that the DisMax Query Parser was primarily designed to be easy to use and to accept almost any input without returning an error.

DisMax Query Parser Parameters

In addition to the common request parameters, highlighting parameters, and simple facet parameters, the DisMax query parser supports the parameters described below. Like the standard query parser, the DisMax query parser allows default parameter values to be specified in solrconfig.xml, or overridden by query-time values in the request.

The sections below explain these parameters in detail.

q Parameter

The q parameter defines the main “query” constituting the essence of the search. The parameter supports raw input strings provided by users with no special escaping. The + and - characters are treated as “mandatory” and “prohibited” modifiers for terms. Text wrapped in balanced quote characters (for example, "San Jose") is treated as a phrase. Any query containing an odd number of quote characters is evaluated as if there were no quote characters at all.
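The quote rule can be illustrated with a small Python sketch. It only mirrors the behavior described above and is not Solr's actual code; the function name normalize_quotes is made up for this example.

```python
def normalize_quotes(user_query: str) -> str:
    """Mimic the rule described above: if the double quotes are unbalanced
    (odd count), treat the query as if it contained no quotes at all."""
    if user_query.count('"') % 2 == 1:
        return user_query.replace('"', "")
    return user_query


# Balanced quotes are kept, so "San Jose" stays a phrase.
print(normalize_quotes('"San Jose" hotels'))
# A single stray quote is dropped and every word becomes a plain term.
print(normalize_quotes('"San Jose hotels'))
```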

q.alt Parameter

If specified, the q.alt parameter defines a query (which by default will be parsed using standard query parsing syntax) that is used when the main q parameter is not specified or is blank. The q.alt parameter comes in handy when you need something like a query to match all documents (don’t forget &rows=0 for that one!) in order to get collection-wide faceting counts.
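As a rough illustration, the Python snippet below builds such a request: no q, a match-all q.alt, rows=0, and one facet field. The host, the collection name mycollection, and the field name category are assumptions for the example, not values from this guide.

```python
from urllib.parse import urlencode

# Request parameters for collection-wide facet counts.
# q is deliberately omitted so that q.alt is used instead.
facet_params = [
    ("defType", "dismax"),
    ("q.alt", "*:*"),             # match every document
    ("rows", "0"),                # return counts only, no documents
    ("facet", "true"),
    ("facet.field", "category"),  # hypothetical field name
]

facet_query_string = urlencode(facet_params)

# Hypothetical host and collection name.
print("http://localhost:8983/solr/mycollection/select?" + facet_query_string)
```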

qf (Query Fields) Parameter

The qf parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field’s importance in the query. For example, the query below:

qf="fieldOne^2.3 fieldTwo fieldThree^0.4"

assigns fieldOne a boost of 2.3, leaves fieldTwo with the default boost (because no boost factor is specified), and assigns fieldThree a boost of 0.4. These boost factors make matches in fieldOne much more significant than matches in fieldTwo, which in turn are much more significant than matches in fieldThree.
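To make the format concrete, here is a minimal Python sketch that splits a qf value into per-field boosts. The helper parse_qf is hypothetical (it is not part of Solr), and it assumes the default boost is 1.0, which is Lucene's usual default.

```python
def parse_qf(qf: str, default_boost: float = 1.0) -> dict:
    """Split a qf value such as 'fieldOne^2.3 fieldTwo' into {field: boost}.

    Fields without an explicit '^boost' suffix get default_boost
    (assumed here to be 1.0).
    """
    boosts = {}
    for token in qf.split():
        field, caret, boost = token.partition("^")
        boosts[field] = float(boost) if caret else default_boost
    return boosts


print(parse_qf("fieldOne^2.3 fieldTwo fieldThree^0.4"))
```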

mm (Minimum Should Match) Parameter

When processing queries, Lucene/Solr recognizes three types of clauses: mandatory, prohibited, and “optional” (also known as “should” clauses). By default, all words or phrases specified in the q parameter are treated as “optional” clauses unless they are preceded by a “+” or a “-”. When dealing with these “optional” clauses, the mm parameter makes it possible to say that a certain minimum number of those clauses must match. The DisMax query parser offers great flexibility in how the minimum number can be specified.

The table below explains the various ways that mm values can be specified.

| Syntax | Example | Description |
| --- | --- | --- |
| Positive integer | 3 | Defines the minimum number of clauses that must match, regardless of how many clauses there are in total. |
| Negative integer | -2 | Sets the minimum number of matching clauses to the total number of optional clauses, minus this value. |
| Percentage | 75% | Sets the minimum number of matching clauses to this percentage of the total number of optional clauses. The number computed from the percentage is rounded down and used as the minimum. |
| Negative percentage | -25% | Indicates that this percent of the total number of optional clauses can be missing. The number computed from the percentage is rounded down before being subtracted from the total to determine the minimum number. |
| An expression beginning with a positive integer followed by a > or < sign and another value | 3<90% | Defines a conditional expression indicating that if the number of optional clauses is equal to (or less than) the integer, they are all required, but if it’s greater than the integer, the specification applies. In this example: if there are 1 to 3 clauses they are all required, but for 4 or more clauses only 90% are required. |
| Multiple conditional expressions involving > or < signs | 2<-25% 9<-3 | Defines multiple conditions, each one being valid only for numbers greater than the one before it. In this example, if there are 1 or 2 clauses, then both are required. If there are 3-9 clauses, all but 25% are required. If there are more than 9 clauses, all but three are required. |

When specifying mm values, keep in mind the following:

  • When dealing with percentages, negative values can be used to get different behavior in edge cases. 75% and -25% mean the same thing when dealing with 4 clauses, but with 5 clauses 75% means 3 are required (3.75 rounded down), while -25% means 4 are required (1.25 rounded down to 1, then subtracted from 5).
  • If the calculations based on the parameter arguments determine that no optional clauses are needed, the usual rules about Boolean queries still apply at search time. (That is, a Boolean query containing no required clauses must still match at least one optional clause.)
  • No matter what number the calculation arrives at, Solr will never use a value greater than the number of optional clauses, or a value less than 1.
  • When searching across multiple fields that are configured with different query analyzers, the number of optional clauses may differ between the fields. In such a case, the value specified by mm applies to the maximum number of optional clauses. For example, if a query clause is treated as a stopword for one of the fields, the number of optional clauses for that field will be smaller than for the other fields. A query with such a stopword clause would not return a match in that field if mm is set to 100%, because the removed clause does not count as matched.

The default value of mm is 0% (all clauses optional), unless q.op is specified as “AND”, in which case mm defaults to 100% (all clauses required).
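The mm rules are easier to follow when worked through in code. The Python sketch below follows the table and notes in this section, including the lower bound of 1 stated there; it is an illustration, not Solr's implementation. The function name min_should_match is made up, and only the < form of conditional expressions is handled, since that is the form used in the examples.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Return how many optional clauses must match for a given mm spec."""

    def simple(n: int, part: str) -> int:
        # Percentages are rounded down using integer division.
        if part.endswith("%"):
            percent = int(part[:-1])
            portion = n * abs(percent) // 100
            # A negative percentage says how many clauses may be missing.
            return n - portion if percent < 0 else portion
        value = int(part)
        # A negative integer is subtracted from the total.
        return n + value if value < 0 else value

    n = optional_clauses
    if n <= 0:
        return 0
    result = n  # when no condition applies, every clause is required
    spec = spec.strip()
    if "<" in spec:
        # Conditions are checked left to right, e.g. "2<-25% 9<-3".
        for condition in spec.split():
            upper, _, part = condition.partition("<")
            if n <= int(upper):
                break
            result = simple(n, part)
    else:
        result = simple(n, spec)
    # Clamp to the range described in the notes above: 1..n.
    return max(1, min(result, n))


for spec in ("3", "-2", "75%", "-25%", "3<90%", "2<-25% 9<-3"):
    print(spec, [min_should_match(n, spec) for n in (2, 4, 5, 10)])
```

With 5 optional clauses, for instance, the sketch gives 3 for 75% and 4 for -25%, matching the edge case described in the notes.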

pf (Phrase Fields) Parameter

Once the list of matching documents has been identified using the fq and qf parameters, the pf parameter can be used to “boost” the score of documents in cases where all of the terms in the q parameter appear in close proximity.

The format is the same as that used by the qf parameter: a list of fields and “boosts” to associate with each of them when making phrase queries out of the entire q parameter.

ps (Phrase Slop) Parameter

The ps parameter specifies the amount of “phrase slop” to apply to queries specified with the pf parameter. Phrase slop is the number of positions one token needs to be moved in relation to another token in order to match a phrase specified in a query.

qs (Query Phrase Slop) Parameter

The qs parameter specifies the amount of slop permitted on phrase queries explicitly included in the user’s query string with the qf parameter. As explained above, slop refers to the number of positions one token needs to be moved in relation to another token in order to match a phrase specified in a query.

tie (Tie Breaker) Parameter

The tie parameter specifies a float value (which should be something much less than 1) to use as tiebreaker in DisMax queries.

When a term from the user’s input is tested against multiple fields, more than one field may match. If so, each field will generate a different score based on how common that word is in that field (for each document relative to all other documents). The tie parameter lets you control how much the final score of the query will be influenced by the scores of the lower scoring fields compared to the highest scoring field.

A value of “0.0” (the default) makes the query a pure “disjunction max query”: that is, only the maximum scoring subquery contributes to the final score. A value of “1.0” makes the query a pure “disjunction sum query”, where it doesn’t matter which subquery scores highest, because the final score will be the sum of the subquery scores. Typically a low value, such as 0.1, is useful.
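The effect of tie can be sketched in a few lines of Python, following the DisMax definition quoted earlier: the best subquery score, plus tie times the scores of the other matching subqueries. The function name and the sample scores are invented for illustration.

```python
def dismax_score(subquery_scores, tie=0.0):
    """Combine per-field scores for one term the way a DisMax query does:
    the best score, plus tie times the sum of the remaining scores."""
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    return best + tie * (sum(subquery_scores) - best)


field_scores = [0.8, 0.5, 0.2]  # made-up scores from three matching fields

print(dismax_score(field_scores, tie=0.0))  # only the best field counts
print(dismax_score(field_scores, tie=0.1))  # other fields add a small increment
print(dismax_score(field_scores, tie=1.0))  # all field scores are summed
```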

bq (Boost Query) Parameter

The bq parameter specifies an additional query whose clauses are added to the user’s main query as optional clauses that influence the score. For example, if you wanted to add a boost for documents that are in a particular category, you could use:

q=cheese
bq=category:food^10

You can specify multiple bq parameters, which will each be added as separate clauses with separate boosts.

q=cheese
bq=category:food^10
bq=category:deli^5

Using the bq parameter in this way is functionally equivalent to combining your q and bq parameters into a single larger Boolean query, where the (original) q parameter is “mandatory” and the other clauses are optional:

q=(+cheese category:food^10 category:deli^5)

The only difference between the above examples is that using the bq parameter allows you to specify these extra clauses independently of the main query (for example, as configuration defaults).
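When a client sends several bq values, the parameter is simply repeated in the request. The Python sketch below shows one way to do that with the standard library. It reuses the q and bq values from the examples above; the variable names are arbitrary.

```python
from urllib.parse import urlencode

# A list of pairs (rather than a dict) lets the bq key appear twice.
boost_params = [
    ("defType", "dismax"),
    ("q", "cheese"),
    ("bq", "category:food^10"),
    ("bq", "category:deli^5"),
]

boost_query_string = urlencode(boost_params)
print(boost_query_string)
```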

bf (Boost Functions) Parameter

The bf parameter specifies functions (with optional query boost) that will be used to construct FunctionQueries which will be added to the user’s main query as optional clauses that will influence the score. Any function supported natively by Solr can be used, along with a boost value. For example:

q=cheese
bf=div(1,sum(1,price))^1.5

Specifying functions with the bf parameter is essentially just shorthand for using the bq parameter (with the same shortcomings) combined with the {!func} parser — with the addition of the simplified “query boost” syntax.

For example, the two bf parameters listed below are completely equivalent to the two bq parameters that follow them:

bf=div(sales_rank,ms(NOW,release_date))
bf=div(1,sum(1,price))^1.5
bq={!func}div(sales_rank,ms(NOW,release_date))
bq={!lucene}( {!func v='div(1,sum(1,price))'} )^1.5
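To get a feel for what a function such as div(1,sum(1,price))^1.5 contributes, the Python sketch below evaluates it for a few prices. It assumes the boost simply multiplies the function value and ignores any other scoring factors Solr may apply, so treat the numbers as an approximation. The helper name price_boost is made up.

```python
def price_boost(price: float, boost: float = 1.5) -> float:
    """Approximate the value added by bf=div(1,sum(1,price))^1.5.

    div(1, sum(1, price)) is 1 / (1 + price); the ^1.5 boost is assumed
    to scale that value.
    """
    return boost * (1.0 / (1.0 + price))


# Cheaper items receive a larger additive boost than expensive ones.
for price in (0, 9, 99):
    print(price, round(price_boost(price), 4))
```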

That’s it for today. Stay tuned for more informative posts like this one.