<?xml version="1.0"?>
<oembed><version>1.0</version><provider_name>Aeologic Blog</provider_name><provider_url>https://www.aeologic.com/blog</provider_url><title>Spell Checking - Ultimate Solr Guide - Aeologic Blog</title><type>rich</type><width>600</width><height>338</height><html>&lt;blockquote class="wp-embedded-content" data-secret="Cw9ojylEtV"&gt;&lt;a href="https://www.aeologic.com/blog/ultimate-solr-guide16-spell-checking/"&gt;Spell Checking &#x2013; Ultimate Solr Guide&lt;/a&gt;&lt;/blockquote&gt;&lt;iframe sandbox="allow-scripts" security="restricted" src="https://www.aeologic.com/blog/ultimate-solr-guide16-spell-checking/embed/#?secret=Cw9ojylEtV" width="600" height="338" title="&#x201C;Spell Checking &#x2013; Ultimate Solr Guide&#x201D; &#x2014; Aeologic Blog" data-secret="Cw9ojylEtV" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" class="wp-embedded-content"&gt;&lt;/iframe&gt;&lt;script&gt;
/*! This file is auto-generated */
!function(d,l){"use strict";l.querySelector&amp;&amp;d.addEventListener&amp;&amp;"undefined"!=typeof URL&amp;&amp;(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&amp;&amp;!/[^a-zA-Z0-9]/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret="'+t.secret+'"]'),o=l.querySelectorAll('blockquote[data-secret="'+t.secret+'"]'),c=new RegExp("^https?:$","i"),i=0;i&lt;o.length;i++)o[i].style.display="none";for(i=0;i&lt;a.length;i++)s=a[i],e.source===s.contentWindow&amp;&amp;(s.removeAttribute("style"),"height"===t.message?(1e3&lt;(r=parseInt(t.value,10))?r=1e3:~~r&lt;200&amp;&amp;(r=200),s.height=r):"link"===t.message&amp;&amp;(r=new URL(s.getAttribute("src")),n=new URL(t.value),c.test(n.protocol))&amp;&amp;n.host===r.host&amp;&amp;l.activeElement===s&amp;&amp;(d.top.location.href=t.value))}},d.addEventListener("message",d.wp.receiveEmbedMessage,!1),l.addEventListener("DOMContentLoaded",function(){for(var e,t,s=l.querySelectorAll("iframe.wp-embedded-content"),r=0;r&lt;s.length;r++)(t=(e=s[r]).getAttribute("data-secret"))||(t=Math.random().toString(36).substring(2,12),e.src+="#?secret="+t,e.setAttribute("data-secret",t)),e.contentWindow.postMessage({message:"ready",secret:t},"*")},!1)))}(window,document);
//# sourceURL=https://www.aeologic.com/blog/wp-includes/js/wp-embed.min.js
&lt;/script&gt;
</html><thumbnail_url>https://www.aeologic.com/blog/wp-content/uploads/2020/06/Spell-Checking-in-Solr-1.png</thumbnail_url><thumbnail_width>1080</thumbnail_width><thumbnail_height>622</thumbnail_height><description>The SpellCheck component is designed to provide inline query suggestions based on other, similar, terms. The basis for these suggestions can be terms in a field in Solr, externally created text files, or fields in other Lucene indexes.
Configuring the SpellCheckComponent
Define Spell Check in solrconfig.xml
The first step is to specify the source of terms in solrconfig.xml. There are three approaches to spell checking in Solr, discussed below.
IndexBasedSpellChecker
The IndexBasedSpellChecker uses a Solr index as the basis for a parallel index used for spell checking. It requires defining a field as the basis for the index terms; a common practice is to copy terms from some fields (such as title, body, etc.) to another field created for spell checking. Here is a simple example of configuring solrconfig.xml with the IndexBasedSpellChecker:
The first element defines a searchComponent that uses solr.SpellCheckComponent. The classname is the specific implementation of the SpellCheckComponent, in this case solr.IndexBasedSpellChecker. Defining the classname is optional; if it is not defined, it defaults to IndexBasedSpellChecker. The spellcheckIndexDir defines the location of the directory that holds the spellcheck index, while the field defines the source field (defined in the Schema) for spell check terms. When choosing a field for the spellcheck index, it&#x2019;s best to avoid a heavily processed field to get more accurate results. If the field has many word variations from processing synonyms and/or stemming, the dictionary will be built with those variations in addition to the valid spelling data.
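A minimal sketch of an IndexBasedSpellChecker definition along these lines, modeled on the example in the Apache Solr Reference Guide. The field name content and the directory ./spellchecker are placeholder assumptions, not values taken from this post; substitute a lightly analyzed field from your own schema:
<![CDATA[
```xml
<!-- Sketch only: "content" and "./spellchecker" are assumed placeholder values -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <!-- Optional: IndexBasedSpellChecker is used when classname is omitted -->
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- Directory that holds the parallel spellcheck index -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Source field (defined in the schema) for spell check terms -->
    <str name="field">content</str>
    <!-- Rebuild the spellcheck index on every commit -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```
]]>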
Finally, buildOnCommit defines whether to build the spell check index at every commit (that is, every time new documents are added to the index). It is optional and defaults to false, so it can be omitted if you do not want the index rebuilt on every commit.
DirectSolrSpellChecker
The DirectSolrSpellChecker uses terms from the Solr index directly, without building a parallel index as the IndexBasedSpellChecker does. This spell checker has the benefit of not having to be built regularly, meaning that its terms are always up to date with the terms in the index. Here is how this might be configured in solrconfig.xml:
When choosing a field to query for this spell checker, you want one that has relatively little analysis performed on it (particularly analysis such as stemming). Note that you need to specify a field to use for the suggestions, so, as with the IndexBasedSpellChecker, you may want to copy data from fields like title, body, etc., to a field dedicated to providing spelling suggestions.
Many of the parameters relate to how this spell checker should query the index for term suggestions. The distanceMeasure defines the metric to use during the spell check query. The value &#x201C;internal&#x201D; uses the default Levenshtein metric, which is the same metric used with the other spell checker implementations. Because this spell checker queries the main index, you may want to limit how often it does so to avoid performance conflicts with user queries. The accuracy setting defines the threshold for a valid suggestion, while maxEdits defines the number of changes to the term to allow. Since most spelling mistakes are only 1 letter off, setting this to 1 will reduce the number of possible suggestions (the default, however, is 2); the value can only be 1 or 2. minPrefix defines the minimum number of leading characters the terms must share.
For example, setting this to 1 means that all spelling suggestions will start with the same letter as the query term. The maxInspections parameter defines the maximum number of possible matches to review before returning results; the default is 5. minQueryLength defines how many characters must be in the query before suggestions are provided; the default is 4. maxQueryLength enables the spell checker to skip over very long query terms, which can avoid expensive operations or exceptions; by default, there is no limit on term length.
At query time, the spell checker first checks incoming query words by looking them up in the index. Only query words that are absent from the index or too rare (below maxQueryFrequency) are considered misspelled and used for finding suggestions; words more frequent than maxQueryFrequency bypass the spell checker unchanged. After suggestions are found for every misspelled word, they are filtered by frequency, with thresholdTokenFrequency as the minimum. These parameters (maxQueryFrequency and thresholdTokenFrequency) can be a percentage (such as .01, or 1%) or an absolute value (such as 4).
FileBasedSpellChecker
The FileBasedSpellChecker uses an external file as a spelling dictionary. This can be useful if using Solr as a spelling server, or if spelling suggestions don&#x2019;t need to be based on actual terms in the index. In solrconfig.xml, you would define the searchComponent as follows:
The differences here are the use of sourceLocation to define the location of the file of terms and the use of characterEncoding to define the encoding of the terms file.
WordBreakSolrSpellChecker
WordBreakSolrSpellChecker offers suggestions by combining adjacent query terms and/or breaking terms into multiple words. It is a SpellCheckComponent enhancement, leveraging Lucene&#x2019;s WordBreakSpellChecker.
It can detect spelling errors resulting from misplaced whitespace without the use of shingle-based dictionaries, and it provides collation support for word-break errors, including cases where the user has a mix of single-word spelling errors and word-break errors in the same query. It also provides shard support. Here is how it might be configured in solrconfig.xml:
Some of the parameters will be familiar from the discussion of the other spell checkers, such as name, classname, and field. New for this spell checker are combineWords, which defines whether words should be combined in a dictionary search (default is true); breakWords, which defines whether words should be broken during a dictionary search (default is true); and maxChanges, an integer that defines how many times the spell checker should check collation possibilities against the index (default is 10). The spell checker can be configured alongside a traditional checker (e.g., DirectSolrSpellChecker); the results are combined, and collations can contain a mix of corrections from both spell checkers.
Add It to a Request Handler
Queries will be sent to a RequestHandler. If every request should generate a suggestion, then you would add the following to the requestHandler that you are using:
One of the possible parameters is the spellcheck.dictionary to use, and multiple dictionaries can be defined. With multiple dictionaries, all specified dictionaries are consulted and their results are interleaved. Collations are created with combinations from the different spell checkers, with care taken that multiple overlapping corrections do not occur in the same collation. Here is an example with multiple dictionaries:
Spell Check Parameters
The SpellCheck component accepts the parameters described below.
spellcheck
This parameter turns on SpellCheck [&#x2026;]</description></oembed>
