In-Place Updates

 

In-place updates are very similar to atomic updates; in some sense, they are a subset of atomic updates. In a regular atomic update, the entire document is re-indexed internally when the update is applied. With an in-place update, only the fields being updated are affected and the rest of the document is not re-indexed internally. Hence, the efficiency of an in-place update is unaffected by the size of the document being updated (i.e., number of fields, size of fields, etc.). Apart from these internal differences, there is no functional difference between atomic updates and in-place updates.

An atomic update operation is performed using this approach only when the fields to be updated meet these three conditions:

  • are non-indexed (indexed="false"), non-stored (stored="false"), single valued (multiValued="false") numeric docValues (docValues="true") fields;
  • the _version_ field is also a non-indexed, non-stored single valued docValues field; and,
  • copy targets of updated fields, if any, are also non-indexed, non-stored single valued numeric docValues fields.
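
The three conditions above can be read as a simple predicate over schema attributes. Here is a minimal sketch in Python; it is not Solr's actual implementation. The `FieldDef` class and the `can_update_in_place` function are hypothetical names introduced only for illustration, and "numeric" is reduced to a boolean flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldDef:
    """Hypothetical, simplified view of one schema field definition."""
    name: str
    numeric: bool = True
    indexed: bool = False
    stored: bool = False
    multi_valued: bool = False
    doc_values: bool = True
    copy_targets: List["FieldDef"] = field(default_factory=list)


def is_plain_docvalues(f: FieldDef, require_numeric: bool = True) -> bool:
    """True if f is non-indexed, non-stored, single valued and has docValues."""
    if f.indexed or f.stored or f.multi_valued or not f.doc_values:
        return False
    return f.numeric or not require_numeric


def can_update_in_place(updated: List[FieldDef], version_field: FieldDef) -> bool:
    """Apply the three conditions listed above to a set of updated fields."""
    # Condition 2: _version_ is a non-indexed, non-stored,
    # single valued docValues field.
    if not is_plain_docvalues(version_field, require_numeric=False):
        return False
    for f in updated:
        # Condition 1: the updated field itself qualifies.
        if not is_plain_docvalues(f):
            return False
        # Condition 3: every copy target qualifies as well.
        if not all(is_plain_docvalues(t) for t in f.copy_targets):
            return False
    return True
```

If any one of the checks fails, the update would be handled as a regular atomic update rather than in place.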

To use in-place updates, add a modifier to the field that needs to be updated. The value can either be replaced or incremented.

set
Set or replace the field value(s) with the specified value(s). May be specified as a single value.
inc
Increments a numeric value by a specific amount. Must be specified as a single numeric value.
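
To make the two modifiers concrete, here is a small Python sketch that builds the JSON body for such an update. The helper `in_place_update`, the document id `mydoc`, and the field names `price` and `popularity` are all assumptions made for illustration; the sketch also assumes the unique key field is called `id`.

```python
import json


def in_place_update(doc_id, set_fields=None, inc_fields=None):
    """Build one update document using the set and inc modifiers."""
    doc = {"id": doc_id}
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}
    for name, amount in (inc_fields or {}).items():
        # inc must be given a single numeric value.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            raise TypeError(f"inc needs a single numeric value for {name!r}")
        doc[name] = {"inc": amount}
    return doc


# Hypothetical document and field names.
body = json.dumps([in_place_update("mydoc", {"price": 99}, {"popularity": 20})])
print(body)
# [{"id": "mydoc", "price": {"set": 99}, "popularity": {"inc": 20}}]
```

The resulting body would then be sent as JSON to the update handler of the collection.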

Optimistic Concurrency

 

Optimistic Concurrency is a feature of Solr that can be used by client applications which update/replace documents to ensure that the document they are replacing/updating has not been concurrently modified by another client application. This feature works by requiring a _version_ field on all documents in the index, and comparing that to a _version_ specified as part of the update command. By default, Solr’s Schema includes a _version_ field, and this field is automatically added to each new document.

In general, using optimistic concurrency involves the following workflow:

  1. A client reads a document. In Solr, one might retrieve the document with the /get handler to be sure to have the latest version.
  2. A client changes the document locally.
  3. The client resubmits the changed document to Solr, for example with the /update handler.
  4. If there is a version conflict (HTTP error code 409), the client starts the process over.
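
The four steps above can be sketched as a retry loop. This is only an outline in Python under stated assumptions: `fetch`, `modify` and `submit` are hypothetical callables supplied by the caller (for instance thin wrappers around /get and /update), and `max_attempts` is an arbitrary safety limit that is not part of Solr.

```python
def update_with_retry(fetch, modify, submit, max_attempts=5):
    """Run the read / modify / resubmit cycle until it succeeds.

    fetch()      -> current document as a dict (including _version_)
    modify(doc)  -> changed copy of the document
    submit(doc)  -> HTTP status code of the update request
    All three callables are hypothetical hooks supplied by the caller.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        doc = fetch()                 # step 1: read the latest version
        changed = modify(dict(doc))   # step 2: change a local copy
        status = submit(changed)      # step 3: resubmit with _version_
        if status == 409:             # step 4: conflict, start over
            continue
        if status >= 400:
            raise RuntimeError(f"update failed with HTTP {status}")
        return attempt
    raise RuntimeError(f"still conflicting after {max_attempts} attempts")
```

Re-reading the document on every attempt matters here: retrying with the stale _version_ would only produce the same conflict again.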

When the client resubmits a changed document to Solr, the _version_ can be included with the update to invoke optimistic concurrency control. Specific semantics are used to define when the document should be updated or when to report a conflict.

  • If the content in the _version_ field is greater than ‘1’ (e.g., ‘12345’), then the _version_ in the document must match the _version_ in the index.
  • If the content in the _version_ field is equal to ‘1’, then the document must simply exist. In this case, no version matching occurs, but if the document does not exist, the updates will be rejected.
  • If the content in the _version_ field is less than ‘0’ (e.g., ‘-1’), then the document must not exist. In this case, no version matching occurs, but if the document exists, the updates will be rejected.
  • If the content in the _version_ field is equal to ‘0’, then it doesn’t matter if the versions match or if the document exists or not. If it exists, it will be overwritten; if it does not exist, it will be added.
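
The four rules above translate into a small decision function. The following Python sketch only mirrors the rules as stated here and is not taken from Solr's source; the function name `accepts_update` is hypothetical. A result of `False` corresponds to a rejected update (a version conflict).

```python
from typing import Optional


def accepts_update(requested: int, existing: Optional[int]) -> bool:
    """Decide whether an update passes the _version_ check.

    requested: the _version_ value sent with the update.
    existing:  the _version_ currently in the index, or None if the
               document does not exist.
    """
    if requested > 1:
        # Exact match required; a missing document can never match.
        return existing is not None and existing == requested
    if requested == 1:
        return existing is not None   # must exist, no version matching
    if requested < 0:
        return existing is None       # must not exist
    return True                       # 0: no constraint at all
```

For example, `accepts_update(12345, 12345)` is `True`, while `accepts_update(-1, 12345)` is `False` because the document already exists.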

If the document being updated does not include the _version_ field, and atomic updates are not being used, the document will be treated by normal Solr rules, which is usually to discard the previous version.

When using Optimistic Concurrency, clients can include an optional versions=true request parameter to indicate that the new versions of the documents being added should be included in the response. This allows clients to immediately know what the _version_ is of every document added without needing to make a redundant /get request.
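
As a small illustration, the parameter can be appended to whatever update URL the client already uses. The sketch below is plain Python with the standard library; the host `localhost:8983`, the collection name `techproducts` and the helper name `with_versions` are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_versions(update_url: str) -> str:
    """Return update_url with versions=true added to its query string."""
    parts = urlsplit(update_url)
    # Drop any existing versions parameter so it is not duplicated.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "versions"]
    params.append(("versions", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Assumed local Solr instance and collection name.
print(with_versions("http://localhost:8983/solr/techproducts/update?commit=true"))
# http://localhost:8983/solr/techproducts/update?commit=true&versions=true
```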

Following are some examples using versions=true in update requests:

In this example, we have added 2 documents “ddd” and “eee”. Because we added versions=true to the request, the response shows the document version for each document.
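
Once such a response comes back, a client will usually want the versions as an id-to-version mapping. The Python sketch below assumes the JSON response carries an "adds" entry that is a flat list alternating document id and version; that shape is an assumption to check against your own Solr output, and the version numbers in the sample are made-up placeholders, not real Solr values.

```python
import json


def versions_from_response(raw: str) -> dict:
    """Map each added document id to its new _version_.

    Assumes the response carries an "adds" entry that is a flat list
    alternating id, version, id, version, ...
    """
    adds = json.loads(raw).get("adds", [])
    if len(adds) % 2:
        raise ValueError("expected alternating id/version pairs")
    return dict(zip(adds[0::2], adds[1::2]))


# Made-up response body; the version numbers are placeholders.
sample = '{"responseHeader": {"status": 0}, "adds": ["ddd", 1001, "eee", 1002]}'
print(versions_from_response(sample))
# {'ddd': 1001, 'eee': 1002}
```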

In this example, we’ve attempted to update document “ddd” but specified the wrong version in the request: _version_=999999 doesn’t match the document version we just got when we added the document. Solr rejects the update and we get a version conflict error (HTTP error code 409) in response.

Now we’ve sent an update with a value for _version_ that matches the value in the index, and it succeeds. Because we included versions=true in the update request, the response includes the new value of the _version_ field, which differs from the one we sent.
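
The whole sequence (add two documents, fail with a wrong version, succeed with the right one) can be replayed against a toy in-memory model. To be clear, this Python sketch is not Solr and talks to no server: `ToyIndex` is a hypothetical stand-in that only mimics the version rules described earlier, its version numbers are an arbitrary counter, and `foo_s` is a made-up field name.

```python
import itertools


class ToyIndex:
    """In-memory stand-in for an index; not Solr, just the version rules."""

    def __init__(self):
        self._docs = {}
        self._clock = itertools.count(100)  # arbitrary starting version

    def update(self, doc, version=0):
        """Return (http_status, new_version_or_None) for one update."""
        current = self._docs.get(doc["id"])
        existing = current["_version_"] if current else None
        ok = (
            (version > 1 and existing == version)
            or (version == 1 and existing is not None)
            or (version < 0 and existing is None)
            or version == 0
        )
        if not ok:
            return 409, None  # version conflict
        new_version = next(self._clock)
        self._docs[doc["id"]] = {**doc, "_version_": new_version}
        return 200, new_version


index = ToyIndex()
_, v_ddd = index.update({"id": "ddd"})   # added with version 100
_, v_eee = index.update({"id": "eee"})   # added with version 101
print(index.update({"id": "ddd", "foo_s": "wrong version"}, version=999999))
# (409, None)
print(index.update({"id": "ddd", "foo_s": "right version"}, version=v_ddd))
# (200, 102)
```

Note how the successful update returns a new version (102 in this toy run), which is the value a client would need for its next conditional update.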

So, that’s it for today. Stay tuned for another post very soon.