Understanding how search works

Last updated: 28-08-2023

The engine (Elasticsearch)

RERO ILS uses Elasticsearch, a fast and powerful search engine. The key to its performance is that it doesn't query a database directly, but rather text indexes in JSON format, enabling it to find information quickly even among very large quantities of data. Its various mechanisms make it a Google-like search engine, flexible and easy to use for the uninitiated, while offering more advanced functions for technical users as well.

Elasticsearch uses mathematical vectors to assign scores to the resources returned by a query and rank them by relevance.

Data indexing

RERO ILS contains various resources described in JSON. This description follows a schema that defines field names, any sub-elements, allowed values, etc.

See: Data model.

Fields mapping

To be searchable, database fields must be indexed in Elasticsearch, resource by resource. Not all data entered is indexed, in order to avoid search noise. Conversely, some data that does not come from the resource itself can be indexed to improve search capabilities. For example, variants of contributors' names, item barcodes and call numbers are present in the documents index. In the database, an item barcode is present only in the JSON describing the item; in the index, it is also added to the document JSON to enhance search functionality.

For the same document, the JSON in the database (/api/documents/1) and that of the index (/api/documents/?q=pid:1) are therefore different!
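The difference between the two JSON versions can be pictured with a minimal sketch. The function name `build_index_document` and the field names (`document_pid`, `barcode`, `call_number`, `items`) are hypothetical and chosen for illustration only; they do not reproduce the actual RERO ILS indexing code or data model.

```python
import copy


def build_index_document(document, items):
    """Return an enriched copy of `document` for the search index.

    Hypothetical helper: the stored document is left untouched, and the
    barcodes and call numbers of its items are copied into the indexed version.
    """
    indexed = copy.deepcopy(document)
    indexed["items"] = [
        {"barcode": item["barcode"], "call_number": item.get("call_number")}
        for item in items
        if item["document_pid"] == document["pid"]
    ]
    return indexed


# Made-up sample records.
document = {"pid": "1", "title": "Le Petit Prince"}
items = [
    {"pid": "10", "document_pid": "1", "barcode": "1000123", "call_number": "R 843 SAI"},
    {"pid": "11", "document_pid": "2", "barcode": "1000456", "call_number": "R 850 ECO"},
]

indexed = build_index_document(document, items)
print(indexed)
```

In this sketch, searching the index for barcode 1000123 can lead to document 1, although the document record in the database contains no barcode.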

Elasticsearch mappings describe, for each type of resource, which fields are indexed by the search engine and in which form. It can be useful to refer to them when using expert search.
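As an illustration, a mapping is a JSON structure listing each field with its type and, for text fields, an optional analyzer. The fragment below is a hypothetical sketch written as a Python dictionary: the field names and the choice of the `french` analyzer are assumptions and are not taken from the actual RERO ILS mappings. The helper `describe_fields` is also made up for this example.

```python
import json

# Hypothetical mapping fragment (illustrative field names only).
EXAMPLE_MAPPING = {
    "mappings": {
        "properties": {
            "pid": {"type": "keyword"},                        # indexed as is
            "publisher": {"type": "text"},                     # default analyzer
            "title": {"type": "text", "analyzer": "french"},   # language analyzer
        }
    }
}


def describe_fields(mapping):
    """Map each field name to a (type, analyzer) pair.

    Text fields without an explicit analyzer fall back to Elasticsearch's
    default, the standard analyzer; keyword fields are not analyzed.
    """
    described = {}
    for name, spec in mapping["mappings"]["properties"].items():
        if spec["type"] == "text":
            analyzer = spec.get("analyzer", "standard")
        else:
            analyzer = None
        described[name] = (spec["type"], analyzer)
    return described


print(json.dumps(describe_fields(EXAMPLE_MAPPING), indent=2))
```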

Field analysis levels

To understand how search works, it's important to know how database fields are processed by the search engine. RERO ILS takes advantage of Elasticsearch's analysis functions, for example to forgive typing errors or to match a word in the plural with its singular. Fields can be indexed with different levels of analysis. In RERO ILS, the most common are:

  1. keyword (No analysis): Raw, unanalyzed field. Search returns only exact matching terms.
    Example: barcodes, pid, codes, field types.
    Elasticsearch ref.: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html
  2. text (Standard analyzer): Insensitive to case and punctuation marks. Used for most short text fields.
    Example: names of authors, publishers, places of publication, call numbers, etc.
    Elasticsearch ref.: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
  3. text (Standard analyzer + language analyzers): For fields containing natural language (such as document titles and summaries), word stemming is applied for French, English, German and Italian. This neutralizes diacritics, articles, singulars/plurals/verb forms/etc., and also returns terms with the same root (make, making, maker).
    Example: titles and subtitles, abstracts, notes, edition statements, subjects.
    Elasticsearch ref.: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
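The three levels can be imitated with a toy simulation. This is only a sketch under strong assumptions: the stop word list and the suffix rules below are invented for the example and are far cruder than the real Elasticsearch analyzers, which rely on language-specific stemmers.

```python
import re
import unicodedata

# Tiny, made-up stop word list (real analyzers ship much longer ones).
STOPWORDS = {"le", "la", "les", "un", "une", "des", "the", "a", "an"}


def analyze_keyword(value):
    """Level 1: no analysis, the whole value is a single term."""
    return [value]


def analyze_standard(value):
    """Level 2: lowercase and split on punctuation and whitespace."""
    return re.findall(r"\w+", value.lower())


def fold_accents(token):
    """Replace accented letters with their unaccented base letter."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def toy_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in ("ing", "er", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze_language(value):
    """Level 3: standard analysis, then accent folding, stop words, stemming."""
    tokens = (fold_accents(t) for t in analyze_standard(value))
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


print(analyze_keyword("R 843 SAI"))
print(analyze_standard("Saint-Exupéry, Antoine"))
print(analyze_language("Les Étoiles"), analyze_language("une étoile"))
```

With these toy rules, "Les Étoiles" and "une étoile" are reduced to the same term, which is why a search for one finds the other; a keyword field, by contrast, only matches the exact value.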

How indexing works with language analyzers

In the example above, all analyzers neutralize punctuation and case; the French and Italian ones neutralize articles ("un") and accents ("é"); and the French, English and German analyzers stem the word "faire" because their stemming rules treat the letter e at the end of the word as a suffix.
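To compare analyzers on a real Elasticsearch instance, the same text can be submitted to the `_analyze` API once per analyzer. The sketch below only builds the request bodies; sending them requires access to an Elasticsearch server, which is not assumed here. The sample phrase is made up for illustration, and the helper name `build_analyze_requests` is hypothetical. The analyzer names are Elasticsearch's built-in language analyzers.

```python
import json

# Built-in Elasticsearch language analyzers for the four languages
# mentioned in this page.
LANGUAGE_ANALYZERS = ("french", "english", "german", "italian")


def build_analyze_requests(text, analyzers=LANGUAGE_ANALYZERS):
    """Build one `_analyze` request body per analyzer for the same text."""
    return [{"analyzer": name, "text": text} for name in analyzers]


# Made-up sample phrase containing an article, an accent and a final "e".
for body in build_analyze_requests("Un été à faire"):
    print("POST /_analyze", json.dumps(body, ensure_ascii=False))
```

Each response lists the tokens actually stored in the index for that analyzer, which makes differences between languages visible.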

RERO ILS offers two types of search that are adapted to very different uses:

Simple search

  • Covers most use cases for patrons and librarians alike
  • Intuitive Google-like search process
    • Users adapt and refine their search to reduce noise and find relevant results on the first page.
  • Spaces between terms interpreted as AND operator
  • Does not target a specific field, but searches all indexed fields
  • Forgives syntax errors in queries
  • Uses Elasticsearch's simple query string syntax

See: Search for resources (simple search)
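As a rough illustration, a simple search could translate into an Elasticsearch `simple_query_string` query such as the one built below. This is a sketch, not the query RERO ILS actually sends: the function name is hypothetical, and the real query may add field boosts, filters or other options.

```python
import json


def build_simple_search(user_input):
    """Build a request body for a Google-like search on all default fields.

    No "fields" list is given, so Elasticsearch falls back to its default
    fields; "default_operator": "and" makes every term mandatory.
    """
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "default_operator": "and",
            }
        }
    }


print(json.dumps(build_simple_search("petit prince"), indent=2))
```

The `simple_query_string` query ignores invalid parts of the input instead of returning an error, which matches the forgiving behaviour described above.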

Expert search

  • Useful for very complex or precise searches
  • Allows you to target specific fields or sub-fields and combine different queries
  • Search process often requires thoughtful query construction and data model knowledge
  • Spaces between terms interpreted as OR operator
  • Tolerates no syntax errors in queries
  • Uses Elasticsearch's query string syntax

See: Search in expert mode
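By way of illustration, an expert search could be expressed as an Elasticsearch `query_string` query. The sketch below is an assumption about the general shape of such a request, not the query RERO ILS actually builds; the function name and the field names in the sample query (`title`, `contribution.name`) are hypothetical.

```python
import json


def build_expert_search(expert_query):
    """Build a request body using the strict query string syntax.

    "default_operator": "or" means terms separated by spaces are optional
    unless the query combines them explicitly with AND.
    """
    return {
        "query": {
            "query_string": {
                "query": expert_query,
                "default_operator": "or",
            }
        }
    }


# Field-targeted query with an explicit boolean operator (made-up fields).
body = build_expert_search("title:prince AND contribution.name:saint")
print(json.dumps(body, indent=2))
```

Unlike the simple syntax, `query_string` returns an error when the query is malformed, for example when a parenthesis is left unclosed.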
