The engine (Elasticsearch)
RERO ILS uses Elasticsearch, a fast and powerful search engine. The key to its performance is that it does not query the database directly. Instead, it searches indexes built from the resources' JSON records, which lets it find information quickly even in very large amounts of data. Its various mechanisms make it a Google-like search engine: flexible and easy to use for the uninitiated, while also offering more advanced functions for technical users.
Elasticsearch uses mathematical vectors to assign scores to the resources returned by a query and rank them by relevance.
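The idea of vector-based scoring can be illustrated with a toy model. Each resource and the query become term-weight vectors (TF-IDF), and results are ranked by the cosine of the angle between the query vector and each resource vector. This is a simplified sketch for illustration only. It is not RERO ILS code, and in recent versions Elasticsearch's default relevance function (BM25) is more elaborate.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency: 1 + ln(N / df) for each term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: 1.0 + math.log(n / count) for term, count in df.items()}


def to_vector(tokens, idf):
    """Weight each known term by term frequency x idf (unknown terms are dropped)."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (score, index) pairs sorted from most to least relevant."""
    idf = idf_table(docs)
    q = to_vector(query, idf)
    scores = [(cosine(q, to_vector(doc, idf)), i) for i, doc in enumerate(docs)]
    return sorted(scores, reverse=True)
```

For instance, with the tokenized documents `["history", "of", "geneva"]`, `["geneva", "geneva", "lake"]` and `["swiss", "cheese"]`, the query `["geneva", "lake"]` ranks the second document first, the first one second, and the third one last with a score of 0.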
Data indexing
RERO ILS contains various resources described in JSON. This description follows a schema that defines field names, any sub-elements, allowed values, etc.
See: Data model.
Fields mapping
To be searchable, database fields must be indexed in Elasticsearch, resource by resource. Not all data entered is indexed, in order to avoid noise in search results. Conversely, some data that does not come from the resource itself can be indexed to improve search capabilities. For example, variants of contributors' names, or item barcodes and call numbers, are present in the documents index. In the database, an item barcode is present only in the JSON describing the item. In the index, however, it is added to the document JSON to enhance search functionality.
For the same document, the JSON in the database (/api/documents/1) and the JSON in the index (/api/documents/?q=pid:1) are therefore different!
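The difference between the two JSON records can be pictured with a minimal sketch. Before indexing, a copy of the document record is enriched with data taken from its items, while the database record stays untouched. The function name and the field names (`pid`, `document_pid`, `barcode`, `call_number`, `holdings`) are hypothetical assumptions for illustration, not the actual RERO ILS indexer or data model.

```python
import copy


def build_document_index_record(document, items):
    """Hypothetical sketch: copy a document record and attach data from its items.

    `document` is the JSON (dict) stored in the database; `items` are item
    records. Field names are illustrative assumptions, not RERO ILS's schema.
    """
    record = copy.deepcopy(document)  # never modify the database record itself
    record["holdings"] = [
        {"barcode": item.get("barcode"), "call_number": item.get("call_number")}
        for item in items
        # keep only the items attached to this document (hypothetical link field)
        if item.get("document_pid") == document["pid"]
    ]
    return record
```

With a document `{"pid": "1", "title": "Example title"}` and an item `{"document_pid": "1", "barcode": "10000001", "call_number": "A 123"}`, the returned record contains the barcode and call number under `holdings`, so a barcode search can find the document, while the original document dict is left unchanged.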
Elasticsearch mappings describe, for each type of resource, which fields are indexed by the search engine and in which form. It can be useful to refer to them when using expert search.
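For illustration, a mapping is the JSON body that declares each field's type and analyzer. The hypothetical fragment below is written as a Python dict. It combines the analysis levels described in the next section: a `keyword` field, a `text` field using the `standard` analyzer, and a `text` field with one sub-field per language analyzer (Elasticsearch multi-fields, declared under `fields`). Field and sub-field names are assumptions, not the actual RERO ILS mappings. `french`, `english`, `german` and `italian` are Elasticsearch's built-in language analyzers.

```python
# Hypothetical mapping fragment (field names are illustrative, not RERO ILS's real mapping).
MAPPING = {
    "mappings": {
        "properties": {
            "barcode": {"type": "keyword"},  # keyword: exact match only
            "call_number": {"type": "text", "analyzer": "standard"},
            "title": {
                "type": "text",
                "analyzer": "standard",  # 1st analysis: standard analyzer
                "fields": {  # 4 extra analyses, one multi-field per language
                    "fr": {"type": "text", "analyzer": "french"},
                    "en": {"type": "text", "analyzer": "english"},
                    "de": {"type": "text", "analyzer": "german"},
                    "it": {"type": "text", "analyzer": "italian"},
                },
            },
        }
    }
}


def analyses_per_field(mapping):
    """Count how many ways each top-level field is indexed (main field + sub-fields)."""
    props = mapping["mappings"]["properties"]
    return {name: 1 + len(spec.get("fields", {})) for name, spec in props.items()}
```

Here `analyses_per_field(MAPPING)` reports five analyses for `title` and one for each of the other fields.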
Field analysis levels
To understand how search works, it is important to know how database fields are processed by the search engine. RERO ILS takes advantage of Elasticsearch's analysis functions, for example to tolerate typing errors or to match a plural word with its singular. Fields can be indexed with different levels of analysis. In RERO ILS, the most common are:
keyword (no analysis): Raw, unanalyzed field. Search returns only exact matching terms.
Example: barcodes, pid, codes, field types.
Elasticsearch ref.: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html
text (standard analyzer): Insensitive to case and punctuation marks. Used for most short text fields.
Example: names of authors, publishers, places of publication, call numbers, etc.
Elasticsearch ref.: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
text (standard analyzer + language analyzers): For fields containing natural language (such as document titles and summaries), word stemming is applied for French, English, German and Italian. This neutralizes diacritics, articles, singulars/plurals/verb forms, etc., and also returns terms with the same root (make, making, maker).
Example: titles and subtitles, abstracts, notes, edition statements, subjects.
Elasticsearch ref.: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
This mechanism can generate noise in search results, because each such field is indexed 5 times according to 5 different analysis rules (the standard analyzer plus the four language analyzers). Searches with and without diacritics ("résumé" vs. "resume") may therefore return slightly different results, due to the different stemming rules of each language.
In the example above, all analyzers neutralize punctuation and case; the French and Italian ones neutralize articles ("un") and accents ("é"); and the French, English and German analyzers stem the word "faire" because their stemming rules treat the final letter e as a suffix.
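The difference between the keyword and standard levels can be sketched with two toy analyzers. The keyword one keeps the whole value as a single token. The standard-like one lowercases the text and splits it on non-word characters, which is only a rough approximation of Elasticsearch's standard analyzer (the real one follows Unicode text segmentation rules). Language stemming is deliberately left out. All function names are hypothetical.

```python
import re


def keyword_analyzer(text):
    """Keyword level: the whole value is kept as one single, unmodified token."""
    return [text]


def standard_like_analyzer(text):
    """Rough approximation of the standard level: lowercase, then split on
    non-word characters. Accents are kept; stemming is not applied."""
    return re.findall(r"\w+", text.lower())


def matches(query, value, analyzer):
    """A query matches when every token it produces is among the field's tokens."""
    field_tokens = set(analyzer(value))
    return all(token in field_tokens for token in analyzer(query))
```

With these toys, the query `dupont jean` matches the value `Dupont, Jean` at the standard-like level but not at the keyword level, and a partial barcode such as `1000` does not match `10000001` at the keyword level.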
Simple search vs. expert search
RERO ILS offers two types of search that are adapted to very different uses:
Simple search
- Covers most use cases for patrons and librarians alike
- Intuitive, Google-like search process
- Users adapt and refine their search to reduce noise and find relevant results on the first page
- Spaces between terms are interpreted as the AND operator
- Does not target a specific field, but searches all indexed fields
- Forgives syntax errors in queries
- Uses Elasticsearch's simple query string syntax
See: Search for resources (simple search)
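As an illustration of the simple search behaviour listed above, here is a hypothetical request body built in Python with Elasticsearch's `simple_query_string` query. Spaces are combined with AND through `default_operator`, and this query type ignores invalid syntax instead of returning an error. This is a sketch under assumptions, not the query RERO ILS actually sends; any boosts, field lists or filters it may add are not described here.

```python
def simple_search_body(user_input):
    """Hypothetical sketch of a request body for a simple search.

    `simple_query_string` ignores invalid parts of the query instead of
    returning an error. No "fields" key is given, so Elasticsearch searches
    the fields listed in the index.query.default_field setting (all
    eligible fields by default).
    """
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "default_operator": "AND",  # spaces between terms mean AND
            }
        }
    }
```

For example, `simple_search_body("geneva lake")` describes a search for resources containing both terms, in any indexed field.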
Expert search
- Useful for very complex or precise searches
- Allows you to target specific fields or sub-fields and combine different queries
- Building a query often requires careful construction and knowledge of the data model
- Spaces between terms are interpreted as the OR operator
- Does not tolerate syntax errors in queries
- Uses Elasticsearch's full query string syntax
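For comparison with simple search, the sketch below builds a hypothetical expert search body using Elasticsearch's `query_string` query, where spaces default to OR. It also includes a toy matcher showing how the AND and OR operators change which resources match. Function names are hypothetical, and this is not RERO ILS's actual implementation.

```python
def expert_search_body(query):
    """Hypothetical sketch of a request body for an expert search.

    `query_string` supports field targeting (`field:value`) and boolean
    operators; a query with invalid syntax makes Elasticsearch return an
    error instead of results.
    """
    return {
        "query": {
            "query_string": {
                "query": query,
                "default_operator": "OR",  # also the query_string default
            }
        }
    }


def toy_match(query, field_tokens, operator):
    """Toy illustration only: does a space-separated query match a token set?"""
    hits = [term in field_tokens for term in query.lower().split()]
    return all(hits) if operator == "AND" else any(hits)
```

With the tokens `{"geneva", "lake"}`, the query `geneva cheese` does not match under AND but does match under OR, which is why expert queries without explicit operators tend to return more results. A field-targeted query could look like `expert_search_body("title:(history AND geneva) OR barcode:10000001")`, where `title` and `barcode` are assumed field names.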