Custom User Dictionaries
The search module supports three types of custom dictionary to help tune search results for your content: synonyms, stop words, and a Japanese user dictionary for custom text analysis.
In the website admin, you can upload text files (.txt only) and, in the case of stop words and synonyms, also enter them manually on the Search Configuration screen.
Only file upload is supported for the Japanese user dictionary, because it is used by the kuromoji tokenizer, which splits text into words during analysis. Synonyms and stop words are applied as "filters", so multiple sources (uploaded files and manually entered lists) can be used for them.
User dictionaries can be configured for the entire network (provided the subsites match the primary site language) or at an individual site level.
Updating User Dictionaries
After updating custom dictionaries, you will need to reindex your content unless you are running Elasticsearch 7.8 or higher.
Elasticsearch 7.8+ usage is still experimental. Please contact support if you wish to try upgrading.
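If a deployment script decides whether to trigger a reindex after a dictionary change, compare version numbers numerically rather than as text. As strings, "7.10" sorts before "7.8". The sketch below assumes you already know the Elasticsearch version as a string. `needs_reindex` is a hypothetical helper, not part of Altis, and the 7.8 threshold is taken from the note above.

```python
def needs_reindex(es_version: str) -> bool:
    """Return True if dictionary changes require a reindex on this Elasticsearch version.

    Hypothetical helper: the 7.8 threshold comes from the documentation note above.
    """
    # Compare numerically; as plain strings "7.10" would sort before "7.8".
    numbers = [int(part) for part in es_version.split(".")[:2]]
    major, minor = (numbers + [0])[:2]  # treat a bare "6" as "6.0"
    return (major, minor) < (7, 8)


if __name__ == "__main__":
    for version in ("6.8.23", "7.7.1", "7.8.0", "7.10.2"):
        print(version, needs_reindex(version))
```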
Synonyms
A dictionary of synonyms can improve search relevance in the following cases:
- There are a lot of acronyms or other contractions in your content
- There are common misspellings of search terms
- There are users from different countries who may use different words for the same thing
- There are sub-categories of content that should match a more general search term
For example, you may want search results for the phrase "Checking Accounts" to match content that is titled "Current Accounts" or for common acronyms like "GDP" to match "Gross Domestic Product".
Similarly, to cater for regional variations in language, you could create synonyms that map American English terms such as "sneakers" to British English terms such as "trainers".
Words or phrases in a comma-separated list are treated as equivalent. Put one list of synonyms on each line:
sneakers, trainers, footwear, shoes
foozball, foosball, table football
CPU, central processing unit
Comma-separated words or phrases followed by "=>" are treated the same as the comma-separated words or phrases to the right of the "=>" operator, but not the other way around:
i-pod, i pod => ipod
tent => bivouac, teepee
sea biscuit, sea biscit => seabiscuit
In the above example, a search for "tent" will match content containing "bivouac" or "teepee" but a search for "teepee" will only return results containing the word "teepee" specifically.
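The sketch below models the two synonym rule types described above so you can check which terms a search would match before uploading a file. It is a simplified model of the documented behavior, not Altis's or Elasticsearch's implementation. Case-insensitive matching is an assumption, and `parse_synonyms` and `expand` are hypothetical names.

```python
from collections import defaultdict


def parse_synonyms(text: str) -> dict[str, set[str]]:
    """Map each search term to the set of terms it should match.

    Simplified model of the rules described above, not Altis's implementation:
    - "a, b, c" makes every term in the list match every other term;
    - "a, b => c, d" makes a and b match c and d, but not the reverse.
    Terms are lowercased (an assumption about case-insensitive matching).
    """
    rules: dict[str, set[str]] = defaultdict(set)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if "=>" in line:
            # Explicit one-way mapping: left-hand terms match right-hand terms.
            left, right = line.split("=>", 1)
            targets = {t.strip().lower() for t in right.split(",") if t.strip()}
            for source in left.split(","):
                if source.strip():
                    rules[source.strip().lower()] |= targets
        else:
            # Equivalence list: every term matches the whole group.
            group = {t.strip().lower() for t in line.split(",") if t.strip()}
            for term in group:
                rules[term] |= group
    return dict(rules)


def expand(term: str, rules: dict[str, set[str]]) -> set[str]:
    """Return the terms a search for `term` matches; unlisted terms match only themselves."""
    key = term.strip().lower()
    return rules.get(key, {key})


if __name__ == "__main__":
    example = """
sneakers, trainers, footwear, shoes
tent => bivouac, teepee
"""
    rules = parse_synonyms(example)
    print(sorted(expand("tent", rules)))    # ['bivouac', 'teepee']
    print(sorted(expand("teepee", rules)))  # ['teepee']
```

The asymmetry of "=>" appears directly in the output. "tent" expands to its right-hand terms, while "teepee" has no rule of its own and matches only itself.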
Stop Words
Stop words are words that are ignored when analysing and searching content. Altis uses the standard dictionary of stop words for each supported language. For example, in English this includes "it", "of", "and" and so on.
The default list should be adequate in most cases.
A stop words dictionary should contain one word per line:
ignore
these
words
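To see the effect of a stop words dictionary, the minimal sketch below loads a one-word-per-line list and drops those words from a query. It splits on whitespace only. The real analyzer tokenizes and normalizes text in language-specific ways, so treat this as an illustration. `load_stop_words` and `remove_stop_words` are hypothetical names.

```python
def load_stop_words(text: str) -> set[str]:
    """Read a stop words dictionary: one word per line, blank lines ignored."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stop_words(query: str, stop_words: set[str]) -> list[str]:
    """Drop stop words from a query split on whitespace.

    Simplified: a real analyzer tokenizes and normalizes text in
    language-specific ways before its stop filter runs.
    """
    return [word for word in query.lower().split() if word not in stop_words]


if __name__ == "__main__":
    stop_words = load_stop_words("ignore\nthese\nwords\n")
    print(remove_stop_words("Please ignore these search words", stop_words))
    # ['please', 'search']
```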
Japanese User Dictionary
Your site's language must be set to Japanese to see this option.
A user dictionary provides a way to control how words are broken up when searching. If there are compound words or phrases specific to your site, such as the names of imaginary places, or authors and celebrities that users may search for, they can be specified here to increase search relevancy.
The uploaded text file must follow the Comma-Separated Values (CSV) format, with one entry per line and the following four fields in order:
- text is the compound word or phrase that appears in your content, such as a name
- tokens must contain the same text again, with spaces added between each word
- readings must contain the same text as tokens, with any kanji replaced by katakana. This describes the pronunciation of the tokens.
- part-of-speech defines what the text is, for example a noun or verb
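Because the tokenizer depends on these fields being consistent, you may want to check entries before uploading the file. The sketch below applies only the field rules listed above. It checks that there are four fields, that the tokens spell out the text once spaces are removed, and that there is one reading per token. It does not verify that readings are correct katakana or that the part of speech is valid. `UserDictEntry` and `parse_entry` are hypothetical names.

```python
import csv
from dataclasses import dataclass


@dataclass
class UserDictEntry:
    """One line of a Japanese user dictionary (hypothetical helper type)."""
    text: str
    tokens: list[str]
    readings: list[str]
    part_of_speech: str


def parse_entry(line: str) -> UserDictEntry:
    """Parse one CSV line and check it against the field rules above."""
    fields = next(csv.reader([line]))
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    text, tokens, readings, pos = (field.strip() for field in fields)
    token_list = tokens.split()
    reading_list = readings.split()
    # tokens must be the same text again, only with spaces added between words.
    if "".join(token_list) != text:
        raise ValueError(f"tokens {tokens!r} do not spell out {text!r}")
    # Each token needs its own reading.
    if len(reading_list) != len(token_list):
        raise ValueError("tokens and readings must have the same number of parts")
    return UserDictEntry(text, token_list, reading_list, pos)


if __name__ == "__main__":
    entry = parse_entry("東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,名詞")
    print(entry.tokens)    # ['東京', 'スカイツリー']
    print(entry.readings)  # ['トウキョウ', 'スカイツリー']
```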
By default the text "東京スカイツリー" would be broken up into "東京", "スカイ" and "ツリ". The example below changes this behavior so that the text is treated as a custom noun:
東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,名詞
In this example:
- the tokens are "東京" and "スカイツリー"
- the readings are "トウキョウ" and "スカイツリー"
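If you keep dictionary entries in a script or spreadsheet, you can generate the file instead of writing CSV by hand. Using Python's `csv` module handles quoting consistently. The minimal sketch below serializes (text, tokens, readings, part of speech) tuples into the format shown above. `build_user_dictionary` is a hypothetical name. Saving the result as a UTF-8 encoded .txt file is an assumption, because the documentation only states that .txt files are required.

```python
import csv
import io


def build_user_dictionary(entries: list[tuple[str, list[str], list[str], str]]) -> str:
    """Serialize (text, tokens, readings, part_of_speech) tuples as dictionary CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for text, tokens, readings, part_of_speech in entries:
        # tokens and readings are space-separated within a single CSV field.
        writer.writerow([text, " ".join(tokens), " ".join(readings), part_of_speech])
    return buffer.getvalue()


if __name__ == "__main__":
    content = build_user_dictionary([
        ("東京スカイツリー", ["東京", "スカイツリー"], ["トウキョウ", "スカイツリー"], "名詞"),
    ])
    print(content, end="")
    # 東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,名詞
```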
The available parts of speech are: