Improving Multilingual Search With Altis

Making it easy for website users to find the content they are looking for is an essential capability of any interactive digital experience. Discovery via hierarchical sitemaps, mega menus and directory drilldowns is a useful baseline, but it doesn’t keep pace with rising user expectations. Visitors expect to be able to search “file my tax return” and receive a call-to-action to do exactly that, not drill down five levels into a content hierarchy.

The value and needs of search apply to websites in all languages. Not only is search a difficult problem to get right, it’s even more challenging when handling more than one language at a time.

As part of building a world-class digital experience platform for WordPress, we deemed it critical that Altis meet these growing website search expectations, so we set out to incorporate the most intuitive and useful search function possible – multilingual or otherwise.

Search without support for synonyms is very frustrating. Users should not need to know how the content is written to perform a search.

It’s no secret that the default MySQL-based search in WordPress yields poor search results and makes content discovery difficult for users. WordPress has made real progress on its default search capability, adding ranking by matched words and basic stop word support, but the fact remains that MySQL is not capable of performing the operations that users now expect from search.

Users now expect to be able to search more freely and find content based on vague terms, while WordPress search still feels stuck in the era of searching databases for “exact match” phrases. Technically, meeting that expectation means supporting fuzzy search, word stemming, stop words, synonyms, and advanced ranking and sorting algorithms.
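To make the difference between exact and fuzzy matching concrete, here is a minimal Python sketch. It uses plain Levenshtein edit distance over a tiny hypothetical word list; the function names and the word list are illustrative assumptions, and a real search engine does this against its index far more efficiently.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term: str, vocabulary: list[str], max_edits: int = 2) -> list[str]:
    """Return the indexed words within max_edits of the search term."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_edits]


# Hypothetical indexed vocabulary, for illustration only.
indexed_words = ["return", "refund", "tax", "file"]

# An exact-match lookup finds nothing for the typo, a fuzzy lookup does.
print("retrun" in indexed_words)
print(fuzzy_match("retrun", indexed_words))
```

With an edit budget of two, the misspelled "retrun" still reaches "return", which an exact-match lookup would miss.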

These are by no means new technologies, and great search products have existed for some time, including Google Site Search, Elasticsearch, and Algolia, to name just a few. Altis makes use of Elasticsearch, incorporating the ElasticPress plugin to integrate the WordPress data into its search index. Elasticsearch then provides additional capabilities so that Altis is able to deliver relevant search results out of the box.

Accommodating multilingual search

The search capabilities and default configuration of many search systems are biased towards English. Non-English languages are often treated as an afterthought (or not considered at all) in software development. We’ve worked hard to make sure Altis has first-class support for non-English languages, including those written in non-Latin scripts.

One language that poses many challenges for search is Japanese. Japanese can be written in three different scripts (hiragana, katakana and kanji), or four if you count romaji. This introduces a “normalization” problem, as a single word can be written in many ways. For search purposes, it’s important to return results regardless of the script used in the source material or the search term.
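One small part of this normalization problem can be sketched with Python's standard library. The example below folds width variants with Unicode NFKC and then maps katakana to hiragana, so the same word typed in half-width katakana, full-width katakana or hiragana ends up in one form. The function name is a hypothetical choice, and this is not how Altis or Elasticsearch implement it. It also does not cover kanji or romaji, which need dictionary-based readings.

```python
import unicodedata

# Katakana letters sit at a fixed offset above their hiragana counterparts.
KATAKANA_TO_HIRAGANA_OFFSET = ord("ァ") - ord("ぁ")


def normalize_japanese(text: str) -> str:
    """Fold width variants with NFKC, then map katakana to hiragana."""
    folded = unicodedata.normalize("NFKC", text)
    result = []
    for char in folded:
        # Only the letters ァ..ヶ have a hiragana counterpart at this offset;
        # the long vowel mark ー and everything else is left untouched.
        if "ァ" <= char <= "ヶ":
            result.append(chr(ord(char) - KATAKANA_TO_HIRAGANA_OFFSET))
        else:
            result.append(char)
    return "".join(result)


# Full-width katakana, half-width katakana and hiragana spellings of "ramen".
for spelling in ["ラーメン", "ﾗｰﾒﾝ", "らーめん"]:
    print(normalize_japanese(spelling))
```

All three spellings normalize to the same string, so a query in one script can match content written in another.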

Example comparing the analysis of Japanese and English text. Because Japanese has no machine-readable separators between words, search indexes cannot be segmented on word boundaries alone.

Japanese also makes limited use of spaces between characters and words. In Latin-based languages, a [space] character is generally used to split words, and each word can then be indexed under its root form. In Japanese this is not always possible, so phrases are typically broken apart by matching against a known set of words (like a dictionary). Again, this has search implications for proper nouns such as product names, as those are not in the known set of Japanese dictionary words.
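The dictionary-matching idea, and why proper nouns are a problem for it, can be illustrated with a deliberately simple greedy longest-match segmenter in Python. This is a sketch under stated assumptions: production analyzers such as kuromoji use statistical models rather than this greedy rule, and the dictionary contents and the katakana product name below are made up for the example.

```python
def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation against a known word set."""
    longest = max((len(word) for word in dictionary), default=1)
    tokens = []
    position = 0
    while position < len(text):
        # Try the longest candidate first, falling back to a single character.
        for length in range(min(longest, len(text) - position), 0, -1):
            candidate = text[position:position + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                position += length
                break
    return tokens


# A tiny illustrative dictionary of known words.
known_words = {"で", "検索"}

# The product name is unknown, so it falls apart into single characters.
print(segment("アルティスで検索", known_words))

# Once the name is in the word set, it is kept as one token.
print(segment("アルティスで検索", known_words | {"アルティス"}))
```

In the first call the product name is split into five unrelated characters, which is the kind of indexing result that makes a proper noun hard to find.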

Better defaults with Altis

We have worked hard to make sure Altis supports these challenges in searching across non-English websites and other digital experiences. We achieved this by combining several capabilities that are applied on a language-by-language basis. 

At the Elasticsearch level, we use many language-specific plugins to address these challenges. For example, we use analysis-icu for better Unicode support, which is particularly important for Asian scripts, analysis-kuromoji for Japanese indexing, analysis-smartcn for Chinese, and comparable plugins for Vietnamese and Thai.
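For readers curious what such plugins look like in an index definition, the following Python sketch builds the analysis settings for a custom Japanese analyzer. The component names (icu_normalizer, kuromoji_tokenizer and the listed token filters) come from the analysis-icu and analysis-kuromoji plugins and from Elasticsearch itself. The analyzer name ja_content and the overall arrangement are assumptions for illustration, not the configuration Altis ships.

```python
import json


def japanese_index_settings() -> dict:
    """Build analysis settings for a Japanese analyzer from plugin components."""
    return {
        "analysis": {
            "analyzer": {
                "ja_content": {  # hypothetical analyzer name
                    "type": "custom",
                    "char_filter": ["icu_normalizer"],  # from analysis-icu
                    "tokenizer": "kuromoji_tokenizer",  # from analysis-kuromoji
                    "filter": [
                        "kuromoji_baseform",        # reduce inflected forms to base form
                        "kuromoji_part_of_speech",  # drop particles and similar tokens
                        "cjk_width",                # fold full-width and half-width forms
                        "ja_stop",                  # Japanese stop words
                        "kuromoji_stemmer",         # trim trailing long vowel marks
                        "lowercase",
                    ],
                }
            }
        }
    }


# The resulting dictionary would be sent as the "settings" of an index.
print(json.dumps(japanese_index_settings(), ensure_ascii=False, indent=2))
```

The filter chain here mirrors the components of the plugin's built-in kuromoji analyzer, with ICU normalization added in front.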

Elasticsearch analyzers are important at the database level to correctly normalize and search across documents, but that’s not the whole story. Understanding intent in search queries and surfacing relevant content is not something that can be completely automated.

User Dictionaries

To make discovery of Japanese content even easier, we’ve added support for user dictionaries to Altis. By providing Altis with your industry’s jargon or product names, Elasticsearch can correctly split and index all content. This is provided via an easy-to-use configuration page in the Altis Dashboard.

Diagram showing the process of content analysis, indexing and results
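At the Elasticsearch level, a user dictionary for the kuromoji tokenizer is a list of comma-separated rules: the surface text, its segmentation, the readings of each segment, and a part-of-speech tag. The Python sketch below builds such rules and places them in a tokenizer definition. The helper names are hypothetical, the user_dictionary_rules setting is assumed to be available in the Elasticsearch version in use, and how the Altis Dashboard stores its entries is not something this example claims to show.

```python
def user_dictionary_rule(surface: str, tokens: list[str], readings: list[str],
                         part_of_speech: str = "カスタム名詞") -> str:
    """Format one kuromoji rule: text, segmentation, readings, part of speech."""
    if len(tokens) != len(readings):
        raise ValueError("each token needs exactly one reading")
    return ",".join([surface, " ".join(tokens), " ".join(readings), part_of_speech])


def tokenizer_with_user_dictionary(rules: list[str]) -> dict:
    """Tokenizer definition with inline rules (assumes user_dictionary_rules support)."""
    return {
        "type": "kuromoji_tokenizer",
        "mode": "search",
        "user_dictionary_rules": rules,
    }


# A compound proper noun, with the segmentation and readings we want indexed.
rule = user_dictionary_rule(
    "東京スカイツリー",
    ["東京", "スカイツリー"],
    ["トウキョウ", "スカイツリー"],
)
print(rule)
print(tokenizer_with_user_dictionary([rule]))
```

The default tag カスタム名詞 means "custom noun", which suits most jargon and product names.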

Synonyms 

Speaking of adding industry- or company-specific information to the search index, we also support defining custom synonyms for a site, in any language. This can be very useful for customers using English jargon when searching a non-English-language website.

Screenshot of Synonyms in the Altis Dashboard
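Conceptually, a synonym list lets any term in a group stand in for the others at search time. The Python sketch below shows that expansion for a mixed English and Japanese group. The function names and the example group are assumptions made for illustration. In Elasticsearch this job is normally done by a synonym token filter during analysis, not by application code like this.

```python
def build_synonym_index(groups: list[list[str]]) -> dict[str, set[str]]:
    """Map every term to the full set of terms in its synonym group."""
    index: dict[str, set[str]] = {}
    for group in groups:
        terms = {term.lower() for term in group}
        for term in terms:
            index.setdefault(term, set()).update(terms)
    return index


def expand_query(tokens: list[str], synonyms: dict[str, set[str]]) -> list[set[str]]:
    """Replace each query token with the set of terms that should match it."""
    return [synonyms.get(token.lower(), {token.lower()}) for token in tokens]


# Hypothetical site-specific group: English jargon alongside the Japanese term.
synonyms = build_synonym_index([["laptop", "notebook", "ノートパソコン"]])

# Searching with the English word also matches content using the Japanese term.
print(expand_query(["Laptop", "価格"], synonyms))
```

Tokens without a synonym group, such as 価格 here, pass through unchanged.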

Stop Words

Search isn’t just about adding information and metadata! Stop words are words that are removed from content when indexing and searching. In English, for example, words like “a”, “and”, “to”, etc. are removed. These words carry little semantic meaning and typically get in the way of ranking results by relevance. By default, Altis has stop word lists for many languages, but these lists are not exhaustive and the right set can vary depending on industry or context. Just like synonyms, Altis has first-class support for defining stop words on a per-site basis.
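The effect of a default list combined with per-site additions can be sketched in a few lines of Python. The default list below is a small illustrative sample, not the list Altis ships, and the function and parameter names are hypothetical.

```python
# Illustrative sample only; real default lists are much longer.
DEFAULT_STOP_WORDS = frozenset({"a", "an", "and", "the", "to", "of"})


def remove_stop_words(tokens: list[str],
                      site_stop_words: frozenset[str] = frozenset()) -> list[str]:
    """Drop default and site-specific stop words, comparing case-insensitively."""
    stop_words = DEFAULT_STOP_WORDS | {word.lower() for word in site_stop_words}
    return [token for token in tokens if token.lower() not in stop_words]


query = "how to file a tax return".split()

# Default list only.
print(remove_stop_words(query))

# A site that also treats "how" as noise for its content.
print(remove_stop_words(query, frozenset({"how"})))
```

The same filtering is applied to both the indexed content and the query, so the remaining terms line up when they are compared.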

Search with Altis contributes to better digital experiences

Enabling users to quickly find the content they are looking for is our goal with Altis, and we are focused on improving both the technology and the user experience to achieve it. The capabilities mentioned in this post are some of the ways we believe Altis makes search better, and we continue to make improvements and introduce new approaches and technology. We are not building to a feature list of search buzzwords. We want to make discovery great, and that requires focusing on the user, their challenges, and their feedback. When building technology products, it’s easy to get lost in the work needed to achieve technical goals, especially with a product like Altis, which provides many technology frameworks to developers.

Reach out to us to learn more about improving digital and user experiences through multilingual search and more with the Altis digital experience platform.