Moving from MongoDB Full-Text Search to Elasticsearch (for Node.js Applications)

Andriy Kaminskyy, Software Developer at Bitcom Systems on 2018-10-04

Background

Full-text search is one of the most nuanced and complicated ‘puzzles’ of application development. Many popular database management systems provide built-in solutions for it, and MongoDB is no exception. Starting with version 2.4, MongoDB supports text indexes, which provide full-text search out of the box.

This brings up the question: “Why would anyone want to switch from MongoDB to Elasticsearch (or any other dedicated solution)?”

The problem

Unfortunately, the simplest solution is usually a suboptimal one. The two main problems that plague MongoDB’s full-text search are low performance and high resource usage. Combined with a large enough dataset, they make MongoDB unusable for most practical purposes, as queries take dozens of seconds to execute. Another problem is the lack of ways to customize or tune search results, so if MongoDB’s search results do not satisfy your needs, you are out of luck.

The conclusion is straightforward:
MongoDB is a perfectly capable full-text search tool for testing and prototyping, but it does not scale well. So, if your budget allows, it is preferable to use a dedicated solution (like Elasticsearch) in a production environment.

Our use case is a JavaScript-based mobile app that ‘speaks’ to a Node.js backend (with MongoDB as the main database).

Elasticsearch

Elasticsearch is a full-fledged solution for working with data, complete with data storage, a REST API and a search engine (Apache Lucene). On top of that, Elasticsearch is highly customizable and has plugin support.
Since its inception, Elasticsearch has been used by many companies. Some of them are:

  • Ebay
  • StackExchange
  • Uber
  • Adobe

Installation

Elasticsearch is developed alongside two other products: Kibana and Logstash. Together they form the “ELK” stack, a full-fledged system built to insert, transform, store, analyze and visualize your data from any source. Nonetheless, each of them is completely functional on its own.

Elasticsearch can be downloaded from here.
After installing it, we recommend editing the configuration file located at /etc/elasticsearch/elasticsearch.yml according to your needs.

Moving your data

The biggest challenge of integrating Elasticsearch into an existing application is data migration. There are multiple solutions to this problem:

Mongoosastic (http://github.com/mongoosastic/mongoosastic)

Installation: npm install --save mongoosastic

If you happen to use Mongoose in your project, this may be the perfect solution. This package is a Mongoose plugin that integrates your existing Mongoose schema into Elasticsearch. After that, every insert or update request is duplicated to Elasticsearch (and transformed to match Elasticsearch’s data types and schemas).

Mongo-connector (http://github.com/yougov/mongo-connector)

This is a standalone package that runs independently of your API code and, therefore, does not care which language or framework you use. On the other hand, it requires a working installation of Python on your machine.

Installation: pip install mongo-connector

Previously, this package was maintained by MongoDB employees.

Monstache (http://rwynn.github.io/monstache-site)

This is the newest and most actively maintained package. Written in Go, it works similarly to mongo-connector. Its main benefit is that it is a single binary without any external dependencies, and the author already provides prebuilt binaries here.

Do it yourself

Finally, if the packages listed above do not suit your needs, you can send requests that put records into Elasticsearch directly from your own code.
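As a rough illustration of the do-it-yourself route, the sketch below indexes a single MongoDB document through the REST API using only Node's built-in http module. The function names, the index name and the localhost:9200 address are assumptions made for the example, not part of our project. It also assumes that the MongoDB `_id` should become the Elasticsearch document id, because Elasticsearch treats `_id` as a metadata field and does not accept it inside the document body.

```javascript
const http = require('http');

// Hypothetical helper: describes the HTTP request that indexes one document.
// The MongoDB `_id` is moved from the body into the URL.
function buildIndexRequest(index, mongoDoc) {
  const { _id, ...source } = mongoDoc;
  return {
    method: 'PUT',
    path: `/${index}/_doc/${encodeURIComponent(String(_id))}`,
    body: JSON.stringify(source)
  };
}

// Hypothetical helper: sends the request to an Elasticsearch node
// (assumed here to listen on localhost:9200) and resolves with the raw response.
function indexDocument(index, mongoDoc, host = 'localhost', port = 9200) {
  const { method, path, body } = buildIndexRequest(index, mongoDoc);
  return new Promise((resolve, reject) => {
    const req = http.request(
      { host, port, method, path, headers: { 'Content-Type': 'application/json' } },
      (res) => {
        let data = '';
        res.on('data', (chunk) => { data += chunk; });
        res.on('end', () => resolve({ status: res.statusCode, body: data }));
      }
    );
    req.on('error', reject);
    req.end(body);
  });
}
```

Calling indexDocument('customer', { _id: 1, name: 'John Doe' }) after a successful MongoDB write would issue the same PUT /customer/_doc/1 request that appears in the REST API section. Keeping the request construction in a separate pure function makes it easy to test without a running cluster.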

How to Use Elasticsearch

There are two ways to use Elasticsearch:

REST API

You interact with the REST API through plain HTTP requests. For example:

Create index:

PUT /customer?pretty

Put a new document to our index:

PUT /customer/_doc/1
{
    "name":  "John Doe"
}

Retrieve our document:

GET /customer/_doc/1?pretty

For advanced usage, refer to the documentation.

Use a library

The Elasticsearch development team provides libraries for multiple languages, which are essentially wrappers around the REST API. Here we will look at the Node.js library (browser builds are also available).

Installation: npm install --save elasticsearch

Example:

Initialize a client:

var elasticsearch = require('elasticsearch');

// Connect to a local node; 'trace' logs every request and response
var client = new elasticsearch.Client({
    host: 'localhost:9200',
    log: 'trace'
});

Make a search query:

// Find tweets whose body field matches the word "elasticsearch"
client.search({
    index: 'twitter',
    type: 'tweets',
    body: {
        query: { match: { body: 'elasticsearch' } }
    }
}).then(function (resp) {
    // Matching documents are nested under hits.hits
    var hits = resp.hits.hits;
}, function (err) {
    console.trace(err.message);
});

The API reference is available here.
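In application code it is often convenient to hide the response envelope behind a small function. The sketch below is one possible way to do that, not an official pattern: `searchTweets` is a hypothetical helper that accepts any object with a promise-returning search(params) method (the client created above returns a promise when no callback is passed) and flattens each hit into a plain object. The index, type and field names simply repeat the ones from the example.

```javascript
// Hypothetical helper: runs a match query and returns plain documents.
async function searchTweets(client, text) {
  const resp = await client.search({
    index: 'twitter',
    type: 'tweets',
    body: { query: { match: { body: text } } }
  });
  // Each hit carries the stored document under `_source`
  // and its relevance under `_score`.
  return resp.hits.hits.map((hit) => ({
    id: hit._id,
    score: hit._score,
    ...hit._source
  }));
}
```

Because the client is passed in as an argument, the function can be exercised in unit tests with a stub object instead of a live cluster.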

Results

After moving our search functionality to Elasticsearch, the response time of our search endpoints decreased from 20 seconds to 500 ms.

MongoDB query:

OurCollection.find({
        $text: { $search: finalKeyword, $caseSensitive: false },
        hidden: { $ne: true },
        $or: [ { draft: false }, { 'source.id': user } ]
    }, {
        score: { $meta: 'textScore' }
    })
.sort({ score: { $meta: 'textScore' } })
.skip(pagination.skip)
.limit(pagination.limit)
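Elasticsearch query:

client.search({
  index: 'ourcollection', // Elasticsearch index names must be lowercase
  body: {
    query: {
      bool: {
        // Every search word must match the name, with small typos tolerated
        must: {
          match: {
            name: { query, operator: 'and', fuzziness: 'AUTO' }
          }
        },
        // Documents owned by the current user are ranked higher
        should: [ { term: { 'source.id': user } } ],
        must_not: [
          // Exclude drafts that belong to other users
          { bool: {
            must: { term: { draft: true } },
            must_not: { term: { 'source.id': user } }
          } },
          // Exclude hidden documents
          { term: { hidden: true } }
        ]
      }
    }
  },
  from: pagination.skip,
  size: pagination.size
})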
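
The two queries above express the same visibility rule in different shapes: MongoDB says "not hidden, and either published or owned by the user", while the Elasticsearch bool query says "not a draft belonging to someone else, and not hidden". The sketch below checks that the two formulations agree by modelling them as plain JavaScript predicates. It is only an illustration of the logic, not how either database evaluates a query; the function names are hypothetical, the text matching and the `should` boost are left out, and `draft` and `hidden` are assumed to always be booleans.

```javascript
// Rule as written in the MongoDB filter:
// not hidden AND (not a draft OR owned by the current user)
function visibleInMongo(doc, user) {
  return doc.hidden !== true && (doc.draft === false || doc.source.id === user);
}

// Rule as written in the Elasticsearch bool query:
// NOT (draft AND NOT owned by the current user) AND NOT hidden
function visibleInElasticsearch(doc, user) {
  const foreignDraft = doc.draft === true && !(doc.source.id === user);
  return !foreignDraft && !(doc.hidden === true);
}

// Try every combination of the three properties and collect disagreements.
function findMismatches(user) {
  const mismatches = [];
  for (const hidden of [true, false]) {
    for (const draft of [true, false]) {
      for (const owner of [user, 'someone-else']) {
        const doc = { hidden, draft, source: { id: owner } };
        if (visibleInMongo(doc, user) !== visibleInElasticsearch(doc, user)) {
          mismatches.push(doc);
        }
      }
    }
  }
  return mismatches;
}

console.log(findMismatches('u1').length); // prints 0
```

The equivalence relies on the boolean assumption. A document with no `draft` field at all would not match { draft: false } in MongoDB, but it would pass the must_not clause in Elasticsearch, so it is worth checking migrated data for missing fields.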

The graph below shows how response time depends on the number of records for MongoDB and Elasticsearch:

[Figure: MongoDB vs. Elasticsearch response time by number of records]

As you can see, when the number of documents is not very high, MongoDB’s response time is on par with Elasticsearch’s. However, past a certain point MongoDB’s response time climbs to unacceptable levels, while Elasticsearch’s remains constant.

Another takeaway from this graph is that MongoDB’s full-text search depends heavily on the amount of available RAM.

Disadvantages

By now, you may think that Elasticsearch is an all-around superior solution compared to MongoDB, so you should ditch the latter completely. Unfortunately, this is not the case. There are multiple reasons for that:

Security

Out of the box, Elasticsearch has no features for authentication or authorization, nor can it manage its users’ privileges and permissions. Therefore, it is not suitable to be exposed to the open Internet. What’s more, it has no built-in protection against denial-of-service attacks.

Delays

While Elasticsearch has great response times for search queries, they come at a hefty price: its indexing time is much longer. This lengthy and thorough indexing process is required to allow fast search without overloading the RAM.

Maintenance

Elasticsearch is a very complex tool and, therefore, requires complex maintenance, especially if your system has a lot of users. It also lacks such convenient tools as mongodump and mongorestore to allow easy backups and restores.

Complexity

As we have already said, MongoDB’s main advantage is that it is easy to use. Writing queries and index configurations for Elasticsearch, on the other hand, can be quite daunting. This means that development time (and therefore cost) will be much higher with Elasticsearch as the main database than with MongoDB.

Reliability

Another problem is that Elasticsearch was not built to be a completely fail-safe datastore. Events like power outages or hardware failures have a high chance of bringing down your search cluster to the point of complete irrecoverability.

Conclusion

Elasticsearch has proven to be a fast and efficient solution for full-text search. Its main strengths are exceptional and reliable speed, very high customizability and outstanding flexibility. These qualities make Elasticsearch a great solution for many different use cases and scales.