How to Build an Effective MongoDB Index Strategy - BigStep Technologies
8159
post-template-default,single,single-post,postid-8159,single-format-standard,cookies-not-set,ajax_fade,page_not_loaded,,qode_grid_1200,vss_responsive_adv,qode-child-theme-ver-1.0.0,qode-theme-ver-1.0.0,qode-theme-bridge child,bridge-child,wpb-js-composer js-comp-ver-5.1.1,vc_responsive

How to Build an Effective MongoDB Index Strategy10 min read

0

Imagine you want to learn more about database performance, and you have in your hands a very large book about databases. How can you search for your topic of interest with fewer efforts?

Without a proper index of a book (usually located on the front page), it would be a time-consuming and tedious job to find out the exact page where the database performance topic is being written. Similarly, when you ask for some document in a database, the database tries to use an index to quickly find the results for you. If there’s no index to use as a reference, it has to check each document, the same way you would have to if your book didn’t have an index. 

In this article, we will discuss indexing and strategies for MongoDB.

What is Indexing

Indexing is a data structure technique that is used to quickly locate and access the data in a database. For any kind of database, indexing is of prime importance, as it helps in the quick and efficient execution of queries. Indexes in MongoDB are like indexes in any other database. MongoDB uses indexes in order to make query processing more efficient. Failure to use indexes in MongoDB will force the database to perform a full collection scan, that is, scan through all the documents in the collection in order to select documents that match an issued query statement. Obviously, it would be time-consuming, especially if there are so many documents involved.

Types of Indexing

MongoDB involves different types of data, hence different types of indexes are derived to support these data types and queries.

1. Single Field Index: It is used to create an index on a single field of a document and it can be user-defined as well apart from the default _id. With single field indexing, MongoDB can traverse in ascending or descending order. That’s why the index key does not matter in this scenario.

 Example: Suppose we have a student collection and we are applying filters on the email field then we should add an { email: 1 } index.

2. Multikey Index:  To index a field that holds an array value, MongoDB creates an index key for each element in the array. If we index a field with an array value, MongoDB creates separate index entries for each element of the array. It, therefore, helps a query to select documents that consist arrays by matching on element or elements of the arrays. This is done automatically by MongoDB, hence no need for one to explicitly define the multikey type.

3. Compound Index: It is used to create an index on multiple fields in a document. Unlike the single indexing system, the compound index does take the order of the fields listed into account when doing a search. The index will need to be in just the right order to provide fast and accurate results. 

Example: Suppose we have a student collection and we are applying filters on first_name and last_name fields then we should add a compound index as { first_name: 1,  last_name: 1} index.

4. Text Index: It supports searching for string content in a collection. These index types do not store language-specific stop words (ie. “the”, “a”, “or”). Text indexes restrict the words in a collection to only store root words.

Regular indexes are helpful for matching the entire value of a field. If you only want to match on a specific word in a field with a lot of text, then use a text index.

5. Hash Indexes: It calculates the value of a particular field and is used to support equality matches, not range-based searches. Sharding, one of the techniques used in MongoDB to improve on horizontal scaling, often involves hash indexes.

6.Geospatial index types: MongoDB’s geospatial index types are particularly useful for those who work with geospatial coordinate data while the text indexes are for anyone who needs to search string content in a collection.

7. Partial Index: Partial indexes only index the documents in a collection that meet a specified filter expression. By indexing a subset of the documents in a collection, partial indexes have lower storage requirements and reduced performance costs for index creation and maintenance.

Indexing Strategy 

In our data-heavy world, almost every database has an indexing strategy, and it’s usually an important part of the database’s overall feature-set. When you are developing an indexing strategy for MongoDB, you will find that there are a number of factors to take into consideration, such as the structure of the data, the pattern of usage, the configuration of the database servers, the ratio of reads to writes, and the amount of free memory on your system etc.

1. Create Indexes to support your queries

An index supports a query when the index contains all the fields scanned by the query. The query scans the index and not the collection. Creating indexes that support queries results in increased query performance. If an appropriate index exists for a query, the database can use the index to limit the number of documents it must inspect.

2. Ensure Indexes fit in RAM

When your index fits in RAM, the system can avoid reading the index from the disk and you get the fastest processing. 

With WiredTiger, MongoDB utilizes both the WiredTiger internal and filesystem cache. Starting in MongoDB 3.4, the default WiredTiger internal cache size is the larger of either: 50% of (RAM – 1 GB), or 256 MB.

For example, on a system with a total of 4GB of RAM, the WiredTiger cache will use 1.5GB of RAM (0.5 * (4 GB – 1 GB) = 1.5 GB). Similarly, a system with a total of 1.25 GB of RAM will allocate 256 MB to the WiredTiger cache because that is more than half of the total RAM minus one gigabyte (0.5 * (1.25 GB – 1 GB) = 128 MB < 256 MB).

3. Use ESR Rule 

The ordering of index keys in a compound index is of vital importance, and the ESR Rule can be used as a rule of thumb to identify the optimal order in most cases.

  • First, add those fields against which Equality queries are run.
  • The next fields to be indexed should reflect the Sort order of the query.
  • The last fields represent the Range of data to be accessed.

4. Eliminate the Unused Index

 Indexes are resource-intensive & all the indexes are stored in MongoDB WiredTiger storage engine in a compressed form. So, it’s advisable to delete or hide those indexes (to save the CPU, Disk I/O) which are not in use.

5. Avoid Regular Expressions

(a) Avoid regular expressions that are not left anchored or rooted Indexes are ordered by value. Leading wildcards are inefficient and may result in full index scanning. Trailing wildcards can be efficient if there are sufficient case-sensitive leading characters in the query expression.

(b) Avoid regular expressions that are case insensitive –  If the sole reason for using a regex is case insensitivity, use a case insensitive index instead, as those are faster.

6. Use Covered queries:  A covered query is a query that can be satisfied entirely using an index and does not have to examine any documents. An index covers a query when all of the following apply:

  1. all the fields in the query are part of an index,
  2. all the fields returned in the results are in the same index
  3. no fields in the query are equal to null (i.e. {“field” : null} or {“field” : {$eq : null}} ).

When the query criteria and the projection of a query include only the indexed fields, MongoDB return results directly from the index without scanning any documents or bringing documents into memory. In this way, these covered queries are very efficient and time-consuming.

To determine whether a query is a covered query or not, use the db.collection.explain() or the explain() method and review the results.

If the explain() output displays totalDocsExamined as 0, this shows that the query is covered by an index.

7. Indexes and Collation: Collation allow users to specify language-specific rules for string comparison, such as rules for letter case and accent marks.

To use an index for string comparisons, an operation must also specify the same collation. That is, an index with a collation cannot support an operation that performs string comparisons on the indexed fields if the operation specifies a different collation.

For example, the collection product has an index on a string field category with the collation locale “fr”.

db.product.createIndex( { category: 1 }, { collation: { locale: “fr” } } )

The following query operation, which specifies the same collation as the index, can use the index:

db.product.find( { category: “cafe” } ).collation( { locale: “fr” } )

If we write query without collator then this will use default “simple” binary collator, can’t use the index

db.product.find( { category: “cafe” } )

Conclusion

The main takeaway is obvious: create indexes to support your queries — the benefits are endless. Otherwise, queries require full collection scans, which are limited by your server’s disk subsystem I/O. Server CPU and RAM resources were less of a factor than expected. By simply removing the unused and redundant indexes can significantly improve the database performance.

Here at BigStep Technologies, we have a dedicated team of MongoDB developers to implement the best database strategies. 

Have any MongoDB requirement to discuss, please feel free to contact us at info@bigsteptech.com

Hariom Kumar

Tech Lead @BigStep Technologies. Playing a vital role to develop system applications utilizing standard concepts, practices and procedures. Work closely with other groups like product, business analyst, quality assurance, and project management resources throughout delivery of solutions.

Get Connected with our Experts

Request a Quote

Tell us about your requirements and we'll get back to you soon.