
SQL queries for Azure Cosmos DB



The Azure Cosmos DB DocumentDB API, also called the SQL (DocumentDB) API, is now known as the Azure Cosmos DB SQL API. You don't need to change anything to continue running apps built with the DocumentDB API; the functionality remains the same.

Microsoft Azure Cosmos DB supports querying documents using SQL (Structured Query Language) as a JSON query language on SQL API accounts. Azure Cosmos DB is truly schema-free. By virtue of its commitment to the JSON data model directly within the database engine, it provides automatic indexing of JSON documents without requiring explicit schema or creation of secondary indexes.

While designing the query language for Cosmos DB, we had two goals in mind:

  • Instead of inventing a new JSON query language, we wanted to support SQL. SQL is one of the most familiar and popular query languages. Cosmos DB SQL provides a formal programming model for rich queries over JSON documents.
  • As a JSON document database capable of executing JavaScript directly in the database engine, we wanted to use JavaScript's programming model as the foundation for our query language. The SQL API is rooted in JavaScript's type system, expression evaluation, and function invocation. This in-turn provides a natural programming model for relational projections, hierarchical navigation across JSON documents, self joins, spatial queries, and invocation of user-defined functions (UDFs) written entirely in JavaScript, among other features.

We believe that these capabilities are key to reducing the friction between the application and the database and are crucial for developer productivity.

We recommend getting started by watching the following video, where Aravind Ramachandran shows Cosmos DB's querying capabilities, and by visiting our Query Playground, where you can try out Cosmos DB and run SQL queries against our dataset.

Then, return to this article, where we start with a SQL query tutorial that walks you through some simple JSON documents and SQL commands.

Getting started with SQL commands in Cosmos DB

To see Cosmos DB SQL at work, let's begin with a few simple JSON documents and walk through some simple queries against them. Consider these two JSON documents about two families. With Cosmos DB, we don't need to create any schemas or secondary indexes explicitly. We simply insert the JSON documents into a Cosmos DB collection and then query them. Here we have a simple JSON document for the Andersen family: the parents, children (and their pets), address, and registration information. The document has strings, numbers, Booleans, arrays, and nested properties.
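A document along these lines fits that description (the exact property names and values here are illustrative):

```json
{
  "id": "AndersenFamily",
  "lastName": "Andersen",
  "parents": [
    { "firstName": "Thomas" },
    { "firstName": "Mary Kay" }
  ],
  "children": [
    {
      "firstName": "Henriette Thaulow",
      "gender": "female",
      "grade": 5,
      "pets": [ { "givenName": "Fluffy" } ]
    }
  ],
  "address": { "state": "WA", "county": "King", "city": "seattle" },
  "creationDate": 1431620472,
  "isRegistered": true
}
```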


Here's a second document with one subtle difference: givenName and familyName are used instead of firstName and lastName.


Now let's try a few queries against this data to understand some of the key aspects of Azure Cosmos DB's SQL query language. For example, the following query returns the documents where the id field matches "AndersenFamily". Since it's a SELECT * query, the output is the complete JSON document:
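A sketch of such a query, assuming the id value "AndersenFamily" from the sample family:

```sql
SELECT *
FROM Families f
WHERE f.id = "AndersenFamily"
```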



Now consider the case where we need to reformat the JSON output in a different shape. This query projects a new JSON object with two selected fields, Name and City, when the address' city has the same name as the state. In this case, "NY, NY" matches.
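A query of this shape produces that projection (the collection name and property names follow the sample family documents):

```sql
SELECT {"Name": f.id, "City": f.address.city} AS Family
FROM Families f
WHERE f.address.city = f.address.state
```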



The next query returns all the given names of children in the family whose id matches "WakefieldFamily", ordered by the city of residence.
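One way to express this query, assuming the sample id "WakefieldFamily" (the JOIN construct used here is covered in detail later in this article):

```sql
SELECT c.givenName
FROM Families f
JOIN c IN f.children
WHERE f.id = "WakefieldFamily"
ORDER BY f.address.city ASC
```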



We would like to draw attention to a few noteworthy aspects of the Cosmos DB query language through the examples we've seen so far:

  • Since the SQL API works on JSON values, it deals with tree-shaped entities instead of rows and columns. Therefore, the language lets you refer to nodes of the tree at any arbitrary depth, like Node1.Node2.Node3.....Nodem, similar to relational SQL's two-part reference of <table>.<column>.
  • The structured query language works with schema-less data. Therefore, the type system needs to be bound dynamically. The same expression could yield different types on different documents. The result of a query is a valid JSON value, but is not guaranteed to be of a fixed schema.
  • Cosmos DB only supports strict JSON documents. This means the type system and expressions are restricted to deal only with JSON types. Refer to the JSON specification for more details.
  • A Cosmos DB collection is a schema-free container of JSON documents. The relations in data entities within and across documents in a collection are implicitly captured by containment and not by primary key and foreign key relations. This is an important aspect worth pointing out in light of the intra-document joins discussed later in this article.

Cosmos DB indexing

Before we get into the SQL syntax, it is worth exploring the indexing design in Azure Cosmos DB.

The purpose of database indexes is to serve queries in their various forms and shapes with minimum resource consumption (like CPU and input/output) while providing good throughput and low latency. Often, the choice of the right index for querying a database requires much planning and experimentation. This approach poses a challenge for schema-less databases where the data doesn’t conform to a strict schema and evolves rapidly.

Therefore, when we designed the Cosmos DB indexing subsystem, we set the following goals:

  • Index documents without requiring schema: The indexing subsystem does not require any schema information or make any assumptions about schema of the documents.
  • Support for efficient, rich hierarchical, and relational queries: The index supports the Cosmos DB query language efficiently, including support for hierarchical and relational projections.
  • Support for consistent queries in the face of a sustained volume of writes: For high-write-throughput workloads with consistent queries, the index is updated incrementally, efficiently, and online in the face of a sustained volume of writes. The consistent index update is crucial to serve queries at the consistency level the user configured for the document service.
  • Support for multi-tenancy: Given the reservation-based model for resource governance across tenants, index updates are performed within the budget of system resources (CPU, memory, and input/output operations per second) allocated per replica.
  • Storage efficiency: For cost effectiveness, the on-disk storage overhead of the index is bounded and predictable. This is crucial because Cosmos DB allows the developer to make cost-based tradeoffs between index overhead in relation to the query performance.

Refer to the Azure Cosmos DB samples on MSDN for samples showing how to configure the indexing policy for a collection. Let’s now get into the details of the Azure Cosmos DB SQL syntax.

Basics of an Azure Cosmos DB SQL query

Every query consists of a SELECT clause and optional FROM and WHERE clauses per ANSI-SQL standards. Typically, for each query, the source in the FROM clause is enumerated. Then the filter in the WHERE clause is applied on the source to retrieve a subset of JSON documents. Finally, the SELECT clause is used to project the requested JSON values in the select list.

FROM clause

The FROM (FROM <from_specification>) clause is optional unless the source is filtered or projected later in the query. The purpose of this clause is to specify the data source upon which the query must operate. Commonly the whole collection is the source, but one can specify a subset of the collection instead.

A query like SELECT * FROM Families indicates that the entire Families collection is the source over which to enumerate. A special identifier ROOT can be used to represent the collection instead of using the collection name. The following list contains the rules that are enforced per query:

  • The collection can be aliased, such as Families AS f or simply Families f. Here f is the equivalent of Families. AS is an optional keyword to alias the identifier.
  • Once aliased, the original source name cannot be bound. For example, SELECT Families.id FROM Families f is syntactically invalid since the identifier "Families" cannot be resolved anymore.
  • All properties that need to be referenced must be fully qualified. In the absence of strict schema adherence, this is enforced to avoid ambiguous bindings. Therefore, SELECT id FROM Families f is syntactically invalid since the property id is not bound.


The source can also be reduced to a smaller subset. For instance, to enumerate only a subtree in each document, the subroot could become the source, as shown in the following example:
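For instance, a query that enumerates only each document's children subtree might look like this:

```sql
SELECT *
FROM Families.children
```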



While the above example used an array as the source, an object could also be used as the source, as shown in the following example. Any valid JSON value (not undefined) that can be found in the source is considered for inclusion in the result of the query. If some families don't have an address.state value, they are excluded from the query result.



WHERE clause

The WHERE clause (WHERE <filter_condition>) is optional. It specifies the condition(s) that the JSON documents provided by the source must satisfy in order to be included in the result. Any JSON document must evaluate the specified conditions to "true" to be considered for the result. The WHERE clause is used by the index layer to determine the smallest subset of source documents that can be part of the result.

The following query requests documents that contain a name property whose value is "AndersenFamily". Any document that does not have a name property, or whose value does not match "AndersenFamily", is excluded.
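Such a filter can be sketched as follows, assuming the sample value "AndersenFamily":

```sql
SELECT f.address
FROM Families f
WHERE f.id = "AndersenFamily"
```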



The previous example showed a simple equality query. The SQL API also supports a variety of scalar expressions. The most commonly used are binary and unary expressions. Property references from the source JSON object are also valid expressions.

The following binary operators are currently supported and can be used in queries as shown in the following examples:

Bitwise: |, &, ^, <<, >>, >>> (zero-fill right shift)
Logical: AND, OR, NOT
Comparison: =, !=, <, >, <=, >=, <>
String: || (concatenate)

Let’s take a look at some queries using binary operators.

The unary operators +, -, ~, and NOT are also supported, and can be used inside queries as shown in the following example:

In addition to binary and unary operators, property references are also allowed. For example, SELECT * FROM Families f WHERE f.isRegistered returns the JSON documents containing the property isRegistered whose value is equal to the JSON true value. Any other value (false, null, Undefined, a number, a string, an object, an array, etc.) leads to the source document being excluded from the result.

Equality and comparison operators

The following table shows the result of equality comparisons in the SQL API between any two JSON types.

           Undefined  Null       Boolean    Number     String     Object     Array
Undefined  Undefined  Undefined  Undefined  Undefined  Undefined  Undefined  Undefined
Null       Undefined  OK         Undefined  Undefined  Undefined  Undefined  Undefined
Boolean    Undefined  Undefined  OK         Undefined  Undefined  Undefined  Undefined
Number     Undefined  Undefined  Undefined  OK         Undefined  Undefined  Undefined
String     Undefined  Undefined  Undefined  Undefined  OK         Undefined  Undefined
Object     Undefined  Undefined  Undefined  Undefined  Undefined  OK         Undefined
Array      Undefined  Undefined  Undefined  Undefined  Undefined  Undefined  OK

For other comparison operators such as >, >=, !=, < and <=, the following rules apply:

  • Comparison across types results in Undefined.
  • Comparison between two objects or two arrays results in Undefined.

If the result of the scalar expression in the filter is Undefined, the corresponding document would not be included in the result, since Undefined doesn't logically equate to "true".

BETWEEN keyword

You can also use the BETWEEN keyword to express queries against ranges of values like in ANSI SQL. BETWEEN can be used against strings or numbers.

For example, this query returns all family documents in which the first child's grade is between 1-5 (both inclusive).
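A sketch of that query against the sample data:

```sql
SELECT *
FROM Families.children[0] c
WHERE c.grade BETWEEN 1 AND 5
```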

Unlike in ANSI-SQL, you can also use the BETWEEN clause in the FROM clause, as in the following example.

For faster query execution times, remember to create an indexing policy that uses a range index type against any numeric properties/paths that are filtered in the BETWEEN clause.

The main difference between using BETWEEN in the SQL API and ANSI SQL is that you can express range queries against properties of mixed types. For example, you might have "grade" be a number (5) in some documents and a string in others ("grade4"). In these cases, as in JavaScript, a comparison between two different types results in Undefined, and the document is skipped.

Logical (AND, OR and NOT) operators

Logical operators operate on Boolean values. Because a scalar expression can also evaluate to Undefined, the truth tables are three-valued: False AND anything is False, True OR anything is True, and every other combination involving Undefined yields Undefined (NOT Undefined is also Undefined).


IN keyword

The IN keyword can be used to check whether a specified value matches any value in a list. For example, this query returns all family documents where the id is one of "WakefieldFamily" or "AndersenFamily".
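Using the sample ids, such a query can be sketched as:

```sql
SELECT *
FROM Families
WHERE Families.id IN ("AndersenFamily", "WakefieldFamily")
```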

This example returns all documents where the state is any of the specified values.

Ternary (?) and Coalesce (??) operators

The Ternary and Coalesce operators can be used to build conditional expressions, similar to popular programming languages like C# and JavaScript.

The Ternary (?) operator can be very handy when constructing new JSON properties on the fly. For example, now you can write queries to classify the class levels into a human readable form like Beginner/Intermediate/Advanced as shown below.

You can also nest the calls to the operator like in the query below.
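A sketch of such a ternary projection, including a nested call (the grade thresholds and labels are illustrative):

```sql
SELECT (c.grade < 5)? "elementary": ((c.grade < 9)? "junior": "high") AS gradeLevel
FROM Families.children[0] c
```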

As with other query operators, if the referenced properties in the conditional expression are missing in any document, or if the types being compared are different, then those documents are excluded in the query results.

The Coalesce (??) operator can be used to efficiently check for the presence of a property (a.k.a. is defined) in a document. This is useful when querying against semi-structured data or data of mixed types. For example, this query returns the "lastName" if present, or the "surname" if "lastName" isn't present.
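Assuming documents that store either lastName or surname, the coalescing projection looks like this:

```sql
SELECT f.lastName ?? f.surname AS familyName
FROM Families f
```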

Quoted property accessor

You can also access properties using the quoted property operator []. For example, SELECT c.grade and SELECT c["grade"] are equivalent. This syntax is useful when you need to escape a property that contains spaces or special characters, or that happens to share the same name as a SQL keyword or reserved word.

SELECT clause

The SELECT clause (SELECT <select_list>) is mandatory and specifies what values are retrieved from the query, just like in ANSI-SQL. The subset that's been filtered on top of the source documents is passed onto the projection phase, where the specified JSON values are retrieved and a new JSON object is constructed for each input passed to it.

The following example shows a typical SELECT query.



Nested properties

In the following example, we are projecting two nested properties, f.address.state and f.address.city.



Projection also supports JSON expressions as shown in the following example:



Let's look at the role of the $1 labels here. The SELECT clause needs to create a JSON object, and since no key is provided, we use implicit argument variable names starting with $1. For example, this query returns two implicit argument variables, labeled $1 and $2.




Now let's extend the example above with explicit aliasing of values. AS is the keyword used for aliasing. It's optional, as shown while projecting the second value as NameInfo.

In case a query has two properties with the same name, aliasing must be used to rename one or both of the properties so that they are disambiguated in the projected result.



Scalar expressions

In addition to property references, the SELECT clause also supports scalar expressions like constants, arithmetic expressions, logical expressions, etc. For example, here's a simple "Hello World" query.
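Such a query can be as simple as a single constant:

```sql
SELECT "Hello World"
```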



Here's a more complex example that uses a scalar expression.



In the following example, the result of the scalar expression is a Boolean.



Object and array creation

Another key feature of the SQL API is array/object creation. In the previous example, note that we created a new JSON object. Similarly, one can also construct arrays as shown in the following examples:



VALUE keyword

The VALUE keyword provides a way to return a JSON value. For example, the query shown below returns the scalar "Hello World" instead of {"$1": "Hello World"}.
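A sketch of that query:

```sql
SELECT VALUE "Hello World"
```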



The following query returns the JSON value without the label in the results.



The following example extends this to show how to return JSON primitive values (the leaf level of the JSON tree).



* Operator

The special operator (*) is supported to project the document as-is. When used, it must be the only projected field. While a query like SELECT * FROM Families f is valid, SELECT VALUE * FROM Families f and SELECT *, f.id FROM Families f are not valid.



TOP Operator

The TOP keyword can be used to limit the number of values from a query. When TOP is used in conjunction with the ORDER BY clause, the result set is limited to the first N number of ordered values; otherwise, it returns the first N number of results in an undefined order. As a best practice, in a SELECT statement, always use an ORDER BY clause with the TOP clause. This is the only way to predictably indicate which rows are affected by TOP.
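A sketch combining TOP with ORDER BY for a predictable result:

```sql
SELECT TOP 1 *
FROM Families f
ORDER BY f.id
```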



TOP can be used with a constant value (as shown above) or with a variable value using parameterized queries. For more details, please see parameterized queries below.

Aggregate Functions

You can also perform aggregations in the SELECT clause. Aggregate functions perform a calculation on a set of values and return a single value. For example, the following query returns the count of family documents within the collection.
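A count over the collection can be sketched as:

```sql
SELECT COUNT(1)
FROM Families f
```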



You can also return the scalar value of the aggregate by using the VALUE keyword. For example, the following query returns the count of values as a single number:



You can also perform aggregates in combination with filters. For example, the following query returns the count of documents with the address in the state of Washington.



The following table shows the list of supported aggregate functions in the SQL API. SUM and AVG are performed over numeric values, whereas COUNT, MIN, and MAX can be performed over numbers, strings, Booleans, and nulls.

COUNT: Returns the number of items in the expression.
SUM: Returns the sum of all the values in the expression.
MIN: Returns the minimum value in the expression.
MAX: Returns the maximum value in the expression.
AVG: Returns the average of the values in the expression.

Aggregates can also be performed over the results of an array iteration. For more information, see Array Iteration in Queries.


When using the Azure portal's Data Explorer, note that aggregation queries may return partially aggregated results over a query page. The SDKs produce a single cumulative value across all pages.

In order to perform aggregation queries using code, you need .NET SDK 1.12.0, .NET Core SDK 1.1.0, or Java SDK 1.9.5 or above.

ORDER BY clause

Like in ANSI-SQL, you can include an optional ORDER BY clause while querying. The clause can include an optional ASC/DESC argument to specify the order in which results must be retrieved.

For example, here's a query that retrieves families in order of the resident city's name.
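A sketch of that ordering:

```sql
SELECT f.id, f.address.city
FROM Families f
ORDER BY f.address.city
```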



And here's a query that retrieves families in order of creation date, which is stored as a number representing the epoch time, i.e., elapsed time since Jan 1, 1970, in seconds.



Advanced database concepts and SQL queries


Iteration

A new construct was added via the IN keyword in the SQL API to provide support for iterating over JSON arrays. The FROM source provides support for iteration. Let's start with the following example:



Now let's look at another query that performs iteration over children in the collection. Note the difference in the output array. This example splits and flattens the results into a single array.
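The contrast between the two shapes can be sketched as follows. Without iteration, each children array is returned as a single value:

```sql
SELECT *
FROM Families.children
```

With the IN keyword, each individual child is flattened into its own result:

```sql
SELECT c
FROM c IN Families.children
```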



This can be further used to filter on each individual entry of the array as shown in the following example:



You can also perform aggregation over the result of array iteration. For example, the following query counts the number of children among all families.
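One way to write that count:

```sql
SELECT COUNT(child)
FROM child IN Families.children
```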




Joins

In a relational database, the need to join across tables is important. It's the logical corollary to designing normalized schemas. In contrast, the SQL API deals with the denormalized data model of schema-free documents, and its JOIN is the logical equivalent of a "self-join".

The syntax that the language supports is <from_source1> JOIN <from_source2> JOIN ... JOIN <from_sourceN>. Overall, this returns a set of N-tuples (tuple with N values). Each tuple has values produced by iterating all collection aliases over their respective sets. In other words, this is a full cross product of the sets participating in the join.

The following examples show how the JOIN clause works. In the following example, the result is empty since the cross product of each document from source and an empty set is empty.



In the following example, the join is between the document root and the children subroot. It's a cross product between two JSON objects. The fact that children is an array is not effective in the JOIN, since we are dealing with a single root that is the children array. Hence the result contains only two results, since the cross product of each document with the array yields exactly one document.



The following example shows a more conventional join:
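A sketch of such a join, projecting only the root's id:

```sql
SELECT f.id
FROM Families f
JOIN c IN f.children
```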



The first thing to note is that the from_source of the JOIN clause is an iterator. So, the flow in this case is as follows:

  • Expand each child element c in the array.
  • Apply a cross product with the root of the document f with each child element c that was flattened in the first step.
  • Finally, project the root object f name property alone.

The first document (AndersenFamily) contains only one child element, so the result set contains a single object corresponding to this document. The second document (WakefieldFamily) contains two children, so the cross product produces a separate object for each child, resulting in two objects, one per child. The root fields in both these documents are the same, just as you would expect in a cross product.

The real utility of the JOIN is to form tuples from the cross product in a shape that's otherwise difficult to project. Furthermore, as in the example below, you can filter on the combination of a tuple, which lets the user choose a condition satisfied by the tuples overall.



This example is a natural extension of the preceding example, and performs a double join. The cross product can be viewed as nested iteration: for each family, for each child, for each of that child's pets, emit a tuple.

AndersenFamily has one child who has one pet, so the cross product yields one row (1*1*1) from this family. WakefieldFamily has two children, but only one child, Jesse, has pets. Jesse has two pets, though, so the cross product yields 1*1*2 = 2 rows from this family.

In the next example, there is an additional filter on the pet's name. This excludes all the tuples where the pet name is not "Shadow". Notice that we are able to build tuples from arrays, filter on any of the elements of the tuple, and project any combination of the elements.



JavaScript integration

Azure Cosmos DB provides a programming model for executing JavaScript based application logic directly on the collections in terms of stored procedures and triggers. This allows for both:

  • Ability to do high-performance transactional CRUD operations and queries against documents in a collection by virtue of the deep integration of JavaScript runtime directly within the database engine.
  • A natural modeling of control flow, variable scoping, and assignment and integration of exception handling primitives with database transactions. For more details about Azure Cosmos DB support for JavaScript integration, please refer to the JavaScript server-side programmability documentation.

User-Defined Functions (UDFs)

Along with the types already defined in this article, the SQL API provides support for user-defined functions (UDFs). In particular, scalar UDFs are supported, where developers can pass in zero or more arguments and get a single result back. Each of these arguments is checked for being a legal JSON value.

The SQL syntax is extended to support custom application logic using these user-defined functions. UDFs can be registered with the SQL API and then referenced as part of a SQL query. In fact, UDFs are designed specifically to be invoked by queries. As a corollary to this choice, UDFs do not have access to the context object that the other JavaScript types (stored procedures and triggers) have. Since queries execute as read-only, they can run either on primary or on secondary replicas. Therefore, unlike the other JavaScript types, UDFs are designed to run on secondary replicas as well.

Below is an example of how a UDF can be registered at the Cosmos DB database, specifically under a document collection.

The preceding example creates a UDF whose name is REGEX_MATCH. It accepts two JSON string values, input and pattern, and checks whether the first matches the pattern specified in the second using JavaScript's string.match() function.
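The body of such a UDF can be sketched as a plain JavaScript function (the SDK registration call that wraps it is omitted here):

```javascript
// Body of the REGEX_MATCH UDF: returns true when `input` matches the
// regular-expression `pattern`, using JavaScript's String.prototype.match.
function regexMatch(input, pattern) {
    return input.match(pattern) !== null;
}
```

Once registered under the name REGEX_MATCH on the collection, it becomes callable from queries as udf.REGEX_MATCH(...).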

We can now use this UDF in a query projection. UDFs must be qualified with the case-sensitive prefix "udf." when called from within queries.


Prior to 3/17/2015, Cosmos DB supported UDF calls without the "udf." prefix like SELECT REGEX_MATCH(). This calling pattern has been deprecated.



The UDF can also be used inside a filter as shown in the example below, also qualified with the "udf." prefix:



In essence, UDFs are valid scalar expressions and can be used in both projections and filters.

To expand on the power of UDFs, let's look at another example with conditional logic:
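For illustration, a UDF with branching logic might map a city name to a hard-coded elevation; the function name, cities, and values below are hypothetical:

```javascript
// Hypothetical conditional-logic UDF: returns a canned elevation (in feet)
// for a few known cities, and -1 for anything else.
function cityElevation(city) {
    switch (city.toLowerCase()) {
        case "seattle":
            return 520;
        case "boston":
            return 141;
        default:
            return -1; // city not in the lookup
    }
}
```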

Below is an example that exercises the UDF.



As the preceding examples showcase, UDFs integrate the power of JavaScript language with the SQL API to provide a rich programmable interface to do complex procedural, conditional logic with the help of inbuilt JavaScript runtime capabilities.

The SQL API provides the arguments to the UDFs for each document in the source at the current stage (WHERE clause or SELECT clause) of processing the UDF. The result is incorporated in the overall execution pipeline seamlessly. If the properties referred to by the UDF parameters are not available in the JSON value, the parameter is considered as undefined and hence the UDF invocation is entirely skipped. Similarly if the result of the UDF is undefined, it's not included in the result.

In summary, UDFs are great tools to do complex business logic as part of the query.

Operator evaluation

Cosmos DB, by virtue of being a JSON database, draws parallels with JavaScript operators and their evaluation semantics. While Cosmos DB tries to preserve JavaScript semantics in terms of JSON support, operator evaluation deviates in some instances.

In the SQL API, unlike in traditional SQL, the types of values are often not known until the values are retrieved from the database. In order to execute queries efficiently, most of the operators have strict type requirements.

The SQL API doesn't perform implicit conversions, unlike JavaScript. For instance, a query like SELECT * FROM Persons p WHERE p.Age = 21 matches documents that contain an Age property whose numeric value is 21. Any other document whose Age property is the string "21", or one of the many other possible variations like "021", "21.0", "0021", "00021", and so on, will not be matched. This is in contrast to JavaScript, where string values are implicitly cast to numbers (depending on the operator, for example ==). This choice is crucial for efficient index matching in the SQL API.

Parameterized SQL queries

Cosmos DB supports queries with parameters expressed with the familiar @ notation. Parameterized SQL provides robust handling and escaping of user input, preventing accidental exposure of data through SQL injection.

For example, you can write a query that takes last name and address state as parameters, and then execute it for various values of last name and address state based on user input.

This request can then be sent to Cosmos DB as a parameterized JSON query, as shown below.
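The request body takes roughly this shape (the parameter names and values are illustrative):

```json
{
  "query": "SELECT * FROM Families f WHERE f.lastName = @lastName AND f.address.state = @addressState",
  "parameters": [
    { "name": "@lastName", "value": "Wakefield" },
    { "name": "@addressState", "value": "NY" }
  ]
}
```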

The argument to TOP can also be set using a parameterized query.

Parameter values can be any valid JSON (strings, numbers, Booleans, null, even arrays or nested JSON). Also since Cosmos DB is schema-less, parameters are not validated against any type.

Built-in functions

Cosmos DB also supports a number of built-in functions for common operations that can be used inside queries, like user-defined functions (UDFs).

The built-in functions are grouped into mathematical, type checking, and string functions, each covered below.

If you're currently using a user-defined function (UDF) for which a built-in function is now available, you should use the corresponding built-in function instead, as it will run more quickly and efficiently.

Mathematical functions

The mathematical functions each perform a calculation, based on input values that are provided as arguments, and return a numeric value. Here’s a table of supported built-in mathematical functions.

ABS (num_expr): Returns the absolute (positive) value of the specified numeric expression.
CEILING (num_expr): Returns the smallest integer value greater than, or equal to, the specified numeric expression.
FLOOR (num_expr): Returns the largest integer less than or equal to the specified numeric expression.
EXP (num_expr): Returns the exponent of the specified numeric expression.
LOG (num_expr [,base]): Returns the natural logarithm of the specified numeric expression, or the logarithm using the specified base.
LOG10 (num_expr): Returns the base-10 logarithmic value of the specified numeric expression.
ROUND (num_expr): Returns a numeric value, rounded to the closest integer value.
TRUNC (num_expr): Returns a numeric value, truncated to the closest integer value.
SQRT (num_expr): Returns the square root of the specified numeric expression.
SQUARE (num_expr): Returns the square of the specified numeric expression.
POWER (num_expr, num_expr): Returns the power of the specified numeric expression to the value specified.
SIGN (num_expr): Returns the sign value (-1, 0, 1) of the specified numeric expression.
ACOS (num_expr): Returns the angle, in radians, whose cosine is the specified numeric expression; also called arccosine.
ASIN (num_expr): Returns the angle, in radians, whose sine is the specified numeric expression; also called arcsine.
ATAN (num_expr): Returns the angle, in radians, whose tangent is the specified numeric expression; also called arctangent.
ATN2 (num_expr, num_expr): Returns the angle, in radians, between the positive x-axis and the ray from the origin to the point (y, x), where x and y are the values of the two specified float expressions.
COS (num_expr): Returns the trigonometric cosine of the specified angle, in radians, in the specified expression.
COT (num_expr): Returns the trigonometric cotangent of the specified angle, in radians, in the specified numeric expression.
DEGREES (num_expr): Returns the corresponding angle in degrees for an angle specified in radians.
PI (): Returns the constant value of PI.
RADIANS (num_expr): Returns radians when a numeric expression, in degrees, is entered.
SIN (num_expr): Returns the trigonometric sine of the specified angle, in radians, in the specified expression.
TAN (num_expr): Returns the trigonometric tangent of the specified angle, in radians, in the specified expression.

For example, you can now run queries like the following:
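For instance, a query returning the absolute value of -4:

```sql
SELECT VALUE ABS(-4)
```

This returns the value 4.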



The main difference between Cosmos DB’s functions compared to ANSI SQL is that they are designed to work well with schema-less and mixed schema data. For example, if you have a document where the Size property is missing, or has a non-numeric value like “unknown”, then the document is skipped over, instead of returning an error.

Type checking functions

The type checking functions allow you to check the type of an expression within SQL queries. Type checking functions can be used to determine the type of properties within documents on the fly when it is variable or unknown. Here’s a table of supported built-in type checking functions.

IS_ARRAY (expr): Returns a Boolean indicating if the type of the value is an array.
IS_BOOL (expr): Returns a Boolean indicating if the type of the value is a Boolean.
IS_NULL (expr): Returns a Boolean indicating if the type of the value is null.
IS_NUMBER (expr): Returns a Boolean indicating if the type of the value is a number.
IS_OBJECT (expr): Returns a Boolean indicating if the type of the value is a JSON object.
IS_STRING (expr): Returns a Boolean indicating if the type of the value is a string.
IS_DEFINED (expr): Returns a Boolean indicating if the property has been assigned a value.
IS_PRIMITIVE (expr): Returns a Boolean indicating if the type of the value is a string, number, Boolean, or null.

Using these functions, you can now run queries like the following:
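As a sketch (the original example is not shown in this copy), type checking functions can be evaluated over constants or document properties:

```sql
SELECT VALUE [IS_NUMBER(2), IS_STRING("value"), IS_BOOL(true), IS_NULL(null), IS_DEFINED({ "a": 5 }.b)]
```

The last expression returns false because the property b is not defined on the object.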



String functions

The following scalar functions perform an operation on a string input value and return a string, numeric or Boolean value. Here's a table of built-in string functions:

LENGTH (str_expr) - Returns the number of characters of the specified string expression.
CONCAT (str_expr, str_expr [, str_expr]) - Returns a string that is the result of concatenating two or more string values.
SUBSTRING (str_expr, num_expr, num_expr) - Returns part of a string expression.
STARTSWITH (str_expr, str_expr) - Returns a Boolean indicating whether the first string expression starts with the second.
ENDSWITH (str_expr, str_expr) - Returns a Boolean indicating whether the first string expression ends with the second.
CONTAINS (str_expr, str_expr) - Returns a Boolean indicating whether the first string expression contains the second.
INDEX_OF (str_expr, str_expr) - Returns the starting position of the first occurrence of the second string expression within the first specified string expression, or -1 if the string is not found.
LEFT (str_expr, num_expr) - Returns the left part of a string with the specified number of characters.
RIGHT (str_expr, num_expr) - Returns the right part of a string with the specified number of characters.
LTRIM (str_expr) - Returns a string expression after it removes leading blanks.
RTRIM (str_expr) - Returns a string expression after truncating all trailing blanks.
LOWER (str_expr) - Returns a string expression after converting uppercase character data to lowercase.
UPPER (str_expr) - Returns a string expression after converting lowercase character data to uppercase.
REPLACE (str_expr, str_expr, str_expr) - Replaces all occurrences of a specified string value with another string value.
REPLICATE (str_expr, num_expr) - Repeats a string value a specified number of times.
REVERSE (str_expr) - Returns the characters of a string value in reverse order.

Using these functions, you can now run queries like the following. For example, you can return the family name in uppercase as follows:
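A minimal sketch, assuming the Families sample documents used earlier in this article:

```sql
SELECT VALUE UPPER(Families.id)
FROM Families
```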



Or concatenate strings like in this example:
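For instance, assuming documents with an address property (illustrative, based on the Families sample):

```sql
SELECT Families.id, CONCAT(Families.address.city, ",", Families.address.state) AS location
FROM Families
```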



String functions can also be used in the WHERE clause to filter results, like in the following example:
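A sketch, again assuming the Families sample documents:

```sql
SELECT Families.id
FROM Families
WHERE CONTAINS(Families.id, "Wakefield")
```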



Array functions

The following scalar functions perform an operation on an array input value and return a numeric, Boolean, or array value. Here's a table of built-in array functions:

ARRAY_CONCAT (arr_expr, arr_expr [, arr_expr]) - Returns an array that is the result of concatenating two or more array values.
ARRAY_CONTAINS (arr_expr, expr [, bool_expr]) - Returns a Boolean indicating whether the array contains the specified value. Can specify if the match is full or partial.
ARRAY_LENGTH (arr_expr) - Returns the number of elements in the specified array expression.
ARRAY_SLICE (arr_expr, num_expr [, num_expr]) - Returns part of an array expression.

Array functions can be used to manipulate arrays within JSON. For example, here's a query that returns all documents where one of the parents is "Robin Wakefield".
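A sketch of such a query, assuming the Families sample documents with a parents array:

```sql
SELECT Families.id
FROM Families
WHERE ARRAY_CONTAINS(Families.parents, { givenName: "Robin", familyName: "Wakefield" })
```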



You can specify a partial fragment for matching elements within the array. The following query finds all families where a parent has a matching givenName.
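ARRAY_CONTAINS accepts an optional third Boolean argument; when it is true, the match is performed against a partial fragment rather than the whole element. A sketch:

```sql
SELECT Families.id
FROM Families
WHERE ARRAY_CONTAINS(Families.parents, { givenName: "Robin" }, true)
```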



Here's another example that uses ARRAY_LENGTH to get the number of children per family.
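A sketch, assuming the Families sample documents:

```sql
SELECT Families.id, ARRAY_LENGTH(Families.children) AS numberOfChildren
FROM Families
```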



Spatial functions

Cosmos DB supports the following Open Geospatial Consortium (OGC) built-in functions for geospatial querying.

ST_DISTANCE (point_expr, point_expr) - Returns the distance between the two GeoJSON Point, Polygon, or LineString expressions.
ST_WITHIN (point_expr, polygon_expr) - Returns a Boolean expression indicating whether the first GeoJSON object (Point, Polygon, or LineString) is within the second GeoJSON object (Point, Polygon, or LineString).
ST_INTERSECTS (spatial_expr, spatial_expr) - Returns a Boolean expression indicating whether the two specified GeoJSON objects (Point, Polygon, or LineString) intersect.
ST_ISVALID (spatial_expr) - Returns a Boolean value indicating whether the specified GeoJSON Point, Polygon, or LineString expression is valid.
ST_ISVALIDDETAILED (spatial_expr) - Returns a JSON value containing a Boolean value if the specified GeoJSON Point, Polygon, or LineString expression is valid, and if invalid, additionally the reason as a string value.

Spatial functions can be used to perform proximity queries against spatial data. For example, here's a query that returns all family documents that are within 30 km of the specified location using the ST_DISTANCE built-in function.
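A sketch of such a proximity query (the GeoJSON point coordinates are illustrative; ST_DISTANCE returns the distance in meters, so 30 km is 30000):

```sql
SELECT f.id
FROM Families f
WHERE ST_DISTANCE(f.location, { "type": "Point", "coordinates": [31.9, -4.8] }) < 30000
```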



For more details on geospatial support in Cosmos DB, please see Working with geospatial data in Azure Cosmos DB. That wraps up spatial functions, and the SQL syntax for Cosmos DB. Now let's take a look at how LINQ querying works and how it interacts with the syntax we've seen so far.


LINQ is a .NET programming model that expresses computation as queries on streams of objects. Cosmos DB provides a client-side library to interface with LINQ by facilitating a conversion between JSON and .NET objects and a mapping from a subset of LINQ queries to Cosmos DB queries.

The picture below shows the architecture of supporting LINQ queries using Cosmos DB. Using the Cosmos DB client, developers can create an IQueryable object that directly queries the Cosmos DB query provider, which then translates the LINQ query into a Cosmos DB query. The query is then passed to the Cosmos DB server to retrieve a set of results in JSON format. The returned results are deserialized into a stream of .NET objects on the client side.

.NET and JSON mapping

The mapping between .NET objects and JSON documents is natural - each data member field is mapped to a JSON object, where the field name is mapped to the "key" part of the object and the "value" part is recursively mapped to the value part of the object. Consider the following example: The Family object created is mapped to the JSON document as shown below. And vice versa, the JSON document is mapped back to a .NET object.

C# Class
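The original class listing is omitted from this copy; a sketch of what the Family class might look like (property names follow the JSON documents used in this article; the JsonProperty attribute is from the Json.NET serializer used by the SDK):

```csharp
public class Family
{
    [JsonProperty(PropertyName = "id")]
    public string Id;
    public Parent[] parents;
    public Child[] children;
    public bool isRegistered;
};

public class Parent
{
    public string familyName;
    public string givenName;
};

public class Child
{
    public string familyName;
    public string givenName;
    public string gender;
    public int grade;
};
```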


LINQ to SQL translation

The Cosmos DB query provider performs a best effort mapping from a LINQ query into a Cosmos DB SQL query. In the following description, we assume the reader has a basic familiarity with LINQ.

First, for the type system, we support all JSON primitive types – numeric types, boolean, string, and null. Only these JSON types are supported. The following scalar expressions are supported.

  • Constant values – these include constant values of the primitive data types at the time the query is evaluated.
  • Property/array index expressions – these expressions refer to the property of an object or an array element.

    family.Id; family.children[0].familyName; family.children[0].grade; family.children[n].grade; //n is an int variable

  • Arithmetic expressions - These include common arithmetic expressions on numerical and boolean values. For the complete list, refer to the SQL specification.

    2 * family.children[0].grade; x + y;

  • String comparison expression - these include comparing a string value to some constant string value.

    mother.familyName == "Smith"; child.givenName == s; //s is a string variable

  • Object/array creation expression - these expressions return an object of compound value type or anonymous type or an array of such objects. These values can be nested.

    new Parent { familyName = "Smith", givenName = "Joe" }; new { first = 1, second = 2 }; //an anonymous type with two fields
    new int[] { 3, child.grade, 5 };

List of supported LINQ operators

Here is a list of supported LINQ operators in the LINQ provider included with the SQL .NET SDK.

  • Select: Projections translate to the SQL SELECT including object construction
  • Where: Filters translate to the SQL WHERE, and support translation of &&, || and ! to the SQL operators
  • SelectMany: Allows unwinding of arrays to the SQL JOIN clause. Can be used to chain/nest expressions to filter on array elements
  • OrderBy and OrderByDescending: Translates to ORDER BY ascending/descending
  • Count, Sum, Min, Max, and Average operators for aggregation, and their async equivalents CountAsync, SumAsync, MinAsync, MaxAsync, and AverageAsync.
  • CompareTo: Translates to range comparisons. Commonly used for strings since they’re not comparable in .NET
  • Take: Translates to the SQL TOP for limiting results from a query
  • Math Functions: Supports translation from .NET’s Abs, Acos, Asin, Atan, Ceiling, Cos, Exp, Floor, Log, Log10, Pow, Round, Sign, Sin, Sqrt, Tan, Truncate to the equivalent SQL built-in functions.
  • String Functions: Supports translation from .NET’s Concat, Contains, EndsWith, IndexOf, Count, ToLower, TrimStart, Replace, Reverse, TrimEnd, StartsWith, SubString, ToUpper to the equivalent SQL built-in functions.
  • Array Functions: Supports translation from .NET’s Concat, Contains, and Count to the equivalent SQL built-in functions.
  • Geospatial Extension Functions: Supports translation from stub methods Distance, Within, IsValid, and IsValidDetailed to the equivalent SQL built-in functions.
  • User-Defined Function Extension Function: Supports translation from the stub method UserDefinedFunctionProvider.Invoke to the corresponding user-defined function.
  • Miscellaneous: Supports translation of the coalesce and conditional operators. Can translate Contains to String CONTAINS, ARRAY_CONTAINS, or the SQL IN depending on context.

SQL query operators

Here are some examples that illustrate how some of the standard LINQ query operators are translated down to Cosmos DB queries.

Select Operator

The syntax is input.Select(x => f(x)), where f is a scalar expression.
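As an illustrative sketch (assuming an IQueryable over Family documents named input), a projection and the form of SQL it would translate to:

```csharp
// LINQ lambda expression
input.Select(family => family.parents[0].familyName);

// Translates to a SQL query of the form:
// SELECT VALUE f.parents[0].familyName
// FROM Families f
```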

LINQ lambda expression


LINQ lambda expression


LINQ lambda expression


SelectMany operator

The syntax is input.SelectMany(x => f(x)), where f is a scalar expression that returns a collection type.

LINQ lambda expression


Where operator

The syntax is input.Where(x => f(x)), where f is a scalar expression that returns a Boolean value.
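An illustrative sketch of a filter (again assuming an IQueryable over Family documents named input):

```csharp
// LINQ lambda expression
input.Where(family => family.parents[0].familyName == "Smith");

// Translates to a SQL query of the form:
// SELECT *
// FROM Families f
// WHERE f.parents[0].familyName = "Smith"
```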

LINQ lambda expression


LINQ lambda expression


Composite SQL queries

The above operators can be composed to form more powerful queries. Since Cosmos DB supports nested collections, the composition can either be concatenated or nested.


A concatenated query can start with an optional SelectMany query followed by multiple Select or Where operators.

LINQ lambda expression


LINQ lambda expression


LINQ lambda expression


LINQ lambda expression



The syntax is input.SelectMany(x => Q(x)), where Q is a Select, SelectMany, or Where operator.

In a nested query, the inner query is applied to each element of the outer collection. One important feature is that the inner query can refer to the fields of the elements in the outer collection like self-joins.

LINQ lambda expression


LINQ lambda expression


LINQ lambda expression


Executing SQL queries

Cosmos DB exposes resources through a REST API that can be called by any language capable of making HTTP/HTTPS requests. Additionally, Cosmos DB offers programming libraries for several popular languages like .NET, Node.js, JavaScript, and Python. The REST API and the various libraries all support querying through SQL. The .NET SDK supports LINQ querying in addition to SQL.

The following examples show how to create a query and submit it against a Cosmos DB database account.


Cosmos DB offers an open RESTful programming model over HTTP. Database accounts can be provisioned using an Azure subscription. The Cosmos DB resource model consists of a set of resources under a database account, each of which is addressable using a logical and stable URI. A set of resources is referred to as a feed in this document. A database account consists of a set of databases, each containing multiple collections, each of which in-turn contain documents, UDFs, and other resource types.

The basic interaction model with these resources is through the HTTP verbs GET, PUT, POST, and DELETE with their standard interpretation. The POST verb is used for creation of a new resource, for executing a stored procedure or for issuing a Cosmos DB query. Queries are always read-only operations with no side-effects.

The following examples show a POST for a SQL API query made against a collection containing the two sample documents we've reviewed so far. The query has a simple filter on the JSON name property. Note the use of the

This article briefly introduces databases, and how to use them with Node/Express apps. It then goes on to show how we can use Mongoose to provide database access for the LocalLibrary website. It explains how object schema and models are declared, the main field types, and basic validation. It also briefly shows a few of the main ways in which you can access model data.


Library staff will use the Local Library website to store information about books and borrowers, while library members will use it to browse and search for books, find out whether there are any copies available, and then reserve or borrow them. In order to store and retrieve information efficiently, we will store it in a database.

Express apps can use many different databases, and there are several approaches you can use for performing Create, Read, Update and Delete (CRUD) operations. This tutorial provides a brief overview of some of the available options, and then goes on to show in detail the particular mechanisms selected.

What databases can I use?

Express apps can use any database supported by Node (Express itself doesn't define any specific additional behaviour/requirements for database management). There are many popular options, including PostgreSQL, MySQL, Redis, SQLite, and MongoDB.

When choosing a database, you should consider things like time-to-productivity/learning curve, performance, ease of replication/backup, cost, community support, etc. While there is no single "best" database, almost any of the popular solutions should be more than acceptable for a small-to-medium-sized site like our Local Library.

For more information on the options see: Database integration (Express docs).

What is the best way to interact with a database?

There are two approaches for interacting with a database: 

  • Using the databases' native query language (e.g. SQL)
  • Using an Object Data Model ("ODM") / Object Relational Model ("ORM"). An ODM/ORM represents the website's data as JavaScript objects, which are then mapped to the underlying database. Some ORMs are tied to a specific database, while others provide a database-agnostic backend.

The very best performance can be gained by using SQL, or whatever query language is supported by the database. ODMs are often slower because they use translation code to map between objects and the database format, which may not use the most efficient database queries (this is particularly true if the ODM supports different database backends, and must make greater compromises in terms of what database features are supported).

The benefit of using an ORM is that programmers can continue to think in terms of JavaScript objects rather than database semantics — this is particularly true if you need to work with different databases (on either the same or different websites). They also provide an obvious place to perform validation and checking of data.

Tip:  Using ODM/ORMs often results in lower costs for development and maintenance! Unless you're very familiar with the native query language or performance is paramount, you should strongly consider using an ODM.

What ORM/ODM should I use?

There are many ODM/ORM solutions available on the NPM package manager site (check out the odm and orm tags for a subset!).

A few solutions that were popular at the time of writing are:

  • Mongoose: Mongoose is a MongoDB object modeling tool designed to work in an asynchronous environment.
  • Waterline: An ORM extracted from the Express-based Sails web framework. It provides a uniform API for accessing numerous different databases, including Redis, MySQL, LDAP, MongoDB, and Postgres.
  • Bookshelf: Features both promise-based and traditional callback interfaces, providing transaction support, eager/nested-eager relation loading, polymorphic associations, and support for one-to-one, one-to-many, and many-to-many relations. Works with PostgreSQL, MySQL, and SQLite3.
  • Objection: Makes it as easy as possible to use the full power of SQL and the underlying database engine (supports SQLite3, Postgres, and MySQL).
  • Sequelize is a promise-based ORM for Node.js and io.js. It supports the dialects PostgreSQL, MySQL, MariaDB, SQLite and MSSQL and features solid transaction support, relations, read replication and more.

As a general rule you should consider both the features provided and the "community activity" (downloads, contributions, bug reports, quality of documentation, etc.) when selecting a solution. At time of writing Mongoose is by far the most popular ORM, and is a reasonable choice if you're using MongoDB for your database.

Using Mongoose and MongoDb for the LocalLibrary

For the Local Library example (and the rest of this topic) we're going to use the Mongoose ODM to access our library data. Mongoose acts as a front end to MongoDB, an open source NoSQL database that uses a document-oriented data model. A “collection” of “documents”, in a MongoDB database, is analogous to a “table” of “rows” in a relational database.

This ODM and database combination is extremely popular in the Node community, partially because the document storage and query system looks very like JSON, and is hence familiar to JavaScript developers.

Tip: You don't need to know MongoDB in order to use Mongoose, although parts of the Mongoose documentation are easier to use and understand if you are already familiar with MongoDB.

The rest of this tutorial shows how to define and access the Mongoose schema and models for the LocalLibrary website example.

Designing the LocalLibrary models

Before you jump in and start coding the models, it's worth taking a few minutes to think about what data we need to store and the relationships between the different objects.

We know that we need to store information about books (title, summary, author, genre, ISBN) and that we might have multiple copies available (with globally unique ids, availability statuses, etc.). We might need to store more information about the author than just their name, and there might be multiple authors with the same or similar names. We want to be able to sort information based on book title, author, genre, and category.

When designing your models it makes sense to have separate models for every "object" (group of related information). In this case the obvious objects are books, book instances, and authors.

You might also want to use models to represent selection-list options (e.g. like a drop down list of choices), rather than hard coding the choices into the website itself — this is recommended when all the options aren't known up front or may change. The obvious candidate for a model of this type is the book genre (e.g. Science Fiction, French Poetry, etc.)

Once we've decided on our models and fields, we need to think about the relationships between them.

With that in mind, the UML association diagram below shows the models we'll define in this case (as boxes). As discussed above, we've created models for book (the generic details of the book), book instance (status of specific physical copies of the book available in the system), and author. We have also decided to have a model for genre, so that values can be created dynamically. We've decided not to have a model for the book-instance status; we will hard code the acceptable values because we don't expect these to change. Within each of the boxes you can see the model name, the field names and types, and also the methods and their return types.

The diagram also shows the relationships between the models, including their multiplicities. The multiplicities are the numbers on the diagram showing the numbers (maximum and minimum) of each model that may be present in the relationship. For example, the connecting line between the boxes shows that Book and Genre are related. The numbers close to the Genre model show that a book must have zero or more genres (as many as you like), while the numbers at the other end of the line next to the Book model show that a genre can have zero or more associated books.

Note: As discussed in our Mongoose primer below it is often better to have the field that defines the relationship between the documents/models in just one model (you can still find the reverse relationship by searching for the associated _id in the other model). Below we have chosen to define the relationship between Book/Genre and Book/Author in the Book schema, and the relationship between Book/BookInstance in the BookInstance schema. This choice was somewhat arbitrary; we could equally well have had the field in the other schema.

Note: The next section provides a basic primer explaining how models are defined and used. As you read it, consider how we will construct each of the models in the diagram above.

Mongoose primer

This section provides an overview of how to connect Mongoose to a MongoDB database, how to define a schema and a model, and how to make basic queries. 

Installing Mongoose and MongoDB

Mongoose is installed in your project (package.json) like any other dependency — using NPM. To install it, use the following command inside your project folder:
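For example:

```shell
npm install mongoose
```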

Installing Mongoose adds all its dependencies, including the MongoDB database driver, but it does not install MongoDB itself. If you want to install a MongoDB server then you can download installers from the MongoDB website for various operating systems and install it locally. You can also use cloud-based MongoDB instances.

Note: For this tutorial we'll be using the mLab cloud-based database as a service sandbox tier to provide the database. This is suitable for development, and makes sense for the tutorial because it makes "installation" operating system independent (database-as-a-service is also one approach you might well use for your production database).

Connecting to MongoDB

Mongoose requires a connection to a MongoDB database. You can require() mongoose and connect to a locally hosted database with mongoose.connect(), as shown below.

//Import the mongoose module
var mongoose = require('mongoose');

//Set up default mongoose connection
var mongoDB = 'mongodb://';
mongoose.connect(mongoDB);

// Get Mongoose to use the global promise library
mongoose.Promise = global.Promise;

//Get the default connection
var db = mongoose.connection;

//Bind connection to error event (to get notification of connection errors)
db.on('error', console.error.bind(console, 'MongoDB connection error:'));

You can get the default Connection object with mongoose.connection. Once connected, the open event is fired on the Connection instance.

Tip: If you need to create additional connections you can use mongoose.createConnection(). This takes the same form of database URI (with host, database, port, options, etc.) as connect() and returns a Connection object.

Defining and creating models

Models are defined using the Schema interface. The Schema allows you to define the fields stored in each document along with their validation requirements and default values. In addition, you can define static and instance helper methods to make it easier to work with your data types, and also virtual properties that you can use like any other field, but which aren't actually stored in the database (we'll discuss these a bit further below).

Schemas are then "compiled" into models using the mongoose.model() method. Once you have a model you can use it to find, create, update, and delete objects of the given type.

Note: Each model maps to a collection of documents in the MongoDB database. The documents will contain the fields/schema types defined in the model Schema.

Defining schemas

The code fragment below shows how you might define a simple schema. First you require mongoose, then use the Schema constructor to create a new schema instance, defining the various fields inside it in the constructor's object parameter.

//Require Mongoose
var mongoose = require('mongoose');

//Define a schema
var Schema = mongoose.Schema;

var SomeModelSchema = new Schema({
  a_string: String,
  a_date: Date
});

In the case above we just have two fields, a string and a date. In the next sections we will show some of the other field types, validation, and other methods.

Creating a model

Models are created from schemas using the mongoose.model() method:

// Define schema
var Schema = mongoose.Schema;

var SomeModelSchema = new Schema({
  a_string: String,
  a_date: Date
});

// Compile model from schema
var SomeModel = mongoose.model('SomeModel', SomeModelSchema);

The first argument is the singular name of the collection that will be created for your model (Mongoose will create the database collection for the SomeModel model above), and the second argument is the schema you want to use in creating the model.

Note: Once you've defined your model classes you can use them to create, update, or delete records, and to run queries to get all records or particular subsets of records. We'll show you how to do this in the Using models section, and when we create our views.

Schema types (fields)

A schema can have an arbitrary number of fields — each one represents a field in the documents stored in MongoDB. An example schema showing many of the common field types and how they are declared is shown below.

var schema = new Schema({
  name: String,
  binary: Buffer,
  living: Boolean,
  updated: { type: Date, default: Date.now },
  age: { type: Number, min: 18, max: 65, required: true },
  mixed: Schema.Types.Mixed,
  _someId: Schema.Types.ObjectId,
  array: [],
  ofString: [String], // You can also have an array of each of the other types too.
  nested: { stuff: { type: String, lowercase: true, trim: true } }
});

Most of the SchemaTypes (the descriptors after “type:” or after field names) are self explanatory. The exceptions are:

  • ObjectId: Represents specific instances of a model in the database. For example, a book might use this to represent its author object. This field actually contains the unique ID (_id) for the specified object. We can use the populate() method to pull in the associated information when needed.
  • Mixed: An arbitrary schema type.
  • []: An array of items. You can perform JavaScript array operations on these models (push, pop, unshift, etc.). The examples above show an array of objects without a specified type and an array of String objects, but you can have an array of any type of object.

The code also shows both ways of declaring a field:

  • Field name and type as a key-value pair (i.e. as done with the fields name, binary and living).
  • Field name followed by an object defining the type, and any other options for the field. Options include things like:
    • default values.
    • built-in validators (e.g. max/min values) and custom validation functions.
    • Whether the field is required.
    • Whether String fields should automatically be set to lowercase, uppercase, or trimmed (e.g. { type: String, lowercase: true, trim: true }).

For more information about options see SchemaTypes (Mongoose docs).


Validation

Mongoose provides built-in and custom validators, and synchronous and asynchronous validators. It allows you to specify both the acceptable range of values and the error message for validation failure in all cases.

The built-in validators include:

  • All SchemaTypes have the built-in required validator. This is used to specify whether the field must be supplied in order to save a document.
  • Numbers have min and max validators.
  • Strings have:
    • enum: specifies the set of allowed values for the field.
    • match: specifies a regular expression that the string must match.
    • maxlength and minlength for the string.

The example below (slightly modified from the Mongoose documents) shows how you can specify some of the validator types and error messages:
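The example itself is missing from this copy; a sketch adapted from the Mongoose documentation (the field names are illustrative):

```js
var breakfastSchema = new Schema({
  eggs: {
    type: Number,
    min: [6, 'Too few eggs'],   // minimum value, with a custom error message
    max: 12,
    required: [true, 'Why no eggs?']
  },
  drink: {
    type: String,
    enum: ['Coffee', 'Tea']     // only these values are allowed
  }
});
```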

For complete information on field validation see Validation (Mongoose docs).

Virtual properties

Virtual properties are document properties that you can get and set but that do not get persisted to MongoDB. The getters are useful for formatting or combining fields, while setters are useful for de-composing a single value into multiple values for storage. The example in the documentation constructs (and deconstructs) a full name virtual property from a first and last name field, which is easier and cleaner than constructing a full name every time one is used in a template.

Note: We will use a virtual property in the library to define a unique URL for each model record using a path and the record's _id value.

For more information see Virtuals (Mongoose documentation).

Methods and query helpers

A schema can also have instance methods, static methods, and query helpers. The instance and static methods are similar, but with the obvious difference that an instance method is associated with a particular record and has access to the current object. Query helpers allow you to extend mongoose's chainable query builder API (for example, allowing you to add a query "byName" in addition to the find(), findOne() and findById() methods).

Using models

Once you've created a schema you can use it to create models. The model represents a collection of documents in the database that you can search, while the model's instances represent individual documents that you can save and retrieve.

We provide a brief overview below. For more information see: Models (Mongoose docs).

Creating and modifying documents

To create a record you can define an instance of the model and then call save(). The examples below assume SomeModel is a model (with a single field "name") that we have created from our schema.
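A sketch of this pattern (handleError is a hypothetical error-handling helper):

```js
// Create an instance of model SomeModel
var awesome_instance = new SomeModel({ name: 'awesome' });

// Save the new model instance, passing a callback
awesome_instance.save(function (err) {
  if (err) return handleError(err);
  // saved!
});
```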


Creation of records (along with updates, deletes, and queries) are asynchronous operations — you supply a callback that is called when the operation completes. The API uses the error-first argument convention, so the first argument for the callback will always be an error value (or null). If the API returns some result, this will be provided as the second argument.

You can also use create() to define the model instance at the same time as you save it. The callback will return an error for the first argument and the newly-created model instance for the second argument.


Every model has an associated connection (this will be the default connection when you use mongoose.model()). You create a new connection and call .model() on it to create the documents on a different database.

You can access the fields in this new record using the dot syntax, and change the values. You have to call save() or update() to store modified values back to the database.

// Access model field values using dot notation
// (record is a hypothetical model instance created earlier)
console.log(record.name); // should log the name that was saved

// Change record by modifying the fields, then calling save().
record.name = "New cool name";
record.save();

Searching for records

You can search for records using query methods, specifying the query conditions as a JSON document. The code fragment below shows how you might find all athletes in a database that play tennis, returning just the fields for athlete name and age. Here we just specify one matching field (sport) but you can add more criteria, specify regular expression criteria, or remove the conditions altogether to return all athletes.
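A sketch of such a query (the Athlete model, its schema, and the handleError helper are illustrative):

```js
var Athlete = mongoose.model('Athlete', yourSchema);

// find all athletes who play tennis, selecting the 'name' and 'age' fields
Athlete.find({ 'sport': 'Tennis' }, 'name age', function (err, athletes) {
  if (err) return handleError(err);
  // 'athletes' contains the list of athletes that match the criteria.
});
```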

If you specify a callback, as shown above, the query will execute immediately. The callback will be invoked when the search completes.

Note: All callbacks in Mongoose use the pattern callback(error, result). If an error occurs executing the query, the error parameter will contain an error document and result will be null. If the query is successful, the error parameter will be null and result will be populated with the results of the query.

If you don't specify a callback then the API will return a variable of type Query. You can use this query object to build up your query and then execute it (with a callback) later using the exec() method.

Above we've defined the query conditions in the find() method. We can also do this using a where() function, and we can chain all the parts of our query together using the dot operator (.) rather than adding them separately. The code fragment below is the same as our query above, with an additional condition for the age.
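A sketch of the chained form (model and callback names are illustrative):

```js
Athlete.
  find().
  where('sport').equals('Tennis').   // same condition as before
  where('age').gt(17).lt(50).        // additional where condition
  limit(5).                          // limit to 5 documents
  sort({ age: -1 }).                 // sort by age, descending
  select('name age').                // return only name and age fields
  exec(callback);                    // callback is the function to run when complete
```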

The find() method gets all matching records, but often you just want to get one match. The following methods query for a single record:

  • findById(): Finds the document with the specified id (every document has a unique id).
  • findOne(): Finds a single document matching the specified criteria.
  • findByIdAndRemove(), findByIdAndUpdate(), findOneAndRemove(), findOneAndUpdate(): Finds a single document by id or criteria and either updates or removes it. These are useful convenience functions for updating and removing records.

Note: There is also a count() method that you can use to get the number of items that match conditions. This is useful if you want to perform a count without actually fetching the records.

There is a lot more you can do with queries. For more information see: Queries (Mongoose docs).

Working with related documents — population

You can create references from one document/model instance to another using the ObjectId schema field, or from one document to many using an array of ObjectIds. The field stores the id of the related model. If you need the actual content of the associated document, you can use the populate() method in a query to replace the id with the actual data.

For example, the following schema defines authors and stories. Each author can have multiple stories, which we represent as an array of ObjectId. Each story can have a single author. The "ref" property tells the schema which model can be assigned to this field.

We can save our references to the related document by assigning the _id value. Below we create an author, then a story, and assign the author id to our story's author field.

Our story document now has an author referenced by the author document's ID. In order to get the author information in our story results we use populate().

Note: Astute readers will have noted that we added an author to our story, but we didn't do anything to add our story to our author's array. How then can we get all stories by a particular author? One way would be to add our author to the stories array, but this would result in us having two places where the information relating authors and stories needs to be maintained.

A better way is to get the _id of our author, then use find() to search for this in the author field across all stories.

This is almost everything you need to know about working with related items for this tutorial. For more detailed information see Population (Mongoose docs).

One schema/model per file

While you can create schemas and models using any file structure you like, we highly recommend defining each model schema in its own module (file), then exporting the method to create the model.

You can then require and use the model immediately in other files — for example, to get all instances of the model.

Setting up the MongoDB database

Now that we understand something of what Mongoose can do and how we want to design our models, it's time to start work on the LocalLibrary website. The very first thing we want to do is set up a MongoDB database that we can use to store our library data.

For this tutorial we're going to use mLab's free cloud-hosted "sandbox" database. This database tier is not considered suitable for production websites because it has no redundancy, but it is great for development and prototyping. We're using it here because it is free and easy to set up, and because mLab is a popular database as a service vendor that you might reasonably choose for your production database (other popular choices at the time of writing include Compose, ScaleGrid and MongoDB Atlas).

Note: If you prefer you can set up a MongoDB database locally by downloading and installing the appropriate binaries for your system. The rest of the instructions in this article would be similar, except for the database URL you would specify when connecting.

You will first need to create an account with mLab (this is free, and just requires that you enter basic contact details and acknowledge their terms of service). 

After logging in, you'll be taken to the home screen:

  1. Click Create New in the MongoDB Deployments section.
  2. This will open the Cloud Provider Selection screen.

    • Select the SANDBOX (Free) plan from the Plan Type section. 
    • Select any provider from the Cloud Provider section. Different providers offer different regions (displayed below the selected plan type).
    • Click the Continue button.
  3. This will open the Select Region screen.

    • Select the region closest to you and then Continue.

  4. This will open the Final Details screen.

    • Enter a name for the new database (e.g. local_library) and then select Continue.

  5. This will open the Order Confirmation screen.

    • Click Submit Order to create the database.

  6. You will be returned to the home screen. Click on the new database you just created to open its details screen. As you can see the database has no collections (data).

    The URL that you need to use to access your database is displayed on this screen. In order to use this you need to create a database user that you can specify in the URL.

  7. Click the Users tab and select the Add database user button.
  8. Enter a username and password (twice), and then press Create. Do not select Make read only.

You have now created the database, and have a URL (with username and password) that can be used to access it. This will look something like: mongodb://your_user_name:your_password@ds123456.mlab.com:59458/local_library.

Install Mongoose

Open a command prompt and navigate to the directory where you created your skeleton Local Library website. Enter the following command to install Mongoose (and its dependencies) and add it to your package.json file, unless you have already done so when reading the Mongoose Primer above.

npm install mongoose --save

Connect to MongoDB

Open /app.js (in the root of your project) and copy the following text below where you declare the Express application object (after the line var app = express();). Replace the database URL string ('insert_your_database_url_here') with the location URL representing your own database (i.e. using the information from mLab).

```javascript
//Set up mongoose connection
var mongoose = require('mongoose');
var mongoDB = 'insert_your_database_url_here';
mongoose.connect(mongoDB);
mongoose.Promise = global.Promise;
var db = mongoose.connection;
db.on('error', console.error.bind(console, 'MongoDB connection error:'));
```

As discussed in the Mongoose primer above, this code creates the default connection to the database and binds to the error event (so that errors will be printed to the console). 

Defining the LocalLibrary Schema

We will define a separate module for each model, as discussed above. Start by creating a folder for our models in the project root (/models) and then create separate files for each of the models:

```
/express-locallibrary-tutorial  //the project root
  /models
    author.js
    book.js
    bookinstance.js
    genre.js
```

Author model

Copy the schema code shown below and paste it into your ./models/author.js file. The schema defines an author as having String SchemaTypes for the first and family names (required, with a maximum of 100 characters), and Date fields for the dates of birth and death.

```javascript
var mongoose = require('mongoose');

var Schema = mongoose.Schema;

var AuthorSchema = new Schema(
  {
    first_name: {type: String, required: true, max: 100},
    family_name: {type: String, required: true, max: 100},
    date_of_birth: {type: Date},
    date_of_death: {type: Date},
  }
);

// Virtual for author's full name
AuthorSchema
.virtual('name')
.get(function () {
  return this.family_name + ', ' + this.first_name;
});

// Virtual for author's URL
AuthorSchema
.virtual('url')
.get(function () {
  return '/catalog/author/' + this._id;
});

//Export model
module.exports = mongoose.model('Author', AuthorSchema);
```

We've also declared a virtual for the AuthorSchema named "url" that returns the absolute URL required to get a particular instance of the model — we'll use the virtual in our templates whenever we need to get a link to a particular author.

Note: Declaring our URLs as a virtual in the schema is a good idea because then the URL for an item only ever needs to be changed in one place.
At this point a link using this URL wouldn't work, because we haven't got any routes handling code for individual model instances. We'll set those up in a later article!

At the end of the module we export the model.

Book model

Copy the schema code shown below and paste it into your ./models/book.js file. Most of this is similar to the author model — we've declared a schema with a number of string fields and a virtual for getting the URL of specific book records, and we've exported the model.

```javascript
var mongoose = require('mongoose');

var Schema = mongoose.Schema;

var BookSchema = new Schema(
  {
    title: {type: String, required: true},
    author: {type: Schema.ObjectId, ref: 'Author', required: true},
    summary: {type: String, required: true},
    isbn: {type: String, required: true},
    genre: [{type: Schema.ObjectId, ref: 'Genre'}]
  }
);

// Virtual for book's URL
BookSchema
.virtual('url')
.get(function () {
  return '/catalog/book/' + this._id;
});

//Export model
module.exports = mongoose.model('Book', BookSchema);
```

The main difference here is that we've created two references to other models:

  • author is a reference to a single Author model object, and is required.
  • genre is a reference to an array of Genre model objects. We haven't declared this model yet!

BookInstance model

Finally, copy the schema code shown below and paste it into your ./models/bookinstance.js file. The BookInstance represents a specific copy of a book that someone might borrow, and includes information about whether the copy is available, on what date it is expected back, and "imprint" or version details.

```javascript
var mongoose = require('mongoose');

var Schema = mongoose.Schema;

var BookInstanceSchema = new Schema(
  {
    book: { type: Schema.ObjectId, ref: 'Book', required: true }, //reference to the associated book
    imprint: {type: String, required: true},
    status: {type: String, required: true, enum: ['Available', 'Maintenance', 'Loaned', 'Reserved'], default: 'Maintenance'},
    due_back: {type: Date, default: Date.now}
  }
);

// Virtual for bookinstance's URL
BookInstanceSchema
.virtual('url')
.get(function () {
  return '/catalog/bookinstance/' + this._id;
});

//Export model
module.exports = mongoose.model('BookInstance', BookInstanceSchema);
```

The new things we show here are the field options:

  • enum: This allows us to set the allowed values of a string. In this case we use it to specify the availability status of our books (using an enum means that we can prevent mis-spellings and arbitrary values for our status).
  • default: We use default to set the default status for newly created bookinstances to 'Maintenance' and the default due_back date to now (note how you can call the Date function when setting the date!).

Everything else should be familiar from our previous schema.

Genre model - challenge!

Open your ./models/genre.js file and create a schema for storing genres (the category of book, e.g. whether it is fiction or non-fiction, romance or military history, etc).

The definition will be very similar to the other models:

  • The model should have a String SchemaType called name to describe the genre.
  • This name should be required and have between 3 and 100 characters.
  • Declare a virtual for the genre's URL, named url.
  • Export the model.

Testing — create some items

That's it. We now have all models for the site set up!

In order to test the models (and to create some example books and other items that we can use in our next articles) we'll now run an independent script to create items of each type:

  1. Download (or otherwise create) the file populatedb.js inside your express-locallibrary-tutorial directory (in the same level as package.json).

    Note: You don't need to know how populatedb.js works; it just adds sample data into the database.

  2. Enter the following command in the project root to install the async module that is required by the script (we'll discuss this in later tutorials): npm install async --save
  3. Run the script using node in your command prompt, passing in the URL of your MongoDB database (the same one you replaced the insert_your_database_url_here placeholder with earlier): node populatedb <your mongodb url>
  4. The script should run through to completion, displaying items as it creates them in the terminal.

Tip: Go to your database on mLab. You should now be able to drill down into individual collections of Books, Authors, Genres and BookInstances, and check out individual documents.


In this article we've learned a bit about databases and ORMs on Node/Express, and a lot about how Mongoose schemas and models are defined. We then used this information to design and implement Author, Book, BookInstance, and Genre models for the LocalLibrary website.

Last of all we tested our models by creating a number of instances (using a standalone script). In the next article we'll look at creating some pages to display these objects.
