Azure Cosmos DB API Services

Naftal Teddy Kerecha
Published in CodeX
4 min read · Mar 21, 2022

Azure Cosmos DB is a fully managed, multi-model NoSQL database offered by Azure. The service offers multiple database APIs, chosen according to the type of data that needs to be stored. The five APIs offered are the Core SQL API, Gremlin API, MongoDB API, Cassandra API, and Azure Table API.

It is important to note that once an instance has been initiated with a particular API, it cannot be changed later on. The Core SQL API is the default when no API is specified, because it adequately meets the needs of most use cases. Several considerations go into deciding which API to use: the existing database currently in use, reluctance to rewrite your entire data access layer, and the need to use key features of Azure Cosmos DB.

Azure Cosmos DB offers a distinctive set of features, such as global distribution, elastic storage scaling, low latency, the ability to run transactional and analytical workloads, and a fully managed platform, all of which may be added benefits that a user would like to take advantage of. Let’s look at each of the APIs.

Core SQL API: As mentioned earlier, this is the default offering for Azure Cosmos DB and satisfies most scenarios. The API stores data as JSON documents and lets you query them with a SQL-like language. Note that the query language is not standard SQL, although it closely resembles it.
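To make the document format and the SQL-like query language more concrete, here is a minimal sketch in Python. The item, the field names, and the `build_query` helper are hypothetical and only for illustration; the query text and the parameter list follow the shape that the `azure-cosmos` Python SDK accepts in `query_items`, but nothing here connects to a real account.

```python
import json

# A hypothetical item as it might be stored in a container: plain JSON.
item = {
    "id": "1",
    "city": "Waterloo",
    "name": "Ada",
    "orders": [{"sku": "A-100", "qty": 2}],
}


def build_query(city):
    """Return the query text and parameter list for a parameterized query.

    'c' is an alias for the container being queried. Parameters are passed
    separately from the query text instead of being concatenated into it.
    """
    query = "SELECT c.id, c.name FROM c WHERE c.city = @city"
    parameters = [{"name": "@city", "value": city}]
    return query, parameters


query, parameters = build_query("Waterloo")
print(json.dumps(item, indent=2))
print(query)
print(parameters)
```

With the SDK installed and a container client at hand, these two values would typically be passed as `container.query_items(query=query, parameters=parameters)`.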

This is also a good option for users who are migrating from other databases such as Oracle, DynamoDB, and HBase.

MongoDB API: This API allows you to use Cosmos DB as if it were a MongoDB instance, so existing MongoDB drivers, tools, and SDKs continue to work. MongoDB fills the gap between key-value stores and relational database systems, which allows for more advanced query functionality.

Using the MongoDB API offers some key advantages, such as instantaneous scalability, automatic and transparent sharding, five nines (99.999%) of availability, and serverless deployments.

The MongoDB API is implemented through the MongoDB wire protocol, which is what enables compatibility with the tools and SDKs mentioned above. The wire protocol is a socket-based request-response protocol that lets clients communicate with the database over a regular TCP/IP socket.
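As a rough illustration of what "wire protocol" means in practice, the sketch below packs and unpacks the 16-byte header that precedes every MongoDB wire protocol message: four little-endian 32-bit integers holding the total message length, a request ID, the ID of the request being answered, and an opcode. The helper names are hypothetical, and the body is left out entirely; a real driver handles all of this for you.

```python
import struct

OP_MSG = 2013  # opcode of the general-purpose message type
HEADER = struct.Struct("<iiii")  # four little-endian int32 fields, 16 bytes


def pack_header(body_length, request_id, response_to=0, op_code=OP_MSG):
    """Build the standard header that precedes a message body."""
    message_length = HEADER.size + body_length  # length includes the header
    return HEADER.pack(message_length, request_id, response_to, op_code)


def unpack_header(data):
    """Decode the first 16 bytes of a message into named fields."""
    message_length, request_id, response_to, op_code = HEADER.unpack(
        data[:HEADER.size]
    )
    return {
        "message_length": message_length,
        "request_id": request_id,
        "response_to": response_to,
        "op_code": op_code,
    }


header = pack_header(body_length=42, request_id=7)
print(len(header), unpack_header(header))
```

Because Cosmos DB speaks this same protocol on its MongoDB endpoint, a client that sends these bytes over a TCP socket does not need to know that the server is not a MongoDB instance.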

The MongoDB API can be deployed in three ways:

  • Provisioned throughput: A fixed amount of throughput is reserved. This setting is best for situations where the workload is consistent.
  • Autoscale: An upper bound is set on the throughput, and it scales automatically to meet your needs. Unlike provisioned throughput, this suits workloads that are variable or unpredictable.
  • Serverless: You pay only for the throughput that you actually use.
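The difference between the three modes can be sketched as a small function that returns the throughput you would be charged for in a given hour. This is a deliberately simplified model, not a pricing calculator: the function name and inputs are hypothetical, the 10% floor for autoscale is an assumption based on how autoscale is documented to scale between 10% and 100% of its configured maximum, and serverless is approximated as "whatever was consumed".

```python
def billed_throughput(mode, demand, configured=0):
    """Return the throughput (in request units) charged for one hour.

    mode       -- "provisioned", "autoscale", or "serverless"
    demand     -- peak throughput the workload actually needed that hour
    configured -- fixed throughput (provisioned) or upper bound (autoscale)
    """
    if mode == "provisioned":
        # Reserved capacity is charged whether or not it is used.
        return configured
    if mode == "autoscale":
        # Assumed behaviour: scales with demand between 10% and 100%
        # of the configured maximum.
        floor = configured // 10
        return max(floor, min(demand, configured))
    if mode == "serverless":
        # Nothing is reserved; only consumption counts.
        return demand
    raise ValueError(f"unknown mode: {mode}")


for demand in (50, 600, 5000):
    print(
        demand,
        billed_throughput("provisioned", demand, configured=1000),
        billed_throughput("autoscale", demand, configured=1000),
        billed_throughput("serverless", demand),
    )
```

In this model a steady workload near 1000 units costs the same under provisioned and autoscale, while a mostly idle workload is where autoscale and serverless pull ahead.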

Gremlin API: This API allows users to make graph queries and store data as vertices and edges. It is mainly used where data cannot be easily modeled in a relational database. The Gremlin API essentially lets us use the power of graph databases.

The Gremlin API is based on the Apache TinkerPop graph computing framework. Apache TinkerPop is a graph abstraction layer that works with numerous graph databases and graph processors, and it is made up of a graph (structure) API and a process (traversal) API.
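The split between a structure API (vertices and edges) and a process API (traversals over them) can be illustrated with a tiny in-memory graph. The `Graph` class below is a hypothetical stand-in written for this article, not the TinkerPop or Cosmos DB API; the comment shows the Gremlin traversal that the last expression roughly corresponds to.

```python
from collections import defaultdict


class Graph:
    """A toy property graph: vertices with properties, labelled edges."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> properties
        self.out_edges = defaultdict(list)   # vertex id -> [(label, target id)]

    # Structure: building the graph.
    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))

    # Process: one traversal step along outgoing edges with a given label.
    def out(self, vertex_id, label):
        return [t for (l, t) in self.out_edges[vertex_id] if l == label]


g = Graph()
g.add_vertex("alice", name="Alice")
g.add_vertex("bob", name="Bob")
g.add_vertex("carol", name="Carol")
g.add_edge("alice", "knows", "bob")
g.add_edge("bob", "knows", "carol")

# Rough equivalent of the Gremlin traversal:
#   g.V('alice').out('knows').out('knows').values('name')
friends_of_friends = [
    g.vertices[second]["name"]
    for first in g.out("alice", "knows")
    for second in g.out(first, "knows")
]
print(friends_of_friends)
```

A "friends of friends" question like this takes one short traversal in a graph model, whereas a relational schema would need a self-join for every hop.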

For read and write operations, the API uses the Cosmos DB partitioning strategy.
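Cosmos DB partitions data by hashing a partition key value, so items that share a key value end up together. The sketch below shows the general idea of hash partitioning only; the function name, the use of MD5, and the modulo step are assumptions made for illustration and are not the service's actual hashing scheme.

```python
import hashlib


def partition_for(partition_key, partition_count):
    """Map a partition key value to a partition number in [0, partition_count)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % partition_count


for key in ("customer-1", "customer-2", "customer-1"):
    print(key, partition_for(key, partition_count=4))
```

The same key always maps to the same partition, which is why choosing a partition key with many distinct, evenly used values matters for spreading reads and writes.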

Cassandra API: Apache Cassandra is a distributed database designed to manage very large amounts of structured data. It provides high availability with no single point of failure. This API stores data in a column-oriented structure. Just like the MongoDB API, the Cassandra API uses a wire protocol so that Cassandra-specific tools and SDKs can be used.

Cassandra is designed to handle big data workloads across multiple nodes. It uses a peer-to-peer distributed system in which data is distributed across all the nodes in the cluster. The properties of the nodes in the cluster are:

  • All the nodes in a cluster play the same role; each node is independent and at the same time interconnected with the other nodes.
  • Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
  • When a node goes down, read or write requests can be served from other nodes in the network.

Cassandra's replication model uses one or more of the nodes in a cluster as replicas for a given piece of data. If a read detects that some of the replicas responded with an out-of-date value, Cassandra returns the most recent value to the client and then performs a read repair in the background to update the stale values.
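The read path described above can be sketched with plain dictionaries standing in for replicas. Everything here is a simplified model: the function name and data layout are hypothetical, each value carries a write timestamp, and the repair happens inline rather than in the background as it would in Cassandra.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and bring stale replicas up to date.

    replicas -- list of dicts mapping key -> (write_timestamp, value)
    """
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        raise KeyError(key)

    # The version with the highest write timestamp wins.
    newest = max(versions, key=lambda version: version[0])

    # Read repair: overwrite any replica holding an older or missing version.
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest

    return newest[1]


replicas = [
    {"user:1": (1, "old@example.com")},
    {"user:1": (2, "new@example.com")},
    {"user:1": (1, "old@example.com")},
]
value = read_with_repair(replicas, "user:1")
print(value)
print(replicas)
```

After the call, all three replicas hold the version written at timestamp 2, so later reads agree no matter which replica answers.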

I hope the details above provide more context on the various APIs that can be used with Azure Cosmos DB.

Naftal Teddy Kerecha
CodeX

Data Engineer | Bsc Data Science, Wilfrid Laurier University | Writer for CodeX, Nerd For Tech and Geek Culture | Passionate about Data and the Cloud