AWS offers 14 databases that support diverse data models and include the following types of databases: relational, key-value, document, in-memory, graph, time series, and ledger databases. In this article, we’ll cover the following AWS databases:
To dive deeper into Amazon databases, check out Cloud Academy’s Working with AWS Databases Learning Path. This learning path introduces you to the different AWS database services and some of the features relating to AWS database types.
Amazon Timestream
Data has become increasingly valuable to organizations. The information that can be extracted from it has given rise to data lakes, which provide deeper insight into all of your data and support a stronger business strategy. Alongside this, the Internet of Things (IoT) industry has grown rapidly, and so has the volume of data these systems produce. Much of this data can be classified as time-series data: essentially, data that records how events change over time. To help gather, maintain, and query this data, AWS developed a database called Amazon Timestream.
Amazon Timestream is a serverless database offering focused specifically on time-series data. Like other AWS database services, it is fully managed and takes much of the administration and maintenance out of your hands, giving you more time to work with your data. Storage scales automatically as your data grows, so you never run out of space. Timestream can also automatically manage retention, tiering, and data compression. Combined, these features help keep the cost of the service down.
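To make the idea of retention and tiering a little more concrete, here is a minimal Python sketch of how a record's age could decide where it lives. It is only an illustration of the concept, not Timestream's implementation: the tier names ("memory" and "magnetic"), the function name, and the default retention periods are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def storage_tier(event_time, now, memory_hours=24, magnetic_days=365):
    """Classify a record by age (hypothetical helper, illustrative only).

    memory_hours and magnetic_days are assumed retention settings:
    recent data stays in a fast tier, older data moves to a cheaper
    tier, and data past the final retention period is dropped.
    """
    age = now - event_time
    if age <= timedelta(hours=memory_hours):
        return "memory"
    if age <= timedelta(days=magnetic_days):
        return "magnetic"
    return "expired"


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
recent = storage_tier(now - timedelta(hours=2), now)
older = storage_tier(now - timedelta(days=30), now)
ancient = storage_tier(now - timedelta(days=400), now)
print(recent, older, ancient)
```

In a managed service you would only set the retention periods; moving data between tiers and expiring it is handled for you.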
One of its key features is that it can store and process trillions of events every day. According to AWS, it does so at as little as 1/10th of the cost of typical relational databases, with queries running up to 1,000 times faster. Amazon Timestream is optimized to store, query, and analyze time-series data by storing data in set time intervals, with timestamps at millisecond, microsecond, or even nanosecond granularity. Relational databases, by contrast, are not as efficient at handling data sets across a specified time interval.
There are several use cases where Amazon Timestream is an effective choice of database. For example, you can store and analyze clickstream data at scale, then use Timestream’s built-in analytic functions to gain further insight into the data and understand the customer’s path to purchase. Or you might use Amazon Timestream to store IoT time-series data. Using functions such as smoothing, approximation, and interpolation, you can quickly and easily process and analyze this data.
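If interpolation and smoothing are new to you, the following Python sketch shows what these operations do to a small set of sensor readings. It is not Timestream's query syntax or its built-in functions; the helper names and the sample readings are hypothetical, and the readings are assumed to be sorted with strictly increasing timestamps.

```python
def interpolate_linear(points, t):
    """Estimate the value at time t from sorted (time, value) samples."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            # Move along the straight line between the two neighbours.
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the sampled range")


def moving_average(values, window):
    """Smooth a series by averaging each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical temperature readings as (seconds, degrees) pairs.
readings = [(0, 20.0), (10, 22.0), (30, 26.0)]
filled = interpolate_linear(readings, 20)                  # 24.0
smoothed = moving_average([20.0, 22.0, 26.0, 24.0], 2)     # [21.0, 24.0, 25.0]
print(filled, smoothed)
```

Interpolation fills in a missing reading between two known ones, while smoothing reduces noise so that the underlying trend is easier to see.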
Once you have your time-series data stored within your Amazon Timestream database, you can then use other business intelligence tools or machine learning services to extract information and gain additional insight into the data.
Amazon RDS on VMware
AWS and VMware first partnered when AWS launched VMware Cloud on AWS. This time, the two are collaborating to provide essentially the Amazon RDS service within on-premises VMware environments. It offers the same benefits and advantages as a managed database service: AWS handles administration and management operations such as patching, database setup, backups, point-in-time restore, automatic scaling, and health monitoring through integration with CloudWatch. However, the underlying hardware, such as the servers and storage components, is provisioned by you as the customer. It’s important to understand that the service allows you to run Amazon RDS on top of your VMware control plane. There are no AWS physical resources installed in your data center, unlike with AWS Outposts. To allow AWS to perform its management functions, a dedicated VPN tunnel is configured between your AWS environment and your data center.
Running RDS on-site makes it very easy to operate and integrate Amazon RDS on VMware within your existing VMware vSphere private data centers. If you are familiar with Amazon RDS, you use the same interface to manage Amazon RDS on VMware. You have the option of running database engines such as Microsoft SQL Server, PostgreSQL, MySQL, and MariaDB, with Oracle to follow shortly after.
The great thing about Amazon RDS on VMware is that you can still leverage the power of the AWS cloud with this service, for example for backup and scaling.
As the service is provisioned on top of your existing on-site infrastructure, by default the data is also stored on your local volumes. From a backup perspective, you can then leverage Amazon S3 to store the backup snapshots of your data, giving you the comfort of the 11 9’s of durability (99.999999999%) that S3 is designed to provide. You will also be able to create an Amazon RDS database in AWS using one of the snapshots created from your RDS on VMware resources.
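To get a feel for what eleven nines means in practice, here is a small worked calculation in Python. It assumes the figure is read as the yearly chance that any single stored object survives, and the count of ten million stored objects is a hypothetical number chosen purely for illustration.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, to avoid floating-point rounding.
durability = Fraction(99_999_999_999, 100_000_000_000)

# Chance of losing one given object in a year: 1 in 100 billion.
annual_loss_probability = 1 - durability

# Hypothetical number of backup objects stored.
objects_stored = 10_000_000

# Expected number of objects lost per year across the whole set.
expected_losses_per_year = objects_stored * annual_loss_probability

# Average number of years before one object would be expected to be lost.
years_per_expected_loss = 1 / expected_losses_per_year

print(expected_losses_per_year)   # 1/10000
print(years_per_expected_loss)    # 10000
```

In other words, under these assumptions you would expect to lose a single object roughly once every 10,000 years.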
You can also implement other common RDS features, such as standby instances, to provide a level of geographical resiliency and business continuity in the event your primary on-premises instance fails. When configured, a secondary RDS instance, known as a standby, is deployed within the AWS cloud environment. It will then provide a failover option for a primary Amazon RDS on VMware instance. Note that this standby instance is not to be used as a secondary replica to offload read-only traffic — this is the role of the read replica, which is very different.
Speaking of read replicas, it’s also possible to implement these with RDS on VMware. Read replicas let you serve read-only traffic separately, taking load off your primary instance, which is best reserved for write traffic. For example, let’s assume your RDS on VMware instance, which serves both read and write traffic, has a large amount of read-intensive query traffic directed to it, and the performance of the instance is taking a hit. To help resolve this, you can create a read replica within the AWS environment. This read replica will then maintain a secure link between itself and the primary database. At this point, read-only traffic, such as queries from business intelligence tools, can be directed to the read replica. Again, the same APIs and interface are used to manage all of these resources as when operating Amazon RDS instances.
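On the application side, splitting traffic between a primary and a read replica can be as simple as choosing an endpoint per statement. The Python sketch below shows one way this might look. The class name and hostnames are hypothetical, and treating every statement that starts with SELECT as a read is a deliberate simplification (a real application would also need to account for things like locking reads and replication lag).

```python
class ReplicaRouter:
    """Pick a database endpoint per statement (illustrative only)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # round-robin position across replicas

    def endpoint_for(self, sql):
        words = sql.lstrip().split(None, 1)
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self.replicas:
            endpoint = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            return endpoint
        # Writes, and reads when no replica exists, go to the primary.
        return self.primary


# Hypothetical endpoints: an on-premises primary and a replica in AWS.
router = ReplicaRouter(
    primary="primary.onprem.example.internal",
    replicas=["replica-1.aws.example.internal"],
)
read_target = router.endpoint_for("SELECT total FROM orders WHERE id = 7")
write_target = router.endpoint_for("UPDATE orders SET total = 10 WHERE id = 7")
print(read_target, write_target)
```

Reporting and business intelligence queries would land on the replica, while anything that modifies data still reaches the primary.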
As with AWS Outposts, you may have a requirement to keep data local within your own data center due to compliance or latency. This makes Amazon RDS on VMware a great way to maintain that localized functionality, with the additional benefit of controlling all of your Amazon RDS instances through a single pane of glass.
If and when you choose to migrate your RDS on VMware databases from your local data center to Amazon RDS, you can do so with a simple one-click migration. This allows you to quickly and easily migrate your databases to multiple regions within AWS without interrupting your customer base. As with all AWS services, security is key, so you can encrypt all of your data both at rest and in transit.
Before getting started with RDS on VMware, be aware that you will need the following:
- VMware vSphere 6.5+ cluster with active VMware support
- Outbound connectivity to the internet
- Administrative privileges to the cluster
- An AWS account
Amazon Quantum Ledger Database (QLDB)
The final database to be discussed is the Amazon Quantum Ledger Database (QLDB), which was introduced by Andy Jassy alongside the Amazon Managed Blockchain service. In my “What is Blockchain?” article, I mentioned that Managed Blockchain integrates with QLDB, allowing it to replicate its network activity to QLDB and so provide an immutable history of blockchain network activity.
So what actually is QLDB? It’s yet another fully managed and serverless database service, but one that acts as a ledger database. This makes it a great fit for recording financial data over a period of time. For example, it allows you to maintain a complete history of accounting and transactional data between multiple parties in an immutable, transparent, and cryptographically verifiable way, using the SHA-256 cryptographic hash function, which makes it highly secure.
This means you can rest assured that nothing has been changed, or can be changed, thanks to the database journal. The journal is configured as append-only: it is essentially an immutable transaction log that records all entries in sequence over time. The service therefore removes the need for an organization to develop and implement its own ledger application.
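To see why an append-only journal combined with SHA-256 makes tampering detectable, consider the following minimal Python sketch of a hash-chained log. This is a simplified illustration of the general technique and not QLDB's actual journal format or API; the class name, field names, and sample transactions are all hypothetical.

```python
import hashlib
import json


class Journal:
    """Append-only log where each entry's hash covers the previous hash."""

    GENESIS = "0" * 64  # placeholder hash for the first entry's predecessor

    def __init__(self):
        self._entries = []

    def append(self, data):
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        payload = json.dumps(data, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self):
        """Recompute every hash in order; any edited entry breaks the chain."""
        prev = self.GENESIS
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


journal = Journal()
journal.append({"from": "alice", "to": "bob", "amount": 100})
journal.append({"from": "bob", "to": "carol", "amount": 40})
intact = journal.verify()

# Simulate someone rewriting history by editing the first entry in place.
journal._entries[0]["payload"] = json.dumps(
    {"from": "alice", "to": "bob", "amount": 1}, sort_keys=True
)
after_tampering = journal.verify()
print(intact, after_tampering)
```

Because each hash depends on everything that came before it, changing an earlier entry invalidates the chain from that point on, which is what allows the history to be audited.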
This may sound similar to the blockchain technology I discussed in a previous blog post, where a ledger is also used. However, in a blockchain, that ledger is distributed across multiple hosts in a decentralized environment, whereas a QLDB ledger is owned and managed by a central, trusted authority. This removes the need for consensus among everyone across the network, which blockchain requires. Often, organizations fulfill these ledger requirements by adding ledger functionality to relational databases. This quickly becomes difficult to manage because relational databases are not immutable, which makes errors difficult to trace, especially during audits.
I mentioned earlier that QLDB is serverless. So again, the burden of maintaining the underlying infrastructure is removed, and all scaling is managed by AWS, including any read and write limits of the database.