An Overview on the MySQL, Data, and Businesses
The fast-changing business landscape in the last couple of years has set a host of new standards in motion. And, in every single case, efficiency plays a cardinal role – shaping the innovations that, in turn, shape modern-day businesses. The inception and multi-sectoral implementation of the MySQL database is a quintessential example where the efficiency of the open-source, relational database for its reliability, security, performance, flexibility, scalability, and worldwide community support changed the game for many businesses. It is not a surprise that today, MySQL has a prominent share of database users globally, including Facebook, Twitter, Netflix, YouTube, Uber, Airbnb, eBay, Booking.com, Bank of America, and NASA.
Moreover, as the volume of data that organizations will have to manage is significantly increasing, none other than MySQL could have been the popular choice. For instance, per IDC, the total of the world’s data—the DataSphere—is set for a phenomenal growth of 116 zettabytes (ZB), i.e., from 59 ZB in 2020 to 175 ZB by the year 2025. It was also forecasted that by the same year, nearly half of all the data will be stored in public cloud environments with 30% of the data produced and consumed in real-time.
The Challenges of Existing Data & Analytics Solutions
However, what MySQL users also traded alongside the efficiency – is the con of extracting, transforming, loading (ETL) data—from the MySQL database to a dedicated database for analytics, reporting, and results optimization—that invites unnecessary errors and security risks, delays, overhead, and complexities in the process. This not only diminishes the quality and volume of advantages MySQL is designed to offer, however, also stands between a business and the agility of running transactions, reports, and analytics in real-time – all via a single database. In short, the ability to run online transaction processing (OLTP) and online analytical processing (OLAP) workloads on the same MySQL database in real-time is only apt for matching the level of efficiency demanded by businesses today.
Put simply, MySQL is optimized for OLTP, however, it is not designed for analytic processing (OLAP). As a result, organizations which need to efficiently run analytics need to move their data to another database. The approach for moving data to another database introduces complexity and additional cost to customers in multiple ways:
- Applications required to define complex logic for excavating pertinent data from MySQL
- The extracted data needs to be transported to another database across networks securely, consuming network bandwidth and incurring latency
- Data in the other database needs to be manually kept in sync with the MySQL database and as a result the data on which analytics is performed can become stale
- Additional cost and overhead of managing multiple databases for running OLTP and analytics applications
The Oracle MySQL Database Service with HeatWave
The Oracle MySQL Database Service with HeatWave alone serves this purpose, and lets enterprises seamlessly leverage one of the world’s most popular databases for their OLTP and OLAP workloads in real-time. With the help of HeatWave—which is a scalable, distributed, in-memory, and shared-nothing query accelerator for overcoming the limitations of traditional data warehouse and analytics environments that involves periodic, long-running ETL batch jobs required to refresh data—and Oracle’s managed services for the MySQL database on Oracle Cloud Infrastructure (OCI), MySQL can achieve a 400x performance increase than the standard version.
Cost, Time, and Speed: However, that is not all. In comparison to AWS Aurora and Redshift, the Oracle MySQL Database Service affords enterprises a whopping 1100x greater query performance, at one-third of the price. Coupled with HeatWave’s parallel processing, the price-performance goes up to 8100x better against RDS. This form of performance at this price is not extended by any major cloud vendor including Azure, AWS, or Google.
Compatibility: MySQL Database Service also features compatibility with OLTP (Social, Ecommerce, Fintech, SaaS) and analytic applications (Oracle Analytics Cloud [OAC], Tableau, Qlik, Looker) that are also significantly fast than with any other service provider for the database.
OCI/Cloud-Native: The Oracle MySQL Database Service is only available on the Oracle cloud for harnessing all the OCI services including the OAC for great visualization, Cloud Guard for industry-leading security, Container Engine for Kubernetes (OKE), etc. while remaining current with the latest updates from the MySQL.
On-Premises Support: Oracle MySQL Database Service extends support for enterprises that choose to remain on their on-premises data centers and still avail the benefits of the HeatWave analytics engine. Oracle allows enterprises to use the services with the help of standard MySQL replication, for replicating their data to OCI and easily shut down the services after use.
Automation: Pervasive usage of ML and AI in servers and services such as in provisioning nodes, and scheduling queries.
The HeatWave Architecture
There are three key architecture choices which lead to a compelling performance and cost advantage with HeatWave:
- Innovative in-memory columnar analytics engine designed for scalability and performance which implements state of the art algorithms
- Optimized for Oracle Cloud Infrastructure
- The Heatwave engine uses a columnar in-memory representation that facilitates vectorized processing leading to very good query performance. The data is encrypted and compressed prior to loading in memory. This compressed and optimized in memory representation is used for both numeric and string data. This results in significant performance speed up and reduced memory footprint which translates to reduced cost for customer.
One of the key design points of the HeatWave engine is to massively partition data across a cluster of HeatWave nodes, which can be operated upon in parallel in each node. This enables high cache hits for analytic operations and offers exceptional inter-node scalability. Each HeatWave node in a cluster and each core inside a node can process partitioned data in parallel, comprising parallel scans, group-by, joins, aggregation, and top-k processing. HeatWave has employed the state of art algorithms for distributed in-memory analytic processing. Joins within a partition are processed instantly by leveraging vectorized build and probe join kernels. The highly optimized network communication amongst analytics nodes is clocked via asynchronous batch I/Os. The algorithms are built to overlap compute time with communication of data across nodes which supports achieve great scalability.
The HeatWave Deployment Scenarios
HeatWave – a fully Oracle-managed service is available only in the OCI. The MySQL database is significantly enhanced to natively integrate this service, and as a result, customers who store data in MySQL can seamlessly run analytics by simply enabling the service.
Coming to a HeatWave instance, which is a cluster consisting of a MySQL Database Service (MDS) instance and several analytics nodes. When HeatWave is enabled, HeatWave server is installed on the MDS node. It is responsible for cluster management, loading data into the memory of the HeatWave nodes, query scheduling and query execution. MySQL applications written in Java, PHP, Ruby, etc. work seamlessly with HeatWave using standard MySQL ODBC/JDBC connectors. HeatWave is also fully compatible with the MySQL syntax. Hence, all the current applications and tools built leveraging the standard SQL will work without necessitating any adjustment to the queries.
Data which is needed for analytic processing is stored in HeatWave nodes in memory, in a hybrid columnar compressed format. The number of nodes needed to run for a workload depends on the amount of data present for analytic processing, the compression factor which is achieved on the dataset, and the query characteristics. The number of nodes needed can be automatically derived by using the Auto Provisioning advisor which is available with HeatWave.
HeatWave currently supports up to 24 nodes in one cluster and with a processing capacity of approximately 10 TB of analytics data. 10TB is the approximate amount of data which can be populated in the memory of the HeatWave nodes at a given moment. There is no limit to the amount of data which can be stored in the MySQL database and customers can choose which tables or columns from MySQL database schema to load into the memory of HeatWave nodes. If the tables are no longer needed by queries, user can remove the tables from the memory to make room for other data. HeatWave extends an exceptional method for customers who are required to run massive transactional and analytical workloads, simultaneously. It is only made possible by running transactional queries in the MySQL node, and transparently propagating data updated in MySQL InnoDB to the HeatWave cluster for enhanced analytical processing. This empowers customers to simultaneously run both OLTP and real-time OLAP workloads in a single database platform.
On-premises customers who cannot move their MySQL deployment to a Cloud due to compliance or regulatory requirements, can still leverage HeatWave by using the hybrid deployment model as shown in below figure. In such a hybrid deployment, customers can leverage MySQL replication to replicate on-premises MySQL data to HeatWave without the need for ETL.
Native MySQL Analytics
Integration of HeatWave with MySQL Database Service provides a single data management platform for all OLTP, and analytics needs of an enterprise. HeatWave is designed as a MySQL pluggable storage engine, which completely shields all the low-level implementation details at the storage level from the end users. Consequently, users can manage both HeatWave and MySQL Database Service via the identical management tools such as the OCI console, CLI, and REST APIs.
Since HeatWave is an in-memory processing engine, data is persisted in MySQL InnoDB storage engine. This also enables customers to manage analytics data identical to the manner they manage transactional data in MySQL.
Users and applications interact with HeatWave through the MySQL database node in the cluster. Users connect to HeatWave through standard tools and standard-based ODBC/JDBC connectors. HeatWave is compatible with the similar ACID and ANSI SQL standards as MySQL while also supporting diverse data types. This enables existing applications to take advantage of HeatWave without any changes to their application, allowing easy and quick integration.
Upon users submitting a query to the MySQL database, the MySQL query optimizer jumps into action for transparently choosing if the query needs offloading to HeatWave cluster for instant execution. This logically depends on the functions/operators’ (cited in the query) adaptability by HeatWave and the estimated time for query processing in MySQL and HeatWave. As soon as the decision is complete, the query is either sent to HeatWave nodes or MySQL itself for processing. Once processed, the results are sent back to the MySQL database node and returned to users.
Data of HeatWave is persisted in MySQL InnoDB. Any updates to the tables are automatically propagated to the memory of the HeatWave nodes in real time. This allows subsequent queries to always have access to the latest data. This is done behind the scenes by a light-weight change propagation algorithm that can keep up with MySQL data update rates.