
Hybrid OLTP/Analytics Database Workloads: Replicating MySQL Data to ClickHouse


How to run analytics on MySQL?

MySQL is a great database for Online Transaction Processing (OLTP) workloads. For some companies, it used to be more than enough for a long time. Times have changed, and so have the business requirements. As businesses aspire to be more data-driven, more and more data is stored for further analysis: customer behavior, performance patterns, network traffic, logs, etc. No matter what industry you are in, it is very likely that there is data you want to keep and analyze to better understand what is going on and how to improve your business. Unfortunately, for storing and querying large amounts of data, MySQL is not the best option. Sure, it can do it, and it has tools to help accommodate large amounts of data (e.g., InnoDB compression), but using a dedicated solution for Online Analytics Processing (OLAP) will most likely greatly improve your ability to store and query large quantities of data.

One way of tackling this problem is to use a dedicated database for running analytics. Typically, you want to use a columnar datastore for such tasks, as it is more suitable for handling large quantities of data: data stored in columns is typically easier to compress and easier to access on a per-column basis. Since you usually ask only for data stored in a couple of columns, the ability to retrieve just those columns, instead of reading all of the rows and filtering out unneeded data, makes the data access faster.

How to replicate data from MySQL to ClickHouse?

An example of a columnar datastore suitable for analytics is ClickHouse, an open source column store. One challenge is to ensure the data in ClickHouse is in sync with the data in MySQL. Sure, it is always possible to set up a data pipeline of some sort and perform automated batch loading into ClickHouse. But as long as you can live with some limitations, there’s a better way of setting up almost real-time replication from MySQL into ClickHouse. In this blog post we will take a look at how it can be done.

ClickHouse installation

First of all we need to install ClickHouse. We’ll use the quickstart from the ClickHouse website.

sudo apt-get install dirmngr    # optional
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4    # optional

echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
sudo service clickhouse-server start

Once this is done, we need to find a means to transfer the data from MySQL into ClickHouse. One of the possible solutions is to use Altinity’s clickhouse-mysql-data-reader. First of all, we have to install pip3 (python3-pip in Ubuntu), as Python 3.4 or later is required. Then we can use pip3 to install some of the required Python modules:

pip3 install mysqlclient
pip3 install mysql-replication
pip3 install clickhouse-driver

Once this is done, we have to clone the repository. For CentOS 7, RPMs are also available, and it is possible to install it using pip3 (the clickhouse-mysql package), but we found that the version available through pip does not contain the latest updates, so we want to use the master branch from the git repository:

git clone https://github.com/Altinity/clickhouse-mysql-data-reader

Then, we can install it using pip:

pip3 install -e /path/to/clickhouse-mysql-data-reader/

The next step is to create the MySQL users required by clickhouse-mysql-data-reader to access the MySQL data:

mysql> CREATE USER 'chreader'@'%' IDENTIFIED BY 'pass';
Query OK, 0 rows affected (0.02 sec)
mysql> CREATE USER 'chreader'@'127.0.0.1' IDENTIFIED BY 'pass';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE USER 'chreader'@'localhost' IDENTIFIED BY 'pass';
Query OK, 0 rows affected (0.02 sec)
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'chreader'@'%';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'chreader'@'127.0.0.1';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'chreader'@'localhost';
Query OK, 0 rows affected, 1 warning (0.01 sec)

You should also review your MySQL configuration to ensure that binary logs are enabled, max_binlog_size is set to 768M, binlogs are in ‘row’ format and that the tool can connect to MySQL. Below is an excerpt from the documentation:

[mysqld]
# mandatory
server-id        = 1
log_bin          = /var/lib/mysql/bin.log
binlog-format    = row # very important if you want to receive write, update and delete row events
# optional
expire_logs_days = 30
max_binlog_size  = 768M
# setup listen address
bind-address     = 0.0.0.0

Importing the data

When everything is ready, you can import the data into ClickHouse. Ideally you would run the import on a host with its tables locked, so that no changes happen during the process. You can use a slave as the source of the data. The command to run is:

clickhouse-mysql --src-server-id=1 --src-wait --nice-pause=1 --src-host=10.0.0.142 --src-user=chreader --src-password=pass --src-tables=wiki.pageviews --dst-host=127.0.0.1 --dst-create-table --migrate-table

It will connect to MySQL on host 10.0.0.142 using the given credentials and copy the table ‘pageviews’ in the schema ‘wiki’ to ClickHouse running on the local host (127.0.0.1). The table will be created automatically and the data will be migrated.

For the purpose of this blog we imported roughly 50 million rows from the “pageviews” dataset made available by the Wikimedia Foundation. The table schema in MySQL is:

mysql> SHOW CREATE TABLE wiki.pageviews\G
*************************** 1. row ***************************
       Table: pageviews
Create Table: CREATE TABLE `pageviews` (
  `date` date NOT NULL,
  `hour` tinyint(4) NOT NULL,
  `code` varbinary(255) NOT NULL,
  `title` varbinary(1000) NOT NULL,
  `monthly` bigint(20) DEFAULT NULL,
  `hourly` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`date`,`hour`,`code`,`title`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)

The tool translated this into the following ClickHouse schema:

vagrant.vm :) SHOW CREATE TABLE wiki.pageviews\G

SHOW CREATE TABLE wiki.pageviews

Row 1:
──────
statement: CREATE TABLE wiki.pageviews ( date Date,  hour Int8,  code String,  title String,  monthly Nullable(Int64),  hourly Nullable(Int64)) ENGINE = MergeTree(date, (date, hour, code, title), 8192)

1 rows in set. Elapsed: 0.060 sec.

Once the import is done, we can compare the contents of MySQL:

mysql> SELECT COUNT(*) FROM wiki.pageviews\G
*************************** 1. row ***************************
COUNT(*): 50986914
1 row in set (24.56 sec)

and in ClickHouse:

vagrant.vm :) SELECT COUNT(*) FROM wiki.pageviews\G

SELECT COUNT(*)
FROM wiki.pageviews

Row 1:
──────
COUNT(): 50986914

1 rows in set. Elapsed: 0.014 sec. Processed 50.99 million rows, 50.99 MB (3.60 billion rows/s., 3.60 GB/s.)

Even with such a small table, you can clearly see that MySQL required more time to scan through it than ClickHouse did.

When starting the process to watch the binary log for events, ideally you would pass the binary log file and position from which the tool should start listening. You can easily check that on the slave after the initial import has completed.
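A sketch of how to obtain those coordinates on the slave used as the import source (the File and Position columns correspond to the --src-binlog-file and --src-binlog-position options):

mysql> SHOW MASTER STATUS;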

clickhouse-mysql --src-server-id=1 --src-resume --src-binlog-file='binlog.000016' --src-binlog-position=194 --src-wait --nice-pause=1 --src-host=10.0.0.142 --src-user=chreader --src-password=pass --src-tables=wiki.pageviews --dst-host=127.0.0.1 --pump-data --csvpool

If you don’t pass it, the tool will just start listening for anything that comes in:

clickhouse-mysql --src-server-id=1 --src-resume --src-wait --nice-pause=1 --src-host=10.0.0.142 --src-user=chreader --src-password=pass --src-tables=wiki.pageviews --dst-host=127.0.0.1 --pump-data --csvpool

Let’s load some more data and see how it will work out for us. We can see that everything seems ok by looking at the logs of clickhouse-mysql-data-reader:

2019-02-11 15:21:29,705/1549898489.705732:INFO:['wiki.pageviews']
2019-02-11 15:21:29,706/1549898489.706199:DEBUG:class:<class 'clickhouse_mysql.writer.poolwriter.PoolWriter'> insert
2019-02-11 15:21:29,706/1549898489.706682:DEBUG:Next event binlog pos: binlog.000016.42066434
2019-02-11 15:21:29,707/1549898489.707067:DEBUG:WriteRowsEvent #224892 rows: 1
2019-02-11 15:21:29,707/1549898489.707483:INFO:['wiki.pageviews']
2019-02-11 15:21:29,707/1549898489.707899:DEBUG:class:<class 'clickhouse_mysql.writer.poolwriter.PoolWriter'> insert
2019-02-11 15:21:29,708/1549898489.708083:DEBUG:Next event binlog pos: binlog.000016.42066595
2019-02-11 15:21:29,708/1549898489.708659:DEBUG:WriteRowsEvent #224893 rows: 1

What we have to keep in mind are the limitations of the tool. The biggest one is that it supports INSERTs only. There is no support for DELETE or UPDATE. There is also no support for DDL, therefore any incompatible schema change executed on MySQL will break the MySQL to ClickHouse replication.

Also worth noting is the fact that the developers of the script recommend using PyPy to improve the performance of the tool. Let’s go through the steps required to set this up.

First, you have to download and decompress PyPy:

wget https://bitbucket.org/squeaky/portable-pypy/downloads/pypy3.5-7.0.0-linux_x86_64-portable.tar.bz2
tar jxf pypy3.5-7.0.0-linux_x86_64-portable.tar.bz2
cd pypy3.5-7.0.0-linux_x86_64-portable

Next, we have to install pip and all the requirements for clickhouse-mysql-data-reader - exactly the same things we covered earlier when describing the regular setup:

./bin/pypy -m ensurepip
./bin/pip3 install mysql-replication
./bin/pip3 install clickhouse-driver
./bin/pip3 install mysqlclient

The last step is to install clickhouse-mysql-data-reader from the GitHub repository (we assume it has already been cloned):

./bin/pip3 install -e /path/to/clickhouse-mysql-data-reader/

That’s all. From now on, you should run all the commands using the environment created for PyPy:

./bin/pypy ./bin/clickhouse-mysql

Tests

The data has been loaded, so we can verify that everything went smoothly by comparing the size of the table:

MySQL:

mysql> SELECT COUNT(*) FROM wiki.pageviews\G
*************************** 1. row ***************************
COUNT(*): 204899465
1 row in set (1 min 40.12 sec)

ClickHouse:

vagrant.vm :) SELECT COUNT(*) FROM wiki.pageviews\G

SELECT COUNT(*)
FROM wiki.pageviews

Row 1:
──────
COUNT(): 204899465

1 rows in set. Elapsed: 0.100 sec. Processed 204.90 million rows, 204.90 MB (2.04 billion rows/s., 2.04 GB/s.)

Everything looks correct. Let’s run some queries to see how ClickHouse behaves. Please keep in mind that this whole setup is far from production-grade. We used two small VMs with 4GB of memory and one vCPU each. Therefore, even though the dataset wasn’t big, it was enough to see the difference. Due to the small sample it is quite hard to do “real” analytics, but we can still throw some random queries at it.

Let’s check what days of the week we have data from and how many pages have been viewed per day in our sample data:

vagrant.vm :) SELECT count(*), toDayOfWeek(date) AS day FROM wiki.pageviews GROUP BY day ORDER BY day ASC;

SELECT
    count(*),
    toDayOfWeek(date) AS day
FROM wiki.pageviews
GROUP BY day
ORDER BY day ASC

┌───count()─┬─day─┐
│  50986896 │   2 │
│ 153912569 │   3 │
└───────────┴─────┘

2 rows in set. Elapsed: 2.457 sec. Processed 204.90 million rows, 409.80 MB (83.41 million rows/s., 166.82 MB/s.)

In the case of MySQL, the same query looks like this:

mysql> SELECT COUNT(*), DAYOFWEEK(date) AS day FROM wiki.pageviews GROUP BY day ORDER BY day;
+-----------+------+
| COUNT(*)  | day  |
+-----------+------+
|  50986896 |    3 |
| 153912569 |    4 |
+-----------+------+
2 rows in set (3 min 35.88 sec)

As you can see, MySQL needed 3.5 minutes to do a full table scan.

Now, let’s see how many pages have a monthly value greater than 100:

vagrant.vm :) SELECT count(*), toDayOfWeek(date) AS day FROM wiki.pageviews WHERE  monthly > 100 GROUP BY day;

SELECT
    count(*),
    toDayOfWeek(date) AS day
FROM wiki.pageviews
WHERE monthly > 100
GROUP BY day

┌─count()─┬─day─┐
│   83574 │   2 │
│  246237 │   3 │
└─────────┴─────┘

2 rows in set. Elapsed: 1.362 sec. Processed 204.90 million rows, 1.84 GB (150.41 million rows/s., 1.35 GB/s.)

In the case of MySQL, it again takes around 3 minutes:

mysql> SELECT COUNT(*), DAYOFWEEK(date) AS day FROM wiki.pageviews WHERE YEAR(date) = 2018 AND monthly > 100 GROUP BY day;
+----------+------+
| COUNT(*) | day  |
+----------+------+
|    83574 |    3 |
|   246237 |    4 |
+----------+------+
2 rows in set (3 min 3.48 sec)

Another query, just a lookup based on some string values:

vagrant.vm :) select * from wiki.pageviews where title LIKE 'Main_Page' AND code LIKE 'de.m' AND hour=6;

SELECT *
FROM wiki.pageviews
WHERE (title LIKE 'Main_Page') AND (code LIKE 'de.m') AND (hour = 6)

┌───────date─┬─hour─┬─code─┬─title─────┬─monthly─┬─hourly─┐
│ 2018-05-01 │    6 │ de.m │ Main_Page │       8 │      0 │
└────────────┴──────┴──────┴───────────┴─────────┴────────┘
┌───────date─┬─hour─┬─code─┬─title─────┬─monthly─┬─hourly─┐
│ 2018-05-02 │    6 │ de.m │ Main_Page │      17 │      0 │
└────────────┴──────┴──────┴───────────┴─────────┴────────┘

2 rows in set. Elapsed: 0.015 sec. Processed 66.70 thousand rows, 4.20 MB (4.48 million rows/s., 281.53 MB/s.)

Another query, doing some lookups in the string column plus a condition based on the ‘monthly’ column:

vagrant.vm :) select title from wiki.pageviews where title LIKE 'United%Nations%' AND code LIKE 'en.m' AND monthly>100 group by title;

SELECT title
FROM wiki.pageviews
WHERE (title LIKE 'United%Nations%') AND (code LIKE 'en.m') AND (monthly > 100)
GROUP BY title

┌─title───────────────────────────┐
│ United_Nations                  │
│ United_Nations_Security_Council │
└─────────────────────────────────┘

2 rows in set. Elapsed: 0.083 sec. Processed 1.61 million rows, 14.62 MB (19.37 million rows/s., 175.34 MB/s.)

In the case of MySQL, the first query looks as below:

mysql> SELECT * FROM wiki.pageviews WHERE title LIKE 'Main_Page' AND code LIKE 'de.m' AND hour=6;
+------------+------+------+-----------+---------+--------+
| date       | hour | code | title     | monthly | hourly |
+------------+------+------+-----------+---------+--------+
| 2018-05-01 |    6 | de.m | Main_Page |       8 |      0 |
| 2018-05-02 |    6 | de.m | Main_Page |      17 |      0 |
+------------+------+------+-----------+---------+--------+
2 rows in set (2 min 45.83 sec)

So, almost 3 minutes. The second query behaves the same way:

mysql> select title from wiki.pageviews where title LIKE 'United%Nations%' AND code LIKE 'en.m' AND monthly>100 group by title;
+---------------------------------+
| title                           |
+---------------------------------+
| United_Nations                  |
| United_Nations_Security_Council |
+---------------------------------+
2 rows in set (2 min 40.91 sec)

Of course, one can argue that you can add more indexes to improve query performance, but the fact is that adding indexes requires additional data to be stored on disk. Indexes require disk space and they also pose operational challenges - if we are talking about real world OLAP data sets, we are talking about terabytes of data. It takes a lot of time and requires a well-defined and tested process to run schema changes in such an environment. This is why dedicated columnar datastores can be very handy and help tremendously in getting better insight into all the analytics data that everyone stores.


Hybrid OLTP/Analytics Database Workloads in Galera Cluster Using Asynchronous Slaves


Using Galera Cluster is a great way of building a highly available environment for MySQL or MariaDB. It is a shared-nothing cluster environment which can be scaled even beyond 12-15 nodes. Galera has some limitations, though. It shines in low-latency environments, and even though it can be used across a WAN, the performance is limited by network latency. Galera performance can also be impacted if one of the nodes starts to behave incorrectly. For example, excessive load on one of the nodes may slow it down, resulting in slower handling of writes, and that will impact all of the other nodes in the cluster. On the other hand, it is quite impossible to run a business without analyzing your data. Such analysis typically requires running heavy queries, which is quite different from an OLTP workload. In this blog post, we will discuss an easy way of running analytical queries on data stored in Galera Cluster for MySQL or MariaDB, in a way that does not impact the performance of the core cluster.

How to run analytical queries on Galera Cluster?

As we stated, running long-running queries directly on a Galera cluster is doable, but perhaps not such a good idea. Depending on the hardware, this can be an acceptable solution (if you use strong hardware and do not run a multi-threaded analytical workload), but even if CPU utilization is not a problem, the fact that one of the nodes has a mixed workload (OLTP and OLAP) will alone pose some performance challenges. OLAP queries will evict data required for your OLTP workload from the buffer pool, and this will slow down your OLTP queries. Luckily, there is a simple yet efficient way of separating the analytical workload from regular queries - an asynchronous replication slave.

A replication slave is a very simple solution - all you need is just another host, provisioned and configured for asynchronous replication from the Galera Cluster. With asynchronous replication, the slave will not impact the rest of the cluster in any way. No matter whether it is heavily loaded or uses different (less powerful) hardware, it will just continue replicating from the core cluster. The worst case scenario is that the replication slave will start lagging behind, but then it is up to you to implement multi-threaded replication or eventually to scale up the replication slave.

Once the replication slave is up and running, you should run the heavier queries on it and offload the Galera cluster. This can be done in multiple ways, depending on your setup and environment. If you use ProxySQL, you can easily direct queries to the analytical slave based on the source host, user, schema or even the query itself. Otherwise it will be up to your application to send analytical queries to the correct host.

Setting up a replication slave is not very complex, but it can still be tricky if you are not proficient with MySQL and tools like xtrabackup. The whole process consists of setting up the repository on a new server and installing the MySQL database. Then you will have to provision that host using data from the Galera cluster. You can use xtrabackup for that, but other tools like mydumper/myloader or even mysqldump will work as well (as long as you execute them correctly). Once the data is there, you will have to set up replication between a master Galera node and the replication slave. Finally, you will have to reconfigure your proxy layer to include the new slave and route traffic towards it, or tweak how your application connects to the database in order to redirect some of the load to the replication slave.

What is important to keep in mind is that this setup is not resilient. If the “master” Galera node goes down, the replication link will be broken and it will take manual action to slave the replica off another master node in the Galera cluster.

This is not a big deal, especially if you use replication with GTID (Global Transaction ID), but you have to detect that the replication is broken and then take the manual action.

How to set up the asynchronous slave to Galera Cluster using ClusterControl?

Luckily, if you use ClusterControl, the whole process can be automated and it requires just a handful of clicks. The initial state has already been set up using ClusterControl - a 3 node Galera cluster with 2 ProxySQL nodes and 2 Keepalived nodes for high availability of both database and proxy layer.

Adding the replication slave is just a click away:

Replication, obviously, requires binary logs to be enabled. If you do not have binlogs enabled on your Galera nodes, you can also enable them from ClusterControl. Please keep in mind that enabling binary logs will require a node restart to apply the configuration changes.

Even if one node in the cluster has binary logs enabled (marked as “Master” on the screenshot above), it’s still good to enable binary logs on at least one more node. ClusterControl can automatically failover the replication slave after it detects that the master Galera node crashed, but for that, another master node with binary logs enabled is required, or it won’t have anything to fail over to.

As we stated, enabling binary logs requires a restart. You can either perform it straight away, or just make the configuration changes and perform the restart at some other time.

After binlogs have been enabled on some of the Galera nodes, you can proceed with adding the replication slave. In the dialog you have to pick the master host and pass the hostname or IP address of the slave. If you have recent backups at hand (which you should), you can use one of them to provision the slave. Otherwise ClusterControl will provision it using xtrabackup - all the recent master data will be streamed to the slave and then the replication will be configured.

After the job completes, a replication slave has been added to the cluster. As stated earlier, should 10.0.0.101 die, another host in the Galera cluster will be picked as the master and ClusterControl will automatically slave 10.0.0.104 off another node.

As we use ProxySQL, we need to configure it. We’ll add a new server into ProxySQL.

We created another hostgroup (30) where we put our asynchronous slave. We also increased “Max Replication Lag” to 50 seconds from the default 10. It is up to your business requirements how far the analytics slave can lag before it becomes a problem.

After that, we have to configure a query rule that will match our OLAP traffic and route it to the OLAP hostgroup (30). On the screenshot above we filled in several fields - this is not mandatory. Typically you will need to use one of them, two at most. The above screenshot serves as an example so you can easily see that you can match queries using the schema (if you have a separate schema with analytical data), the hostname/IP (if OLAP queries are executed from some particular host), or the user (if the application uses a particular user for analytical queries). You can also match queries directly, by either passing a full query or by marking them with SQL comments and letting ProxySQL route all queries containing an “OLAP_QUERY” string to our analytical hostgroup.
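As a minimal sketch of the comment-based approach (the rule_id value is arbitrary and hostgroup 30 is the OLAP hostgroup created above; adjust both to your environment), such a rule could also be added through the ProxySQL admin interface:

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (100, 1, 'OLAP_QUERY', 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

Any query containing the OLAP_QUERY marker, for example SELECT /* OLAP_QUERY */ ..., should then be routed to the analytical hostgroup.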

As you can see, thanks to ClusterControl we were able to deploy a replication slave to Galera Cluster in just a couple of clicks. Some may argue that MySQL is not the most suitable database for analytical workloads, and we tend to agree. You can easily extend this setup using ClickHouse, by setting up replication from the asynchronous slave to a ClickHouse columnar datastore for much better performance of analytical queries. We described this setup in one of our earlier blog posts.

Galera Cluster for MySQL - Tutorial


1. Introduction

This tutorial covers basic information about Galera Cluster for MySQL/MariaDB, with information about the latest features introduced. At the end, you should be able to answer questions like:

  • How does Galera Cluster work?
  • How is Galera Cluster different from MySQL replication?
  • What is the recommended setup for a Galera Cluster?
  • What are the benefits and drawbacks of Galera Cluster?
  • What are the benefits of combining Galera Cluster and MySQL Replication in the same setup?
  • What are the differences, if any, between the three Galera vendors (Codership, MariaDB and Percona)?
  • How do I manage my Galera Cluster?
  • What backup strategies should I use with Galera?

There is also a more hands-on, practical section on how to quickly deploy and manage a replication setup using ClusterControl. You would need 4 hosts/VMs if you plan on doing this practical exercise.

2. What is Galera Cluster?

Galera Cluster is a synchronous multi-master replication plug-in for InnoDB. It is very different from the regular MySQL Replication, and addresses a number of issues including write conflicts when writing on multiple masters, replication lag and slaves being out of sync with the master. Users do not have to know which server they can write to (the master) and which servers they can read from (the slaves).

An application can write to any node in a Galera cluster, and transaction commits (row-based replication events) are then applied on all servers, via a certification-based replication. Certification-based replication is an alternative approach to synchronous database replication using Group Communication and transaction ordering techniques.

A minimal Galera cluster consists of 3 nodes and it is recommended to run with an odd number of nodes. The reason is that, should there be a problem applying a transaction on one node (e.g., a network problem or the machine becomes unresponsive), the two other nodes will have a quorum (i.e. a majority) and will be able to proceed with the transaction commit.

This plug-in is open-source and developed by Codership as a patch for standard MySQL. There are 3 Galera variants - MySQL Galera Cluster by Codership, Percona XtraDB Cluster by Percona and MariaDB Galera Cluster (5.5 and 10.0) by MariaDB. Starting from MariaDB Server 10.1, Galera is included in the regular server (and no longer in a separate cluster version). Enabling Galera Cluster is just a matter of activating the correct configuration parameters in any MariaDB Server installation.

3. Difference between MySQL Replication and Galera Cluster

The following diagram illustrates some high-level differences between MySQL Replication and Galera Cluster:

3.1. MySQL Replication Implementation

MySQL uses 3 threads to implement replication, one on the master and two on the slaves per master-slave connection:

  • Binlog dump thread - The master creates a thread to send the binary log contents to a slave when the slave connects. This thread can be identified in the output of SHOW PROCESSLIST on the master as the Binlog Dump thread.
  • Slave IO thread - The slave creates an IO thread, which connects to the master and asks it to send the updates recorded in its binary logs. The slave I/O thread reads the updates that the master's Binlog Dump thread sends and copies them to local files that comprise the slave's relay log.
  • Slave SQL thread - The slave creates an SQL thread to read the relay log that is written by the slave I/O thread and execute the events contained therein.

MySQL Replication is part of the standard MySQL database, and is mainly asynchronous in nature. Updates should always be done on one master; these are then propagated to slaves. It is possible to create a ring topology with multiple masters, however this is not recommended as it is very easy for the servers to get out of sync in case of a master failing. The introduction of GTID in MySQL 5.6 and later simplifies the management of the replication data flow and failover activities in particular; however, there is no automatic failover or resynchronization.

3.2. Galera Cluster Implementation

Galera Cluster implements replication using 4 components:

  • Database Management System - The database server that runs on the individual node. The supported DBMS are MySQL Server, Percona Server for MySQL and MariaDB Server.
  • wsrep API - The interface between the database server and the replication provider, defining the responsibilities of each. It provides integration with the database server engine for write-set replication.
  • Galera Plugin - The plugin that enables the write-set replication service functionality.
  • Group Communication plugins - The various group communication systems available to Galera Cluster.

A database vendor that would like to leverage Galera Cluster technology would need to incorporate the WriteSet Replication (wsrep) API patch into its database server codebase. This allows the Galera plugin, which works as a wsrep provider, to communicate and replicate transactions (writesets in Galera terms) via a group communication protocol. This enables a synchronous master-master setup for InnoDB. Transactions are synchronously committed on all nodes.

If a node fails, the other nodes will continue to operate and be kept up to date. When the failed node comes up again, it automatically synchronizes with the other nodes through State Snapshot Transfer (SST) or Incremental State Transfer (IST), depending on the last known state, before it is allowed back into the cluster. No data is lost when a node fails.

Galera Cluster makes use of certification based replication, that is a form of synchronous replication with reduced overhead.

4. What is Certification based Replication?

Certification based replication uses group communication and transaction ordering techniques to achieve synchronous replication. Transactions execute optimistically in a single node (or replica) and, at commit time, run a coordinated certification process to enforce global consistency. Global coordination is achieved with the help of a broadcast service, that establishes a global total order among concurrent transactions.

Pre-requisites for certification based replication:

  • database is transactional (i.e. it can rollback uncommitted changes)
  • each replication event changes the database atomically
  • replicated events are globally ordered (i.e. applied on all instances in the same order)

The main idea is that a transaction is executed conventionally until the commit point, under the assumption that there will be no conflict. This is called optimistic execution. When the client issues a COMMIT command (but before the actual commit has happened), all changes made to the database by the transaction and the primary keys of changed rows are collected into a writeset. This writeset is then replicated to the rest of the nodes. After that, the writeset undergoes a deterministic certification test (using the collected primary keys) on each node (including the writeset originator node) which determines if the writeset can be applied or not.

If the certification test fails, the writeset is dropped and the original transaction is rolled back. If the test succeeds, the transaction is committed and the writeset is applied on the rest of the nodes.

The certification test implemented in Galera depends on the global ordering of transactions. Each transaction is assigned a global ordinal sequence number during replication. Thus, when a transaction reaches the commit point, it is known what was the sequence number of the last transaction it did not conflict with. The interval between those two numbers is an uncertainty land: transactions in this interval have not seen the effects of each other. Therefore, all transactions in this interval are checked for primary key conflicts with the transaction in question. The certification test fails if a conflict is detected.

Since the procedure is deterministic and all replicas receive transactions in the same order, all nodes reach the same decision about the outcome of the transaction. The node that started the transaction can then notify the client application if the transaction has been committed or not.

Certification based replication (or more precisely, certification-based conflict resolution) is based on academic research, in particular on Fernando Pedone's Ph.D. thesis.

5. Galera Cluster Strengths and Weaknesses

Galera Cluster has a number of benefits:

  • A high availability solution with synchronous replication, failover and resynchronization
  • No loss of data
  • All servers have up-to-date data (no slave lag)
  • Read scalability
  • 'Pretty good' write scalability
  • High availability across data centers

Like any solution, there are some limitations:

  • It supports only InnoDB or XtraDB storage engine
  • With increasing number of writeable masters, the transaction rollback rate may increase, especially if there is write contention on the same dataset (a.k.a hotspot). This increases transaction latency.
  • It is possible for a slow/overloaded master node to affect performance of the Galera Cluster, therefore it is recommended to have uniform servers across the cluster.

We would suggest you also look at these resources, which explain the subject in great detail:

6. Deploying a Galera Cluster

A Galera Cluster can be deployed using ClusterControl. Our architecture is illustrated as below:

Install ClusterControl by following the instructions on the Getting Started page. Do not forget to set up passwordless SSH from ClusterControl to all nodes (including the ClusterControl node itself). We are going to use the root user for deployment. On the ClusterControl node, as the root user, run:

$ ssh-keygen -t rsa
$ ssh-copy-id 192.168.10.100 # clustercontrol
$ ssh-copy-id 192.168.10.101 # galera1
$ ssh-copy-id 192.168.10.102 # galera2
$ ssh-copy-id 192.168.10.103 # galera3

Open the ClusterControl UI, go to “Deploy Database Cluster” and open the “MySQL Galera” tab. In the dialog, there are two steps, as shown in the following screenshots.

6.1. Cluster Deployment

6.1.1. General and SSH Settings

Under “General & SSH Settings”, specify the required information:

  • SSH User - Specify the SSH user that ClusterControl will use to connect to the target host.
  • SSH Key Path - Passwordless SSH requires an SSH key. Specify the physical path to the key file here.
  • Sudo Password - Sudo password if the sudo user uses password to escalate privileges. Otherwise, leave it blank.
  • SSH Port Number - Self-explanatory. Default is 22.
  • Cluster Name - Name of the cluster that will be deployed.

Keep the checkboxes at their defaults so ClusterControl installs the software and configures the security options accordingly. If you would like to keep the firewall settings, uncheck “Disable Firewall”; however, make sure the MySQL-related ports are opened before the deployment begins, as shown in this documentation page.

6.1.2. Define MySQL Servers

Move to the next tab and define the MySQL server installation and configuration options:

  • Vendor - The currently supported vendors are Percona Server, MariaDB and Codership.
  • Version - MySQL major version. Version 5.7 (Codership/Percona) or 10.1 (MariaDB) is recommended.
  • Server Data Directory - The physical location of MySQL data directory. Default is /var/lib/mysql.
  • Server Port - MySQL server port. Default is 3306.
  • my.cnf Template - MySQL configuration template. Leave it to use the default template located under /usr/share/cmon/templates.
  • Root Password - MySQL root password. ClusterControl will set this up for you.
  • Repository - Choosing the default value is recommended, unless you want to use existing repositories on the database nodes. You can also choose “Create New Repository” to create and mirror the current database vendor’s repository and then deploy using the local mirrored repository.
  • Add Node - Specify the IP address or hostname of the target hosts. ClusterControl must be able to reach the specified server through passwordless SSH.

ClusterControl then performs the necessary actions to provision, install, configure and monitor the Galera nodes. Once the deployment completes, you will see the database cluster in the ClusterControl dashboard.

6.2. Scaling Out

Introducing a new Galera node is automatic. It follows the same process as recovering a failed node. When the node is introduced, the state is expected to be empty, thus Galera will do a full state snapshot transfer (SST) from the selected donor.

Using ClusterControl, adding a new Galera node is very straightforward. You just need to set up passwordless SSH to the target node and then go to ClusterControl > Cluster Actions > Add Node > Create and add a new DB Node:

If you already have a load balancer deployed, you can set “Include in LoadBalancer set” to true so ClusterControl will automatically add it into the chosen load balancer.

6.3. Scaling Down

Scaling down is trivial. You just need to remove the corresponding database node from Galera through the ClusterControl UI by clicking on the “X” icon on the Nodes page. If a Galera node stops with a proper shutdown, the node is marked as leaving the cluster gracefully and can be considered a scale down. ClusterControl will then remove the node from the load balancer (if one exists) and update the wsrep_cluster_address variable on the remaining nodes accordingly.
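If you want to double-check the result manually, a quick sketch of what to look at on any of the remaining nodes:

mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_cluster_address';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';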

6.4. Attaching Replication Slave

Another common setup in Galera Cluster is to have an asynchronous replication slave attached to it. There are a few good reasons to attach an asynchronous slave to a Galera Cluster. For one, long-running reporting/OLAP type queries on a Galera node might slow down the entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. Reporting queries can instead be sent to a standalone server, effectively isolating Galera from the reporting load. An asynchronous slave can also serve as a remote live backup.

To add a new replication slave, you have to perform the following steps:

  1. Enable binary logging on the selected Galera node (master). Go to ClusterControl > Nodes > choose the Galera node > Node Actions > Enable Binary Logging. This requires a MySQL restart on the corresponding node. You can repeat this step for the rest of the Galera nodes if you would like to have multiple masters.
  2. Setup a passwordless SSH to the target node.
  3. Go to ClusterControl > Cluster Actions > Add a Replication Slave > New Replication Slave and specify the required information like the slave node IP address/hostname, pick the right master and streaming port.

There is also an option to set up a delayed slave by specifying the amount of delay in seconds. We have covered this in detail in this blog post, How to Set Up Asynchronous Replication from Galera Cluster to Standalone MySQL server with GTID.

7. Accessing the Galera Cluster

By now, we should have a Galera Cluster setup ready. The next thing is to import an existing database or to create a brand new database for a new application. Galera Cluster maintains the native MySQL look and feel. Therefore, most of the existing MySQL libraries and connectors out there are compatible with it. However, there are some considerations when connecting to the cluster, as explained in the next section.

7.1. Direct Access

A running MySQL server in a Galera Cluster does not necessarily mean it is operational. Take the example of Galera nodes being partitioned. During this time, the wsrep_cluster_status value would report Non-Primary. Note that only the Primary Component is a healthy cluster. You could still connect to the MySQL server and perform a number of statements like SHOW and SET, though. Keep in mind that statements like SELECT, INSERT and UPDATE would not work at this point.

There are three states, exposed through MySQL’s SHOW STATUS statement, that must be true when determining Galera health:

  • wsrep_cluster_status: Primary
  • wsrep_local_state_comment: Synced
  • wsrep_ready: ON

Check for those states before sending any transaction to the Galera nodes. If the probing logic adds complexity to your applications, you can use a reverse proxy approach as described in the next section.
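A minimal sketch of such a check, run against the node you are about to use:

mysql> SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_cluster_status', 'wsrep_local_state_comment', 'wsrep_ready');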

7.2. Reverse Proxy

All database nodes in a Galera Cluster can be treated equally, since they are all holding the same dataset and are able to process reads and writes simultaneously. Therefore, it can be load balanced with reverse proxies.

A reverse proxy or load balancer typically sits in front of the Galera nodes and directs client requests to the appropriate backend server. This setup will reduce the complexity when dealing with multiple MySQL masters, distribute the MySQL connections efficiently and make the MySQL service more resilient to failure. Applications just have to send queries to the load balancer servers, and the queries are then re-routed to the correct backends. The application side does not need to perform health checks for cluster state and database availability as these tasks have been taken care of by the reverse proxy.

By adding a reverse-proxy into the picture, our architecture should look like this:

ClusterControl supports the deployment of HAProxy, ProxySQL and MariaDB MaxScale.

7.2.1. HAProxy

HAProxy as a MySQL load balancer works similarly to a TCP forwarder, operating in the transport layer of the TCP/IP model. It does not understand the MySQL queries (which operate in a higher layer) that it distributes to the backend MySQL servers. Setting up HAProxy for Galera nodes requires an external script planted on the database node, called mysqlchk. This health check script reports the database state via an HTTP status code on port 9200. ClusterControl will install this script automatically on the selected Galera nodes.

To create an HAProxy instance in front of the Galera nodes, go to Manage > Load Balancer > HAProxy > Deploy HAProxy. The following screenshot shows the HAProxy instance stats:

The application can then send the MySQL queries through port 3307 of the load balancer node.

Read more about this in our tutorial on MySQL load balancing with HAProxy.

7.2.2. ProxySQL

ProxySQL is a high-performance MySQL proxy with an open-source GPL license. It was released as generally available (GA) for production usage towards the end of 2015. It accepts incoming traffic from MySQL clients and forwards it to backend MySQL servers. It supports various MySQL Replication topologies as well as multi-master setups with Galera Cluster, with capabilities like query routing (e.g., read/write split), sharding, query rewriting, query mirroring, connection pooling and lots more.

To deploy a ProxySQL instance, simply go to Manage > Load Balancer > ProxySQL > Deploy ProxySQL and specify the necessary information. Choose the server instances to be included in the load balancing set and specify the max replication lag for each of them. ClusterControl will configure two hostgroups for the Galera Cluster, one for multi-master and another for single-master. By default, all queries will be routed to hostgroup 10 (the single-master pool). You can customize the query routing under “Query Rules” at a later stage to suit your needs.

Once deployed, you can simply send the MySQL connection to the load balancer host on port 6033. The following screenshot shows the single-master hostgroup (Hostgroup 10) with some stats captured by ProxySQL:

7.2.3. MariaDB MaxScale

MariaDB MaxScale is a database proxy that allows the forwarding of database statements to one or more MySQL/MariaDB database servers. The recent MaxScale 2.0 is licensed under MariaDB BSL which is free to use on up to two database servers. ClusterControl installs MaxScale 1.4 which is free to use on an unlimited number of database servers.

7.2.4. Other Load Balancers

We have built a healthcheck script called clustercheck-iptables that can be used with other load balancers like nginx, IPVS, Keepalived and Pen. This background script checks the availability of a Galera node, and adds a redirection port using iptables if the Galera node is healthy (instead of returning HTTP response). This allows other TCP-load balancers with limited health check capabilities to understand and monitor the backend Galera nodes correctly.

7.2.5. Keepalived

Keepalived is commonly used to provide a single endpoint through a virtual IP address, which allows an IP address to float between load balancer nodes. In case the primary load balancer goes down, the IP address will be failed over to the backup load balancer node. Once the primary load balancer comes back up, the IP address will be failed back to this node.

You can deploy a Keepalived instance using ClusterControl > Manage > Load Balancer > Keepalived > Deploy Keepalived. It requires at least two load balancers deployed by or imported to ClusterControl.

8. Failure handling with ClusterControl and Galera

In order to keep the database cluster stable and running, it is important for the system to be resilient to failures. Failures are caused by either software bugs or hardware problems, and can happen at any time. In case a server goes down, failure handling, failover and reconfiguration of the Galera cluster needs to be automatic, so as to minimize downtime.

In case of a node failing, applications connected to that node can connect to another node and continue to do database requests. Keepalive messages are sent between nodes in order to detect failures, in which case the failed node is excluded from the cluster.

ClusterControl will restart the failed database process, and point it to one of the existing nodes (a 'donor') to resynchronize. The resynchronization process is handled by Galera. It can be either through state snapshot transfer (SST) or incremental snapshot transfer (IST).

While the failed node is resynchronizing, any new transactions (writesets) coming from the existing nodes will be cached in a slave queue. Once the node has caught up, it will be considered as synced and is ready to accept client connections.
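You can follow the progress of the resynchronization with a couple of status counters (a sketch; once wsrep_local_state_comment reports Synced, the node is back in business):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';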

View this webinar for more information on how to recover MySQL and MariaDB clusters.

9. Operations - Managing your Galera Cluster

9.1. Backup

We have previously covered backup strategies for MySQL. ClusterControl supports mysqldump and xtrabackup (full and incremental) to perform backups. Backups can be performed or scheduled on any database node and stored locally or centrally on the ClusterControl node. When storing backups on the ClusterControl node, the backup is first created on the target database node and then streamed over using netcat to the controller node. You can also choose to back up individual databases or all databases. Backup progress is available in the UI and you will get a notification on the backup status each time one is created.

To create a backup, simply go to Backup > Create Backup and specify the necessary details:

To schedule backups, click on “Schedule Backup” and configure the scheduling accordingly:

Backups created by ClusterControl can be restored on one of the database nodes.

9.2. Restore

ClusterControl has the ability to restore backups (mysqldump and xtrabackup) created by ClusterControl or externally via some other tool. For external backups, the backup files must exist on the ClusterControl node and only the xbstream, xbstream.gz and tar.gz extensions are supported.

All incremental backups are automatically grouped together under the last full backup and expandable with a drop down. Each created backup will have “Restore” and “Log” buttons:

To restore a backup, simply click on the “Restore” button for the respective backup. You should then see the following Restore wizard and a couple of post-restoration options:

If the backup was taken using Percona XtraBackup, the replication has to be stopped. The following steps will be performed:

  1. Stop all nodes in the Galera setup.
  2. Copy the backup files to the selected server.
  3. Restore the backup.
  4. Once the restore job is completed, ClusterControl will bootstrap the restored node.
  5. ClusterControl will start the remaining nodes by using the bootstrapped node as the donor.

9.3. Software Upgrade

You can perform a database software upgrade via ClusterControl > Manage > Upgrades > Upgrade. Upgrades are online and are performed on one node at a time. One node will be stopped, then the software is updated through the package manager, and finally the node is started again. If a node fails to upgrade, the upgrade process is aborted. Upgrades should only be performed when there is as little traffic as possible on the database hosts.

You can monitor the MySQL upgrade progress from ClusterControl > Activity > Jobs, as shown in the following screenshot:

ClusterControl performs the upgrade of a Galera Cluster by upgrading the nodes one at a time. Once the job is completed, verify the new version is correct on the “Cluster Overview” page.

9.4. Configuration Management

System variables are found in my.cnf. Some of the variables are dynamic and can be set at runtime, others are not. ClusterControl provides an interface to update MySQL configuration parameters on all DB instances at once. Select the DB instance(s), configuration group and parameter, and ClusterControl will perform the necessary changes via SET GLOBAL (if possible) and also make them persistent in my.cnf.
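Under the hood this boils down to something like the following sketch (max_connections is just an arbitrary example of a dynamic variable; the same value also has to be written to my.cnf to survive a restart):

mysql> SET GLOBAL max_connections = 512;
mysql> SHOW GLOBAL VARIABLES LIKE 'max_connections';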

If restart is required, ClusterControl will acknowledge that in the Config Change Log dialog:

More information in this blog post, Updating your MySQL Configuration.

9.5. Online Schema Upgrade

Any DDL statement that runs on the database, such as CREATE TABLE, ALTER TABLE or GRANT, upgrades the schema. Traditionally, a schema change in MySQL was a blocking operation - a table had to be locked for the duration of the change. In Galera Cluster, there are two online schema upgrade (OSU) methods:

  • Total Order Isolation (TOI)
  • Rolling Schema Upgrade (RSU)

These two options, TOI and RSU, are the possible values of the wsrep_OSU_method variable.

9.5.1. Total Order Isolation (TOI)

This is the default value of wsrep_OSU_method. The schema upgrades run on all cluster nodes in the same total order sequence, preventing other transactions from committing for the duration of the operation. In Total Order Isolation, queries that update the schema replicate as statements to all nodes in the cluster. The nodes wait for all preceding transactions to commit and then, simultaneously, they execute the schema upgrade in isolation. For the duration of the DDL processing, no other transactions can commit.

More details in this blog on schema changes using TOI.

9.5.2. Rolling Schema Upgrade (RSU)

The schema upgrades run locally, affecting only the node on which they are run. The changes do not replicate to the rest of the cluster. DDL statements that update the schema are only processed on the local node. While the node processes the schema upgrade, it desynchronizes from the cluster. When it finishes processing the schema upgrade, it applies the delayed replication events and synchronizes itself with the cluster.
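A minimal sketch of a rolling schema change, assuming a hypothetical table t1 and run separately on each node:

mysql> SET SESSION wsrep_OSU_method = 'RSU';
mysql> ALTER TABLE t1 ADD COLUMN c1 INT;
mysql> SET SESSION wsrep_OSU_method = 'TOI';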

More details can be found in this blog on schema changes using RSU.

Each method has its own pros and cons. More details in this blog post, Become a MySQL DBA blog series - Common operations - Schema Changes.

10. Common Questions about Galera Cluster

10.1. What can cause Galera to crash?

Stay clear of Galera's known limitations to avoid problems.

There are several reasons which can cause Galera to crash:

  • Too many deadlocks under heavy load when writing to the same set of rows,
  • The OS is swapping and/or iowait is high,
  • Out of disk space,
  • InnoDB crashes,
  • Using a binlog_format other than ROW (only binlog_format=ROW is supported),
  • Tables without a PRIMARY KEY (every table must have one),
  • Replication of MyISAM tables (this is experimental and MyISAM tables should be avoided),
  • DELETE from a table which does not have a primary key,
  • No primary component available or the cluster is out of quorum,
  • MySQL misconfiguration,
  • A Galera software bug

10.2. What happens when disk is full?

Galera node provisioning is smart enough to kick any problematic node out of the cluster if it detects inconsistency among members. When mysqld runs out of disk space (in the data directory), the node is not able to apply writesets. Galera detects this as failed transactions. Since this compromises node consistency, Galera will signal the node to close the group communication and force mysqld to terminate.

Restarting the node will give disk-full errors (such as "no space left on device"), and quota-exceeded errors (such as "write failed" or "user block limit reached"). You might want to add another data file on another disk or clear up some space before the node can rejoin the cluster.

10.3. What happens if OS is swapping?

If the OS starts swapping and/or iowait is very high, it can "freeze" the server for a period of time. During this time the Galera node may stop responding to the other nodes and will be deemed dead. In virtualized environments it can also be the host OS that is swapping.

10.4. How to handle Galera crash?

First of all, make sure you are running the latest stable Galera release so you do not run into older bugs that have already been fixed. Start by inspecting the MySQL error log on the Galera nodes, as Galera will be logging to this file. Look for any relevant lines which indicate an error or failure. If the Galera nodes happen to be responsive, you may also try to collect the following output:

mysql> SHOW FULL PROCESSLIST; 
mysql> SHOW PROCESSLIST; 
mysql> SHOW ENGINE INNODB STATUS; 
mysql> SHOW STATUS LIKE 'wsrep%';

Next, inspect the system resources by checking network, firewall, disk usage and memory utilization, as well as inspecting the general system activity logs (syslog, messages, dmesg). If there is still no indication of the problem, you may have hit a bug, which you can report directly on the Galera bugs page on Launchpad, or you can request technical support assistance directly from the vendor (Codership, Percona or MariaDB). You may also join the Galera Google Group mailing list to seek open assistance.

Note:

  • If you are using rsync for state transfer and a node crashes before the state transfer is over, the rsync process might hang forever, occupying the port and not allowing the node to restart. The problem will show up as 'port in use' in the server error log. Find the orphan rsync process and kill it manually.
  • Before re-initializing the cluster, you can determine which DB node has the most up-to-date data by comparing the wsrep_last_committed value among the nodes, as shown below. The node holding the highest number is recommended as the reference node when bootstrapping the cluster.
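A sketch of that comparison, executed on every node that is still reachable:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';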

10.5. What happens if I don't have primary keys in my table?

Each transaction is assigned a global ordinal sequence number during replication. Thus, when a transaction reaches the commit point, it is known what was the sequence number of the last transaction it did not conflict with. The interval between those two numbers is an uncertainty land: transactions in this interval have not seen the effects of each other. Therefore, all transactions in this interval are checked for primary key conflicts with the transaction in question. The certification test fails if a conflict is detected.

A DELETE FROM statement requires a PK, or the node(s) will die. Always define an explicit PRIMARY KEY in all tables. A simple AUTO_INCREMENT primary key is enough.
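A minimal sketch, assuming a hypothetical table t1 that currently has no primary key:

mysql> ALTER TABLE t1 ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;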

10.6. Is it true that MySQL Galera is as slow as the slowest node?

Yes, it is. Galera relies on group communication between nodes. It waits for all nodes to return the result of the certification test before proceeding with the commit or rollback. At this phase, an overloaded node will surely have a hard time replying in a timely manner, forcing the rest of the cluster to wait for it. Therefore, it is recommended to have uniform servers across the cluster. Also, transaction latency is no shorter than the RTT (round trip time) to the slowest node.

10.7. Can I Use Galera to Replicate between Data Centers?

Although Galera Cluster is synchronous, it is possible to deploy a Galera Cluster across data centers. Synchronous replication like MySQL Cluster (NDB) implements two-phase commit, where messages are sent to all nodes in a cluster in a 'prepare' phase, and another set of messages are sent in a 'commit' phase. This approach is usually not suitable for geographically disparate nodes, because of the latencies in sending messages between nodes.

Galera Cluster makes use of certification based replication, that is a form of synchronous replication with reduced overhead.

10.8. Usage of MyISAM tables? Why is it not recommended?

MySQL Galera treats MyISAM tables in quite a different way:

  • All DDL (create, drop, alter table...) on MyISAM will be replicated.
  • DML (update, delete, insert) touching only MyISAM tables will not be replicated.
  • Transactions containing both InnoDB and MyISAM access will be replicated.

So, MyISAM tables will appear on all nodes (since DDL is replicated). If you access MyISAM tables outside of InnoDB transactions, then all data changes in MyISAM tables will remain local to each node. If you access MyISAM tables inside InnoDB transactions, then the MyISAM changes are replicated along with the InnoDB changes. However, if cluster-wide conflicts happen, MyISAM changes cannot be rolled back and your MyISAM tables will become inconsistent.

10.9. Which version of Galera is the best (PXC, Maria, Codership)?

The Galera technology is developed by Codership Oy and is available as a patch for standard MySQL and InnoDB. Percona and MariaDB leverage the Galera library in Percona XtraDB Cluster (PXC) and MariaDB Galera Cluster respectively.

Since they all leverage the same Galera library, replication performance should be fairly similar. The Codership build usually has the latest version of Galera, although that could change in the future.


Migrating from DB2 to PostgreSQL - What You Should Know


Whether you are migrating a database or an application from DB2 to PostgreSQL, knowledge of only one of the two database systems is not sufficient; there are a few things to know about the differences between them.

PostgreSQL is the world's most advanced open source database. It has a rich feature set, and the PostgreSQL community is very strong, continuously improving existing features and adding new ones. According to db-engines.com, PostgreSQL was the DBMS of the Year in 2017 and 2018.

As you know, DB2 and PostgreSQL are both RDBMS, but there are some incompatibilities between them. In this blog, we will look at some of these incompatibilities.

Why Migrate From DB2 to PostgreSQL

  1. Flexible open source licensing and easy availability from public cloud providers like AWS, Google Cloud and Microsoft Azure.
  2. Benefit from open source add-ons to improve database performance.

You can see in the image below that PostgreSQL's popularity is increasing over time as compared to DB2.

Interest Over Time

Migration Assessment

The first step of migration is to analyze the application and database objects, find the incompatibilities between both databases, and estimate the time and cost required for the migration.

Data Type Mapping

Some of the data types of IBM DB2 do not match the PostgreSQL data types directly, so you need to change them to the corresponding PostgreSQL data type.

Please check the below table.

IBM DB2                 | Description                             | PostgreSQL
BIGINT                  | 64-bit integer                          | BIGINT
BLOB(n)                 | Binary large object                     | BYTEA
CLOB(n)                 | Character large object                  | TEXT
DBCLOB(n)               | UTF-16 character large object           | TEXT
NCLOB(n)                | UTF-16 character large object           | TEXT
CHAR(n), CHARACTER(n)   | Fixed-length string                     | CHAR(n)
CHARACTER VARYING(n)    | Variable-length string                  | VARCHAR(n)
NCHAR(n)                | Fixed-length UTF-16 string              | CHAR(n)
NCHAR VARYING(n)        | Variable-length UTF-16 string           | VARCHAR(n)
VARCHAR(n)              | Variable-length string                  | VARCHAR(n)
VARGRAPHIC(n)           | Variable-length UTF-16 string           | VARCHAR(n)
VARCHAR(n) FOR BIT DATA | Variable-length byte string             | BYTEA
NVARCHAR(n)             | Variable-length UTF-16 string           | VARCHAR(n)
GRAPHIC(n)              | Fixed-length UTF-16 string              | CHAR(n)
INTEGER                 | 32-bit integer                          | INTEGER
NUMERIC(p,s)            | Fixed-point number                      | NUMERIC(p,s)
DECIMAL(p,s)            | Fixed-point number                      | DECIMAL(p,s)
DOUBLE PRECISION        | Double precision floating point number  | DOUBLE PRECISION
FLOAT(p)                | Double precision floating point number  | DOUBLE PRECISION
REAL                    | Single precision floating point number  | REAL
SMALLINT                | 16-bit integer                          | SMALLINT
DATE                    | Date (year, month and day)              | DATE
TIME                    | Time (hour, minute and second)          | TIME(0)
TIMESTAMP(p)            | Date and time with fraction             | TIMESTAMP(p)
DECFLOAT(16|34)         | IEEE floating point number              | FLOAT
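
As a hypothetical example of applying this mapping (the table and column names are illustrative only), a DB2 table definition and its PostgreSQL equivalent might look like this:

-- DB2
CREATE TABLE employee (
    emp_id   BIGINT NOT NULL,
    resume   CLOB,
    photo    BLOB,
    hired_at TIMESTAMP
);

-- PostgreSQL
CREATE TABLE employee (
    emp_id   BIGINT NOT NULL,
    resume   TEXT,
    photo    BYTEA,
    hired_at TIMESTAMP
);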

Incompatibilities in DB2 and PostgreSQL

There are many incompatibilities between DB2 and PostgreSQL; you can see some of them here. You can work around several of them by creating extensions, so that a DB2 function can be used as-is in PostgreSQL, which saves time. Below you can check the behaviour of some DB2 functions in PostgreSQL.

TABLESPACE

The TABLESPACE clause defines the name of the tablespace in which the newly created table resides.

DB2 uses the IN clause to specify the tablespace, so it should be replaced by the TABLESPACE clause in PostgreSQL.

Example:

DB2:

IN <tablespace_name>

PostgreSQL:

TABLESPACE <tablespace_name>

FETCH FIRST n ROWS ONLY

In DB2, you can use the FETCH FIRST n ROWS ONLY clause to retrieve no more than n rows. In PostgreSQL, you can use LIMIT n, which is equivalent to FETCH FIRST n ROWS ONLY.

Example:

DB2:

SELECT * FROM EMP
 ORDER BY EMPID
 FETCH FIRST 10 ROWS ONLY;

PostgreSQL:

SELECT * FROM EMP
 ORDER BY EMPID
 LIMIT 10;

GENERATED BY DEFAULT AS IDENTITY

The IDENTITY column in DB2 can be replaced by a SERIAL column in PostgreSQL.

DB2:

CREATE TABLE <table_name> (
<column_name> INTEGER NOT NULL
 GENERATED BY DEFAULT AS IDENTITY (START WITH 1, INCREMENT BY 1, CACHE 20) 
);

PostgreSQL:

CREATE TABLE <table_name> (
<column_name>  SERIAL NOT NULL
);
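
Worth noting: PostgreSQL 10 and later also support identity columns natively, so if you are targeting a recent version you can keep syntax very close to the DB2 original (a sketch using the same placeholder table):

CREATE TABLE <table_name> (
<column_name> INTEGER NOT NULL
 GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1)
);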

Select from SYSIBM.SYSDUMMY1

There is no “SYSIBM.SYSDUMMY1” table in PostgreSQL. PostgreSQL allows a “SELECT” without a “FROM” clause, so you can simply remove the FROM SYSIBM.SYSDUMMY1 part, for example with a script.
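
A quick illustration of the same query in both systems:

-- DB2
SELECT CURRENT DATE FROM SYSIBM.SYSDUMMY1;

-- PostgreSQL
SELECT CURRENT_DATE;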

Scalar Functions: DB2 vs PostgreSQL

CEIL/CEILING

CEIL or CEILING returns the smallest integer value that is greater than or equal to the input (e.g., CEIL(122.89) returns 123, and CEIL(122.19) also returns 123).

DB2:

SELECT CEIL(123.89) FROM SYSIBM.SYSDUMMY1; 
SELECT CEILING(123.89) FROM SYSIBM.SYSDUMMY1;

PostgreSQL:

SELECT CEIL(123.89) ; 
SELECT CEILING(123.89) ;

DATE

It converts the input to a date value. You can convert the DATE function to the TO_DATE function in PostgreSQL.

DB2:

SELECT DATE ('2018-09-21') FROM SYSIBM.SYSDUMMY1;

PostgreSQL:

SELECT TO_DATE ('21-09-2018', 'DD-MM-YYYY');

DAY

It returns the day (day of the month) part of a date or equivalent value. The output format is integer.

DB2:

SELECT DAY (DATE('2016-09-21')) FROM SYSIBM.SYSDUMMY1;

PostgreSQL:

SELECT DATE_PART('day', '2016-09-21'::date);

MONTH

It returns the month part of the date value. The output format is integer.

DB2:

SELECT MONTH (DATE('2016-09-21')) FROM SYSIBM.SYSDUMMY1;

PostgreSQL:

SELECT DATE_PART('month', '2016-09-21'::date);

POSSTR

Returns the position of one string within another. The POSSTR function is replaced by the POSITION function in PostgreSQL.

DB2:

Usage: POSSTR(<Field_1>, <Field_2>)
SELECT POSSTR('PostgreSQL and DB2', 'and') FROM SYSIBM.SYSDUMMY1;

PostgreSQL:

Usage: POSITION(<Field_1> IN <Field_2>)
SELECT POSITION('and' IN 'PostgreSQL and DB2');

RAND

It returns a pseudorandom floating-point value in the range from zero to one. You can replace the RAND function with RANDOM in PostgreSQL.

DB2:

SELECT RAND() FROM SYSIBM.SYSDUMMY1;

PostgreSQL:

SELECT RANDOM();

Tools

You can use one of several tools to migrate a DB2 database to PostgreSQL. Please test any tool before using it.

  1. Db2topg

    It is an automated tool for DB2 to PostgreSQL migration, similar to ora2pg. The scripts in the db2topg tool convert as much as possible of a DB2 UDB database. This tool does not work with DB2 for z/OS. It is very simple to use: you need an SQL dump of your schema and then use the db2topg script to convert it to a PostgreSQL schema.

  2. Full convert

    Full Convert is an enterprise tool that quickly copies a DB2 database to PostgreSQL. The conversion of a DB2 database to PostgreSQL using the Full Convert tool is very simple.
    Steps:

    • Connect to the source database i.e. DB2
    • Optional: Choose the tables that you want to convert (by default all the tables are selected)
    • Start the conversion.

Conclusion

As we have seen, migrating from DB2 to PostgreSQL is not rocket science, but we need to keep in mind the points covered above to avoid big issues in our system. So, we just need to be careful with the task and go ahead: you can migrate to the most advanced open source database and take advantage of its benefits.

New Webinar: How to Migrate from Oracle DB to MariaDB


Migrating from Oracle database to MariaDB can come with a number of benefits: lower cost of ownership, access to and use of an open source database engine, tight integration with the web, an active community of MariaDB database users and more.

Over the years MariaDB has gained Enterprise support and maturity to run critical and complex data transaction systems. With the recent version, MariaDB has added some great new features such as SQL_Mode=Oracle compatibility, making the transition process easier than ever before.

Whether you’re planning to migrate from Oracle database to MariaDB manually or with the help of a commercial tool to automate the entire migration process, you need to know all the possible bottlenecks and methods involved in the process and the results validation.

Join us on March 12th as we walk you through all you need to know to plan and execute a successful migration from Oracle database to MariaDB.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, March 12th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, March 12th at 10:00 Pacific Time (US) / 13:00 Eastern Time (US)

Register Now

Agenda

  • A brief introduction to the platform
    • Oracle vs MariaDB
    • Platform support
    • Installation process
    • Database access
    • Backup process
    • Controlling query execution
    • Security
    • Replication options
  • Migration
    • Planning and development strategy
    • Assessment or preliminary check
    • Data type mapping
    • Migration tools
    • Migration process
    • Testing
  • Post-migration
    • Monitoring and Alerting
    • Performance Management
    • Backup Management
    • High availability
    • Upgrades
    • Scaling
    • Staff training

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA, with over 15 years experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

We look forward to “seeing” you there!

How to Migrate from Oracle DB to MariaDB


Join us on March 12th as we walk you through all you need to know to plan and execute a successful migration from Oracle database to MariaDB.

Over the years MariaDB has gained Enterprise support and maturity to run critical and complex data transaction systems. With the recent version, MariaDB has added some great new features such as SQL_Mode=Oracle compatibility, making the transition process easier than ever before.

Whether you’re planning to migrate from Oracle database to MariaDB manually or with the help of a commercial tool to automate the entire migration process, you need to know all the possible bottlenecks and methods involved in the process and the results validation.

Migrating from Oracle database to MariaDB can come with a number of benefits: lower cost of ownership, access to and use of an open source database engine, tight integration with the web, wide circle of MariaDB database professionals and more.

Find out how it could benefit your organisation on March 12th!

Agenda: 
  • A brief introduction to the platform
    • Oracle vs MariaDB
    • Platform support
    • Installation process
    • Database access
    • Backup process
    • Controlling query execution
    • Security
    • Replication options
  • Migration
    • Planning and development strategy
    • Assessment or preliminary check
    • Data type mapping
    • Migration tools
    • Migration process
    • Testing
  • Post-migration
    • Monitoring and Alerting
    • Performance Management
    • Backup Management
    • High availability
    • Upgrades
    • Scaling
    • Staff training
Date & Time: 
Tuesday, March 12, 2019 - 10:00 to 11:15
Tuesday, March 12, 2019 - 13:00 to 14:15

Migrating from Oracle Database to MariaDB - What You Should Know


Gartner predicts that by 2022, 50% of existing commercial databases will have been converted to open source databases. Even more, 70% of new in-house applications will be developed on an open source database platform (State of the Open-Source DBMS Market, 2018).

These are high numbers considering the maturity, stability, and criticality of popular, proprietary database software. The same may be observed in the top database ranking, where most of the top ten databases are open source.

What is pushing companies to do such moves?

There could be many reasons to migrate database systems. For some, the main reason will be the cost of licenses and ownership; but is it really only about the cost? And is open source stable enough to move critical production systems to that new open source world?

Open source databases, especially new ones brought into an organization, often come from a developer on a project team. It’s chosen because it’s free (doesn’t impact the direct project’s external spend) and meets the technical requirements of the moment.

But the free aspect doesn't actually mean no cost, as you have to consider many factors, including the migration itself and the cost of man-hours. The smoother the migration, the less time and money are spent on the project.

Database migrations can be challenging, especially for heterogeneous proprietary database migrations such as Oracle to PostgreSQL, or Oracle to Percona Server or MySQL. The complex schema structure, data types, and database code like PL/SQL can be quite different from those of the target database, requiring a schema and code transformation step before the data migration starts.

In a recent article, my colleague Paul Namuag examined how to migrate from Oracle to Percona Server.

This time we will take a look at what you should know before migrating from Oracle to MariaDB.

MariaDB promises enterprise features and migration capabilities which can help move Oracle databases into the open source world.

In this blog post we will cover the following:

  • Why migrate?
  • Storage engine differences
  • Database connectivity considerations
  • Installation & administration simplicity
  • Security differences
  • Replication & HA
  • PL/SQL and database code
  • Clustering and scaling
  • Backup and recovery
  • Cloud compatibility
  • Miscellaneous considerations

Why Migrate from Oracle?

Most enterprises will run Oracle or SQL Server, or a combination of both, with small pockets of isolated open source databases operating independently. Small to medium-sized businesses would tend to deploy mainly open source databases, especially for new applications. But this is changing, and often, open source is the main choice even for big organizations.

A quick comparison of these two databases systems looks as follows:

  • Only Oracle Express Edition is free of cost, but it has very limited features when compared to MariaDB. For extensive features, either Oracle Standard Edition or Oracle Enterprise Edition has to be purchased.
  • On the other hand, the MariaDB and MySQL communities have been working hard to minimize the potential feature gap. Security compliance, hot backups, and many other enterprise features are now available in MariaDB.

There are things that were always more flexible in MariaDB/MySQL than in massive Oracle setups. One of them is the ease of replication and horizontal cluster scalability.

Storage engine differences

First, let’s start with some basics. You can still hear a lot of legends and myths regarding MySQL or MariaDB limitations, which mostly refer to the dark times when the main storage engine was MyISAM.

MyISAM was the default storage engine from MySQL 3.23 until it was replaced by InnoDB as the default in version 5.5. It's a light, non-transactional engine with great performance, but it does not offer row-level locking or the reliability of InnoDB.

With InnoDB (the default storage engine), MariaDB offers the two standard row-level locks: shared locks (S) and exclusive locks (X). A shared lock is obtained to read a row and allows other transactions to read the locked row; different transactions may also acquire their own shared locks on it. An exclusive lock is obtained to write to a row and stops other transactions from locking the same row.
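
In SQL terms, the two lock types map to the standard locking reads; a minimal illustration, assuming a hypothetical accounts table:

-- shared (S) lock: other transactions may still read and share-lock the row
SELECT balance FROM accounts WHERE id = 1 LOCK IN SHARE MODE;

-- exclusive (X) lock: blocks other transactions from locking the same row
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;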

InnoDB has definitely covered the biggest transactional feature gap between these two systems.

Due to the pluggable nature of MariaDB, it offers even more storage engines, so you can better adjust it to a specific workload. For example, when space matters you can go with TokuDB, which offers a great compression ratio; Spider, which is optimized for partitioning and data sharding; or ColumnStore for big data scaling.

Nevertheless, for those migrating from Oracle, my recommendation would be to go first with the InnoDB storage engine.

Connectivity Considerations

MariaDB shares with Oracle good support for database access including ODBC and JDBC drivers, as well as access libraries for Perl, Python, and PHP. MySQL and Oracle both support binary large objects, character, numeric, and date data types. So you should have no issues with finding the right connector for your application services.

MariaDB doesn't have a dedicated listener process to maintain database connections, nor a SCAN address for a clustered database like we know from Oracle. You will also not find flexible database services. Instead, you will need to choose between the Unix socket (a local and the most secure way to connect when the DB and the app are on the same server), remote TCP connections (by default, MariaDB does not permit remote logins), and also named pipes and shared memory, which are available on Windows systems only. For a cluster, the SCAN address needs to be replaced by a load balancer. MariaDB recommends using their other product, MaxScale, but you can also find others like ProxySQL or HAProxy that will work with MariaDB, with some limitations. While using external load balancers for MariaDB can be difficult, you may find great features which, by way of comparison, are not available in the Oracle database.

A load balancer would also be a recommendation for those who are looking for Oracle Transparent Application Failover (TAF), Oracle Firewall DB or some of the advanced security features like Oracle Connection Manager. You can find more about choosing the right load balancer in the following white paper.

While these technologies are free and can be deployed manually using script-based installations, systems such as ClusterControl automate the process with their point-and-click interface. ClusterControl also lets you deploy caching technologies.

Installation & Administration Simplicity

The latest available Oracle DB version added a long-awaited installation feature: Oracle 18c can now be installed on Oracle Linux using an RPM. The dedicated Java-based installation was always a problem for those who wanted to write automation for their cookbooks or Puppet code snippets. You could go with a predefined silent installation, but the response file changed from time to time and you still had to deal with dependency hell. RPM-based installation was definitely a good move.

So how does it work in MariaDB?

For those who are moving from the Oracle world, it’s always a nice surprise to see how fast you can deploy instances, create new databases or even set up complex replication flows. The installation and configuration process is probably the smoothest part of the migration process. Although choosing the right settings takes time and knowledge.

MariaDB, like MySQL, provides a set of binary distributions. These include generic binary distributions in the form of compressed tar files (files with a .tar.gz extension) for a number of platforms, as well as binaries in platform-specific packages. On the Windows platform, you can find a standard installation wizard with a GUI.

The Oracle database configuration assistant (DBCA) is basically not needed as you will be able to create a database with a single line command.

CREATE [OR REPLACE] {DATABASE | SCHEMA} [IF NOT EXISTS] db_name
    [create_specification] ...

create_specification:
    [DEFAULT] CHARACTER SET [=] charset_name
  | [DEFAULT] COLLATE [=] collation_name

You can also have databases with different collations and character sets under the same MariaDB instance.
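
For example (the database name is illustrative only):

CREATE DATABASE appdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;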

Replication setup just requires enabling binary logging on the master (similar to the archive log in Oracle) and running the following command on the slave to attach it to the master.

CHANGE MASTER TO
MASTER_HOST = host,
MASTER_PORT = port,
MASTER_USER = replication_user,
MASTER_PASSWORD = password,
MASTER_AUTO_POSITION = 1;
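
One step the snippet above assumes is that a replication account already exists on the master; a minimal sketch of creating it (the user name and password are placeholders matching the command above) could be:

CREATE USER 'replication_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'%';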

Security and Compliance

Oracle provides enhanced database security.

In Oracle, user authentication is performed by specifying global roles in addition to location, username, and password, and several authentication methods are available, including database authentication, external authentication, and proxy authentication.

For a long time, roles were not available in MariaDB or MySQL. MariaDB added roles in version 10.0.5, long before they appeared in MySQL 8.0.

Roles, an option that is heavily used in common Oracle DB setups, can be easily transformed in MariaDB, so you don’t have to waste time on single user permission adjustments.

Create, alter user, passwords: it all works similarly to Oracle DB.

To achieve enterprise security compliance standards, MariaDB offers built-in features like:

  • Audit Plugin
  • Encryption of data-at-rest
  • Certificates, TLS connections
  • PAM Plugin

The Audit Plugin offers a sort of fine-grained auditing (FGA) or AUDIT SQL as available in Oracle. It does not offer the same set of features, but usually it's good enough to satisfy security compliance audits.

Encryption of data-at-rest can be a requirement for security regulations like HIPAA or PCI DSS. Such encryption can be implemented on multiple levels: you can encrypt the whole disk on which the files are stored; you can encrypt only the database through functionality available in the latest versions of MySQL or MariaDB; or encryption can be implemented in the application, so that it encrypts the data before storing it in the database. Every option has its pros and cons: for example, disk encryption helps only when disks are physically stolen, as the files are not encrypted on a running database server.

The PAM Plugin extends the login functionality to tie user accounts to LDAP settings. In fact, I find it much easier to set up than LDAP integration with Oracle Database.

Replication & HA

MariaDB is well known for its replication simplicity and flexibility. By default, you can read from or even write to your standby/slave servers. Luckily, the MariaDB 10.x versions brought many significant enhancements to replication, including Global Transaction IDs, event checksums, multi-threaded slaves and crash-safe slaves/masters, making replication even better. DBAs accustomed to MySQL replication reads and writes would expect a similar or even simpler solution from its bigger brother, Oracle. Unfortunately, not by default.

The standard physical standby implementation for Oracle is closed to any read-write operations. Oracle does offer a logical standby variation, but it has many limitations and it's not designed for HA. The solution to this problem is an additional paid feature called Active Data Guard, which you can use to read data from the standby while you apply redo logs.

Active Data Guard is a paid add-on solution to Oracle’s free Data Guard disaster recovery software available only for Oracle Database Enterprise Edition (highest cost license). It delivers read-only access, while continuously applying changes sent from the primary database. As an active standby database, it helps offload read queries, reporting and incremental backups from the primary database. The product’s architecture is designed to allow standby databases to be isolated from failures that may occur at the primary database.

An exciting feature of Oracle Database 12c, and something an Oracle DBA would miss, is data corruption validation. Oracle Data Guard corruption checks are performed to ensure that data is in exact alignment before it is copied to a standby database. This mechanism can also be used to restore data blocks on the primary directly from the standby database.

MariaDB offers various replication methods and replication features like:

  • synchronous,
  • asynchronous,
  • semi-synchronous

The feature set for MariaDB replication is rich. With synchronous replication, you can set up failover with no loss of written transactions. To reduce asynchronous replication delays, you may wish to go with in-order parallelized replication on the slaves. The events that can be compressed are those that are normally of significant size: query events (for DDL and DML in statement-based replication) and row events (for DML in row-based replication). Similar to other compression options, MariaDB compressed replication is transparent. As mentioned before, the whole process is very easy compared to Oracle Data Guard physical and logical replication.

PL/SQL and database code

Now we come to the tough part: PL/SQL.

While MariaDB reigns supreme in replication and HA, Oracle is the king of PL/SQL, no doubt there.

PL/SQL is the main obstacle to migration into the open source world in many organizations. But MariaDB does not give up here.

MariaDB 10.3 (also known as MariaDB TX 3.0) has added some amazing new features including SEQUENCE constructs, Oracle-style packages, and the ROW data type – making migrations much easier.

With the new parameter SQL_MODE=ORACLE, MariaDB is now able to parse, depending on the case, a good part of legacy Oracle PL/SQL without rewriting the code.
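
As a rough illustration only (the procedure, variable and table names are hypothetical, and the exact syntax accepted depends on the MariaDB 10.3+ version in use), an Oracle-style procedure body can be created after switching the mode:

SET SQL_MODE = ORACLE;

DELIMITER //
CREATE OR REPLACE PROCEDURE log_visit AS
  v_msg VARCHAR2(100) := 'visited';   -- Oracle-style declaration block
BEGIN
  INSERT INTO visit_log (msg) VALUES (v_msg);
END;
//
DELIMITER ;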

As we can find on their customer story page using the core Oracle PL/SQL compatibility in MariaDB TX 3.0, the Development Bank of Singapore (DBS) has been able to migrate more than half of their business-critical applications in just 12 months from Oracle Database to MariaDB.

The new compatibility mode helps with the following syntax:

  • Loop Syntax
  • Variable Declaration
  • Non-ANSI Stored Procedure Construct
  • Cursor Syntax
  • Stored Procedure Parameters
  • Data Type inheritance (%TYPE, %ROWTYPE)
  • PL/SQL style Exceptions
  • Synonyms for Basic SQL Types (VARCHAR2, NUMBER, …)

But if we take a look at the older version 10.2, some of the compatibility between Oracle and MariaDB appeared before like:

  • Common table expressions
  • Recursive SQL queries
  • Window functions: NTILE, RANK, DENSE_RANK

Native PL/SQL parsing or, in some cases, direct execution of native Oracle procedures can greatly reduce the cost of development.

Another very useful feature added in MariaDB 10.3 is sequences. The implementation of sequences in MariaDB Server 10.3 follows the SQL:2003 standard and includes syntax compatibility with Oracle.

To create a sequence, a create statement is used:

CREATE SEQUENCE Sequence_1 
  START WITH 1  
  INCREMENT BY 1;

Once created, a sequence can be used, for example, with inserts like:

INSERT INTO database (database_id, database_name) VALUES(Sequence_1.NEXTVAL, 'MariaDB');
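
You can also fetch the next value directly; a quick check, assuming the sequence created above, could be:

SELECT NEXTVAL(Sequence_1);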

Clustering and scaling

MariaDB Cluster is a virtually synchronous, active-active, multi-master database cluster.

MariaDB Cluster differs from what is known as Oracle’s MySQL Cluster - NDB.

MariaDB cluster is based on the multi-master replication plugin provided by Codership (Galera). Since version 5.5, the Galera technology (wsrep API) is an integral part of MariaDB. The Galera plugin architecture stands on three core layers: certification, replication, and group communication framework.

The certification layer prepares the write-sets and does the certification checks on them, guaranteeing that they can be applied.

The Replication layer manages the replication protocol and provides the total ordering capability.

The group communication framework implements a plugin architecture which allows other systems to connect via the gcomm backend schema.

The main difference from Oracle RAC is that each node has its own separate copy of the data. Oracle RAC is commonly mistaken for a complete HA solution, while its disks are usually located in the same disk array. MariaDB not only offers redundant storage, but it also supports geo-located clustering without the need for dedicated fiber.

Backup and Recovery

Oracle offers many backup mechanisms, including hot backup, cold backup, import, export, and many others.

Contrary to MySQL, MariaDB offers an external tool for hot backups called mariabackup. It is a fork of Percona XtraBackup designed to work with encrypted and compressed tables and is the recommended backup method for MariaDB databases.

MariaDB Server 10.1 introduced MariaDB Compression and Data-at-Rest Encryption, but the existing backup solutions did not support full backup capability for these features. So MariaDB decided to extend XtraBackup (version 2.3.8) and named this solution Mariabackup.

Percona and Mariabackup offer similar functionalities, but if you are interested in differences, you can find them here.

What MariaDB does not offer is the recovery catalog of your database backups. Fortunately, this can be extended with third-party systems like ClusterControl.

Cloud compatibility

Cloud infrastructures are getting increasingly popular these days. Although a cloud VM may not be as reliable as an enterprise-grade server, the main cloud providers offer a variety of tools to increase service availability. You can choose between EC2 architecture or DBaaS like Amazon RDS.

Amazon RDS supports MariaDB Server 10.3. It does not support SQL_MODE=ORACLE, but you can still find a set of features making it easier to migrate. The Amazon cloud supports common management tasks like monitoring, backups, multi-AZ deployments, etc.

Another popular cloud provider, Google Cloud, also offers the most recent MariaDB version. You can deploy it as a container or as a Bitnami-certified VM image.

Azure also offers its own implementation of MariaDB. It's similar to Amazon RDS, with backups, scaling and built-in high availability. The guaranteed SLA is 99.99%, which corresponds to about 4 minutes and 23 seconds of downtime per month.

Miscellaneous Considerations

As mentioned at the very beginning of this article, Oracle to MariaDB migration is a multi-stage process. A piece of general advice is not to try to migrate all of the databases at once. Dividing the migration into small batches is, in most scenarios, the best approach.

If you are not familiar with the technology, give it a try. You should feel confident with the platform and know its pros and cons. Testing will build confidence, and it will affect your decisions with regard to the migration.

There are interesting tools which can help you with the most difficult part, the PL/SQL migration process. Some interesting ones are DBConvert and the AWS Schema Conversion Tool (see the AWS documentation).

Over the years MariaDB has gained Enterprise support and maturity to run critical and complex data transaction systems. With the recent version, MariaDB has added some great new features such as SQL_Mode=Oracle compatibility, making the transition process easier than ever before.

Finally, you can join me on March 12th for a webinar during which I’ll walk you through all you need to know when it comes to migrating from Oracle database to MariaDB.

The Current State of Open Source Backup Management for PostgreSQL


There are many ways to address taking backups of a PostgreSQL cluster. There are several articles and blogs which present the various technologies by which we can save our precious data in PostgreSQL. There are logical backup solutions, physical backup at the OS level, at the filesystem level, and so forth. Here in this blog we are not going to cover the theoretical part, which is adequately covered by various blogs and articles as well as the official documentation.

This blog focuses on the state of the various tools and solutions available and is an effort to present a thorough comparison based on real-life experiences. This article in no way tries to promote any specific product; I really like all the tools, solutions and technologies described in this blog. The aim here is to note down their strengths and their weaknesses, and to guide the end user as to which tool would best fit his/her environment, infrastructure and specific requirements. Here is a nice article describing backup tools for PostgreSQL at various levels.

I will not describe how to use the various tools in this blog, since this info is documented in the above blog and also in the official docs as well as other resources over the net. But I will describe the pros and cons as I experienced them in practice. In this blog, we are dealing exclusively with classic PITR-based physical PostgreSQL backups dependent on:

  • pg_basebackup or pg_start_backup()/pg_stop_backup
  • physical copy
  • archiving of WALs or streaming replication

There are several fine products and solutions, some are open source and free to use while others are commercial. To the best of my knowledge, those are:

  • pgbarman by 2ndquadrant (free)
  • pgbackrest (free)
  • pg_probackup by Postgres Professional (free)
  • BART by EDB (commercial)

I did not have the chance to try out BART since it runs on flavors of Linux that I don’t use. In this blog, I will include my own thoughts and impressions from interacting with the respective authors/community/maintainers of each solution, since this is a very important aspect which usually goes underestimated in the beginning. A little bit of terminology in order to better understand the various terms in each of the tools:

Terminology \ Tool            | barman  | pgbackrest | pg_probackup
name for backup site location | catalog | repository | catalog
name for cluster              | server  | stanza     | instance

pgbarman

Pgbarman or just barman is the oldest of those tools. The latest release is 2.6 (released while I had this blog in the works! which is great news).

Pgbarman supports base backup via two methods:

  • pg_basebackup (backup_method=postgres)
  • rsync (backup_method=rsync)

and WAL transfer via:

  • WAL archiving
    • via rsync
    • via barman-wal-archive / put-wal
  • WAL via streaming replication with replication slot
    • Asynchronous
    • Synchronous

This gives us 8 out of the box combinations by which we can use barman. Each has its pros and cons.

Base backup via pg_basebackup (backup_method = postgres)

Pros:

  • the newest/modern way
  • relies on proven core PostgreSQL technology
  • recommended by the official docs

Cons:

  • no incremental backup
  • no parallel backup
  • no network compression
  • no data deduplication
  • no network bandwidth limit

Base backup via rsync (backup_method = rsync)

Pros:

  • old and proven
  • Incremental backup
  • data deduplication
  • network compression
  • parallel backup
  • network bandwidth limit

Cons:

  • not the recommended (by the authors) way

WAL transfer via WAL archiving (via rsync)

Pros:

  • simpler to setup

Cons:

  • No RPO=0 (zero data loss)
  • no way to recover from long and persisting network failures

WAL transfer via WAL archiving (via barman-wal-archive / put-wal)

Pros:

  • the latest and recommended way (introduced in 2.6)
  • more reliable/safe than rsync

Cons:

  • No RPO=0 (zero data loss)
  • still no way to recover from long and persisting network failures

WAL transfer via WAL streaming with replication slot (via pg_receivewal)

Pros:

  • more modern (and recommended)
  • RPO=0 (zero data loss) in synchronous mode

Cons:

  • always associated with replication slot. Could grow in case of network failures

So, while pg_basebackup (postgres method) seems like the future for pgbarman, in reality, all the fancy features come with the rsync method. So let us list all the features of Barman in more detail:

  • Remote operation (backups/restores)
  • Incremental backups. One of the great features of barman, incremental backups are based on the file-level comparison of the database files against those of the last backup in the catalog. In barman the term “differential” refers to a different concept: by barman terminology, a differential backup is the last backup + the individual changes from the last backup. Barman docs say that they provide differential backups via WALs. Barman incremental backups work on the file level, which means if a file is changed the whole file is transferred. This is like pgbackrest and unlike some other offerings like pg_probackup or BART which support block-level differential/incremental backups. Barman incremental backups are specified via: reuse_backup = link or copy. By defining “copy” we achieve a reduced backup time, since only the changed files are transferred and backed up, but still no reduction in space, since the unchanged files are copied from the previous backup. By defining “link”, the unchanged files are hard linked (not copied) from the last backup. This way we achieve both time reduction and space reduction. I don’t want in any way to bring more confusion into this, but in reality, barman incremental backups are directly comparable with pgbackrest incremental backups, since barman treats (via link or copy) an incremental backup effectively as a full backup. So in both systems, an incremental backup deals with the files which were changed since the last backup. However, regarding differential backups, it means a different thing in each of the aforementioned systems, as we’ll see below.
  • Backup from standby. Barman gives the option to perform the bulk of the base backup operations from a standby, thus freeing the primary from the added IO load. However, note that the WALs must still come from the primary. It doesn’t matter if you use archive_command or WAL streaming via replication slots; you can’t yet (as of this writing, with barman at version 2.6) offload this task to the standby.
  • parallel jobs for backup and recover
  • A rich and comprehensive set of retention settings based on either:
    • Redundancy (number of backups to keep)
    • Recovery window (how back in the past should the backups be kept)
    In my opinion from a user perspective, the above is great. The user may define reuse_backup = link and a recovery window and let barman (its cron job) deal with the rest. No diff/incr/full etc backups to worry about or schedule or manage. The system (barman) just does the right thing transparently.
  • Programming your own pre/post event hook scripts.
  • Tablespace remapping

Those are the best strengths of barman. And truly this is almost more than the average DBA would ask from a backup and recovery tool. However, there are some points that could be better:

  • The mailing list is not so active and the maintainers rarely write or answer questions
  • No feature to resume a failed/interrupted backup
  • Replication slots or the use of rsync/barman-wal-archive for archiving are not forgiving in case of network failures or other failures of the backup site. In either case, if the network outage is long enough and the changes in the DB amount to a lot of WAL files, then the primary will suffer from “no space left on device” and will eventually crash (not a good thing). What is promising here is that barman now provides an alternative (to rsync) way to transfer WALs, so that additional protection against e.g. pg_wal space exhaustion might be implemented in the future, which along with backup resume would truly make barman perfect, at least for me.

pgbackrest

Pgbackrest is the current trend among the open source backup tools, mainly because of its efficiency in coping with very large volumes of data and the extreme care its creators put into validation of backups via checksums. As of this writing it is at version v2.09, and the docs are to be found here. The User Guide might be slightly outdated, but the rest of the docs are very up to date and accurate. Pgbackrest relies on WAL archiving using its own archive_command and its own file transfer mechanism, which is better and safer than rsync. So pgbackrest is pretty straightforward, since it does not give the large set of choices that barman provides. Since there is no synchronous mode involved, naturally pgbackrest does not guarantee RPO=0 (zero data loss). Let us describe pgbackrest’s concepts:

  • A backup can be:
    • Full. A full backup copies the entire database cluster.
    • Differential (diff). A differential backup copies only the files that were changed since the last full backup. For a successful restore, both the differential backup and the previous full backup must be valid.
    • Incremental (incr). An incremental backup copies only the files that were changed since the last backup (which may be a full backup, a differential or even an incremental one). Similarly to the differential backup, in order to do a successful restore, all previous required backups (including this backup, the latest diff and the previous full) must be valid.
  • A stanza is the definition of all required parameters of a PostgreSQL cluster. A PostgreSQL server normally has its own stanza, whereas backup servers will have one stanza for every PostgreSQL cluster that they back up.
  • A configuration is where information about stanzas is kept (usually /etc/pgbackrest.conf)
  • A repository is where pgbackrest keeps WALs and backups

The user is encouraged to follow the documentation as the documentation itself suggests, from the top to the bottom. The most important features of pgbackrest are:

  • Parallel backup and restore
  • No direct SQL access to the PostgreSQL server needed
  • Local/Remote operation
  • Retention based on:
    • full backup retention (numbers of full backups to keep)
    • diff backup retention (numbers of diff backups to keep)
    Incremental backups don’t have their own retention and are expired as soon as a prior backup expires. So the user can define a schedule for taking full backups and a rolling set of diff backups between them.
  • Backup from standby. Some files still need to come from the primary but the bulk copy takes place on the standby. Still WALs must originate from the primary.
  • Backup integrity. The people behind pgbackrest are extremely careful when it comes to the integrity of the backups. Each file is checksummed at backup time and also is checked after the restore to make sure that no problematic hardware or software bug may result in a faulty restore. Also if page level checksums are enabled on the PostgreSQL cluster then they are also computed for each file. In addition, checksums are computed for every WAL file.
  • If compression is disabled and hard links are enabled it is possible to bring up the cluster directly from the catalog. This is extremely important for multi TB large databases.
  • Resume of a failed/interrupted backup. Very handy in case of unreliable networks.
  • Delta restore: Ultra fast restore for large databases, without cleaning the whole cluster.
  • Asynchronous & Parallel WAL push to the backup server. This is one of the strongest points of pgbackrest. The PostgreSQL archiver only copies to the spool via archive-push and the heavy job of the transfer and the processing happens in a separate pgbackrest process. This allows for massive WAL transfers while ensuring low PostgreSQL response times.
  • Tablespace remapping
  • Amazon S3 support
  • Support for a maximum WAL queue size. When the backup site is down or the network is failing, using this option will make it look as if archiving was successful, allowing PostgreSQL to delete WALs and preventing pg_wal from filling up, thus saving the pgsql server from a potential PANIC.

So, feature-wise, pgbackrest puts a lot of emphasis on data validation and performance; it is no surprise that it is used by the biggest and busiest PostgreSQL installations. However, there is one thing that could be improved:

  • It would be really handy to have a more “liberal” option as far as retention is concerned, i.e. provide a way to declaratively specify some retention period and then let pgbackrest deal with full/diff/incr backups as needed.

pg_probackup

Pg_probackup is another promising tool for backups. It is originally based on pg_arman. Its emphasis is on the performance of the backup. It is based on catalogs and instances, very similar to the rest of the tools. Its main features include:

  • Rich backup-level support ranging from:
    • Full backups
    • Incremental of three types:
      • PAGE backup. Page-level changes are found via WAL scanning. Requires full access to the uninterrupted WAL sequence since the previous backup.
      • DELTA backup. Only changed pages are copied to the backup. Independent from WAL archiving, puts certain load on the server.
      • PTRACK backup. Requires special pgsql core patching. Works by maintaining a bitmap on the fly as soon as pages are modified. Really fast backup with minimal load on the server.
  • Backups can also be divided into:
    • Autonomous backups. Those have no requirements on WAL outside the backup. No PITR.
    • Archive backups. Those rely on continuous archiving, and support PITR.
  • multithreaded model (in contrast to barman, pgbackrest and of course PostgreSQL itself which follow a multiprocess model)
  • Data consistency and on demand validation without restore
  • Backup from a standby without access to the primary.
  • An expressive retention policy specification where redundancy can be used in an AND fashion along with window. Merging backups (via merge) is supported by converting previous incremental backups to full as a way to free space and to provide a method for smooth backup rotation.

So, pg_probackup provides a set of great features with emphasis on performance something which would benefit large installations. However, there are still some things missing, namely:

  • No official release supports remote backups. This means that pg_probackup must run on the same host as the pgsql cluster. (There is a dev branch which deals with backup from a remote site as well as archiving to a remote backup server)
  • No support for a failed backup resume.

We can summarize the above paragraphs with a feature matrix like the one below.

Feature \ Tool                                                    | pgbarman                  | pgbackrest                  | pg_probackup
Zero data loss                                                    | YES                       | NO                          | NO
Remote operation                                                  | YES                       | YES                         | NO
File copy                                                         | pg_basebackup or rsync    | pgbackrest over ssh         | pg_probackup
WAL via archiving                                                 | YES                       | YES                         | YES
WAL archiving method                                              | rsync, barman-wal-archive | pgbackrest archive-push     | pg_probackup archive-push
Async WAL archiving                                               | NO                        | YES                         | NO
WAL via streaming                                                 | YES                       | NO                          | YES (only for autonomous)
Synchronous streaming                                             | YES                       | NO                          | NO
Backup from standby                                               | YES                       | YES                         | YES
WALs from standby                                                 | NO                        | NO                          | YES
Backup exclusively from standby                                   | NO                        | NO                          | YES
Diff backups (from last full)                                     | YES                       | YES                         | YES (via merge)
Incr backups (from last backup)                                   | YES (same as above)       | YES                         | YES
Transparent backups rotation                                      | YES                       | NO                          | YES
Files-based comparison                                            | YES                       | YES                         | NO
Block-level changes                                               | NO                        | NO                          | YES
Parallel backup/restore                                           | YES                       | YES                         | YES (via threads)
Resume failed backup                                              | NO                        | YES                         | NO
Resilience during network/recovery site failure (pg_wal related)  | NO                        | YES                         | NO
Tablespace remapping                                              | YES                       | YES                         | YES
Retention based on                                                | Redundancy OR Window      | Full and/or diff redundancy | Redundancy AND Window
Help from community/maintainers/authors                           | Poor                      | Excellent                   | Very Good

ClusterControl

ClusterControl provides the functionality either to generate an immediate backup or to schedule one, and automate the task in a simple and fast way.

We can choose between two backup methods, pg_dump (logical) and pg_basebackup (binary). We can also specify where to store the backups (on the database server, on the ClusterControl server or in the cloud), the compression level and encryption.

Also, with ClusterControl we can use the Point-in-Time Recovery feature and backup verification to validate the generated backup.


MongoDB vs MySQL NoSQL - Why Mongo is Better


There are many database management systems (DBMS) to choose from, ranging from relational to non-relational ones. In past years, relational DBMS were more dominant, but with recent data structure trends the non-relational DBMS are becoming more popular. The choices for relational DBMS are quite obvious: MySQL, PostgreSQL and MS SQL. On the other hand, MongoDB, a non-relational DBMS, has risen basically due to its ability to handle a large set of data. Every selection has its pros and cons, and your choice will mainly be determined by your application needs, since the two serve different niches. However, in this article, we are going to discuss the pros of using MongoDB over MySQL.

Pros of Using MongoDB Over MySQL

  1. Speed and performance
  2. High Availability and Cloud Computing
  3. Schema Flexibility
  4. Need to grow bigger
  5. Embedding feature
  6. Security Model
  7. Location-based data
  8. Rich query language support

Speed and Performance

This is one of the major benefits of using MongoDB over MySQL, especially when a large set of unstructured data is involved. MongoDB by default favors a high insert rate over transaction safety. This feature is not available in MySQL; hence, for instance, if you have to save a lot of data to your DBMS at once, in the case of MySQL you will have to do it one record at a time, whereas in the case of MongoDB, with the availability of the insertMany() function, you can safely do multiple inserts in one call. Observing some of the querying behaviours of the two, we can summarize the different operation requests for 1 million documents in the illustration below.

In the case of updating which is a write operation, MongoDB takes 0.002 seconds to update all student emails whereas MySQL takes 0.2491s to execute the same task.

From the illustration, we can conclude that MongoDB takes far less time than MySQL for the same operations. MongoDB is structured such that documents are the basis of storage, which promotes huge query and data storage capacity. This implies that the performance is dependent on two key values: the design and the scale-out. On the other hand, MySQL stores data in individual tables, hence at some point one has to look up the entire table before doing a write operation.

High Availability and Cloud Computing

For unstable environments, MongoDB provides a better handling technique than MySQL. This is because it takes very little time for the active secondary nodes to elect a new primary node, which makes administration easy at the point of failure. Besides, due to comprehensive secondary indexes and native replication, creating a backup for a MongoDB database is quite easy compared to MySQL, thanks to its integrated replication support.

In a nutshell, setting up a set of servers that can act as master and slaves is easier and faster in MongoDB than in MySQL. Besides, recovery from a cluster failure is instant, automatic and safe. For MySQL, there is no clear official solution providing failover between master and slave in the event of a failure.

Cloud-based storage solutions require data to be smoothly spread across various servers to scale up. MongoDB can load a high volume of data as compared to MySQL, and with built-in sharding, it is easy to partition and spread data across multiple servers as a way of utilizing the cost-saving merits of cloud-based storage.

Schema Flexibility

MongoDB is schemaless, such that different documents in the same collection may have the same or different fields from each other. This means there is no restriction on document structure for every insert or update, hence changes to the data model won’t have much impact. Of course, there are scenarios in which one may opt for an undefined schema, for example if you are de-normalizing a database schema or when your database is growing but your schema is unstable. MongoDB therefore allows one to add various types of data as needs change.

On the other hand, MySQL is table-oriented, whereby each row must have the same columns as the other rows. Adding a new column requires one to run an ALTER operation, which is quite expensive in terms of performance as it may lock up the entire table. This is especially the case when the table grows over 10GB; MongoDB does not have this issue.
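
For illustration, the kind of schema change that is trivial in MongoDB would require a statement like the following in MySQL (the table and column names are hypothetical), and on a very large table it can take a long time:

ALTER TABLE users ADD COLUMN middle_name VARCHAR(50) NULL;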

With a flexible schema it is easy to develop and maintain a cleaner code. Besides, MongoDB provides the option of using a JSON validator in case you want to ensure some data integrity and consistency for your collection hence you can do some validation before insert or update of a document.

The Need to Grow Bigger

Scaling databases is not an easy undertaking; with MySQL especially, it may result in degraded performance when the 5-10GB per table size is surpassed. With MongoDB, this is not an issue, since one can partition and shard the database with the built-in sharding feature. Once a shard key is specified and sharding is enabled, data is partitioned evenly according to the shard key. If a new shard is added, there is automatic rebalancing. Sharding basically allows horizontal scaling, which is difficult to implement in MySQL. Besides, MongoDB has built-in replication, whereby replica sets create multiple copies of the data. Each member of this set has a role, either as primary or secondary, at any point in the process.

Reads and writes are done on the primary and then replicated to the secondaries. With this merit in place, in case of data inconsistency or instance failure, a new member may be voted in to act as primary.

Embedding Feature

Unlike MySQL, where you cannot embed data in a field, MongoDB offers a better embedding technique for related data. As much as you can do a JOIN for tables in MySQL, you may end up having many tables, some of them unnecessary, especially if they don’t involve many fields. In the case of MongoDB, you can decide to embed data into a field for related data, or reference it from another collection if you expect the document to grow in the future beyond the JSON document size limit.

For example, if we have data for users for whom we want to capture their addresses and some other information, in the case of MongoDB we can easily have a simple structure like:

{
    id:1,
    name:'George Bush',
    gender: 'Male',
    age:45,
    address:{
        City: 'New York',
        Street: 'Florida',
        Zip_code: 1342243
    }
}

But in the case of MySQL we would have to create two tables, with an id reference between them, i.e.:

Users details table

id | name        | gender | age
1  | George Bush | Male   | 45

User address table

id | City     | Street  | Zip_code
1  | New York | Florida | 1342243

In MySQL you will end up with many tables, which could be hectic to deal with, especially when scaling is involved. As much as one can also do a table join in a single query when fetching this data in MySQL, the latency is quite a bit larger compared to MongoDB, and this is one of the reasons the performance of MongoDB outdoes that of MySQL.
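
A minimal sketch of the MySQL query that reassembles the embedded document from the two tables above (the table names users and user_addresses are illustrative only, with the column names shown in the tables):

SELECT u.id, u.name, u.gender, u.age,
       a.City, a.Street, a.Zip_code
FROM users u
JOIN user_addresses a ON a.id = u.id
WHERE u.id = 1;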


Security Model

Database administration (DBA) is quite essential in MySQL, but not as necessary in the case of MongoDB. This means you need a DBA to modify the schema in the case of MySQL when an application changes. On the other hand, one can do schema modifications without a DBA in MongoDB, since it is great for class persistence and a class can equally be serialized to JSON and stored. However, this is only a good practice if you don’t expect the data to grow big; otherwise you will need to follow some best practices to avoid pitfalls.

Location Based Data

In order to improve throughput, especially for read operations, MongoDB provides built-in special functions that enhance finding relevant data from specific locations accurately, hence speeding up the process. This is not possible in the case of MySQL.

Rich Query Language Support

Out of personal interest as a MongoDB enthusiast, I was attracted by the flexibility of MongoDB's querying features. With the aggregation framework in the later versions and the MapReduce feature, one can optimize the result data to suit one's own specifications. As much as MySQL also offers operations such as grouping, sorting and many more, MongoDB is quite extensive, especially with embedded data structures. Further, as mentioned earlier, queries are returned with lower latency in the aggregation framework than when a JOIN has to be done in the case of MySQL. For instance, MongoDB offers an easy way of modifying a schema using the $set and $unset operations for the embedded schema. But, in the case of MySQL, one has to run the ALTER command on the table within which the field exists, and this is quite expensive in terms of performance.

Conclusion

Regarding the merits discussed above, as much as database selection absolutely depends on application design, MongoDB offers a lot of flexibility along different lines. If you are looking for something that will cater for better performance, deal with complex data without restrictions on schema design, accommodate future expectations of database growth, and offer a rich query language technique, I would recommend you go for MongoDB.

A Review of the New Analytic Window Functions in MySQL 8.0


Data is captured and stored for a variety of reasons. Hours beyond count (and even more budget) are invested in collecting, ingesting, structuring, validating, and ultimately storing data; to say that it is a valuable asset is to drive home a moot point. In this day and age it may, in fact, be our most precious commodity.

Some data is used strictly as an archive. Perhaps to record or track events that happened in the past. But the other side of that coin is that historical data has value in basing decisions for the future and future endeavors.

  • What day to have our sale on? (Planning for future sales based on how we did in the past.)
  • Which salesperson performed the best in quarter one? (Looking back, who can we reward for their efforts.)
  • Which restaurant is frequented the most in the middle of July? (The travel season is upon us... Who can we sell our foodstuffs and goods to?)

You get the picture. Using data on hand is integral for any organization.

Many companies build, base, and provide services with data. They depend on it.

Several months back, depending on when you are reading this, I began walking for exercise, in earnest, to lose weight, get a handle on my health, and to seek a daily bit of solitude from this busy world we live in.

I used a mobile pedometer app to track my hikes, even considering which shoes I wore, as I have a tendency to be ultra-picky when it comes to footwear.

While this data is not nearly as important as that mentioned in those scenarios above, for me, a key element in learning anything, is using something I am interested in, can relate to, and understand.

Window Functions have been on my radar to explore for a long while now. So, I thought to try my hand at a couple of them in this post. Since they are now supported in MySQL 8 (visit this Severalnines blog I wrote about MySQL 8 upgrades and new additions, where I mention them briefly), that ecosystem is the one I will use here. Be forewarned, I am not a window analytical function guru.

What is a MySQL Window Function?

The MySQL documentation defines them as such: "A window function performs an aggregate-like operation on a set of query rows. However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each query row."
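
To make that definition concrete, here is a minimal, hypothetical sketch (a made-up sales table, not part of this post's data set) contrasting an aggregate with a window function:

-- Aggregate: query rows are collapsed into one result row per salesperson
SELECT salesperson, SUM(amount) AS total_sales
FROM sales
GROUP BY salesperson;

-- Window function: every detail row is kept, with the per-salesperson total alongside it
SELECT salesperson, sale_date, amount,
       SUM(amount) OVER (PARTITION BY salesperson) AS total_sales
FROM sales;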

Data Set and Setup for This Post

I store the captured data from my walks in this table:

mysql> DESC hiking_stats;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| day_walked      | date         | YES  |     | NULL    |       |
| burned_calories | decimal(4,1) | YES  |     | NULL    |       |
| distance_walked | decimal(4,2) | YES  |     | NULL    |       |
| time_walking    | time         | YES  |     | NULL    |       |
| pace            | decimal(2,1) | YES  |     | NULL    |       |
| shoes_worn      | text         | YES  |     | NULL    |       |
| trail_hiked     | text         | YES  |     | NULL    |       |
+-----------------+--------------+------+-----+---------+-------+
7 rows in set (0.01 sec)

There is close to 90 days worth of data here:

mysql> SELECT COUNT(*) FROM hiking_stats;
+----------+
| COUNT(*) |
+----------+
|       84 |
+----------+
1 row in set (0.00 sec)

I'll admit, I am finicky about my footwear so let's determine which pair of shoes I favored most:

mysql> SELECT DISTINCT shoes_worn, COUNT(*)
    -> FROM hiking_stats
    -> GROUP BY shoes_worn;
+---------------------------------------+----------+
| shoes_worn                            | COUNT(*) |
+---------------------------------------+----------+
| New Balance Trail Runners-All Terrain |       30 |
| Oboz Sawtooth Low                     |       47 |
| Keen Koven WP(keen-dry)               |        6 |
| New Balance 510v2                     |        1 |
+---------------------------------------+----------+
4 rows in set (0.00 sec)

In order to provide a better, manageable on-screen demonstration, I will limit the remaining portion of query results to just those of the favorite shoes I wore 47 times.

I also have a trail_hiked column and since I was in 'ultra exercise mode' during this almost 3 month period, I even counted calories while push mowing the yard:

mysql> SELECT DISTINCT trail_hiked, COUNT(*)
    -> FROM hiking_stats
    -> GROUP BY trail_hiked;
+------------------------+----------+
| trail_hiked            | COUNT(*) |
+------------------------+----------+
| Yard Mowing            |       14 |
| Sandy Trail-Drive      |       20 |
| West Boundary          |       29 |
| House-Power Line Route |       10 |
| Tree Trail-extended    |       11 |
+------------------------+----------+
5 rows in set (0.01 sec)

Yet, to even further limit the data set, I will filter out those rows as well:

mysql> SELECT COUNT(*)
    -> FROM hiking_stats
    -> WHERE shoes_worn = 'Oboz Sawtooth Low'
    -> AND
    -> trail_hiked <> 'Yard Mowing';
+----------+
| COUNT(*) |
+----------+
|       40 |
+----------+
1 row in set (0.01 sec)

For the sake of simplicity and ease of use, I will create a VIEW of columns to work with:

mysql> CREATE VIEW vw_fav_shoe_stats AS
    -> (SELECT day_walked, burned_calories, distance_walked, time_walking, pace, trail_hiked
    -> FROM hiking_stats
    -> WHERE shoes_worn = 'Oboz Sawtooth Low'
    -> AND trail_hiked <> 'Yard Mowing');
Query OK, 0 rows affected (0.19 sec)

Leaving me with this set of data:

mysql> SELECT * FROM vw_fav_shoe_stats;
+------------+-----------------+-----------------+--------------+------+------------------------+
| day_walked | burned_calories | distance_walked | time_walking | pace | trail_hiked            |
+------------+-----------------+-----------------+--------------+------+------------------------+
| 2018-06-03 |           389.6 |            4.11 | 01:13:19     |  3.4 | Sandy Trail-Drive      |
| 2018-06-04 |           394.6 |            4.26 | 01:14:15     |  3.4 | Sandy Trail-Drive      |
| 2018-06-06 |           384.6 |            4.10 | 01:13:14     |  3.4 | Sandy Trail-Drive      |
| 2018-06-07 |           382.7 |            4.12 | 01:12:52     |  3.4 | Sandy Trail-Drive      |
| 2018-06-17 |           296.3 |            2.82 | 00:55:45     |  3.0 | West Boundary          |
| 2018-06-18 |           314.7 |            3.08 | 00:59:13     |  3.1 | West Boundary          |
| 2018-06-20 |           338.5 |            3.27 | 01:03:42     |  3.1 | West Boundary          |
| 2018-06-21 |           339.5 |            3.40 | 01:03:54     |  3.2 | West Boundary          |
| 2018-06-24 |           392.4 |            3.76 | 01:13:51     |  3.1 | House-Power Line Route |
| 2018-06-25 |           362.1 |            3.72 | 01:08:09     |  3.3 | West Boundary          |
| 2018-06-26 |           380.5 |            3.94 | 01:11:36     |  3.3 | West Boundary          |
| 2018-07-03 |           323.7 |            3.29 | 01:00:55     |  3.2 | West Boundary          |
| 2018-07-04 |           342.8 |            3.47 | 01:04:31     |  3.2 | West Boundary          |
| 2018-07-06 |           375.7 |            3.80 | 01:10:42     |  3.2 | West Boundary          |
| 2018-07-07 |           347.6 |            3.40 | 01:05:25     |  3.1 | Sandy Trail-Drive      |
| 2018-07-08 |           351.6 |            3.58 | 01:06:09     |  3.2 | West Boundary          |
| 2018-07-09 |           336.0 |            3.28 | 01:03:13     |  3.1 | West Boundary          |
| 2018-07-11 |           375.2 |            3.81 | 01:10:37     |  3.2 | West Boundary          |
| 2018-07-12 |           325.9 |            3.28 | 01:01:20     |  3.2 | West Boundary          |
| 2018-07-15 |           382.9 |            3.91 | 01:12:03     |  3.3 | House-Power Line Route |
| 2018-07-16 |           368.6 |            3.72 | 01:09:22     |  3.2 | West Boundary          |
| 2018-07-17 |           339.4 |            3.46 | 01:03:52     |  3.3 | West Boundary          |
| 2018-07-18 |           368.1 |            3.72 | 01:08:28     |  3.3 | West Boundary          |
| 2018-07-19 |           339.2 |            3.44 | 01:03:06     |  3.3 | West Boundary          |
| 2018-07-22 |           378.3 |            3.76 | 01:10:22     |  3.2 | West Boundary          |
| 2018-07-23 |           322.9 |            3.28 | 01:00:03     |  3.3 | West Boundary          |
| 2018-07-24 |           386.4 |            3.81 | 01:11:53     |  3.2 | West Boundary          |
| 2018-07-25 |           379.9 |            3.83 | 01:10:39     |  3.3 | West Boundary          |
| 2018-07-27 |           378.3 |            3.73 | 01:10:21     |  3.2 | West Boundary          |
| 2018-07-28 |           337.4 |            3.39 | 01:02:45     |  3.2 | Sandy Trail-Drive      |
| 2018-07-29 |           348.7 |            3.50 | 01:04:52     |  3.2 | West Boundary          |
| 2018-07-30 |           361.6 |            3.69 | 01:07:15     |  3.3 | West Boundary          |
| 2018-07-31 |           359.9 |            3.66 | 01:06:57     |  3.3 | West Boundary          |
| 2018-08-01 |           336.1 |            3.37 | 01:01:48     |  3.3 | West Boundary          |
| 2018-08-03 |           259.9 |            2.57 | 00:47:47     |  3.2 | West Boundary          |
| 2018-08-05 |           341.2 |            3.37 | 01:02:44     |  3.2 | West Boundary          |
| 2018-08-06 |           357.7 |            3.64 | 01:05:46     |  3.3 | West Boundary          |
| 2018-08-17 |           184.2 |            1.89 | 00:39:00     |  2.9 | Tree Trail-extended    |
| 2018-08-18 |           242.9 |            2.53 | 00:51:25     |  3.0 | Tree Trail-extended    |
| 2018-08-30 |           204.4 |            1.95 | 00:37:35     |  3.1 | House-Power Line Route |
+------------+-----------------+-----------------+--------------+------+------------------------+
40 rows in set (0.00 sec)

The first window function I will look at is ROW_NUMBER().

Suppose I want a result set ordered by the burned_calories column for the month of 'July'.

Of course, I can retrieve that data with this query:

mysql> SELECT day_walked, burned_calories, trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July'
    -> ORDER BY burned_calories DESC;
+------------+-----------------+------------------------+
| day_walked | burned_calories | trail_hiked            |
+------------+-----------------+------------------------+
| 2018-07-24 |           386.4 | West Boundary          |
| 2018-07-15 |           382.9 | House-Power Line Route |
| 2018-07-25 |           379.9 | West Boundary          |
| 2018-07-22 |           378.3 | West Boundary          |
| 2018-07-27 |           378.3 | West Boundary          |
| 2018-07-06 |           375.7 | West Boundary          |
| 2018-07-11 |           375.2 | West Boundary          |
| 2018-07-16 |           368.6 | West Boundary          |
| 2018-07-18 |           368.1 | West Boundary          |
| 2018-07-30 |           361.6 | West Boundary          |
| 2018-07-31 |           359.9 | West Boundary          |
| 2018-07-08 |           351.6 | West Boundary          |
| 2018-07-29 |           348.7 | West Boundary          |
| 2018-07-07 |           347.6 | Sandy Trail-Drive      |
| 2018-07-04 |           342.8 | West Boundary          |
| 2018-07-17 |           339.4 | West Boundary          |
| 2018-07-19 |           339.2 | West Boundary          |
| 2018-07-28 |           337.4 | Sandy Trail-Drive      |
| 2018-07-09 |           336.0 | West Boundary          |
| 2018-07-12 |           325.9 | West Boundary          |
| 2018-07-03 |           323.7 | West Boundary          |
| 2018-07-23 |           322.9 | West Boundary          |
+------------+-----------------+------------------------+
22 rows in set (0.01 sec)

Yet, for whatever reason (maybe personal satisfaction), I want to award a ranking among the returned rows beginning with 1 indicative of the highest burned_calories count, all the way to (n) rows in the result set.

ROW_NUMBER(), can handle this no problem at all:

mysql> SELECT day_walked, burned_calories,
    -> ROW_NUMBER() OVER(ORDER BY burned_calories DESC)
    -> AS position, trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July';
+------------+-----------------+----------+------------------------+
| day_walked | burned_calories | position | trail_hiked            |
+------------+-----------------+----------+------------------------+
| 2018-07-24 |           386.4 |        1 | West Boundary          |
| 2018-07-15 |           382.9 |        2 | House-Power Line Route |
| 2018-07-25 |           379.9 |        3 | West Boundary          |
| 2018-07-22 |           378.3 |        4 | West Boundary          |
| 2018-07-27 |           378.3 |        5 | West Boundary          |
| 2018-07-06 |           375.7 |        6 | West Boundary          |
| 2018-07-11 |           375.2 |        7 | West Boundary          |
| 2018-07-16 |           368.6 |        8 | West Boundary          |
| 2018-07-18 |           368.1 |        9 | West Boundary          |
| 2018-07-30 |           361.6 |       10 | West Boundary          |
| 2018-07-31 |           359.9 |       11 | West Boundary          |
| 2018-07-08 |           351.6 |       12 | West Boundary          |
| 2018-07-29 |           348.7 |       13 | West Boundary          |
| 2018-07-07 |           347.6 |       14 | Sandy Trail-Drive      |
| 2018-07-04 |           342.8 |       15 | West Boundary          |
| 2018-07-17 |           339.4 |       16 | West Boundary          |
| 2018-07-19 |           339.2 |       17 | West Boundary          |
| 2018-07-28 |           337.4 |       18 | Sandy Trail-Drive      |
| 2018-07-09 |           336.0 |       19 | West Boundary          |
| 2018-07-12 |           325.9 |       20 | West Boundary          |
| 2018-07-03 |           323.7 |       21 | West Boundary          |
| 2018-07-23 |           322.9 |       22 | West Boundary          |
+------------+-----------------+----------+------------------------+
22 rows in set (0.00 sec)

You can see the row with the burned_calories amount of 386.4 has position 1, while the row with the value 322.9 has position 22, which is the lowest amount among the returned set of rows.

I'll use ROW_NUMBER() for something a bit more interesting as we progress. Only when I learned about it being used in that context did I truly realize some of its real power.

Up next, let's visit the RANK() window function to provide a different sort of 'ranking' among the rows. We will still target the burned_calories column value. And, while RANK() is similar to ROW_NUMBER() in that both rank rows, it does introduce a subtle difference in certain circumstances.

I will further limit the number of rows by keeping only records from the month of 'July' and targeting a specific trail:

mysql> SELECT day_walked, burned_calories,
    -> RANK() OVER(ORDER BY burned_calories DESC) AS position,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July'
    -> AND trail_hiked = 'West Boundary';
+------------+-----------------+----------+---------------+
| day_walked | burned_calories | position | trail_hiked   |
+------------+-----------------+----------+---------------+
| 2018-07-24 |           386.4 |        1 | West Boundary |
| 2018-07-25 |           379.9 |        2 | West Boundary |
| 2018-07-22 |           378.3 |        3 | West Boundary |
| 2018-07-27 |           378.3 |        3 | West Boundary |
| 2018-07-06 |           375.7 |        5 | West Boundary |
| 2018-07-11 |           375.2 |        6 | West Boundary |
| 2018-07-16 |           368.6 |        7 | West Boundary |
| 2018-07-18 |           368.1 |        8 | West Boundary |
| 2018-07-30 |           361.6 |        9 | West Boundary |
| 2018-07-31 |           359.9 |       10 | West Boundary |
| 2018-07-08 |           351.6 |       11 | West Boundary |
| 2018-07-29 |           348.7 |       12 | West Boundary |
| 2018-07-04 |           342.8 |       13 | West Boundary |
| 2018-07-17 |           339.4 |       14 | West Boundary |
| 2018-07-19 |           339.2 |       15 | West Boundary |
| 2018-07-09 |           336.0 |       16 | West Boundary |
| 2018-07-12 |           325.9 |       17 | West Boundary |
| 2018-07-03 |           323.7 |       18 | West Boundary |
| 2018-07-23 |           322.9 |       19 | West Boundary |
+------------+-----------------+----------+---------------+
19 rows in set (0.01 sec)

Notice anything odd here? Different from ROW_NUMBER()?

Check out the position value for those rows of '2018-07-22' and '2018-07-27'. They are in a tie at 3rd.

With good reason since the burned_calorie value of 378.3 is present in both rows.

How would ROW_NUMBER() rank them?

Let's find out:

mysql> SELECT day_walked, burned_calories,
    -> ROW_NUMBER() OVER(ORDER BY burned_calories DESC) AS position,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July'
    -> AND trail_hiked = 'West Boundary';
+------------+-----------------+----------+---------------+
| day_walked | burned_calories | position | trail_hiked   |
+------------+-----------------+----------+---------------+
| 2018-07-24 |           386.4 |        1 | West Boundary |
| 2018-07-25 |           379.9 |        2 | West Boundary |
| 2018-07-22 |           378.3 |        3 | West Boundary |
| 2018-07-27 |           378.3 |        4 | West Boundary |
| 2018-07-06 |           375.7 |        5 | West Boundary |
| 2018-07-11 |           375.2 |        6 | West Boundary |
| 2018-07-16 |           368.6 |        7 | West Boundary |
| 2018-07-18 |           368.1 |        8 | West Boundary |
| 2018-07-30 |           361.6 |        9 | West Boundary |
| 2018-07-31 |           359.9 |       10 | West Boundary |
| 2018-07-08 |           351.6 |       11 | West Boundary |
| 2018-07-29 |           348.7 |       12 | West Boundary |
| 2018-07-04 |           342.8 |       13 | West Boundary |
| 2018-07-17 |           339.4 |       14 | West Boundary |
| 2018-07-19 |           339.2 |       15 | West Boundary |
| 2018-07-09 |           336.0 |       16 | West Boundary |
| 2018-07-12 |           325.9 |       17 | West Boundary |
| 2018-07-03 |           323.7 |       18 | West Boundary |
| 2018-07-23 |           322.9 |       19 | West Boundary |
+------------+-----------------+----------+---------------+
19 rows in set (0.06 sec)

Hmmm...

No ties in the position column numbering this time.

But, who gets precedence?

To my knowledge, for a predictable ordering you will likely have to determine it by some additional means within the query (e.g. the time_walking column in this case?), as in the sketch below.
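
A minimal sketch of that idea, adding time_walking as a (purely illustrative) tiebreaker inside OVER():

SELECT day_walked, burned_calories,
ROW_NUMBER() OVER(ORDER BY burned_calories DESC, time_walking DESC) AS position,
trail_hiked
FROM vw_fav_shoe_stats
WHERE MONTHNAME(day_walked) = 'July'
AND trail_hiked = 'West Boundary';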

But we are not done yet with ranking options. Here is DENSE_RANK():

mysql> SELECT day_walked, burned_calories,
    -> DENSE_RANK() OVER(ORDER BY burned_calories DESC) AS position,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE MONTHNAME(day_walked) = 'July'
    -> AND trail_hiked = 'West Boundary';
+------------+-----------------+----------+---------------+
| day_walked | burned_calories | position | trail_hiked   |
+------------+-----------------+----------+---------------+
| 2018-07-24 |           386.4 |        1 | West Boundary |
| 2018-07-25 |           379.9 |        2 | West Boundary |
| 2018-07-22 |           378.3 |        3 | West Boundary |
| 2018-07-27 |           378.3 |        3 | West Boundary |
| 2018-07-06 |           375.7 |        4 | West Boundary |
| 2018-07-11 |           375.2 |        5 | West Boundary |
| 2018-07-16 |           368.6 |        6 | West Boundary |
| 2018-07-18 |           368.1 |        7 | West Boundary |
| 2018-07-30 |           361.6 |        8 | West Boundary |
| 2018-07-31 |           359.9 |        9 | West Boundary |
| 2018-07-08 |           351.6 |       10 | West Boundary |
| 2018-07-29 |           348.7 |       11 | West Boundary |
| 2018-07-04 |           342.8 |       12 | West Boundary |
| 2018-07-17 |           339.4 |       13 | West Boundary |
| 2018-07-19 |           339.2 |       14 | West Boundary |
| 2018-07-09 |           336.0 |       15 | West Boundary |
| 2018-07-12 |           325.9 |       16 | West Boundary |
| 2018-07-03 |           323.7 |       17 | West Boundary |
| 2018-07-23 |           322.9 |       18 | West Boundary |
+------------+-----------------+----------+---------------+
19 rows in set (0.00 sec)

The tie remains, however, the numbering is different in where rows are counted, continuing through the remaining results.

Where RANK() began the count with 5 after the ties, DENSE_RANK() picks up at the next number, which is 4 in this instance, since the tie happened at row 3.

I'll be the first to admit, these various row ranking patterns are quite interesting, but, how can you use them for a meaningful result set?


A Bonus Thought

I have to give credit where credit is due. I learned so much about window functions from a wonderful series on YouTube and one video, in particular, inspired me for this next example. Please keep in mind although the examples in that series are demonstrated with a non-open-source database system (Don't toss the digital rotten fruits and veggies at me), there is a ton to learn from the videos overall.

I see a pattern in most of the query results so far I want to explore. I will not filter by any month nor trail.

What I want to know, are the consecutive days that I burned more than 350 calories. Better yet, groups of those days.

Here is the base query I will start with and build off from:

mysql> SELECT day_walked, burned_calories, 
    -> ROW_NUMBER() OVER(ORDER BY day_walked ASC) AS positional_bound, 
    -> trail_hiked 
    -> FROM vw_fav_shoe_stats 
    -> WHERE burned_calories > 350;
+------------+-----------------+------------------+------------------------+
| day_walked | burned_calories | positional_bound | trail_hiked            |
+------------+-----------------+------------------+------------------------+
| 2018-06-03 |           389.6 |                1 | Sandy Trail-Drive      |
| 2018-06-04 |           394.6 |                2 | Sandy Trail-Drive      |
| 2018-06-06 |           384.6 |                3 | Sandy Trail-Drive      |
| 2018-06-07 |           382.7 |                4 | Sandy Trail-Drive      |
| 2018-06-24 |           392.4 |                5 | House-Power Line Route |
| 2018-06-25 |           362.1 |                6 | West Boundary          |
| 2018-06-26 |           380.5 |                7 | West Boundary          |
| 2018-07-06 |           375.7 |                8 | West Boundary          |
| 2018-07-08 |           351.6 |                9 | West Boundary          |
| 2018-07-11 |           375.2 |               10 | West Boundary          |
| 2018-07-15 |           382.9 |               11 | House-Power Line Route |
| 2018-07-16 |           368.6 |               12 | West Boundary          |
| 2018-07-18 |           368.1 |               13 | West Boundary          |
| 2018-07-22 |           378.3 |               14 | West Boundary          |
| 2018-07-24 |           386.4 |               15 | West Boundary          |
| 2018-07-25 |           379.9 |               16 | West Boundary          |
| 2018-07-27 |           378.3 |               17 | West Boundary          |
| 2018-07-30 |           361.6 |               18 | West Boundary          |
| 2018-07-31 |           359.9 |               19 | West Boundary          |
| 2018-08-06 |           357.7 |               20 | West Boundary          |
+------------+-----------------+------------------+------------------------+
20 rows in set (0.00 sec)

We've seen ROW_NUMBER() already, however now it really comes into play.

To make this work (in MySQL at least) I had to use the DATE_SUB() function since, essentially, with this technique we are subtracting a number (the value provided by ROW_NUMBER()) from the day_walked date column of the same row, which in turn provides a date itself via the calculation:

mysql> SELECT day_walked AS day_of_walk,
    -> DATE_SUB(day_walked, INTERVAL ROW_NUMBER() OVER(ORDER BY day_walked ASC) DAY) AS positional_bound,
    -> burned_calories,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE burned_calories > 350;
+-------------+------------------+-----------------+------------------------+
| day_of_walk | positional_bound | burned_calories | trail_hiked            |
+-------------+------------------+-----------------+------------------------+
| 2018-06-03  | 2018-06-02       |           389.6 | Sandy Trail-Drive      |
| 2018-06-04  | 2018-06-02       |           394.6 | Sandy Trail-Drive      |
| 2018-06-06  | 2018-06-03       |           384.6 | Sandy Trail-Drive      |
| 2018-06-07  | 2018-06-03       |           382.7 | Sandy Trail-Drive      |
| 2018-06-24  | 2018-06-19       |           392.4 | House-Power Line Route |
| 2018-06-25  | 2018-06-19       |           362.1 | West Boundary          |
| 2018-06-26  | 2018-06-19       |           380.5 | West Boundary          |
| 2018-07-06  | 2018-06-28       |           375.7 | West Boundary          |
| 2018-07-08  | 2018-06-29       |           351.6 | West Boundary          |
| 2018-07-11  | 2018-07-01       |           375.2 | West Boundary          |
| 2018-07-15  | 2018-07-04       |           382.9 | House-Power Line Route |
| 2018-07-16  | 2018-07-04       |           368.6 | West Boundary          |
| 2018-07-18  | 2018-07-05       |           368.1 | West Boundary          |
| 2018-07-22  | 2018-07-08       |           378.3 | West Boundary          |
| 2018-07-24  | 2018-07-09       |           386.4 | West Boundary          |
| 2018-07-25  | 2018-07-09       |           379.9 | West Boundary          |
| 2018-07-27  | 2018-07-10       |           378.3 | West Boundary          |
| 2018-07-30  | 2018-07-12       |           361.6 | West Boundary          |
| 2018-07-31  | 2018-07-12       |           359.9 | West Boundary          |
| 2018-08-06  | 2018-07-17       |           357.7 | West Boundary          |
+-------------+------------------+-----------------+------------------------+
20 rows in set (0.00 sec)

However, without DATE_SUB(), you wind up with this (or at least I did):

mysql> SELECT day_walked AS day_of_walk,
    -> day_walked - ROW_NUMBER() OVER(ORDER BY day_walked ASC) AS positional_bound,
    -> burned_calories,
    -> trail_hiked
    -> FROM vw_fav_shoe_stats
    -> WHERE burned_calories > 350;
+-------------+------------------+-----------------+------------------------+
| day_of_walk | positional_bound | burned_calories | trail_hiked            |
+-------------+------------------+-----------------+------------------------+
| 2018-06-03  |         20180602 |           389.6 | Sandy Trail-Drive      |
| 2018-06-04  |         20180602 |           394.6 | Sandy Trail-Drive      |
| 2018-06-06  |         20180603 |           384.6 | Sandy Trail-Drive      |
| 2018-06-07  |         20180603 |           382.7 | Sandy Trail-Drive      |
| 2018-06-24  |         20180619 |           392.4 | House-Power Line Route |
| 2018-06-25  |         20180619 |           362.1 | West Boundary          |
| 2018-06-26  |         20180619 |           380.5 | West Boundary          |
| 2018-07-06  |         20180698 |           375.7 | West Boundary          |
| 2018-07-08  |         20180699 |           351.6 | West Boundary          |
| 2018-07-11  |         20180701 |           375.2 | West Boundary          |
| 2018-07-15  |         20180704 |           382.9 | House-Power Line Route |
| 2018-07-16  |         20180704 |           368.6 | West Boundary          |
| 2018-07-18  |         20180705 |           368.1 | West Boundary          |
| 2018-07-22  |         20180708 |           378.3 | West Boundary          |
| 2018-07-24  |         20180709 |           386.4 | West Boundary          |
| 2018-07-25  |         20180709 |           379.9 | West Boundary          |
| 2018-07-27  |         20180710 |           378.3 | West Boundary          |
| 2018-07-30  |         20180712 |           361.6 | West Boundary          |
| 2018-07-31  |         20180712 |           359.9 | West Boundary          |
| 2018-08-06  |         20180786 |           357.7 | West Boundary          |
+-------------+------------------+-----------------+------------------------+
20 rows in set (0.04 sec)

Hey, that doesn't look so bad really.

What gives?

Eh, the row with a positional_bound value of '20180698'...

Wait a minute, this is supposed to calculate a date value by subtracting the number ROW_NUMBER() provides from the day_of_walk column.

Correct.

I don't know about you, but I am not aware of a month with 98 days!

But, if there is one, bring on the extra paychecks!

All fun aside, this obviously was incorrect and prompted me to (eventually) use DATE_SUB(), which provides a correct result set, then allowing me to run this query:

mysql> SELECT MIN(t.day_of_walk), 
    -> MAX(t.day_of_walk),
    -> COUNT(*) AS num_of_hikes
    -> FROM (SELECT day_walked AS day_of_walk,
    -> DATE_SUB(day_walked, INTERVAL ROW_NUMBER() OVER(ORDER BY day_walked ASC) DAY) AS positional_bound
    -> FROM vw_fav_shoe_stats
    -> WHERE burned_calories > 350) AS t
    -> GROUP BY t.positional_bound
    -> ORDER BY 1;
+--------------------+--------------------+--------------+
| MIN(t.day_of_walk) | MAX(t.day_of_walk) | num_of_hikes |
+--------------------+--------------------+--------------+
| 2018-06-03         | 2018-06-04         |            2 |
| 2018-06-06         | 2018-06-07         |            2 |
| 2018-06-24         | 2018-06-26         |            3 |
| 2018-07-06         | 2018-07-06         |            1 |
| 2018-07-08         | 2018-07-08         |            1 |
| 2018-07-11         | 2018-07-11         |            1 |
| 2018-07-15         | 2018-07-16         |            2 |
| 2018-07-18         | 2018-07-18         |            1 |
| 2018-07-22         | 2018-07-22         |            1 |
| 2018-07-24         | 2018-07-25         |            2 |
| 2018-07-27         | 2018-07-27         |            1 |
| 2018-07-30         | 2018-07-31         |            2 |
| 2018-08-06         | 2018-08-06         |            1 |
+--------------------+--------------------+--------------+
13 rows in set (0.12 sec)

Basically, I have wrapped the result set provided by that analytical query in a Derived Table, and queried it for a start and end date and a count of what I have labeled num_of_hikes, then grouped on the positional_bound column, ultimately providing groups of consecutive days where I burned more than 350 calories.

You can see that the date range of 2018-06-24 to 2018-06-26 resulted in 3 consecutive days meeting the criteria of more than 350 calories burned set in the WHERE clause.

Not too bad if I do say so myself, but definitely a record I want to try and best!

Conclusion

Window functions are in a world and league of their own. I have not even scratched the surface of them, having only covered 3 of them in a 'high-level' introductory and perhaps, trivial sense. However, hopefully, through this post, you find that you can query for quite interesting and potentially insightful data with a 'bare minimal' use of them.

Thank you for reading.

How to Migrate from MSSQL to MySQL

Migrating from proprietary engines into open source engines is a trend that is growing in the industry.

However, database migration is not something to be taken lightly.

In this blog, let’s see what is needed to move from Microsoft SQL Server to MySQL Server and how to do it.

So, let’s start by reviewing what MS SQL is and what MySQL is.

Microsoft SQL Server is a very popular RDBMS with restrictive licensing and modest cost of ownership if the database is of significant size, or is used by a significant number of clients. It provides a very user-friendly and easy to learn interface, which has resulted in a large installed user base. Like other RDBMS software, MS SQL Server is built on top of SQL, a standardized programming language that database administrators (DBAs) and other IT professionals use to manage databases and query the data they contain. SQL Server is tied to Transact-SQL (T-SQL), an implementation of SQL from Microsoft that adds a set of proprietary programming extensions to the standard language.

MySQL is an Oracle-backed open source relational database management system based on SQL.

It's the second most popular database in the world according to the db-engines ranking, and probably the most widely deployed database backend on the planet, as it runs most of the internet services around the globe.

MySQL runs on virtually all platforms, including Linux, UNIX, and Windows. It’s an important component of an open source enterprise stack called LAMP. The MySQL Enterprise version comes with support and additional features for security and high availability.

The combination of cost-savings, platform compatibility, and feature set of MySQL makes it really appealing, and many organizations are migrating from MS SQL Server into this open-source platform to take advantage of these features.

Why Migrate?

Usually, the first reason to migrate is cost. SQL Server is a proprietary database from Microsoft. There is a free SQL Server version called Express, but it has limitations such as a 10GB database size limit, a limited amount of CPU, a limited amount of RAM, and more, so you will probably need to pay for a license to use it in production. You can check the pricing here.

With MySQL, you can use the community edition for free and without any limitation.

Another reason could be the operating system support. Unlike MS SQL Server, MySQL supports a wide range of Operating Systems including Linux, Windows, Mac OS, Solaris and many more.

Regarding installation and configuration, MySQL installs faster, has a smaller footprint while still being able to manage fairly large databases, and has fewer configuration knobs that need tuning than SQL Server.

In the area of high availability, MySQL has a number of proven solutions including replication, SANs, and MySQL Cluster, which equal or best SQL Server depending on the scenario.

The great MySQL community provides many benefits, including a large network of developers and DBAs working together to help ensure a high-quality product and each other’s success.

What You Should Know

Moving data and index structures over to MySQL isn’t typically a challenging task, as MySQL supports all the important data types, table designs, and index structures. However, some objects will face challenges. Code-related objects, like stored procedures, may use non-standard ANSI features, as Transact-SQL has many of them.

So, the following items will need special attention when migrating:

  • Assemblies
  • Types
  • DDL and statement-based triggers (MySQL has row-based triggers)
  • Proprietary SQL Server function calls
  • Certain cases of dynamic T-SQL

In the same way, Synonyms and Security Roles will need a workaround as they cannot be directly migrated into MySQL.

Datatypes Requiring Conversion

The following map can be used to convert SQL Server data types that don’t map in 1-to-1 relationship to MySQL:

SQL Server              MySQL
IDENTITY                AUTO_INCREMENT
NTEXT, NATIONAL TEXT    TEXT CHARACTER SET UTF8
SMALLDATETIME           DATETIME
MONEY                   DECIMAL(19,4)
SMALL MONEY             DECIMAL(10,4)
UNIQUEIDENTIFIER        BINARY(16)
SYSNAME                 CHAR(256)
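
As an illustration of how those mappings could look in practice, here is a hedged sketch of a hypothetical SQL Server table and its MySQL counterpart (the table and column names are made up for the example):

-- SQL Server (source) definition, shown as a comment for reference:
-- CREATE TABLE invoices (
--     invoice_id   INT IDENTITY(1,1) PRIMARY KEY,
--     customer_ref UNIQUEIDENTIFIER,
--     created_at   SMALLDATETIME,
--     total        MONEY
-- );

-- MySQL (target) equivalent, applying the mappings above:
CREATE TABLE invoices (
    invoice_id   INT AUTO_INCREMENT PRIMARY KEY,
    customer_ref BINARY(16),
    created_at   DATETIME,
    total        DECIMAL(19,4)
);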

How to do it

There are many tools to perform the migration from MS SQL Server to MySQL like Amazon DMS or Data Integration (Kettle), but in this case, we’ll use the MySQL Workbench Migration tool.

This tool is designed to save DBA and developer time by providing visual, point and click ease of use around all phases of configuring and managing a complex migration process:

  • Database migrations: Allows migrations from Microsoft SQL Server, Microsoft Access, PostgreSQL, Sybase ASE, Sybase SQL Anywhere, SQLite, and more.
  • Manage Migration Projects: Allows migrations to be configured, copied, edited, executed and scheduled.
  • Source and Target selection: Allows users to define specific data sources and to analyze source data in advance of the migration.
  • Object migration: Allows users to select objects to migrate, assign a source to target mappings where needed, edit migration scripts and create the target schema.
  • Version Upgrades: Using migration, users can easily move databases off older MySQL versions to the latest.

So, let’s do it.

For this task, we’re assuming you have:

  • SQL Server installed with your database to migrate: We’ll use the Northwind sample database over MS SQL Server Express 2014 Edition.
  • MySQL Server installed: We have MySQL 5.7.25 Community Edition over CentOS.
  • Users on both database servers with privileges to perform the task: We have the user “sa” on SQL Server and the user “migration” with all privileges on MySQL.
  • MySQL Workbench installed with access to both database servers: We’ll use MySQL Workbench 6.3.

To start the migration process, on the MySQL Workbench main screen, go to Database-> Migration Wizard.

We should check the prerequisites to confirm if we can continue the task. If everything looks fine, we can press on Start Migration.

In this step, we need to provide the information about the source database, in this case, SQL Server.

We’ll configure our source parameters as follows:

Database System: Microsoft SQL Server
Connection Method: ODBC (Native)
Driver: SQL Server
Server: localhost 
Username: sa

Regarding the Server parameter: we’re running MySQL Workbench on the SQL Server node, but you’ll probably use the IP address / hostname of your database server.

Now, we can check the connection by using the Test Connection button.

Then, we need to add the target parameters, in this case, MySQL Server:

Connection Method: Standard (TCP/IP)
Hostname: 192.168.100.192
Port: 3306
Username: migration

And press on Test Connection to confirm the added information.

In the next step, MySQL Workbench will connect to our SQL Server to fetch a list of the catalogs and schemas.

Now, we’ll choose the Northwind sample database from the list.

We can choose how the reverse engineered schemas and objects should be mapped. We’ll use the Catalog.Schema.Table -> Catalog.Table option, so in our MySQL instance we’ll have a database called Northwind containing the tables that we currently have in our SQL Server database.

If everything went fine, we’ll have a list of objects to be migrated.

In this case, we have Table Objects, View Objects and Routine Objects. We’ll only select the Table Objects, because for the rest of the objects we would need to check the corresponding MySQL equivalent code manually.

In this step, the objects from the source are converted into MySQL compatible objects.

If everything went fine, we can continue by selecting how we want to create the migrated schema in the target. We’ll use the default “Create schema in target RDBMS” option.

Now, let’s check the schema creation process.

In the next step, we can check the result of each script execution, and we can check the new database created on our MySQL Server.

In our MySQL Server, we have:

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| NORTHWND           |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.01 sec)

At this point, we’ll have the database structure, but we don’t have the data yet. Now, we’ll select how we want to copy the data in the MySQL Server. We’ll use the “Online copy of table data to target RDBMS” option.

And we can monitor the copy process from the MySQL Workbench application.

At this point, we have all the information migrated to our MySQL Server.

mysql> SELECT * FROM NORTHWND.Region;
+----------+-------------------+
| RegionID | RegionDescription |
+----------+-------------------+
|        1 | Eastern           |
|        2 | Western           |
|        3 | Northern          |
|        4 | Southern          |
+----------+-------------------+
4 rows in set (0.00 sec)

In the last step, we can check the migration report and finish the task.

The migration is done!

Testing

Before the migration process, you should test the application and the MySQL database to know the behavior with the new engine.

It should also be useful to perform a benchmark test to validate the performance before the migration.

There are some tips to take into account:

  • The test should simulate the number of user connections that are expected.
  • The connected sessions should perform tasks as they would occur during a normal day.
  • You should load your database with test data that is approximately the size you expect your database to be in the real world.

For this test task, you can use the mysqlslap tool. It’s a diagnostic program designed to emulate client load for a MySQL Server and to report the timing of each stage.

Conclusion

As we have reviewed in this blog, there are several reasons that can lead a business to decide on a database migration, moving from a proprietary engine to an open source one. We have seen here a popular use case, a migration from SQL Server to MySQL, and walked through a step-by-step example using a widely known MySQL tool, the MySQL Workbench. We hope you find this article useful.

Migration from Oracle Database to MariaDB - A Deep Dive

In previous blogs, we discussed the topic of How to Migrate from Oracle to MySQL / Percona Server and most recently Migrating from Oracle Database to MariaDB - What You Should Know.

Over the years and as new versions of MySQL and MariaDB were released, both projects have deviated entirely into two very different RDBMS platforms.

MariaDB and MySQL now diverge from each other significantly, especially with the arrival of their most recent versions: MySQL 8.0 and MariaDB 10.3 GA and its 10.4 (currently RC candidate).

With the release MariaDB TX 3.0, MariaDB surprised many since it is no longer a drop-in replacement for MySQL. It introduces a new level of compatibility with Oracle database and is now becoming a real alternative to Oracle as well as other enterprise and proprietary databases such as IBM DB2 or EnterpriseDB.

Starting with MariaDB version 10.3, significant features have been introduced such as system-versioned tables and, what's most appealing for Oracle DBA's, support for PL/SQL!

According to the MariaDB website, approximately 80% of the legacy Oracle PL/SQL can be migrated without rewriting the code. MariaDB also has ColumnStore, which is their new analytics engine and a columnar storage engine designed for distributed, massively parallel processing (MPP), such as for big data analytics.

The MariaDB team has worked hard on the added support for PL/SQL; it adds extra ease when migrating from Oracle to MariaDB. As a reference point for your planned migration, you can check the following reference from MariaDB. As per our previous blog, this will not cover the overall process of migration, as it is a long process. But it will hopefully provide enough background information to serve as a guide for your migration process.

Planning and Development Strategy

For the DBA migrating from an Oracle database to MariaDB, such a migration involves many familiar concepts that shouldn’t be too difficult to shift and adapt to. MariaDB can be operated on Windows Server and has binaries available for download for the Windows platform. If you are using Oracle for OLAP (Online Analytical Processing) or business intelligence, MariaDB also has ColumnStore, which is the equivalent of Oracle's Database In-Memory column store.

If you’re used to an Oracle architecture with MAA (Maximum Availability Architecture) using Data Guard and Oracle RAC (Real Application Clusters), then in MariaDB, same as with MySQL/Percona Server, you can choose from synchronous replication, semi-synchronous replication, or asynchronous replication.

For a highly available solution, MariaDB has MaxScale as the main option you can use. You can mix MaxScale with Keepalived and HAProxy. ClusterControl, for example, can manage this efficiently, even with the recent arrival of MariaDB's product, MariaDB TX. See our previous blog to learn more about how ClusterControl can efficiently manage this.

With MariaDB being an open source technology, this question should be considered: "How do we get support?"

You need to make sure when choosing a support option that it isn’t limited to the database but it should cover expertise in scalability, redundancy, resiliency, backups, high-availability, security, monitoring/observability, recovery and engaging on mission critical systems. Overall, the support offering you choose needs to come with an understanding of your architectural setup without exposing confidentiality of your data.

Additionally, MariaDB has a very large and collaborative community world wide. If you experience problems and want to ask people involved in this community, you can try on Freenode via IRC client (Internet Relay Chat), go to their community page, or join their mailing list.

Assessment or Preliminary Check

Backing up your data, including configuration or setup files, kernel tunings, and automation scripts, needs to be considered. It's an obvious task, but before you migrate, always secure everything first, especially when moving to a different platform.

You must also verify that your applications follow up-to-date software engineering conventions and ensure that they are platform agnostic. These practices can be to your benefit, especially when moving to a different database platform.

Since MariaDB is an open-source technology, make sure you know which connectors are available for MariaDB. This is pretty straightforward right now, as there are various client libraries available. Check here for a list of these client libraries. Aside from that, you can also check this list on the available Clients and Utilities page.

Lastly, make sure of your hardware requirements.

MariaDB doesn't have specific hardware requirements: a typical commodity server can work, but that depends on how much performance you require. However, if you are using ColumnStore for your analytical or data warehouse applications, check out their documentation. Taken from their page: for AWS, they have generally tested with m4.4xlarge instance types as a cost-effective middle ground. The r4.8xlarge has also been tested and performs about twice as fast for about twice the price.

What You Should Know

Same as MySQL, in MariaDB, you can create multiple databases whereas Oracle does not come with that same functionality.

In MariaDB, a schema is synonymous with a database. You can substitute the keyword SCHEMA for DATABASE in MariaDB SQL syntax, for example using CREATE SCHEMA instead of CREATE DATABASE. Oracle, on the other hand, makes a distinction: a schema represents only a part of a database, namely the tables and other objects owned by a single user. Normally, there is a one-to-one relationship between the instance and the database.
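
A quick sketch of that equivalence in MariaDB (the database names are arbitrary examples):

-- These statements are interchangeable in MariaDB
CREATE DATABASE sales_db;
CREATE SCHEMA hr_db;

DROP DATABASE sales_db;
DROP SCHEMA hr_db;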

For example, in a replication setup equivalent in Oracle (e.g. Real Application Clusters or RAC), you have your multiple instances accessing a single database. This lets you start Oracle on multiple servers, all accessing the same data. However, in MariaDB, you can allow access to multiple databases from your multiple instances and can even filter out which databases/schema you can replicate to a MariaDB node.

Referencing from one of our previous blogs (this and this), the same principle applies when speaking of converting your database with available tools found on the internet.

There is no tool that can convert an Oracle database to MariaDB with 100% accuracy, though MariaDB has the Red Rover Migration Practice; this is a service that MariaDB offers and it's not free.

MariaDB cites a migration at the Development Bank of Singapore (DBS): as a result of its collaboration with MariaDB on Oracle compatibility, DBS has been able to migrate more than 50 percent of its mission-critical applications from Oracle Database to MariaDB in just 12 months.

If you are looking for tools, the SQLines tools (the SQLines SQL Converter and the SQLines Data Tool) offer a simple yet functional option.

The following sections below further outline the things that you must be aware of when it comes to migration and verifying the logical SQL result.

Data Type Mapping

MySQL and MariaDB share the same set of available data types. Although there are variations in how they are implemented, you can check the list of data types in MariaDB here.

While MySQL has the JSON data type, MariaDB differs in that JSON is just an alias for the LONGTEXT data type. MariaDB also has a function, JSON_VALID, which can be used within a CHECK constraint expression.

Hence, I'll make use of the tabular presentation below, based on the information here. Since data types in MariaDB don't deviate much from those in MySQL, the main addition to note is the ROW data type, introduced in MariaDB 10.3.0 as part of the PL/SQL compatibility feature.

Check out the table below:

#    Oracle                          Description                                        MySQL
1    BFILE                           Pointer to binary file, <= 4G                      VARCHAR(255)
2    BINARY_FLOAT                    32-bit floating-point number                       FLOAT
3    BINARY_DOUBLE                   64-bit floating-point number                       DOUBLE
4    BLOB                            Binary large object, <= 4G                         LONGBLOB
5    CHAR(n), CHARACTER(n)           Fixed-length string, 1 <= n <= 255                 CHAR(n), CHARACTER(n)
6    CHAR(n), CHARACTER(n)           Fixed-length string, 256 <= n <= 2000              VARCHAR(n)
7    CLOB                            Character large object, <= 4G                      LONGTEXT
8    DATE                            Date and time                                      DATETIME
9    DECIMAL(p,s), DEC(p,s)          Fixed-point number                                 DECIMAL(p,s), DEC(p,s)
10   DOUBLE PRECISION                Floating-point number                              DOUBLE PRECISION
11   FLOAT(p)                        Floating-point number                              DOUBLE
12   INTEGER, INT                    38-digit integer                                   INT, DECIMAL(38)
13   INTERVAL YEAR(p) TO MONTH       Date interval                                      VARCHAR(30)
14   INTERVAL DAY(p) TO SECOND(s)    Day and time interval                              VARCHAR(30)
15   LONG                            Character data, <= 2G                              LONGTEXT
16   LONG RAW                        Binary data, <= 2G                                 LONGBLOB
17   NCHAR(n)                        Fixed-length UTF-8 string, 1 <= n <= 255           NCHAR(n)
18   NCHAR(n)                        Fixed-length UTF-8 string, 256 <= n <= 2000        NVARCHAR(n)
19   NCHAR VARYING(n)                Varying-length UTF-8 string, 1 <= n <= 4000        NCHAR VARYING(n)
20   NCLOB                           Variable-length Unicode string, <= 4G              NVARCHAR(max)
21   NUMBER(p,0), NUMBER(p)          8-bit integer, 1 <= p < 3                          TINYINT (0 to 255)
                                     16-bit integer, 3 <= p < 5                         SMALLINT
                                     32-bit integer, 5 <= p < 9                         INT
                                     64-bit integer, 9 <= p < 19                        BIGINT
                                     Fixed-point number, 19 <= p <= 38                  DECIMAL(p)
22   NUMBER(p,s)                     Fixed-point number, s > 0                          DECIMAL(p,s)
23   NUMBER, NUMBER(*)               Floating-point number                              DOUBLE
24   NUMERIC(p,s)                    Fixed-point number                                 NUMERIC(p,s)
25   NVARCHAR2(n)                    Variable-length UTF-8 string, 1 <= n <= 4000       NVARCHAR(n)
26   RAW(n)                          Variable-length binary string, 1 <= n <= 255       BINARY(n)
27   RAW(n)                          Variable-length binary string, 256 <= n <= 2000    VARBINARY(n)
28   REAL                            Floating-point number                              DOUBLE
29   ROWID                           Physical row address                               CHAR(10)
     (For PL/SQL compatibility, you can use ROW (<field name> <data type> [{, <field name> <data type>}... ]))
30   SMALLINT                        38-digit integer                                   DECIMAL(38)
31   TIMESTAMP(p)                    Date and time with fraction                        DATETIME(p)
32   TIMESTAMP(p) WITH TIME ZONE     Date and time with fraction and time zone          DATETIME(p)
33   UROWID(n)                       Logical row addresses, 1 <= n <= 4000              VARCHAR(n)
34   VARCHAR(n)                      Variable-length string, 1 <= n <= 4000             VARCHAR(n)
35   VARCHAR2(n)                     Variable-length string, 1 <= n <= 4000             VARCHAR(n)
36   XMLTYPE                         XML data                                           LONGTEXT

Data type attributes and options:

Oracle                                 MySQL
BYTE and CHAR column size semantics    Size is always in characters

Transactions

MariaDB used XtraDB as its default storage engine up to version 10.1 and shifted to InnoDB from version 10.2 onwards, though various storage engines, such as MyRocks, can be an alternative choice for handling transactions.

By default, MariaDB has the autocommit variable set to ON, which means that you have to explicitly start transactions in order to take advantage of ROLLBACK for discarding changes, or to use SAVEPOINT.

It's basically the same concept that Oracle uses in terms of commit, rollbacks and savepoints.

For explicit transactions, this means that you have to use the START TRANSACTION/BEGIN; <SQL STATEMENTS>; COMMIT; syntax.

Otherwise, if you disable autocommit, you have to explicitly COMMIT every statement that changes your data.
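
As a minimal sketch (assuming a CUSTOMERS table like the one used later in this post), an explicit transaction with a savepoint in MariaDB could look like this:

START TRANSACTION;
INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (3000, 'Test User', 'Davao City');
SAVEPOINT after_first_insert;
INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (4000, 'Another User', 'Davao City');
-- Undo only the second insert; the first one is kept
ROLLBACK TO SAVEPOINT after_first_insert;
COMMIT;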

Dual Table

MariaDB supports the DUAL dummy table for compatibility with Oracle. It operates just the same as in MySQL, where the FROM clause is not mandatory, so the DUAL table is not strictly necessary. However, the DUAL table does not work exactly the same way as it does in Oracle, but for simple SELECTs in MariaDB, this is fine.

This suits Oracle's usage of DUAL, so any existing statements in your application that use DUAL might require no changes upon migration to MariaDB.

In Oracle, the FROM clause is mandatory for every SELECT statement, so the Oracle database uses the DUAL table for SELECT statements where a table name is not required.

See the following example below:

In Oracle:

SQL> DESC DUAL;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 DUMMY                                              VARCHAR2(1)

SQL> SELECT CURRENT_TIMESTAMP FROM DUAL;
CURRENT_TIMESTAMP
---------------------------------------------------------------------------
16-FEB-19 04.16.18.910331 AM +08:00

But in MariaDB:

MariaDB [test]> DESC DUAL;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'DUAL' at line 1
MariaDB [test]> SELECT CURRENT_TIMESTAMP FROM DUAL;
+---------------------+
| CURRENT_TIMESTAMP   |
+---------------------+
| 2019-02-27 04:11:01 |
+---------------------+
1 row in set (0.000 sec)

Note: the DESC DUAL syntax does not work in MariaDB, and the results also differ, as CURRENT_TIMESTAMP (which uses the TIMESTAMP data type) in MySQL/MariaDB does not include the timezone.

SYSDATE

Oracle's SYSDATE function is almost the same in MariaDB.

MariaDB's SYSDATE() returns the date and time, and it is a function that requires parentheses, although no arguments are needed. To demonstrate this below, here's Oracle and MariaDB using SYSDATE.

In Oracle, selecting plain SYSDATE displays just the date without the time. To get both the date and time, use TO_CHAR to convert the datetime into the desired format; whereas in MariaDB, you might not need that, as it returns both.

See example below.

In Oracle:

SQL> SELECT TO_CHAR (SYSDATE, 'MM-DD-YYYY HH24:MI:SS') "NOW" FROM DUAL;
NOW
-------------------
02-16-2019 04:39:00

SQL> SELECT SYSDATE FROM DUAL;

SYSDATE
---------
16-FEB-19

But in MariaDB:

MariaDB [test]> SELECT SYSDATE() FROM DUAL;
+---------------------+
| SYSDATE()           |
+---------------------+
| 2019-02-27 04:11:57 |
+---------------------+
1 row in set (0.000 sec)

If you want to format the date, MariaDB has a DATE_FORMAT() function.
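
For example, to produce output similar to the Oracle TO_CHAR call above (the format string here is an assumption for the example):

SELECT DATE_FORMAT(SYSDATE(), '%m-%d-%Y %H:%i:%s') AS "NOW";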

You can check the MariaDB's Date and Time documentation for more information.

TO_DATE

Oracle's TO_DATE equivalent in MariaDB is the STR_TO_DATE() function.

It's almost identical to the one in Oracle: Oracle's TO_DATE returns the DATE data type, while MariaDB's STR_TO_DATE() returns the DATETIME data type.

Oracle:

SQL> SELECT TO_DATE ('20190218121212','yyyymmddhh24miss') as "NOW" FROM DUAL; 
NOW
-------------------------
18-FEB-19

MariaDB:

MariaDB [test]> SELECT STR_TO_DATE('2019-02-18 12:12:12','%Y-%m-%d %H:%i:%s') as "NOW" FROM DUAL;
+---------------------+
| NOW                 |
+---------------------+
| 2019-02-18 12:12:12 |
+---------------------+
1 row in set (0.000 sec)

SYNONYM

MariaDB does not have an equivalent functionality for this yet. Currently, based on their Jira ticket MDEV-16482, the feature request to add SYNONYM is still open, with no sign of progress as of this time. We're hoping that this will be incorporated in a future release. However, a possible alternative could be using a VIEW, as sketched below.
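
For a local table, a minimal sketch of the VIEW-based workaround could look like this (the schema and table names are assumptions for the example):

-- emp_table acts as a synonym-like alias for the hr.employees table
CREATE VIEW emp_table AS SELECT * FROM hr.employees;

SELECT * FROM emp_table;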

Although SYNONYM in Oracle can be used to create an alias of a remote table,

e.g.

CREATE PUBLIC SYNONYM emp_table FOR hr.employees@remote.us.oracle.com

In MariaDB, you can take advantage of the CONNECT storage engine, which is more powerful than the FederatedX storage engine, as it allows you to connect to various database sources. You can check out this short video presentation.

There's a good example in MariaDB's manual page, which I will not reiterate here as there are certain considerations you have to meet, especially when using ODBC. Please refer to the manual.

Behaviour of Empty String and NULL

Take note that in MariaDB, an empty string is not NULL, whereas Oracle treats an empty string as a NULL value.

In Oracle:

SQL> SELECT CASE WHEN '' IS NULL THEN 'Yes' ELSE 'No' END AS "Null Eval" FROM dual;
Nul
---
Yes

In MariaDB:

MariaDB [test]> SELECT CASE WHEN '' IS NULL THEN 'Yes' ELSE 'No' END AS "Null Eval" FROM dual;
+-----------+
| Null Eval |
+-----------+
| No        |
+-----------+
1 row in set (0.001 sec)

Sequences

Since MariaDB 10.3, Oracle-compatible sequences and a stored procedure language compliant with Oracle PL/SQL have been introduced. In MariaDB, creating a sequence is pretty similar to Oracle's SEQUENCE.

MariaDB's example:

CREATE SEQUENCE s START WITH 100 INCREMENT BY 10;
CREATE SEQUENCE s2 START WITH -100 INCREMENT BY -10;

and specifying workable minimum and maximum values looks as follows:

CREATE SEQUENCE s3 START WITH -100 INCREMENT BY 10 MINVALUE=-100 MAXVALUE=1000;
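
Retrieving values is also Oracle-like. A minimal sketch using the sequence s created above:

SELECT NEXTVAL(s);          -- returns 100 on the first call
SELECT NEXTVAL(s);          -- returns 110
SELECT NEXT VALUE FOR s;    -- the ANSI SQL syntax is supported as well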

Character String Functions

MariaDB, same as MySQL, has a long list of string functions - too long to discuss here one by one. Hence, check the documentation and compare it against Oracle's string functions.
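
One difference worth a quick sketch is string concatenation: Oracle uses the || operator, while in MariaDB's default SQL mode || acts as a logical OR, so you either use CONCAT() or enable PIPES_AS_CONCAT (it is also implied by SQL_MODE='ORACLE'):

SELECT CONCAT('Several', 'nines');   -- works in any SQL mode
SET SQL_MODE = 'PIPES_AS_CONCAT';    -- for brevity; in practice append it to the existing sql_mode
SELECT 'Several' || 'nines';         -- now behaves like Oracle's || operator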

DML Statements

Insert/Update/Delete statements from Oracle are congruent with those in MariaDB.

Oracle's INSERT ALL/INSERT FIRST is not supported in MariaDB, and as far as I know, no one has opened a feature request for it in their Jira yet.

Instead, you need to state your INSERT statements one by one.

e.g.

In Oracle:

SQL> INSERT ALL
  INTO CUSTOMERS (customer_id, customer_name, city) VALUES (1000, 'Jase Alagaban', 'Davao City')
  INTO CUSTOMERS (customer_id, customer_name, city) VALUES (2000, 'Maximus Aleksandre Namuag', 'Davao City')
SELECT * FROM dual;
2 rows created.

But in MariaDB, you have to run the insert one at a time:

MariaDB [test]> INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (1000, 'Jase Alagaban', 'Davao City');
Query OK, 1 row affected (0.02 sec)
MariaDB [test]> INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (2000, 'Maximus Aleksandre Namuag', 'Davao City');
Query OK, 1 row affected (0.00 sec)

This also means you lose the conditional behavior that Oracle's INSERT ALL/INSERT FIRST provides through the WHEN keyword; there's no equivalent option in MariaDB at this time.

Hence, your alternative solution for this is to use stored procedures.
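
A minimal sketch of such a procedure, emulating Oracle's conditional INSERT FIRST ... WHEN by routing rows based on the city column (the CUSTOMERS_ARCHIVE table is hypothetical and only used for illustration):

DELIMITER //
CREATE PROCEDURE insert_customer(IN p_id INT, IN p_name VARCHAR(100), IN p_city VARCHAR(100))
BEGIN
  -- route the row to a different table depending on the condition
  IF p_city = 'Davao City' THEN
    INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (p_id, p_name, p_city);
  ELSE
    INSERT INTO CUSTOMERS_ARCHIVE (customer_id, customer_name, city) VALUES (p_id, p_name, p_city);
  END IF;
END//
DELIMITER ;

CALL insert_customer(3000, 'Sample Customer', 'Cebu City');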

Outer Joins "+" Symbol

Currently, this compatibility syntax is not present in MariaDB. There are plenty of related Jira tickets I have found in MariaDB, but this one is the most precise in terms of a feature request. Hence, your alternative for the time being is to use the standard JOIN syntax. Please check the documentation for more info about this.
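
As a quick sketch of what such a rewrite looks like (the CUSTOMERS and ORDERS tables here are only for illustration), this Oracle query:

SELECT c.customer_name, o.order_id
FROM customers c, orders o
WHERE c.customer_id = o.customer_id(+);

becomes the following in MariaDB (or any database supporting ANSI JOIN syntax):

SELECT c.customer_name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;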

START WITH..CONNECT BY

Oracle uses START WITH..CONNECT BY for hierarchical queries.

Starting with MariaDB 10.2, CTEs (Common Table Expressions) were introduced; they are designed to support generation of hierarchical result sets, using models such as adjacency lists or nested sets.

Similar to PostgreSQL and MySQL, MariaDB supports both non-recursive and recursive CTEs.

For example, here is a simple non-recursive CTE used to compare individuals against their group:

WITH sales_product_year AS (
SELECT product,
YEAR(ship_date) AS year,
SUM(price) AS total_amt
FROM item_sales
GROUP BY product, year
)

SELECT * 
FROM sales_product_year S1
WHERE
total_amt > 
    (SELECT 0.1 * SUM(total_amt)
     FROM sales_product_year S2
     WHERE S2.year = S1.year)

while here is a recursive CTE (example: return the bus destinations with New York as the origin):

WITH RECURSIVE bus_dst as ( 
    SELECT origin as dst FROM bus_routes WHERE origin='New York' 
  UNION
    SELECT bus_routes.dst FROM bus_routes, bus_dst WHERE bus_dst.dst= bus_routes.origin 
) 
SELECT * FROM bus_dst;

PL/SQL in MariaDB?

Previously, in our blog about "Migrating from Oracle Database to MariaDB - What You Should Know", we showcased how powerful MariaDB has become by adding PL/SQL compatibility to its database kernel. Whenever you want to use PL/SQL compatibility in MariaDB, make sure you have set SQL_MODE = 'ORACLE', as follows:

SET SQL_MODE='ORACLE';

The new compatibility mode helps with the following syntax:

  • Loop Syntax
  • Variable Declaration
  • Non-ANSI Stored Procedure Construct
  • Cursor Syntax
  • Stored Procedure Parameters
  • Data Type Inheritance (%TYPE, %ROWTYPE)
  • PL/SQL Style Exceptions
  • Synonyms for Basic SQL Types (VARCHAR2, NUMBER, …)

For example, in Oracle you can create a package, which is a schema object that groups logically related PL/SQL types, variables, and subprograms. In MariaDB, you can do the same, as shown below:

MariaDB [test]> CREATE OR REPLACE PACKAGE BODY hello AS
    -> 
    ->   vString VARCHAR2(255) := NULL;
    -> 
    ->   -- was declared public in PACKAGE
    ->   PROCEDURE helloFromS9s(pString VARCHAR2) AS
    ->   BEGIN
    ->     SELECT 'Severalnines showing MariaDB Package Procedure in ' || pString || '!' INTO vString FROM dual;
    ->     SELECT vString;
    ->   END;
    -> 
    -> BEGIN
    ->   SELECT 'called only once per connection!';
    -> END hello;
    -> /
Query OK, 0 rows affected (0.021 sec)

MariaDB [test]> 
MariaDB [test]> DECLARE
    ->   vString VARCHAR2(255) := NULL;
    ->   -- CONSTANT seems to be not supported yet by MariaDB
    ->   -- cString CONSTANT VARCHAR2(255) := 'anonymous block';
    ->   cString VARCHAR2(255) := 'anonymous block';
    -> BEGIN
    ->   CALL hello.helloFromS9s(cString);
    -> END;
    -> /
+----------------------------------+
| called only once per connection! |
+----------------------------------+
| called only once per connection! |
+----------------------------------+
1 row in set (0.000 sec)

+--------------------------------------------------------------------+
| vString                                                            |
+--------------------------------------------------------------------+
| Severalnines showing MariaDB Package Procedure in anonymous block! |
+--------------------------------------------------------------------+
1 row in set (0.000 sec)

Query OK, 1 row affected (0.000 sec)

MariaDB [test]> 
MariaDB [test]> DELIMITER ;

However, Oracle's PL/SQL is compiled when it is loaded into the server, before execution. Although MariaDB does not state this in its manual, I would assume the approach is the same as in MySQL, where the routine is compiled and cached when it is invoked.

Migration Tools

As my colleague Bart indicated in our previous blog, the SQLines tools (SQLines SQL Converter and the SQLines Data Tool) can also provide aid as part of your migration.

MariaDB has its Red Rover Migration Practice service, which you can take advantage of.

Overall, migrating from Oracle to MariaDB is not an easy task, though it can be less challenging than migrating to MySQL/Percona, especially since there is no PL/SQL compatibility in MySQL.

Anyhow, if you know of any tools that are helpful and beneficial for migrating from Oracle to MariaDB, please leave a comment on this blog!

Testing

As I have stated in a previous blog, allow me to reiterate some of those points here.

As part of your migration plan, testing is a vital task that plays a very important role and affects your decisions with regards to the migration.

The tool dbdeployer (a replacement for MySQL Sandbox) is very helpful here. It makes it easy to try and test different approaches and saves you time, compared to setting up the whole stack, if your purpose is to evaluate the RDBMS platform first.

For testing your SQL stored routines (functions or procedures), triggers, and events, I suggest you use tools such as mytap or Google's Unit Testing Framework.

Percona tools can still be useful and can be incorporated into your DBA or engineering tasks, even with MariaDB. Check out the Percona Toolkit here. You can cherry-pick the tools according to your needs, especially for testing and production-usage tasks.

Overall, the guidelines to keep in mind when testing your MariaDB server are:

  • After your installation, you need to consider doing some tuning. Check out our webinar about tuning your MariaDB server.
  • Do some benchmarks and stress-load testing of your configuration setup on your current node. Check out mysqlslap and sysbench, which can help you with this. Also check out our blog "How to Benchmark Performance of MySQL & MariaDB using SysBench".
  • Check that your DDLs are correctly defined: data types, constraints, clustered and secondary indexes, and partitions, if you have any.
  • Check your DML, especially that the syntax is correct and that the data is saved as expected.
  • Check your stored routines, events and triggers to ensure they run and return the expected results.
  • Verify that your queries are performant. I suggest you take advantage of open-source tools or try our ClusterControl product. It offers monitoring and observability, especially of your MariaDB cluster. Check this previous blog, in which we showcase how ClusterControl can help you manage MariaDB TX 3.0. You can use ClusterControl to monitor your queries and their query plans to make sure they are performant.

Dealing with Unreliable Networks When Crafting an HA Solution for MySQL or MariaDB


Long gone are the days when a database was deployed as a single node or instance - a powerful, standalone server which was tasked to handle all the requests to the database. Vertical scaling was the way to go - replace the server with another, even more powerful one. During these times, one didn’t really have to be bothered by network performance. As long as the requests were coming in, all was good.

But nowadays, databases are built as clusters with nodes interconnected over a network. It is not always a fast, local network. With businesses reaching global scale, database infrastructure has also to span across the globe, to stay close to customers and to reduce latency. It comes with additional challenges that we have to face when designing a highly available database environment. In this blog post, we will look into the network issues that you may face and provide some suggestions on how to deal with them.

Two Main Options for MySQL or MariaDB HA

We covered this particular topic quite extensively in one of the whitepapers, but let’s look at the two main ways of building high availability for MySQL and MariaDB.

Galera Cluster

Galera Cluster is a shared-nothing, virtually synchronous cluster technology for MySQL. It allows you to build multi-writer setups that can span the globe. Galera thrives in low-latency environments, but it can also be configured to work with long WAN connections. Galera has a built-in quorum mechanism which ensures that data will not be compromised in case some of the nodes become partitioned from the network.

MySQL Replication

MySQL Replication can be either asynchronous or semi-synchronous. Both are designed to build large-scale replication clusters. Like in any other master-slave or primary-secondary replication setup, there can be only one writer, the master. The other nodes, the slaves, are used for failover purposes as they contain a copy of the data set from the master. Slaves can also be used for reading the data and offloading some of the workload from the master.

Both solutions have their own limits and features, both suffer from different problems. Both can be affected by unstable network connections. Let’s take a look at those limitations and how we can design the environment to minimize the impact of an unstable network infrastructure.

Galera Cluster - Network Problems

First, let’s take a look at Galera Cluster. As we discussed, it works best in a low-latency environment. One of the main latency-related problems in Galera is the way Galera handles writes. We will not go into all the details in this blog post, but you can read further in our Galera Cluster for MySQL tutorial. The bottom line is that, due to the certification process for writes, where all nodes in the cluster have to agree on whether the write can be applied or not, your write performance for a single row is strictly limited by the network round-trip time between the writer node and the most distant node. As long as the latency is acceptable and as long as you do not have too many hot spots in your data, WAN setups may work just fine. The problem starts when the network latency spikes from time to time. Writes will then take 3 or 4 times longer than usual and, as a result, the databases may become overloaded with long-running writes.

One of the great features of Galera Cluster is its ability to detect the cluster state and react to network partitioning. If a node of the cluster cannot be reached, it will be evicted from the cluster and it will not be able to perform any writes. This is crucial in maintaining the integrity of the data while the cluster is split - only the majority of the cluster will accept writes, and the minority will complain. To handle this, Galera introduces a vast array of checks and configurable timeouts to avoid false alerts on very transient network issues. Unfortunately, if the network is unreliable, Galera Cluster will not be able to work correctly - nodes will start to leave the cluster and rejoin later. It will be especially problematic when Galera Cluster spans across a WAN - separated pieces of the cluster may disappear randomly if the interconnecting network does not work properly.

How to Design Galera Cluster for an Unstable Network?

First things first, if you have network problems within a single datacenter, there is not much you can do unless you are able to solve those issues somehow. An unreliable local network is a no-go for Galera Cluster; you have to consider some other solution (even though, to be honest, an unreliable network will always be problematic). On the other hand, if the problems are related to WAN connections only (and this is one of the most typical cases), it may be possible to replace WAN Galera links with regular asynchronous replication (if the Galera WAN tuning did not help).

There are several inherent limitations in this setup - the main issue is that writes that used to happen locally now all have to head to the “master” datacenter (DC A in our case). This is not as bad as it sounds. Please keep in mind that in an all-Galera environment, writes will be slowed down by the latency between nodes located in different datacenters. Even local writes will be affected. It will be more or less the same slowdown as with an asynchronous setup, in which you would send the writes across the WAN to the “master” datacenter.

Using asynchronous replication comes with all of the problems typical for the asynchronous replication. Replication lag may become a problem - not that Galera would be more performant, it’s just that Galera would slow down the traffic via flow control while replication does not have any mechanism to throttle the traffic on the master.

Another problem is failover: if the “master” Galera node (the one which acts as the master to the slaves in other datacenters) fails, some mechanism has to be in place to repoint the slaves to another, working master node. It might be some sort of script; it is also possible to try something with a VIP, where the “slave” Galera cluster replicates from a Virtual IP which is always assigned to an alive Galera node in the “master” cluster.

The main advantage of such a setup is that we remove the WAN Galera link, which means that our “master” cluster will not be slowed down by the fact that some of the nodes are geographically separated. As we mentioned, we lose the ability to write in all of the datacenters, but latency-wise, writing across the WAN is the same as writing locally to a Galera cluster which spans the WAN. As a result, the overall latency should improve. Asynchronous replication is also less vulnerable to unstable networks. In the worst case scenario, the replication link will break and it will be recreated when the networks converge.


How to Design MySQL Replication for an Unstable Network?

In the previous section, we covered Galera Cluster, and one solution was to use asynchronous replication. What does it look like in a plain asynchronous replication setup? Let’s look at how an unstable network can cause the biggest disruptions in a replication setup.

First of all, latency - one of the main pain points for Galera Cluster. In the case of replication, it is almost a non-issue. Unless you use semi-synchronous replication, that is - in that case, increased latency will slow down writes. In asynchronous replication, latency has no impact on write performance. It may, though, have some impact on the replication lag. It is not anything as significant as it was for Galera, but you may expect more lag spikes and overall less stable replication performance if the network between nodes suffers from high latency. This is mostly due to the fact that the master may serve several writes before the data transfer to the slave can even be initiated on a high-latency network.

Network instability may definitely impact replication links, but it is, again, not that critical. MySQL slaves will attempt to reconnect to their masters and replication will resume.

The main issue with MySQL replication is actually something that Galera Cluster solves internally - network partitioning. We are talking about the network partitioning as the condition in which segments of the network are separated from each other. MySQL replication utilizes one single writer node - master. No matter how you design your environment, you have to send your writes to the master. If the master is not available (for whatever reasons), application cannot do its job unless it runs in some sort of read-only mode. Therefore there is a need to pick the new master as soon as possible. This is where the issues show up.

First, how do you tell which host is a master and which one is not? One of the usual ways is to use the “read_only” variable to distinguish slaves from the master. If a node has read_only enabled (set read_only=1), it is a slave (as slaves should not handle any direct writes). If the node has read_only disabled (set read_only=0), it is a master. To make things safer, a common approach is to set read_only=1 in the MySQL configuration - in case of a restart, it is safer if the node shows up as a slave. Such a “language” can be understood by proxies like ProxySQL or MaxScale.
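
As a minimal sketch, you would set the variable at runtime on any node that must not accept writes:

SET GLOBAL read_only = 1;

and persist it in my.cnf so that the node always comes back up as a slave after a restart:

[mysqld]
read_only = 1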

Let’s take a look at an example.

We have application hosts which connect to the proxy layer. The proxies perform the read/write split, sending SELECTs to slaves and writes to the master. If the master is down, failover is performed, a new master is promoted, and the proxy layer detects that and starts sending writes to another node.

If node1 restarts, it will come up with read_only=1 and will be detected as a slave. It is not ideal, as it is not replicating, but it is acceptable. Ideally, the old master should not show up at all until it is rebuilt and slaved off the new master.

Way more problematic situation is if we have to deal with network partitioning. Let’s consider the same setup: application tier, proxy tier and databases.

When the network makes the master unreachable, the application is not usable, as no writes make it to their destination. A new master is promoted and writes are redirected to it. What will happen then if the network issues cease and the old master becomes reachable again? It has not been stopped, therefore it is still running with read_only=0:

You’ve now ended up with a split brain, where writes were directed to two nodes. This situation is pretty bad, as merging diverged datasets may take a while and is quite a complex process.

What can be done to avoid this problem? There is no silver bullet, but some actions can be taken to minimize the probability of a split brain happening.

First of all, you can be smarter in detecting the state of the master. How do the slaves see it? Can they replicate from it? Maybe some of the slaves can still connect to the master, meaning that the master is up and running or, at least, making it possible to stop it should that be necessary. What about the proxy layer? Do all of the proxy nodes see the master as unavailable? If some can still connect, then you can try to utilize those nodes to SSH into the master and stop it before the failover.

The failover management software can also be smarter in detecting the state of the network. Maybe it utilizes RAFT or some other clustering protocol to build a quorum-aware cluster. If the failover management software can detect a split brain, it can also take some actions based on this, like, for example, setting all nodes in the partitioned segment to read_only, ensuring that the old master will not show up as writable when the networks converge.

You can also include tools like Consul or etcd to store the state of the cluster. The proxy layer can be configured to use data from Consul instead of the state of the read_only variable. It will then be up to the failover management software to make the necessary changes in Consul so that all proxies will send the traffic to the correct, new master.

Some of those hints can even be combined to make failure detection more reliable. All in all, it is possible to minimize the chances that the replication cluster will suffer from unreliable networks.

As you can see, no matter whether we are talking about Galera or MySQL Replication, unstable networks may become a serious problem. On the other hand, if you design the environment correctly, you can still make it work. We hope this blog post helps you to create environments which will work stably even if the networks are not.

Benchmarking Managed PostgreSQL Cloud Solutions - Part One: Amazon Aurora


This blog starts a multi-series documenting my journey on benchmarking PostgreSQL in the cloud.

The first part includes an overview of benchmarking tools, and kickstarts the fun with Amazon Aurora PostgreSQL.

Selecting the PostgreSQL Cloud Services Providers

A while ago I came across the AWS benchmark procedure for Aurora, and thought it would be really cool if I could take that test and run it on other cloud hosting providers. To Amazon’s credit, out of the three best-known utility computing providers - AWS, Google, and Microsoft - AWS is the only major contributor to PostgreSQL development, and the first to offer a managed PostgreSQL service (dating back to November 2013).

While managed PostgreSQL services are also available from a plethora of PostgreSQL Hosting Providers, I wanted to focus on the said three cloud computing providers since their environments are where many organizations looking for the advantages of cloud computing choose to run their applications, provided that they have the required know-how on managing PostgreSQL. I am a firm believer that in today’s IT landscape, organizations working with critical workloads in the cloud would greatly benefit from the services of a specialized PostgreSQL service provider, that can help them navigate the complex world of GUCS and myriads of SlideShare presentations.

Selecting the Right Benchmark Tool

Benchmarking PostgreSQL comes up quite often on the performance mailing list, and as stressed countless times, the tests are not intended to validate a configuration for a real-life application. However, selecting the right benchmark tool and parameters is important in order to gather meaningful results. I would expect every cloud provider to provide procedures for benchmarking their services, especially when the first cloud experience may not start on the right foot. The good news is that two of the three players in this test have included benchmarks in their documentation. The AWS Benchmark Procedure for Aurora guide is easy to find, available right on the Amazon Aurora Resources page. Google doesn’t provide a guide specific to PostgreSQL; however, the Compute Engine documentation contains a load testing guide for SQL Server based on HammerDB.

Following is a summary of the benchmark tools that, based on their references, are worth a look:

Another point to note is that PostgreSQL isn’t yet well suited for the TPC-H benchmark standard, and as noted above, all the tools (except pgreplay) must be run in an OLTP mode (pgbench’s built-in script is a TPC-B-like workload).

For the purpose of this blog, I thought that the AWS Benchmark Procedure for Aurora is a good starting point, simply because it sets a standard for cloud providers and is based on widely used tools.

Also, I used the latest available PostgreSQL version at the time. When selecting a cloud provider, it is important to consider the frequency of upgrades, especially when important features introduced by new versions can affect performance (which is the case for versions 10 and 11 versus 9). As of this writing we have:

...and the winner here is AWS by offering the most recent version (although it is not the latest, which as of this writing is 11.2).

Setting up the Benchmarking Environment

I decided to limit my tests to average workloads for a couple of reasons. First, the available cloud resources are not identical across providers. In the guide, the AWS specs for the database instance are 64 vCPU / 488 GiB RAM / 25 Gigabit Network, while Google’s maximum RAM for any instance size (the choice must be set to “custom” in the Google Calculator) is 208 GiB, and Microsoft’s Business Critical Gen5 at 32 vCPU comes with only 163 GiB. Second, the pgbench initialization brings the database size to 160 GiB, which in the case of an instance with 488 GiB of RAM is likely to be stored in memory.

Also, I left the PostgreSQL configuration untouched. The reason for sticking to the cloud provider’s defaults is that, out of the box, a managed service is expected to perform reasonably well when stressed by a standard benchmark. Remember that the PostgreSQL community runs pgbench tests as part of the release management process. Additionally, the AWS guide does not mention any changes to the default PostgreSQL configuration.
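
To double-check what a provider has changed from the stock defaults, a query along these lines (just a sketch) can be run against pg_settings:

SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;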

As explained in the guide, AWS applied two patches to pgbench. Since the patch for the number of clients didn’t apply cleanly on the 10.6 version of PostgreSQL and I didn’t want to invest time into fixing it, the number of clients was limited to the maximum of 1,000.

The guide specifies a requirement for the client instance to have enhanced networking enabled — for this instance type that is the default:

[ec2-user@ip-172-31-19-190 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:cd:ee:40:2b:e6 brd ff:ff:ff:ff:ff:ff
    inet 172.31.19.190/20 brd 172.31.31.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8cd:eeff:fe40:2be6/64 scope link
       valid_lft forever preferred_lft forever
[ec2-user@ip-172-31-19-190 ~]$ ethtool -i eth0
driver: ena
version: 2.0.2g
firmware-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
>>> aws (master *%) ~ $ aws ec2 describe-instances --instance-ids i-0ee51642334c1ec57 --query "Reservations[].Instances[].EnaSupport"
[
    true
]

Running the Benchmark on Amazon Aurora PostgreSQL

During the actual run I decided to make one more deviation from the guide: instead of running the test for 1 hour, I set the time limit to 10 minutes, which is a generally accepted value.

Run #1

Specifics

  • This test uses the AWS specifications for both client and database instance sizes.
    • Client machine: On Demand Memory Optimized EC2 instance:
      • vCPU: 32 (16 Cores x 2 Threads/Core)
      • RAM: 244 GiB
      • Storage: EBS Optimized
      • Network: 10 Gigabit
    • DB Cluster: db.r4.16xlarge
      • vCPU: 64
      • ECU (CPU capacity): 195 x [1.0-1.2 GHz] 2007 Opteron / Xeon
      • RAM: 488 GiB
      • Storage: EBS Optimized (Dedicated capacity for I/O)
      • Network: 14,000 Mbps Max Bandwidth on a 25 Gps network
  • The database setup included one replica.
  • Database storage was not encrypted.

Performing the Tests and Results

  1. Follow the instructions in the guide to install pgbench and sysbench.
  2. Edit ~/.bashrc to set the environment variables for the database connection and required paths to PostgreSQL libraries:
    export PGHOST=aurora.cluster-ctfirtyhadgr.us-east-1.rds.amazonaws.com
    export PGUSER=postgres
    export PGPASSWORD=postgres
    export PGDATABASE=postgres
    export PATH=$PATH:/usr/local/pgsql/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/pgsql/lib
  3. Initialize the database:
    [root@ip-172-31-19-190 ~]# pgbench -i --fillfactor=90 --scale=10000
    NOTICE:  table "pgbench_history" does not exist, skipping
    NOTICE:  table "pgbench_tellers" does not exist, skipping
    NOTICE:  table "pgbench_accounts" does not exist, skipping
    NOTICE:  table "pgbench_branches" does not exist, skipping
    creating tables...
    100000 of 1000000000 tuples (0%) done (elapsed 0.05 s, remaining 457.23 s)
    200000 of 1000000000 tuples (0%) done (elapsed 0.13 s, remaining 631.70 s)
    300000 of 1000000000 tuples (0%) done (elapsed 0.21 s, remaining 688.29 s)
    
    ...
    
    999500000 of 1000000000 tuples (99%) done (elapsed 811.41 s, remaining 0.41 s)
    999600000 of 1000000000 tuples (99%) done (elapsed 811.50 s, remaining 0.32 s)
    999700000 of 1000000000 tuples (99%) done (elapsed 811.58 s, remaining 0.24 s)
    999800000 of 1000000000 tuples (99%) done (elapsed 811.65 s, remaining 0.16 s)
    999900000 of 1000000000 tuples (99%) done (elapsed 811.73 s, remaining 0.08 s)
    1000000000 of 1000000000 tuples (100%) done (elapsed 811.80 s, remaining 0.00 s)
    vacuum...
    set primary keys...
    done.
  4. Verify the database size:
    postgres=> \l+ postgres
                                                                     List of databases
       Name   |  Owner   | Encoding |   Collate   |    Ctype    | Access privileges |  Size  | Tablespace |                Description
    ----------+----------+----------+-------------+-------------+-------------------+--------+------------+--------------------------------------------
     postgres | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                   | 160 GB | pg_default | default administrative connection database
    (1 row)
  5. Use the following query to verify that the time interval between checkpoints is set so checkpoints will be forced during the 10 min run:
    SELECT
       total_checkpoints,
       seconds_since_start / total_checkpoints / 60 AS minutes_between_checkpoints FROM (
          SELECT EXTRACT(
          EPOCH FROM (
             now() - pg_postmaster_start_time()
          )
          ) AS seconds_since_start,
       (checkpoints_timed+checkpoints_req) AS total_checkpoints
    FROM pg_stat_bgwriter) AS sub;
    Result:
    postgres=> \e
       total_checkpoints | minutes_between_checkpoints
    -------------------+-----------------------------
                      50 |           0.977392292333333
    (1 row)
  6. Run the Read/Write workload:
    [root@ip-172-31-19-190 ~]# pgbench --protocol=prepared -P 60 --time=600 --client=1000 --jobs=2048
    Output
    starting vacuum...end.
    progress: 60.0 s, 35670.3 tps, lat 27.243 ms stddev 10.915
    progress: 120.0 s, 36569.5 tps, lat 27.352 ms stddev 11.859
    progress: 180.0 s, 35845.2 tps, lat 27.896 ms stddev 12.785
    progress: 240.0 s, 36613.7 tps, lat 27.310 ms stddev 11.804
    progress: 300.0 s, 37323.4 tps, lat 26.793 ms stddev 11.376
    progress: 360.0 s, 36828.8 tps, lat 27.155 ms stddev 11.318
    progress: 420.0 s, 36670.7 tps, lat 27.268 ms stddev 12.083
    progress: 480.0 s, 37176.1 tps, lat 26.899 ms stddev 10.981
    progress: 540.0 s, 37210.8 tps, lat 26.875 ms stddev 11.341
    progress: 600.0 s, 37415.4 tps, lat 26.727 ms stddev 11.521
    transaction type: <builtin: TPC-B (sort of)>
    scaling factor: 10000
    query mode: prepared
    number of clients: 1000
    number of threads: 1000
    duration: 600 s
    number of transactions actually processed: 22040445
    latency average = 27.149 ms
    latency stddev = 11.617 ms
    tps = 36710.828624 (including connections establishing)
    tps = 36811.054851 (excluding connections establishing)
  7. Prepare the sysbench test:
    sysbench --test=/usr/local/share/sysbench/oltp.lua \
        --pgsql-host=aurora.cluster-ctfirtyhadgr.us-east-1.rds.amazonaws.com \
        --pgsql-db=postgres \
        --pgsql-user=postgres \
        --pgsql-password=postgres \
        --pgsql-port=5432 \
        --oltp-tables-count=250\
        --oltp-table-size=450000 \
        prepare
    Output:
    sysbench 0.5:  multi-threaded system evaluation benchmark
    
    Creating table 'sbtest1'...
    Inserting 450000 records into 'sbtest1'
    Creating secondary indexes on 'sbtest1'...
    Creating table 'sbtest2'...
    ...
    Creating table 'sbtest250'...
    Inserting 450000 records into 'sbtest250'
    Creating secondary indexes on 'sbtest250'...
  8. Run the sysbench test:
    sysbench --test=/usr/local/share/sysbench/oltp.lua \
        --pgsql-host=aurora.cluster-ctfirtyhadgr.us-east-1.rds.amazonaws.com \
        --pgsql-db=postgres \
        --pgsql-user=postgres \
        --pgsql-password=postgres \
        --pgsql-port=5432 \
        --oltp-tables-count=250 \
        --oltp-table-size=450000 \
        --max-requests=0 \
        --forced-shutdown \
        --report-interval=60 \
        --oltp_simple_ranges=0 \
        --oltp-distinct-ranges=0 \
        --oltp-sum-ranges=0 \
        --oltp-order-ranges=0 \
        --oltp-point-selects=0 \
        --rand-type=uniform \
        --max-time=600 \
        --num-threads=1000 \
        run
    Output:
    sysbench 0.5:  multi-threaded system evaluation benchmark
    
    Running the test with following options:
    Number of threads: 1000
    Report intermediate results every 60 second(s)
    Random number generator seed is 0 and will be ignored
    
    Forcing shutdown in 630 seconds
    
    Initializing worker threads...
    
    Threads started!
    
    [  60s] threads: 1000, tps: 20443.09, reads: 0.00, writes: 81834.16, response time: 68.24ms (95%), errors: 0.62, reconnects:  0.00
    [ 120s] threads: 1000, tps: 20580.68, reads: 0.00, writes: 82324.33, response time: 70.75ms (95%), errors: 0.73, reconnects:  0.00
    [ 180s] threads: 1000, tps: 20531.85, reads: 0.00, writes: 82127.21, response time: 70.63ms (95%), errors: 0.73, reconnects:  0.00
    [ 240s] threads: 1000, tps: 20212.67, reads: 0.00, writes: 80861.67, response time: 71.99ms (95%), errors: 0.43, reconnects:  0.00
    [ 300s] threads: 1000, tps: 19383.90, reads: 0.00, writes: 77537.87, response time: 75.64ms (95%), errors: 0.75, reconnects:  0.00
    [ 360s] threads: 1000, tps: 19797.20, reads: 0.00, writes: 79190.78, response time: 75.27ms (95%), errors: 0.68, reconnects:  0.00
    [ 420s] threads: 1000, tps: 20304.43, reads: 0.00, writes: 81212.87, response time: 73.82ms (95%), errors: 0.70, reconnects:  0.00
    [ 480s] threads: 1000, tps: 20933.80, reads: 0.00, writes: 83737.16, response time: 74.71ms (95%), errors: 0.68, reconnects:  0.00
    [ 540s] threads: 1000, tps: 20663.05, reads: 0.00, writes: 82626.42, response time: 73.56ms (95%), errors: 0.75, reconnects:  0.00
    [ 600s] threads: 1000, tps: 20746.02, reads: 0.00, writes: 83015.81, response time: 73.58ms (95%), errors: 0.78, reconnects:  0.00
    OLTP test statistics:
       queries performed:
          read:                            0
          write:                           48868458
          other:                           24434022
          total:                           73302480
       transactions:                        12216804 (20359.59 per sec.)
       read/write requests:                 48868458 (81440.43 per sec.)
       other operations:                    24434022 (40719.87 per sec.)
       ignored errors:                      414    (0.69 per sec.)
       reconnects:                          0      (0.00 per sec.)
    
    General statistics:
       total time:                          600.0516s
       total number of events:              12216804
       total time taken by event execution: 599964.4735s
       response time:
             min:                                  6.27ms
             avg:                                 49.11ms
             max:                                350.24ms
             approx.  95 percentile:              72.90ms
    
    Threads fairness:
       events (avg/stddev):           12216.8040/31.27
       execution time (avg/stddev):   599.9645/0.01

Metrics Collected

Cloudwatch Metrics
Performance Insights Metrics

Run #2

Specifics

  • This test uses the AWS specifications for the client and a smaller instance size for the database:
    • Client machine: On Demand Memory Optimized EC2 instance:
      • vCPU: 32 (16 Cores x 2 Threads/Core)
      • RAM: 244 GiB
      • Storage: EBS Optimized
      • Network: 10 Gigabit
    • DB Cluster: db.r4.2xlarge:
      • vCPU: 8
      • RAM: 61GiB
      • Storage: EBS Optimized
      • Network: 1,750 Mbps Max Bandwidth on an up to 10 Gbps connection
  • The database did not include a replica.
  • Database storage was not encrypted.

Performing the Tests and Results

The steps are identical to Run #1 so I’m showing only the output:

  • pgbench Read/Write workload:

    starting vacuum...end.
    
    progress: 60.0 s, 3361.3 tps, lat 286.143 ms stddev 80.417
    progress: 120.0 s, 3466.8 tps, lat 288.386 ms stddev 76.373
    progress: 180.0 s, 3683.1 tps, lat 271.840 ms stddev 75.712
    progress: 240.0 s, 3444.3 tps, lat 289.909 ms stddev 69.564
    progress: 300.0 s, 3475.8 tps, lat 287.736 ms stddev 73.712
    progress: 360.0 s, 3449.5 tps, lat 289.832 ms stddev 71.878
    progress: 420.0 s, 3518.1 tps, lat 284.432 ms stddev 74.276
    progress: 480.0 s, 3430.7 tps, lat 291.359 ms stddev 73.264
    progress: 540.0 s, 3515.7 tps, lat 284.522 ms stddev 73.206
    progress: 600.0 s, 3482.9 tps, lat 287.037 ms stddev 71.649
    transaction type: <builtin: TPC-B (sort of)>
    scaling factor: 10000
    query mode: prepared
    number of clients: 1000
    number of threads: 1000
    duration: 600 s
    number of transactions actually processed: 2090702
    latency average = 286.030 ms
    latency stddev = 74.245 ms
    tps = 3481.731730 (including connections establishing)
    tps = 3494.157830 (excluding connections establishing)
  • sysbench test:

    sysbench 0.5:  multi-threaded system evaluation benchmark
    
    Running the test with following options:
    Number of threads: 1000
    Report intermediate results every 60 second(s)
    Random number generator seed is 0 and will be ignored
    
    Forcing shutdown in 630 seconds
    
    Initializing worker threads...
    
    Threads started!
    
    [  60s] threads: 1000, tps: 4809.05, reads: 0.00, writes: 19301.02, response time: 288.03ms (95%), errors: 0.05, reconnects:  0.00
    [ 120s] threads: 1000, tps: 5264.15, reads: 0.00, writes: 21005.40, response time: 255.23ms (95%), errors: 0.08, reconnects:  0.00
    [ 180s] threads: 1000, tps: 5178.27, reads: 0.00, writes: 20713.07, response time: 260.40ms (95%), errors: 0.03, reconnects:  0.00
    [ 240s] threads: 1000, tps: 5145.95, reads: 0.00, writes: 20610.08, response time: 255.76ms (95%), errors: 0.05, reconnects:  0.00
    [ 300s] threads: 1000, tps: 5127.92, reads: 0.00, writes: 20507.98, response time: 264.24ms (95%), errors: 0.05, reconnects:  0.00
    [ 360s] threads: 1000, tps: 5063.83, reads: 0.00, writes: 20278.10, response time: 268.55ms (95%), errors: 0.05, reconnects:  0.00
    [ 420s] threads: 1000, tps: 5057.51, reads: 0.00, writes: 20237.28, response time: 269.19ms (95%), errors: 0.10, reconnects:  0.00
    [ 480s] threads: 1000, tps: 5036.32, reads: 0.00, writes: 20139.29, response time: 279.62ms (95%), errors: 0.10, reconnects:  0.00
    [ 540s] threads: 1000, tps: 5115.25, reads: 0.00, writes: 20459.05, response time: 264.64ms (95%), errors: 0.08, reconnects:  0.00
    [ 600s] threads: 1000, tps: 5124.89, reads: 0.00, writes: 20510.07, response time: 265.43ms (95%), errors: 0.10, reconnects:  0.00
    OLTP test statistics:
        queries performed:
            read:                            0
            write:                           12225686
            other:                           6112822
            total:                           18338508
        transactions:                        3056390 (5093.75 per sec.)
        read/write requests:                 12225686 (20375.20 per sec.)
        other operations:                    6112822 (10187.57 per sec.)
        ignored errors:                      42     (0.07 per sec.)
        reconnects:                          0      (0.00 per sec.)
    
    General statistics:
        total time:                          600.0277s
        total number of events:              3056390
        total time taken by event execution: 600005.2104s
        response time:
             min:                                  9.57ms
             avg:                                196.31ms
             max:                                608.70ms
             approx.  95 percentile:             268.71ms
    
    Threads fairness:
        events (avg/stddev):           3056.3900/67.44
        execution time (avg/stddev):   600.0052/0.01

Metrics Collected

Cloudwatch Metrics
Performance Insights - Counter Metrics
Performance Insights - Database Load by Waits

Final Thoughts

  • Users are limited to predefined instance sizes. As a downside, if the benchmark shows that the instance could benefit from additional memory, it is not possible to “just add more RAM”. Adding more memory translates to increasing the instance size, which comes at a higher cost (the cost doubles with each instance size).
  • The Amazon Aurora storage engine is much different from RDS’s, being built on top of SAN hardware. The per-instance I/O throughput metrics show that the test did not even get close to the 1,750 MiB/s maximum of the provisioned IOPS SSD EBS volumes.
  • Further tuning can be performed by reviewing the AWS PostgreSQL Events included in the Performance Insights graphs.

Next in Series

Stay tuned for the next part: Amazon RDS for PostgreSQL 10.6.

High Availability on a Shoestring Budget - Deploying a Minimal Two Node MySQL Galera Cluster


We regularly get questions about how to set up a Galera cluster with just 2 nodes.

The documentation clearly states you should have at least 3 Galera nodes to avoid network partitioning. But there are some valid reasons for considering a 2 node deployment, e.g., if you want to achieve database high availability but have a limited budget to spend on a third database node. Or perhaps you are running Galera in a development/sandbox environment and prefer a minimal setup.

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes, so in a 2-node system, losing one node means there is no majority, resulting in split brain. Fortunately, it is possible to add garbd (Galera Arbitrator Daemon), a lightweight stateless daemon that can act as the odd node. Arbitrator failure does not affect cluster operations, and a new instance can be reattached to the cluster at any time. There can be several arbitrators in the cluster.

ClusterControl has support for deploying garbd on non-database hosts.

Normally a Galera cluster needs at least three hosts to be fully functional; however, at deploy time, two nodes suffice to create a primary component. Here are the steps:

  1. Deploy a Galera cluster of two nodes,
  2. After the cluster has been deployed by ClusterControl, add garbd on the ClusterControl node.

You should end up with the below setup:

Deploy the Galera Cluster

Go to the ClusterControl Deploy section to deploy the cluster.

After selecting the technology that we want to deploy, we must specify User, Key or Password and port to connect by SSH to our hosts. We also need the name for our new cluster and if we want ClusterControl to install the corresponding software and configurations for us.

After setting up the SSH access information, we must select vendor/version and we must define the database admin password, datadir and port. We can also specify which repository to use.

Even though ClusterControl warns you that a Galera cluster needs an odd number of nodes, only add two nodes to the cluster.

Deploying a Galera cluster will trigger a ClusterControl job which can be monitored at the Jobs page.


Install Garbd

Once deployment is complete, install garbd on the ClusterControl host. We have the option to deploy garbd from ClusterControl, but this option won’t work if we want to deploy it on the ClusterControl server itself. This is to avoid issues related to database versions and package dependencies.

So, we must install it manually, and then import garbd to ClusterControl.

Let’s see the manual installation of Percona Garbd on CentOS 7.

Create the Percona repository file:

$ vi /etc/yum.repos.d/percona.repo
[percona-release-$basearch]
name = Percona-Release YUM repository - $basearch
baseurl = http://repo.percona.com/release/$releasever/RPMS/$basearch
enabled = 1
gpgcheck = 0
[percona-release-noarch]
name = Percona-Release YUM repository - noarch
baseurl = http://repo.percona.com/release/$releasever/RPMS/noarch
enabled = 1
gpgcheck = 0
[percona-release-source]
name = Percona-Release YUM repository - Source packages
baseurl = http://repo.percona.com/release/$releasever/SRPMS
enabled = 0
gpgcheck = 0

Then, install the Percona XtraDB Cluster garbd package:

$ yum install Percona-XtraDB-Cluster-garbd-57

Now, we need to configure garbd. For this, we need to edit the /etc/sysconfig/garb file:

$ vi /etc/sysconfig/garb
# Copyright (C) 2012 Codership Oy
# This config file is to be sourced by garb service script.
# A comma-separated list of node addresses (address[:port]) in the cluster
GALERA_NODES="192.168.100.192:4567,192.168.100.193:4567"
# Galera cluster name, should be the same as on the rest of the nodes.
GALERA_GROUP="Galera1"
# Optional Galera internal options string (e.g. SSL settings)
# see http://galeracluster.com/documentation-webpages/galeraparameters.html
# GALERA_OPTIONS=""
# Log file for garbd. Optional, by default logs to syslog
# Deprecated for CentOS7, use journalctl to query the log for garbd
# LOG_FILE=""

Change the GALERA_NODES and GALERA_GROUP parameters according to the configuration of your Galera nodes. We also need to remove the line # REMOVE THIS AFTER CONFIGURATION before starting the service.

And now, we can start the garb service:

$ service garb start
Redirecting to /bin/systemctl start garb.service
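
To verify that the arbitrator has actually joined, you can check the cluster size from any of the two database nodes; with garbd connected it should report 3:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';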

Now, we can import the new garbd into ClusterControl.

Go to ClusterControl -> Select Cluster -> Add Load Balancer.

Then, select Garbd and Import Garbd section.

Here we only need to specify the hostname or IP Address and the port of the new Garbd.

Importing garbd will trigger a ClusterControl job which can be monitored at the Jobs page. Once completed, you can verify garbd is running with a green tick icon at the top bar:

That’s it!

Our minimal two-node Galera cluster is now ready!


How to Manage MySQL - for Oracle DBAs


Open source databases are quickly becoming mainstream, so migration from proprietary engines to open source engines is something of an industry trend now. It also means that we DBAs often end up having multiple database backends to manage.

In the past few blog posts, my colleague Paul Namuag and I covered several aspects of migration from Oracle to Percona, MariaDB, and MySQL. The obvious goal of the migration is to get your application up and running more efficiently in the new database environment; however, it’s crucial to ensure that your staff is ready to support it.

This blog covers the basic operations of MySQL with reference to similar tasks that you would perform daily in your Oracle environment. It provides you with a deep dive on different topics to save you time as you can relate to Oracle knowledge that you’ve already built over the years.

We will also talk about external command line tools that are missing in the default MySQL installation but are needed to perform daily operations efficiently. The open source version doesn’t come with the equivalent of Oracle Cloud Control, for instance, so do check out ClusterControl if you are looking for something similar.

In this blog, we are assuming you have a better knowledge of Oracle than of MySQL and hence would like to know the correlation between the two. The examples are based on the Linux platform; however, you can find many similarities when managing MySQL on Windows.

How do I connect to MySQL?

Let’s start our journey with a very (seemingly) basic task. Actually, this is a kind of task which can cause some confusion due to different login concepts in Oracle and MySQL.

The equivalent of the sqlplus / as sysdba connection is the “mysql” terminal command with the flag -u root. In the MySQL world, the superuser is called root. MySQL database users (including root) are defined by a name and the host from which they can connect.

The information about users and the hosts from which they can connect is stored in the mysql.user table. With a connection attempt, MySQL checks if the client host, username and password match a row in that metadata table.

This is a bit of a different approach than in Oracle where we have a user name and password only, but those who are familiar with Oracle Connection Manager might find some similarities.

You will not find predefined TNS entries like in Oracle. Usually, for an admin connection, we need a user, a password and the -h host flag. The default port is 3306 (like 1521 in Oracle), but this may vary between setups.
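
For example, a typical remote admin connection would look something like this (the user and hostname are just placeholders):

mysql -u admin_user -p -h db1.example.com -P 3306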

By default, many installations will have root access connection from any machine (root@’%’) blocked, so you have to log in to the server hosting MySQL, typically via ssh.

Type the following:

mysql -u root

When the root password is not set this is enough. If the password is required then you should add the flag -p.

mysql -u root -p

You are now logged in to the mysql client (the equivalent of sqlplus) and will see a prompt, typically 'mysql>'.

Is MySQL up and running?

You can use the MySQL service startup script to find out if it is running, or use the ps command to see if the mysql processes are up. Another alternative is mysqladmin, a utility used for performing administrative operations.

mysqladmin -u root -p status

On Debian:

/etc/init.d/mysql status

If you are using RedHat or Fedora then you can use the following script:

service mysqld status

Or

/etc/init.d/mysqld status

Or

systemctl status mysql.service

On MariaDB instances, you should look for the MariaDB service name.

systemctl status mariadb

What’s in this database?

Like in Oracle, you can query the metadata to get information about database objects.

It’s common to use some shortcuts here, commands that help you to list objects or get DDL of the objects.

show databases;
use database_name;
show tables;
show table status;
show index from table_name;
show create table table_name;

Similar to Oracle you can describe the table:

desc table_name;

Where is my data stored?

There is no dedicated internal storage like ASM in MySQL. All data files are placed in the regular OS mount points. With a default installation, you can find your data in:

/var/lib/mysql

The location is based on the variable datadir.

root@mysql-3:~# cat /etc/mysql/my.cnf | grep datadir
datadir=/var/lib/mysql
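
You can also check it from within the mysql client:

mysql> SHOW VARIABLES LIKE 'datadir';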

You will see there a directory for each database.

Depending on the version and storage engine (yes there are a few here), the database’s directory may contain files of the format *.frm, which define the structure of each table within the database. For MyISAM tables, the data (*.MYD) and indexes (*.MYI) are stored within this directory also.

InnoDB tables are stored in InnoDB tablespaces, each of which consists of one or more files, similar to Oracle tablespaces. In a default installation, all InnoDB data and indexes for all databases on a MySQL server are held in one tablespace, consisting of one file: /var/lib/mysql/ibdata1. In most setups, you don’t manage tablespaces like in Oracle. The best practice is to keep them with autoextend on and max size unlimited.

root@mysql-3:~# cat /etc/mysql/my.cnf | grep innodb-data-file-path
innodb-data-file-path = ibdata1:100M:autoextend

InnoDB has log files, which are the equivalent of Oracle redo logs, allowing automatic crash recovery. By default there are two log files: /var/lib/mysql/ib_logfile0 and /var/lib/mysql/ib_logfile1. Undo data is held within the tablespace file.

root@galera-3:/var/lib/mysql# ls -rtla | grep logfile
-rw-rw----  1 mysql mysql  268435456 Dec 15 00:59 ib_logfile1
-rw-rw----  1 mysql mysql  268435456 Mar  6 11:45 ib_logfile0

Where is the metadata information?

There are no dba_*, user_*, all_* type of views but MySQL has internal metadata views.

Information_schema is defined in the SQL 2003 standard and is implemented by other major databases, e.g. SQL Server, PostgreSQL.

Since MySQL 5.0, the information_schema database has been available, containing data dictionary information. The information itself was actually stored in external .frm files. Finally, after many years, .frm files are gone in version 8.0. The metadata is still visible in the information_schema database but is now stored using the InnoDB storage engine.

To see all actual views contained in the data dictionary within the mysql client, switch to information_schema database:

use information_schema;
show tables;
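
For example, to get a per-table overview of a schema (somewhat like querying DBA_TABLES or DBA_SEGMENTS in Oracle), you could run something along these lines, with 'test' used here as a placeholder schema name:

SELECT table_name, engine, table_rows, data_length, index_length
FROM information_schema.tables
WHERE table_schema = 'test';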

You can find additional information in the mysql database, which contains information about databases, events (MySQL jobs), plugins, replication, users, etc.

The number of views depends on the version and vendor.


Select * from v$session

Oracle’s select * from v$session is represented here by the command SHOW PROCESSLIST, which shows the list of threads.

mysql> SHOW PROCESSLIST;
+---------+------------------+------------------+--------------------+---------+--------+--------------------+------------------+-----------+---------------+
| Id      | User             | Host             | db                 | Command | Time   | State              | Info             | Rows_sent | Rows_examined |
+---------+------------------+------------------+--------------------+---------+--------+--------------------+------------------+-----------+---------------+
|       1 | system user      |                  | NULL               | Sleep   | 469264 | wsrep aborter idle | NULL             |         0 |             0 |
|       2 | system user      |                  | NULL               | Sleep   | 469264 | NULL               | NULL             |         0 |             0 |
|       3 | system user      |                  | NULL               | Sleep   | 469257 | NULL               | NULL             |         0 |             0 |
|       4 | system user      |                  | NULL               | Sleep   | 469257 | NULL               | NULL             |         0 |             0 |
|       6 | system user      |                  | NULL               | Sleep   | 469257 | NULL               | NULL             |         0 |             0 |
|      16 | maxscale         | 10.0.3.168:5914  | NULL               | Sleep   |      5 |                    | NULL             |         4 |             4 |
|      59 | proxysql-monitor | 10.0.3.168:6650  | NULL               | Sleep   |      7 |                    | NULL             |         0 |             0 |
|      81 | proxysql-monitor | 10.0.3.78:62896  | NULL               | Sleep   |      6 |                    | NULL             |         0 |             0 |
|    1564 | proxysql-monitor | 10.0.3.78:25064  | NULL               | Sleep   |      3 |                    | NULL             |         0 |             0 |
| 1822418 | cmon             | 10.0.3.168:41202 | information_schema | Sleep   |      0 |                    | NULL             |         0 |             8 |
| 1822631 | cmon             | 10.0.3.168:43254 | information_schema | Sleep   |      4 |                    | NULL             |         1 |             1 |
| 1822646 | cmon             | 10.0.3.168:43408 | information_schema | Sleep   |      0 |                    | NULL             |       464 |           464 |
| 2773260 | backupuser       | localhost        | mysql              | Query   |      0 | init               | SHOW PROCESSLIST |         0 |             0 |
+---------+------------------+------------------+--------------------+---------+--------+--------------------+------------------+-----------+---------------+


13 rows in set (0.00 sec)

It is based on information stored in the information_schema.processlist view. Viewing the threads of other users requires the PROCESS privilege. The view can also help you to check if you are running out of the maximum number of allowed connections.
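
For example, to focus on long-running statements only - something you might otherwise do against v$session - you can query the view directly. A minimal sketch, assuming the PROCESS privilege:

-- Non-sleeping sessions running for more than 10 seconds
SELECT id, user, host, db, command, time, state, LEFT(info, 100) AS query
FROM information_schema.processlist
WHERE command <> 'Sleep' AND time > 10
ORDER BY time DESC;

-- Current connections vs. the configured limit
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL VARIABLES LIKE 'max_connections';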

Where is an alert log?

The error log location is defined in my.cnf or can be found via the show variables command.

mysql> show variables like 'log_error';
+---------------+--------------------------+
| Variable_name | Value                    |
+---------------+--------------------------+
| log_error     | /var/lib/mysql/error.log |
+---------------+--------------------------+
1 row in set (0.00 sec)

Where is the list of the users and their permissions?

The information about users is stored in the mysql.user table, while the grants are stored in several places.

MySQL user access is defined in:

mysql.user, mysql.db, mysql.tables_priv, mysql.columns_priv

The preferable way to list grants is to use pt-show-grants, a tool from the Percona Toolkit (a must-have for every MySQL DBA).

pt-show-grants --host localhost --user root --ask-pass

Alternatively, you can use the following query (created by Calvaldo):

SELECT
    CONCAT("`",gcl.Db,"`") AS 'Database(s) Affected',
    CONCAT("`",gcl.Table_name,"`") AS 'Table(s) Affected',
    gcl.User AS 'User-Account(s) Affected',
    IF(gcl.Host='%','ALL',gcl.Host) AS 'Remote-IP(s) Affected',
    CONCAT("GRANT ",UPPER(gcl.Column_priv)," (",GROUP_CONCAT(gcl.Column_name),") ",
                 "ON `",gcl.Db,"`.`",gcl.Table_name,"` ",
                 "TO '",gcl.User,"'@'",gcl.Host,"';") AS 'GRANT Statement (Reconstructed)'
FROM mysql.columns_priv gcl
GROUP BY CONCAT(gcl.Db,gcl.Table_name,gcl.User,gcl.Host)
/* SELECT * FROM mysql.columns_priv */

UNION

/* [Database.Table]-Specific Grants */
SELECT
    CONCAT("`",gtb.Db,"`") AS 'Database(s) Affected',
    CONCAT("`",gtb.Table_name,"`") AS 'Table(s) Affected',
    gtb.User AS 'User-Account(s) Affected',
    IF(gtb.Host='%','ALL',gtb.Host) AS 'Remote-IP(s) Affected',
    CONCAT(
        "GRANT ",UPPER(gtb.Table_priv),"",
        "ON `",gtb.Db,"`.`",gtb.Table_name,"` ",
        "TO '",gtb.User,"'@'",gtb.Host,"';"
    ) AS 'GRANT Statement (Reconstructed)'
FROM mysql.tables_priv gtb
WHERE gtb.Table_priv!=''
/* SELECT * FROM mysql.tables_priv */

UNION

/* Database-Specific Grants */
SELECT
    CONCAT("`",gdb.Db,"`") AS 'Database(s) Affected',
    "ALL" AS 'Table(s) Affected',
    gdb.User AS 'User-Account(s) Affected',
    IF(gdb.Host='%','ALL',gdb.Host) AS 'Remote-IP(s) Affected',
    CONCAT(
        'GRANT ',
        CONCAT_WS(',',
            IF(gdb.Select_priv='Y','SELECT',NULL),
            IF(gdb.Insert_priv='Y','INSERT',NULL),
            IF(gdb.Update_priv='Y','UPDATE',NULL),
            IF(gdb.Delete_priv='Y','DELETE',NULL),
            IF(gdb.Create_priv='Y','CREATE',NULL),
            IF(gdb.Drop_priv='Y','DROP',NULL),
            IF(gdb.Grant_priv='Y','GRANT',NULL),
            IF(gdb.References_priv='Y','REFERENCES',NULL),
            IF(gdb.Index_priv='Y','INDEX',NULL),
            IF(gdb.Alter_priv='Y','ALTER',NULL),
            IF(gdb.Create_tmp_table_priv='Y','CREATE TEMPORARY TABLES',NULL),
            IF(gdb.Lock_tables_priv='Y','LOCK TABLES',NULL),
            IF(gdb.Create_view_priv='Y','CREATE VIEW',NULL),
            IF(gdb.Show_view_priv='Y','SHOW VIEW',NULL),
            IF(gdb.Create_routine_priv='Y','CREATE ROUTINE',NULL),
            IF(gdb.Alter_routine_priv='Y','ALTER ROUTINE',NULL),
            IF(gdb.Execute_priv='Y','EXECUTE',NULL),
            IF(gdb.Event_priv='Y','EVENT',NULL),
            IF(gdb.Trigger_priv='Y','TRIGGER',NULL)
        ),
        " ON `",gdb.Db,"`.* TO '",gdb.User,"'@'",gdb.Host,"';"
    ) AS 'GRANT Statement (Reconstructed)'
FROM mysql.db gdb
WHERE gdb.Db != ''
/* SELECT * FROM mysql.db */

UNION

/* User-Specific Grants */
SELECT
    "ALL" AS 'Database(s) Affected',
    "ALL" AS 'Table(s) Affected',
    gus.User AS 'User-Account(s) Affected',
    IF(gus.Host='%','ALL',gus.Host) AS 'Remote-IP(s) Affected',
    CONCAT(
        "GRANT ",
        IF((gus.Select_priv='N')&(gus.Insert_priv='N')&(gus.Update_priv='N')&(gus.Delete_priv='N')&(gus.Create_priv='N')&(gus.Drop_priv='N')&(gus.Reload_priv='N')&(gus.Shutdown_priv='N')&(gus.Process_priv='N')&(gus.File_priv='N')&(gus.References_priv='N')&(gus.Index_priv='N')&(gus.Alter_priv='N')&(gus.Show_db_priv='N')&(gus.Super_priv='N')&(gus.Create_tmp_table_priv='N')&(gus.Lock_tables_priv='N')&(gus.Execute_priv='N')&(gus.Repl_slave_priv='N')&(gus.Repl_client_priv='N')&(gus.Create_view_priv='N')&(gus.Show_view_priv='N')&(gus.Create_routine_priv='N')&(gus.Alter_routine_priv='N')&(gus.Create_user_priv='N')&(gus.Event_priv='N')&(gus.Trigger_priv='N')&(gus.Create_tablespace_priv='N')&(gus.Grant_priv='N'),
            "USAGE",
            IF((gus.Select_priv='Y')&(gus.Insert_priv='Y')&(gus.Update_priv='Y')&(gus.Delete_priv='Y')&(gus.Create_priv='Y')&(gus.Drop_priv='Y')&(gus.Reload_priv='Y')&(gus.Shutdown_priv='Y')&(gus.Process_priv='Y')&(gus.File_priv='Y')&(gus.References_priv='Y')&(gus.Index_priv='Y')&(gus.Alter_priv='Y')&(gus.Show_db_priv='Y')&(gus.Super_priv='Y')&(gus.Create_tmp_table_priv='Y')&(gus.Lock_tables_priv='Y')&(gus.Execute_priv='Y')&(gus.Repl_slave_priv='Y')&(gus.Repl_client_priv='Y')&(gus.Create_view_priv='Y')&(gus.Show_view_priv='Y')&(gus.Create_routine_priv='Y')&(gus.Alter_routine_priv='Y')&(gus.Create_user_priv='Y')&(gus.Event_priv='Y')&(gus.Trigger_priv='Y')&(gus.Create_tablespace_priv='Y')&(gus.Grant_priv='Y'),
                "ALL PRIVILEGES",
                CONCAT_WS(',',
                    IF(gus.Select_priv='Y','SELECT',NULL),
                    IF(gus.Insert_priv='Y','INSERT',NULL),
                    IF(gus.Update_priv='Y','UPDATE',NULL),
                    IF(gus.Delete_priv='Y','DELETE',NULL),
                    IF(gus.Create_priv='Y','CREATE',NULL),
                    IF(gus.Drop_priv='Y','DROP',NULL),
                    IF(gus.Reload_priv='Y','RELOAD',NULL),
                    IF(gus.Shutdown_priv='Y','SHUTDOWN',NULL),
                    IF(gus.Process_priv='Y','PROCESS',NULL),
                    IF(gus.File_priv='Y','FILE',NULL),
                    IF(gus.References_priv='Y','REFERENCES',NULL),
                    IF(gus.Index_priv='Y','INDEX',NULL),
                    IF(gus.Alter_priv='Y','ALTER',NULL),
                    IF(gus.Show_db_priv='Y','SHOW DATABASES',NULL),
                    IF(gus.Super_priv='Y','SUPER',NULL),
                    IF(gus.Create_tmp_table_priv='Y','CREATE TEMPORARY TABLES',NULL),
                    IF(gus.Lock_tables_priv='Y','LOCK TABLES',NULL),
                    IF(gus.Execute_priv='Y','EXECUTE',NULL),
                    IF(gus.Repl_slave_priv='Y','REPLICATION SLAVE',NULL),
                    IF(gus.Repl_client_priv='Y','REPLICATION CLIENT',NULL),
                    IF(gus.Create_view_priv='Y','CREATE VIEW',NULL),
                    IF(gus.Show_view_priv='Y','SHOW VIEW',NULL),
                    IF(gus.Create_routine_priv='Y','CREATE ROUTINE',NULL),
                    IF(gus.Alter_routine_priv='Y','ALTER ROUTINE',NULL),
                    IF(gus.Create_user_priv='Y','CREATE USER',NULL),
                    IF(gus.Event_priv='Y','EVENT',NULL),
                    IF(gus.Trigger_priv='Y','TRIGGER',NULL),
                    IF(gus.Create_tablespace_priv='Y','CREATE TABLESPACE',NULL)
                )
            )
        ),
        " ON *.* TO '",gus.User,"'@'",gus.Host,"' REQUIRE ",
        CASE gus.ssl_type
            WHEN 'ANY' THEN
                "SSL "
            WHEN 'X509' THEN
                "X509 "
            WHEN 'SPECIFIED' THEN
                CONCAT_WS("AND ",
                    IF((LENGTH(gus.ssl_cipher)>0),CONCAT("CIPHER '",CONVERT(gus.ssl_cipher USING utf8),"'"),NULL),
                    IF((LENGTH(gus.x509_issuer)>0),CONCAT("ISSUER '",CONVERT(gus.x509_issuer USING utf8),"'"),NULL),
                    IF((LENGTH(gus.x509_subject)>0),CONCAT("SUBJECT '",CONVERT(gus.x509_subject USING utf8),"'"),NULL)
                )
            ELSE "NONE "
        END,
        "WITH ",
        IF(gus.Grant_priv='Y',"GRANT OPTION ",""),
        "MAX_QUERIES_PER_HOUR ",gus.max_questions,"",
        "MAX_CONNECTIONS_PER_HOUR ",gus.max_connections,"",
        "MAX_UPDATES_PER_HOUR ",gus.max_updates,"",
        "MAX_USER_CONNECTIONS ",gus.max_user_connections,
        ";"
    ) AS 'GRANT Statement (Reconstructed)'
FROM mysql.user gus;

How to create a MySQL user

The ‘create user’ procedure is similar to Oracle. The simplest example could be:

CREATE USER 'username'@'hostname' IDENTIFIED BY 'password';
GRANT privilege_name ON *.* TO 'username'@'hostname';

The option to grant and create in one line with:

GRANT privilege_name  ON *.* TO 'username'@'hostname' identified by 'password';

has been removed in MySQL 8.0.
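
In 8.0 the two-step form above is the way to go, and you can verify the result with SHOW GRANTS. A small sketch using a hypothetical account and schema:

CREATE USER 'app_user'@'10.0.3.%' IDENTIFIED BY 'S3cretPassw0rd';
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app_user'@'10.0.3.%';
SHOW GRANTS FOR 'app_user'@'10.0.3.%';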

How do I start and stop MySQL?

You can stop and start MySQL using the system's service manager.

The actual command depends on the Linux distribution and the service name.

Below you can find an example with the service name mysqld.

Ubuntu

/etc/init.d/mysqld start 
/etc/init.d/mysqld stop 
/etc/init.d/mysqld restart

RedHat/Centos

service mysqld start 
service mysqld stop 
service mysqld restart
systemctl start mysqld.service
systemctl stop mysqld.service
systemctl restart mysqld.service

Where is the MySQL Server Configuration data?

The configuration is stored in the my.cnf file.

Until version 8.0, any dynamic setting change that should remain after a restart required a manual update of the my.cnf file. Since 8.0, similar to Oracle's scope=both, you can change values persistently using the SET PERSIST option.

mysql> SET PERSIST max_connections = 1000;
mysql> SET @@PERSIST.max_connections = 1000;
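
To check what has been persisted (and possibly undo it), MySQL 8.0 writes the values to mysqld-auto.cnf in the datadir and exposes them in performance_schema. A short sketch:

-- List settings written by SET PERSIST
SELECT * FROM performance_schema.persisted_variables;

-- Remove a single persisted setting, or all of them
RESET PERSIST max_connections;
RESET PERSIST;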

For older versions use:

mysql> SET GLOBAL max_connections = 1000;
$ vi /etc/mysql/my.cnf
max_connections = 1000

How do I backup MySQL?

There are two ways to execute a mysql backup.

For smaller databases or smaller selective backups, you can use the mysqldump command.

Database backup with mysqldump (logical backup):

mysqldump -uuser -p --databases db_name --routines --events --single-transaction | gzip > db_name_backup.sql.gz

xtrabackup, mariabackup (hot binary backup)

The preferable method is to use xtrabackup or mariabackup, external tools to run hot binary backups.

Oracle offers hot binary backup in the paid version called MySQL Enterprise Edition.

mariabackup --user=root --password=PASSWORD --backup --target-dir=/u01/backups/

Stream backup to other server

Start a listener on the external server on the preferable port (in this example 1984)

nc -l 1984 | pigz -cd - | pv | xbstream -x -C /u01/backups

Run backup and transfer to external host

innobackupex --user=root --password=PASSWORD --stream=xbstream /var/tmp | pigz  | pv | nc external_host.com 1984

Copy user permission

It's often necessary to copy user permissions and transfer them to other servers.

The recommended way to do this is to use pt-show-grants.

pt-show-grants > /u01/backups/grants.sql

How do I restore MySQL?

Logical backup restore

mysqldump creates an SQL file, which can be executed with the source command.

To keep the log file of the execution, use the tee command.

mysql> tee dump.log
mysql> source mysqldump.sql

Binary backup restore (xtrabackup/mariabackup)

To restore MySQL from a binary backup, you need to first prepare the backup by applying the log files, and then copy the files back to the MySQL data directory.

You can compare this process to restore and recover in Oracle.

innobackupex --apply-log --use-memory=[values in MB or GB] /var/lib/data
xtrabackup --copy-back --target-dir=/var/lib/data

Hopefully, these tips give a good overview of how to perform basic administrative tasks.

HA for MySQL and MariaDB - Comparing Master-Master Replication to Galera Cluster


Galera replication is relatively new compared to MySQL replication, which has been natively supported since MySQL v3.23. Although MySQL replication is designed for master-slave unidirectional replication, it can be configured as an active master-master setup with bidirectional replication. While it is easy to set up, and some use cases might benefit from this “hack”, there are a number of caveats. On the other hand, Galera cluster is a different type of technology to learn and manage. Is it worth it?

In this blog post, we are going to compare master-master replication to Galera cluster.

Replication Concepts

Before we jump into the comparison, let’s explain the basic concepts behind these two replication mechanisms.

Generally, any modification to the MySQL database generates an event in binary format. This event is transported to the other nodes depending on the replication method chosen - MySQL replication (native) or Galera replication (patched with wsrep API).

MySQL Replication

The following diagram illustrates the data flow of a successful transaction from one node to another when using MySQL replication:

The binary event is written into the master's binary log. The slave(s), via the slave_IO_thread, will pull the binary events from the master's binary log and copy them into the relay log. The slave_SQL_thread will then apply the events from the relay log asynchronously. Due to the asynchronous nature of replication, the slave server is not guaranteed to have the data when the master performs the change.

Ideally, with MySQL replication the slave should be configured as a read-only server by setting read_only=ON or super_read_only=ON. This is a precaution to protect the slave from accidental writes which can lead to data inconsistency or failure during master failover (e.g., errant transactions). However, in a master-master active-active replication setup, read-only has to be disabled on the other master to allow writes to be processed simultaneously. The primary master must also be configured to replicate from the secondary master by using the CHANGE MASTER statement to enable circular replication.

Galera Replication

The following diagram illustrates the data replication flow of a successful transaction from one node to another for Galera Cluster:

The event is encapsulated in a writeset and broadcasted from the originator node to the other nodes in the cluster by using Galera replication. The writeset undergoes certification on every Galera node and if it passes, the applier threads will apply the writeset asynchronously. This means that the slave server will eventually become consistent, after agreement of all participating nodes in global total ordering. It is logically synchronous, but the actual writing and committing to the tablespace happens independently, and thus asynchronously on each node with a guarantee for the change to propagate on all nodes.

Avoiding Primary Key Collision

In order to deploy MySQL replication in a master-master setup, one has to adjust the auto increment values to avoid primary key collisions for INSERTs between two or more replicating masters. This allows the primary key values on the masters to interleave with each other and prevents the same auto increment number from being used twice on either of the nodes. This behaviour must be configured manually, depending on the number of masters in the replication setup. The value of auto_increment_increment should equal the number of replicating masters, and the auto_increment_offset must be unique between them. For example, the following lines should exist inside the corresponding my.cnf:

Master1:

log-slave-updates
auto_increment_increment=2
auto_increment_offset=1

Master2:

log-slave-updates
auto_increment_increment=2
auto_increment_offset=2

Likewise, Galera Cluster uses the same trick to avoid primary key collisions by controlling the auto increment value and offset automatically with the wsrep_auto_increment_control variable. If set to 1 (the default), it will automatically adjust the auto_increment_increment and auto_increment_offset variables according to the size of the cluster, and whenever the cluster size changes. This avoids replication conflicts due to auto_increment. In a master-slave environment, this variable can be set to OFF.
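
You can see what Galera picked on a given node by checking the variables directly. A quick sketch (the values depend on the cluster size and the node's position):

SHOW GLOBAL VARIABLES LIKE 'wsrep_auto_increment_control';
SELECT @@auto_increment_increment, @@auto_increment_offset;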

The consequence of this configuration is the auto increment value will not be in sequential order, as shown in the following table of a three-node Galera Cluster:

Node   | auto_increment_increment | auto_increment_offset | Auto increment value
Node 1 | 3                        | 1                     | 1, 4, 7, 10, 13, 16...
Node 2 | 3                        | 2                     | 2, 5, 8, 11, 14, 17...
Node 3 | 3                        | 3                     | 3, 6, 9, 12, 15, 18...

If an application performs insert operations on the following nodes in the following order:

  • Node1, Node3, Node2, Node3, Node3, Node1, Node3 ..

Then the primary key value that will be stored in the table will be:

  • 1, 6, 8, 9, 12, 13, 15 ..

Simply put, when using master-master replication (MySQL replication or Galera), your application must be able to tolerate non-sequential auto-increment values in its dataset.

For ClusterControl users, take note that it supports deployment of MySQL master-master replication with a limit of two masters per replication cluster, and only for an active-passive setup. Therefore, ClusterControl deliberately does not configure the masters with the auto_increment_increment and auto_increment_offset variables.

Data Consistency

Galera Cluster comes with its flow-control mechanism, where each node in the cluster must keep up when replicating, or otherwise all other nodes will have to slow down to allow the slowest node to catch up. This basically minimizes the probability of slave lag; it might still happen, just not as significantly as in MySQL replication. By default, Galera allows nodes to be up to 16 transactions behind in applying, controlled by the gcs.fc_limit variable. If you want to do critical reads (a SELECT that must return the most up to date information), you probably want to use the session variable wsrep_sync_wait.
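
A minimal sketch of a critical read on a Galera node: enable wsrep_sync_wait only for the statements that need it, since it adds latency (the accounts table below is just a placeholder):

-- Value 1 makes SELECTs wait until all received writesets are applied
SET SESSION wsrep_sync_wait = 1;
SELECT balance FROM accounts WHERE id = 42;
SET SESSION wsrep_sync_wait = 0;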

Galera Cluster, on the other hand, comes with a safeguard against data inconsistency, whereby a node will get evicted from the cluster if it fails to apply any writeset for whatever reason. For example, when a Galera node fails to apply a writeset due to an internal error in the underlying storage engine (MySQL/MariaDB), the node will pull itself out of the cluster with the following error:

150305 16:13:14 [ERROR] WSREP: Failed to apply trx 1 4 times
150305 16:13:14 [ERROR] WSREP: Node consistency compromized, aborting..

To fix the data consistency, the offending node has to be re-synced before it is allowed to join the cluster. This can be done manually or by wiping out the data directory to trigger snapshot state transfer (full syncing from a donor).

MySQL master-master replication does not enforce data consistency protection, and a slave is allowed to diverge, e.g., replicate a subset of data or lag behind, which makes the slave inconsistent with the master. It is designed to replicate data in one flow - from master down to the slaves. Data consistency checks have to be performed manually or via external tools like Percona Toolkit pt-table-checksum or mysql-replication-check.

Conflict Resolution

Generally, master-master (or multi-master, or bi-directional) replication allows more than one member in the cluster to process writes. With MySQL replication, in case of replication conflict, the slave's SQL thread simply stops applying the next query until the conflict is resolved, either by manually skipping the replication event, fixing the offending rows or resyncing the slave. Simply said, there is no automatic conflict resolution support for MySQL replication.

Galera Cluster provides a better alternative by retrying the offending transaction during replication. By using wsrep_retry_autocommit variable, one can instruct Galera to automatically retry a failed transaction due to cluster-wide conflicts, before returning an error to the client. If set to 0, no retries will be attempted, while a value of 1 (the default) or more specifies the number of retries attempted. This can be useful to assist applications using autocommit to avoid deadlocks.
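
For example, to let Galera retry a conflicting autocommit statement a few times before returning a deadlock error to the application (a sketch; pick a value suitable for your workload):

SET GLOBAL wsrep_retry_autocommit = 4;
SHOW GLOBAL VARIABLES LIKE 'wsrep_retry_autocommit';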


Node Consensus and Failover

Galera uses the Group Communication System (GCS) to check node consensus and availability between cluster members. If a node is unhealthy, it will be automatically evicted from the cluster after the gmcast.peer_timeout value, which defaults to 3 seconds. A healthy Galera node in "Synced" state is deemed a reliable node to serve reads and writes, while others are not. This design greatly simplifies health check procedures from the upper tiers (load balancer or application).
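
In practice, a load balancer or monitoring script only needs a couple of wsrep status counters to decide whether a node is usable. A minimal sketch:

-- 'Synced' plus wsrep_ready=ON indicates a node that can safely serve traffic
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'wsrep_ready';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';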

In MySQL replication, a master does not care about its slave(s), while a slave only has consensus with its sole master via the slave_IO_thread process when replicating the binary events from the master's binary log. If a master goes down, this will break the replication and an attempt to re-establish the link will be made every slave_net_timeout (default to 60 seconds). From the application or load balancer perspective, the health check procedures for a replication slave must at least involve checking the following state (a minimal check is sketched after the list):

  • Seconds_Behind_Master
  • Slave_IO_Running
  • Slave_SQL_Running
  • read_only variable
  • super_read_only variable (MySQL 5.7.8 and later)
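
A minimal sketch of such a check on the slave (recent MySQL/MariaDB versions also accept SHOW REPLICA STATUS):

-- Inspect Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master in the output
SHOW SLAVE STATUS\G

-- The slave should normally not accept application writes
-- (super_read_only is available in MySQL 5.7.8 and later)
SELECT @@global.read_only, @@global.super_read_only;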

In terms of failover, generally, master-master replication and Galera nodes are equal. They hold the same data set (albeit you can replicate a subset of data in MySQL replication, but that's uncommon for master-master) and share the same role as masters, capable of handling reads and writes simultaneously. Therefore, there is actually no failover from the database point-of-view due to this equilibrium. Only the application side requires failover logic to skip the non-operational nodes. Keep in mind that because MySQL replication is asynchronous, it is possible that not all of the changes done on the master will have propagated to the other master.

Node Provisioning

The process of bringing a node into sync with the cluster before replication starts is known as provisioning. In MySQL replication, provisioning a new node is a manual process. One has to take a backup of the master and restore it over to the new node before setting up the replication link. For an existing replication node, if the master's binary logs have been rotated (based on expire_logs_days, which defaults to 0, meaning no automatic removal), you may have to re-provision the node using this procedure. There are also external tools like Percona Toolkit pt-table-sync and ClusterControl to help you out on this. ClusterControl supports resyncing a slave with just two clicks. You have options to resync by taking a backup from the active master or an existing backup.
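
To check whether the binary logs a lagging slave still needs are available on the master, a quick sketch (MySQL 8.0 replaces expire_logs_days with binlog_expire_logs_seconds):

SHOW GLOBAL VARIABLES LIKE 'expire_logs_days';
SHOW BINARY LOGS;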

In Galera, there are two ways of doing this - incremental state transfer (IST) or state snapshot transfer (SST). The IST process is the preferred method, where only the missing transactions are transferred from a donor's cache. The SST process is similar to taking a full backup from the donor, and it is usually pretty resource intensive. Galera will automatically determine which syncing process to trigger based on the joiner's state. In most cases, if a node fails to join a cluster, simply wipe out the MySQL datadir of the problematic node and start the MySQL service. The Galera provisioning process is much simpler; it comes in very handy when scaling out your cluster or re-introducing a problematic node back into the cluster.

Loosely Coupled vs Tightly Coupled

MySQL replication works very well even across slower connections, and with connections that are not continuous. It can also be used across different hardware, environment and operating systems. Most storage engines support it, including MyISAM, Aria, MEMORY and ARCHIVE. This loosely coupled setup allows MySQL master-master replication to work well in a mixed environment with less restriction.

Galera nodes are tightly-coupled, where the replication performance is as fast as the slowest node. Galera uses a flow control mechanism to control replication flow among members and eliminate any slave lag. The replication can be all fast or all slow on every node and is adjusted automatically by Galera. Thus, it's recommended to use uniform hardware specs for all Galera nodes, especially with respect to CPU, RAM, disk subsystem, network interface card and network latency between nodes in the cluster.

Conclusions

In summary, Galera Cluster is superior if compared to MySQL master-master replication due to its synchronous replication support with strong consistency, plus more advanced features like automatic membership control, automatic node provisioning and multi-threaded slaves. Ultimately, this depends on how the application interacts with the database server. Some legacy applications built for a standalone database server may not work well on a clustered setup.

To simplify our points above, the following reasons justify when to use MySQL master-master replication:

  • Things that are not supported by Galera:
    • Replication for non-InnoDB/XtraDB tables like MyISAM, Aria, MEMORY or ARCHIVE.
    • XA transactions.
    • Statement-based replication between masters (e.g., when bandwidth is very expensive).
    • Relying on explicit locking like the LOCK TABLES statement.
    • Directing the general query log and the slow query log to a table instead of a file.
  • Loosely coupled setup where the hardware specs, software version and connection speed are significantly different on every master.
  • When you already have a MySQL replication chain and you want to add another active/backup master for redundancy to speed up failover and recovery time in case one of the masters is unavailable.
  • If your application can't be modified to work around Galera Cluster limitations and having a MySQL-aware load balancer like ProxySQL or MaxScale is not an option.

Reasons to pick Galera Cluster over MySQL master-master replication:

  • Ability to safely write to multiple masters.
  • Data consistency automatically managed (and guaranteed) across databases.
  • New database nodes easily introduced and synced.
  • Failures or inconsistencies automatically detected.
  • In general, more advanced and robust high availability features.

How to Migrate from Oracle DB to MariaDB

Wednesday, March 13, 2019 - 12:15

Watch this webinar replay as we walk you through all you need to know to plan and execute a successful migration from Oracle database to MariaDB.

Over the years MariaDB has gained Enterprise support and maturity to run critical and complex data transaction systems. With the recent version, MariaDB has added some great new features such as SQL_Mode=Oracle compatibility, making the transition process easier than ever before.

Whether you’re planning to migrate from Oracle database to MariaDB manually or with the help of a commercial tool to automate the entire migration process, you need to know all the possible bottlenecks and methods involved in the process and the results validation.

Migrating from Oracle database to MariaDB can come with a number of benefits: lower cost of ownership, access to and use of an open source database engine, tight integration with the web, wide circle of MariaDB database professionals and more.

Find out how it could benefit your organisation!

How to Migrate from Oracle DB to MariaDB


Migrating from Oracle database to MariaDB can come with a number of benefits: lower cost of ownership, access to and use of an open source database engine, tight integration with the web, an active community of MariaDB database users and more.

Over the years MariaDB has gained Enterprise support and maturity to run critical and complex data transaction systems. With the recent version, MariaDB has added some great new features such as SQL_Mode=Oracle compatibility, making the transition process easier than ever before.

Whether you’re planning to migrate from Oracle database to MariaDB manually or with the help of a commercial tool to automate the entire migration process, you need to know all the possible bottlenecks and methods involved in the process and the results validation.

Watch this webinar replay as we walk you through all you need to know to plan and execute a successful migration from Oracle database to MariaDB.

Agenda

  • A brief introduction to the platform
    • Oracle vs MariaDB
    • Platform support
    • Installation process
    • Database access
    • Backup process
    • Controlling query execution
    • Security
    • Replication options
  • Migration
    • Planning and development strategy
    • Assessment or preliminary check
    • Data type mapping
    • Migration tools
    • Migration process
    • Testing
  • Post-migration
    • Monitoring and Alerting
    • Performance Management
    • Backup Management
    • High availability
    • Upgrades
    • Scaling
    • Staff training

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA, with over 15 years experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

Watch replay

An Introduction to Database High Availability for MySQL & MariaDB


The following is an excerpt from our whitepaper “How to Design Highly Available Open Source Database Environments” which can be downloaded for free.


A Couple of Words on “High Availability”

These days high availability is a must for any serious deployment. Long gone are the days when you could schedule downtime of your database for several hours to perform maintenance. If your services are not available, you are losing customers and money. Therefore, making a database environment highly available typically has one of the highest priorities.

This poses a significant challenge to database administrators. First of all, how do you tell if your environment is highly available or not? How would you measure it? What are the steps you need to take in order to improve availability? How to design your setup to make it highly available from the beginning?

There are many HA solutions available in the MySQL (and MariaDB) ecosystem, but how do we know which ones we can trust? Some solutions might work under certain specific conditions, but might cause more trouble when applied outside of these conditions. Even a basic functionality like MySQL replication, which can be configured in many ways, can cause significant harm - for instance, circular replication with multiple writeable masters. Although it is easy to set up a ‘multi-master setup’ using replication, it can very easily break and leave us with diverging datasets on different servers. For a database, which is often considered the single source of truth, compromised data integrity can have catastrophic consequences.

In the following chapters, we’ll discuss the requirements for high availability in database
setups, and how to design the system from the ground up.

Measuring High Availability

What is high availability? To be able to decide if a given environment is highly available or not, one has to have some metrics for that. There are numerous ways you can measure high availability; we'll focus on some of the most basic ones.

First, though, let's think about what high availability is all about. What is its purpose? It is about making sure your environment serves its purpose. Purpose can be defined in many ways but, typically, it will be about delivering some service. In the database world, typically it's somewhat related to data. It could be serving data to your internal application. It can be to store data and make it queryable by analytical processes. It can be to store some data for your users, and provide it when requested on demand. Once we are clear about the purpose, we can establish the success factors involved. This will help us define what high availability means in our specific case.

SLA’s

A Service Level Agreement (SLA) is a definition of the service level you plan to provide to your customers. It is also quite common to define SLAs for internal services. An SLA helps customers to better understand what level of stability they can expect from a service they bought or are planning to buy. There are numerous metrics you can use when preparing an SLA, but typical ones are:

  • Availability of the service (percent)
  • Responsiveness of the service - latency (average, max, 95 percentile, 99 percentile)
  • Packet loss over the network (percent)
  • Throughput (average, minimum, 95 percentile, 99 percentile)
     

It can get more complex than that, though. In a sharded, multi-user environment you can define, let's say, your SLA as: “The service will be available 99.99% of the time, and downtime is declared when more than 2% of the users are affected. No incident can take more than 15 minutes to be resolved”. Such an SLA can also be extended to incorporate query response time: “downtime is declared if the 99 percentile of query latency exceeds 200 milliseconds”.

Nines

Availability is typically measured in “nines”; let us look into what exactly a given number of “nines” guarantees. The table below is taken from Wikipedia:

Availability %                    | Downtime per year | Downtime per month | Downtime per week | Downtime per day
90% ("one nine")                  | 36.5 days         | 72 hours           | 16.8 hours        | 2.4 hours
95% ("one and a half nines")      | 18.25 days        | 36 hours           | 8.4 hours         | 1.2 hours
97%                               | 10.96 days        | 21.6 hours         | 5.04 hours        | 43.2 min
98%                               | 7.30 days         | 14.4 hours         | 3.36 hours        | 28.8 min
99% ("two nines")                 | 3.65 days         | 7.20 hours         | 1.68 hours        | 14.4 min
99.5% ("two and a half nines")    | 1.83 days         | 3.60 hours         | 50.4 min          | 7.2 min
99.8%                             | 17.52 hours       | 86.23 min          | 20.16 min         | 2.88 min
99.9% ("three nines")             | 8.76 hours        | 43.8 min           | 10.1 min          | 1.44 min
99.95% ("three and a half nines") | 4.38 hours        | 21.56 min          | 5.04 min          | 43.2 s
99.99% ("four nines")             | 52.56 min         | 4.38 min           | 1.01 min          | 8.64 s
99.995% ("four and a half nines") | 26.28 min         | 2.16 min           | 30.24 s           | 4.32 s
99.999% ("five nines")            | 5.26 min          | 25.9 s             | 6.05 s            | 864.3 ms
99.9999% ("six nines")            | 31.5 s            | 2.59 s             | 604.8 ms          | 86.4 ms
99.99999% ("seven nines")         | 3.15 s            | 262.97 ms          | 60.48 ms          | 8.64 ms
99.999999% ("eight nines")        | 315.569 ms        | 26.297 ms          | 6.048 ms          | 0.864 ms
99.9999999% ("nine nines")        | 31.5569 ms        | 2.6297 ms          | 0.6048 ms         | 0.0864 ms

As we can see, it escalates quickly. Five nines (99.999% availability) is equivalent to 5.26 minutes of downtime over the course of a year: (1 - 0.99999) x 365.25 days x 24 hours x 60 minutes ≈ 5.26 minutes. Availability can also be calculated over different, smaller ranges: per month, per week, per day. Keep in mind those numbers, as they will be useful when we start to discuss the costs associated with maintaining different levels of availability.

Measuring Availability

To tell if there is downtime or not, one has to have insight into the environment. You need to track the metrics which define the availability of your systems. It is important to keep in mind that you should measure it from a customer's point of view, taking the broader picture into consideration. It doesn't matter if your databases are up if, let's say, due to a network issue, no application can reach them. Every single building block of your setup has its impact on availability.

One of the good places to look for availability data is web server logs. All requests which ended up with errors mean something has happened. It could be HTTP error 500 returned by the application because the database connection failed. Those could be programmatic errors pointing to some database issues, which ended up in Apache's error log. You can also use a simple metric such as the uptime of the database servers, although, with more complex SLAs it might be tricky to determine how the unavailability of one database impacted your user base. No matter what you do, you should use more than one metric - this is needed to capture issues which might have happened on different layers of your environment.

Magic Number: “Three”

Even though high availability is also about redundancy, in the case of database clusters, three is a magic number. It is not enough to have two nodes for redundancy - such a setup does not provide any built-in high availability. Sure, it might be better than just a single node, but human intervention is required to recover services. Let's see why that is so.

Let's assume we have two nodes, A and B. There's a network link between them. Let us assume that both A and B serve writes and that the application randomly picks where to connect (which means that part of the application will connect to node A and the other part will connect to node B). Now, let's imagine we have a network issue which results in lost network connectivity between A and B.

What now? Neither A nor B can know the state of the other node. There are two actions which can be taken by both nodes:

  1. They can continue accepting traffic
  2. They can cease to operate and refuse to serve any traffic

Let's think about the first option. As long as the other node is indeed down, this is the preferred action to take - we want our database to continue serving traffic. This is the main idea behind high availability after all. What would happen, though, if both nodes continued to accept traffic while being disconnected from each other? New data would be added on both sides, and the datasets would get out of sync. When the network issue is resolved, it will be a daunting task to merge those two datasets. Therefore, it is not acceptable to keep both nodes up and running. The problem is - how can node A tell if node B is alive or not (and vice versa)? The answer is - it cannot. If all connectivity is down, there is no way to distinguish a failed node from a failed network. As a result, the only safe action is for both nodes to cease all operations and refuse to serve traffic.

Let’s think now how a third node can help us in such a situation.

So we now have three nodes: A, B and C. All are interconnected, all are handling reads and writes.

Again, as in the previous example, node B has been cut off from the rest of the cluster due to network issues. What can happen next? Well, the situation is fairly similar to what we discussed earlier. Two options - node B can either be down (and the rest of the cluster should continue) or it can be up, in which case it shouldn't be allowed to handle any traffic. Can we now tell what the state of the cluster is? Actually, yes. We can see that nodes A and C can talk to each other and, as a result, they can agree that node B is not available. They won't be able to tell why it happened, but what they know is that out of the three nodes in the cluster, two still have connectivity with each other. Given that those two nodes form a majority of the cluster, it is possible to continue handling traffic. At the same time, node B can also deduce that the problem is on its side. It can access neither node A nor node C, making node B separated from the rest of the cluster. As it is isolated and not part of a majority (1 of 3), the only safe action it can take is to stop serving traffic and refuse to accept any queries, ensuring that data drift won't happen.

Of course, it doesn't mean you can have only three nodes in the cluster. If you want better failure tolerance, you may want to add more. Keep in mind, though, that it should be an odd number if you want to improve high availability. Also, we were talking about “nodes” in the examples above. Please keep in mind that this is also true for datacenters, availability zones etc. If you have two datacenters, each having the same number of nodes (let's say three nodes each), and you lose connectivity between those two DCs, the same principles apply here - you cannot tell which half of the cluster should start handling traffic. To be able to tell that, you have to have an observer in a third datacenter. It can be yet another set of nodes, or just a single host, with the task to observe the state of the remaining datacenters and take part in making decisions (an example here would be the Galera arbitrator).

Single Points of Failure

High availability is all about removing single points of failure (SPOF) and not introducing new ones in the process. What are the SPOFs? Any part of your infrastructure which, when failed, brings downtime as defined in the SLA, is called a SPOF. Infrastructure design requires a holistic approach; the different components cannot be designed independently of each other. Most likely, you are not responsible for the whole design - database administrators tend to focus on databases and not, for example, the network layer. Still, you have to keep the other parts in mind and work with the teams which are responsible for them, to make sure that not only the part you are responsible for is designed correctly but also that the remaining bits of the infrastructure were designed using the same principles. On top of that, such knowledge of how the whole infrastructure is designed helps you to design the database stack too. Knowing what issues may happen helps to build mechanisms to prevent them from impacting the availability of the database.
