
Setting up HTTPS on the ClusterControl Server


As the platform that manages all of your databases, ClusterControl maintains communication with the backend servers, sending commands and collecting metrics. To prevent unauthorized access, it is critical that the communication between your browser and the ClusterControl UI is encrypted. In this blog post we will take a look at how ClusterControl uses HTTPS to improve security.

By default, ClusterControl is configured with HTTPS enabled when you deploy it using the deployment script. All you need to do is point your browser to https://cc.node.hostname/clustercontrol and you can enjoy a secure connection, as shown in the screenshot below.

We will go through this configuration in detail. If you do not have HTTPS configured for ClusterControl, this blog will show you how to change your Apache configuration to enable secure connections.

Apache configuration - Debian/Ubuntu

When Apache is deployed by ClusterControl, the file /etc/apache2/sites-enabled/001-s9s-ssl.conf is created. Below is the content of that file, stripped of comments:

root@vagrant:~# cat /etc/apache2/sites-enabled/001-s9s-ssl.conf | perl -pe 's/\s*\#.*//' | sed '/^$/d'
<IfModule mod_ssl.c>
    <VirtualHost _default_:443>
        ServerName cc.severalnines.local
        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/html
        RewriteEngine On
        RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
        RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://127.0.0.1:9511/ws/$1 [P,L]
        RewriteRule ^/clustercontrol/ssh/term/(.*)$ http://127.0.0.1:9511/$1 [P]
        <Directory />
            Options +FollowSymLinks
            AllowOverride All
        </Directory>
        <Directory /var/www/html>
            Options +Indexes +FollowSymLinks +MultiViews
            AllowOverride All
            Require all granted
        </Directory>
        SSLEngine on
        SSLCertificateFile /etc/ssl/certs/s9server.crt
        SSLCertificateKeyFile /etc/ssl/private/s9server.key
        <FilesMatch "\.(cgi|shtml|phtml|php)$">
                SSLOptions +StdEnvVars
        </FilesMatch>
        <Directory /usr/lib/cgi-bin>
                SSLOptions +StdEnvVars
        </Directory>
        BrowserMatch "MSIE [2-6]" \
                nokeepalive ssl-unclean-shutdown \
                downgrade-1.0 force-response-1.0
        BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown
    </VirtualHost>
</IfModule>

The important, non-standard bits are the RewriteRule directives, which are used for the web SSH console in the UI. Otherwise, it's a pretty standard VirtualHost definition. Keep in mind that you will have to create the SSL certificate and key yourself if you attempt to recreate this configuration by hand; ClusterControl creates them for you during installation.
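
If you do need to recreate the certificate and key by hand, the sketch below shows one way to generate a self-signed pair at the paths used in the configuration above. This is a minimal, hedged example; the subject name and validity period are assumptions, not what the ClusterControl installer uses.

# generate a self-signed certificate and key for the vhost shown above
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -subj "/CN=cc.severalnines.local" \
    -keyout /etc/ssl/private/s9server.key \
    -out /etc/ssl/certs/s9server.crt
# reload Apache so the new certificate is picked up
systemctl reload apache2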

Also, in /etc/apache2/ports.conf, a directive telling Apache to listen on port 443 has been added:

<IfModule ssl_module>
        Listen 443
</IfModule>

<IfModule mod_gnutls.c>
        Listen 443
</IfModule>

Again, a pretty typical setup.

Apache configuration - Red Hat/CentOS

The configuration looks almost the same; it is just located in a different place:

[root@localhost ~]# cat /etc/httpd/conf.d/ssl.conf | perl -pe 's/\s*\#.*//' | sed '/^$/d'
<IfModule mod_ssl.c>
    <VirtualHost _default_:443>
        ServerName cc.severalnines.local
        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/html
        RewriteEngine On
        RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
        RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://127.0.0.1:9511/ws/$1 [P,L]
        RewriteRule ^/clustercontrol/ssh/term/(.*)$ http://127.0.0.1:9511/$1 [P]
        <Directory />
            Options +FollowSymLinks
            AllowOverride All
        </Directory>
        <Directory /var/www/html>
            Options +Indexes +FollowSymLinks +MultiViews
            AllowOverride All
            Require all granted
        </Directory>
        SSLEngine on
        SSLCertificateFile /etc/pki/tls/certs/s9server.crt
        SSLCertificateKeyFile /etc/pki/tls/private/s9server.key
        <FilesMatch "\.(cgi|shtml|phtml|php)$">
                SSLOptions +StdEnvVars
        </FilesMatch>
        <Directory /usr/lib/cgi-bin>
                SSLOptions +StdEnvVars
        </Directory>
        BrowserMatch "MSIE [2-6]" \
                nokeepalive ssl-unclean-shutdown \
                downgrade-1.0 force-response-1.0
        BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown
    </VirtualHost>
</IfModule>

Again, the RewriteRule directives are used to enable the web SSH console. In addition, ClusterControl adds the following lines at the top of the /etc/httpd/conf/httpd.conf file:

ServerName 127.0.0.1
Listen 443

This is all that's needed to have ClusterControl running over HTTPS.


Troubleshooting

In case of issues, here are some steps you can use to identify the problem. First of all, if you cannot access ClusterControl over HTTPS, make sure that Apache listens on port 443. You can check it using netstat. Below are the results for CentOS 7 and Ubuntu 16.04:

[root@localhost ~]# netstat -lnp | grep 443
tcp6       0      0 :::443                  :::*                    LISTEN      977/httpd

root@vagrant:~# netstat -lnp | grep 443
tcp6       0      0 :::443                  :::*                    LISTEN      1389/apache2

If Apache does not listen on that port, review the configuration and check whether a "Listen 443" directive has been added to Apache's configuration. Also check that the ssl module is enabled. You can verify it by running:

root@vagrant:~# apachectl -M | grep ssl
 ssl_module (shared)

If the "Listen" directive is placed inside an "IfModule" section, like below:

<IfModule ssl_module>
        Listen 443
</IfModule>

You have to make sure that it appears in the configuration after the modules have been loaded. For example, on Ubuntu 16.04 these are the relevant lines in /etc/apache2/apache2.conf:

# Include module configuration:
IncludeOptional mods-enabled/*.load
IncludeOptional mods-enabled/*.conf

On CentOS 7, the relevant file is /etc/httpd/conf/httpd.conf and the line is:

# Example:
# LoadModule foo_module modules/mod_foo.so
#
Include conf.modules.d/*.conf

Normally, ClusterControl handles this correctly, but if you are adding HTTPS support manually, you need to keep it in mind.
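
If you are setting this up by hand, the commands below are a hedged sketch of how the required modules are typically enabled; package and module names may differ slightly on your system, and the web SSH proxy rules shown earlier additionally rely on the proxy modules.

# Debian/Ubuntu: enable the modules the vhost above relies on, then restart Apache
a2enmod ssl rewrite proxy proxy_http proxy_wstunnel
systemctl restart apache2

# Red Hat/CentOS: mod_ssl ships as a separate package
yum install -y mod_ssl
systemctl restart httpd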

As always, refer to the Apache logs for further investigation. If HTTPS is up but for some reason you cannot reach the UI, more clues may be found there.


MongoDB Chain Replication Basics


What is Chain Replication?

When we talk about replication, we are referring to the process of making redundant copies of data in order to meet design criteria on data availability. Chain replication, therefore, refers to the linear ordering of MongoDB servers to form a synchronized chain. The chain contains a primary node, followed by secondary servers arranged linearly. As the word chain suggests, the server closest to the primary replicates from it, while every other secondary replicates from the preceding secondary MongoDB server. This is the main difference between chained replication and normal replication. Chained replication occurs when a secondary node selects its replication target based on ping time, or when the closest node is a secondary. Although chain replication reduces the load on the primary node, it may increase replication lag.

Why Use Chain Replication?

System infrastructures sometimes suffer unpredictable failures leading to the loss of a server and therefore affecting availability. Replication ensures that unpredictable failures do not affect availability, and it further allows recovery from hardware failure and service interruption. Both chained and unchained replication serve this purpose of ensuring availability despite system failures. Having established that replication is important, you may ask why use chain replication in particular. There is no performance difference between chained and unchained replication in MongoDB. In both cases, when the primary node fails, the secondary servers vote for a new acting primary, so reading and writing of data is not affected. Chained replication is, however, the default replication type in MongoDB.

How to Setup a Chain Replica

By default, chained replication is enabled in MongoDB, so we will also cover how to deactivate it. The main reason to disable chain replication is if it is causing replication lag. In most cases, however, its benefits outweigh this drawback and disabling it is unnecessary. In case chain replication is not active in your replica set, the following commands will activate it:

cfg = rs.config()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)

This process is reversible. To deactivate chain replication, run the following:

cfg = rs.config()
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)

Tips & Tricks for Chain Replication

The most significant limitation of chain replication is replication lag. Replication lag refers to the delay between the time an operation is performed on the primary and the time the same operation is replicated on the secondary. Although zero lag is naturally impossible, it is always desirable for replication to be fast enough that lag stays close to zero. To minimize replication lag, it is prudent to use primary and secondary hosts with the same specifications in terms of CPU, RAM, IO and network.

Although chain replication ensures data availability, it can be used together with journaling. Journaling provides data safety by writing to a log that is regularly flushed to disk. When the two are combined, three servers are written to per write request, unlike chain replication alone where only two servers are written to per write request.

Another important tip is to use the write concern parameter w with replication. The w parameter controls the number of servers an operation should be written to before success is returned. When the w parameter is set, getLastError checks the servers' oplogs and waits until the given number of servers have applied the operation.
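
As a hedged illustration (the collection name and values below are made up for the example), a write that waits for acknowledgement from two replica set members could look like this in the mongo shell:

// require acknowledgement from 2 members, or fail after 5 seconds
db.orders.insert(
    { item: "abc", qty: 1 },
    { writeConcern: { w: 2, wtimeout: 5000 } }
)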

Using a monitoring tool like MongoDB Monitoring Service (MMS) or ClusterControl allows you to obtain the status of your replica nodes and visualize changes over time. For instance, in MMS, you can find replica lag graphs of the secondary nodes showing the variation in replication lag time.

Measuring Chain Replication Performance

By now you are aware that the most important performance parameter of chain replication is the replication lag time. We will therefore discuss how to test for replication lag. This test can be done from the MongoDB shell. To do a replication lag test, we compare the time of the last event in the oplog on the primary node with the time of the last event replicated on the secondary node.

To check the replication status from the primary node, we run the following command:

db.printSlaveReplicationInfo()

The above command reports, for each secondary, when it last synced from the primary. The results should appear as below.

rs-ds046297:PRIMARY> db.printSlaveReplicationInfo()
source: ds046297-a1.mongolab.com:46297
syncedTo: Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
      = 7475 secs ago (2.08hrs)
source: ds046297-a2.mongolab.com:46297
syncedTo: Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
      = 7475 secs ago (2.08hrs)

Having obtained the sync times, we are now interested in the oplog itself. The following command will print its summary.

db.printReplicationInfo()

This command provides output with details on the oplog size, log length, the times of the first and last oplog events, and the current time. The results appear as below.

rs-ds046297:PRIMARY> db.printReplicationInfo()
configured oplog size:   1024MB
log length start to end: 5589 secs (1.55hrs)
oplog first event time:  Tue Mar 05 2013 06:15:19 GMT-0800 (PST)
oplog last event time:   Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
now:                     Tue Mar 05 2013 09:53:07 GMT-0800 (PST)

From the first command, the secondaries last synced at Tue Mar 05 2013 07:48:19 GMT-0800 (PST). From the oplog summary, the last operation also occurred at Tue Mar 05 2013 07:48:19 GMT-0800 (PST). The replication lag is therefore zero, and our chain replicated system is operating correctly. Replication lag may, however, vary depending on the amount of changes that need to be replicated.
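
If you prefer to compute the lag rather than compare timestamps by eye, the following is a small, hedged mongo shell sketch that derives per-secondary lag from rs.status(); it assumes the default fields returned by that command.

// print replication lag in seconds for every secondary
var status = rs.status();
var primary = null;
status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") { primary = m; }
});
status.members.forEach(function (m) {
    if (primary !== null && m.stateStr === "SECONDARY") {
        // optimeDate is a Date, so the difference is in milliseconds
        print(m.name + " lag: " + (primary.optimeDate - m.optimeDate) / 1000 + " s");
    }
});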

Cloud Disaster Recovery for MariaDB and MySQL


MySQL has a long tradition in geographic replication. Distributing clusters to remote data centers reduces the effects of geographic latency by pushing data closer to the user. It also provides a capability for disaster recovery. Due to the significant cost of duplicating hardware in a separate site, not many companies were able to afford it in the past. Another cost is the skilled staff who are able to design, implement and maintain a sophisticated multi-data-center environment.

With the cloud and DevOps automation revolution, having a distributed data center has never been more accessible to the masses. Cloud providers are increasing the range of services they offer for a better price. One can build cross-cloud, hybrid environments with data spread all over the world, and make flexible and scalable DR plans to address a broad range of disruption scenarios. In some cases, that can just be a backup stored offsite. In other cases, it can be a 1-to-1 copy of a production environment running somewhere else.

In this blog we will take a look at some of these cases, and address common scenarios.

Storing Backups in the Cloud

A DR plan is a general term that describes a process to recover disrupted IT systems and other critical assets an organization uses. Backup is the primary method to achieve this. When a backup is in the same data center as your production servers, you risk that all data may be wiped out if you lose that data center. To avoid this, you should have a policy of creating a copy in another physical location. It is still a good practice to keep a backup on local disk to reduce the time needed to restore. In most cases, you will keep your primary backup in the same data center (to minimize restore time), but you should also have a backup that can be used to restore business procedures when the primary data center is down.

ClusterControl: Upload Backup to the cloud

ClusterControl allows seamless integration between your database environment and the cloud, and provides options for migrating data there. We offer full integration of database backups with Amazon Web Services (AWS), Google Cloud Services and Microsoft Azure. Backups can now be executed, scheduled, downloaded and restored directly from your cloud provider of choice. This ability provides increased redundancy, better disaster recovery options, and benefits in both performance and cost savings.

ClusterControl: Managing Cloud Credentials

The first step in setting up a backup that survives a data center failure is to provide credentials for your cloud provider. You can choose from multiple vendors here. Let's take a look at the setup process for the most popular cloud provider, AWS.

ClusterControl: adding cloud credentials

All you need is the AWS Key ID and the secret for the region where you want to store your backup. You can get these from the AWS console by following the steps below.

  1. Use your AWS account email address and password to sign in to the AWS Management Console as the AWS account root user.
  2. On the IAM Dashboard page, choose your account name in the navigation bar, and then select My Security Credentials.
  3. If you see a warning about accessing the security credentials for your AWS account, choose Continue to Security Credentials.
  4. Expand the Access keys (access key ID and secret access key) section.
  5. Choose Create New Access Key. Then choose Download Key File to save the access key ID and secret access key to a file on your computer. After you close the dialog box, you will not be able to retrieve this secret access key again.
ClusterControl: Hybrid cloud backup

When all is set, you can adjust your backup schedule and enable the backup-to-cloud option. To reduce network traffic, make sure to enable data compression. It makes backups smaller and minimizes the time needed for upload. Another good practice is to encrypt the backup. ClusterControl creates an encryption key automatically and uses it when you restore the backup. Advanced backup policies should have different retention times for backups stored on servers in the same data center and for backups stored in another physical location. You should set a longer retention period for cloud-based backups, and a shorter period for backups stored near the production environment, as the probability of a restore drops with the backup's age.

ClusterControl: backup retention policy

Extend your cluster with asynchronous replication

Galera with asynchronous replication can be an excellent solution to build an active DR node in a remote data center. There are a few good reasons to attach an asynchronous slave to a Galera Cluster. Long-running OLAP-type queries on a Galera node might slow down the whole cluster, so they can be offloaded to the slave. With the delayed apply option, delayed replication can also save you from human errors, as accidental changes are not immediately applied to your backup node.

ClusterControl: delayed replication

In ClusterControl, extending a Galera node group with asynchronous replication is done in a single-page wizard. You need to provide the necessary information about your future or existing slave server. The slave will be set up from an existing backup, or from a freshly streamed XtraBackup from the master to the slave.

Load balancers in multi-datacenter

Load balancers are a crucial component of MySQL and MariaDB database high availability. It's not enough to have a cluster spanning multiple data centers; you still need your services to be able to reach it. A failure of a load balancer that is available in only one data center can make your entire environment unreachable.

Web proxies in cluster environment

One of the popular methods to hide the complexity of the database layer from an application is to use a proxy. Proxies act as an entry point to the databases; they track the state of the database nodes and should always direct traffic only to the nodes that are available. ClusterControl makes it easy to deploy and configure several different load balancing technologies for MySQL and MariaDB, including ProxySQL and HAProxy, with a point-and-click graphical interface.

ClusterControl: load balancer HA

It also allows you to make this component redundant by adding Keepalived on top of it. To prevent your load balancers from becoming a single point of failure, you would set up two identical HAProxy, ProxySQL or MariaDB MaxScale instances (one active, and one in a different DC as standby) and use Keepalived to run the Virtual Router Redundancy Protocol (VRRP) between them. VRRP assigns a Virtual IP address to the active load balancer and transfers it to the standby in case of failure. The failover is seamless because the two proxy instances need no shared state.
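
As a rough sketch of that idea (the interface name, virtual router ID, priority and virtual IP below are placeholders, not values ClusterControl generates), a Keepalived configuration on the active HAProxy node could look like this, with the standby node using state BACKUP and a lower priority:

# consider the node healthy while an HAProxy process is running
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        10.0.0.100
    }
    track_script {
        chk_haproxy
    }
}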

Of course, there are many things to consider to make your databases immune to data center failures.
Proper planning and automation will make it work! Happy Clustering!

PostgreSQL Management and Automation with ClusterControl


Introduction

PostgreSQL is an object-relational database management system (ORDBMS) developed at the University of California at Berkeley Computer Science Department.

It supports a large part of the SQL standard and offers many modern features:

  • complex queries
  • foreign keys
  • triggers
  • updateable views
  • transactional integrity
  • multiversion concurrency control

Also, PostgreSQL can be extended by the user in many ways, for example by adding new:

  • data types
  • functions
  • operators
  • aggregate functions
  • index methods
  • procedural languages

And because of the liberal license, PostgreSQL can be used, modified, and distributed free of charge by anyone for any purpose, be it private, commercial, or academic.

These features have placed the engine among the top 4 most used databases.

Figure 1: PostgreSQL Rank

PostgreSQL natively offers some of the features most demanded by the industry, such as master-slave replication, backup and recovery, transactions, and partitioning.

However, there are still other demanded features that need to be provided by external tools, such as sharding, master-master setups, monitoring, load balancing and statistics reporting.

Understanding Deadlocks in MySQL & PostgreSQL


When working with databases, concurrency control is the concept that ensures database transactions are performed concurrently without violating data integrity.

There is a lot of theory and different approaches around this concept and how to accomplish it, but we will briefly refer to the way that PostgreSQL and MySQL (when using InnoDB) handle it, and a common problem that can arise in highly concurrent systems: deadlocks.

These engines implement concurrency control by using a method called MVCC (Multiversion Concurrency Control). In this method, when an item is being updated, the changes will not overwrite the original data, but instead a new version of the item (with the changes) will be created. Thus we will have several versions of the item stored.

One of the main advantages of this model is that locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.

But, if several versions of the same item are stored, which version will a transaction see? To answer that question we need to review the concept of transaction isolation. Transactions specify an isolation level, which defines the degree to which one transaction must be isolated from resource or data modifications made by other transactions. This degree is directly related to the locking generated by a transaction, and so, as it can be specified at the transaction level, it can determine the impact that a running transaction has on other running transactions.

This is a very interesting and long topic, although we will not go into too much detail in this blog. We recommend the PostgreSQL and MySQL official documentation for further reading.

So, why are we going into the above topics when dealing with deadlocks? Because SQL commands will automatically acquire locks to ensure the MVCC behaviour, and the lock type acquired depends on the transaction isolation level defined.

There are several types of locks (again, another long and interesting topic to review for PostgreSQL and MySQL), but the important thing about them is how they interact with each other (more exactly, how they conflict). Why is that? Because two transactions cannot hold locks of conflicting modes on the same object at the same time. And, a not-so-minor detail: once acquired, a lock is normally held until the end of the transaction.

This is a PostgreSQL example of how locking types conflict with each other:

PostgreSQL Locking types conflict

And for MySQL:

MySQL Locking types conflict

X = exclusive lock     IX = intention exclusive lock
S = shared lock        IS = intention shared lock

So what happens when I have two running transactions that want to hold conflicting locks on the same object at the same time? One of them will get the lock and the other will have to wait.

So now we are in a position to truly understand what is happening during a deadlock.

What is a deadlock then? As you can imagine, there are several definitions for a database deadlock, but I like the following for its simplicity.

A database deadlock is a situation in which two or more transactions are waiting for one another to give up locks.

So for example, the following situation will lead us to a deadlock:

Deadlock example

Here, application A gets a lock on table 1, row 1 in order to make an update.

At the same time application B gets a lock on table 2 row 2.

Now application A needs to get a lock on table 2 row 2, in order to continue the execution and finish the transaction, but it cannot get the lock because it is held by application B. Application A needs to wait for application B to release it.

But application B needs to get a lock on table 1 row 1, in order to continue the execution and finish the transaction, but it cannot get the lock because it is held by application A.

So here we are in a deadlock situation. Application A is waiting for the resource held by application B in order to finish and application B is waiting for the resource held by application A. So, how to continue? The database engine will detect the deadlock and kill one of the transactions, unblocking the other one and raising a deadlock error on the killed one.


Let's check some PostgreSQL and MySQL deadlock examples:

PostgreSQL

Suppose we have a test database with information from the countries of the world.

world=# SELECT code,region,population FROM country WHERE code IN ('NLD','AUS');
code |          region           | population
------+---------------------------+------------
NLD  | Western Europe            |   15864000
AUS  | Australia and New Zealand |   18886000
(2 rows)

We have two sessions that want to make changes to the database.

The first session will modify the region field for the NLD code, and the population field for the AUS code.

The second session will modify the region field for the AUS code, and the population field for the NLD code.

Table data:

code: NLD
region: Western Europe
population: 15864000
code: AUS
region: Australia and New Zealand
population: 18886000

Session 1:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Europe' WHERE code='NLD';
UPDATE 1

Session 2:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Oceania' WHERE code='AUS';
UPDATE 1
world=# UPDATE country SET population=15864001 WHERE code='NLD';

Session 2 will hang waiting for Session 1 to finish.

Session 1:

world=# UPDATE country SET population=18886001 WHERE code='AUS';

ERROR:  deadlock detected
DETAIL:  Process 1181 waits for ShareLock on transaction 579; blocked by process 1148.
Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,15) in relation "country"

Here we have our deadlock. The system detected the deadlock and killed session 1.

Session 2:

world=# BEGIN;
BEGIN
world=# UPDATE country SET region='Oceania' WHERE code='AUS';
UPDATE 1
world=# UPDATE country SET population=15864001 WHERE code='NLD';
UPDATE 1

And we can check that the second session finished correctly after the deadlock was detected and the Session 1 was killed (thus, the lock was released).

For more details, we can check the log on our PostgreSQL server:

2018-05-16 12:56:38.520 -03 [1181] ERROR:  deadlock detected
2018-05-16 12:56:38.520 -03 [1181] DETAIL:  Process 1181 waits for ShareLock on transaction 579; blocked by process 1148.
       Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
       Process 1181: UPDATE country SET population=18886001 WHERE code='AUS';
       Process 1148: UPDATE country SET population=15864001 WHERE code='NLD';
2018-05-16 12:56:38.520 -03 [1181] HINT:  See server log for query details.
2018-05-16 12:56:38.520 -03 [1181] CONTEXT:  while updating tuple (0,15) in relation "country"
2018-05-16 12:56:38.520 -03 [1181] STATEMENT:  UPDATE country SET population=18886001 WHERE code='AUS';
2018-05-16 12:59:50.568 -03 [1181] ERROR:  current transaction is aborted, commands ignored until end of transaction block

Here we can see the actual commands that were involved in the deadlock.

MySQL

To simulate a deadlock in MySQL we can do the following.

As with PostgreSQL, suppose we have a test database with information on actors and movies among other things.

mysql> SELECT first_name,last_name FROM actor WHERE actor_id IN (1,7);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| PENELOPE   | GUINESS   |
| GRACE      | MOSTEL    |
+------------+-----------+
2 rows in set (0.00 sec)

We have two processes that want to make changes to the database.

The first process will modify the field first_name for actor_id 1, and the field last_name for actor_id 7.

The second process will modify the field first_name for actor_id 7, and the field last_name for actor_id 1.

Table data:

actor_id: 1
first_name: PENELOPE
last_name: GUINESS
actor_id: 7
first_name: GRACE
last_name: MOSTEL

Session 1:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='GUINESS' WHERE actor_id='1';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Session 2:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='MOSTEL' WHERE actor_id='7';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1';

Session 2 will hang waiting for Session 1 to finish.

Session 1:

mysql> UPDATE actor SET last_name='GRACE' WHERE actor_id='7';

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Here we have our deadlock. The system detected the deadlock and killed session 1.

Session 2:

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> UPDATE actor SET first_name='MOSTEL' WHERE actor_id='7';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1';
Query OK, 1 row affected (8.52 sec)
Rows matched: 1  Changed: 1  Warnings: 0

As we can see in the error, as we saw for PostgreSQL, there is a deadlock between both processes.

For more details we can use the command SHOW ENGINE INNODB STATUS\G:

mysql> SHOW ENGINE INNODB STATUS\G
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-05-16 18:55:46 0x7f4c34128700
*** (1) TRANSACTION:
TRANSACTION 1456, ACTIVE 33 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 54, OS thread handle 139965388506880, query id 15876 localhost root updating
UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1456 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 0000000005af; asc       ;;
2: len 7; hex 2d000001690110; asc -   i  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afca8b3; asc Z   ;;

*** (2) TRANSACTION:
TRANSACTION 1455, ACTIVE 47 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 53, OS thread handle 139965267871488, query id 16013 localhost root updating
UPDATE actor SET last_name='GRACE' WHERE actor_id='7'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1455 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 0000000005af; asc       ;;
2: len 7; hex 2d000001690110; asc -   i  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afca8b3; asc Z   ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1455 lock_mode X locks rec but not gap waiting
Record lock, heap no 202 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0007; asc   ;;
1: len 6; hex 0000000005b0; asc       ;;
2: len 7; hex 2e0000016a0110; asc .   j  ;;
3: len 6; hex 4d4f5354454c; asc MOSTEL;;
4: len 6; hex 4d4f5354454c; asc MOSTEL;;
5: len 4; hex 5afca8c1; asc Z   ;;

*** WE ROLL BACK TRANSACTION (2)

Under the title "LATEST DETECTED DEADLOCK", we can see details of our deadlock.

To see the details of the deadlock in the MySQL error log, we must enable the option innodb_print_all_deadlocks in our database.

mysql> set global innodb_print_all_deadlocks=1;
Query OK, 0 rows affected (0.00 sec)

MySQL error log:

2018-05-17T18:36:58.341835Z 12 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
2018-05-17T18:36:58.341869Z 12 [Note] InnoDB:
*** (1) TRANSACTION:
 
TRANSACTION 1812, ACTIVE 42 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140515492943616, query id 8467 localhost root updating
UPDATE actor SET last_name='PENELOPE' WHERE actor_id='1'
2018-05-17T18:36:58.341945Z 12 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1812 lock_mode X locks rec but not gap waiting
Record lock, heap no 204 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 000000000713; asc       ;;
2: len 7; hex 330000016b0110; asc 3   k  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afdcb89; asc Z   ;;
 
2018-05-17T18:36:58.342347Z 12 [Note] InnoDB: *** (2) TRANSACTION:
 
TRANSACTION 1811, ACTIVE 65 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 12, OS thread handle 140515492677376, query id 9075 localhost root updating
UPDATE actor SET last_name='GRACE' WHERE actor_id='7'
2018-05-17T18:36:58.342409Z 12 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1811 lock_mode X locks rec but not gap
Record lock, heap no 204 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0001; asc   ;;
1: len 6; hex 000000000713; asc       ;;
2: len 7; hex 330000016b0110; asc 3   k  ;;
3: len 7; hex 4755494e455353; asc GUINESS;;
4: len 7; hex 4755494e455353; asc GUINESS;;
5: len 4; hex 5afdcb89; asc Z   ;;
 
2018-05-17T18:36:58.342793Z 12 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 23 page no 3 n bits 272 index PRIMARY of table `sakila`.`actor` trx id 1811 lock_mode X locks rec but not gap waiting
Record lock, heap no 205 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 2; hex 0007; asc   ;;
1: len 6; hex 000000000714; asc       ;;
2: len 7; hex 340000016c0110; asc 4   l  ;;
3: len 6; hex 4d4f5354454c; asc MOSTEL;;
4: len 6; hex 4d4f5354454c; asc MOSTEL;;
5: len 4; hex 5afdcba0; asc Z   ;;
 
2018-05-17T18:36:58.343105Z 12 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)

Taking into account what we have learned above about why deadlocks happen, you can see that there is not much we can do on the database side to avoid them. However, as DBAs it is our duty to catch them, analyze them, and provide feedback to the developers.

The reality is that these errors are particular to each application, so you will need to check them one by one and there is no guide telling you how to troubleshoot them. With this in mind, there are some things you can look for.

Search for long-running transactions. As locks are usually held until the end of a transaction, the longer the transaction, the longer the locks are held over the resources. If it is possible, try to split long-running transactions into smaller/faster ones.

Sometimes it is not possible to actually split the transactions, so the work should focus on trying to execute those operations in a consistent order each time, so transactions form well-defined queues and do not deadlock.

One workaround you can also propose is to add retry logic into the application (of course, try to solve the underlying issue first) so that, if a deadlock happens, the application will run the same commands again.

Check the isolation levels used; sometimes it helps to change them. Look for commands like SELECT FOR UPDATE and SELECT FOR SHARE, as they generate explicit locks, and evaluate whether they are really needed or whether you can work with an older snapshot of the data. One thing you can try if you cannot remove these commands is using a lower isolation level such as READ COMMITTED.
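
As a quick, hedged illustration of setting a lower isolation level for a single transaction (the statements inside are placeholders):

-- PostgreSQL: choose the level when the transaction starts
BEGIN ISOLATION LEVEL READ COMMITTED;
-- ... your statements ...
COMMIT;

-- MySQL: the setting applies to the next transaction only
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
-- ... your statements ...
COMMIT;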

Of course, always add well-chosen indexes to your tables. Then your queries need to scan fewer index records and consequently set fewer locks.

On a higher level, as a DBA you can take some precautions to minimize locking in general. To name one example, in this case for PostgreSQL, you can avoid adding a column and its default value in the same command. Altering a table takes a really aggressive lock, and setting a default value will actually update the existing rows that have null values, making this operation take a long time. If you split this operation into several commands (adding the column, adding the default, updating the null values), you will minimize the locking impact.
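
A hedged SQL sketch of that split, using a made-up table orders and column status (this reflects PostgreSQL behaviour at the time of writing, where adding a column with a default rewrites existing rows):

ALTER TABLE orders ADD COLUMN status text;                   -- quick, metadata-only change
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'new';    -- affects only new rows
UPDATE orders SET status = 'new' WHERE status IS NULL;       -- backfill, can be done in batches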

Of course, there are tons of tips like this that DBAs pick up with practice (creating indexes concurrently, creating the PK index separately before adding the PK, and so on), but the important thing is to learn and understand this "way of thinking" and always minimize the locking impact of the operations we are doing.

PostgreSQL Management and Automation with ClusterControl - New Whitepaper


We’re happy to announce that our new whitepaper PostgreSQL Management and Automation with ClusterControl is now available to download for free!

This whitepaper provides an overview of what it takes to configure and manage a PostgreSQL production environment and shows how ClusterControl provides PostgreSQL automation in a centralized and user-friendly way.

Topics included in this whitepaper are…

  • Introduction to PostgreSQL
  • Backup & Recovery
  • HA Setups (Master/Slave & Master/Master)
  • Load Balancing & Connection Pooling
  • Monitoring
  • Automation with ClusterControl
  • ChatOps via CCBot

You have many options for managing your PostgreSQL databases, but ClusterControl lets you deploy, manage, monitor and scale them, giving you full control of your databases.

If you are currently running PostgreSQL or considering migrating your database to PostgreSQL this whitepaper will help you get started and ensure your databases are optimized and operating at peak performance. Download the whitepaper today!


ClusterControl for PostgreSQL

PostgreSQL is considered by many to be the world’s most advanced relational database system and ClusterControl supports its deployment, management, monitoring and scaling. Each deployed PostgreSQL instance is automatically configured using our easy to use point-and-click interface. You can manage backups, run queries, and perform advanced monitoring of all the master and slaves; all with automated failover if something goes wrong.

The automation tools inside ClusterControl let you easily set up a PostgreSQL replication environment, where you can add new replication slaves from scratch or use ones that are already configured. It also allows you to promote masters and rebuild slaves.

ClusterControl provides the following features to drive automation and performance...

  • Deployment - Deploy the latest PostgreSQL versions using proven methodologies you can count on to work
  • Management - Automated failover & recovery, Backup, Restore and Verification, Advanced security, Topology management, and a Developer Studio for advanced orchestration
  • Monitoring - Unified view across data centers with ability to drill down into individual nodes, Full stack monitoring, from load balancers to database instances down to underlying hosts, Query Monitoring, and Database Advisors
  • Scaling - Streaming Replication architectures and the ability to easily add and configure the most popular load balancing technologies.

To learn more about ClusterControl click here.

Top PostgreSQL Security Threats


Modern databases store all kinds of data. From trivial to highly sensitive. The restaurants we frequent, our map locations, our identity credentials, (e.g., Social Security Numbers, Addresses, Medical Records, Banking info, etc...), and everything in between is more than likely stored in a database somewhere. No wonder data is so valuable.

Database technologies advance at a breakneck pace. Innovation, progress, integrity, and enhancements abound, as a direct result of the labors of intelligent and devoted engineers, developers, and the robust communities supporting those vendors.

Yet there is another side to the coin, one that unfortunately co-exists within this data-driven world in the form of malware, viruses, and exploits on a massive, all-time-high scale.

Data is valuable to the parties on that side of the operation as well, but for different reasons. Their motives could be, but are not limited to, power, blackmail, financial gain and access, control, fun, pranks, malice, theft, revenge... You get the idea. The list is endless.

Alas, we have to operate with a security mindset. Without this mindset, we leave our systems vulnerable to these types of attacks. PostgreSQL is just as susceptible to compromise, misuse, theft, unauthorized access/control as other forms of software.

So What Measures Can We Take to Mitigate the Number of Risks to Our PostgreSQL Installs?

I strongly feel that promoting awareness of the known threats out there is as good a place to start as any. Knowledge is power and we should use all of it at our disposal. Besides, how can we police what we are not even aware of, in order to tighten up security on those PostgreSQL instances and protect the data residing there?

I recently searched out known security 'concerns' and 'threats', targeting the PostgreSQL environment. My search encompassed recent reports, articles, and blog posts within the first quarter of 2018. In addition to that specific time frame, I explored well-known long-standing concerns that are still viable threats today (namely SQL Injection), while not polished or brandished as 'recently discovered'.

A Photo Opportunity

A Deep Dive into Database Attacks [Part III]: Why Scarlett Johansson's Picture Got My Postgres Database to Start Mining Monero

Word of this crafty malware attack returned the most 'hits' out of my objective search results.

We'll visit one of several great blog posts and give a rundown of its content. I've also included additional blog posts towards the end of this section, so be sure to visit those as well for further details on this intrusion.

Observations

Information from Imperva reports that their honeypot database (StickyDB) discovered a malware attack on one of their PostgreSQL servers. The honeypot net, as Imperva calls the system, is designed to trick attackers into attacking the database so they (Imperva) can learn about the attacks and become more secure. In this particular instance, the payload is malware that cryptomines Monero, embedded in a photo of Scarlett Johansson.

The payload is dumped to disk at runtime with the lo_export function. Apparently, this works because lo_export is added as an entry in pg_proc rather than being called directly.

Here are some interesting details quoted directly from the blog post for clarity (see the cited article):

Now the attacker is able to execute local system commands using one simple function – fun6440002537. This SQL function is a wrapper for calling a C-language function, “sys_eval”, a small exported function in “tmp406001440” (a binary based on sqlmapproject), which basically acts as proxy to invoke shell commands from SQL client.

So what will be next steps of the attack? Some reconnaissance. So it started with getting the details of the GPU by executing lshw -c video and continued to cat /proc/cpuinfo in order to get the CPU details (Figures 3-4). While this feels odd at first, it makes complete sense when your end goal is to mine more of your favorite cryptocurrency, right?

With a combination of database access and the ability to execute code remotely, all while 'flying under the radar' of monitoring solutions, the trespasser then downloads the payload via a photo of Scarlett Johansson.

(Note: The photo has since been removed from its hosted location. See linking article for the mention.)

According to the report, the payload is in binary format. That binary code was appended into the photo in order to pass for an actual photo during upload, allowing for a viewable picture.

See Figure 6 of the post for the SQL responsible for using wget, dd, and chmod to set permissions on the downloaded file. That downloaded file then creates another executable, which is responsible for actually mining the Monero. Of course, housekeeping and cleanup are needed after all this nefarious work.

Figure 7 of the post depicts the SQL that performs it.

In its closing section, Imperva recommends monitoring this list of potential breach areas:

  • Watch out for direct PostgreSQL calls to lo_export or indirect calls through entries in pg_proc.
  • Beware of PostgreSQL functions calling to C-language binaries.
  • Use a firewall to block outgoing network traffic from your database to the Internet.
  • Make sure your database is not assigned a public IP address. If it is, restrict access to only the hosts that interact with it (application servers or clients owned by DBAs).

Imperva also performed various antivirus tests, along with details of how attackers can potentially locate vulnerable PostgreSQL servers. Although I did not include them here for brevity, consult the article for the full details of their findings.

Recommended Reading

CVE Details, Report, and Vulnerabilities

I visited this site, which posts the latest security threats on a per-vendor basis, and discovered 4 vulnerabilities in Q1 of 2018. The PostgreSQL Security Information page also has them listed, so feel free to consult that resource.

Although most of them have been addressed, I felt it important to include them in this post to bring awareness to readers who may not have known about them. I feel we can learn from all of them, especially from the different ways vulnerabilities are discovered.

They are listed below in the order of date published:

I. CVE-2018-1052 date published 2018-02-09 : Update Date 3/10/2018

Overview:

Memory disclosure vulnerability in table partitioning was found in PostgreSQL 10.x before 10.2, allowing an authenticated attacker to read arbitrary bytes of server memory via purpose-crafted insert to a partitioned table.

This vulnerability was fixed with the release of PostgreSQL 10.2, confirmed here. Fixed older 9.x versions are mentioned as well, so visit that link to check your specific version.

II. CVE-2018-1053 date published 2018-02-09 : Update Date 3/15/2018

Overview:

In PostgreSQL 9.3.x before 9.3.21, 9.4.x before 9.4.16, 9.5.x before 9.5.11, 9.6.x before 9.6.7 and 10.x before 10.2, pg_upgrade creates file in current working directory containing the output of `pg_dumpall -g` under umask which was in effect when the user invoked pg_upgrade, and not under 0077 which is normally used for other temporary files. This can allow an authenticated attacker to read or modify the one file, which may contain encrypted or unencrypted database passwords. The attack is infeasible if a directory mode blocks the attacker searching the current working directory or if the prevailing umask blocks the attacker opening the file.

As with the previous CVE-2018-1052, PostgreSQL 10.2 fixed this portion of the vulnerability:

Ensure that all temporary files made with "pg_upgrade" are non-world-readable

Many older versions of PostgreSQL are affected by this vulnerability. Be sure and visit the provided link for all those listed versions.

III. CVE-2017-14798 date published 2018-03-01 : Update Date 3/26/2018

Overview:

A race condition in the PostgreSQL init script could be used by attackers able to access the PostgreSQL account to escalate their privileges to root.

Although I could not find anywhere on the linked page that PostgreSQL version 10 was mentioned, many older versions are, so visit that link if you are running older versions.

SUSE Linux Enterprise Server users may be interested in 2 linked articles here and here, where this vulnerability was fixed for the version 9.4 init script.

IV. CVE-2018-1058 date published 2018-03-02 : Update Date 3/22/2018

Overview:

A flaw was found in the way PostgreSQL allowed a user to modify the behavior of a query for other users. An attacker with a user account could use this flaw to execute code with the permissions of superuser in the database. Versions 9.3 through 10 are affected.

This update release mentions this vulnerability with an interesting linked document all users should visit.

The article links to a fantastic guide from the community titled A Guide to CVE-2018-1058: Protect Your Search Path, which contains an incredible amount of information concerning the vulnerability, risks, and best practices for combating it.

I'll do my best to summarize, but visit the guide for your own benefit, comprehension, and understanding.

Overview:

With the advent of PostgreSQL version 7.3, schemas were introduced into the ecosystem. This enhancement allows users to create objects in separate namespaces. By default, when a user creates a database, PostgreSQL also creates a public schema in which all new objects are created. Users who can connect to a database can also create objects in that database's public schema.

This section directly from the guide is highly important (see cited article):

Schemas allow users to namespace objects, so objects of the same name can exist in different schemas in the same database. If there are objects with the same name in different schemas and the specific schema/object pair is not specified (i.e. schema.object), PostgreSQL decides which object to use based on the search_path setting. The search_path setting specifies the order the schemas are searched when looking for an object. The default value for search_path is $user,public where $user refers to the name of the user connected (which can be determined by executing SELECT SESSION_USER;).

Another key point is here:

The problem described in CVE-2018-1058 centers around the default "public" schema and how PostgreSQL uses the search_path setting. The ability to create objects with the same names in different schemas, combined with how PostgreSQL searches for objects within schemas, presents an opportunity for a user to modify the behavior of a query for other users.

Below is a high-level list of the practices the guide recommends applying to reduce the risk of this vulnerability (a minimal SQL sketch follows the list):

  • Do not allow users to create new objects in the public schema
  • Set the default search_path for database users
  • Set the default search_path in the PostgreSQL configuration file (postgresql.conf)
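
The following is a hedged sketch of the first two recommendations; the role and database names (app_user, app_db) are placeholders:

REVOKE CREATE ON SCHEMA public FROM PUBLIC;        -- stop ordinary users creating objects in public
ALTER ROLE app_user SET search_path = "$user";     -- per-role default search_path
ALTER DATABASE app_db SET search_path = "$user";   -- or a per-database default
-- for the third item, set search_path in postgresql.conf instead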

SQL Injection

Any 'security-themed' SQL blog post or article cannot label itself as such without mention of SQL injection. While this method of attack is by no stretch of the imagination 'the new kid on the block', it has to be included.

SQL injection is always a threat, and perhaps even more so in the web application space. Any SQL database, including PostgreSQL, is potentially vulnerable to it.

While I don't have a deep knowledge base on SQL injection (also known as SQLi), I'll do my best to provide a brief summary, explain how it can potentially affect your PostgreSQL server, and ultimately how to reduce the risk of falling prey to it.

Refer to the links provided towards the end of this section, all of which contain a wealth of information, explanation, and examples in those areas I am unable to adequately communicate.

Unfortunately, several types of SQL injection exist, and they all share the common goal of inserting offensive SQL into queries for execution in the database, in ways not originally intended or designed by the developer.

Unsanitized user input, poorly designed or non-existent type checking (AKA validation), and unescaped user input can all potentially leave the door wide open for would-be attackers. Many web programming APIs provide some protection against SQLi, e.g., ORMs (Object Relational Mappers), parameterized queries, type checking, etc. However, it is the developer's responsibility to make every effort to reduce prime scenarios for SQL injection by implementing those diversions and mechanisms at their disposal.
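
As a small, hedged illustration of the parameterized-query idea at the database level (the users table and column names are made up), PostgreSQL's server-side prepared statements keep the user-supplied value separate from the SQL text:

PREPARE get_user (text) AS
    SELECT id, username FROM users WHERE username = $1;
EXECUTE get_user('alice');   -- the value is bound as data, never parsed as SQL
DEALLOCATE get_user;

In application code, the client library's parameter binding achieves the same separation.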

Here are notable suggestions for reducing the risk of SQL injection from the OWASP SQL Injection Prevention Cheat Sheet. Be sure to visit it for complete details and examples of their use in practice (see the cited article).

Primary Defenses:

  • Option 1: Use of Prepared Statements (with Parameterized Queries)
  • Option 2: Use of Stored Procedures
  • Option 3: White List Input Validation
  • Option 4: Escaping All User Supplied Input

Additional Defenses:

  • Also: Enforcing Least Privilege
  • Also: Performing White List Input Validation as a Secondary Defense

Recommended Reading:

I’ve included additional articles with a load of information for further study and awareness:


Postgres Role Privileges

We have a saying along the lines of "We are our own worst enemy."

We can definitely apply it to working within the PostgreSQL environment. Neglect, misunderstanding, or lack of diligence offer just as much opportunity for attacks and unauthorized use as those that are purposely launched.

Perhaps even more so, since they inadvertently allow easier access, routes, and channels for offending parties to tap into.

I'll mention an area that always needs re-evaluation from time to time:

Unwarranted or extraneous role privileges.

  • SUPERUSER
  • CREATEROLE
  • CREATEDB
  • GRANT

This amalgamation of privileges is definitely worth a look. SUPERUSER and CREATEROLE are extremely powerful attributes and would be better served in the hands of a DBA, as opposed to an analyst or developer, wouldn't you think?

Does the role really need the CREATEDB attribute? What about GRANT? That attribute has the potential for misuse in the wrong hands.

Heavily weigh all options prior to allowing roles these attributes in your environment.

Strategies, Best Practices and Hardening

Below is a list of useful blog posts, articles, checklists, and guides that a search over the past year (at the time of this writing) returned. They are not listed in any order of importance, and each offers noteworthy suggestions.

Conclusion

My hope is that, with the information provided in this blog post along with the robust community, we can stay at the forefront of threats against the PostgreSQL database system.

Integrating PostgreSQL with Authentication Systems


PostgreSQL is one of the most secure databases in the world. Database security plays an imperative role in real-world, mission-critical environments. It is important to ensure that databases and data are always secured and not subjected to unauthorized access that would compromise data security. While PostgreSQL provides various mechanisms and methods for users to access the database in a secure manner, it can also be integrated with various external authentication systems to ensure that enterprise-standard database security requirements are met.

Apart from providing secure authentication mechanisms via SSL, MD5, pgpass, pg_ident, etc., PostgreSQL can be integrated with various other popular enterprise-grade external authentication systems. My focus in this blog will be on LDAP, Kerberos and RADIUS, together with SSL and pg_ident.

LDAP

LDAP refers to the Lightweight Directory Access Protocol, a popular centralized authentication system. It is a datastore which stores user credentials and various other user-related details such as names, domains and business units, in the form of a hierarchy. End users connecting to the target systems (e.g., a database) must first connect to the LDAP server to get through a successful authentication. LDAP is one of the popular authentication systems currently used in organizations demanding high security standards.

LDAP + PostgreSQL

PostgreSQL can be integrated with LDAP. In my customer consulting experience, this is considered one of the key capabilities of PostgreSQL. As the authentication of the username and password takes place at the LDAP server, the user account must also exist in the database for the connection via LDAP to work. In other words, users attempting to connect to PostgreSQL are routed to the LDAP server first, and then to the Postgres database upon successful authentication. Configuration is made in the pg_hba.conf file to ensure connections are routed to the LDAP server. Below is a sample pg_hba.conf entry:

host    all    pguser   0.0.0.0/0    ldap ldapserver=ldapserver.example.com ldapprefix="cn=" ldapsuffix=", dc=example, dc=com"

Below is another example of an LDAP entry in pg_hba.conf, this time including an Organizational Unit (OU):

host    all    pguser   0.0.0.0/0    ldap ldapserver=ldapserver.example.com ldapprefix="cn=" ldapsuffix=", ou=finance, dc=example, dc=com"

When using a non-default LDAP port and TLS:

ldap ldapserver=ldapserver.example.com ldaptls=1 ldapport=5128 ldapprefix="uid=" ldapsuffix=",ou=finance,dc=apix,dc=com"

Understanding the above LDAP entry

  • LDAP uses various attributes and terminologies to store and search for a user entry in its datastore. Also, as mentioned above, user entries are stored in a hierarchy.
  • The above pg_hba.conf LDAP entries consist of attributes called CN (Common Name), OU (Organizational Unit) and DC (Domain Component), which are termed Relative Distinguished Names (RDNs). A sequence of RDNs together forms a DN (Distinguished Name). The DN is the LDAP object based on which the search is performed in the LDAP data store.
  • LDAP attribute values like CN, DC, OU etc. are defined in LDAP’s Object Classes, which can be provided by the systems experts who built the LDAP environment.
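From the application's point of view nothing special is required: the client simply supplies the username and the LDAP password, and the server forwards the bind to the LDAP server for verification. A minimal connection test, sketched in Python with psycopg2 (host, database and credentials below are placeholders):

import psycopg2

try:
    # "pguser" must also exist as a role in PostgreSQL; the password below is
    # the one stored in LDAP, not a password managed by PostgreSQL itself.
    conn = psycopg2.connect(host="pg.example.com", dbname="appdb",
                            user="pguser", password="ldap-password-here")
    print("LDAP-backed authentication succeeded")
    conn.close()
except psycopg2.OperationalError as err:
    # A failure here usually means the LDAP bind was rejected, or no pg_hba.conf
    # entry matched this client.
    print("Connection failed:", err)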

Will that make LDAP secure enough?

Maybe not. Passwords communicated over the network in a plain LDAP setup are not encrypted, which is a security risk, as the passwords can be intercepted. There are options to make the credential exchange more secure:

  1. Consider configuring LDAP on TLS (Transport Layer Security)
  2. LDAP can be configured with SSL which is another option

Tips to achieve LDAP integration with PostgreSQL

(for Linux based systems)

  • Install appropriate openLDAP modules based on operating system version
  • Ensure PostgreSQL software is installed with LDAP libraries
  • Ensure LDAP is integrated well with Active Directory
  • Familiarize yourself with any known bugs in the OpenLDAP modules being used; these can be catastrophic and can compromise security standards.
  • Windows Active Directory can also be integrated with LDAP
  • Consider configuring LDAP with SSL, which is more secure. Install the appropriate OpenSSL modules and be aware of bugs like Heartbleed, which can expose credentials transmitted over the network.

Kerberos

Kerberos is an industry-standard centralized authentication system, popularly used in organizations, that provides an encryption-based authentication mechanism. Passwords are authenticated by a third-party authentication server termed the KDC (Key Distribution Center). Passwords can be encrypted based on various algorithms and can only be decrypted with the help of shared private keys. This also means that passwords communicated over the network are encrypted.

PostgreSQL + Kerberos

PostgreSQL supports GSSAPI-based authentication with Kerberos. Users attempting to connect to the Postgres database are routed to the KDC server for authentication. This authentication between clients and the KDC is performed based on shared private keys, and upon successful authentication the clients hold Kerberos-based credentials. The same credentials are then validated between the Postgres server and the KDC, based on the keytab file generated by Kerberos. This keytab file must exist on the database server with appropriate permissions for the user owning the Postgres process.

The Kerberos configuration and connection process -

  • Kerberos-based user accounts must generate a ticket (a connection request) using the “kinit” command.

  • A keytab file must be generated using “kadmin” command for a fully qualified Kerberos based user account (principal) and then Postgres would use the same keytab file to validate the credentials. Principals can be encrypted and added to existing keytab file using “ktadd” command. Kerberos encryption supports various industry standard encryption algorithms.

    The generated keytab file must be copied across to the Postgres server, and it must be readable by the Postgres process. The postgresql.conf parameter below must be configured:

    krb_server_keyfile = '/database/postgres/keytab.example.com'

    If you are particular about case sensitivity, then use the parameter below:

    krb_caseins_users, which is “off” by default (case sensitive)
  • An entry must be made in pg_hba.conf to ensure connections are routed to the KDC server

    Example pg_hba.conf entry

    # TYPE DATABASE       USER    CIDR-ADDRESS            METHOD
    host     all                     all         192.168.1.6/32            gss include_realm=1 krb_realm=EXAMPLE.COM

    Example pg_hba.conf entry with map entry

    # TYPE DATABASE       USER    CIDR-ADDRESS            METHOD
    host     all                     all         192.168.1.6/32            gss include_realm=1 krb_realm=EXAMPLE.COM map=krb
  • A user account attempting to connect must be added to the KDC database (where it is termed a principal), and the same user account, or a mapped user account, must exist in the database as well

    Below is an example of a Kerberos principal

    pguser@example.com

    pguser is the username and the “example.com” is the realm name configured in the Kerberos config (/etc/krb5.conf) in the KDC server.

    In the Kerberos world, principals are in an email-like format (username@realmname), and database users cannot be created in that format. This leads DBAs to create a mapping of database user names instead, ensuring principals connect with their mapped names via pg_ident.conf.

    Below is an example of a map name entry in pg_ident.conf

    # MAPNAME           SYSTEM-USERNAME               PG-USERNAME
       mapuser               /^(.*)EXAMPLE\.DOMAIN$      admin
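From the client's perspective, once a valid ticket exists in the ticket cache (obtained with kinit, or from a keytab), no password is sent at connect time; libpq negotiates GSSAPI with the server. Below is a minimal sketch, assuming Python with psycopg2 and placeholder host, database and user names; krbsrvname is the libpq parameter naming the service principal (its default is "postgres"):

import psycopg2

# Prerequisite (outside this script): a valid Kerberos ticket for the principal,
# e.g. obtained with `kinit pguser@EXAMPLE.COM` or from a keytab.
# No password is passed; libpq authenticates via GSSAPI using the ticket cache.
conn = psycopg2.connect(host="pg.example.com", dbname="appdb",
                        user="pguser", krbsrvname="postgres")
with conn.cursor() as cur:
    cur.execute("SELECT current_user")
    print("Connected as:", cur.fetchone()[0])
conn.close()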

Will that make Kerberos secure enough?

Maybe not. User credentials communicated over the network can still be exposed or stolen. Though Kerberos encrypts the principals, they can be intercepted, which brings in the need for network-layer security. Yes, SSL or TLS is the way to go: the Kerberos authentication system can be integrated with SSL or TLS (TLS being the successor of SSL). It is recommended to configure Kerberos with SSL or TLS so that communication over the network is secured.

TIPS

  • Ensure krb* libraries are installed
  • OpenSSL libraries must be installed to configure SSL
  • Ensure Postgres is installed with the following options
    ./configure --with-gssapi --with-krb-srvnam --with-openssl

RADIUS

RADIUS is a remote authentication network protocol that provides centralized Authentication, Authorization and Accounting (AAA). Username/password pairs are authenticated at the RADIUS server. This way of centralized authentication is much more straightforward and simpler compared to other authentication systems like LDAP and Kerberos, which involve a bit more complexity.

RADIUS + PostgreSQL

PostgreSQL can be integrated with the RADIUS authentication mechanism (accounting is not supported in Postgres yet). This requires the database user accounts to exist in the database. Connections to the database are authorized based on the shared secret termed the “radiussecret”.

An entry in the pg_hba.conf config is essential to route connections to the RADIUS server for authentication.

Example pg_hba.conf entry

hostssl             all        all        0.0.0.0/0         radius  radiusserver=127.0.0.1 radiussecret=secretr radiusport=3128

To understand the above entry -

“radiusserver” is the host IP address of the RADIUS server to which users are routed for authentication. This parameter is configured in /etc/radiusd.conf on the RADIUS server.

The “radiussecret” value is extracted from clients.conf. This is the secret code which uniquely identifies the RADIUS client connection.

“radiusport” can be found in the /etc/radiusd.conf file. This is the port on which the RADIUS server listens for connections.

Importance of SSL

SSL (Secure Socket Layer) plays an imperative role with external authentication systems in place. It is highly recommended to configure SSL with an external authentication system as there will be communication of sensitive information between clients and the servers over the network and SSL can further tighten the security.

Performance Impact of using external authentication systems

An effective and efficient security system comes at the expense of performance. As clients/users attempting to connect to the database are routed to authentication systems to establish a connection, there can be performance degradation. There are ways to overcome these performance hurdles.

  • With an external authentication mechanism in place, there can be a delay in establishing a connection to the database. This becomes a real concern when a huge number of connections is being established.
  • Developers need to ensure that an unnecessarily high number of connections is not opened against the database. Serving multiple application requests via one connection is advantageous (see the pooling sketch after this list).
  • Also, how long each request is taking at the database end plays an important role. If the request takes longer to complete, then subsequent requests would queue up. Performance tuning of the processes and meticulously architecting the infrastructure will be key!
  • Database and infrastructure must be efficiently architected and adequately capacitated to ensure good performance.
  • When doing performance benchmarking, ensure SSL is enabled and the average connection establishment time must then be evaluated.
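As referenced in the list above, one way to keep the number of expensive, externally authenticated connections low is to pool and reuse them on the application side. A minimal sketch using psycopg2's built-in pool (the connection settings are placeholders; a dedicated pooler such as pgbouncer is another common choice):

from psycopg2 import pool

# Keep between 1 and 10 connections open and hand them out on demand, so the
# external authentication round trip is paid only when a new connection is created.
conn_pool = pool.SimpleConnectionPool(
    1, 10, host="pg.example.com", dbname="appdb",
    user="pguser", password="ldap-or-radius-password")

conn = conn_pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        cur.fetchone()
finally:
    conn_pool.putconn(conn)   # return the connection to the pool instead of closing it

# At application shutdown:
conn_pool.closeall()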

Integrating external authentication systems with ClusterControl - PostgreSQL

PostgreSQL instances can be built and configured automatically via the ClusterControl GUI. Integrating external authentication systems with PostgreSQL instances deployed via ClusterControl is very similar to integration with traditional PostgreSQL instances, and is in fact a bit simpler. Below is an overview:

  • ClusterControl installs PostgreSQL libraries enabled with LDAP, KRB, GSSAPI and OpenSSL capabilities
  • Integration with external authentication systems requires various parameter configuration changes on the PostgreSQL database server, which can be done using the ClusterControl GUI.

PostgreSQL Audit Logging Best Practices


In every IT system where important business tasks take place, it is important to have an explicit set of policies and practices, and to make sure those are respected and followed.

Introduction to Auditing

An Information Technology system audit is the examination of the policies, processes, procedures, and practices of an organization regarding IT infrastructure against a certain set of objectives. An IT audit may be of two generic types:

  • Checking against a set of standards on a limited subset of data
  • Checking the whole system

An IT audit may cover certain critical system parts, such as the ones related to financial data in order to support a specific set of regulations (e.g. SOX), or the entire security infrastructure against regulations such as the new EU GDPR regulation which addresses the need for protecting privacy and sets the guidelines for personal data management. The SOX example is of the former type described above whereas GDPR is of the latter.

The Audit Lifecycle

Planning

The scope of an audit is dependent on the audit objective. The scope may cover a special application identified by a specific business activity, such as a financial activity, or the whole IT infrastructure covering system security, data security and so forth. The scope must be correctly identified beforehand as an early step in the initial planning phase. The organization is supposed to provide to the auditor all the necessary background information to help with planning the audit. This may be the functional/technical specifications, system architecture diagrams or any other information requested.

Control Objectives

Based on the scope, the auditor forms a set of control objectives to be tested by the audit. Those control objectives are implemented via management practices that are supposed to be in place in order to achieve control to the extent described by the scope. The control objectives are associated with test plans and those together constitute the audit program. Based on the audit program the organization under audit allocates resources to facilitate the auditor.

Findings

The auditor tries to get evidence that all control objectives are met. If, for some control objective, there is no such evidence, the auditor first tries to see whether there is some alternative way the company handles that control objective; if such a way exists, the control objective is marked as compensating and the auditor considers it met. If, however, there is no evidence at all that an objective is met, this is marked as a finding. Each finding consists of the condition, criteria, cause, effect and recommendation. The IT manager must be in close contact with the auditor in order to be informed of all potential findings, and make sure that all requested information is shared between management and the auditor, in order to ensure that the control objective is met (and thus avoid the finding).

The Assessment Report

At the end of the audit process the auditor will write an assessment report as a summary covering all important parts of the audit, including any potential findings followed by a statement on whether the objective is adequately addressed and recommendations for eliminating the impact of the findings.

What is Audit Logging and Why Should You Do It?

The auditor wants to have full access to the changes on software, data and the security system. He/she not only wants to be able to track down any change to the business data, but also track changes to the organizational chart, the security policy, the definition of roles/groups and changes to role/group membership. The most common way to perform an audit is via logging. Although it was possible in the past to pass an IT audit without log files, today it is the preferred (if not the only) way.

Typically the average IT system comprises at least two layers:

  • Database
  • Application (possibly on top of an application server)

The application maintains its own logs covering user access and actions, and the database and possibly the application server systems maintain their own logs. Clean, readily usable information in log files which has real business value from the auditor perspective is called an audit trail. Audit trails differ from ordinary log files (sometimes called native logs) in that:

  • Log files are dispensable
  • Audit trails should be kept for longer periods
  • Log files add overhead to the system’s resources
  • Log files’ purpose is to help the system admin
  • Audit trails’ purpose is to help the auditor

We summarise the above in the following table:

Log type           App/System    Audit Trail friendly
App logs           App           Yes
App server logs    System        No
Database logs      System        No

App logs may be easily tailored to be used as audit trails. System logs not so easily because:

  • They are limited in their format by the system software
  • They act globally on the whole system
  • They don’t have direct knowledge about specific business context
  • They usually require additional software for later offline parsing/processing in order to produce usable audit-friendly audit trails.

On the other hand, app logs place an additional software layer on top of the actual data, thus:

  • Making the audit system more vulnerable to application bugs/misconfiguration
  • Creating a potential hole in the logging process if someone tries to access data directly on the database bypassing the app logging system, such as a privileged user or a DBA
  • Making the audit system more complex and harder to manage and maintain in case we have many applications or many software teams.

So, ideally we would be looking for the best of the two: Having usable audit trails with the greatest coverage on the whole system including database layer, and configurable in one place, so that the logging itself can be easily audited by means of other (system) logs.

Audit Logging with PostgreSQL

The options we have in PostgreSQL regarding audit logging are the following:

  • Exhaustive statement logging (log_statement = all)
  • A trigger-based solution (such as audit-trigger)
  • The pgaudit extension

Exhaustive logging, at least for standard usage in OLTP or OLAP workloads, should be avoided because it:

  • Produces huge files, increases load
  • Does not have inner knowledge of tables being accessed or modified, just prints the statement which might be a DO block with a cryptic concatenated statement
  • Needs additional software/resources for offline parsing and processing (in order to produce the audit trails) which in turn must be included in the scope of the audit, to be considered trustworthy

In the rest of this article we will try the tools provided by the community. Let’s suppose that we have this simple table that we want to audit:

myshop=# \d orders
                                       Table "public.orders"
   Column   |           Type           | Collation | Nullable |              Default               
------------+--------------------------+-----------+----------+------------------------------------
 id         | integer                  |           | not null | nextval('orders_id_seq'::regclass)
 customerid | integer                  |           | not null |
 customer   | text                     |           | not null |
 xtime      | timestamp with time zone   |           | not null | now()
 productid  | integer                  |           | not null |
 product    | text                     |           | not null |
 quantity   | integer                  |           | not null |
 unit_price | double precision         |           | not null |
 cur        | character varying(20)    |           | not null | 'EUR'::character varying
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)

audit-trigger 91plus

The docs about using the trigger can be found here: https://wiki.postgresql.org/wiki/Audit_trigger_91plus. First we download and install the provided DDL (functions, schema):

$ wget https://raw.githubusercontent.com/2ndQuadrant/audit-trigger/master/audit.sql
$ psql myshop
psql (10.3 (Debian 10.3-1.pgdg80+1))
Type "help" for help.
myshop=# \i audit.sql

Then we define the triggers for our table orders using the basic usage:

myshop=# SELECT audit.audit_table('orders');

This will create two triggers on the orders table: an insert/update/delete row trigger and a truncate statement trigger. Now let’s see what the triggers do:

myshop=# insert into orders (customer,customerid,product,productid,unit_price,quantity) VALUES('magicbattler',1,'some fn skin 2',2,5,2);      
INSERT 0 1
myshop=# update orders set quantity=3 where id=2;
UPDATE 1
myshop=# delete from orders  where id=2;
DELETE 1
myshop=# select table_name, action, session_user_name, action_tstamp_clk, row_data, changed_fields from audit.logged_actions;
-[ RECORD 1 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
table_name        | orders
action            | I
session_user_name | postgres
action_tstamp_clk | 2018-05-20 00:15:10.887268+03
row_data          | "id"=>"2", "cur"=>"EUR", "xtime"=>"2018-05-20 00:15:10.883801+03", "product"=>"some fn skin 2", "customer"=>"magicbattler", "quantity"=>"2", "productid"=>"2", "customerid"=>"1", "unit_price"=>"5"
changed_fields    |
-[ RECORD 2 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
table_name        | orders
action            | U
session_user_name | postgres
action_tstamp_clk | 2018-05-20 00:16:12.829065+03
row_data          | "id"=>"2", "cur"=>"EUR", "xtime"=>"2018-05-20 00:15:10.883801+03", "product"=>"some fn skin 2", "customer"=>"magicbattler", "quantity"=>"2", "productid"=>"2", "customerid"=>"1", "unit_price"=>"5"
changed_fields    | "quantity"=>"3"
-[ RECORD 3 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
table_name        | orders
action            | D
session_user_name | postgres
action_tstamp_clk | 2018-05-20 00:16:24.944117+03
row_data          | "id"=>"2", "cur"=>"EUR", "xtime"=>"2018-05-20 00:15:10.883801+03", "product"=>"some fn skin 2", "customer"=>"magicbattler", "quantity"=>"3", "productid"=>"2", "customerid"=>"1", "unit_price"=>"5"
changed_fields    |

Note the changed_fields value on the Update (RECORD 2). There are more advanced uses of the audit trigger, like excluding columns, or using the WHEN clause as shown in the doc. The audit trigger sure seems to do the job of creating useful audit trails inside the audit.logged_actions table. However there are some caveats:

  • No SELECTs (triggers do not fire on SELECTs) or DDL are tracked
  • Changes by table owners and superusers can easily be tampered with
  • Best practices must be followed regarding the app user(s) and the owners of the app schema and tables

Pgaudit

Pgaudit is the newest addition to PostgreSQL as far as auditing is concerned. Pgaudit must be installed as an extension, as shown in the project’s github page: https://github.com/pgaudit/pgaudit. Pgaudit logs to the standard PostgreSQL log. Pgaudit works by registering itself upon module load and providing hooks for executorStart, executorCheckPerms, processUtility and object_access. Therefore pgaudit (in contrast to trigger-based solutions such as audit-trigger discussed in the previous paragraphs) supports READs (SELECT, COPY). Generally with pgaudit we can have two modes of operation, or use them combined:

  • SESSION audit logging
  • OBJECT audit logging

Session audit logging supports most DML, DDL, privilege and misc commands via classes:

  • READ (select, copy from)
  • WRITE (insert, update, delete, truncate, copy to)
  • FUNCTION (function calls and DO blocks)
  • ROLE (grant, revoke, create/alter/drop role)
  • DDL (all DDL except those in ROLE)
  • MISC (discard, fetch, checkpoint, vacuum)

The metaclass “all” includes all classes, and prefixing a class with “-” excludes it. For instance, let us configure session audit logging for everything except MISC, with the following GUC parameters in postgresql.conf:

pgaudit.log_catalog = off
pgaudit.log = 'all, -misc'
pgaudit.log_relation = 'on'
pgaudit.log_parameter = 'on'

By giving the following commands (the same as in the trigger example)

myshop=# insert into orders (customer,customerid,product,productid,unit_price,quantity) VALUES('magicbattler',1,'some fn skin 2',2,5,2);
INSERT 0 1
myshop=# update orders set quantity=3 where id=2;
UPDATE 1
myshop=# delete from orders  where id=2;
DELETE 1
myshop=#

We get the following entries in PostgreSQL log:

% tail -f data/log/postgresql-22.log | grep AUDIT:
[local] [55035] 5b03e693.d6fb 2018-05-22 12:46:37.352 EEST psql postgres@testdb line:7 LOG:  AUDIT: SESSION,5,1,WRITE,INSERT,TABLE,public.orders,"insert into orders (customer,customerid,product,productid,unit_price,quantity) VALUES('magicbattler',1,'some fn skin 2',2,5,2);",<none>
[local] [55035] 5b03e693.d6fb 2018-05-22 12:46:50.120 EEST psql postgres@testdb line:8 LOG:  AUDIT: SESSION,6,1,WRITE,UPDATE,TABLE,public.orders,update orders set quantity=3 where id=2;,<none>
[local] [55035] 5b03e693.d6fb 2018-05-22 12:46:59.888 EEST psql postgres@testdb line:9 LOG:  AUDIT: SESSION,7,1,WRITE,DELETE,TABLE,public.orders,delete from orders  where id=2;,<none>

Note that the text after AUDIT: makes up a perfect audit trail, almost ready to ship to the auditor in spreadsheet-ready CSV format. Using session audit logging will give us audit log entries for all operations belonging to the classes defined by the pgaudit.log parameter, on all tables. However, there are cases where we wish only a small subset of the data, i.e. only a few tables, to be audited. In such cases we may prefer object audit logging, which gives us fine-grained criteria for selected tables/columns via PostgreSQL’s privilege system. In order to start using object audit logging we must first configure the pgaudit.role parameter, which defines the master role that pgaudit will use. It makes sense not to give this user any login rights.

CREATE ROLE auditor;
ALTER ROLE auditor WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB NOLOGIN NOREPLICATION NOBYPASSRLS CONNECTION LIMIT 0;

Then we specify this value for pgaudit.role in postgresql.conf:

pgaudit.log = none # no need for extensive SESSION logging
pgaudit.role = auditor

Pgaudit OBJECT logging works by checking whether the user auditor is granted (directly or by inheritance) the right to perform the specified action on the relations/columns used in a statement. So, if we need to ignore all tables but have detailed logging for the orders table, this is the way to do it:

grant ALL on orders to auditor ;

With the above grant we enable full SELECT, INSERT, UPDATE and DELETE logging on the orders table. Let’s run once again the INSERT, UPDATE and DELETE of the previous examples and watch the PostgreSQL log:

% tail -f data/log/postgresql-22.log | grep AUDIT:
[local] [60683] 5b040125.ed0b 2018-05-22 14:41:41.989 EEST psql postgres@testdb line:7 LOG:  AUDIT: OBJECT,2,1,WRITE,INSERT,TABLE,public.orders,"insert into orders (customer,customerid,product,productid,unit_price,quantity) VALUES('magicbattler',1,'some fn skin 2',2,5,2);",<none>
[local] [60683] 5b040125.ed0b 2018-05-22 14:41:52.269 EEST psql postgres@testdb line:8 LOG:  AUDIT: OBJECT,3,1,WRITE,UPDATE,TABLE,public.orders,update orders set quantity=3 where id=2;,<none>
[local] [60683] 5b040125.ed0b 2018-05-22 14:42:03.148 EEST psql postgres@testdb line:9 LOG:  AUDIT: OBJECT,4,1,WRITE,DELETE,TABLE,public.orders,delete from orders  where id=2;,<none>

We observe that the output is identical to the SESSION logging discussed above, with the difference that instead of SESSION as the audit type (the string next to AUDIT:) we now get OBJECT.

One caveat with OBJECT logging is that TRUNCATEs are not logged. We have to resort to SESSION logging for this. But in this case we end up getting all WRITE activity for all tables. There are talks among the hackers involved to make each command a separate class.

Another thing to keep in mind is that in the case of inheritance if we GRANT access to the auditor on some child table, and not the parent, actions on the parent table which translate to actions on rows of the child table will not be logged.

In addition to the above, the IT people in charge of the integrity of the logs must document a strict and well-defined procedure which covers the extraction of the audit trail from the PostgreSQL log files. Those logs might be streamed to an external secure syslog server in order to minimize the chances of any interference or tampering.
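As an illustration of such an extraction step, the sketch below (Python; the log path is a placeholder, and the column names simply follow the pgaudit session/object log format shown in the examples above) pulls the comma-separated payload after the AUDIT: marker into a spreadsheet-ready CSV file:

import csv
import re

LOG_FILE = "data/log/postgresql-22.log"   # placeholder path
OUT_FILE = "audit_trail.csv"

audit_re = re.compile(r"AUDIT: (.*)$")

with open(LOG_FILE) as log, open(OUT_FILE, "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["audit_type", "statement_id", "substatement_id", "class",
                     "command", "object_type", "object_name", "statement", "parameter"])
    for line in log:
        match = audit_re.search(line)
        if not match:
            continue
        # The payload after "AUDIT: " is already CSV-formatted by pgaudit, so the
        # csv module can split it (it handles the quoted statement text).
        writer.writerow(next(csv.reader([match.group(1)])))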

PostgreSQL Streaming Replication - a Deep Dive


Knowledge of high availability is a must for anybody managing PostgreSQL. It is a topic that we have seen over and over, but that never gets old. In this blog, we are going to review a little bit of the history of PostgreSQL built-in replication features and deep dive into how streaming replication works.

When talking about replication, we will be talking a lot about WALs. So, let's review a little bit what is this about.

Write Ahead Log (WAL)

The Write Ahead Log is a standard method for ensuring data integrity, and it is enabled automatically by default.

The WALs are the REDO logs in PostgreSQL. But, what are the REDO logs?

REDO logs contain all changes that were made in the database and they are used by replication, recovery, online backup and point in time recovery (PITR). Any changes that have not been applied to the data pages can be redone from the REDO logs.

Using WAL results in a significantly reduced number of disk writes, because only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction.

A WAL record will specify, bit by bit, the changes made to the data. Each WAL record will be appended into a WAL file. The insert position is a Log Sequence Number (LSN) that is a byte offset into the logs, increasing with each new record.

The WALs are stored in pg_xlog (or pg_wal in PostgreSQL 10) directory, under the data directory. These files have a default size of 16MB (the size can be changed by altering the --with-wal-segsize configure option when building the server). They have a unique incremental name, in the following format: "00000001 00000000 00000000".
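To see the correspondence between the current insert position (LSN) and the WAL segment file that holds it, you can query two helper functions on the master (PostgreSQL 10 names; a small sketch in Python with psycopg2, with placeholder connection settings):

import psycopg2

# Placeholder connection settings; run this on the master.
conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres")
with conn.cursor() as cur:
    # Current write-ahead log insert location and the segment file holding it.
    cur.execute("SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn())")
    lsn, walfile = cur.fetchone()
    print("Current LSN:", lsn)        # e.g. 0/50002C8
    print("WAL segment:", walfile)    # e.g. 000000010000000000000005
conn.close()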

The number of WAL files contained in pg_xlog (or pg_wal) will depend on the value assigned to the parameter checkpoint_segments (or min_wal_size and max_wal_size, depending on the version) in the postgresql.conf configuration file.

One parameter that we need to set up when configuring any PostgreSQL installation is wal_level. It determines how much information is written to the WAL. The default value is minimal, which writes only the information needed to recover from a crash or immediate shutdown; archive adds the logging required for WAL archiving; hot_standby further adds the information required to run read-only queries on a standby server; and finally, logical adds the information necessary to support logical decoding. This parameter requires a restart, so it can be hard to change on running production databases if we have forgotten it.

For further information, you can check the official documentation here or here. Now that we’ve covered the WAL, let's review the replication history…

History of replication in PostgreSQL

The first replication method (warm standby) that PostgreSQL implemented (in version 8.2, back in 2006) was based on the log shipping method.

This means that the WAL records are directly moved from one database server to another to be applied. We can say that it is a continuous PITR.

PostgreSQL implements file-based log shipping by transferring WAL records one file (WAL segment) at a time.

This replication implementation has the downside that, if there is a major failure on the primary server, transactions not yet shipped will be lost. So there is a window for data loss (you can tune this by using the archive_timeout parameter, which can be set to as low as a few seconds, but such a low setting will substantially increase the bandwidth required for file shipping).

We can represent this method with the picture below:

PostgreSQL file-based log shipping

So, on version 9.0 (back in 2010), streaming replication was introduced.

This feature allowed us to stay more up-to-date than is possible with file-based log shipping, by transferring WAL records (a WAL file is composed of WAL records) on the fly (record based log shipping), between a master server and one or several slave servers, without waiting for the WAL file to be filled.

In practice, a process called WAL receiver, running on the slave server, will connect to the master server using a TCP/IP connection. On the master server another process exists, named WAL sender, which is in charge of sending the WAL records to the slave server as they are generated.

Streaming replication can be represented as follows:

PostgreSQL Streaming replication

Looking at the above diagram, we may wonder what happens when the communication between the WAL sender and the WAL receiver fails.

When configuring streaming replication, we have the option to enable WAL archiving.

This step is actually not mandatory, but it is extremely important for a robust replication setup, as it prevents the main server from recycling old WAL files that have not yet been applied to the slave. If this occurs, we will need to recreate the replica from scratch.

When configuring replication with continuous archiving (as explained here), we start from a backup and, to reach an in-sync state with the master, we need to apply all the changes recorded in the WAL after the backup was taken. During this process, the standby will first restore all the WAL available in the archive location (done by calling restore_command). The restore_command will fail when we reach the last archived WAL record, so after that, the standby is going to look in the pg_wal (pg_xlog) directory to see if the change exists there (this is actually done to avoid data loss when the master server crashes and some changes that have already been moved to the replica and applied there have not yet been archived).

If that fails, and the requested record does not exist there, then it will start communicating with the master through streaming replication.

Whenever streaming replication fails, it will go back to step 1 and restore the records from archive again. This loop of retries from the archive, pg_wal, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file.

This will be a diagram of such configuration:

PostgreSQL streaming replication with continuous archiving

Streaming replication is asynchronous by default, so at some given moment we can have some transactions that can be committed in the master and not yet replicated into the standby server. This implies some potential data loss.

However this delay between the commit and impact of the changes in the replica is supposed to be really small (some milliseconds), assuming of course that the replica server is powerful enough to keep up with the load.

For the cases when even the risk of a small data loss is not tolerable, version 9.1 introduced the synchronous replication feature.

In synchronous replication each commit of a write transaction will wait until confirmation is received that the commit has been written to the write-ahead log on disk of both the primary and standby server.

This method minimizes the possibility of data loss, as for that to happen we will need for both, the master and the standby to fail at the same time.

The obvious downside of this configuration is that the response time for each write transaction increases, as we need to wait until all parties have responded. So the time for a commit is, at minimum, the round trip between the master and the replica. Read-only transactions will not be affected by this.

To set up synchronous replication, each of the standby servers needs to specify an application_name in the primary_conninfo of the recovery.conf file: primary_conninfo = '... application_name=slaveX'.

We also need to specify the list of standby servers that are going to take part in the synchronous replication: synchronous_standby_names = 'slaveX,slaveY'.

We can set up one or several synchronous servers, and this parameter also specifies which method (FIRST or ANY) is used to choose synchronous standbys from the listed ones. For more information on how to set up this replication mode please refer here. It is also possible to set up synchronous replication when deploying via ClusterControl, from version 1.6.1 (which was released at the time of writing).

After we have configured our replication, and it is up and running, we will need to have some monitoring over it.

Monitoring PostgreSQL Replication

The pg_stat_replication view on the master server has a lot of relevant information:

postgres=# SELECT * FROM pg_stat_replication;
 pid | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | 
flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state 
-----+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-
----------+------------+-----------+-----------+------------+---------------+------------
 994 |    16467 | repl    | walreceiver      | 192.168.100.42 |                 |       37646 | 2018-05-24 21:27:57.256242-03 |              | streaming | 0/50002C8 | 0/50002C8 | 
0/50002C8 | 0/50002C8  |           |           |            |             0 | async
(1 row)

Let's see this in detail:

pid: Process id of walsender process
usesysid: OID of user which is used for Streaming replication.
usename: Name of user which is used for Streaming replication
application_name: Application name connected to master
client_addr: Address of standby/streaming replication
client_hostname: Hostname of standby.
client_port: TCP port number on which standby communicating with WAL sender
backend_start: Start time when SR connected to Master.
state: Current WAL sender state i.e streaming
sent_lsn: Last transaction location sent to standby.
write_lsn: Last transaction written on disk at standby
flush_lsn: Last transaction flush on disk at standby.
replay_lsn: Last transaction replayed on the standby.
sync_priority: Priority of standby server being chosen as synchronous standby
sync_state: Sync State of standby (is it async or synchronous).

We can also see the WAL sender/receiver processes running on the servers.

Sender (Primary Node):

[root@postgres1 ~]# ps aux |grep postgres
postgres   833  0.0  1.6 392032 16532 ?        Ss   21:25   0:00 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
postgres   847  0.0  0.1 244844  1900 ?        Ss   21:25   0:00 postgres: logger process   
postgres   850  0.0  0.3 392032  3696 ?        Ss   21:25   0:00 postgres: checkpointer process   
postgres   851  0.0  0.3 392032  3180 ?        Ss   21:25   0:00 postgres: writer process   
postgres   852  0.0  0.6 392032  6340 ?        Ss   21:25   0:00 postgres: wal writer process   
postgres   853  0.0  0.3 392440  3052 ?        Ss   21:25   0:00 postgres: autovacuum launcher process   
postgres   854  0.0  0.2 247096  2172 ?        Ss   21:25   0:00 postgres: stats collector process   
postgres   855  0.0  0.2 392324  2504 ?        Ss   21:25   0:00 postgres: bgworker: logical replication launcher   
postgres   994  0.0  0.3 392440  3528 ?        Ss   21:27   0:00 postgres: wal sender process repl 192.168.100.42(37646) streaming 0/50002C8

Receiver (Standby Node):

[root@postgres2 ~]# ps aux |grep postgres
postgres   833  0.0  1.6 392032 16436 ?        Ss   21:27   0:00 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
postgres   848  0.0  0.1 244844  1908 ?        Ss   21:27   0:00 postgres: logger process   
postgres   849  0.0  0.2 392128  2580 ?        Ss   21:27   0:00 postgres: startup process   recovering 000000010000000000000005
postgres   851  0.0  0.3 392032  3472 ?        Ss   21:27   0:00 postgres: checkpointer process   
postgres   852  0.0  0.3 392032  3216 ?        Ss   21:27   0:00 postgres: writer process   
postgres   853  0.0  0.1 246964  1812 ?        Ss   21:27   0:00 postgres: stats collector process   
postgres   854  0.0  0.3 398860  3840 ?        Ss   21:27   0:05 postgres: wal receiver process   streaming 0/50002C8

One way of checking how up to date our replication is, is by checking the amount of WAL generated on the primary but not yet applied on the standby.

Master:

postgres=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/50002C8
(1 row)

Note: This function is for PostgreSQL 10. For previous versions, you need to use: SELECT pg_current_xlog_location();

Slave:

postgres=# SELECT pg_last_wal_receive_lsn();
pg_last_wal_receive_lsn
-------------------------
0/50002C8
(1 row)
postgres=# SELECT pg_last_wal_replay_lsn();
pg_last_wal_replay_lsn
------------------------
0/50002C8
(1 row)

Note: These functions are for PostgreSQL 10. For previous versions, you need to use: SELECT pg_last_xlog_receive_location(); and SELECT pg_last_xlog_replay_location();

We can use the following query to get the lag in seconds.

PostgreSQL 10:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;

Previous Versions:

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;

Output:

postgres=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
postgres-# THEN 0
postgres-# ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
postgres-# END AS log_delay;
log_delay
-----------
        0
(1 row)
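Besides the delay in seconds, it can also be useful to track the lag in bytes, by comparing the LSN currently generated on the master with the LSN replayed on the standby. A rough sketch (Python with psycopg2, PostgreSQL 10 function names; hosts are placeholders and the standby must accept read-only connections):

import psycopg2

master = psycopg2.connect(host="192.168.100.41", dbname="postgres", user="postgres")
standby = psycopg2.connect(host="192.168.100.42", dbname="postgres", user="postgres")

with master.cursor() as m, standby.cursor() as s:
    m.execute("SELECT pg_current_wal_lsn()")
    current_lsn = m.fetchone()[0]

    s.execute("SELECT pg_last_wal_replay_lsn()")
    replay_lsn = s.fetchone()[0]

    # pg_wal_lsn_diff() returns the difference between two LSNs in bytes.
    m.execute("SELECT pg_wal_lsn_diff(%s::pg_lsn, %s::pg_lsn)", (current_lsn, replay_lsn))
    print("Replication lag (bytes):", m.fetchone()[0])

master.close()
standby.close()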

To deploy streaming replication setups (synchronous or asynchronous), we can use ClusterControl:

Deploying PostgreSQL replication setups

It also allows us to monitor the replication lag, as well as other key metrics.

PostgreSQL Overview
PostgreSQL Topology View

As streaming replication is based on shipping WAL records and applying them to the standby server, it basically describes which bytes to add or change in which file. As a result, the standby server is actually a bit-by-bit copy of the master.

We have here some well known limitations:

  • We cannot replicate into a different version or architecture.
  • We cannot change anything on the standby server.
  • We do not have much granularity on what we can replicate.

So, for overcoming these limitations, PostgreSQL 10 has added support for logical replication.

Logical Replication

Logical replication will also use the information in the WAL file, but it will decode it into logical changes. Instead of knowing which byte has changed, we will know exactly what data has been inserted in which table.

It is based on a publish-and-subscribe model, with one or more subscribers subscribing to one or more publications on a publisher node, which looks like this:

PostgreSQL Logical Replication

To know more about Logical Replication in PostgreSQL, we can check the following blog.

With this replication option, many use cases that were previously impossible now become possible, like replicating only some of the tables or consolidating multiple databases into a single one.
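To make the publish/subscribe model concrete, the sketch below creates a publication on the publisher and a subscription on the subscriber (Python with psycopg2; host names, the orders table and the replication user are placeholders, and the table definition must already exist on both nodes):

import psycopg2

# On the publisher (PostgreSQL 10+): publish changes of a single table.
pub = psycopg2.connect(host="publisher.example.com", dbname="myshop", user="postgres")
pub.autocommit = True
with pub.cursor() as cur:
    cur.execute("CREATE PUBLICATION orders_pub FOR TABLE orders")

# On the subscriber: the table definition must already exist locally.
sub = psycopg2.connect(host="subscriber.example.com", dbname="myshop", user="postgres")
sub.autocommit = True   # CREATE SUBSCRIPTION cannot run inside a transaction block
with sub.cursor() as cur:
    cur.execute("""
        CREATE SUBSCRIPTION orders_sub
        CONNECTION 'host=publisher.example.com dbname=myshop user=repl'
        PUBLICATION orders_pub
    """)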

What new features will come? We will need to stay tuned and check, but we hope that master-master built-in replication is not far away.

Optimizing Your Linux Environment for MongoDB


MongoDB performance depends on how it utilizes the underlying resources. It stores data on disk, as well as in memory. It uses CPU resources to perform operations, and a network to communicate with its clients. There should be adequate resources to support its general liveliness. In this article we are going to discuss various resource requirements for the MongoDB database system and how we can optimize them for maximum performance.

Requirements for MongoDB

Apart from providing large-scale resources such as the RAM and CPU to the database, tuning the Operating System can also improve performance to some extent. The significant utilities required for establishing a MongoDB environment include:

  1. Enough disk space
  2. Adequate memory
  3. Excellent network connection.

The most common operating system for MongoDB is Linux, so we’ll look at how to optimize it for the database.

Reboot Condition

There are many tuning techniques that can be applied to Linux. However, since some changes do not take full effect until the host is restarted, it is always good practice to reboot after making changes to ensure they are applied. In this section, the tuning implementations we are going to discuss are:

  1. Network Stack
  2. NTP Daemon
  3. Linux User Limit
  4. File system and Options
  5. Security
  6. Virtual Memory

Network Stack

As with any other software, an excellent network connection provides a better exchange interface for requests and responses with the server. However, the default Linux kernel network tunings do not favor MongoDB. As the name depicts, the network stack is an arrangement of many layers that can be categorized into 3 main ones: the user area, the kernel area and the device area. The user area and kernel area are referred to as the host, since their tasks are carried out by the CPU. The device area is responsible for sending and receiving packets through an interface called the Network Interface Card. For better performance in a MongoDB environment, the host should be confined to a 1Gbps network interface limit. In this case, what we are supposed to tune are the relevant throughput settings, which include:

  1. net.core.somaxconn (increase the value)
  2. net.ipv4.tcp_max_syn_backlog (increase the value)
  3. net.ipv4.tcp_fin_timeout (reduce the value)
  4. net.ipv4.tcp_keepalive_intvl (reduce the value)
  5. net.ipv4.tcp_keepalive_time (reduce the value)

To make these changes permanent, create a new file /etc/sysctl.d/mongodb-sysctl.conf if it does not exist and add these lines to it.

net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_max_syn_backlog = 4096

Then, as the root user, run the command /sbin/sysctl -p in order to apply the changes permanently.

NTP Daemon

Network Time Protocol (NTP) is a technique by which the software clock of a Linux system is synchronized with internet time servers. MongoDB, being a cluster, is dependent on time consistency across nodes. For this reason, it is important for NTP to run permanently on MongoDB hosts. The importance of the NTP configuration is to ensure that the server keeps serving to some set time after a network disconnection. To install the NTP daemon on a Linux system with a Debian/Ubuntu flavor, just run the command:

$ sudo apt-get install ntp

You can visit ntp.conf to see the configuration of the NTP daemon for different OS.

Linux User Limit

Sometimes a user-side fault can end up impacting the entire server and host system. To avoid this, the Linux system is designed to enforce some system resource limits on processes being executed, on a per-user basis. That said, it would be inappropriate to deploy MongoDB on the default system configuration, since it requires more resources than the default provision. Besides, MongoDB is often the main process utilizing the underlying hardware, so it makes sense to optimize the Linux system for such dedicated usage. The database can then fully exploit the available resources.

However, it is not advisable to disable these limit constraints or set them to an unlimited state. For example, if you run into a shortage of CPU or RAM, a small fault can escalate into a huge problem and cause other features to fail, e.g., SSH, which is vital in solving the initial problem.

In order to achieve better estimations, you should understand the constraint requirements at the database level, for instance by estimating the number of users that will make requests to the database and the processing time. You can refer to Key Things to Monitor for MongoDB. A preferable limit for max-user-processes and open-files is 64000. To set these values, create a new file (if it does not exist) under /etc/security/limits.d and add these lines:

mongod       soft        nofile       64000
mongod       hard        nofile       64000
mongod       soft        nproc        64000
mongod       hard        nproc        64000

For these changes to apply, restart your mongod process, since the changes apply only to new shells. To confirm that the running process picked them up, see the check below.
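To verify that the new limits are the ones the running mongod process actually operates with (and not just those of your shell), you can inspect its /proc entry; a small Linux-only sketch in Python, assuming a single mongod instance:

import subprocess

# Find the PID of the running mongod (assumes exactly one instance).
pid = subprocess.check_output(["pidof", "mongod"]).split()[0].decode()

# /proc/<pid>/limits shows the limits the process is actually running with.
with open("/proc/{}/limits".format(pid)) as f:
    for line in f:
        if "open files" in line or "processes" in line:
            print(line.rstrip())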

File System and Options

MongoDB supports 3 types of filesystems for on-disk database data: ext3, ext4, and XFS. For the WiredTiger storage engine, used in MongoDB versions greater than 3.0, XFS is preferred over ext4, which is considered to create some stability issues, while ext3 is also avoided due to its poor pre-allocation performance. MongoDB does not use the default filesystem technique of performing access-time metadata updates like other systems. You can therefore disable access-time updates to save on the small amount of disk IO activity caused by these updates.

This can be done by adding the noatime flag to the filesystem options field in /etc/fstab for the disk serving MongoDB data.

$ grep "/var/lib/mongo" /proc/mounts
/dev/mapper/data-mongodb /var/lib/mongo ext4 rw, seclabel, noatime, data=ordered 0 0

This change only takes effect after you remount the filesystem or reboot the host, and then restart your MongoDB.

Security

Among the several security features a Linux system has at kernel level is Security-Enhanced Linux (SELinux). This is an implementation of fine-grained Mandatory Access Control. It provides a bridge to the security policy to determine whether an operation should proceed. Unfortunately, many Linux users set this access control module to warn only, or disable it totally. This is often due to associated setbacks such as unexpected "permission denied" errors. As much as many people ignore this module, it plays a major role in reducing local attacks on the server. With this feature enabled and the corresponding modes set correctly, it provides a secure background for your MongoDB. Therefore, you should enable SELinux and apply the Enforcing mode, especially at the beginning of your installation. To change the SELinux mode to Enforcing, run the command:

$ sudo setenforce Enforcing

You can check the running SELinux mode by running

$ sudo getenforce

Virtual Memory

Dirty ratio

MongoDB employs caching to enable quick fetching of data. In this case, dirty pages are created, and some memory is required to hold them. The dirty ratio is the percentage of total system memory that can hold dirty pages. In most cases, the default value is between 25% and 35%. If this value is surpassed, the pages are committed to disk, creating a hard pause. To avoid this, you can set the kernel to flush data to disk in the background, through another ratio referred to as dirty_background_ratio (with a value in the 10% - 15% range), without creating the hard pause.

The aim here is to ensure quality query performance. You can therefore reduce the background ratio if your database system will require a large amount of memory. If a hard pause is allowed, you might end up with data duplicates, or some data may fail to be recorded during that time. You can also reduce the cache size to avoid data being written to disk in frequent small batches, which may end up saturating disk throughput. To check the currently running values you can run this command:

$ sysctl -a | egrep "vm.dirty.*_ratio"

and you will be presented with something like this.

vm.dirty_background_ratio = 10
vm.dirty_ratio = 20

Swappiness

This is a value ranging from 0 to 100 that influences the behaviour of the virtual memory manager. Setting it to 100 makes the kernel swap aggressively to disk, while setting it to 0 directs the kernel to swap only to avoid out-of-memory problems. The Linux default is typically 60, which is not appropriate for database systems. In my own tests, setting the value between 0 and 10 is optimal. You can set this value in /etc/sysctl.conf:

vm.swappiness = 5

You can then check this value by running the command

$ sysctl vm.swappiness

To apply these changes, run the command /sbin/sysctl -p or reboot your system.

Migrating to Galera Cluster for MySQL and MariaDB

Tuesday, May 29, 2018 - 21:45

Galera Cluster is a mainstream option for high availability MySQL and MariaDB. And though it has established itself as a credible replacement for traditional MySQL master-slave architectures, it is not a drop-in replacement.

While Galera Cluster has some characteristics that make it unsuitable for certain use cases, most applications can still be adapted to run on it.

The benefits are clear: multi-master InnoDB setup with built-in failover and read scalability.

But how do you migrate? Does the schema or application change? What are the limitations? Can a migration be done online, without service interruption? What are the potential risks?

In this webinar, Severalnines Support Engineer Bart Oles walks you through what you need to know in order to migrate from standalone or a master-slave MySQL/MariaDB setup to Galera Cluster.

Watch the Replay: How to Migrate to Galera Cluster for MySQL & MariaDB


Watch the replay of this webinar with Severalnines Support Engineer Bart Oles, as he walks us through what you need to know in order to migrate from standalone or a master-slave MySQL/MariaDB setup to Galera Cluster.

When considering such a migration, plenty of questions typically come up, such as: how do we migrate? Does the schema or application change? What are the limitations? Can a migration be done online, without service interruption? What are the potential risks?

Galera Cluster has become a mainstream option for high availability MySQL and MariaDB. And though it is now known as a credible replacement for traditional MySQL master-slave architectures, it is not a drop-in replacement.

It has some characteristics that make it unsuitable for certain use cases, however, most applications can still be adapted to run on it.

The benefits are clear: multi-master InnoDB setup with built-in failover and read scalability.

Check out this walk-through on how to migrate to Galera Cluster for MySQL and MariaDB.

Watch the replay and browse through the slides!

Agenda

  • Application use cases for Galera
  • Schema design
  • Events and Triggers
  • Query design
  • Migrating the schema
  • Load balancer and VIP
  • Loading initial data into the cluster
  • Limitations:
    • Cluster technology
    • Application vendor support
  • Performing Online Migration to Galera
  • Operational management checklist
  • Belts and suspenders: Plan B
  • Demo

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA, with over 15 years experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

Our Guide to MySQL & MariaDB Performance Tuning


If you’re asking yourself the following questions when it comes to optimally running your MySQL or MariaDB databases:

  • How do I tune them to make best use of the hardware?
  • How do I optimize the Operating System?
  • How do I best configure MySQL or MariaDB for a specific database workload?

Then this webinar is for you!

We’ll discuss some of the settings that are most often tweaked and which can bring you significant improvement in the performance of your MySQL or MariaDB database. We will also cover some of the variables which are frequently modified even though they should not.

Performance tuning is not easy, especially if you’re not an experienced DBA, but you can go a surprisingly long way with a few basic guidelines.

This webinar builds upon blog posts by Krzysztof from the ‘Become a MySQL DBA’ series.

Agenda: 
  • What to tune and why?
  • Tuning process
  • Operating system tuning
    • Memory
    • I/O performance
  • MySQL configuration tuning
    • Memory
    • I/O performance
  • Useful tools
  • Do’s and do not’s of MySQL tuning
  • Changes in MySQL 8.0
Date & Time: 
Tuesday, June 26, 2018 - 10:00 to 11:15
Tuesday, June 26, 2018 - 12:00 to 13:15

Why is PostgreSQL Running Slow? Tips & Tricks to Get to the Source


As a PostgreSQL Database Administrator, there are everyday expectations: check on backups, apply DDL changes, make sure the logs don’t have any game-breaking ERRORs, and answer panicked calls from developers whose reports are running twice as long as normal and who have a meeting in ten minutes.

Even with a good understanding of the health of managed databases, there will always be new cases and new issues popping up relating to performance and how the database “feels”. Whether it’s a panicked email or an open ticket for “the database feels slow”, this common complaint can generally be worked through with a few steps to check whether or not there is a problem with PostgreSQL, and what that problem may be.

This is by no means an exhaustive guide, nor do the steps need to be done in any specific order. Rather, it’s a set of initial steps that can be taken to help find the common offenders quickly, as well as gain new insight as to what the issue may be. A developer may know how the application acts and responds, but the Database Administrator knows how the database acts and responds to the application, and together the issue can be found.

NOTE: The queries to be executed should be run as a superuser, such as ‘postgres’, or any database user granted superuser permissions. Limited users will either be denied or have data omitted.

Step 0 - Information Gathering

Get as much information as possible from whoever says the database seems slow: specific queries, connected applications, timeframes of the performance slowness, etc. The more information they provide, the easier it will be to find the issue.

Step 1 - Check pg_stat_activity

The request may come in many different forms, but if “slowness” is the general issue, checking pg_stat_activity is the first step to understand just what’s going on. The pg_stat_activity view (documentation for every column in this view can be found here) contains a row for every server process / connection to the database from a client. There is a handful of useful information in this view that can help.

NOTE: pg_stat_activity has been known to change structure over time, refining the data it presents. An understanding of the columns themselves will help build queries dynamically as needed in the future.

Notable columns in pg_stat_activity are:

  1. query: a text column showing the query that’s currently being executed, waiting to be executed, or was last executed (depending on the state). This can help identify what query / queries a developer may be reporting are running slowly.
  2. client_addr: The IP address from which this connection and query originated. If empty (or Null), it originated from localhost.
  3. backend_start, xact_start, query_start: These three provide a timestamp of when each started respectively. Backend_start represents when the connection to the database was established, xact_start is when the current transaction started, and query_start is when the current (or last) query started.
  4. state: The state of the connection to the database. ‘active’ means it’s currently executing a query, ‘idle’ means it’s waiting for further input from the client, and ‘idle in transaction’ means it’s waiting for further input from the client while holding an open transaction. (There are others, but their likelihood is rare; consult the documentation for more information.)
  5. datname: The name of the database the connection is currently connected to. In multiple database clusters, this can help isolate problematic connections.
  6. wait_event_type and wait_event: These columns will be null when a query isn’t waiting, but if it is waiting they will contain information on why the query is waiting, and exploring pg_locks can identify what it’s waiting on. (PostgreSQL 9.5 and before only has a boolean column called ‘waiting’: true if waiting, false if not.)
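
To tie these columns together, a query along the following lines lists the currently active queries ordered by how long they have been running. This is a minimal sketch; adjust the one-minute threshold to whatever “slow” means in your environment.

-- Active queries running for more than one minute, longest first
SELECT pid, datname, client_addr, state,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '1 minute'
ORDER BY runtime DESC;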

1.1. Is the query waiting / blocked?

If there is a specific query or queries that are “slow” or “hung”, check to see if they are waiting for another query to complete. Due to relation locking, other queries can lock a table and prevent any other queries from accessing or changing data until that query or transaction is done.

PostgreSQL 9.5 and earlier:

SELECT * FROM pg_stat_activity WHERE waiting = TRUE;

PostgreSQL 9.6:

SELECT * FROM pg_stat_activity WHERE wait_event IS NOT NULL;

PostgreSQL 10 and later:

SELECT * FROM pg_stat_activity WHERE wait_event IS NOT NULL AND backend_type = 'client backend';

The results of this query will show any connections currently waiting on another connection to release locks on a relation that is needed.

If the query is blocked by another connection, there are some ways to find out just what they are. In PostgreSQL 9.6 and later, the function pg_blocking_pids() allows the input of a process ID that’s being blocked, and it will return an array of process IDs that are responsible for blocking it.

PostgreSQL 9.6 and later:

SELECT * FROM pg_stat_activity 
WHERE pid IN (SELECT pg_blocking_pids(<pid of blocked query>));

PostgreSQL 9.5 and earlier:

SELECT blocked_locks.pid     AS blocked_pid,
         blocked_activity.usename  AS blocked_user,
         blocking_locks.pid     AS blocking_pid,
         blocking_activity.usename AS blocking_user,
         blocked_activity.query    AS blocked_statement,
         blocking_activity.query   AS current_statement_in_blocking_process
   FROM  pg_catalog.pg_locks         blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks         blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
   WHERE NOT blocked_locks.GRANTED;

(Available from the PostgreSQL Wiki).

These queries will point to whatever is blocking a specific PID that’s provided. With that, a decision can be made to kill the blocking query or connection, or let it run.
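
If the decision is to stop the blocker, PostgreSQL provides two standard functions for that, shown below with the same kind of placeholder as the queries above: pg_cancel_backend() cancels only the query currently running in that backend, while pg_terminate_backend() closes the whole connection and rolls back its open transaction.

-- Cancel just the running query of the blocking backend
SELECT pg_cancel_backend(<blocking pid>);

-- Terminate the blocking connection entirely (its open transaction is rolled back)
SELECT pg_terminate_backend(<blocking pid>);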

Step 2 - If the queries are running, why are they taking so long?

2.1. Is the planner running queries efficiently?

If a query (or set of queries) in question has the status of ‘active’, then it’s actually running. If the whole query isn’t available in pg_stat_activity, fetch it from the developers or the PostgreSQL log and start exploring the query planner.

EXPLAIN SELECT * FROM postgres_stats.table_stats t JOIN hosts h ON (t.host_id = h.host_id) WHERE logged_date >= '2018-02-01' AND logged_date < '2018-02-04' AND t.india_romeo = 569;
Nested Loop  (cost=0.280..1328182.030 rows=2127135 width=335)
  ->  Index Scan using six on victor_oscar echo  (cost=0.280..8.290 rows=1 width=71)
          Index Cond: (india_romeo = 569)
  ->  Append  (cost=0.000..1306902.390 rows=2127135 width=264)
        ->  Seq Scan on india_echo romeo  (cost=0.000..0.000 rows=1 width=264)
                Filter: ((logged_date >= '2018-02-01'::timestamp with time zone) AND (logged_date < '2018-02-04'::timestamp with time zone) AND (india_romeo = 569))
        ->  Seq Scan on juliet victor_echo  (cost=0.000..437153.700 rows=711789 width=264)
                Filter: ((logged_date >= '2018-02-01'::timestamp with time zone) AND (logged_date < '2018-02-04'::timestamp with time zone) AND (india_romeo = 569))
        ->  Seq Scan on india_papa quebec_bravo  (cost=0.000..434936.960 rows=700197 width=264)
                Filter: ((logged_date >= '2018-02-01'::timestamp with time zone) AND (logged_date < '2018-02-04'::timestamp with time zone) AND (india_romeo = 569))
        ->  Seq Scan on two oscar  (cost=0.000..434811.720 rows=715148 width=264)
                Filter: ((logged_date >= '2018-02-01'::timestamp with time zone) AND (logged_date < '2018-02-04'::timestamp with time zone) AND (india_romeo = 569))

This example shows a query plan for a two table join that also hits a partitioned table. We’re looking for anything that can cause the query to be slow, and in this case the planner is doing several Sequential Scans on partitions, suggesting that they are missing indexes. Adding indexes to these tables for column ‘india_romeo’ will instantly improve this query.
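
As a sketch of that fix (table and column names are taken from the example plan above, so substitute your own), an index on each partition that shows up as a Seq Scan lets the planner use index scans instead; CONCURRENTLY avoids blocking writes while the index is built.

-- Build the missing indexes on each scanned partition without blocking writes
CREATE INDEX CONCURRENTLY ON juliet (india_romeo);
CREATE INDEX CONCURRENTLY ON india_papa (india_romeo);
CREATE INDEX CONCURRENTLY ON two (india_romeo);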

Things to look for are sequential scans, nested loops, expensive sorting, etc. Understanding the query planner is crucial to making sure queries perform as well as possible; the official documentation can be read for more information here.

2.2. Are the tables involved bloated?

If the queries are still feeling slow without the query planner pointing at anything obvious, it’s time to check the health of the tables involved. Are they too big? Are they bloated?

SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'mytable';
n_live_tup  | n_dead_tup
------------+------------
      15677 |    8275431
(1 row)

Here we see that there are many times more dead rows than live rows, which means that to find the correct rows, the engine must sift through data that’s not even relevant. A vacuum / vacuum full on this table will increase performance significantly.
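
A minimal sketch of the cleanup (the table name is illustrative): a plain VACUUM reclaims the dead tuples for reuse without taking an exclusive lock, while VACUUM FULL rewrites the table and gives the space back to the operating system but locks the table while it runs, so it is best kept for a maintenance window.

-- Reclaim dead tuples and refresh planner statistics; does not block reads and writes
VACUUM (ANALYZE, VERBOSE) mytable;

-- Rewrite the table and return disk space to the OS; takes an exclusive lock
VACUUM FULL mytable;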

Step 3 - Check the logs

If the issue still can’t be found, check the logs for any clues.

FATAL / ERROR messages:

Look for messages that may be causing issues, such as deadlocks or long wait times to gain a lock.

Checkpoints

Hopefully log_checkpoints is set to on, which will write checkpoint information to the logs. There are two types of checkpoints, timed and requested (forced). If checkpoints are being forced, then dirty buffers in memory must be written to disk before processing more queries, which can give a database system an overall feeling of “slowness”. Increasing checkpoint_segments or max_wal_size (depending on the database version) will give the checkpointer more room to work with, as well as help the background writer take some of the writing load.
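
As a hedged example (the values are illustrative and assume PostgreSQL 9.5 or later, where max_wal_size replaced checkpoint_segments and ALTER SYSTEM is available), checkpoint logging and sizing can be adjusted from psql and reloaded without a restart:

-- Log every checkpoint so forced checkpoints become visible in the logs
ALTER SYSTEM SET log_checkpoints = on;

-- Give the checkpointer more room and spread the writes over the checkpoint interval
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;

-- Apply the changes without restarting the server
SELECT pg_reload_conf();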

Step 4 - What’s the health of the host system?

If there are no clues in the database itself, perhaps the host itself is overloaded or having issues. An overloaded IO channel to disk, memory overflowing to swap, or even a failing drive: none of these issues would be apparent with anything we looked at before. Assuming the database is running on a *nix based operating system, here are a few things that can help.

4.1. System load

Using ‘top’, look at the load average for the host. If the number is approaching or exceeding the number of cores on the system, it could simply be too many concurrent connections hitting the database, bringing it to a crawl as it tries to catch up.

load average: 3.43, 5.25, 4.85

4.2. System memory and SWAP

Using ‘free’, check to see if SWAP has been used at all. Memory overflowing to SWAP in a PostgreSQL database environment is extremely bad for performance, and many DBAs will even eliminate SWAP from database hosts, as an ‘out of memory’ error is often preferable to a sluggish system.

If SWAP is being used, a reboot of the system will clear it out, and increasing total system memory or re-configuring memory usage for PostgreSQL (such as lowering shared_buffers or work_mem) may be in order.

[postgres@livedb1 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7986         225        1297          12        6462        7473
Swap:          7987        2048        5939

4.3. Disk access

PostgreSQL attempts to do a lot of its work in memory, and spreads out writing to disk to minimize bottlenecks, but on an overloaded system with heavy writing, it’s easily possible to see heavy reads and writes cause the whole system to slow down as it catches up on the demands. Faster disks, more disks, and more IO channels are some ways to increase the amount of work that can be done.

Tools like ‘iostat’ or ‘iotop’ can help pinpoint if there is a disk bottleneck, and where it may be coming from.
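
From inside PostgreSQL itself, the per-database buffer cache hit ratio gives a rough idea of how much of the read traffic is actually going to disk. This is a sketch; the counters are cumulative since the last statistics reset, so treat them as a trend rather than a live measurement.

-- Databases whose reads most often miss the buffer cache and hit the disks
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
ORDER BY blks_read DESC;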

4.4. Check the logs

If all else fails, or even if not, the logs should always be checked to see if the system is reporting anything that’s not right. We already discussed checking the PostgreSQL logs, but the system logs can give information about issues such as failing disks, failing memory, network problems, etc. Any one of these issues can cause the database to act slow and unpredictable, so a good understanding of how the system looks when healthy can help find these issues.


Step 5 - Something still doesn’t make sense?

Even the most seasoned administrators will run into something new that doesn’t make sense. That’s where the global PostgreSQL community can come in to help out. Much like Step 0, the clearer the information given to the community, the easier it is for them to help out.

5.1. PostgreSQL Mailing Lists

Since PostgreSQL is developed and managed by the open source community, there are thousands of people who talk through the mailing lists to discuss countless topics including features, errors, and performance issues. The mailing lists can be found here, with pgsql-admin and pgsql-performance being the most important for looking for help with performance issues.

5.2. IRC

Freenode hosts several PostgreSQL channels with developers and administrators all over the world, and it’s not hard to find a helpful person to track down where issues may be coming from. More information can be found on the PostgreSQL IRC page.


How to Recover Galera Cluster or MySQL Replication from Split Brain Syndrome


You may have heard the term “split brain”. What is it? How does it affect your clusters? In this blog post we will discuss what exactly it is, what danger it may pose to your database, how we can prevent it, and, if everything goes wrong, how to recover from it.

Long gone are the days of single instances; nowadays almost all databases run in replication groups or clusters. This is great for high availability and scalability, but a distributed database introduces new dangers and limitations. One case which can be deadly is a network split. Imagine a cluster of multiple nodes which, due to network issues, was split in two parts. For obvious reasons (data consistency), both parts shouldn’t handle traffic at the same time, as they are isolated from each other and data cannot be transferred between them. It is also wrong from the application point of view - even if, eventually, there would be a way to sync the data (although reconciliation of two datasets is not trivial). For a while, part of the application would be unaware of the changes made by the application hosts which access the other part of the database cluster. This can lead to serious problems.

The condition in which the cluster has been divided in two or more parts that are willing to accept writes is called “split brain”.

The biggest problem with split brain is data drift, as writes happen on both parts of the cluster. None of the MySQL flavors provide automated means of merging datasets that have diverged. You will not find such a feature in MySQL replication, Group Replication or Galera. Once the data has diverged, the only option is to either use one of the parts of the cluster as the source of truth and discard changes executed on the other part - unless we can follow some manual process in order to merge the data.

This is why we will start with how to prevent split brain from happening. This is so much easier than having to fix any data discrepancy.

How to prevent split brain

The exact solution depends on the type of the database and the setup of the environment. We will take a look at some of the most common cases for Galera Cluster and MySQL Replication.

Galera cluster

Galera has a built-in “circuit breaker” to handle split brain: it relies on a quorum mechanism. If a majority (50% + 1) of the nodes are available in the cluster, Galera will operate normally. If there is no majority, Galera will stop serving traffic and switch to a so-called “non-Primary” state. This is pretty much all you need to deal with a split brain situation while using Galera. Sure, there are manual methods to force Galera into “Primary” state even if there’s no majority. The thing is, unless you do that, you should be safe.

The way quorum is calculated has important repercussions: at a single datacenter level, you want to have an odd number of nodes. Three nodes give you tolerance for the failure of one node (the 2 remaining nodes satisfy the requirement of more than 50% of the nodes in the cluster being available). Five nodes will give you tolerance for the failure of two nodes (5 - 2 = 3, which is more than 50% of 5 nodes). On the other hand, using four nodes will not improve your tolerance over a three node cluster. It would still handle only the failure of one node (4 - 1 = 3, more than 50% of 4), while the failure of two nodes would render the cluster unusable (4 - 2 = 2, just 50%, not more).
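
To see how a node currently judges its own quorum, the wsrep status counters can be checked on any node; in a healthy cluster wsrep_cluster_status reports Primary and wsrep_cluster_size matches the expected number of nodes. A quick check from the MySQL client:

-- Quorum-related status counters on a Galera node
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
SHOW GLOBAL STATUS LIKE 'wsrep_ready';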

While deploying a Galera cluster in a single datacenter, please keep in mind that, ideally, you would like to distribute nodes across multiple availability zones (separate power source, network, etc.) - as long as they exist in your datacenter, that is. A simple setup may look like the one below:

At the multi-datacenter level, those considerations are also applicable. If you want Galera cluster to automatically handle datacenter failures, you should use an odd number of datacenters. To reduce costs, you can use a Galera arbitrator in one of them instead of a database node. The Galera arbitrator (garbd) is a process which takes part in the quorum calculation but does not contain any data. This makes it possible to run it even on very small instances, as it is not resource-intensive - although the network connectivity has to be good, as it ‘sees’ all the replication traffic. An example setup may look like the diagram below:

MySQL Replication

With MySQL replication, the biggest issue is that there is no built-in quorum mechanism, as there is in Galera cluster. Therefore more steps are required to ensure that your setup will not be affected by a split brain.

One method is to avoid cross-datacenter automated failovers. You can configure your failover solution (it can be ClusterControl, MHA or Orchestrator) to fail over only within a single datacenter. If there was a full datacenter outage, it would be up to the admin to decide how to fail over and how to ensure that the servers in the failed datacenter will not be used.

There are options to make it more automated. You can use Consul to store data about the nodes in the replication setup, and which one of them is the master. Then it will be up to the admin (or some scripting) to update this entry and move writes to the second datacenter. You can benefit from an Orchestrator/Raft setup where Orchestrator nodes can be distributed across multiple datacenters and detect split brain. Based on this you could take different actions like, as we mentioned previously, updating entries in Consul or etcd. The point is that this is a much more complex environment to set up and automate than a Galera cluster. Below you can find an example of a multi-datacenter setup for MySQL replication.

Please keep in mind that you still have to create scripts to make it work, i.e. monitor Orchestrator nodes for a split brain and take the necessary actions to implement STONITH and ensure that the master in datacenter A will not be used once the network converges and connectivity is restored.

Split brain happened - what to do next?

The worst case scenario happened and we have data drift. We will try to give you some hints on what can be done here. Unfortunately, the exact steps will depend mostly on your schema design, so it will not be possible to write a precise how-to guide.

What you have to keep in mind is that the ultimate goal will be to copy data from one master to the other and recreate all relations between tables.

First of all, you have to identify which node will continue serving data as master. This is the dataset into which you will merge data stored on the other “master” instance. Once that’s done, you have to identify data from the old master which is missing on the current master. This will be manual work. If you have timestamps in your tables, you can leverage them to pinpoint the missing data. Ultimately, binary logs will contain all data modifications, so you can rely on them. You may also have to rely on your knowledge of the data structure and the relations between tables. If your data is normalized, one record in one table could be related to records in other tables. For example, your application may insert data into a “user” table which is related to an “address” table using user_id. You will have to find all related rows and extract them.

The next step will be to load this data into the new master. Here comes the tricky part - if you prepared your setups beforehand, this could be simply a matter of running a couple of inserts. If not, this may be rather complex. It’s all about primary key and unique index values. If your primary key values are generated as unique on each server, using some sort of UUID generator or using the auto_increment_increment and auto_increment_offset settings in MySQL, you can be sure that the data from the old master you have to insert won’t cause primary key or unique key conflicts with data on the new master. Otherwise, you may have to manually modify data from the old master to ensure it can be inserted correctly. It sounds complex, so let’s take a look at an example.

Let’s imagine we insert rows using auto_increment on node A, which is a master. For the sake of simplicity, we will focus on a single row only. There are columns ‘id’ and ‘value’.

If we insert it without any particular setup, we’ll see entries like below:

1000, ‘some value0’
1001, ‘some value1’
1002, ‘some value2’
1003, ‘some value3’

Those will replicate to the slave (B). If the split brain happens and writes are executed on both the old and the new master, we will end up with the following situation:

A

1000, ‘some value0’
1001, ‘some value1’
1002, ‘some value2’
1003, ‘some value3’
1004, ‘some value4’
1005, ‘some value5’
1006, ‘some value7’

B

1000, ‘some value0’
1001, ‘some value1’
1002, ‘some value2’
1003, ‘some value3’
1004, ‘some value6’
1005, ‘some value8’
1006, ‘some value9’

As you can see, there’s no way to simply dump records with ids of 1004, 1005 and 1006 from node A and store them on node B, because we would end up with duplicated primary key entries. What needs to be done is to change the values of the id column in the rows that will be inserted to values larger than the maximum value of the id column in the table. This is all that’s needed for single rows. For more complex relations, where multiple tables are involved, you may have to make the changes in multiple locations.
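
A minimal sketch of that id shift, assuming the recovered rows from the old master have first been loaded into a helper table called old_master_rows (both table names are illustrative):

-- Current maximum id on the new master
SELECT @offset := MAX(id) FROM mytable;

-- Re-insert the recovered rows with their ids moved above that maximum
INSERT INTO mytable (id, value)
SELECT id + @offset, value
FROM old_master_rows;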

On the other hand, if we had anticipated this potential problem and configured our nodes to store odd ids on node A and even ids on node B, the problem would have been so much easier to solve.

Node A was configured with auto_increment_offset = 1 and auto_increment_increment = 2

Node B was configured with auto_increment_offset = 2 and auto_increment_increment = 2
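
In SQL terms that configuration looks like the following (a sketch; SET GLOBAL only affects the running server and new connections, so the same values should also be kept in my.cnf to survive a restart):

-- Node A: generate odd ids (1, 3, 5, ...)
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 1;

-- Node B: generate even ids (2, 4, 6, ...)
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 2;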

This is how the data would look on node A before the split brain:

1001, ‘some value0’
1003, ‘some value1’
1005, ‘some value2’
1007, ‘some value3’

After the split brain happens, it will look like below.

Node A:

1001, ‘some value0’
1003, ‘some value1’
1005, ‘some value2’
1007, ‘some value3’
1009, ‘some value4’
1011, ‘some value5’
1013, ‘some value7’

Node B:

1001, ‘some value0’
1003, ‘some value1’
1005, ‘some value2’
1007, ‘some value3’
1008, ‘some value6’
1010, ‘some value8’
1012, ‘some value9’

Now we can easily copy missing data from node A:

1009, ‘some value4’
1011, ‘some value5’
1013, ‘some value7’

And load it into node B, ending up with the following data set:

1001, ‘some value0’
1003, ‘some value1’
1005, ‘some value2’
1007, ‘some value3’
1008, ‘some value6’
1009, ‘some value4’
1010, ‘some value8’
1011, ‘some value5’
1012, ‘some value9’
1013, ‘some value7’

Sure, the rows are not in the original order, but this should be ok. In the worst case scenario you will have to order by the ‘value’ column in queries, and maybe add an index on it to make the sorting fast.

Now, imagine hundreds or thousands of rows and a highly normalized table structure - restoring one row may mean you will have to restore several of them in additional tables. With the need to change ids (because you didn’t have protective settings in place) across all related rows, and all of this being manual work, you can imagine that this is not the best situation to be in. It takes time to recover and it is an error-prone process. Luckily, as we discussed at the beginning, there are means to minimize the chances that split brain will impact your system, or to reduce the work that needs to be done to sync back your nodes. Make sure you use them and stay prepared.

Multi Datacenter setups with PostgreSQL


The main goals of a multi-datacenter (or multi-DC) setup — regardless of whether the database ecosystem is SQL (PostgreSQL, MySQL), or NoSQL (MongoDB, Cassandra) to name just a few — are Low Latency for end users, High Availability, and Disaster Recovery. At the heart of such an environment lies the ability to replicate data, in ways that ensure its durability (as a side note Cassandra’s durability configuration parameters are similar to those used by PostgreSQL). The various replication requirements will be discussed below, however, the extreme cases will be left to the curious for further research.

Replication using asynchronous log shipping has been available in PostgreSQL for a long time, and synchronous replication introduced in version 9.1 opened a whole new set of options to developers of PostgreSQL management tools.

Things to Consider

One way to understand the complexity of a PostgreSQL multi-DC implementation is by learning from the solutions implemented for other database systems, while keeping in mind that PostgreSQL insists on being ACID compliant.

A multi-DC setup includes, in most cases, at least one datacenter in the cloud. While cloud providers take on the burden of managing the database replication on behalf of their clients, they do not usually match the features available in specialized management tools. For example, with many enterprises embracing hybrid cloud and/or multi-cloud solutions in addition to their existing on-premise infrastructure, a multi-DC tool should be able to handle such a mixed environment.

Further, in order to minimize downtime during a failover, the PostgreSQL management system should be able to request (via an API call) a DNS update, so the database requests are routed to the new master cluster.

Networks spanning large geographical areas are high latency connections and all solutions must compromise: forget about synchronous replication, and use one primary with many read replicas. See the AWS MongoDB and Severalnines/Galera Cluster studies for an in-depth analysis of network effects on replication. On a related note, a nifty tool for testing the latency between locations is Wonder Network Ping Statistics.

While the high latency nature of WAN cannot be changed, the user experience can be dramatically improved by ensuring that reads are served from a read replica close to the user location, although with some caveats. By moving replicas away from the primary, writes are delayed, and thus we must do away with synchronous replication. The solution must also be able to work around other issues such as read-after-write consistency and stale secondary reads due to connection loss.

In order to minimize the RTO, data needs to be replicated to a durable storage that is also able to provide high read throughput, and according to Citus Data one option that meets those requirements is AWS S3.

The very notion of multiple data center implies that the database management system must be able to present the DBA with a global view of all data centers and the various PostgreSQL clusters within them, manage multiple versions of PostgreSQL, and configure the replication between them.

When replicating writes to regional data centers, the propagation delay must be monitored. If the delay exceeds a threshold, an alarm should be triggered indicating that the replica contains stale data. The same principle applies to asynchronous multi-master replication.
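
On a PostgreSQL streaming replica, a rough measure of that propagation delay is the time since the last replayed transaction (run it on the standby; the value is only meaningful while the primary is actively writing):

-- Approximate replication lag as seen from the standby
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;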

In a synchronous setup, high latency, or network disruptions may lead to delays in serving client requests while waiting for the commit to complete, while in asynchronous configurations there are risks of split-brain, or degraded performance for an extended period of time. Split-brain and delays on synchronous commits are unavoidable even with well established replication solutions as explained in the article Geo-Distributed Database Clusters with Galera.

Another consideration is vendor support — as of this writing AWS does not support PostgreSQL cross-region replicas.

Intelligent management systems should monitor the network latency between data centers and recommend or adjust changes e.g. synchronous replication is perfectly fine between AWS Availability Zones where data centers are connected using fiber networks. That way a solution can achieve zero data loss and it can also implement master-master replication along with load balancing. Note that AWS Aurora PostgreSQL does not currently provide a master-master replication option.

Decide on the level of replication: cluster, database, table. The decision criteria should include bandwidth costs.

Implement cascaded replication in order to work around network disruptions that can prevent replicas from receiving updates from the master due to geographical distance.

Solutions

Taking into consideration all the requirements, identify the products that are best suited for the job. A note of caution though: each solution comes with its own caveats that must be dealt with by following the recommendations in the product documentation. See for example the BDR Monitoring requirement.

The PostgreSQL official documentation contains a list of non-commercial open source applications, and an expanded list including commercial closed source solutions can be found at the Replication, Clustering, and Connection Pooling wiki page. A few of those tools have been reviewed in more detail in the Top PG Clustering HA Solutions for PostgreSQL article.

There is no turnkey solution, but some products can provide most of the features, especially when working with the vendor.

Here’s a non-exhaustive list:


Conclusion

As we’ve seen, when it comes to choosing a PostgreSQL multi-datacenter solution, there isn’t a one-size-fits-all option. Often, compromising is a must. However, a good understanding of the requirements and implications can go a long way in making an informed decision.

Compared to static (read-only) data, a solution for databases needs to consider the replication of updates (writes). The literature describing both SQL and NoSQL replication solutions insists on using a single source of truth for writes with many replicas in order to avoid issues such as split-brain, and read-after-write consistency.

Lastly, interoperability is a key requirement considering that multi-DC setups may span data centers located on premise, and various cloud providers.

Webinar: MySQL & MariaDB Performance Tuning for Dummies


You’re running MySQL or MariaDB as backend database, how do you tune it to make best use of the hardware? How do you optimize the Operating System? How do you best configure MySQL or MariaDB for a specific database workload?

Do these questions sound familiar to you? Maybe you’re having to deal with that type of situation yourself?

MySQL & MariaDB Performance Tuning Webinar

A database server needs CPU, memory, disk and network in order to function. Understanding these resources is important for anybody managing a production database. Any resource that is weak or overloaded can become a limiting factor and cause the database server to perform poorly.

In this webinar, we’ll discuss some of the settings that are most often tweaked and which can bring you significant improvement in the performance of your MySQL or MariaDB database. We will also cover some of the variables which are frequently modified even though they should not be.

Performance tuning is not easy, especially if you’re not an experienced DBA, but you can go a surprisingly long way with a few basic guidelines.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, June 26th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, June 26th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • What to tune and why?
  • Tuning process
  • Operating system tuning
    • Memory
    • I/O performance
  • MySQL configuration tuning
    • Memory
    • I/O performance
  • Useful tools
  • Do’s and do not’s of MySQL tuning
  • Changes in MySQL 8.0

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

This webinar builds upon blog posts by Krzysztof from the ‘Become a MySQL DBA’ series.

We look forward to “seeing” you there!

ClusterControl Release 1.6.1: MariaDB Backup & PostgreSQL in the Cloud


We are excited to announce the 1.6.1 release of ClusterControl - the all-inclusive database management system that lets you easily deploy, monitor, manage and scale highly available open source databases in any environment: on-premise or in the cloud.

ClusterControl 1.6.1 introduces new Backup Management features for MariaDB, Deployment & Configuration Management features for PostgreSQL in the Cloud, as well as a new Monitoring & Alerting feature through integration with ServiceNow, the popular service management system for the enterprise … and more!

Release Highlights

For MariaDB - Backup Management Features

We’ve added Backup Management features with support for MariaDB Backup on MariaDB-based clusters, as well as support for MaxScale 2.2 (MariaDB’s load balancing technology).

For PostgreSQL - Deployment & Configuration Management Features

We’ve built new Deployment & Configuration Management features for deploying PostgreSQL to the Cloud. Users can also now easily deploy Synchronous Replication Slaves using this latest version of ClusterControl.

Monitoring & Alerting Feature - ServiceNow

ClusterControl users can now easily integrate with ServiceNow, the popular services management system for the enterprise.

Additional Highlights

We’ve also recently implemented improvements and fixes to ClusterControl’s MySQL (NDB) Cluster features.

View the ClusterControl ChangeLog for all the details!

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

View Release Details and Resources

Release Details

For MariaDB - Backup Management Features

With MariaDB Server 10.1, the MariaDB team introduced MariaDB Compression and Data-at-Rest Encryption, which is supported by MariaDB Backup. It’s an open source tool provided by MariaDB (and a fork of Percona XtraBackup) for performing physical online backups of InnoDB, Aria and MyISAM tables.

ClusterControl 1.6.1 now features support for MariaDB Backup for MariaDB-based systems. Users can now easily create, restore and schedule backups for their MariaDB databases using MariaDB Backup - whether full or incremental. Let ClusterControl manage your backups, saving you time for the rest of the maintenance of your databases.

With ClusterControl 1.6.1, we also introduce support for MaxScale 2.2, MariaDB’s load balancing technology.

Scheduling a new backup with ClusterControl using MariaDB Backup:

For PostgreSQL - Deployment & Configuration Management Features

ClusterControl 1.6.1 introduces new cloud deployment features for our PostgreSQL users. Whether you’re looking to deploy PostgreSQL nodes using management/public IPs for monitoring connections and data/private IPs for replication traffic; or you’re looking to deploy HAProxy using management/public IPs and private IPs for configurations - ClusterControl does it for you.

ClusterControl now also automates the deployment of Synchronous Replication Slaves for PostgreSQL. Synchronous replication offers the ability to confirm that all changes made by a transaction have been transferred to one or more synchronous standby servers, which allows you to build PostgreSQL clusters with no data loss, and faster failovers.
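
Regardless of how the slaves were deployed, the synchronous status of each standby can be confirmed on the primary with the standard pg_stat_replication view (a quick check; ‘sync’ marks a synchronous standby, ‘async’ a regular streaming replica):

-- Streaming replicas and their synchronization mode, as seen from the primary
SELECT application_name, client_addr, state, sync_state
FROM pg_stat_replication;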

Deploying PostgreSQL in the Cloud:

Monitoring & Alerting Feature - ServiceNow

With this new release, we are pleased to announce that ServiceNow has been added as a new notifications integration to ClusterControl. This service management system provides technical management support (such as asset and license management) to the IT operations of large corporations, including help desk functionalities, and is a very popular integration. This allows enterprises to connect ClusterControl with ServiceNow and benefit from both systems’ features.

Adding ServiceNow as a New Integration to ClusterControl:

Additional New Functionalities

View the ClusterControl ChangeLog for all the details!

Download ClusterControl today!

Happy Clustering!

MySQL on Docker: Running a MariaDB Galera Cluster without Container Orchestration Tools - Part 1


Container orchestration tools simplify the running of a distributed system, by deploying and redeploying containers and handling any failures that occur. One might need to move applications around, e.g., to handle updates, scaling, or underlying host failures. While this sounds great, it does not always work well with a strongly consistent database cluster like Galera. You can’t just move database nodes around, they are not stateless applications. Also, the order in which you perform operations on a cluster has high significance. For instance, restarting a Galera cluster has to start from the most advanced node, or else you will lose data. Therefore, we’ll show you how to run Galera Cluster on Docker without a container orchestration tool, so you have total control.

In this blog post, we are going to look into how to run a MariaDB Galera Cluster on Docker containers using the standard Docker image on multiple Docker hosts, without the help of orchestration tools like Swarm or Kubernetes. This approach is similar to running a Galera Cluster on standard hosts, but the process management is configured through Docker.

Before we jump further into details, we assume you have installed Docker, disabled SElinux/AppArmor and cleared up the rules inside iptables, firewalld or ufw (whichever you are using). The following are three dedicated Docker hosts for our database cluster:

  • host1.local - 192.168.55.161
  • host2.local - 192.168.55.162
  • host3.local - 192.168.55.163

Multi-host Networking

First of all, the default Docker networking is bound to the local host. Docker Swarm introduces another networking layer called overlay network, which extends the container internetworking to multiple Docker hosts in a cluster called Swarm. Long before this integration came into place, there were many network plugins developed to support this - Flannel, Calico, Weave are some of them.

Here, we are going to use Weave as the Docker network plugin for multi-host networking. This is mainly due to how simple it is to install and run, and its support for a DNS resolver (containers running under this network can resolve each other's hostname). There are two ways to get Weave running - systemd or through Docker. We are going to install it as a systemd unit, so it's independent of the Docker daemon (otherwise, we would have to start Docker first before Weave gets activated).

  1. Download and install Weave:

    $ curl -L git.io/weave -o /usr/local/bin/weave
    $ chmod a+x /usr/local/bin/weave
  2. Create a systemd unit file for Weave:

    $ cat > /etc/systemd/system/weave.service << EOF
    [Unit]
    Description=Weave Network
    Documentation=http://docs.weave.works/weave/latest_release/
    Requires=docker.service
    After=docker.service
    [Service]
    EnvironmentFile=-/etc/sysconfig/weave
    ExecStartPre=/usr/local/bin/weave launch --no-restart $PEERS
    ExecStart=/usr/bin/docker attach weave
    ExecStop=/usr/local/bin/weave stop
    [Install]
    WantedBy=multi-user.target
    EOF
  3. Define IP addresses or hostname of the peers inside /etc/sysconfig/weave:

    $ echo 'PEERS="192.168.55.161 192.168.55.162 192.168.55.163"'> /etc/sysconfig/weave
  4. Start and enable Weave on boot:

    $ systemctl start weave
    $ systemctl enable weave

Repeat the above 4 steps on all Docker hosts. Verify with the following command once done:

$ weave status

The number of peers is what we are looking for. It should be 3:

          ...
          Peers: 3 (with 6 established connections)
          ...

Running a Galera Cluster

Now that the network is ready, it's time to fire up our database containers and form a cluster. The basic rules are:

  • Container must be created under --net=weave to have multi-host connectivity.
  • Container ports that need to be published are 3306, 4444, 4567, 4568.
  • The Docker image must support Galera. If you'd like to use Oracle MySQL, then get the Codership version. If you'd like Percona's, use this image instead. In this blog post, we are using MariaDB's.

The reasons we chose MariaDB as the Galera cluster vendor are:

  • Galera is embedded into MariaDB, starting from MariaDB 10.1.
  • The MariaDB image is maintained by the Docker and MariaDB teams.
  • One of the most popular Docker images out there.

Bootstrapping a Galera Cluster has to be performed in sequence. Firstly, the most up-to-date node must be started with "wsrep_cluster_address=gcomm://". Then, start the remaining nodes with a full address consisting of all nodes in the cluster, e.g., "wsrep_cluster_address=gcomm://node1,node2,node3". To accomplish these steps using containers, we have to do some extra steps to ensure all containers are running homogeneously. So the plan is:

  1. We would need to start with 4 containers in this order - mariadb0 (bootstrap), mariadb2, mariadb3, mariadb1.
  2. Container mariadb0 will be using the same datadir and configdir as mariadb1.
  3. Use mariadb0 on host1 for the first bootstrap, then start mariadb2 on host2, mariadb3 on host3.
  4. Remove mariadb0 on host1 to give way for mariadb1.
  5. Lastly, start mariadb1 on host1.

At the end of the day, you would have a three-node Galera Cluster (mariadb1, mariadb2, mariadb3). The first container (mariadb0) is a transient container for bootstrapping purposes only, using the cluster address "gcomm://". It shares the same datadir and configdir as mariadb1 and will be removed once the cluster is formed (mariadb2 and mariadb3 are up) and the nodes are synced.

By default, Galera is turned off in MariaDB and needs to be enabled with a flag called wsrep_on (set to ON) and wsrep_provider (set to the Galera library path) plus a number of Galera-related parameters. Thus, we need to define a custom configuration file for the container to configure Galera correctly.

Let's start with the first container, mariadb0. Create a file under /containers/mariadb0/conf.d/my.cnf and add the following lines:

$ mkdir -p /containers/mariadb0/conf.d
$ cat /containers/mariadb0/conf.d/my.cnf
[mysqld]

default_storage_engine          = InnoDB
binlog_format                   = ROW

innodb_flush_log_at_trx_commit  = 0
innodb_flush_method             = O_DIRECT
innodb_file_per_table           = 1
innodb_autoinc_lock_mode        = 2
innodb_lock_schedule_algorithm  = FCFS # MariaDB >10.1.19 and >10.2.3 only

wsrep_on                        = ON
wsrep_provider                  = /usr/lib/galera/libgalera_smm.so
wsrep_sst_method                = xtrabackup-v2

Since the image doesn't come with MariaDB Backup (which is the preferred SST method for MariaDB 10.1 and MariaDB 10.2), we are going to stick with xtrabackup-v2 for the time being.

To perform the first bootstrap for the cluster, run the bootstrap container (mariadb0) on host1:

$ docker run -d \
        --name mariadb0 \
        --hostname mariadb0.weave.local \
        --net weave \
        --publish "3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --env MYSQL_USER=proxysql \
        --env MYSQL_PASSWORD=proxysqlpassword \
        --volume /containers/mariadb1/datadir:/var/lib/mysql \
        --volume /containers/mariadb1/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm:// \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=mariadb0.weave.local

The parameters used in the above command are:

  • --name, creates the container named "mariadb0",
  • --hostname, assigns the container a hostname "mariadb0.weave.local",
  • --net, places the container in the weave network for multi-host networking support,
  • --publish, exposes ports 3306, 4444, 4567, 4568 on the container to the host,
  • $(weave dns-args), configures DNS resolver for this container. This command can be translated into Docker run as "--dns=172.17.0.1 --dns-search=weave.local.",
  • --env MYSQL_ROOT_PASSWORD, the MySQL root password,
  • --env MYSQL_USER, creates "proxysql" user to be used later with ProxySQL for database routing,
  • --env MYSQL_PASSWORD, the "proxysql" user password,
  • --volume /containers/mariadb1/datadir:/var/lib/mysql, creates /containers/mariadb1/datadir if it does not exist and maps it to /var/lib/mysql (MySQL datadir) of the container (for the bootstrap node, this could be skipped),
  • --volume /containers/mariadb1/conf.d:/etc/mysql/mariadb.conf.d, mounts the files under directory /containers/mariadb1/conf.d of the Docker host, into the container at /etc/mysql/mariadb.conf.d.
  • mariadb:10.2.15, uses MariaDB 10.2.15 image from here,
  • --wsrep_cluster_address, Galera connection string for the cluster. "gcomm://" means bootstrap. For the rest of the containers, we are going to use a full address instead.
  • --wsrep_sst_auth, authentication string for SST user. Use the same user as root,
  • --wsrep_node_address, the node hostname, in this case we are going to use the FQDN provided by Weave.

The bootstrap container contains several key things:

  • The name, hostname and wsrep_node_address are mariadb0, but it uses the volumes of mariadb1.
  • The cluster address is "gcomm://"
  • There are two additional --env parameters - MYSQL_USER and MYSQL_PASSWORD. These parameters will create an additional user for our ProxySQL monitoring purposes.

Verify with the following command:

$ docker ps
$ docker logs -f mariadb0

Once you see the following line, it indicates the bootstrap process is completed and Galera is active:

2018-05-30 23:19:30 139816524539648 [Note] WSREP: Synchronized with group, ready for connections

Create the directory to load our custom configuration file in the remaining hosts:

$ mkdir -p /containers/mariadb2/conf.d # on host2
$ mkdir -p /containers/mariadb3/conf.d # on host3

Then, copy the my.cnf that we've created for mariadb0 and mariadb1 to mariadb2 and mariadb3 respectively:

$ scp /containers/mariadb1/conf.d/my.cnf /containers/mariadb2/conf.d/ # on host1
$ scp /containers/mariadb1/conf.d/my.cnf /containers/mariadb3/conf.d/ # on host1

Next, create another 2 database containers (mariadb2 and mariadb3) on host2 and host3 respectively:

$ docker run -d \
        --name ${NAME} \
        --hostname ${NAME}.weave.local \
        --net weave \
        --publish "3306:3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/${NAME}/datadir:/var/lib/mysql \
        --volume /containers/${NAME}/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm://mariadb0.weave.local,mariadb1.weave.local,mariadb2.weave.local,mariadb3.weave.local \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=${NAME}.weave.local

** Replace ${NAME} with mariadb2 or mariadb3 respectively.

However, there is a catch. The entrypoint script checks the mysqld service in the background after database initialization by using the MySQL root user without a password. Since Galera automatically performs synchronization through SST or IST when starting up, the MySQL root user password will change, mirroring the bootstrapped node. Thus, you would see the following error during the first start up:

2018-05-30 23:27:13 140003794790144 [Warning] Access denied for user 'root'@'localhost' (using password: NO)
MySQL init process in progress…
MySQL init process failed.

The trick is to restart the failed containers once more, because this time, the MySQL datadir would have been created (in the first run attempt) and it would skip the database initialization part:

$ docker start mariadb2 # on host2
$ docker start mariadb3 # on host3

Once started, verify by looking at the following line:

$ docker logs -f mariadb2
…
2018-05-30 23:28:39 139808069601024 [Note] WSREP: Synchronized with group, ready for connections

At this point, there are 3 containers running: mariadb0, mariadb2 and mariadb3. Take note that mariadb0 is started using the bootstrap command (gcomm://), which means that if the container is automatically restarted by Docker in the future, it could potentially become disjointed from the primary component. Thus, we need to remove this container and replace it with mariadb1, using the same Galera connection string as the rest, and the same datadir and configdir as mariadb0.

First, stop mariadb0 by sending SIGTERM (to ensure the node is shut down gracefully):

$ docker kill -s 15 mariadb0

Then, start mariadb1 on host1 using similar command as mariadb2 or mariadb3:

$ docker run -d \
        --name mariadb1 \
        --hostname mariadb1.weave.local \
        --net weave \
        --publish "3306:3306" \
        --publish "4444" \
        --publish "4567" \
        --publish "4568" \
        $(weave dns-args) \
        --env MYSQL_ROOT_PASSWORD="PM7%cB43$sd@^1" \
        --volume /containers/mariadb1/datadir:/var/lib/mysql \
        --volume /containers/mariadb1/conf.d:/etc/mysql/mariadb.conf.d \
        mariadb:10.2.15 \
        --wsrep_cluster_address=gcomm://mariadb0.weave.local,mariadb1.weave.local,mariadb2.weave.local,mariadb3.weave.local \
        --wsrep_sst_auth="root:PM7%cB43$sd@^1" \
        --wsrep_node_address=mariadb1.weave.local

This time, you don't need to do the restart trick because the MySQL datadir already exists (created by mariadb0). Once the container is started, verify that the cluster size is 3, the status is Primary and the local state is synced:

$ docker exec -it mariadb3 mysql -uroot "-pPM7%cB43$sd@^1" -e 'select variable_name, variable_value from information_schema.global_status where variable_name in ("wsrep_cluster_size", "wsrep_local_state_comment", "wsrep_cluster_status", "wsrep_incoming_addresses")'
+---------------------------+-------------------------------------------------------------------------------+
| variable_name             | variable_value                                                                |
+---------------------------+-------------------------------------------------------------------------------+
| WSREP_CLUSTER_SIZE        | 3                                                                             |
| WSREP_CLUSTER_STATUS      | Primary                                                                       |
| WSREP_INCOMING_ADDRESSES  | mariadb1.weave.local:3306,mariadb3.weave.local:3306,mariadb2.weave.local:3306 |
| WSREP_LOCAL_STATE_COMMENT | Synced                                                                        |
+---------------------------+-------------------------------------------------------------------------------+

At this point, our architecture is looking something like this:

Although the run command is pretty long, it describes the container's characteristics well. It's probably a good idea to wrap the command in a script to simplify the execution steps, or use a compose file instead.

Database Routing with ProxySQL

Now we have three database containers running. The only way to access the cluster is through the individual Docker host’s published MySQL port, 3306 (mapped to port 3306 of the container). So what happens if one of the database containers fails? You have to manually fail over the client's connections to the next available node. Depending on the application connector, you could also specify a list of nodes and let the connector do the failover and query routing for you (Connector/J, PHP mysqlnd). Otherwise, it would be a good idea to unify the database resources into a single resource that can be called a service.

This is where ProxySQL comes into the picture. ProxySQL can act as the query router, load balancing the database connections similar to what a "Service" in the Swarm or Kubernetes world can do. We have built a ProxySQL Docker image for this purpose and will maintain the image for every new version with our best effort.

Before we run the ProxySQL container, we have to prepare the configuration file. The following is what we have configured for proxysql1. We create a custom configuration file under /containers/proxysql1/proxysql.cnf on host1:

$ cat /containers/proxysql1/proxysql.cnf
datadir="/var/lib/proxysql"
admin_variables=
{
        admin_credentials="admin:admin"
        mysql_ifaces="0.0.0.0:6032"
        refresh_interval=2000
}
mysql_variables=
{
        threads=4
        max_connections=2048
        default_query_delay=0
        default_query_timeout=36000000
        have_compress=true
        poll_timeout=2000
        interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
        default_schema="information_schema"
        stacksize=1048576
        server_version="5.1.30"
        connect_timeout_server=10000
        monitor_history=60000
        monitor_connect_interval=200000
        monitor_ping_interval=200000
        ping_interval_server=10000
        ping_timeout_server=200
        commands_stats=true
        sessions_sort=true
        monitor_username="proxysql"
        monitor_password="proxysqlpassword"
}
mysql_servers =
(
        { address="mariadb1.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb2.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb3.weave.local" , port=3306 , hostgroup=10, max_connections=100 },
        { address="mariadb1.weave.local" , port=3306 , hostgroup=20, max_connections=100 },
        { address="mariadb2.weave.local" , port=3306 , hostgroup=20, max_connections=100 },
        { address="mariadb3.weave.local" , port=3306 , hostgroup=20, max_connections=100 }
)
mysql_users =
(
        { username = "sbtest" , password = "password" , default_hostgroup = 10 , active = 1 }
)
mysql_query_rules =
(
        {
                rule_id=100
                active=1
                match_pattern="^SELECT .* FOR UPDATE"
                destination_hostgroup=10
                apply=1
        },
        {
                rule_id=200
                active=1
                match_pattern="^SELECT .*"
                destination_hostgroup=20
                apply=1
        },
        {
                rule_id=300
                active=1
                match_pattern=".*"
                destination_hostgroup=10
                apply=1
        }
)
scheduler =
(
        {
                id = 1
                filename = "/usr/share/proxysql/tools/proxysql_galera_checker.sh"
                active = 1
                interval_ms = 2000
                arg1 = "10"
                arg2 = "20"
                arg3 = "1"
                arg4 = "1"
                arg5 = "/var/lib/proxysql/proxysql_galera_checker.log"
        }
)

The above configuration will:

  • configure two host groups, the single-writer and multi-writer group, as defined under the "mysql_servers" section,
  • send reads to all Galera nodes (hostgroup 20), while write operations go to a single Galera server (hostgroup 10), which can be verified later with the hit counters shown after this list,
  • schedule proxysql_galera_checker.sh to check the Galera nodes' health and keep only one writer ONLINE in the single-writer host group,
  • use monitor_username and monitor_password as the monitoring credentials created when we first bootstrapped the cluster (mariadb0).
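
Later on, once the ProxySQL container is running and application traffic flows through it, one quick sanity check is to look at the per-rule hit counters in ProxySQL's admin interface (a sketch; stats_mysql_query_rules is part of the standard admin schema):

$ docker exec -it proxysql1 mysql -uadmin -padmin -h127.0.0.1 -P6032 -e 'SELECT rule_id, hits FROM stats_mysql_query_rules'

Hits on rule 200 indicate reads being routed to hostgroup 20, while hits on rules 100 and 300 indicate queries routed to the single-writer hostgroup 10.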

Copy the configuration file to host2, for ProxySQL redundancy:

$ mkdir -p /containers/proxysql2/ # on host2
$ scp /containers/proxysql1/proxysql.cnf host2:/containers/proxysql2/ # on host1

Then, run the ProxySQL containers on host1 and host2 respectively:

$ docker run -d \
        --name=${NAME} \
        --publish 6033 \
        --publish 6032 \
        --restart always \
        --net=weave \
        $(weave dns-args) \
        --hostname ${NAME}.weave.local \
        -v /containers/${NAME}/proxysql.cnf:/etc/proxysql.cnf \
        -v /containers/${NAME}/data:/var/lib/proxysql \
        severalnines/proxysql

** Replace ${NAME} with proxysql1 or proxysql2 respectively.

We specified --restart=always to make the container always available regardless of the exit status, and to start it automatically when the Docker daemon starts. This makes the ProxySQL containers behave like a daemon.
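
To double-check the effective restart policy of a running container, docker inspect with a Go template does the trick (output shown under the assumption the container was started as above):

$ docker inspect -f '{{ .HostConfig.RestartPolicy.Name }}' proxysql1
always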

Verify the status of the MySQL servers monitored by both ProxySQL instances (OFFLINE_SOFT is expected for the single-writer host group):

$ docker exec -it proxysql1 mysql -uadmin -padmin -h127.0.0.1 -P6032 -e 'select hostgroup_id,hostname,status from mysql_servers'
+--------------+----------------------+--------------+
| hostgroup_id | hostname             | status       |
+--------------+----------------------+--------------+
| 10           | mariadb1.weave.local | ONLINE       |
| 10           | mariadb2.weave.local | OFFLINE_SOFT |
| 10           | mariadb3.weave.local | OFFLINE_SOFT |
| 20           | mariadb1.weave.local | ONLINE       |
| 20           | mariadb2.weave.local | ONLINE       |
| 20           | mariadb3.weave.local | ONLINE       |
+--------------+----------------------+--------------+

At this point, our architecture is looking something like this:

All connections coming in on port 6033 (whether from host1, host2 or the container network) will be load balanced to the backend database containers by ProxySQL. If you would like to access an individual database server, use port 3306 of the physical host instead. There is no virtual IP address configured yet as a single endpoint for the ProxySQL service, but we can add one using Keepalived, as explained in the next section.
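
Before moving on, a quick illustration of the load balancing: running a read query a few times through port 6033 should return different backend hostnames, assuming the sbtest user defined in mysql_users also exists on the Galera cluster:

$ docker exec -it proxysql1 mysql -usbtest -ppassword -h127.0.0.1 -P6033 -e 'SELECT @@hostname'

Repeating the command should cycle through mariadb1, mariadb2 and mariadb3 as ProxySQL load balances the reads across hostgroup 20.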

Virtual IP Address with Keepalived

Since we configured ProxySQL containers to be running on host1 and host2, we are going to use Keepalived containers to tie these hosts together and provide virtual IP address via the host network. This allows a single endpoint for applications or clients to connect to the load balancing layer backed by ProxySQL.

As usual, create a custom configuration file for our Keepalived service. Here is the content of /containers/keepalived1/keepalived.conf:

vrrp_instance VI_DOCKER {
   interface ens33               # interface to monitor
   state MASTER
   virtual_router_id 52          # Assign one ID for this route
   priority 101
   unicast_src_ip 192.168.55.161
   unicast_peer {
      192.168.55.162
   }
   virtual_ipaddress {
      192.168.55.160             # the virtual IP
   }
}

Copy the configuration file to host2 for the second instance:

$ mkdir -p /containers/keepalived2/ # on host2
$ scp /containers/keepalived1/keepalived.conf host2:/containers/keepalived2/ # on host1

Change the priority from 101 to 100 inside the copied configuration file on host2:

$ sed -i 's/101/100/g' /containers/keepalived2/keepalived.conf

** The higher-priority instance will hold the virtual IP address (in this case, host1) until the VRRP communication is interrupted (i.e., when host1 goes down).

Then, run the following command on host1 and host2 respectively:

$ docker run -d \
        --name=${NAME} \
        --cap-add=NET_ADMIN \
        --net=host \
        --restart=always \
        --volume /containers/${NAME}/keepalived.conf:/usr/local/etc/keepalived/keepalived.conf \
        osixia/keepalived:1.4.4

** Replace ${NAME} with keepalived1 and keepalived2.

The run command tells Docker to:

  • --name, create a container with the given name (keepalived1 or keepalived2),
  • --cap-add=NET_ADMIN, add Linux capabilities for network admin scope,
  • --net=host, attach the container to the host network. This allows the virtual IP address to be brought up on the host interface, ens33,
  • --restart=always, always keep the container running,
  • --volume=/containers/${NAME}/keepalived.conf:/usr/local/etc/keepalived/keepalived.conf, map the custom configuration file into the container.

After both containers are started, verify that the virtual IP address exists by looking at the physical network interface of the MASTER node:

$ ip a | grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.55.161/24 brd 192.168.55.255 scope global ens33
    inet 192.168.55.160/32 scope global ens33

The clients and applications may now use the virtual IP address, 192.168.55.160, to access the database service. This virtual IP address currently exists on host1. If host1 goes down, keepalived2 will take over the IP address and bring it up on host2. Take note that this Keepalived configuration does not monitor the ProxySQL containers; it only monitors the VRRP advertisements of the Keepalived peers.
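
A simple way to test the failover is to stop the Keepalived container on host1 and watch the virtual IP address appear on host2 (downtime is limited to the VRRP failover interval):

$ docker stop keepalived1   # on host1
$ ip a | grep ens33         # on host2; the virtual IP should now be listed
    inet 192.168.55.160/32 scope global ens33

Starting keepalived1 again moves the virtual IP address back to host1, since it holds the higher priority and VRRP preemption is enabled by default.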

At this point, our architecture is looking something like this:

Summary

So, now we have a MariaDB Galera Cluster fronted by a highly available ProxySQL service, all running on Docker containers.

In part two, we are going to look into how to manage this setup. We’ll look at how to perform operations like graceful shutdown, bootstrapping, detecting the most advanced node, failover, recovery, scaling up/down, upgrades, backup and so on. We will also discuss the pros and cons of having this setup for our clustered database service.

Happy containerizing!
