How to Get Started with Open Source Database Management

February 27, 2018, 1:46 pm

≫ Next: Crash Course Webinar Replay: How to Get Started With Open Source Database Management

≪ Previous: How to Get Started with ClusterControl

Tuesday, February 27, 2018 - 10:00 to 11:15

Join Krzysztof Książek, Senior Support Engineer at Severalnines and expert Database Administrator (DBA), for this 60min crash course on managing open source databases!

If you’ve recently been tasked with taking care of an open source database and are not sure how to proceed; or have worked with open source databases before, but never had to manage them (in other words, you’re not a DBA) … this webinar is for you!

What should you focus on? What are the most important tasks of a DBA?

Krzysztof will share some tips to help you get started on your adventure in open source database management and will try to make it as database-agnostic as possible. In the end, no matter what database you end up managing, some of the tasks and skills required are the same.

So if you’re a sysadmin, DevOps, developer, system architect, … or a DBA in search of a refresher session, make sure to watch this webinar replay!

↧

Crash Course Webinar Replay: How to Get Started With Open Source Database Management

February 28, 2018, 12:03 am

≫ Next: Setting Up an Optimal Environment for PostgreSQL

≪ Previous: How to Get Started with Open Source Database Management

At Severalnines we write alot about advanced topics in the open source database world, but if you are just getting started with the technologies then this is the webinar for you. Watch as Krzysztof Książek, Senior Support Engineer at Severalnines and an expert in database administration, delivers a 60min crash course on how to get started with managing open source databases.

This webinar replay has been expanded from the blog of the same name recently published by Krzysztof.

If you’ve just been tasked with taking care of an open source database and are not sure how to proceed; or have worked with open source databases before, but never had to manage them (in other words, you’re not a DBA) … this webinar is for you! It introduces all the basic information you need to know to get started using MySQL, MongoDB, PostgreSQL and other open source technologies.

Krzysztof shares some open source database-agnostic tips because in the end, no matter what database you end up managing, some of the tasks and skills required are the same.

So if you’re a SysAdmin, DevOps, Developer, System Architect, IT Manager … or a new DBA in search of a refresher session, make sure to watch this replay!

Agenda

What does working with open source databases involve?
- Runbooks
- Testing procedures
Automation
- Why automate?
- What to automate?
Tips and tricks for typical procedures in open source database management
- Backups
- High Availability
- Disaster Recovery
- Monitoring
- Health Checks
- Performance Tuning

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

Tags:

open source database

Manage

howto

how to

open source database management

webinar

crash course

↧

Setting Up an Optimal Environment for PostgreSQL

March 1, 2018, 3:29 am

≫ Next: Failover for MySQL Replication (and others) - Should it be Automated?

≪ Previous: Crash Course Webinar Replay: How to Get Started With Open Source Database Management

Welcome to PostgreSQL, a powerful open source database system that can host anything from a few megabytes of customer data for a small-town-business, to hundreds of terabytes of ‘big data’ for multinational corporations. Regardless of the application, it’s likely that some setup and configuration help will be needed to get the database ready for action.

When a new server is installed, PostgreSQL’ s settings are very minimum as they are designed to run on the least amount of hardware possible. However they are very rarely optimal. Here, we will go over a basic setup for new projects, and how to set PostgreSQL up to run the most optimally on new projects.

Hosting

On-Premise Hosting

With an on-premise database, the best option is for a bare metal host, as Virtual Machines generally perform slower unless we’re talking about high end enterprise level VM’s. This also allows for tighter control over CPU, Memory, and Disk setups. This however comes with the need to have an expert on hand (or contract) to do server maintenance.

Cloud

Hosting a database in the cloud can be wonderful in some aspects, or a nightmare in others. Unless the cloud platform chosen is highly optimized (which generally means higher price), it may have trouble with higher load environments. Keep an eye out for whether or not the cloud server is shared or dedicated (dedicated allowing full performance from the server for the application), as well as the level of IOPS (Input/output Operations Per Second) provided by a cloud server. When (or if) the application grows to the point that the majority of data cannot be stored in memory, disk access speed is crucial.

General Host Setup

The main pillars needed to reliably set up PostgreSQL are based on the CPU, Memory, and Disk abilities of the host. Depending on the applications needs, a sufficient host as well as a well-tuned PostgreSQL configuration will have an amazing impact on the performance of the database system.

Choosing an Operating System

PostgreSQL can be compiled on most Unix-like operating systems, as well as Windows. However performance on Windows is not even comparable to a Unix-like system, so unless it’s for a small throw away project, sticking to an established Unix-like system will be the way to go. For this discussion, we’ll stick to Linux based systems.

The seemingly highest used Linux distribution used for hosting PostgreSQL is a Red Hat based system, such as CentOS or Scientific Linux, or even Red Hat itself. Since Red Hat and CentOS focus on stability and performance, the community behind these projects work hard to make sure important applications, such as databases, are on the most secure and most reliable build of Linux possible.

NOTE: Linux has a range of kernel versions that are not optimal for running PostgreSQL, so they are highly suggested to be avoided if possible (especially on applications where peak performance is the utmost importance). Benchmarks have shown that the number of transactions per second drop from kernel version 3.4 – 3.10, but recovers and significantly improves in kernel 3.12. This unfortunately rules out using CentOS 7 if going the CentOS route. CentOS 6 is still a valid and supported version of the Operating System, and CentOS 8 is expected to be released before 6 becomes unsupported.

Installation

Installation can be done either by source, or using repositories maintained by either the distribution of Linux chosen, or better yet, the PostgreSQL Global Development Group (PGDG), which maintains repositories for Red Hat based systems (Red Hat, Scientific Linux, CentOS, Amazon Linux AMI, Oracle Enterprise Linux, and Fedora), as well as packages for Debian and Ubuntu. Using the PGDG packages will ensure updates to PostgreSQL are available for update upon release, rather than waiting for the Linux distribution’s built in repositories to approve and provide them.

CPU

These days, it’s not hard to have multiple cores available for a database host. PostgreSQL itself has only recently started adding multi-threading capabilities on the query level, and will be getting much better in the years to come. But even without these new and upcoming improvements, PostgreSQL itself spawns new threads for each connection to the database by a client. These threads will essentially use a core when active, so number of cores required will depend on the level of needed concurrent connections and concurrent queries.

A good baseline to start out with is a 4 core system for a small application. Assuming applications do a dance between executing queries and sleeping, a 4 core system can handle a couple dozen connections before being overloaded. Adding more cores will help scale with an increasing workload. It’s not uncommon for very large PostgreSQL databases to have 48+ cores to serve many hundreds of connections.

Tuning Tips: Even if hyper-threading is available, transactions per second are generally higher when hyper-threading is disabled. For database queries that aren’t too complex, but higher in frequency, more cores is more important than faster cores.

Memory

Memory is an extremely important aspect for PostgreSQL’s overall performance. The main setting for PostgreSQL in terms of memory is shared_buffers, which is a chunk of memory allocated directly to the PostgreSQL server for data caching. The higher the likelihood of the needed data is living in memory, the quicker queries return, and quicker queries mean a more efficient CPU core setup as discussed in the previous section.

Queries also, at times, need memory to perform sorting operations on data before it’s returned to the client. This either uses additional ad-hoc memory (separate from shared_buffers), or temporary files on disk, which is much slower.

Tuning Tips: A basic starting point for setting shared_buffers is to set it to 1/4th the value of available system ram. This allows the operating system to also do its own caching of data, as well as any running processes other than the database itself.

Increasing work_mem can speed up sorting operations, however increasing it too much can force the host to run out of memory all together, as the value set can be partially or fully issued multiple times per query. If multiple queries request multiple blocks of memory for sorting, it can quickly add up to more memory than what is available on the host. Keep it low, and raise it slowly until performance is where desired.

Using the ‘free’ command (such as ‘free -h’), set effective_cache_size to a the sum of memory that’s free and cached. This lets the query planner know the level of OS caching may be available, and run better query plans.

Disk

Disk performance can be one of the more important things to consider when setting up a system. Input / Output speeds are important for large data loads, or fetching huge amounts of data to be processed. It also determines how quickly PostgreSQL can sync memory with disk to keep the memory pool optimal.

Some preparation in disks can help instantly improve potential performance, as well as future proof the database system for growth.

Separate disks
A fresh install of PostgreSQL will create the cluster’s data directory somewhere on the main (and possibly only) available drive on the system.
A basic setup using more drives would be adding a separate drive (or set of drives via RAID). It has the benefit of having all database related data transfer operating on a different I/O channel from the main operating system. It also allows the database to grow without fear of insufficient space causing issues and errors elsewhere in the operating system.
For databases with an extreme amount of activity, the PostgreSQL Transaction Log (xlog) directory can be placed on yet another drive, separating more heavy I/O to another channel away from the main OS as well as the main data directory. This is an advanced measure that helps squeeze more performance out of a system, that may otherwise be near its limits.
Using RAID
Setting up RAID for the database drives not only protects from data loss, it can also improve performance if using the right RAID configuration. RAID 1 or 10 are generally thought to be the best, and 10 offers parity and overall speed. RAID 5, however, while having higher levels of redundancy, suffers from significant performance decrease due to the way it spreads data around multiple disks. Plan out the best available option with plenty of space for data growth, and this will be a configuration that won’t need to be changed often, if at all.
Using SSD
Solid State Drives are wonderful for performance, and if they meet the budget, enterprise SSD’s can make heavy data processing workloads night and day faster. Smaller to medium databases with smaller to medium workloads may be overkill, but when fighting for the smallest percentage increase on large applications, SSD can be that silver bullet.

Tuning Tips: Chose a drive setup that is best for the application at hand, and has plenty of space to grow with time as the data increases.

If using a SSD, setting random_page_cost to 1.5 or 2 (the default is 4) will be beneficial to the query planner, since random data fetching is much quicker than seen on spinning disks.

Initial Configuration Settings

When setting up PostgreSQL for the first time, there’s a handful of configuration settings that can be easily changed based on the power of the host. As the application queries the database over time, specific tuning can be done based on the application’s needs. However that will be the topic for a separate tuning blog.

Memory Settings

shared_buffers: Set to 1/4th of the system memory. If the system has less than 1 GB of total memory, set to ~ 1/8th of total system memory

work_mem: The default is 4MB, and may even be plenty for the application in question. But if temp files are being created often, and those files are fairly small (tens of megabytes), it might be worth upping this setting. A conservative entry level setting can be (1/4th system memory / max_connections). This setting depends highly on the actual behavior and frequency of queries to the database, so should be only increased with caution. Be ready to reduce it back to previous levels if issues occur.

effective_cache_size: Set to the sum of memory that’s free and cached reported by the ‘free’ command.

Checkpoint Settings

For PostgreSQL 9.4 and below:
checkpoint_segments: A number of checkpoint segments (16 megabytes each) to give the Write Ahead Log system. The default is 3, and can safely be increased to 64 for even small databases.

For PostgreSQL 9.5 and above:
max_wal_size: This replaced checkpoint_segments as a setting. The default is 1GB, and can remain here until needing further changes.

Security

listen_address: This setting determines what personal IP addresses / Network Cards to listen to connections on. In a simple setup, there will likely be only one, while more advanced networks may have multiple cards to connect to multiple networks. * Signifies listen to everything. However, if the application accessing the database is to live on the same host as the database itself, keeping it as ‘localhost’ is sufficient.

Logging

Some basic logging settings that won’t overload the logs are as follows.

log_checkpoints = on
log_connections = on
log_disconnections = on
log_temp_files = 0

Tags:

↧

Failover for MySQL Replication (and others) - Should it be Automated?

March 2, 2018, 3:34 am

≫ Next: Top PG Clustering HA Solutions for PostgreSQL

≪ Previous: Setting Up an Optimal Environment for PostgreSQL

Automatic failover for MySQL Replication has been subject to debate for many years.

Is it a good thing or a bad thing?

For those with long memory in the MySQL world, they might remember the GitHub outage in 2012 which was mainly caused by software taking the wrong decisions.

GitHub had then just migrated to a combo of MySQL Replication, Corosync, Pacemaker and Percona Replication Manager. PRM decided to do a failover after failing health checks on the master, which was overloaded during a schema migration. A new master was selected, but it performed poorly because of cold caches. The high query load from the busy site caused PRM heartbeats to fail again on the cold master, and PRM then triggered another failover to the original master. And the problems just continued, as summarized below.

Source: Henrik Ingo & Massimo Brignoli’s at Percona Live 2013

Fast forward a couple of years and GitHub is back with a pretty sophisticated framework for managing MySQL Replication and automated failover! As Shlomi Noach puts it:

“To that effect, we employ automated master failovers. The time it would take a human to wake up & fix a failed master is beyond our expectancy of availability, and operating such a failover is sometimes non-trivial. We expect master failures to be automatically detected and recovered within 30 seconds or less, and we expect failover to result in minimal loss of available hosts.”

Most companies are not GitHub, but one could argue that no company likes outages. Outages are disruptive to any business, and they also cost money. My guess is that most companies out there probably wished they had some sort of automated failover, and the reasons not to implement it are probably the complexity of the existing solutions, lack of competence in implementing such solutions, or lack of trust in software to take such an important decision.

There are a number of automated failover solutions out there, including (and not limited to) MHA, MMM, MRM, mysqlfailover, Orchestrator and ClusterControl. Some of them have been on the market for a number of years, others are more recent. That is a good sign, multiple solutions mean that the market is there and people are trying to address the problem.

When we designed automatic failover within ClusterControl, we used a few guiding principles:

Make sure the master is really dead before you failover
In case of a network partition, where the failover software loses contact with the master, it will stop seeing it. But the master might be working well and can be seen by the rest of the replication topology.
ClusterControl gathers information from all the database nodes as well as any database proxies/load balancers used, and then builds a representation of the topology. It will not attempt a failover if the slaves can see the master, nor if ClusterControl is not 100% sure about the state of the master.
ClusterControl also makes it easy to visualize the topology of the setup, as well as the status of the different nodes (this is ClusterControl’s understanding of the state of the system, based on the information it gathers).
Failover only once
Much has been written about flapping. It can get very messy if the availability tool decides to do multiple failovers. That’s a dangerous situation. Each master elected, however brief the period it held the master role, might have their own sets of changes that were never replicated to any server. So you may end up with inconsistency across all the elected masters.
Do not failover to an inconsistent slave
When selecting a slave to promote as master, we ensure the slave does not have inconsistencies, e.g. errant transactions, as this may very well break replication.
Only write to the master
Replication goes from the master to the slave(s). Writing directly to a slave would create a diverging dataset, and that can be a potential source of problem. We set the slaves to read_only, and super_read_only in more recent versions of MySQL or MariaDB. We also advise the use of a load balancer, e.g., ProxySQL or MaxScale, to shield the application layer from the underlying database topology and any changes to it. The load balancer also enforces writes on the current master.
Do not automatically recover the failed master
If the master has failed and a new master has been elected, ClusterControl will not try to recover the failed master. Why? That server might have data that has not yet been replicated, and the administrator would need to do some investigation into the failure. Ok, you can still configure ClusterControl to wipe out the data on the failed master and have it join as a slave to the new master - if you are ok with losing some data. But by default, ClusterControl will let the failed master be, until someone looks at it and decides to re-introduce it into the topology.

So, should you automate failover? It depends on how you have configured replication. Circular replication setups with multiple write-able masters or complex topologies are probably not good candidates for auto failover. We would stick to the above principles when designing a replication solution.

On PostgreSQL

When it comes to PostgreSQL streaming replication, ClusterControl uses similar principles to automate failover. For PostgreSQL, ClusterControl supports both asynchronous and synchronous replication models between the master and the slaves. In both cases and in the event of failure, the slave with the most up-to-date data is elected as the new master. Failed masters are not automatically recovered/fixed to rejoin the replication setup.

There are a few protective measures taken to make sure the failed master is down and stays down, e.g. it is removed from the load balancing set in the proxy and it is killed if e.g. the user would restart it manually. It is a bit more challenging there to detect network splits between ClusterControl and the master, since the slaves do not provide any information about the status of the master they are replicating from. So a proxy in front of the database setup is important as it can provide another path to the master.

On MongoDB

MongoDB replication within a replicaset via the oplog is very similar to binlog replication, so how come MongoDB automatically recovers a failed master? The problem is still there, and MongoDB addresses that by rolling back any changes that were not replicated to the slaves at the time of failure. That data is removed and placed in a ‘rollback’ folder, so it is up to the administrator to restore it.

To find out more, check out ClusterControl; and feel free to comment or ask questions below.

Tags:

↧

Designing Open Source Databases for High Availability

March 5, 2018, 10:30 am

≫ Next: New Webinar on How to Design Open Source Databases for High Availability

≪ Previous: Top PG Clustering HA Solutions for PostgreSQL

It is said that if you are not designing for failure, then you are heading for failure. How do you design a database system from the ground up to withstand failure? This can be a challenge as failures happen in many different ways, sometimes in ways that would be hard to imagine. This is a consequence of the complexity of today’s database environments.

At Severalnines we’re big fans of high availability databases and have seen our fair share of failure scenarios across the thousands of database deployments we enable every year.

In this webinar, we’ll look at the different types of failures you might encounter and what mechanisms can be used to address them. We will also look at some of popular HA solutions used today, and how they can help you achieve different levels of availability.

Image:

Agenda:

Why design for High Availability?
High availability concepts
- CAP theorem
- PACELC theorem
Trade offs
- Deployment and operational cost
- System complexity
- Performance issues
- Lock management
Architecting databases for failures
- Capacity planning
- Redundancy
- Load balancing
- Failover and switchover
- Quorum and split brain
- Fencing
- Multi datacenter and multi-cloud setups
- Recovery policy
High availability solutions
- Database architecture determines Availability
- Active-Standby failover solution with shared storage or DRBD
- Master-slave replication
- Master-master cluster
Failover and switchover mechanisms
- Reverse proxy
- Caching
- Virtual IP address
- Application connector

Date & Time v2:

Tuesday, March 27, 2018 - 10:00 to 11:15

Tuesday, March 27, 2018 - 12:00 to 13:15

↧

New Webinar on How to Design Open Source Databases for High Availability

March 6, 2018, 6:35 am

≫ Next: How ClusterControl Configures Virtual IP and What to Expect During Failover

≪ Previous: Designing Open Source Databases for High Availability

Join us March 27th for this webinar on how to design open source databases for high availability with Ashraf Sharif, Senior Support Engineer at Severalnines. From discussing high availability concepts through to failover or switch over mechanisms, Ashraf will cover all the need-to-know information when it comes to building highly available database infrastructures.

It’s been said that not designing for failure leads to failure; but what is the best way to design a database system from the ground up to withstand failure?

Designing open source databases for high availability can be a challenge as failures happen in many different ways, which sometimes go beyond imagination. This is one of the consequences of the complexity of today’s open source database environments.

At Severalnines we’re big fans of high availability databases and have seen our fair share of failure scenarios across the thousands of database deployment attempts that we come across every year.

In this webinar, we’ll look at the different types of failures you might encounter and what mechanisms can be used to address them. We will also look at some of popular high availability solutions used today, and how they can help you achieve different levels of availability.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, March 27th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

North America/LatAm

Tuesday, March 27th at 09:00 PDT (US) / 12:00 EDT (US)

Agenda

Why design for High Availability?
High availability concepts
- CAP theorem
- PACELC theorem
Trade offs
- Deployment and operational cost
- System complexity
- Performance issues
- Lock management
Architecting databases for failures
- Capacity planning
- Redundancy
- Load balancing
- Failover and switchover
- Quorum and split brain
- Fencing
- Multi datacenter and multi-cloud setups
- Recovery policy
High availability solutions
- Database architecture determines Availability
- Active-Standby failover solution with shared storage or DRBD
- Master-slave replication
- Master-master cluster
Failover and switchover mechanisms
- Reverse proxy
- Caching
- Virtual IP address
- Application connector

Speaker

Ashraf Sharif is System Support Engineer at Severalnines. He was previously involved in hosting world and LAMP stack, where he worked as principal consultant and head of support team and delivered clustering solutions for large websites in the South East Asia region. His professional interests are on system scalability and high availability.

Tags:

database high availability

↧

How ClusterControl Configures Virtual IP and What to Expect During Failover

March 7, 2018, 2:59 am

≫ Next: Updated: ClusterControl Tips & Tricks - Transparent Database Failover for your Applications

≪ Previous: New Webinar on How to Design Open Source Databases for High Availability

Virtual IP address is an IP address that does not correspond to an actual physical network interface. It floats between multiple network interfaces and only one active interface will be holding the IP address for fault tolerance and mobility. ClusterControl uses Keepalived to provide virtual IP address integration with database load balancers to eliminate any single point of failure (SPOF) at the load balancer level.

In this blog post, we’ll show you how ClusterControl configures virtual IP address and what you can expect when failover or failback happens. Understanding this behaviour is vital in order to minimize any service interruption, and smoothen up maintenance operations that need to be performed occasionally.

Requirements

There are some requirements to run Keepalived in your network:

IP protocol 112 (Virtual Router Redundancy Protocol - VRRP) must be supported in the network. Some networks disable support for VRRP, especially inter-VLAN communications. Please verify this with the network administrator.
If you use multicast, the network must support multicast request (use ip a | grep -i multicast). Otherwise, you can use unicast via unicast_src_ip and unicast_peer options. Using multicast is useful when you have a dynamic environment like a cloud environment, or when IP assignment is performed through DHCP.
A set of VRRP instances must use a unique virtual_router_id value, that cannot be shared among other instances. Otherwise, you will see bogus packets and will likely break the master-backup switch over.
If you are running on a cloud environment like AWS, you probably need to use an external script (hint: use "notify" option) to dissociate and associate the virtual IP address (Elastic IP) so it is recognized and routable by the router.

Deploying Keepalived

In order to install Keepalived through ClusterControl, you need two or more load balancers installed by or imported into ClusterControl. For production usage, we highly recommend the load balancer software to be running on a standalone host and not co-located with your database nodes.

After you have at least two load balancers managed by ClusterControl, to install Keepalived and enable virtual IP address, just go to ClusterControl -> pick the cluster -> Manage -> Load Balancer -> Keepalived:

Most of the fields are self-explanatory. You can deploy a new set of Keepalived or import existing Keepalived instances. The important fields include the actual virtual IP address and the network interface where the virtual IP address will exist. If the hosts are using two different interface names, specify the interface name of the Keepalived 1 host, then manually modify the configuration file on Keepalived 2 with a correct interface name later on.

VRRP Instance

At the current time of writing, ClusterControl v1.5.1 installs Keepalived v1.3.5 (depending on the host operation system) and the following is what is configured for the VRRP instance:

vrrp_instance VI_PROXYSQL {
   interface eth0                # interface to monitor
   state MASTER
   virtual_router_id 51          # Assign one ID for this route
   priority 100
   unicast_src_ip 10.0.0.246
   unicast_peer {
      10.0.0.204
   }
   virtual_ipaddress {
       10.0.0.100                        # the virtual IP
   }
   track_script {
       chk_proxysql
   }
#    notify /usr/local/bin/notify_keepalived.sh
}

ClusterControl configures the VRRP instance to communicate through unicast. With unicast, we must define all unicast peers of the other Keepalived nodes. It is less dynamic but works most of the time. With multicast, you can remove those lines (unicast_*) and rely on multicast IP address for host discovery and peering. It's simpler but it is commonly blocked by network administrators.

The next part is the virtual IP address. You can specify multiple virtual IP addresses per VRRP instance, separated by new line. Load balancing in HAProxy/ProxySQL and Keepalived at the same time also requires the ability to bind to an IP address that is nonlocal, meaning that it is not assigned to a device on the local system. This allows a running load balancer instance to bind to an IP that is not local for failover. Thus ClusterControl also configures net.ipv4.ip_nonlocal_bind=1 inside /etc/sysctl.conf.

The next directive is the track_script, where you can specify the script to the health check process which is explained in the next section.

Health Checks

ClusterControl configures Keepalived to perform health checks by examining the error code returned by the track_script. In the Keepalived configuration file, which by default, is located at /etc/keepalived/keepalived.conf, you should see something like this:

   track_script {
       chk_proxysql
   }

Where it calls chk_proxysql which contains:

vrrp_script chk_proxysql {
   script "killall -0 proxysql"   # verify the pid existence
   interval 2                    # check every 2 seconds
   weight 2                      # add 2 points of prio if OK
}

The "killall -0" command returns exit code 0 if there is a process called "proxysql" running on the host. Otherwise, the instance would have to demote itself and start initiating failover as explained in the next section. Take note that Keepalived also supports Linux Virtual Server (LVS) components to perform health checks, where it's also capable of load balancing TCP/IP connections, similar to HAProxy, but that's out of the scope of this blog post.

Simulating Failover

For VRRP components, Keepalived uses VRRP protocol (IP protocol 112) to communicate between VRRP instances. The higher priority value of a MASTER means the master will always have the higher privilege to hold the virtual IP address, unless you configure the instance with "nopreempt". Let's use an example to better explain the failover and failback flow. Consider the following diagram:

There are three ProxySQL instances in front of three MySQL Galera nodes. Every ProxySQL host is configured with Keepalived as MASTER with the following priority number:

ProxySQL1 - priority 101
ProxySQL2 - priority 100
ProxySQL3 - priority 99

When Keepalived is started as a MASTER, it will first advertise the priority number to the members and then associate itself with the virtual IP address. As opposed to the BACKUP instance, it will only observe the advertisement and only assign the virtual IP address once it has confirmed it can elevate itself to a MASTER.

Take note that if you kill the "proxysql" or "haproxy" process manually via kill command, systemd process manager will by default attempt to recover the process that is being ungracefully stopped. Also, if you have ClusterControl auto recovery turned on, ClusterControl will always attempt to start the process even if you perform a clean shutdown via systemd (systemctl stop proxysql). To best simulate the failure, we suggest the user to turn off ClusterControl's automatic recovery feature or simply shut down the ProxySQL server to break the communication.

If we shut down ProxySQL1, the virtual IP address will be failed over to the next host which holds higher priority at that particular time (which is ProxySQL2) :

You would see the following in the syslog of the failed node:

Feb 27 05:21:59 proxysql1 systemd: Unit proxysql.service entered failed state.
Feb 27 05:21:59 proxysql1 Keepalived_vrrp[12589]: /usr/bin/killall -0 proxysql exited with status 1
Feb 27 05:21:59 proxysql1 Keepalived_vrrp[12589]: VRRP_Script(chk_proxysql) failed
Feb 27 05:21:59 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) Changing effective priority from 103 to 101
Feb 27 05:22:00 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) Received advert with higher priority 102, ours 101
Feb 27 05:22:00 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) Entering BACKUP STATE
Feb 27 05:22:00 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) removing protocol VIPs.

While on the secondary node, the following happened:

Feb 27 05:22:00 proxysql2 Keepalived_vrrp[7794]: VRRP_Instance(VI_PROXYSQL) forcing a new MASTER election
Feb 27 05:22:01 proxysql2 Keepalived_vrrp[7794]: VRRP_Instance(VI_PROXYSQL) Transition to MASTER STATE
Feb 27 05:22:02 proxysql2 Keepalived_vrrp[7794]: VRRP_Instance(VI_PROXYSQL) Entering MASTER STATE
Feb 27 05:22:02 proxysql2 Keepalived_vrrp[7794]: VRRP_Instance(VI_PROXYSQL) setting protocol VIPs.
Feb 27 05:22:02 proxysql2 Keepalived_vrrp[7794]: Sending gratuitous ARP on eth0 for 10.0.0.100
Feb 27 05:22:02 proxysql2 Keepalived_vrrp[7794]: VRRP_Instance(VI_PROXYSQL) Sending/queueing gratuitous ARPs on eth0 for 10.0.0.100
Feb 27 05:22:02 proxysql2 Keepalived_vrrp[7794]: Sending gratuitous ARP on eth0 for 10.0.0.100
Feb 27 05:22:02 proxysql2 avahi-daemon[346]: Registering new address record for 10.0.0.100 on eth0.IPv4.

In this case, the failover took around 3 seconds, with maximum failover time would be interval + advert_int. Behind the scenes, the database endpoint has changed and database traffic is being routed through ProxySQL2 without applications noticing.

When ProxySQL1 comes back online, it will force for a new MASTER election and take over the IP address due to higher priority:

Feb 27 05:38:34 proxysql1 Keepalived_vrrp[12589]: VRRP_Script(chk_proxysql) succeeded
Feb 27 05:38:35 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) Changing effective priority from 101 to 103
Feb 27 05:38:36 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) forcing a new MASTER election
Feb 27 05:38:37 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) Transition to MASTER STATE
Feb 27 05:38:38 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) Entering MASTER STATE
Feb 27 05:38:38 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) setting protocol VIPs.
Feb 27 05:38:38 proxysql1 Keepalived_vrrp[12589]: Sending gratuitous ARP on eth0 for 10.0.0.100
Feb 27 05:38:38 proxysql1 Keepalived_vrrp[12589]: VRRP_Instance(VI_PROXYSQL) Sending/queueing gratuitous ARPs on eth0 for 10.0.0.100
Feb 27 05:38:38 proxysql1 avahi-daemon[343]: Registering new address record for 10.0.0.100 on eth0.IPv4.

At the same time, ProxySQL2 demotes itself to BACKUP state and removes the virtual IP address from the network interface:

Feb 27 05:38:36 proxysql2 Keepalived_vrrp[7794]: VRRP_Instance(VI_PROXYSQL) Received advert with higher priority 103, ours 102
Feb 27 05:38:36 proxysql2 Keepalived_vrrp[7794]: VRRP_Instance(VI_PROXYSQL) Entering BACKUP STATE
Feb 27 05:38:36 proxysql2 Keepalived_vrrp[7794]: VRRP_Instance(VI_PROXYSQL) removing protocol VIPs.
Feb 27 05:38:36 proxysql2 avahi-daemon[346]: Withdrawing address record for 10.0.0.100 on eth0.

At this point, ProxySQL1 is back online and becomes the active load balancer that serves the connections from applications and clients. VRRP will normally preempt a lower priority server when a higher priority server comes online. If you would like to make the IP address stay on ProxySQL2 after ProxySQL1 backs online, use "nopreempt" option. This allows the lower priority machine to maintain the master role, even when a higher priority machine comes back online. However, for this to work, the initial state of this entry must be BACKUP. Otherwise, you will notice the following line:

Feb 27 05:50:33 proxysql2 Keepalived_vrrp[6298]: (VI_PROXYSQL): Warning - nopreempt will not work with initial state MASTER

Since by default ClusterControl configures all nodes as MASTER, you have to configure the following configuration option for the respective VRRP instance accordingly:

vrrp_instance VI_PROXYSQL {
...
   state BACKUP
   nopreempt
...
}

Restart the Keepalived process to load these changes. The virtual IP address will only be failed over to ProxySQL1 or ProxySQL3 (depending on the priority and which node is available at that point of time) if the health check fails on ProxySQL2. In many cases, running Keepalived on two hosts will suffice.

Tags:

↧

Updated: ClusterControl Tips & Tricks - Transparent Database Failover for your Applications

March 8, 2018, 2:50 am

≫ Next: Join Us in Amsterdam for a Meetup with OptimaData & VidaXL

≪ Previous: How ClusterControl Configures Virtual IP and What to Expect During Failover

ClusterControl is a great tool to deploy and manage databases clusters - if you are into MySQL, you can easily deploy clusters based on both traditional MySQL master-slave replication, Galera Cluster or MySQL NDB Cluster. To achieve high availability, deploying a cluster is not enough though. Nodes may (and will most probably) go down, and your system has to be able to adapt to those changes.

This adaptation can happen at different levels. You can implement some kind of logic within the application - it would check the state of cluster nodes and direct traffic to the ones which are reachable at the given moment. You can also build a proxy layer which will implement high availability in your system. In this blog post, we’d like to share some tips on how you can achieve that using ClusterControl.

Deploying HAProxy using the ClusterControl

HAProxy is the standard - one of the most popular proxies used in connection with MySQL (but not only, of course). ClusterControl supports deployment and monitoring of HAProxy nodes. It also helps to implement high availability of the proxy itself using keepalived.

Deployment is pretty simple - you need to pick or fill in the IP address of a host where HAProxy will be installed, pick port, load balancing policy, decide if ClusterControl should use existing repository or the most recent source code to deploy HAProxy. You can also pick which backend nodes you’d like to have included in the proxy configuration, and whether they should be active or backup.

By default, the HAProxy instance deployed by ClusterControl will work on MySQL Cluster (NDB), Galera Cluster, PostgreSQL streaming replication and MySQL Replication. For master-slave replication, ClusterControl can configure two listeners, one for read-only and another one for read-write. Applications will then have to send reads and writes to the respective ports. For multi-master replication, ClusterControl will setup the standard TCP load-balancing based on least connection balancing algorithm (e.g., for Galera Cluster where all nodes are writeable).

Keepalived is used to add high availability to the proxy layer. When you have at least two HAProxy nodes in your system, you can install Keepalived from the ClusterControl UI.

You’ll have to pick two HAProxy nodes and they will be configured as an active - standby pair. A Virtual IP would be assigned to the active server and, should it fail, it will be reassigned to the standby proxy. This way you can just connect to the VIP and all your queries will be routed to the currently active and working HAProxy node.

You can find more details in how the internals are configured by reading through our HAProxy tutorial.

Deploying ProxySQL using ClusterControl

While HAProxy is a rock-solid proxy and very popular choice, it lacks database awareness, e.g., read-write split. The only way to do it in HAProxy is to create two backends and listen on two ports - one for reads and one for writes. This is, usually, fine but it requires you to implement changes in your application - the application has to understand what is a read and what is a write, and then direct those queries to the correct port. It’d be much easier to just connect to a single port and let the proxy decide what to do next - this is something HAProxy cannot do as what it does is just routing packets - no packet inspection is done and, especially, it has no understanding of the MySQL protocol.

ProxySQL solves this problem - it talks MySQL protocol and it can (among other things) perform a read-write split through its powerful query rules and route the incoming MySQL traffic according to various criterias. Installation of MaxScale from ClusterControl is simple - you want to go to Manage -> Load Balancer section and fill the “Deploy ProxySQL” tab with the required data.

In short, we need to pick where ProxySQL will be installed, what administration user and password it should have, which monitoring user it should use to connect to the MySQL backends and verify their status and monitor state. From ClusterControl, you can either create a new user to be used by the application - you can decide on its name, password, access to which databases are granted and what MySQL privileges that user will have. Such user will be created on both MySQL and ProxySQL side. Second option, more suitable for existing infrastructures, is to use the existing database users. You need to pass username and password, and such user will be created only on ProxySQL.

Finally, you need to answer a question: are you using implicit transactions? By that we understand transactions started by running SET autocommit=0; If you do use it, ClusterControl will configure ProxySQL to send all of the traffic to the master. This is required to ensure ProxySQL will handle transactions correctly in ProxySQL 1.3.x and earlier. If you don’t use SET autocommit=0 to create new transaction, ClusterControl will configure read/write split.

ProxySQL, as every proxy, can become a single point of failure and it has to be made redundant to achieve high availability. There are a couple of methods to do that. One of them is to collocate ProxySQL on the web nodes. The idea here is that, most of the time, the ProxySQL process will work just fine and the reason for its unavailability is that the whole node went down. In such case, if ProxySQL is collocated with the web node, not much harm has been done because that particular web node will not be available either.

Another method, is to use Keepalived in a similar way like we did in the case of HAProxy.

You can find more details in how the internals are configured by reading through our ProxySQL tutorial.

Tags:

↧

Join Us in Amsterdam for a Meetup with OptimaData & VidaXL

March 9, 2018, 2:54 am

≫ Next: Tips & Tricks for Navigating the PostgreSQL Community

≪ Previous: Updated: ClusterControl Tips & Tricks - Transparent Database Failover for your Applications

Severalnines, our partner OptimaData, and customer VidaXL are joining forces to present the meetup “How to Manage Fast Growing Databases” in Amsterdam featuring a myriad of technical information and tips on load balancing, automation, open source database management and more!

Join us on April 10th at Circl (Gustav Mahlerplein 1B, Amsterdam) where we will have 2 great talks lined up for you.

First Krzysztof Książek – Senior Support Engineer at Severalnines- will kick off and give you an extensive insight into the vast array of options to load balance your MySQL databases.

Furthermore we are very pleased that Zeger Knops- Head of Technology VidaXL has agreed to share his experiences with Database Automation within the fast growing international platform of VidaXL.

AGENDA

18:00 - 18:25: Arrival and Drinks
18:25 - 18:30: Welcome by organizers
18:30 - 19:15: Krzysztof Książek, Senior Support Engineer at Severalnines; Talk on MySQL Load balancers (Specifically MaxScale, ProxySQL, HAProxy, MySQL Router & nginx)
19:15 - 20:00: Zeger Knops, Head of Business Technology at VidaXL; Talk on Leveraging Database Automation in a Fast-paces Global Environment.

Program

Krzysztof Książek - Senior Support Engineer Severalnines

MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - a close-up look

Load balancing MySQL connections and queries using HAProxy has been popular in the past years. Recently however, we have seen the arrival of MaxScale, MySQL Router, ProxySQL and now also Nginx as a reverse proxy.For which use cases do you use them and how well do they integrate in your environment? This session aims to give a solid grounding in load balancer technologies for MySQL and MariaDB. We will review the main open-source options available: from application connectors (php-mysqlnd, jdbc), TCP reverse proxies (HAProxy, Keepalived, Nginx) and SQL-aware load balancers (MaxScale, ProxySQL, MySQL Router) and take a look into the best practices for backend health checks to ensure load balanced connections are routed to the correct nodes in several MySQL clustering topologies. You'll gain a good understanding of how the different options compare, and enough knowledge to decide which ones to explore further.

Zeger Knops – Head of Business Technology VidaXL

Leveraging database automation software in a fast paced international global e-commerce environment VidaXL is a rapidly growing international online retailer with is base in the Netherlands. Currently the company operates with 1000 employees 29 local webshops in Europe, US and Australia. Each business day it processes 15000 orders per day for their 2.5 million customers world wide. VidaXL is in the process of expanding its product catalogue to over 10,000,000 items within a short period of time. Scaling from thousands to millions of products is a giant leap and required a strong infrastructure foundation and high performing and high available (MySQL and MongoDB) databases. To achieve this VidaXL has opted for database management automation software (ClusterControl) to achieve that In his talk Zeger will share his experience with you.

Location

This meetup will be held at the magnificent circular building Circl in Amsterdam. In the Event space where this meetup will take place at Circl you will get a welcome drink, before the start of the agenda.

Circl is very easy to reach. From the Amsterdam Zuid train station you can walk to Circl in a few minutes. At station Amsterdam Zuid also stop metro line 51, metro line 50 and tram 5. Also bus number 62 stops nearby, stop Hogewerf.

Travelling by car? The address of Circl is Gustav Mahlerplein 1B. Parking is possible at Q-Park Mahler (Aaron Coplandstraat 8, Amsterdam) or Q-Park Symphony (Leo Smitstraat 4, Amsterdam).

Learn More

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skills levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. Severalnines is often called the “anti-startup” as it is entirely self-funded by its founders. The company has enabled over 32,000 deployments to date via its popular product ClusterControl. Currently counting BT, Orange, Cisco, CNRS, Technicolor, AVG, Ping Identity and Paytrail as customers. Severalnines is a private company headquartered in Stockholm, Sweden with offices in Singapore, Japan and the United States.

About OptimaData BV

OptimaData is a full-service, multi-platform database services provider. OptimaData provides all services related to database management such as consultancy, managed services and training. In addition, OptimaData provides recruitment services for temporary and permanent database staff. OptimaData is a trusted partner for database related expertise and services for medium and large companies such as Travix, IceMobile, Budget Energie, Basecone and Volksbank.

Read About Our Partnership

About VidaXL

vidaXL is a rapidly growing international online retailer. Our success is based on our belief that things can always be better and cheaper: ‘Expect more’. Because nobody likes to pay too much for products. We are continually expanding our product range and offer the best products for the best price. We like to go the extra mile for our customers by improving popular products and making them even cheaper.

Tags:

↧

Tips & Tricks for Navigating the PostgreSQL Community

March 12, 2018, 8:53 am

≫ Next: Key Things to Monitor in PostgreSQL - Analyzing Your Workload

≪ Previous: Join Us in Amsterdam for a Meetup with OptimaData & VidaXL

This blog is about the PostgreSQL community, how it works and how best to navigate it. Note that this is merely an overview ... there is a lot of existing documentation.

Overview of the Community, How Development Works

PostgreSQL is developed and maintained by a globally-dispersed network of highly skilled volunteers passionate about relational database computing referred to as the PostgreSQL Global Development Group. A handful of core team members together handle special responsibilities like coordinating release activities, special internal communications, policy announcements, overseeing commit privileges and the hosting infrastructure, disciplinary and other leadership issues as well as individual responsibility for specialty coding, development, and maintenance contribution areas. About forty additional individuals are considered major contributors who have, as the name implies, undertaken comprehensive development or maintenance activities for significant codebase features or closely related projects. And several dozen more individuals are actively making various other contributions. Aside from the active contributors, a long list of past contributors are recognized for work on the project. It is the skill and high standards of this team that has resulted in the rich and robust feature set of PostgreSQL.

Many of the contributors have full-time jobs that relate directly to PostgreSQL or other Open Source software, and the enthusiastic support of their employers makes their enduring engagement with the PostgreSQL community feasible.

Contributing individuals coordinate using collaboration tools such as Internet Relay Chat (irc://irc.freenode.net/PostgreSQL) and PostgreSQL community mailing lists (https://www.PostgreSQL.org/community/lists). If you are new to IRC or mailing lists, then make an effort specifically to read up on etiquette and protocols (one good article appears at https://fedoramagazine.org/beginners-guide-irc/), and after you join, spend some time just listening to on-going conversations and search the archives for previous similar questions before jumping in with your own issues.

Note that the team is not static: Anyone can become a contributor by, well, contributing … but your contribution will be expected to meet those same high standards!

The team maintains a Wiki page (https://wiki.postgresql.org/) that, amongst a lot of very detailed and helpful information like articles, tutorials, code snippets and more, presents a TODO list of PostgreSQL bugs and feature requests and other areas where effort might be needed. If you want to be part of the team, this is a good place to browse. Items are added only after thorough discussion on the developer mailing list.

The community follows a process, visualized as the steps in Figure 1.

Figure 1. Conceptualized outline of the PostgreSQL development process.

That is, the value of any non-trivial new code implementation is expected to be first discussed and deemed (by consensus) desirable. Then investment is made in design: design of the interface, syntax, semantics and behaviors, and consideration of backward compatibility issues. You want to get buy-in from the developer community on what is the problem to be solved and what this implementation will accomplish. You definitely do NOT want to go off and develop something in a vacuum on your own. There’s literally decades worth of very high quality collective experience embodied in the team, and you want, and they expect, to have ideas vetted early.

The PostgreSQL source code is stored and managed using the Git version control system, so a local copy can be checked out from https://git.postgresql.org/ to commence implementation. Note that for durable maintainability, patches must blend in with surrounding code and follow the established coding conventions (http://developer.postgresql.org/pgdocs/postgres/source.html), so it is a good idea to study any similar code sections to learn and emulate the conventions. Generally, the standard format BSD style is used. Also, be sure to update documentation as appropriate.

Testing involves first making sure existing regression tests succeed and that there are no compiler warnings, but also adding corresponding new tests to exercise the newly-implemented feature(s).

When the new functionality implementation in your local repository is complete, use the Git diff functionality to create a patch. Patches are submitted via email to the pgsql-hackers mailing list for review and comments, but you don’t have to wait until your work is complete … smart practise would be to ask for feedback incrementally. The Wiki page describes expectations as to format and helpful explanatory context and how to show respect for code reviewer’s time.

The core developers periodically schedule commit fests, during which all accumulated unapplied patches are added to the source code repository by authorized committers. As a contributor, your code will have undergone rigorous review and likely your own developer skills will be the better for it. To return the favor, there is an expectation that you will devote time to reviewing patches from others.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Install ClusterControl for FREE

Top Websites to Get Information or Learn PostgreSQL

Community Website - this is the main launching place into life with PostgreSQL	https://www.postgresql.org/
Wiki - Wide-ranging topics related to PostgreSQL	https://wiki.postgresql.org/
IRC Channel - Developers are active participants here	irc://irc.freenode.net/PostgreSQL
Source code repository	https://git.postgresql.org/
pgAdmin GUI client	https://www.pgadmin.org/
Biographies of significant community members	https://www.postgresql.org/community/contributors/
Eric Raymond’s famous post on smart questions	http://www.catb.org/esr/faqs/smart-questions.html
Database schema change control	http://sqitch.org/
Database unit testing	http://pgtap.org/

The Few Tools You Can’t Live Without

The fundamental command line tools for working with a PostgreSQL database are part of the normal distribution. The workhorse is the psql command line utility, which provides an interactive interface with lots of functionality for querying, displaying, and modifying database metadata, as well as executing data definition (DDL) and data manipulation (DML) statements.

Other included utilities of note include pg_basebackup for establishing a baseline for replication-based backup, pg_dump for extracting a database into a script file or other archive file, pg_restore for restoring a from a pg_dump archive, and others. All of these tools have excellent manual pages as well as being detailed in the standard documentation and numerous on-line tutorials.

pgAdmin is a very popular graphical user interface tool that provides similar functionality as the psql command line utility, but with point-and-click convenience. Figure 2 shows a screenshot of pgAdmin III. On the left is a panel showing all the database objects in the cluster on the attached-to host server. You can drill down into the structure to list all databases, schemas, tables, views, functions, etc, and even open tables and views to examine the contained data. For each object, the tool will create the SQL DML for dropping and re-creating the object, too, as shown on the lower right panel. This is a convenient way to make modifications during database development.

Figure 2. The pgAdmin III utility.

A couple of my favorites tools for application developer teams are Sqitch (http://sqitch.org/), for database change control, and pgTAP (http://pgtap.org/). Sqitch enables stand-alone change management and iterative development by means of scripts written in the SQL dialect native to your implementation, not just PostgreSQL. For each database design change, you write three scripts: one to deploy the change, one to undo the change in case reverting to a previous version is necessary, and one to verify or test the change. The scripts and related files can be maintained in your revision control system right alongside your application code. PgTAP is a testing framework that includes a suite of functionality for verifying integrity of the database. All the pgTAP scripts are similarly plain text files compliant with normal revision management and change control processes. Once I started using these two tools, I found it hard to imagine ever again doing database work without.

Tips and Tricks

The PostgreSQL general mailing list is the most active of the various community lists and is the main community interface for free support to users. A pretty broad range of questions appear on this list, sometimes generating lengthy back-and-forth, but most often getting quick, informative, and to-the-point responses.

When posting a question related to using PostgreSQL, you generally want to always include background information including the version of PostgreSQL you are using (listed by the psql command line tool with “psql --version”), the operating system on which the server is running, and then maybe a description of the operating environment, such as whether it may be predominately read heavy or write heavy, typical number of users and concurrency concerns, changes you have made from the default server configuration (i.e., the pg_hba.conf and postgresql.conf files), etc. Oftentimes, a description of what you are trying to accomplish is valuable, rather than some obtuse analogy, as you may well get suggestions for improvement that you had not even thought of on your own. Also, you will get the best response if you include actual DDL, DML, and sample data illustrating the problem and facilitating others to recreate what you are seeing -- yes, people will actually run your sample code and work with you.

Additionally, if you are asking about improving query performance, you will want to provide the query plan, i.e., the EXPLAIN output. This is generated by running your query unaltered except for prefixing it literally with the word “EXPLAIN”, as shown in Figure 3 in the pgAdmin tool or the psql command line utility.

Figure 3. Producing a query plan with EXPLAIN.

Under EXPLAIN, instead of actually running the query, the server returns the query plan, which lists detailed output of how the query will be executed, including which indexes will be used to optimize data access, where table scans might happen, and estimates of the cost and amount of data involved with each step. The kind of help you will get from the experienced practitioners monitoring the mailing list may pinpoint issues and help to suggest possible new indexes or changes to the filtering or join criteria.

Lastly, when participating in mailing list discussions there are two important things you want to keep in mind.

First, the mail list server is set up to send messages configured so that when you reply, by default your email software will reply only to the original message author. To be sure your message goes to the list, you must use your mail software “reply-all” feature, which will then include both the message author and the list address.

Second, the convention on the PostgreSQL mailing lists is to reply in-line and to NOT TOP POST. This last point is a long-standing convention in this community, and for many newcomers seems unusual enough that gentle admonishments are very common. Opinions vary on how much of the original message to retain for context in your reply. Some people chafe at the sometimes unwieldy growth in size of the message when the entire original message is retained in lots of back-and-forth discussion. Me personally, I like to delete anything that is not relevant to what specifically I am replying to so as to keep the message terse and focussed. Just bear in mind that there is decades of mailing list history retained on-line for historical documentation and future research, so retaining context and flow IS considered very important.

This article gets you started, now go forth, and dive in!

Tags:

↧

Key Things to Monitor in PostgreSQL - Analyzing Your Workload

March 14, 2018, 4:24 am

≫ Next: Comparing Oracle RAC HA Solution to Galera Cluster for MySQL or MariaDB

≪ Previous: Tips & Tricks for Navigating the PostgreSQL Community

Key Things to Monitor in PostgreSQL - Analyzing Your Workload

In computer systems, monitoring is the process of gathering metrics, analyzing, computing statistics and generating summaries and graphs regarding the performance or the capacity of a system, as well as generating alerts in case of unexpected problems or failures which require immediate attention or action. Therefore, monitoring has two uses: one for historic data analysis and presentation which help us identify medium and long term trends within our system and thus help us plan for upgrades, and a second one for immediate action in case of trouble.

Monitoring helps us identify problems and react to those problems concerning a wide range of fields such as:

Infrastructure/Hardware (physical or virtual)
Network
Storage
System Software
Application Software
Security

Monitoring is a major part of the work of a DBA. PostgreSQL, traditionally, has been known to be “low-maintenance” thanks to its sophisticated design and this means that the system can live with low attendance when compared to other alternatives. However, for serious installations where high availability and performance are of key importance, the database system has to be regularly monitored.

The role of the PostgreSQL DBA can step up to higher levels within the company’s hierarchy besides strictly technical: apart from basic monitoring and performance analysis, must be able to spot changes in usage patterns, identify the possible causes, verify the assumptions and finally translate the findings in business terms. As an example, the DBA must be able to identify some sudden change in a certain activity that might be linked to a possible security threat. So the role of the PostgreSQL DBA is a key role within the company, and must work closely with other departmental heads in order to identify and solve problems that arise. Monitoring is a great part of this responsibility.

PostgreSQL provides many out of the box tools to help us gather and analyze data. In addition, due to its extensibility, it provides the means to develop new modules into the core system.

PostgreSQL is highly dependent on the system (hardware and software) it runs on. We cannot expect a PostgreSQL server to perform good if there are problems in any of the vital components in the rest of the system. So the role of the PostgreSQL DBA overlaps with the role of the sysadmin. Below, as we examine what to watch in PostgreSQL monitoring, we will encounter both system-dependent variables and metrics as well as PostgreSQL’s specific figures.

Monitoring does not come for free. A good investment must be put in it by the company/organization with a commitment to manage and maintain the whole monitoring process. It also adds a slight load on the PostgreSQL server as well. This is little to worry about if everything is configured correctly, but we must keep in mind that this can be another way to misuse the system.

System Monitoring Basics

Important variables in System monitoring are:

CPU Usage
Network Usage
Disk Space / Disk Utilization
RAM Usage
Disk IOPS
Swap space usage
Network Errors

Here is an example of ClusterControl showing graphs for some critical PostgreSQL variables coming from pg_stat_database and pg_stat_bgwriter (which we will cover in the following paragraphs) while running pgbench -c 64 -t 1000 pgbench twice:

We notice that we have a peak on blocks-read in the first run, but we get close to zero during the second run as all blocks are found in shared_buffers.

Other variables of interest are paging activity, interrupts, context switches, among others. There is a plethora of tools to use in Linux/BSDs and unix or unix-like systems. Some of them are:

ps: for a list of the processes running
top/htop/systat: for system (CPU / memory) utilization monitoring
vmstat: for general system activity (including virtual memory) monitoring
iostat/iotop/top -mio: for IO monitoring
ntop: for network monitoring

Here is an example of vmstat on a FreeBSD box during a query which requires some disk reads and also some computation:

procs  memory      page                         disks      faults          cpu
r b w  avm   fre   flt   re  pi  po   fr    sr  ad0 ad1  in     sy    cs us sy id
0 0 0  98G  666M   421   0   0   0   170  2281    5  0  538   6361  2593  1  1 97
0 0 0  98G  665M   141   0   0   0     0  2288   13  0  622  11055  3748  3  2 94
--- query starts here ---
0 0 0  98G  608M   622   0   0   0   166  2287 1072  0 1883  16496 12202  3  2 94
0 0 0  98G  394M   101   0   0   0     2  2284 4578  0 5815  24236 39205  3  5 92
2 0 0  98G  224M  4861   0   0   0  1711  2287 3588  0 4806  24370 31504  4  6 91
0 0 0  98G  546M    84 188   0   0 39052 41183 2832  0 4017  26007 27131  5  7 88
2 0 0  98G  469M   418   0   0   1   397  2289 1590  0 2356  11789 15030  2  2 96
0 0 0  98G  339M   112   0   0   0   348  2300 2852  0 3858  17250 25249  3  4 93
--- query ends here ---
1 0 0  98G  332M  1622   0   0   0   213  2289    4  0  531   6929  2502  3  2 95

Repeating the query we would not notice any new burst in disk activity since those blocks of disk would already be in the OS’s cache. Although, the PostgreSQL DBA must be able to fully understand what is happening in the underlying infrastructure where the database runs, more complex system monitoring is usually a job for the sysadmin, as this is a large topic in itself.

In linux, a very handy shortcut for the top utility is pressing “C”, which toggles showing the command line of the processes. PostgreSQL by default rewrites the command line of the backends with the actual SQL activity they are running at the moment and also the user.

PostgreSQL Monitoring Basics

Important variables in PostgreSQL monitoring are:

Buffer cache performance (cache hits vs disk reads)
Number of commits
Number of connections
Number of sessions
Checkpoints and bgwriter statistics
Vacuums
Locks
Replication
And last but definitely not least, queries

Generally there are two ways in a monitoring setup to perform data collection:

To acquire data via a Log
To acquire data by querying PostgreSQL system

Log file-based data acquisition depends on the (properly configured) PostgreSQL log. We can use this kind of logging for “off-line” processing of the data. Log file-based monitoring is best suited when minimal overhead to the PostgreSQL server is required and when we don’t care about live data or about getting live alerts (although live monitoring using log file data can be possible by e.g. directing postgresql log to syslog and then streaming syslog to another server dedicated for log processing).

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Install ClusterControl for FREE

PostgreSQL Statistics Collector

PostgreSQL provides a rich set of views and functions readily available via the Statistics Collector subsystem. Again those data are divided in two categories:

Dynamic information on what the system is doing at the moment.
Statistics accumulated since the statistics collector subsystem was last reset.

Dynamic statistics views provide info about current activity per process (pg_stat_activity), status of physical replication (pg_stat_replication), status of physical standby (pg_stat_wal_receiver) or logical (pg_stat_subscription), ssl (pg_stat_ssl) and vacuum (pg_stat_progress_vacuum).

Collected statistics views provide info about important background processes such as the wal archiver, the bgwriter, and database objects: user or system tables, indexes, sequences and functions as well as the databases themselves.

It should be quite obvious by now that there are multiple ways to categorize data related to monitoring:

By source:
- System tools (ps, top, iotop, etc)
- PgSQL Log file
- Database
  - Dynamic
  - Collected
By specific database operation:
- Buffer cache
- Commits
- Queries
- Sessions
- Checkpoints
- Etc

After reading this article and experimenting with the notions, concepts and terms presented, you should be able to make a 2D matrix with all the possible combinations. As an example, the specific PostgreSQL activity (SQL command) can be found using: ps or top (system utilities), the PostgreSQL log files, pg_stat_activity (dynamic view), but also using pg_stat_statements an extension found in contrib (collected stats view). Likewise, information about locks can be found in the PostgreSQL log files, pg_locks and pg_stat_activity (presented just below) using wait_event and wait_event_type. Because of this, it is difficult covering the vast area of monitoring in a uni-dimensional linear fashion, and the author risks creating confusion to the reader because of this. In order to avoid this we will cover monitoring roughly by following the course of the official documentation, and adding related information as needed.

Dynamic Statistics Views

Using pg_stat_activity we are able to see what is the current activity by the various backend processes. For instance if we run the following query on table parts with about 3M rows:

testdb=# \d parts
                         Table "public.parts"
   Column   |          Type          | Collation | Nullable | Default
------------+------------------------+-----------+----------+---------
 id         | integer                |           |          |
 partno     | character varying(20)  |           |          |
 partname   | character varying(80)  |           |          |
 partdescr  | text                   |           |          |
 machine_id | integer                |           |          |
 parttype   | character varying(100) |           |          |
 date_added | date                   |           |          |

And lets run the following query, which needs some seconds to complete:

testdb=# select avg(age(date_added)) FROM parts;

By opening a new terminal and running the following query, while the previous is still running, we get:

testdb=# select pid,usename,application_name,client_addr,backend_start,xact_start,query_start,state,backend_xid,backend_xmin,query,backend_type from pg_stat_activity where datid=411547739 and usename ='achix' and state='active';
-[ RECORD 1 ]----+----------------------------------------
pid              | 21305
usename          | achix
application_name | psql
client_addr      |
backend_start    | 2018-03-02 18:04:35.833677+02
xact_start       | 2018-03-02 18:04:35.832564+02
query_start      | 2018-03-02 18:04:35.832564+02
state            | active
backend_xid      |
backend_xmin     | 438132638
query            | select avg(age(date_added)) FROM parts;
backend_type     | background worker
-[ RECORD 2 ]----+----------------------------------------
pid              | 21187
usename          | achix
application_name | psql
client_addr      |
backend_start    | 2018-03-02 18:02:06.834787+02
xact_start       | 2018-03-02 18:04:35.826065+02
query_start      | 2018-03-02 18:04:35.826065+02
state            | active
backend_xid      |
backend_xmin     | 438132638
query            | select avg(age(date_added)) FROM parts;
backend_type     | client backend
-[ RECORD 3 ]----+----------------------------------------
pid              | 21306
usename          | achix
application_name | psql
client_addr      |
backend_start    | 2018-03-02 18:04:35.837829+02
xact_start       | 2018-03-02 18:04:35.836707+02
query_start      | 2018-03-02 18:04:35.836707+02
state            | active
backend_xid      |
backend_xmin     | 438132638
query            | select avg(age(date_added)) FROM parts;
backend_type     | background worker

The pg_stat_activity view gives us info about the backend process, the user, the client, the transaction, the query, the state as well as a comprehensive info about the waiting status of the query.

But why 3 rows? In versions >=9.6, if a query can be run in parallel, or portions of it can be run in parallel, and the optimizer thinks that parallel execution is the fastest strategy, then it creates a Gather or Gather Merge node, and then requests at most max_parallel_workers_per_gather background worker processes, which by default is 2, hence the 3 rows we see in the output above. We can tell apart the client backend process from the background worker by using the backend_type column. For the pg_stat_activity view to be enabled you’ll have to make sure that the system configuration parameter track_activities is on. The pg_stat_activity provides rich information in order to determine blocked queries by the use of wait_event_type and wait_event columns.

A more refined way to monitor statements is via the pg_stat_statements contrib extension, mentioned earlier. On a recent Linux system (Ubuntu 17.10, PostgreSQL 9.6), this can be installed fairly easy:

testdb=# create extension pg_stat_statements ;
CREATE EXTENSION
testdb=# alter system set shared_preload_libraries TO 'pg_stat_statements';
ALTER SYSTEM
testdb=# \q
postgres@achix-dell:~$ sudo systemctl restart postgresql
postgres@achix-dell:~$ psql testdb
psql (9.6.7)
Type "help" for help.

testdb=# \d pg_stat_statements

Let’s create a table with 100000 rows, and then reset pg_stat_statements, restart the PostgreSQL server, perform a select on this table on the (still cold) system, and then see the contents of pg_stat_statements for the select:

testdb=# select 'descr '||gs as descr,gs as id into medtable from  generate_series(1,100000) as gs;
SELECT 100000
testdb=# select pg_stat_statements_reset();
 pg_stat_statements_reset
--------------------------
 
(1 row)

testdb=# \q
postgres@achix-dell:~$ sudo systemctl restart postgresql
postgres@achix-dell:~$ psql testdb -c 'select * from medtable'> /dev/null
testdb=# select shared_blks_hit,shared_blks_read from pg_stat_statements where query like '%select%from%medtable%';
 shared_blks_hit | shared_blks_read
-----------------+------------------
               0 |              541
(1 row)

testdb=#

Now let’s perform the select * once more and then look again in the contents of pg_stat_statements for this query:

postgres@achix-dell:~$ psql testdb -c 'select * from medtable'> /dev/null
postgres@achix-dell:~$ psql testdb
psql (9.6.7)
Type "help" for help.

testdb=# select shared_blks_hit,shared_blks_read from pg_stat_statements where query like '%select%from%medtable%';
 shared_blks_hit | shared_blks_read
-----------------+------------------
             541 |              541
(1 row)

So, the second time the select statement finds all the required blocks in the PostgreSQL shared buffers, and pg_stat_statements reports this via shared_blks_hit. pg_stat_statements provides info about the total number of calls of a statement, the total_time, min_time, max_time and mean_time, which can be extremely helpful when trying to analyze the workload of your system. A slow query that is run very frequently should require immediate attention. Similarly, consistently low hit rates may signify the need to review the shared_buffers setting.

pg_stat_replication provides info on the current status of replication for each wal_sender. Let’s suppose we have setup a simple replication topology with our primary and one hot standby, then we may query pg_stat_replication on the primary (doing the same on the standby will yield no results unless we have setup cascading replication and this specific standby serves as an upstream to other downstream standbys) to see the current status of replication:

testdb=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 1317
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 10.0.2.2
client_hostname  |
client_port      | 48192
backend_start    | 2018-03-03 11:59:21.315524+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/3029DB8
write_lsn        | 0/3029DB8
flush_lsn        | 0/3029DB8
replay_lsn       | 0/3029DB8
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

The 4 columns sent_lsn, write_lsn, flush_lsn, replay_lsn tell us the exact WAL position at each stage of the replication process at the remote standby. Then we create some heavy traffic on the primary with a command like:

testdb=# insert into foo(descr) select 'descr ' || gs from generate_series(1,10000000) gs;

And look at pg_stat_replication again:

postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 1317
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 10.0.2.2
client_hostname  |
client_port      | 48192
backend_start    | 2018-03-03 11:59:21.315524+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/D5E0000
write_lsn        | 0/D560000
flush_lsn        | 0/D4E0000
replay_lsn       | 0/C5FF0A0
write_lag        | 00:00:04.166639
flush_lag        | 00:00:04.180333
replay_lag       | 00:00:04.614416
sync_priority    | 0
sync_state       | async

Now we see that we have a delay between the primary and the standby depicted in the sent_lsn, write_lsn, flush_lsn, replay_lsn values. Since PgSQL 10.0 the pg_stat_replication also shows the lag between a recently locally flushed WAL and the time it took to be remotely written, flushed and replayed respectively. Seeing nulls in those 3 columns means that the primary and the standby are in sync.

The equivalent of pg_stat_replication on the standby side is called: pg_stat_wal_receiver:

testdb=# select * from pg_stat_wal_receiver ;
-[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 17867
status                | streaming
receive_start_lsn     | 0/F000000
receive_start_tli     | 1
received_lsn          | 0/3163F210
received_tli          | 1
last_msg_send_time    | 2018-03-03 13:32:42.516551+00
last_msg_receipt_time | 2018-03-03 13:33:28.644394+00
latest_end_lsn        | 0/3163F210
latest_end_time       | 2018-03-03 13:32:42.516551+00
slot_name             | fbsdclone
conninfo              | user=postgres passfile=/usr/local/var/lib/pgsql/.pgpass dbname=replication host=10.0.2.2 port=20432 fallback_application_name=walreceiver sslmode=disable sslcompression=1 target_session_attrs=any

testdb=#

When there is no activity, and the standby has replayed everything then latest_end_lsn must be equal to sent_lsn on the primary (and all intermediate log sequence numbers).

Similarly to physical replication, in the case of logical replication, where the role of the primary is taken by the publisher, and the role of the standby is taken by the subscriber, naturally the role of pg_stat_wal_receiver is taken by pg_stat_subscription. We can query pg_stat_subscription as follows:

testdb=# select * from pg_stat_subscription ;
-[ RECORD 1 ]---------+------------------------------
subid                 | 24615
subname               | alltables_sub
pid                   | 1132
relid                 |
received_lsn          | 0/33005498
last_msg_send_time    | 2018-03-03 17:05:36.004545+00
last_msg_receipt_time | 2018-03-03 17:05:35.990659+00
latest_end_lsn        | 0/33005498
latest_end_time       | 2018-03-03 17:05:36.004545+00

Note that on the publisher side, the corresponding view is the same as in the case of physical replication: pg_stat_replication.

Collected Statistics Views

pg_stat_archiver view has one row which gives info about the wal archiver. Keeping a snapshot of this row at regular intervals lets you calculate the size of the WAL traffic between those intervals. Also it gives info about failures while archiving WAL files.

pg_stat_bgwriter view gives very important information on the behavior of:

The checkpointer
The background writer
The (client serving) backends

Since this view gives accumulative data since the last reset It is very useful to create another timestamped table with periodic snapshots of pg_stat_bgwriter, so that it will be easy to get an incremental perspective between two snapshots. Tuning is a science (or magic), and it requires extensive logging and monitoring as well as a clear understanding of the underlying concepts and PostgreSQL internals in order to have good results, and this view is where to start, looking for things such as:

Are the checkpoints_timed the vast majority of the total checkpoints? If not then actions must be taken, results measured, and iterate the whole process until no improvements are found.
Are the buffers_checkpoint a good majority over the other two kinds (buffers_clean but most importantly buffers_backend) ? If buffers_backend are high, then again, certain configurations parameters must be changed, new measurements to be taken and reassessed.

Pg_stat_[user|sys|all]_tables

The most basic usage of those views is to verify that our vacuum strategy works as expected. Large values of dead tuples relative to live tuples signifies inefficient vacuuming. Those views also provide info on seq vs index scans and fetches, info about num of rows inserted, updated, deleted as well as HOT updates. You should try to keep the number of HOT updates as high as possible in order to improve performance.

Pg_stat_[user|sys|all]_indexes

Here the system stores and shows info on individual index usage. One thing to keep in mind is that idx_tup_read is more accurate than idx_tup_fetch. Non PK/ non unique Indexes with low idx_scan should be considered for removal, since they only hinder HOT updates. As mentioned in the previous blog, over-indexing should be avoided, indexing comes at a cost.

Pg_statio_[user|sys|all]_tables

In those views we can find info on the performance of the cache regarding table heap reads, index reads and TOAST reads. A simple query to count for the percentage of hits, and the distribution of the hits across tables would be:

with statioqry as (select relid,heap_blks_hit,heap_blks_read,row_number() OVER (ORDER BY 100.0*heap_blks_hit::numeric/(heap_blks_hit+heap_blks_read) DESC),COUNT(*) OVER () from pg_statio_user_tables where heap_blks_hit+heap_blks_read >0)
select relid,row_number,100.0*heap_blks_hit::float8/(heap_blks_hit+heap_blks_read) as "heap block hits %", 100.0 * row_number::real/count as "In top %" from statioqry order by row_number;
   relid   | row_number | heap block hits % |     In top %      
-----------+------------+-------------------+-------------------
     16599 |          1 |  99.9993058404502 | 0.373134328358209
     18353 |          2 |  99.9992251425738 | 0.746268656716418
     18338 |          3 |    99.99917566565 |  1.11940298507463
     17269 |          4 |  99.9990617323798 |  1.49253731343284
     18062 |          5 |  99.9988021889522 |  1.86567164179104
     18075 |          6 |  99.9985334109273 |  2.23880597014925
     18365 |          7 |  99.9968070500335 |  2.61194029850746
………..
     18904 |        127 |  97.2972972972973 |  47.3880597014925
     18801 |        128 |  97.1631205673759 |  47.7611940298507
     16851 |        129 |  97.1428571428571 |   48.134328358209
     17321 |        130 |  97.0043198249512 |  48.5074626865672
     17136 |        131 |                97 |  48.8805970149254
     17719 |        132 |  96.9791612263018 |  49.2537313432836
     17688 |        133 |   96.969696969697 |  49.6268656716418
     18872 |        134 |  96.9333333333333 |                50
     17312 |        135 |  96.8181818181818 |  50.3731343283582
……………..
     17829 |        220 |  60.2721026527734 |   82.089552238806
     17332 |        221 |  60.0276625172891 |  82.4626865671642
     18493 |        222 |                60 |  82.8358208955224
     17757 |        223 |  59.7222222222222 |  83.2089552238806
     17586 |        224 |  59.4827586206897 |  83.5820895522388

This tells us that at least 50% of the tables have hit rates larger than 96.93%, and 83.5% of the tables have a hit rate better than 59.4%

Pg_statio_[user|sys|all]_indexes

This view contains block read/hit information for indexes.

Pg_stat_database

This view contains one row per database. It shows some of the info of the preceding views aggregated to the whole database (blocks read, blocks hit, info on tups), some information relevant to the whole database (total xactions, temp files, conflicts, deadclocks, read/write time), and finally number of current backends.

Things to look for here are the ratio of blks_hit/(blks_hit + blks_read): the higher the value the better for the system’s I/O. However misses should not necessarily be accounted for disk reads as they may have very well been served by the OS’s filesys cache.

Similarly to other collected statistics views mentioned above, one should create a timestamped version of the pg_stat_database view and have a view at the differences between two consecutive snapshots:

Are the number of rollbacks increasing?
Or the number of committed xactions?
Are we getting way more conflicts than yesterday (this applies to standbys)?
Do we have abnormally high numbers of deadlocks?

All those are very important data. The first two might mean some change in some usage pattern, that must be explained. High number of conflicts might mean replication needs some tuning. High number of deadlocks is bad for many reasons. Not only performance is low because transactions get rolled back, but also if an application suffers from deadlocks in a single master topology, the problems will only get amplified if we move to multi-master. In this case, the software engineering department must rewrite the pieces of the code that cause the deadlocks.

Locks

Locking is a very important topic in PostgreSQL and deserves its own blog(s). Nevertheless basic locks monitoring has to be done in the same fashion as the other aspects of monitoring presented above. pg_locks view provides real time information on the current locks in the system. We may catch long waiting locks by setting log_lock_waits, then information on long waiting waits will be logged in the PgSQL log. If we notice unusual high locking which results in long waits then again, as in the case with the deadlocks mentioned above, the software engineers must review any pieces of code that might cause long held locks, e.g. explicit locking in the application (LOCK TABLE or SELECT … FOR UPDATE).

Similarly to the case of deadlocks, a system with short locks will move easier to a multi-master setup.

Tags:

↧

Comparing Oracle RAC HA Solution to Galera Cluster for MySQL or MariaDB

March 15, 2018, 3:42 am

≫ Next: Migrating from MySQL to PostgreSQL - What You Should Know

≪ Previous: Key Things to Monitor in PostgreSQL - Analyzing Your Workload

Business has continuously desired to derive insights from information to make reliable, smarter, real-time, fact-based decisions. As firms rely more on data and databases, information and data processing is the core of many business operations and business decisions. The faith in the database is total. None of the day-to-day company services can run without the underlying database platforms. As a consequence, the necessity on scalability and performance of database system software is more critical than ever. The principal benefits of the clustered database system are scalability and high availability. In this blog, we will try to compare Oracle RAC and Galera Cluster in the light of these two aspects. Real Application Clusters (RAC) is Oracle’s premium solution to clustering Oracle databases and provides High Availability and Scalability. Galera Cluster is the most popular clustering technology for MySQL and MariaDB.

Architecture overview

Oracle RAC uses Oracle Clusterware software to bind multiple servers. Oracle Clusterware is a cluster management solution that is integrated with Oracle Database, but it can also be used with other services, not only the database. The Oracle Clusterware is an additional software installed on servers running the same operating system, which lets the servers to be chained together to operate as if they were one server.

Oracle Clusterware watches the instance and automatically restarts it if a crash occurs. If your application is well designed, you may not experience any service interruption. Only a group of sessions (those connected to the failed instance) is affected by the failure. The blackout can be efficiently masked to the end user using advanced RAC features like Fast Application Notification and the Oracle client’s Fast Connection Failover. Oracle Clusterware controls node membership and prevents split brain symptoms in which two or more instances attempt to control the instance.

Galera Cluster is a synchronous active-active database clustering technology for MySQL and MariaDB. Galera Cluster differs from what is known as Oracle’s MySQL Cluster - NDB. MariaDB cluster is based on the multi-master replication plugin provided by Codership. Since version 5.5, the Galera plugin (wsrep API) is an integral part of MariaDB. Percona XtraDB Cluster (PXC) is also based on the Galera plugin. The Galera plugin architecture stands on three core layers: certification, replication, and group communication framework. Certification layer prepares the write-sets and does the certification checks on them, guaranteeing that they can be applied. Replication layer manages the replication protocol and provides the total ordering capability. Group Communication Framework implements a plugin architecture which allows other systems to connect via gcomm back-end schema.

To keep the state identical across the cluster, the wsrep API uses a Global Transaction ID. GTID unique identifier is created and associated with each transaction committed on the database node. In Oracle RAC, various database instances share access to resources such as data blocks in the buffer cache to enqueue data blocks. Access to the shared resources between RAC instances needs to be coordinated to avoid conflict. To organize shared access to these resources, the distributed cache maintains information such as data block ID, which RAC instance holds the current version of this data block, and the lock mode in which each instance contains the data block.

Data storage key concepts

Oracle RAC relies on a distributed disk architecture. The database files, control files and online redo logs for the database need be accessible to each node in the cluster. There is a variation of ways to configure shared storage including directly attached disks, Storage Area Networks (SAN), and Network Attached Storage (NAS) and Oracle ASM. Two most popular are OCFS and ASM. Oracle Cluster File System (OCFS) is a shared file system designed specifically for Oracle RAC. OCFS eliminates the requirement that Oracle database files be connected to logical drives and enables all nodes to share a single Oracle Home ASM, RAW Device. Oracle ASM is Oracle's advised storage management solution that provides an alternative to conventional volume managers, file systems, and raw devices. The Oracle ASM provides a virtualization layer between the database and storage. It treats multiple disks as a single disk group and lets you dynamically add or remove drives while maintaining databases online.

There is no need to build sophisticated shared disk storage for Galera, as each node has its full copy of data. However it is a good practice to make the storage reliable with battery-backed write caches.

Oracle RAC, Cluster storage

Galera replication, disks attached to database nodes

Cluster nodes communication and cache

Oracle Real Application Clusters has a shared cache architecture, it utilizes Oracle Grid Infrastructure to enable the sharing of server and storage resources. Communication between nodes is the critical aspect of cluster integrity. Each node must have at least two network adapters or network interface cards: one for the public network interface, and one for the interconnect. Each cluster node is connected to all other nodes via a private high-speed network, also recognized as the cluster interconnect.

Oracle RAC, network architecture

The private network is typically formed with Gigabit Ethernet, but for high-volume environments, many vendors offer low-latency, high-bandwidth solutions designed for Oracle RAC. Linux also extends a means of bonding multiple physical NICs into a single virtual NIC to provide increased bandwidth and availability.

While the default approach to connecting Galera nodes is to use a single NIC per host, you can have more than one card. ClusterControl can assist you with such setup. The main difference is the bandwidth requirement on the interconnect. Oracle RAC ships blocks of data between instances, so it places a heavier load on the interconnect as compared to Galera write-sets (which consist of a list of operations).

With Redundant Interconnect Usage in RAC, you can identify multiple interfaces to use for the private cluster network, without the need of using bonding or other technologies. This functionality is available starting with Oracle Database 11gR2. If you use the Oracle Clusterware excessive interconnect feature, then you must use IPv4 addresses for the interfaces (UDP is a default).

To manage high availability, each cluster node is assigned a virtual IP address (VIP). In the event of node failure, the failed node's IP address can be reassigned to a surviving node to allow applications continue to reach the database through the same IP address.

Sophisticated network setup is necessary to Oracle's Cache Fusion technology to couple the physical memory in each host into a single cache. Oracle Cache Fusion provides data stored in the cache of one Oracle instance to be accessed by any other instance by transporting it across the private network. It also protects data integrity and cache coherency by transmitting locking and supplementary synchronization information beyond cluster nodes.

On top of the described network setup, you can set a single database address for your application - Single Client Access Name (SCAN). The primary purpose of SCAN is to provide ease of connection management. For instance, you can add new nodes to the cluster without changing your client connection string. This functionality is because Oracle will automatically distribute requests accordingly based on the SCAN IPs which point to the underlying VIPs. Scan listeners do the bridge between clients and the underlying local listeners which are VIP-dependent.

For Galera Cluster, the equivalent of SCAN would be adding a database proxy in front of the Galera nodes. The proxy would be a single point of contact for applications, it can blacklist failed nodes and route queries to healthy nodes. The proxy itself can be made redundant with Keepalived and Virtual IP.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Install ClusterControl for FREE

Failover and data recovery

The main difference between Oracle RAC and MySQL Galera Cluster is that Galera is shared nothing architecture. Instead of shared disks, Galera uses certification based replication with group communication and transaction ordering to achieve synchronous replication. A database cluster should be able to survive a loss of a node, although it's achieved in different ways. In case of Galera, the critical aspect is the number of nodes, Galera requires a quorum to stay operational. A three node cluster can survive the crash of one node. With more nodes in your cluster, your availability will grow. Oracle RAC doesn't require a quorum to stay operational after a node crash. It is because of the access to distributed storage that keeps consistent information about cluster state. However, your data storage could be a potential point of failure in your high availability plan. While it's reasonably straightforward task to have Galera cluster nodes spread across geolocation data centers, it wouldn't be that easy with RAC. Oracle RAC requires additional high-end disk mirroring however, basic RAID like redundancy can be achieved inside an ASM diskgroup.

Disk Group Type	Supported Mirroring Levels	Default Mirroring Level
External redundancy	Unprotected (none)	Unprotected
Normal redundancy	Two-way, three-way, unprotected (none)	Two-way
High redundancy	Three-way	Three-way
Flex redundancy	Two-way, three-way, unprotected (none)	Two-way (newly-created)
Extended redundancy	Two-way, three-way, unprotected (none)	Two-way

ASM Disk Group redundancy

Locking Schemes

In a single-user database, a user can alter data without concern for other sessions modifying the same data at the same time. However, in a multi-user database multi-node environment, this become more tricky. A multi-user database must provide the following:

data concurrency - the assurance that users can access data at the same time,
data consistency - the assurance that each user sees a consistent view of the data.

Cluster instances require three main types of concurrency locking:

Data concurrency reads on different instances,
Data concurrency reads and writes on different instances,
Data concurrency writes on different instances.

Oracle lets you choose the policy for locking, either pessimistic or optimistic, depending on your requirements. To obtain concurrency locking, RAC has two additional buffers. They are Global Cache Service (GCS) and Global Enqueue Service (GES). These two services cover the Cache Fusion process, resource transfers, and resource escalations among the instances. GES include cache locks, dictionary locks, transaction locks and table locks. GCS maintains the block modes and block transfers between the instances.

In Galera cluster, each node has its storage and buffers. When a transaction is started, database resources local to that node are involved. At commit, the operations that are part of that transaction are broadcasted as part of a write-set, to the rest of the group. Since all nodes have the same state, the write-set will either be successful on all nodes or it will fail on all nodes.

Galera Cluster uses at the cluster-level optimistic concurrency control, which can appear in transactions that result in a COMMIT aborting. The first commit wins. When aborts occur at the cluster level, Galera Cluster gives a deadlock error. This may or may not impact your application architecture. High number of rows to replicate in a single transaction would impact node responses, although there are techniques to avoid such behavior.

Hardware & Software requirements

Configuring both clusters hardware doesn’t require potent resources. Minimal Oracle RAC cluster configuration would be satisfied by two servers with two CPUs, physical memory at least 1.5 GB of RAM, an amount of swap space equal to the amount of RAM and two Gigabit Ethernet NICs. Galera’s minimum node configuration is three nodes (one of nodes can be an arbitrator, gardb), each with 1GHz single-core CPU 512MB RAM, 100 Mbps network card. While these are the minimal, we can safely say that in both cases you would probably like to have more resources for your production system.

Each node stores software so you would need to prepare several gigabytes of your storage. Oracle and Galera both have the ability to individually patch the nodes by taking them down one at a time. This rolling patch avoids a complete application outage as there are always database nodes available to handle traffic.

What is important to mention is that a production Galera cluster can easily run on VM’s or basic bare metal, while RAC would need investment in sophisticated shared storage and fiber communication.

Monitoring and management

Oracle Enterprise Manager is the favored approach for monitoring Oracle RAC and Oracle Clusterware. Oracle Enterprise Manager is an Oracle Web-based unified management system for monitoring and administering your database environment. It’s part of Oracle Enterprise License and should be installed on separate server. Cluster control monitoring and management is done via combination on crsctl and srvctl commands which are part of cluster binaries. Below you can find a couple of example commands.

Clusterware Resource Status Check:

    crsctl status resource -t (or shorter: crsctl stat res -t)

Example:

$ crsctl stat res ora.test1.vip
NAME=ora.test1.vip
TYPE=ora.cluster_vip_net1.type
TARGET=ONLINE
STATE=ONLINE on test1

Check the status of the Oracle Clusterware stack:

    crsctl check cluster

Example:

$ crsctl check cluster -all
*****************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
*****************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Check the status of Oracle High Availability Services and the Oracle Clusterware stack on the local server:

    crsctl check crs

Example:

$ crsctl check crs
CRS-4638: Oracle High Availablity Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Stop Oracle High Availability Services on the local server.

    crsctl stop has

Stop Oracle High Availability Services on the local server.

    crsctl start has

Displays the status of node applications:

    srvctl status nodeapps

Displays the configuration information for all SCAN VIPs

    srvctl config scan

Example:

srvctl config scan -scannumber 1
SCAN name: testscan, Network: 1
Subnet IPv4: 192.51.100.1/203.0.113.46/eth0, static
Subnet IPv6: 
SCAN 1 IPv4 VIP: 192.51.100.195
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes:
SCAN VIP is individually disabled on nodes:

The Cluster Verification Utility (CVU) performs system checks in preparation for installation, patch updates, or other system changes:

    cluvfy comp ocr

Example:

Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configurationl...
All nodes free of non-clustered, local-only configurations
ASM Running check passed. ASM is running on all specified nodes
Checking OCR config file “/etc/oracle/ocr.loc"...
OCR config file “/etc/oracle/ocr.loc" check successful
Disk group for ocr location “+DATA" available on all the nodes
NOTE:
This check does not verify the integrity of the OCR contents. Execute ‘ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check passed
Verification of OCR integrity was successful.

Galera nodes and the cluster requires the wsrep API to report several statuses, which is exposed. There are currently 34 dedicated status variables can be viewed with SHOW STATUS statement.

mysql> SHOW STATUS LIKE 'wsrep_%';

wsrep_apply_oooe
wsrep_apply_oool
wsrep_cert_deps_distance
wsrep_cluster_conf_id
wsrep_cluster_size
wsrep_cluster_state_uuid
wsrep_cluster_status
wsrep_connected
wsrep_flow_control_paused
wsrep_flow_control_paused_ns
wsrep_flow_control_recv

wsrep_local_send_queue_avg
wsrep_local_state_uuid
wsrep_protocol_version
wsrep_provider_name
wsrep_provider_vendor
wsrep_provider_version
wsrep_flow_control_sent
wsrep_gcomm_uuid
wsrep_last_committed
wsrep_local_bf_aborts
wsrep_local_cert_failures

wsrep_local_commits
wsrep_local_index
wsrep_local_recv_queue
wsrep_local_recv_queue_avg
wsrep_local_replays
wsrep_local_send_queue
wsrep_ready
wsrep_received
wsrep_received_bytes
wsrep_replicated
wsrep_replicated_bytes
wsrep_thread_count

The administration of MySQL Galera Cluster in many aspects is very similar. There are just few exceptions like bootstrapping the cluster from initial node or recovering nodes via SST or IST operations.

Bootstrapping cluster:

$ service mysql bootstrap # sysvinit
$ service mysql start --wsrep-new-cluster # sysvinit
$ galera_new_cluster # systemd
$ mysqld_safe --wsrep-new-cluster # command line

The equivalent Web-based, out of the box solution to manage and monitor Galera Cluster is ClusterControl. It provides a web-based interface to deploy clusters, monitors key metrics, provides database advisors, and take care of management tasks like backup and restore, automatic patching, traffic encryption and availability management.

Restrictions on workload

Oracle provides SCAN technology which we found missing in Galera Cluster. The benefit of SCAN is that the client’s connection information does not need to change if you add or remove nodes or databases in the cluster. When using SCAN, the Oracle database randomly connects to one of the available SCAN listeners (typically three) in a round robin fashion and balances the connections between them. Two kinds load balancing can be configured: client side, connect time load balancing and on the server side, run time load balancing. Although there is nothing similar within Galera cluster itself, the same functionality can be addressed with additional software like ProxySQL, HAProxy, Maxscale combined with Keepalived.

When it comes to application workload design for Galera Cluster, you should avoid conflicting updates on the same row, as it leads to deadlocks across the cluster. Avoid bulk inserts or updates, as these might be larger than the maximum allowed writeset. That might also cause cluster stalls.

Designing Oracle HA with RAC you need to keep in mind that RAC only protects against server failure, and you need to mirror the storage and have network redundancy. Modern web applications require access to location-independent data services, and because of RAC’s storage architecture limitations, it can be tricky to achieve. You also need to spend a notable amount of time to gain relevant knowledge to manage the environment; it is a long process. On the application workload side, there are some drawbacks. Distributing separated read or write operations on the same dataset is not optimal because latency is added by supplementary internode data exchange. Things like partitioning, sequence cache, and sorting operations should be reviewed before migrating to RAC.

Multi data-center redundancy

According to the Oracle documentation, the maximum distance between two boxes connected in a point-to-point fashion and running synchronously can be only 10 km. Using specialized devices, this distance can be increased to 100 km.

Galera Cluster is well known for its multi-datacenter replication capabilities. It has rich support for Wider Area Networks network settings. It can be configured for high network latency by taking Round-Trip Time (RTT) measurements between cluster nodes and adjusting necessary parameters. The wsrep_provider_options parameters allow you to configure settings like suspect_timeout, interactive_timeout, join_retrans_timouts and many more.

Using Galera and RAC in Cloud

Per Oracle note www.oracle.com/technetwork/database/options/.../rac-cloud-support-2843861.pdf no third-party cloud currently meets Oracle’s requirements regarding natively provided shared storage. “Native” in this context means that the cloud provider must support shared storage as part of their infrastructure as per Oracle’s support policy.

Thanks to its shared nothing architecture, which is not tied to a sophisticated storage solution, Galera cluster can be easily deployed in a cloud environment. Things like:

optimized network protocol,
topology-aware replication,
traffic encryption,
detection and automatic eviction of unreliable nodes,

makes cloud migration process more reliable.

Licenses and hidden costs

Oracle licensing is a complex topic and would require a separate blog article. The cluster factor makes it even more difficult. The cost goes up as we have to add some options to license a complete RAC solution. Here we just want to highlight what to expect and where to find more information.

RAC is a feature of Oracle Enterprise Edition license. Oracle Enterprise license is split into two types, per named user and per processor. If you consider Enterprise Edition with per core license, then the single core cost is RAC 23,000 USD + Oracle DB EE 47,500 USD, and you still need to add a ~ 22% support fee. We would like to refer to a great blog on pricing found on https://flashdba.com/2013/09/18/the-real-cost-of-oracle-rac/.

Flashdba calculated the price of a four node Oracle RAC. The total amount was 902,400 USD plus additional 595,584 USD for three years DB maintenance, and that does not include features like partitioning or in-memory database, all that with 60% Oracle discount.

Galera Cluster is an open source solution that anyone can run for free. Subscriptions are available for production implementations that require vendor support. A good TCO calculation can be found at https://severalnines.com/blog/database-tco-calculating-total-cost-ownership-mysql-management.

Conclusion

While there are significant differences in architecture, both clusters share the main principles and can achieve similar goals. Oracle enterprise product comes with everything out of the box (and it's price). With a cost in the range of >1M USD as seen above, it is a high-end solution that many enterprises would not be able to afford. Galera Cluster can be described as a decent high availability solution for the masses. In certain cases, Galera may well be a very good alternative to Oracle RAC. One drawback is that you have to build your own stack, although that can be completely automated with ClusterControl. We’d love to hear your thoughts on this.

Tags:

↧

Migrating from MySQL to PostgreSQL - What You Should Know

March 16, 2018, 3:15 am

≫ Next: Getting Started with ProxySQL - MySQL & MariaDB Load Balancing Tutorial

≪ Previous: Comparing Oracle RAC HA Solution to Galera Cluster for MySQL or MariaDB

Whether migrating a database or project from MySQL to PostgreSQL, or choosing PostgreSQL for a new project with only MySQL knowledge, there are a few things to know about PostgreSQL and the differences between the two database systems.

PostgreSQL is a fully open source database system released under its own license, the PostgreSQL License, which is described as "a liberal Open Source license, similar to the BSD or MIT licenses.” This has allowed The PostgreSQL Global Development Group (commonly referred to as PGDG), who develops and maintains the open source project, to improve the project with help from people around the world, turning it into one of the most stable and feature rich database solutions available. Today, PostgreSQL competes with the top proprietary and open source database systems for features, performance, and popularity.

PostgreSQL is a highly compliant Relational Database System that’s scalable, customizable, and has a thriving community of people improving it every day.

What PostgreSQL Needs

In a previous blog, we discussed setting up and optimizing PostgreSQL for a new project. It is a good introduction to PostgreSQL configuration and behavior, and can be found here: https://severalnines.com/blog/setting-optimal-environment-postgresql.

If migrating an application from MySQL to PostgreSQL, the best place to start would be to host it on similar hardware or hosting platform as the source MySQL database.

On Premise

If hosting the database on premise, bare metal hosts (rather than Virtual Machines) are generally the best option for hosting PostgreSQL. Virtual Machines do add some helpful features at times, but they come at the cost of losing power and performance from the host in general, while bare metal allows the PostgreSQL software to have full access to performance with fewer layers between it and the hardware. On premise hosts would need an administrator to maintain the databases, whether it’s a full time employee or contractor, whichever makes more sense for the application needs.

In The Cloud

Cloud hosting has come a long way in the past few years, and countless companies across the world host their databases in cloud based servers. Since cloud hosts are highly configurable, the right size and power of host can be selected for the specific needs of the database, with a cost that matches.

Depending on the hosting option used, new hosts can be provisioned quickly, memory / cpu / disk can be tweaked quickly, and even additional backup methods can be available. When choosing a cloud host, look for whether a host is dedicated or shared, dedicated being better for extremely high load databases. Another key is to make sure the IOPS available for the cloud host is good enough for the database activity needs. Even with a large memory pool for PostgreSQL, there will always be disk operations to write data to disk, or fetch data when not in memory.

Cloud Services

Since PostgreSQL is increasing in popularity, it’s being found available on many cloud database hosting services like Heroku, Amazon AWS, and others, and is quickly catching up to the popularity of MySQL. These services allow a third party to host and manage a PostgreSQL database easily, allowing focus to remain on the application.

Concepts / term comparisons

There are a few comparisons to cover when migrating from MySQL to PostgreSQL, common configuration parameters, terms, or concepts that operate similarly but have their differences.

Database Terms

Various database terms can have different meanings within different implementations of the technology. Between MySQL and PostgreSQL, there’s a few basic terms that are understood slightly differently, so a translation is sometimes needed.

“Cluster”

In MySQL, a ‘cluster’ usually refers to multiple MySQL database hosts connected together to appear as a single database or set of databases to clients.

In PostgreSQL, when referencing a ‘cluster’, it is a single running instance of the database software and all its sub-processes, which then contains one or more databases.

“Database”

In MySQL, queries can access tables from different databases at the same time (provided the user has permission to access each database).

SELECT *
FROM customer_database.customer_table t1
JOIN orders_database.order_table t2 ON t1.customer_id = t2.customer_id
WHERE name = ‘Bob’;

However in PostgreSQL this cannot happen unless using Foreign Data Wrappers (a topic for another time). Instead, a PostgreSQL database has the option for multiple ‘schemas’ which operate similarly to databases in MySQL. Schemas contain the tables, indexes, etc, and can be accessed simultaneously by the same connection to the database that houses them.

SELECT *
FROM customer_schema.customer_table t1
JOIN orders_schema.order_table t2 ON t1.customer_id = t2.customer_id
WHERE name = ‘Bob’;

Interfacing with the PostgreSQL

In the MySQL command line client (mysql), interfacing with the database uses key works like ‘DESCRIBE table’ or ‘SHOW TABLES’. The PostgreSQL command line client (psql) uses its own form of ‘backslash commands’. For example, instead of ‘SHOW TABLES’, PostgreSQL’s command is ‘\dt’, and instead of ‘SHOW DATABASES;’, the command is ‘\l’.

A full list of commands for ‘psql’ can be found by the backslash command ‘\?’ within psql.

Language Support

Like MySQL, PostgreSQL has libraries and plugins for all major languages, as well as ODBC drivers along the lines of MySQL and Oracle. Finding a great and stable library for any language needed is an easy task.

Stored Procedures

Unlike MySQL, PostgreSQL has a wide range of supported Procedural Languages to choose from. In the base install of PostgreSQL, the supported languages are PL/pgSQL (SQL Procedural Language), PL/Tcl (Tcl Procedural Language), PL/Perl (Perl Procedural Language), and PL/Python (Python Procedural Language). Third party developers may have more languages not officially supported by the main PostgreSQL group.

Configuration

Memory
MySQL tunes this with key_buffer_size when using MyISAM, and with innodb_buffer_pool_size when using InnoDB.
PostgreSQL uses shared_buffers for the main memory block given to the database for caching data, and generally sticks around 1/4th of system memory unless certain scenarios require that to change. Queries using memory for sorting use the work_mem value, which should be increased cautiously.

Tools for migration

Migrating to PostgreSQL can take some work, but there are tools the community has developed to help with the process. Generally they will convert / migrate the data from MySQL to PostgreSQL, and recreate tables / indexes. Stored Procedures or functions, are a different story, and usually require manual re-writing either in part, or from the ground up.

Some example tools available are pgloader and FromMySqlToPostgreSql. Pgloader is a tool written in Common Lisp that imports data from MySQL into PostgreSQL using the COPY command, and loads data, indexes, foreign keys, and comments with data conversion to represent the data correctly in PostgreSQL as intended. FromMySqlToPostgreSql is a similar tool written in PHP, and can convert MySQL data types to PostgreSQL as well as foreign keys and indexes. Both tools are free, however many other tools (free and paid) exist and are newly developed as new versions of each database software are released.

Converting should always include in depth evaluation after the migration to make sure data was converted correctly and functionality works as expected. Testing beforehand is always encouraged for timings and data validation.

Replication Options

If coming from MySQL where replication has been used, or replication is needed at all for any reason, PostgreSQL has several options available, each with its own pros and cons, depending on what is needed through replication.

Built In:
By default, PostgreSQL has its own built in replication mode for Point In Time Recovery (PITR). This can be set up using either file-based log shipping, where Write Ahead Log files are shipped to a standby server where they are read and replayed, or Streaming Replication, where a read only standby server fetches transaction logs over a database connection to replay them.
Either one of these built in options can be set up as either a ‘warm standby’ or ‘hot standby.’ A ‘warm standby’ doesn’t allow connections but is ready to become a master at any time to replace a master having issues. A ‘hot standby’ allows read-only connections to connect and issue queries, in addition to being ready to become a read/write master at any time as well if needed.
Slony:
One of the oldest replication tools for PostgreSQL is Slony, which is a trigger based replication method that allows a high level of customization. Slony allows the setup of a Master node and any number of Replica nodes, and the ability to switch the Master to any node desired, and allows the administrator to choose which tables (if not wanting all tables) to replicate. It’s been used not just for replicating data in case of failure / load balancing, but shipping specific data to other services, or even minimal downtime upgrades, since replication can go across different versions of PostgreSQL.
Slony does have the main requirement that every table to be replicated have either a PRIMARY KEY, or a UNIQUE index without nullable columns.
Bucardo:
When it comes to multi-master options, Bucardo is one of few for PostgreSQL. Like Slony, it’s a third party software package that sits on top of PostgreSQL. Bucardo calls itself “an asynchronous PostgreSQL replication system, allowing for both multi-master and multi-slave operations.” The main benefit is multi-master replication, that works fairly well, however it does lack conflict resolution, so applications should be aware of possible issues and fix accordingly.
There are many other replication tools as well, and finding the one that works best for an application depends on the specific needs.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Install ClusterControl for FREE

Community

PostgreSQL has a thriving community willing to help with any issues / info that may be needed.

IRC
An active IRC chatroom named #postgresql is available on freenode, as administrators and developers world wide chat about PostgreSQL and related projects / issues. There’s even smaller rooms for specifics like Slony, Bucardo, and more.
Mailing lists
There are a handful of PostgreSQL mailing lists for ‘general’, ‘admin’, ‘performance’, and even ‘novice’ (a great place to start if new to PostgreSQL in general). The mailing lists are subscribed to by many around the world, and provide a very useful wealth of resources to answer any question that may need answering.
A full list of PostgreSQL mailing lists can be found at https://www.postgresql.org/list/
User Groups
User groups are a great place to get involved and active in the community, and many large cities worldwide have a PostgreSQL User Group (PUG) available to join and attend, and if not, consider starting one. These groups are great for networking, learning new technologies, and even just asking questions in person to people from any level of experience.
Documentation
Most Importantly, PostgreSQL is documented very well. Any information for configuration parameters, SQL functions, usage, all can be easily learned through the official documentation provided on PostgreSQL’s website. If at all anything is unclear, the community will help in the previous outlined options.

Tags:

PostgreSQL

MySQL

migration

↧

Getting Started with ProxySQL - MySQL & MariaDB Load Balancing Tutorial

March 19, 2018, 6:45 am

≫ Next: Updated: Become a ClusterControl DBA: Making your DB components HA via Load Balancers

≪ Previous: Migrating from MySQL to PostgreSQL - What You Should Know

We’re excited to announce a major update to our tutorial “Database Load Balancing for MySQL and MariaDB with ProxySQL”

ProxySQL is a lightweight yet complex protocol-aware proxy that sits between the MySQL clients and servers. It is a gate, which basically separates clients from databases, and is therefore an entry point used to access all the database servers.

In this new update we’ve…

Updated the information about how to best deploy ProxySQL via ClusterControl
Revamped the section “Getting Started with ProxySQL”
Added a new section on Data Masking
Added new frequently asked questions (FAQs)

Load balancing and high availability go hand-in-hand. ClusterControl makes it easy to deploy and configure several different load balancing technologies for MySQL and MariaDB with a point-and-click graphical interface, allowing you to easily try them out and see which ones work best for your unique needs.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Install ClusterControl for FREE

ClusterControl for ProxySQL

Included in ClusterControl Advanced and Enterprise, ProxySQL enables MySQL, MariaDB and Percona XtraDB database systems to easily manage intense, high-traffic database applications without losing availability. ClusterControl offers advanced, point-and-click configuration management features for the load balancing technologies we support. We know the issues regularly faced and make it easy to customize and configure the load balancer for your unique application needs.

We know load balancing and support many different technologies. ClusterControl has many things preconfigured to get you started with a couple of clicks. If you run into challenged we also provide resources and on-the-spot support to help ensure your configurations are running at peak performance.

ClusterControl delivers on an array of features to help deploy and manage ProxySQL

Advanced Graphical Interface - ClusterControl provides the only GUI on the market for the easy deployment, configuration and management of ProxySQL.
Point and Click deployment - With ClusterControl you’re able to apply point and click deployments to MySQL, MySQL replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Galera Cluster, and Percona XtraDB technologies, as well the top related load balancers with HAProxy, MaxScale and ProxySQL.
Suite of monitoring graphs - With comprehensive reports you have a clear view of data points like connections, queries, data transfer and utilization, and more.
Configuration Management - Easily configure and manage your ProxySQL deployments with a simple UI. With ClusterControl you can create servers, re-orientate your setup, create users, set rules, manage query routing, and enable variable configurations.

Make sure to check out the update tutorial today!

Tags:

proxysql

MySQL

MariaDB

database load balancing

high availability

↧

Updated: Become a ClusterControl DBA: Making your DB components HA via Load Balancers

March 20, 2018, 4:21 am

≫ Next: The Best Alert and Notification Tools for PostgreSQL

≪ Previous: Getting Started with ProxySQL - MySQL & MariaDB Load Balancing Tutorial

Choosing your HA topology

There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, you can use resource managers like Zookeeper and Etcd to (re)configure your applications or use load balancers/proxies to distribute the workload over all available hosts.

The Virtual IPs need either an application to manage them (MHA, Orchestrator), some scripting (Keepalived, Pacemaker/Corosync) or an engineer to manually fail over and the decision making in the process can become complex. The Virtual IP failover is a straightforward and simple process by removing the IP address from one host, assigning it to another and use arping to send a gratuitous ARP response. In theory a Virtual IP can be moved in a second but it will take a few seconds before the failover management application is sure the host has failed and acts accordingly. In reality this should be somewhere between 10 and 30 seconds. Another limitation of Virtual IPs is that some cloud providers do not allow you to manage your own Virtual IPs or assign them at all. E.g., Google does not allow you to do that on their compute nodes.

Resource managers like Zookeeper and Etcd can monitor your databases and (re)configure your applications once a host fails or a slave gets promoted to master. In general this is a good idea but implementing your checks with Zookeeper and Etcd is a complex task.

A load balancer or proxy will sit in between the application and the database host and work transparently as if the client would connect to the database host directly. Just like with the Virtual IP and resource managers, the load balancers and proxies also need to monitor the hosts and redirect the traffic if one host is down. ClusterControl supports two proxies: HAProxy and ProxySQL and both are supported for MySQL master-slave replication and Galera cluster. HAProxy and ProxySQL both have their own use cases, we will describe them in this post as well.

Why do you need a load balancer?

In theory you don’t need a load balancer but in practice you will prefer one. We’ll explain why.

If you have virtual IPs setup, all you have to do is point your application to the correct (virtual) IP address and everything should be fine connection wise. But suppose you have scaled out the number of read replicas, you might want to provide virtual IPs for each of those read replicas as well because of maintenance or availability reasons. This might become a very large pool of virtual IPs that you have to manage. If one of those read replicas had a failure, you need to re-assign the virtual IP to another host or else your application will connect to either a host that is down or in worst case, a lagging server with stale data. Keeping the replication state to the application managing the virtual IPs is therefore necessary.

Also for Galera there is a similar challenge: you can in theory add as many hosts as you’d like to your application config and pick one at random. The same problem arises when this host is down: you might end up connecting to an unavailable host. Also using all hosts for both reads and writes might also cause rollbacks due to the optimistic locking in Galera. If two connections try to write to the same row at the same time, one of them will receive a roll back. In case your workload has such concurrent updates, it is advised to only use one node in Galera to write to. Therefore you want a manager that keeps track of the internal state of your database cluster.

Both HAProxy and ProxySQL will offer you the functionality to monitor the MySQL/MariaDB database hosts and keep state of your cluster and its topology. For replication setups, in case a slave replica is down, both HAProxy and ProxySQL can redistribute the connections to another host. But if a replication master is down, HAProxy will deny the connection and ProxySQL will give back a proper error to the client. For Galera setups, both load balancers can elect a master node from the Galera cluster and only send the write operations to that specific node.

On the surface HAProxy and ProxySQL may seem to be similar solutions, but they differ a lot in features and the way they distribute connections and queries. HAProxy supports a number of balancing algorithms like least connections, source, random and round-robin while ProxySQL distributes connections using the weight-based round-robin algorithm (equal weight means equal distribution). Since ProxySQL is an intelligent proxy, it is database aware and is also able to analyze your queries. ProxySQL is able to do read/write splitting based on query rules where you can forward the queries to the designated slaves or master in your cluster. ProxySQL includes additional functionality like query rewriting, caching and query firewall with real-time, in-depth statistics generation about the workload.

That should be enough background information on this topic, so let’s see how you can deploy both load balancers for MySQL replication and Galera topologies.

Deploying HAProxy

Using ClusterControl to deploy HAProxy on a Galera cluster is easy: go to the relevant cluster and select “Add Load Balancer”:

And you will be able to deploy an HAProxy instance by adding the host address and selecting the server instances you wish to include in the configuration:

By default the HAProxy instance will be configured to send connections to the server instances receiving the least number of connections, but you can change that policy to either round robin or source.

Under advanced settings you can set timeouts, maximum amount of connections and even secure the proxy by whitelisting an IP range for the connections.

Under the nodes tab of that cluster, the HAProxy node will appear:

Now your Galera cluster is also available via the newly deployed HAProxy node on port 3307. Don’t forget to GRANT your application access from the HAProxy IP, as now the traffic will be incoming from the proxy instead of the application hosts. Also, remember to point your application connection to the HAProxy node.

Now suppose the one server instance would go down, HAProxy will notice this within a few seconds and stop sending traffic to this instance:

The two other nodes are still fine and will keep receiving traffic. This retains the cluster highly available without the client even noticing the difference.

Deploying a secondary HAProxy node

Now that we have moved the responsibility of retaining high availability over the database connections from the client to HAProxy, what if the proxy node dies? The answer is to create another HAProxy instance and use a virtual IP controlled by Keepalived as shown in this diagram:

The benefit compared to using virtual IPs on the database nodes is that the logic for MySQL is at the proxy level and the failover for the proxies is simple.

So let’s deploy a secondary HAProxy node:

After we have deployed a secondary HAProxy node, we need to add Keepalived:

And after Keepalived has been added, your nodes overview will look like this:

So now instead of pointing your application connections to the HAProxy node directly you have to point them to the virtual IP instead.

In the example here, we used separate hosts to run HAProxy on, but you could easily add them to existing server instances as well. HAProxy does not bring much overhead, although you should keep in mind that in case of a server failure, you will lose both the database node and the proxy.

Deploying ProxySQL

Deploying ProxySQL to your cluster is done in a similar way to HAProxy: "Add Load Balancer" in the cluster list under ProxySQL tab.

In the deployment wizard, specify where ProxySQL will be installed, the administration user/password, the monitoring user/password to connect to the MySQL backends. From ClusterControl, you can either create a new user to be used by the application (the user will be created on both MySQL and ProxySQL) or use the existing database users (the user will be created on ProxySQL only). Set whether are you are using implicit transactions or not. Basically, if you don’t use SET autocommit=0 to create new transaction, ClusterControl will configure read/write split.

After ProxySQL has been deployed, it will be available under the Nodes tab:

Opening the ProxySQL node overview will present you the ProxySQL monitoring and management interface, so there is no reason to log into ProxySQL on the node anymore. ClusterControl covers most of the ProxySQL important stats like memory utilization, query cache, query processor and so on, as well as other metrics like hostgroups, backend servers, query rule hits, top queries and ProxySQL variables. In the ProxySQL management aspect, you can manage the query rules, backend servers, users, configuration and scheduler right from the UI.

Check out our ProxySQL tutorial page which covers extensively on how to perform database Load Balancing for MySQL and MariaDB with ProxySQL.

Deploying Garbd

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes (50% + 1 node), so in a 2 node system, there would be no majority resulting in split brain. Fortunately, it is possible to add a garbd (Galera Arbitrator Daemon), which is a lightweight stateless daemon that can act as the odd node. The added benefit by adding the Galera Arbitrator is that you can now do with only two nodes in your cluster.

If ClusterControl detects that your Galera cluster consists of an even number of nodes, you will be given the warning/advice by ClusterControl to extend the cluster to an odd number of nodes:

Choose wisely the host to deploy garbd on, as it will receive all replicated data. Make sure the network can handle the traffic and is secure enough. You could choose one of the HAProxy or ProxySQL hosts to deploy garbd on, like in the example below:

Take note that starting from ClusterControl 1.5.1, garbd cannot be installed on the same host as ClusterControl due to risk of package conflicts.

After installing garbd, you will see it appear next to your two Galera nodes:

Final thoughts

We showed you how to make your MySQL master-slave and Galera cluster setups more robust and retain high availability using HAProxy and ProxySQL. Also garbd is a nice daemon that can save the extra third node in your Galera cluster.

This finalizes the deployment side of ClusterControl. In our next blog, we will show you how to integrate ClusterControl within your organization by using groups and assigning certain roles to users.

Tags:

↧

The Best Alert and Notification Tools for PostgreSQL

March 21, 2018, 3:59 am

≫ Next: Migrating from Oracle to PostgreSQL - What You Should Know

≪ Previous: Updated: Become a ClusterControl DBA: Making your DB components HA via Load Balancers

As part of their enterprise monitoring system, organizations rely on alerts and notifications as their first line of defense to achieving high availability and consequently lowering outage costs.

Alerts and notifications are sometimes used interchangeably, for example we can say “I have received a high load system alert”, and replacing “alert” with “notification” will not change the message meaning. However, in the world of management systems it is important to note the difference: alerts are events generated as a result of a system trouble and notifications are used to deliver information about system status, including trouble. As an example the Severalnines blog Introducing the ClusterControl Alerting Integrations discusses one of the ClusterControl’s integration features, the notification system which is able to deliver alerts via email, chat services, and incident management systems. Also see PostgreSQL Wiki — Alerts and Status Notifications.

In order to accurately monitor the PostgreSQL database activity, a management system relies on the database activity metrics, custom features or monitor advisors, and monitoring log files.

In this article I review the tools listed in the PostgreSQL Wiki, the Monitoring and PostgreSQL GUI sections, skipping those that aren’t actively maintained, or do not provide alerting and notifications either within the product or with a free trial account. While not an exhaustive review, each tool was installed and configured up to the point where I could understand its alerting and notification capabilities.

Nagios

Nagios is a popular on-premise, general purpose monitoring system that offers an wide range of plugins. While Nagios Core is open source, the recommended solution for monitoring PostgreSQL is Nagios XI.

Notification settings are per user, and in order to change them the administrator must “login as” the user — Nagios uses the term masquerade as. Once on the account setting page, the user can choose to enable or disable the notification methods:

Nagios XI Notification Preferences

In order to configure the types of notifications, head to the “Notification Methods” page:

Nagios XI Notification Methods

See the Nagios XI User Guide for more details.

To configure alerts, log in as administrator and select the database configuration wizard:

Nagios XI Database Configuration Wizard

Once configured, the alerts can be viewed by selecting any of the default views, dashboards, or we can configure a custom one. Out of the box, Nagios XI provides the following PostgreSQL monitors:

Nagios XI PostgreSQL monitors

Note that out of the box Nagios XI doesn’t provide any metrics based on the PostgreSQL Statistics Collector, instead each metric must be defined using the “Postgres Query” configuration wizard:

Nagios XI Postgres Query

Datadog

Datadog is a general purpose SaaS monitoring tool featuring a very large set of integrations with a variety of services. To start monitoring, select the PostgreSQL integration, and then choose the notifications integrations such as email, chat (e.g. Slack), or incident response systems such as PagerDuty:

Datadog Integrations

In order to receive notifications via the integration channels configured earlier, we need to create at least one Datadog monitor, in the case of PostgreSQL monitoring an “integration” monitor type:

Datadog PostgreSQL Integration

The first step in configuring the monitor is selecting an alert type:

Datadog Detection Method

Next, configure one or more metrics:

Datador Metrics Configuration

Configure the conditions for triggering the alert:

Datadog Alert Trigger

Notifications can be customized using template variables:

Datadog Postgres Integration

Finally provide a list of recipients to receive notifications:

Datadog Notification Recipients

The events Datadog can monitor on are listed under the PostgreSQL integration “Metrics” section, and are based on the PostgreSQL Statistics Collector predefined views:

Datadog Postgres Integration Metrics

In order to monitor for events not provided with the default integration, Datadog provides customers with the option of creating custom metrics limited to the Datadog plan.

Okmeter

Okmeter is also part of the SaaS general purpose monitoring family, and just as other SaaS tools, requires an agent on the monitored host. Once the agent is installed, a set of default event triggers are enabled, including a PostgreSQL connection check:

Okmeter Autotriggers

Getting more PostgreSQL metrics requires adding a PostgreSQL “server”:

Okmeter - Adding a server

In order to monitor PostgreSQL statistics, similarly to Nagios and Datadog, we must configure custom metrics as explained in the Okmeter Documentation — Sending Custom metrics. Or, edit the “PostgreSQL server” metric above to include for views in the “okmeter.pg_stats” function.

The Okmeter query statistics documentation page explains how to enable tracking of execution statistics for the SQL statements. Note that there are a few limitations in using the “pg_stat_statements” views e.g. maximum number of distinct statements that can be recorded by a module — see the PostgreSQL documentation on pg_stat_statements for details.

The notification contacts page is where notifications are configured for each user:

Okmeter Contact Notification

Notification messages can be further customized using templates:

Okmeter Notification Message Template

Circonus

Circonus, another SaaS general monitoring product, features a PostgreSQL “check” which can be enabled individually or added as part of the one-step install:

Circonus Check setup

According to Circonus PostgreSQL documentation the check is performed from a remote location via direct SQL statements. After configuring the PostgreSQL host to accept connections from a Circonus broker, the wizard will present a list of available metrics:

Circonus PostgreSQL check

In order to configure alerts, each metric is associated with a set of rules and a list of contacts to be notified.

Circonus Metric Details

Alerts are categorized based on severity levels:

Circonus Rulesets Severity Levels

Notification channels include SMS, OpsGenie, Slack, VictorOps, and PagerDuty (no email). The screenshot below shows a Slack integration:

Circonus Contact Groups

In order to configure notifications, each metric in the check must be assigned rules and contacts. Note that contacts must be created prior to editing the metric:

Circonus Rulesets

New Relic

New Relic is another SaaS general monitoring system. When it comes to PostgreSQL there are (as of this writing) three available plugins. The most recent one is the Blue Medora plugin:

New Relic PostgreSQL plugin from Blue Medora

Once the plugin is working it becomes visible on the plugins page and we are ready to configure alerts:

New Relic Alerts Setup

New Relic uses the concept of alert policies to group alerts into incidents. Before configuring a policy we must setup the notifications channels. Out of the box, New Relic integrates with all popular incident response systems, as well as email:

New Relic Channel Types

Note that the integration must be first enabled in the notification application. For example selecting Slack from the list of channel types:

New Relic Slack Integration

Next create an “alert policy”:

New Relic Alert Policy

An alert policy requires an “alert condition”. The next set of screenshots show the steps to achieve just that:

New Relic PostgreSQL Condition Category

New Relic PostgreSQL Condition Entity

New Relic PostgreSQL Condition Threshold

Finally select the notification channels tab in order to modify the default:

New Relic PostgreSQL Notification Channels

Optionally, add the alert condition to New Relic Insights (requires additional subscription):

New Relic Insights

Postgres Enterprise Manager

PEM or Postgres Enterprise Manager is a tool for managing, tuning, and monitoring PostgreSQL.

It comes with a very rich set of predefined metrics:

Postgres Enterprise Manager Predefined Metrics

In order to modify the default alerts, or create custom ones, use the alert templates:

Postgres Enterprise Manager Custom Alert Template

PEM relies on email and SNMP for notifications, so it can easily integrate with monitoring systems such as Nagios, but there aren’t any integrations with the popular incident management systems (PagerDuty, VictorOps, OpsGenie), or chat services (Slack) found in the other products.

Postgres Enterprise Manager Email & SNMP alerting

pgwatch2

pgwatch2 is another PostgreSQL centric monitoring tool, self-hosted solution.

In order to define alerts, we must first create a custom dashboard and define the metric:

pgwatch2 Dashboard Metrics

Next, configure the alert:

pgwatch2 Dashboard Alert Config

Once configured, the alerts will show up on the Alerts List page:

pgwatch2 Dashboard Alert List

pgwatch2 integrates with all popular notification systems. Here’s an example of adding a Slack channel:

pgwatch2 Slack Integration

To view the notification channels configured in the system, open up the “Notification channels” page:

pgwatch2 Notification Channels

Additional metrics can be added as documented in the pgwatch2 Features section.

ClusterControl

ClusterControl is an on premise database oriented management system with support for PostgreSQL, MySQL, MariaDB, and MongoDB.

First step is adding a notification integration. More information about available integrations is available at Introducing the ClusterControl Alerting Integrations:

ClusterControl Integrations

For the purpose of this demo, I’ve configured Slack:

ClusterControl Slack Integration

ClusterControl also offers the option of notifying via email:

ClusterControl Notifications via Email

Once notifications are in place, create custom advisors in order to trigger alerts based on specific criteria:

ClusterControl Custom Advisors

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Install ClusterControl for FREE

Conclusion

The article wasn’t intended to be a deep dive into the functionality of each tool, rather I attempted to outline what I considered to be the important features related to alerting and notifications for PostgreSQL, specifically.

One of the lessons learned is that the selection process should take several factors in consideration:

on premise or SaaS
agent-based or remote check
integration with incident management systems and chat services
availability of monitored metrics, out of the box, and plugins
ability to add custom metrics
alert management features (e.g. grouping)
complexity vs granularity in the user interface
additional functionality (management, tuning, API, etc.)

Also, if one solution doesn’t meet all the business and/or technical requirements, it is always possible to use a combination of services.

Tags:

↧

Migrating from Oracle to PostgreSQL - What You Should Know

March 22, 2018, 3:59 am

≫ Next: An Overview of Logical Replication in PostgreSQL

≪ Previous: The Best Alert and Notification Tools for PostgreSQL

Whether migrating a database or an application from Oracle to PostgreSQL with only one type of database knowledge, there are few things to know about the differences between the two database systems.

PostgreSQL is the world’s most advanced open source database. PostgreSQL community is very strong and they are continuously improving existing PostgreSQL features and also add new features. As per the db-engines.com, PostgreSQL is the DBMS of the year 2017.

There are some incompatibilities in Oracle and PostgreSQL. The behaviour of some functions is different between Oracle and PostgreSQL.

Why Migrate from Oracle to PostgreSQL

Cost: As you may know Oracle licence cost is very expensive and there is additional cost for some features like partitioning and high availability. So overall it's very expensive.
Flexible open source licensing and easy availability from public cloud providers like AWS.
Benefit from open source add-ons to improve performance.

Preliminary Check

As you may know migration from Oracle to PostgreSQL is a costly and time consuming task. It is important to understand which part is to migrate. Do not waste time for migrating objects that are no longer required. Also, check whether there is any historical data required or not. Do not waste time replicating data that you don’t need, for example backup data and temporary table from past maintenance.

Migration Assessment

After preliminary check, the first step of migration is to analyze the application and database object, find out the incompatibilities between both the databases and estimate the time and cost required for migration.

Ora2pg tool is very helpful for migration assessment. It connects to the Oracle database, scan it automatically and extracts the data, generating the database migration report. You can check a sample report in Ora2pg.

What You Should Know

Understand the differences between Oracle and PostgreSQL and convert it using any tool. There is no any tool that can convert 100% Oracle database into PostgreSQL, some manual changes are required. Please check below some of the important differences you should know before migrating.

Data Type Mapping

PostgreSQL has rich set of data types. Some of the important Data type conversion between Oracle and PostgreSQL is as follow.

Oracle	PostgreSQL	Comment
VARCHAR2(n)	VARCHAR(n)	In Oracle ‘n’ is number of bytes whereas in PostgreSQL ‘n’ is number of characters
CHAR(n)	CHAR(n)	In Oracle ‘n’ is number of bytes whereas in PostgreSQL ‘n’ is number of characters
NUMBER(n,m)	NUMERIC(n,m)	NUMBER type can be converted to NUMERIC but if you use SMALLINT, INT and BIGINT then performance would be better.
NUMBER(4)	SMALLINT
NUMBER(9)	INT
NUMBER(18)	BIGINT
NUMBER(n)	NUMERIC(n)	NUMERIC(n) ,If n>=19
DATE	TIMESTAMP(0)	Both databases has DATE type but Oracle DATE type returns date and time whereas PostgreSQL DATE type return only date no time.
TIMESTAMP WITH LOCAL TIME ZONE	TIMESTAMPTZ	The PostgreSQL type Timestamptz(Timestamp with time zone) is different from the Oracle Timestamp with time zone. It is equivalent to Oracle’s Timestamp with local time zone, but this small difference can cause performance issue or application bug.
CLOB	TEXT	PostgreSQL TEXT type can store up to 1 GB of text.
BLOB RAW(n)	BYTEA(1 GB limit) Large object	In Oracle, BLOB datatype stores unstructured binary data in the database. BLOB type can store up to 128 terabytes of binary data. PostgreSQL BYTEA stores binary data but only upto 1 GB. If the data if above 1 GB then use Large object.

Transactions

Oracle database always uses transactions but in PostgreSQL you have to activate that. In Oracle, the transaction starts when executing any statement and ends when COMMIT statement executed. In PostgreSQL, transaction starts when execute BEGIN and end when COMMIT statement executed. Even the isolation levels also have no problem. PostgreSQL database knows all the isolation levels that Oracle database knows. The default isolation level of PostgreSQL is Read committed.

Example:

Oracle:

DELETE FROM table_name WHERE id = 120;
COMMIT;

PostgreSQL:

BEGIN;
DELETE FROM table_name WHERE id  = 120;
COMMIT;

Dual Table

In Oracle FROM clause is mandatory for every SELECT statement so Oracle database uses DUAL table for SELECT statement where table name is not required. In PostgreSQL, FROM clause is not mandatory so DUAL table is not necessary. The Dual table can be created in PostgreSQL as a view to eliminate the porting problem. Orafce tool have implemented this so you can use Orafce also.

Example:

postgres=# SELECT CURRENT_TIMESTAMP FROM DUAL;
ERROR:  relation "dual" does not exist
LINE 1: SELECT CURRENT_TIMESTAMP FROM DUAL;
                                      ^
postgres=# SELECT CURRENT_TIMESTAMP;
       current_timestamp
-------------------------------
 2018-03-16 09:36:01.205925+00
(1 row)

After installing Orafce module:

postgres=# SELECT CURRENT_TIMESTAMP FROM DUAL;
       current_timestamp
-------------------------------
 2018-03-16 09:36:01.205925+00
(1 row)

SYSDATE

Oracle's SYSDATE function returns date and time. The behaviour of SYSDATE function is different in different places. PostgreSQL does not have any function corresponding to SYSDATE function. In PostgreSQL there are multiple methods to get the date and time and it is based on the application purpose.

Time retrieval method	Function to be used
SQL start time	Statement_timestamp()
Transaction start time	now() or Transaction_timestamp()
Time when the function is implemented	Clock_timestamp()

Time retrieval method

Function to be used

SQL start time

Statement_timestamp()

Transaction start time

now() or

Transaction_timestamp()

Time when the function is implemented

Clock_timestamp()

In the below example clock_timestamp() returns the time when actual function is executed and other statement_timestamp() returns the time when the SQL statement started it’s execution.

postgres=# SELECT now(), statement_timestamp(), current_timestamp, transaction_timestamp(), clock_timestamp();
              now              |      statement_timestamp      |       current_timestamp       |     transaction_timestamp     |        clock_timestamp
 
-------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 2018-03-16 09:27:56.163154+00 | 2018-03-16 09:27:56.163154+00 | 2018-03-16 09:27:56.163154+00 | 2018-03-16 09:27:56.163154+00 | 2018-03-16 09:27:56.163281+00
 (1 row)

TO_DATE(two argument)

Oracle’s TO_DATE function return DATE type value(year, month, day, hour, minute, second) while PostgreSQL’s TO_DATE(two_argument) return DATE type value(year, month, day).

The solution for this incompatibility is to convert TO_DATE() to TO_TIMESTAMP(). If you use Orafce tool then not necessary to change anything because Orafce implemented this function so we get the same result sa Oracle.

Oracle:

SELECT TO_DATE ('20180314121212','yyyymmddhh24miss') FROM dual;

PostgreSQL:

SELECT TO_TIMESTAMP ('20180314121212','yyyymmddhh24miss')::TIMESTAMP(0);

SYNONYM

CREATE SYNONYM is not supported in PostgreSQL. In Oracle CREATE SYNONYM is used to access remote objects while in PostgreSQL we can use SET search_path to include the remote definition.

Oracle:

CREATE SYNONYM abc.table_name FOR pqr.table_name;

PostgreSQL:

SET search_path TO 'abc.table_name';

Behaviour of Empty String and NULL

In Oracle, empty strings and NULL values in string context are the same. The concatenation of NULL and string obtain string as a result. In PostgreSQL the concatenation result is null in this case. In Oracle IS NULL operator is used to check whether string is empty or not but in PostgreSQL result is FALSE for empty string and TRUE for NULL.

Sequences

There is a slight difference in the syntax of sequence in Oracle and PostgreSQL.

Oracle:

Sequence_name.nextval

PostgreSQL:

Nextval(‘sequence_name’)

To change this syntax you can create a script or you can change it manually.

SUBSTR

The behaviour of SUBSTR function in Oracle and PostgreSQL is different. The SUBSTR function works in PostgreSQL without error but returns a different result. This difference can cause application bugs.

Oracle:

SELECT SUBSTR(‘ABC’,-1) FROM DUAL;
Returns ‘C’

PostgreSQL:

postgres=# SELECT SUBSTR('ABC',-1);
 substr
--------
 ABC
(1 row)

The solution for this is to use Orafce SUBSTR function which returns the same result as Oracle in PostgreSQL.

DELETE Statement

In Oracle, DELETE statement can work without FROM clause but in PostgreSQL it is not supported. We need to add FROM clause in PostgreSQL DELETE statement manually.

Oracle:

DELETE table_name WHERE column_name = 'Col_value';

PostgreSQL:

DELETE FROM table_name WHERE column_name = 'Col_value';

External Coupling +

Oracle uses + operator for left and right join but PostgreSQL does not use it.

Oracle:

SELECT a1.name1, a2.name2
     FROM a1, a2
     WHERE a1.code = a2.code (+);

PostgreSQL:

SELECT a1.name1, a2.name2
    FROM a1
    LEFT OUTER JOIN a2 ON a1.code = a2.code;

START WITH..CONNECT BY

Oracle uses START WITH..CONNECT BY for hierarchical queries. PostgreSQL does not support START WITH..CONNECT BY statement. PostgreSQL have WITH RECURSIVE for hierarchical queries so translate CONNECT BY statement into WITH RECURSIVE statement.

Oracle:

SELECT 
    restaurant_name, 
    city_name 
FROM 
    restaurants rs 
START WITH rs.city_name = 'TOKYO' 
CONNECT BY PRIOR rs.restaurant_name = rs.city_name;

PostgreSQL:

WITH RECURSIVE tmp AS (SELECT restaurant_name, city_name
                                 FROM restaurants
                                WHERE city_name = 'TOKYO'
                                UNION
                               SELECT m.restaurant_name, m.city_name
                                 FROM restaurants m
                                 JOIN tmp ON tmp.restaurant_name = m.city_name)
                  SELECT restaurant_name, city_name FROM tmp;

PLSQL to PLPGSQL Conversion

PostgreSQL’s PL/pgSQL language is similar to Oracle’s PL/SQL language in many aspects. It is a block-structured, imperative language, and all variables have to be declared. In both the databases assignments, loops, conditionals are similar.

The main differences you should keep in mind when porting from Oracle’s PL/SQL to PostgreSQL’s PL/pgSQL

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Install ClusterControl for FREE

Migration Tools

There are some tools which are very helpful for an Oracle to PostgreSQL migration. You can also create your own tool as an extension and use it inside PostgreSQL.

Orafce

Oracle compatible functions, data type and packages can be use as it is in PostgreSQL. This is an open source tool with BSD licence so anyone can use this tool.

Most of the major functions are covered in Orafce.

Applications usually use those functions with multiple occurrences. You can reduce the modification cost of SQL by using this tool.

All the functions and packages are implemented correctly and it is well tested.

Some of the functions:

Dbms_output
dbms_random
utl_file – filesystem related functions
Dbms_pipe and dbms_alert
PLVdate,PLVstr, PLVchr
Oracle compatible DATE data type and functions like ADD_MONTHS, LAST_DAY,NEXT_DAY and so on.
NVL function
SUBSTR and SUBSTRB function
VARCHAR2 and NVARCHAR2 support
TO_DATE()

Ora2pg

Ora2Pg is a free tool used to migrate an Oracle database to a PostgreSQL compatible schema.

It connects to the Oracle database, scans it automatically, extracts its structure or data and then generates SQL scripts that you can load into your PostgreSQL database.

The cost estimation in an Oracle to PostgreSQL migration is not easy.

Ora2Pg inspects all database objects, all functions and stored procedures to detect if there’s still some objects and PL/SQL code that cannot be automatically converted by Ora2Pg.

This tool is very helpful for the following conversions:

Schema conversion
PLSQL to PLPGSQL conversion

Testing

Testing the whole application and the migrated database is very important because some of the functions are the same in both databases, however the behaviour is different.

Some common scenarios need to be checked:
- Check whether all the objects are correctly converted or not.
- Check whether all the DMLS’s are working correctly or not.
- Load some sample data in both databases and check the result. The result of SQL from both database should be same.
- Check the performance of the DML and improve it if necessary.

Tags:

oracle to postgresql migration

↧

An Overview of Logical Replication in PostgreSQL

March 23, 2018, 4:50 am

≫ Next: A Guide to Using pgBouncer for PostgreSQL

≪ Previous: Migrating from Oracle to PostgreSQL - What You Should Know

PostgreSQL is one of the most advanced open source databases in the world with a lot of great features. One of them is Streaming Replication (Physical Replication) which was introduced in PostgreSQL 9.0. It is based on XLOG records which get transferred to the destination server and get applied there. However, it is cluster based and we cannot do a single database or single object (selective replication) replication. Over the years, we have been dependent on external tools like Slony, Bucardo, BDR, etc for selective or partial replication as there was no feature at the core level until PostgreSQL 9.6. However, PostgreSQL 10 came up with a feature called Logical Replication, through which we can perform database/object level replication.

Logical Replication replicates changes of objects based on their replication identity, which is usually a primary key. It is different to physical replication, in which replication is based on blocks and byte-by-byte replication. Logical Replication does not need an exact binary copy at the destination server side, and we have the ability to write on destination server unlike Physical Replication. This feature originates from the pglogical module.

In this blog post, we are going to discuss:

How it works - Architecture
Features
Use cases - when it is useful
Limitations
How to achieve it

How it Works - Logical Replication Architecture

Logical Replication implements a publish and subscribe concept (Publication & Subscription). Below is a higher level architectural diagram on how it works.

Basic Logical Replication Architecture

Publication can be defined on the master server and the node on which it is defined is referred to as the "publisher". Publication is a set of changes from a single table or group of tables. It is at database level and each publication exists in one database. Multiple tables can be added to a single publication and a table can be in multiple publications. You should add objects explicitly to a publication except if you choose the "ALL TABLES" option which needs a superuser privilege.

You can limit the changes of objects (INSERT, UPDATE, and DELETE) to be replicated. By default, all operation types are replicated. You must have a replication identity configured for the object that you want to add to a publication. This is in order to replicate UPDATE and DELETE operations. The replication identity can be a primary key or unique index. If the table does not have a primary key or unique index, then it can be set to replica identity "full" in which it takes all columns as key (entire row becomes key).

You can create a publication using CREATE PUBLICATION. Some practical commands are covered in the "How to achieve it" section.

Subscription can be defined on the destination server and the node on which it is defined is referred to as the "subscriber". The connection to the source database is defined in subscription. The subscriber node is the same as any other stand alone postgres database, and you can also use it as a publication to further subscriptions.

The subscription is added using CREATE SUBSCRIPTION and can be stopped/resumed at any time using the ALTER SUBSCRIPTION command and removed using DROP SUBSCRIPTION.

Once a subscription is created, Logical replication copies a snapshot of the data on the publisher database. Once that is done, it waits for delta changes and sends them to the subscription node as soon as they occur.

However, how are the changes collected? Who sends them to the target? And who applies them at the target? Logical replication is also based on the same architecture as physical replication. It is implemented by “walsender” and “apply” processes. As it is based on WAL decoding, who starts the decoding? The walsender process is responsible to start logical decoding of the WAL, and loads the standard logical decoding plugin (pgoutput). The plugin transforms the changes read from WAL to the logical replication protocol, and filters the data according to the publication specification. The data is then continuously transferred using the streaming replication protocol to the apply worker, which maps the data to local tables and applies the individual changes as they are received, in correct transactional order.

It logs all these steps in log files while setting it up. We can see the messages in "How to achieve it" section later in the post.

Features Of Logical Replication

Logical Replication replicates data objects based upon their replication identity (generally a
primary key or unique index).
Destination server can be used for writes. You can have different indexes and security definition.
Logical Replication has cross-version support. Unlike Streaming Replication, Logical Replication can be set between different versions of PostgreSQL (> 9.4, though)
Logical Replication does Event-based filtering
When compared, Logical Replication has less write amplification than Streaming Replication
Publications can have several subscriptions
Logical Replication provides storage flexibility through replicating smaller sets (even partitioned tables)
Minimum server load compared with trigger based solutions
Allows parallel streaming across publishers
Logical Replication can be used for migrations and upgrades
Data transformation can be done while setting up.

Use Cases - When is Logical Replication Useful?

It is very important to know when to use Logical Replication. Otherwise, you will not get much benefit if your use case does not match. So, here are some use cases on when to use Logical Replication:

If you want to consolidate multiple databases into a single database for analytical purposes.
If your requirement is to replicate data between different major versions of PostgreSQL.
If you want to send incremental changes in a single database or a subset of a database to other databases.
If giving access to replicated data to different groups of users.
If sharing a subset of the database between multiple databases.

Limitations Of Logical Replication

Logical Replication has some limitations on which the community is continuously working on to overcome:

Tables must have the same full qualified name between publication and subscription.
Tables must have primary key or unique key
Mutual (bi-directional) Replication is not supported
Does not replicate schema/DDL
Does not replicate sequences
Does not replicate TRUNCATE
Does not replicate Large Objects
Subscriptions can have more columns or different order of columns, but the types and column names must match between Publication and Subscription.
Superuser privileges to add all tables
You cannot stream over to the same host (subscription will get locked).

How to Achieve Logical Replication

Here are the steps to achieve basic Logical Replication. We can discuss about more complex scenarios later.

Initialize two different instances for publication and subscription and start.

C1MQV0FZDTY3:bin bajishaik$ export PATH=$PWD:$PATH
C1MQV0FZDTY3:bin bajishaik$ which psql
/Users/bajishaik/pg_software/10.2/bin/psql
C1MQV0FZDTY3:bin bajishaik$ ./initdb -D /tmp/publication_db

C1MQV0FZDTY3:bin bajishaik$ ./initdb -D /tmp/subscription_db

Parameters to be changed before you start the instances (for both publication and subscription instances).

C1MQV0FZDTY3:bin bajishaik$ tail -3 /tmp/publication_db/postgresql.conf
listen_addresses='*'
port = 5555
wal_level= logical


C1MQV0FZDTY3:bin bajishaik$ pg_ctl -D /tmp/publication_db/ start
waiting for server to start....2018-03-21 16:03:30.394 IST [24344] LOG:  listening on IPv4 address "0.0.0.0", port 5555
2018-03-21 16:03:30.395 IST [24344] LOG:  listening on IPv6 address "::", port 5555
2018-03-21 16:03:30.544 IST [24344] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5555"
2018-03-21 16:03:30.662 IST [24345] LOG:  database system was shut down at 2018-03-21 16:03:27 IST
2018-03-21 16:03:30.677 IST [24344] LOG:  database system is ready to accept connections
 done
server started

C1MQV0FZDTY3:bin bajishaik$ tail -3 /tmp/subscription_db/postgresql.conf
listen_addresses='*'
port=5556
wal_level=logical

C1MQV0FZDTY3:bin bajishaik$ pg_ctl -D /tmp/subscription_db/ start
waiting for server to start....2018-03-21 16:05:28.408 IST [24387] LOG:  listening on IPv4 address "0.0.0.0", port 5556
2018-03-21 16:05:28.408 IST [24387] LOG:  listening on IPv6 address "::", port 5556
2018-03-21 16:05:28.410 IST [24387] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5556"
2018-03-21 16:05:28.460 IST [24388] LOG:  database system was shut down at 2018-03-21 15:59:32 IST
2018-03-21 16:05:28.512 IST [24387] LOG:  database system is ready to accept connections
 done
server started

Other parameters can be at default for basic setup.

Change pg_hba.conf file to allow replication. Note that these values are dependent on your environment, however, this is just a basic example (for both publication and subscription instances).

C1MQV0FZDTY3:bin bajishaik$ tail -1 /tmp/publication_db/pg_hba.conf
 host     all     repuser     0.0.0.0/0     md5
C1MQV0FZDTY3:bin bajishaik$ tail -1 /tmp/subscription_db/pg_hba.conf
 host     all     repuser     0.0.0.0/0     md5

C1MQV0FZDTY3:bin bajishaik$ psql -p 5555 -U bajishaik -c "select pg_reload_conf()"
Timing is on.
Pager usage is off.
2018-03-21 16:08:19.271 IST [24344] LOG:  received SIGHUP, reloading configuration files
 pg_reload_conf
----------------
 t
(1 row)

Time: 16.103 ms
C1MQV0FZDTY3:bin bajishaik$ psql -p 5556 -U bajishaik -c "select pg_reload_conf()"
Timing is on.
Pager usage is off.
2018-03-21 16:08:29.929 IST [24387] LOG:  received SIGHUP, reloading configuration files
 pg_reload_conf
----------------
 t
(1 row)

Time: 53.542 ms
C1MQV0FZDTY3:bin bajishaik$

Create a couple of test tables to replicate and insert some data on Publication instance.

postgres=# create database source_rep;
CREATE DATABASE
Time: 662.342 ms
postgres=# \c source_rep
You are now connected to database "source_rep" as user "bajishaik".
source_rep=# create table test_rep(id int primary key, name varchar);
CREATE TABLE
Time: 63.706 ms
source_rep=# create table test_rep_other(id int primary key, name varchar);
CREATE TABLE
Time: 65.187 ms
source_rep=# insert into test_rep values(generate_series(1,100),'data'||generate_series(1,100));
INSERT 0 100
Time: 2.679 ms
source_rep=# insert into test_rep_other  values(generate_series(1,100),'data'||generate_series(1,100));
INSERT 0 100
Time: 1.848 ms
source_rep=# select count(1) from test_rep;
 count
-------
   100
(1 row)

Time: 0.513 ms
source_rep=# select count(1) from test_rep_other ;
 count
-------
   100
(1 row)

Time: 0.488 ms
source_rep=#

Create structure of the tables on Subscription instance as Logical Replication does not replicate the structure.

postgres=# create database target_rep;
CREATE DATABASE
Time: 514.308 ms
postgres=# \c target_rep
You are now connected to database "target_rep" as user "bajishaik".
target_rep=# create table test_rep_other(id int primary key, name varchar);
CREATE TABLE
Time: 9.684 ms
target_rep=# create table test_rep(id int primary key, name varchar);
CREATE TABLE
Time: 5.374 ms
target_rep=#

Create publication on Publication instance (port 5555).

source_rep=# CREATE PUBLICATION mypub FOR TABLE test_rep, test_rep_other;
CREATE PUBLICATION
Time: 3.840 ms
source_rep=#

Create subscription on Suscription instance (port 5556) to the publication created in step 6.

target_rep=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=source_rep host=localhost user=bajishaik port=5555' PUBLICATION mypub;
NOTICE:  created replication slot "mysub" on publisher
CREATE SUBSCRIPTION
Time: 81.729 ms

From log:

2018-03-21 16:16:42.200 IST [24617] LOG:  logical decoding found consistent point at 0/1616D80
2018-03-21 16:16:42.200 IST [24617] DETAIL:  There are no running transactions.
target_rep=# 2018-03-21 16:16:42.207 IST [24618] LOG:  logical replication apply worker for subscription "mysub" has started
2018-03-21 16:16:42.217 IST [24619] LOG:  starting logical decoding for slot "mysub"
2018-03-21 16:16:42.217 IST [24619] DETAIL:  streaming transactions committing after 0/1616DB8, reading WAL from 0/1616D80
2018-03-21 16:16:42.217 IST [24619] LOG:  logical decoding found consistent point at 0/1616D80
2018-03-21 16:16:42.217 IST [24619] DETAIL:  There are no running transactions.
2018-03-21 16:16:42.219 IST [24620] LOG:  logical replication table synchronization worker for subscription "mysub", table "test_rep" has started
2018-03-21 16:16:42.231 IST [24622] LOG:  logical replication table synchronization worker for subscription "mysub", table "test_rep_other" has started
2018-03-21 16:16:42.260 IST [24621] LOG:  logical decoding found consistent point at 0/1616DB8
2018-03-21 16:16:42.260 IST [24621] DETAIL:  There are no running transactions.
2018-03-21 16:16:42.267 IST [24623] LOG:  logical decoding found consistent point at 0/1616DF0
2018-03-21 16:16:42.267 IST [24623] DETAIL:  There are no running transactions.
2018-03-21 16:16:42.304 IST [24621] LOG:  starting logical decoding for slot "mysub_16403_sync_16393"
2018-03-21 16:16:42.304 IST [24621] DETAIL:  streaming transactions committing after 0/1616DF0, reading WAL from 0/1616DB8
2018-03-21 16:16:42.304 IST [24621] LOG:  logical decoding found consistent point at 0/1616DB8
2018-03-21 16:16:42.304 IST [24621] DETAIL:  There are no running transactions.
2018-03-21 16:16:42.306 IST [24620] LOG:  logical replication table synchronization worker for subscription "mysub", table "test_rep" has finished
2018-03-21 16:16:42.308 IST [24622] LOG:  logical replication table synchronization worker for subscription "mysub", table "test_rep_other" has finished

As you can see in the NOTICE message, it created a replication slot which ensures the WAL cleanup should not be done until initial snapshot or delta changes are transferred to the target database. Then the WAL sender started decoding the changes, and logical replication apply worked as both pub and sub are started. Then it starts the table sync.

Verify data on Subscription instance.

target_rep=# select count(1) from test_rep;
 count
-------
   100
(1 row)

Time: 0.927 ms
target_rep=# select count(1) from test_rep_other ;
 count
-------
   100
(1 row)

Time: 0.767 ms
target_rep=#

As you see, data has been replicated through initial snapshot.

Verify delta changes.

C1MQV0FZDTY3:bin bajishaik$ psql -d postgres -p 5555 -d source_rep -c "insert into test_rep values(generate_series(101,200), 'data'||generate_series(101,200))"
INSERT 0 100
Time: 3.869 ms
C1MQV0FZDTY3:bin bajishaik$ psql -d postgres -p 5555 -d source_rep -c "insert into test_rep_other values(generate_series(101,200), 'data'||generate_series(101,200))"
INSERT 0 100
Time: 3.211 ms
C1MQV0FZDTY3:bin bajishaik$ psql -d postgres -p 5556 -d target_rep -c "select count(1) from test_rep"
 count
-------
   200
(1 row)

Time: 1.742 ms
C1MQV0FZDTY3:bin bajishaik$ psql -d postgres -p 5556 -d target_rep -c "select count(1) from test_rep_other"
 count
-------
   200
(1 row)

Time: 1.480 ms
C1MQV0FZDTY3:bin bajishaik$

These are the steps for a basic setup of Logical Replication.

Tags:

↧

A Guide to Using pgBouncer for PostgreSQL

March 26, 2018, 2:59 am

≫ Next: Comparing Database Proxy Failover Times - ProxySQL, MaxScale and HAProxy

≪ Previous: An Overview of Logical Replication in PostgreSQL

When reading PostgreSQL getting started, you see the line: “The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this, it starts (“forks”) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go.”

Brilliant idea. And yet it means that every new connection spins a new process, reserving RAM and possibly getting too heavy with multiple sessions. To avoid problems, postgres has max_connectionssetting with default 100 connections. Of course you can increase it, but such action would require restart (pg_settings.context is ‘postmaster’):

t=# select name,setting,short_desc,context from pg_settings where name = 'max_connections';
-[ RECORD 1 ]--------------------------------------------------
name       | max_connections
setting    | 100
short_desc | Sets the maximum number of concurrent connections.
context    | postmaster

And even after increasing - at some point you might need more connections (of course urgently as always on running prod). Why increasing it is so uncomfortable? Because if it was comfy, you would probably end up with uncontrolled spontaneous increasing of the number until the cluster starts lagging. Meaning old connections are slower - so they take more time, so you need even more and more new. To avoid such possible avalanche and add some flexibility, we have superuser_reserved_connections - to be able to connect and fix problems with SU when max_connections is exhausted. And we obviously see the need of some connection pooler. As we want new connection candidates to wait in a queue instead of failing with exception FATAL: sorry, too many clients already and not risking the postmaster.

Connection pooling is offered at some level by many popular “clients”. You could use it with jdbc for quite a while. Recently node-postgres offered it’s own node-pg-pool. More or less the implementation is simple (as the idea is): pooler starts the connections towards the database and keeps them. The client connecting to db only gets a “shared” existing connection and after closing it, the connection goes back to the pool. We also have much more sophisticated software, like pgPool. And yet pgbouncer is an extremely popular choice for the task. Why? Because it does only the pooling part, but does it right. It’s free. It’s fairly simple to set up. And you meet it at most biggest service providers as recommended or used, eg citusdata, aws, heroku and other highly respected resources.

So let us look closer at what it can and how you use it. In my setup I use default pool_mode = transaction ([pgbouncer] section) which is a very popular choice. This way we not just queue the connections exceeding max_connections, but rather reuse sessions without waiting for the previous connection to close:

[databases]
mon = host=1.1.1.1 port=5432 dbname=mon
mons = host=1.1.1.1 port=5432 dbname=mon pool_mode = session pool_size=2 max_db_connections=2
monst = host=1.1.1.1 port=5432 dbname=mon pool_mode = statement
[pgbouncer]
listen_addr = 1.1.1.1
listen_port = 6432
unix_socket_dir = /tmp
auth_file = /pg/pgbouncer/bnc_users.txt
auth_type = hba
auth_hba_file = /pg/pgbouncer/bnc_hba.conf
admin_users = root vao
pool_mode = transaction
server_reset_query = RESET ALL; --DEALLOCATE ALL; /* custom */
ignore_startup_parameters = extra_float_digits
application_name_add_host = 1
max_client_conn = 10000
autodb_idle_timeout = 3600
default_pool_size = 100
max_db_connections = 100
max_user_connections = 100
#server_reset_query_always = 1 #uncomment if you want older global behaviour

Short overview of the most popular settings and tips and tricks:

server_reset_query is very handy and important. In session pooling mode, it “wipes” previous session “artifacts”. Otherwise you would have problems with same names for prepared statements, session settings affecting next sessions and so on. The default is DISCARD ALL, that “resets” all session states. Yet you can choose more sophisticated values, e.g., RESET ALL; DEALLOCATE ALL; to forget only SET SESSION and prepared statements, keeping TEMP tables and plans “shared”. Or the opposite - you might want to make prepared statements “global” from any session. Such configuration is doable, though risky. You have to make pgbouncer reuse the session for all (thus making either very small pool size or avalanching the sessions), which is not completely reliable. Anyway - it is a useful ability. Especially in setups where you want client sessions to eventually (not immediately) change to configured pooled session settings. Very important point here is session pool mode. Before 1.6 this setting affected other pool modes as well, so if you relied on it, you need to use the new settingserver_reset_query_always = 1. Probably at some point people will want server_reset_query to be even more flexible and configurable per db/user pair (and client_reset_query instead). But as of current writing, March 2018, it’s not an option. The idea behind making this setting valid by default for session mode only was - if you share connection on transaction or statement level - you cannot rely on the session setting at all.
Auth_type = hba. Before 1.7, the big problem with pgbouncer was the absence of host based authentication - “postgres firewall”. Of course you still had it for postgres cluster connection, but pgbouncer was “open” for any source. Now we can use the same hba.conf to limit connections for host/db/user based on connection network.
connect_query is not performed on every client “connection” to pgbouncer, but rather when pgbouncer connects to a Postgres instance. Thus you can’t use it for setting or overriding “default” settings. In session mode, other sessions do not affect each other and on disconnect, reset query discards all - so you don’t need to mess with it. In transaction pooling mode, you would hope to use it for settings overriding erroneously set by other sessions, but it won’t work, alas. Eg. you want to share prepared statement between “sessions” in transaction mode, so you set something like
```
trns = dbname=mon pool_mode = transaction connect_query = 'do $$ begin raise warning $w$%$w$, $b$new connection$b$; end; $$; prepare s(int) as select $1;'
```
and indeed - every new client sees the prepared statements (unless you left server_reset_query_always to on, so pgbouncer discards it on commit). But if some client runs DISCARD s; in its session, it affects all clients on this connection and new clients connecting to it won’t see prepared statements anymore. But if you want to have some initial setting for postgres connections coming from pgbouncer, then this is the place.
application_name_add_host was added in 1.6, it has similar limitation. It “puts” the client IP to application_name, so you can easily get your bad query source, but is easily overridden by simple set application_name TO ‘wasn’’t me’; Still you can “heal” this using views - follow this post to get the idea or even use these short instructions. Basically idea is that show clients; will show the clients IP, so you can query it directly from pgbouncer database on each select from pg_stat_activity to check if it’s reset. But of course using a simple setting is much simpler and cosier. Though it does not guarantee the result...
pool_mode can be specified both as default, per database and per user - making it very flexible. Mixing modes makes pgbouncer extremely effective for pooling. This is a powerful feature, but one has to be careful when using it. Often users use it without understanding the results to absolutely atomic mixes of per transaction/per session/per user/per database/global settings working differently for the same user or database, due to the different pooling modes with pgbouncer. This is the box of matches you don’t give to children without supervision. Also many other options are configurable for default and per db and per user.
Please don’t take it literally, but you can “compare” different sections of ini with SET and ALTER: SET LOCAL affects transactions and is good to use when poll_mode=transaction , SET SESSION affects sessions and is safe for use when poll_mode=session , ALTER USER SET affects roles and will interfere with pgbouncer.ini part of section [users], ALTER DATABASE SET affects databases and will interfere with pgbouncer.ini part of section [databases], ALTER SYSTEM SET or editing postgres.conf globally affects defaults and is comparable by effect to the default section of pgbouncer.ini.
Once again - use pool mode responsibly. Prepared statements or session wide settings will be a mess in transaction pooling mode. Same as SQL transaction makes no sense in statement pooling mode. Choose a suitable pooling mode for suitable connections. A good practice is creating roles with the idea that:
- some will run only fast selects, thus can share one session without transactions for a hundred of concurrent tiny not important selects.
- Some role members are safe for session level concurrency, and ALWAYS use transactions. Thus they can safely share several sessions for hundreds of concurrent transactions.
- Some roles are just too messy of complicated to share their session with others. So you use session pooling mode for them to avoid errors on connection when all “slots” are already taken.
Don’t use it instead of HAProxy or some other load balancer. Despite the fact that pgbouncer has several configurable features addressing what a load balancer addresses, like dns_max_ttl and you can set up a DNS configuration for it, most prod environments use HAProxy or some other load balancer for HA. This is because HAProxy is really good at load balancing across live servers in round robin fashion, better than pgbouncer. Although pgbouncer is better for postgres connection pooling, it might be better to use one small daemon that perfectly performs one task, instead of a bigger one that does two tasks, but worse.
Configuration changes can be tricky. Some changes to pgbouncer.ini require restart (listen_port and such), while others such as admin_users require reload or SIGHUP. Changes inside auth_hba_file require reload, while changes to auth_file do not.

The extremely short overview of settings above is limited by the format. I invite you to take a look at the complete list. Pgbouncer is the kind of software with very small amount of “boring settings” - they all have huge potential and are of amazing interest.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Install ClusterControl for FREE

And lastly, moving from a short enthusiastic review to something where you might be less happy - the installation. The process is clearly described in this section of documentation. The only option described is building from git sources. But everybody knows there are packages! Trying both most popular:

sudo yum install pgbouncer
sudo apt-get install pgbouncer

can work. But sometimes you have to do an extra step. E.g., when no pgbouncer package is available, try this.

Or even:

sudo yum install pgbouncer
Loaded plugins: priorities, update-motd, upgrade-helper
amzn-main                                                                                                                    | 2.1 kB  00:00:00
amzn-updates                                                                                                                 | 2.5 kB  00:00:00
docker-ce-edge                                                                                                               | 2.9 kB  00:00:00
docker-ce-stable                                                                                                             | 2.9 kB  00:00:00
docker-ce-test                                                                                                               | 2.9 kB  00:00:00
pgdg10                                                                                                                       | 4.1 kB  00:00:00
pgdg95                                                                                                                       | 4.1 kB  00:00:00
pgdg96                                                                                                                       | 4.1 kB  00:00:00
pglogical                                                                                                                    | 3.0 kB  00:00:00
sensu                                                                                                                        | 2.5 kB  00:00:00
(1/3): pgdg96/x86_64/primary_db                                                                                              | 183 kB  00:00:00
(2/3): pgdg10/primary_db                                                                                                     | 151 kB  00:00:00
(3/3): pgdg95/x86_64/primary_db                                                                                              | 204 kB  00:00:00
50 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package pgbouncer.x86_64 0:1.8.1-1.rhel6 will be installed
--> Processing Dependency: libevent2 >= 2.0 for package: pgbouncer-1.8.1-1.rhel6.x86_64
--> Processing Dependency: c-ares for package: pgbouncer-1.8.1-1.rhel6.x86_64
--> Processing Dependency: libcares.so.2()(64bit) for package: pgbouncer-1.8.1-1.rhel6.x86_64
--> Running transaction check
---> Package c-ares.x86_64 0:1.13.0-1.5.amzn1 will be installed
---> Package pgbouncer.x86_64 0:1.8.1-1.rhel6 will be installed
--> Processing Dependency: libevent2 >= 2.0 for package: pgbouncer-1.8.1-1.rhel6.x86_64
--> Finished Dependency Resolution
Error: Package: pgbouncer-1.8.1-1.rhel6.x86_64 (pgdg10)
           Requires: libevent2 >= 2.0
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

Of course adding pgdg to /etc/yum.repos.d/ won’t help anymore. Neither the --skip-broken or rpm -Va --nofiles --nodigest. A simple

sudo yum install libevent2
Loaded plugins: priorities, update-motd, upgrade-helper
50 packages excluded due to repository priority protections
No package libevent2 available.
Error: Nothing to do

would be too easy. So you have to build libevent2 yourself, bringing you back to the position when you have to compile things yourself. Either it is pgbouncer or one of its dependencies.

Again - digging too deep with the particularities of installation is out of scope. You should know you have a big chance to install it as package.

Lastly - questions like“why postgres does not offer a native session pooler” comes over and over. There are even very fresh suggestions and thoughts on it. But so far the most popular approach here is using pgbouncer.

Tags:

↧

Agenda

Speaker

Hosting

On-Premise Hosting

Cloud

General Host Setup

Choosing an Operating System

Installation

CPU

Memory

Disk

Initial Configuration Settings

Memory Settings

Checkpoint Settings

Security

Logging

On PostgreSQL

On MongoDB

What is High Availability?

Continuous Recovery

Standby databases

Clusters

Conclusion

Date, Time & Registration

Europe/MEA/APAC

North America/LatAm

Agenda

Speaker

Requirements

Deploying Keepalived

VRRP Instance

Health Checks

Simulating Failover

Deploying HAProxy using the ClusterControl

Deploying ProxySQL using ClusterControl

AGENDA

Program

Location

Learn More

About Severalnines

About OptimaData BV

About VidaXL

Overview of the Community, How Development Works

Top Websites to Get Information or Learn PostgreSQL

The Few Tools You Can’t Live Without

Tips and Tricks

Key Things to Monitor in PostgreSQL - Analyzing Your Workload

System Monitoring Basics

PostgreSQL Monitoring Basics

PostgreSQL Statistics Collector

Dynamic Statistics Views

Collected Statistics Views

Pg_stat_[user|sys|all]_tables

Pg_stat_[user|sys|all]_indexes

Pg_statio_[user|sys|all]_tables

Pg_statio_[user|sys|all]_indexes

Pg_stat_database

Locks

Architecture overview

Data storage key concepts

Cluster nodes communication and cache

Failover and data recovery

Locking Schemes

Hardware & Software requirements

Monitoring and management

Restrictions on workload

Multi data-center redundancy

Using Galera and RAC in Cloud

Licenses and hidden costs

Conclusion

What PostgreSQL Needs

On Premise

In The Cloud

Cloud Services

Concepts / term comparisons

Database Terms

“Cluster”

“Database”

Interfacing with the PostgreSQL

Language Support