How to Set up PostgreSQL Database Replication

October 10th, 2022
How to Set up PostgreSQL Database Replication

Database replication is the process of copying data from a central or primary server to multiple servers known as replicas. The primary server accepts read and write operations whereas the replicas serve read-only transactions. The primary server and replicas collectively form a database cluster.

The goal of database replication is to ensure redundancy, consistency, high availability, and accessibility of data, especially in high-traffic, mission-critical applications.

PostgreSQL offers two methods for replications: physical (i.e. streaming) and logical replication. Both are ideal for different use cases, and a user may choose one over the other depending on the end goal.

Let’s have a look at each of these replication methods.

Physical PostgreSQL Replication

This is the most common type of replication in PostgreSQL. Physical replication maintains a full copy of the entire data of a cluster. It uses exact block addresses and employs byte-by-byte replication. In simpler terms, the entire set of data on the primary server is copied to the replica which acts as a standby node.

Physical replication does not replicate a specific object of the primary database cluster such as a single row of data in a table. Instead, it works on a disk block level and mirrors all data to replica nodes; including all the tables in each database. This replication requires all replicas to be identical.

Use cases:

  • It is mostly used for disaster recovery setups and backups since all replicas are identical.
  • Recommended when dealing with large volumes of data

Pros

  • It is easy to implement since all the database clusters are identical.
  • It ensures data consistency and high availability at any point since all the replicas hold identical copies of data.
  • It is Ideal for read-only operations on the replicas.
  • It’s very efficient since it does not require any special handling.

Cons

  • It is bandwidth-intensive since the entire data is copied and not just small sections of the primary cluster.
  • It does not offer multi-master database replication.

Logical PostgreSQL Replication

Logical replication was first introduced in PostgreSQL 9.0. It works by replicating data objects and their changes based on a unique identifier such as a primary key. In simpler terms, logical replication copies database objects in a row-based model as opposed to physical replication which sends everything to the replica nodes.

Thus, logical replication offers fine-grained control over data replication as opposed to physical replication.

Use cases:

Typical use cases for logical replication include:

  • Replicating between different major versions of PostgreSQL.
  • Replicating between PostgreSQL instances hosted on different platforms e.g from Linux to Windows.
  • Sending incremental changes in a database to replicas as they happen in real-time.
  • Granting access to replicated data to various groups of users.

Pros

  • It is Ideal for backing up incremental data.
  • Most recommended for high availability clusters thanks to better performance and low data loss.
  • Bandwidth optimization since only row changes for committed transactions of data is sent to the replicas instead of the entire block of data.
  • Used in multi-master replication which is not possible using physical replication.
  • Supports replication across various OS platforms e.g Linux to windows and vice versa.

Cons

  • It cannot stream large volumes of transactions as they happen in real-time.
  • The replication process is more complex than physical replication.
  • High resource utilization on the replica nodes.

Setup Physical PostgreSQL Replication on Ubuntu 22.04

To demonstrate the replication process, you need a primary node and a replica node on which the replication will occur. Below is our lab setup and we are going to use Ubuntu 22.04 as the preferred operating system.

Primary node: IP: 194.195.208.82 OS: Ubuntu 22.04 server

Replica node: IP: 139.144.169.79 OS: Ubuntu 22.04 server

Step 1: Install PostgreSQL Server

The first step is to install PostgreSQL on both Primary and Replica nodes. Take note that you need to install the same version of PostgreSQL on both nodes for logical replication to take place.

At the time of writing this guide, the latest version of PostgreSQL is PostgreSQL 14.

First, log in to your server via SSH and refresh the repositories.

sudo apt update

Next, install the PostgreSQL database server as follows.

sudo apt install postgresql postgresql-contrib -y

By default, the PostgreSQL daemon automatically runs once installed. You can verify this by using the command:

sudo systemctl status postgresql

systemctl status postgresql

The output below confirms that the PostgreSQL service is up and running.

In addition, consider enabling the service to start on system startup as follows.

sudo systemctl enable postgresql

Step 2: Configure Primary Node

Next, log in to the primary node (194.195.208.82) as postgres user which is the default user that is created with every new PostgreSQL installation.

sudo -u postgres psql

You need to create a replication user that will be used to initiate the replication process from the primary node.

Therefore, run the following command to create the replication user and assign replication privileges. In this command, replica_user is the replication user while P@ssword321 is the user’s password. Be sure to provide a strong password unlike the one we have used which is purely for demo purposes.

CREATE ROLE replica_user WITH REPLICATION LOGIN PASSWORD 'P@ssword321';

Create new PostgreSQL role

Then log out from the PostgreSQL prompt

postgres-# \q

Next, you need to make a few tweaks to the main configuration file. Use your preferred text editor to access the following configuration file:

sudo vim /etc/postgresql/14/main/postgresql.conf

With the file open, scroll down and locate the listen_addresses directive. The directive specifies the host under which the PostgreSQL database server listens for connections.

Uncomment the directive by removing the # symbol and replace localhost with your server’s IP address enclosed in single quotation marks as shown:

postgresql.conf listen_addresses

Next, locate the wal_level directive. The setting specifies the amount of information to be written to the Write Ahead Log (WAL) file.

Uncomment the line and set it to logical as shown.

postgresql.conf wal_level

Next, locate the wal_log_hints directive. By default, it is set to off.

When set to on the value allows the PostgreSQL server to write the entire content of each disk page to the WAL file during the first modification of the page.

Uncomment it and set it to on.

postgresql.conf wal_log_hints

That’s all for the changes needed in this configuration file. Save the changes and exit.

Next, access the /etc/postgresql/14/main/pg_hba.conf configuration file.

sudo vim /etc/postgresql/14/main/pg_hba.conf

Append this line at the end of the configuration file. This allows the replica ( 139.144.169.79 ) to connect with the master node using the replica_user.

host  replication   replica_user  139.144.169.79/24   md5

Save the changes and close the file. The restart PostgreSQL service.

sudo systemctl restart postgresql

Step 3: Configure Replica Node

Before the replica node can start replicating data from the master node, you need to create a copy of the primary node’s data directory to the replica’s data directory. To achieve this, first, stop the PostgreSQL service on the replica node.

sudo systemctl stop postgresql

Next, remove all files in the replica’s data directory in order to start on a clean slate and make room for the primary node data directory.

sudo rm -rv /var/lib/postgresql/14/main/

Now run the pg_basebackup utility as shown to copy data from the primary node to the replica node.

sudo pg_basebackup -h 194.195.208.82 -U replica_user -X stream -C -S replica_1 -v -R -W -D /var/lib/postgresql/14/main/

Let's take a look at the options used in the command:

  • -h: This option specifies the host, in this case, the IP address of the master node.
  • -U: The option specifies the replication user. This is the user that was configured on the primary node and which will be used by the replica node to connect to it. In our case, the replication user is called replica_user.
  • -X: The option along with the stream value instructs the pg_basebackup utility to stream and include the WAL files in the backup.
  • -C: The option allows you to create a replication slot before the backup gets underway. The option is used along with the -S option to specify the slot name. In this case, our replication slot is called replica_1.
  • -v: This prints out verbose output that indicates the progress of the backup process from the primary node to the replica.
  • -R: The option creates two files; an empty recovery configuration file called standby.signal and a primary node connection settings file called postgresql.auto.conf. The standby.signal file contains connection information about the primary node and the postgresql.auto.conf file informs your replica cluster that it should operate as a standby server.
  • -W: This prompts you to provide a password for the replica_user replication user.
  • -D: Lastly, the -D option allows you to include the directory that you want to export the backup files. In this example, we are placing the data under the/var/lib/postgresql/14/main/ directory in the replica node.

Here is the output from running the replication command.

pg_basebackup

Afterward, execute the following command on the replica to grant ownership of the data directory to the postgres user.

sudo chown postgres -R /var/lib/postgresql/14/main/

Now, start the PostgreSQL server. The replica will now be running in hot standby mode.

sudo systemctl start postgresql

Step 4: Test The Replication Setup

At this juncture, the primary node’s data directory has successfully been backed up on the replica and the replica started in a hot standby mode.

The remaining part is to test out if the replication is working as intended. If the replica is successfully running in hot standby mode, then it should be connected to the primary database cluster on the primary server.

To verify that the replica is connected to the primary node and that the primary is streaming, log into the primary server and switch to the postgres user.

sudo -u postgres psql

Next, query the pg_stat_replication table which contains vital information about the replication. In this command, we are retrieving information about the replica’s IP address and the state of the primary server.

SELECT client_addr, state
FROM pg_stat_replication;

SELECT * FROM pg_stat_replication;

You should get the following output confirmation that your setup is working.

To corroborate this, we are going to create a test database and add a table with a few records and then verify if the database has been replicated on the replica.

we will create a sample database called students.

CREATE DATABASE students_db;

Next, switch to the database.

\c students_db;

Output

You are now connected to database "students_db" as user "postgres".

Once in the database, create a table called student_details with the following schema.

CREATE TABLE student_details (first_name VARCHAR(15), last_name VARCHAR(15) , email VARCHAR(40) );

Then insert the following record into the table.

INSERT INTO  student_details (first_name, last_name, email)
VALUES  ('Arthur', 'Spencer', 'arthurspencer@gmail.com');

SQL INSERT INTO table

You can query the table to confirm the inserted records as shown.

SELECT *
FROM student_details;

SELECT * FROM table

Now head over to the replica node and switch to the postgres user.

sudo -u postgres psql

On the shell, verify the existence of the database that was created in the primary node moments ago.

postgres=# \l

PostgreSQL list databases

Now, switch to the database.

\c students_db;

Query the table for all the records stored.

SELECT *
FROM student_details;

You should get exactly the same record as you created in the master node as shown below.

SELECT * FROM table

This confirms that replication was successful! You can continue adding more records on the primary server which will then be copied or replicated to the replica node.

Conclusion

In this tutorial, we have looked at the two replication methods provided by PostgreSQL alongside their pros and cons. We went a step further and created a replication cluster using the PostgreSQL logical model and demonstrated the replication of data from the master to the replica node. Head over to the official documentation for a more comprehensive look at replication in PostgreSQL.

Helping engineers learn 💡 about new technologies and ingenious IT automation use cases to build better systems 💻

Join Cherry Servers Community

Get monthly practical guides about building more secure, efficient and easier to scale systems on an open cloud ecosystem.

We use cookies to ensure seamless user experience for our website. Required cookies - technical, functional and analytical - are set automatically. Please accept the use of targeted cookies to ensure the best marketing experience for your user journey. You may revoke your consent at any time through our Cookie Policy.