Fork me on GitHub

Postgres-XL (or just XL, for short) is an open source project from 2ndQuadrant. It is a massively parallel database built on top of PostgreSQL, and it is designed to be horizontally scalable and flexible enough to handle various workloads.

In this chapter, we will use a cluster with 4 virtual machines: 1 GTM, 1 coordinator and 2 data nodes.

Machine IP Role
xlgtm 10.33.1.114 GTM
xlcoord 10.33.1.115 coordinator
xldata1 10.33.1.116 data node
xldata2 10.33.1.117 data node

On each machine, you need to clone Postgres-XL repository and compile it. You also need to set specific XL parameters on file postgresql.conf and make sure all machines are communicating to each other by adjusting file pg_hba.conf. More information on how Postgres-XL works and how to install it on Postgres-XL documentation. You can also refer to this blog post.

Creating a test environment

OmniDB repository provides a 4-node Vagrant test environment. If you want to use it, please do the following:

git clone --depth 1  https://github.com/OmniDB/OmniDB 
cd OmniDB/OmniDB_app/tests/vagrant/xl-9.5/
vagrant up

It will take a while, but once finished, 4 virtual machines with IP addresses 10.33.1.114, 10.33.1.115, 10.33.1.116 and 10.33.1.117 will be up and each of them will have Postgres-XL 9.5 up and listening to port 5432, with all settings needed. To create all nodes, please do:

vagrant ssh xlcoord -c '/vagrant/setup.sh 10.33.1.115 10.33.1.116 10.33.1.117'
vagrant ssh xldata1 -c '/vagrant/setup.sh 10.33.1.115 10.33.1.116 10.33.1.117'
vagrant ssh xldata2 -c '/vagrant/setup.sh 10.33.1.115 10.33.1.116 10.33.1.117'

Then connect to the coordinator and define a password for the postgres user:

$ vagrant ssh xlcoord -c 'sudo su - postgres -c /usr/local/pgsql/bin/psql'
psql (PGXL 9.5r1.6, based on PG 9.5.12 (Postgres-XL 9.5r1.6))
Type "help" for help.

postgres=# ALTER USER postgres PASSWORD 'omnidb';
ALTER ROLE
postgres=#

Now the XL cluster will be ready for you to use.

Install OmniDB XL plugin

OmniDB core does not support XL by default. You will need to download and install XL plugin. If you are using OmniDB server, these are the steps:

wget  https://omnidb.org/dist/plugins/omnidb-xl_1.0.0.zip 
unzip omnidb-xl_1.0.0.zip
sudo cp -r plugins/ static/ /opt/omnidb-server/OmniDB_app/
sudo systemctl restart omnidb

And then refresh the OmniDB web page in the browser.

For OmniDB app, these are the steps:

wget  https://omnidb.org/dist/plugins/omnidb-xl_1.0.0.zip 
unzip omnidb-xl_1.0.0.zip
sudo cp -r plugins/ static/ /opt/omnidb-app/resources/app/omnidb-server/OmniDB_app/

And then restart OmniDB app.

If everything worked correctly, by clicking on the "plugins" icon in the top right corner, you will see the plugin installed and enabled:

Connecting to the cluster

Let's use OmniDB to connect to the coordinator node. First of all, fill out connection info in the connection grid:

Then select the connection. You will see OmniDB workspace window. Expand the tree root node. Note that OmniDB identifies it is connected to a Postgres-XL cluster and shows a specific node called Postgres-XL just inside the tree root node. Expand this node to see all the nodes we have in our cluster:

Creating a HASH table

From the root node, expand Schemas, then public, then right click on the Tables node. Click on Create Table. Name your new table, add some columns to it and do not forget to add a primary key too:

When done, click on the Save Changes button. Now right click on the Tables node and click on Refresh. You will see the new table created. Expand it to see that there is also a Postgres-XL node inside of it. Check its properties.

By default, Postgres-XL always try to create a table distributed by HASH. It means that the data will be split into the nodes regularly, through a hash function applied on the specified column. If present, it will use the primary key, or a unique constraint otherwise. If there is no primary key nor unique constraint, Postgres-XL uses the first eligible column. If not possible to distribute by HASH, then Postgres-XL will create the table distributed by ROUNDROBIN, which means that the data will be split in a way that every new row will be added to a different data node.

Now let's add some rows in our new table. Right click on the table, then go to Data Actions and then click on Edit Data. Add some rows and then click on the Save Changes button:

Right click on the table again, Data Actions, Query Data. You will see that cluster-wide the table has all data inside.

But how the data was distributed in the data nodes? In the Postgres-XL main node, right click on each node and click on Execute Direct. Adjust the query that will be executed directly into the data node, as you can see below.

Creating a REPLICATION table

While HASH distribution is great for write-only and write-mainly tables, REPLICATION distribution is great for read-only and read-mainly tables. However, a table distributed by REPLICATION will store all data in all nodes it is located.

In order to create a REPLICATION table, let us create a new table like we did before:

Note how by default it was created as a HASH table:

Let us change the distribution type of the table by right-clicking on the Postgres-XL node inside the table, and then clicking on Alter Distribution. Uncomment the "REPLICATION" line and execute the command:

You can check the distribution was successfully changed by right-clicking on the Postgres-XL node and clicking on Refresh. The properties will now show Distributed by: replication.

Now add some data to the table:

And then check that all data exist on all data nodes: