Use DRBD in a cluster with Corosync and Pacemaker on CentOS 7

When configuring a cluster, you want to keep managing the servers as simple as possible. In theory, every node in the cluster should return the same results, since you want the cluster to be transparent to the end user. Part of achieving this is having the same data available on every active node of the cluster. One way to do this is a central file share, for example over NFS, but this also has disadvantages. Another way is to keep the data on the nodes themselves and replicate it between them. DRBD is one way to do that. This post explains how to integrate DRBD in a cluster with Corosync and Pacemaker.

DRBD stands for Distributed Replicated Block Device, and the name already explains what it is. DRBD presents a layer on top of a normal block device and is responsible for keeping that device synchronized over multiple nodes. Simplified, you can compare DRBD with a RAID 1 array whose member devices sit in different nodes instead of in the same node.

In this post, I will continue with the setup that was created earlier in Building a high-available failover cluster with Pacemaker, Corosync & PCS. So if you're looking for the basic configuration of a cluster, have a look at that post. For this post, I assume that you have a working cluster with Corosync and Pacemaker.

The goal of these actions is to have the data for the Apache webserver synchronized over both nodes. In that earlier example, each webserver served its own local data, which we even used to tell the nodes apart.

Since RHEL 7, Red Hat doesn't officially support DRBD anymore; support for DRBD is still available via an external partner. This also means that CentOS, one of the RHEL derivatives, doesn't ship the DRBD packages. Fortunately, ELRepo still provides what we need to get going with DRBD.

Installing DRBD

More info about ELRepo can be found here: http://elrepo.org/tiki/drbd83-utils.

The first step in adding DRBD to the existing cluster is to configure the nodes to use the ELRepo-repository:
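A minimal sketch of this step, to run on both nodes. The key URL and the name of the release package follow ELRepo's naming for EL7, but treat them as assumptions and check the ELRepo site for the current file names.

```shell
# Run on node01 and node02.
# Import the ELRepo package signing key.
sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# Install the ELRepo release package for EL7, which adds the repository.
# The exact file name is an assumption; it changes between releases.
sudo yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
```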

After adding the repository, we can install DRBD like any other package:
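For example, assuming the DRBD 8.4 packages from ELRepo (an assumption on my part: they match the /proc/drbd and drbd-overview output used in this post, and ELRepo also ships other DRBD versions), on both nodes:

```shell
# Run on node01 and node02.
# drbd84-utils: user-space tools (drbdadm, drbd-overview, ...)
# kmod-drbd84:  the matching kernel module
sudo yum install -y drbd84-utils kmod-drbd84
# Confirm that both packages are installed.
rpm -q drbd84-utils kmod-drbd84
```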

Create a (logical) volume for DRBD

DRBD provides a way to distribute a block device over multiple nodes. In order to do so, we need a block device to distribute (sounds logical, doesn't it?). The block device can be any available device: it doesn't need to be a logical volume, a plain physical partition works too. For this post, I'll add a new logical volume on both nodes to use as my distributed block device. The devices should be the same size on all nodes.

First I’ll check if I still have some free space in my volume group to create the logical volume:

After checking what is available, I'll create a new logical volume, named lv_drbd0, in that volume group:
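Both steps could look like this on each node. The volume group name vg_drbd is a placeholder, not necessarily the one used in the original setup; replace it with the volume group that has enough free space.

```shell
# Run on node01 and node02.
# List the volume groups; the VFree column shows the unallocated space.
sudo vgs
# Create a 1 GB logical volume named lv_drbd0.
# "vg_drbd" is a placeholder volume group name.
sudo lvcreate -n lv_drbd0 -L 1G vg_drbd
# Check that the new volume exists and has the expected size.
sudo lvs
```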

After these steps, both nodes contain a logical volume called lv_drbd0 of the same size (1 GB).

Configuring DRBD for Single-primary mode

Now that we have a block device to share, we'll configure DRBD to do so. Our DRBD setup will be in single-primary mode: only one of the nodes can be the primary at any given time, so the data is only modified from one place at a time. Other configurations, such as dual-primary mode, require a cluster file system like GFS2 or OCFS2. The goal of my setup is to use a standard file system like ext3, ext4 or XFS.

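Before we start configuring DRBD, we have to open the TCP port used for replication on the firewall of both nodes, so that the nodes can communicate with each other. The port is defined per resource in the resource file; the DRBD documentation uses port 7788 for the first resource and assumes that ports 7788 through 7799 are free for DRBD.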
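A sketch for a single resource listening on port 7788 (an assumption: use the port from your own resource file, or a range such as 7788:7799 for several resources). Pick the variant that matches the firewall you run: one of the comments below uses plain iptables, while firewalld is the CentOS 7 default.

```shell
# Run on node01 and node02. Use ONE of the two variants.

# Variant 1: iptables (needs the iptables-services package for "save")
sudo iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 7788 -j ACCEPT
sudo service iptables save

# Variant 2: firewalld
sudo firewall-cmd --permanent --add-port=7788/tcp
sudo firewall-cmd --reload
```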

The next step is to create the DRBD configuration files. These files need to be identical on all nodes in the cluster.

File /etc/drbd.d/global_common.conf (on node01 and node02):
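A minimal version of this file, written with a heredoc so the same command can be run on both nodes. Both settings are assumptions rather than the author's exact file: usage-count yes opts in to LINBIT's online usage survey (set it to no to opt out), and protocol C is synchronous replication, the usual choice for a failover setup.

```shell
# Run on node01 and node02: (over)write /etc/drbd.d/global_common.conf
sudo tee /etc/drbd.d/global_common.conf > /dev/null <<'EOF'
global {
    # Take part in LINBIT's usage survey; set to "no" to opt out.
    usage-count yes;
}
common {
    net {
        # Protocol C: a write completes only after both nodes have it on disk.
        protocol C;
    }
}
EOF
```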

File /etc/drbd.d/drbd0.res (on node01 and node02):
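A sketch of the resource file for drbd0. The volume group vg_drbd, the IP addresses and port 7788 are placeholders; the names after the on keyword must match what uname -n returns on each node, otherwise drbdadm reports that the resource is not defined for this host. The last command only parses the configuration, which is a quick way to catch syntax errors.

```shell
# Run on node01 and node02: write /etc/drbd.d/drbd0.res
# Placeholders: vg_drbd, 192.168.202.101/102, port 7788.
sudo tee /etc/drbd.d/drbd0.res > /dev/null <<'EOF'
resource drbd0 {
    on node01 {
        device    /dev/drbd0;
        disk      /dev/vg_drbd/lv_drbd0;
        address   192.168.202.101:7788;
        meta-disk internal;
    }
    on node02 {
        device    /dev/drbd0;
        disk      /dev/vg_drbd/lv_drbd0;
        address   192.168.202.102:7788;
        meta-disk internal;
    }
}
EOF
# Parse and print the resource; syntax errors show up here.
sudo drbdadm dump drbd0
```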

Now that we have the configuration in place, it's time to initialize the DRBD metadata on both nodes:
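The metadata is created with drbdadm; according to the DRBD documentation this has to be done on both nodes. With meta-disk internal (assumed in this sketch), DRBD stores the metadata at the end of the backing logical volume.

```shell
# Run on node01 and node02.
# Write DRBD's metadata for the resource drbd0 to its backing device.
sudo drbdadm create-md drbd0
```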

At this point, we’re ready to start DRBD on both nodes and bring the drbd0 resource up. After bringing the resource up, let’s check the status:
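One way to do this on both nodes. It brings up only the drbd0 resource instead of starting DRBD's service, so treat it as a sketch and not necessarily the exact commands from the original post. Right after this, with no primary chosen, /proc/drbd typically shows both nodes as Secondary and both disks as Inconsistent.

```shell
# Run on node01 and node02.
sudo modprobe drbd        # make sure the kernel module is loaded
sudo drbdadm up drbd0     # attach the backing disk and connect to the peer
cat /proc/drbd            # connection state (cs), roles (ro), disk states (ds)
```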

As you can see in the contents of /proc/drbd, DRBD marks this resource as Inconsistent. This is because we haven't told DRBD which node is the primary, so it doesn't know which side holds the valid data. In the output, you can also see that both nodes consider themselves the secondary node. The configuration files are identical on both nodes, so at this point both nodes are considered equal.

Let’s configure node01 as the primary and check the status again:
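On node01 only, using the DRBD 8.4 syntax (assuming the 8.4 packages are installed). The --force flag tells DRBD to treat node01's data as the valid copy, which starts the initial full synchronization to node02.

```shell
# Run on node01 only.
sudo drbdadm primary --force drbd0
# The connection state should now show the sync progress towards node02.
cat /proc/drbd
```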

As you can see in the output now, the data is synced from node01 (which we set as the primary node) to node02 (which is now the secondary node). The volume which we used (lv_drbd0) was empty, but since DRBD doesn't look at the contents, it still needs to synchronize every block on the volume.

After a while, the synchronization finishes and the resource is considered UpToDate:
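If you'd rather not keep re-reading /proc/drbd, a small polling helper can wait for the sync. This is a hypothetical sketch, not part of the original setup or of DRBD itself: it assumes the DRBD 8.4 /proc/drbd line format (cs:, ro: and ds: fields) and simply greps for a connected resource whose disks are both UpToDate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given DRBD minor is Connected and
# both disks are UpToDate in a DRBD 8.4-style status file, whose lines look like
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
drbd_synced() {
    local minor="${1:-0}"
    local status_file="${2:-/proc/drbd}"
    grep -Eq "^ *${minor}: cs:Connected .*ds:UpToDate/UpToDate" "$status_file"
}

# Poll every 10 seconds until drbd_synced succeeds. There is no timeout,
# so this waits forever if the synchronization never completes.
wait_for_drbd_sync() {
    local minor="${1:-0}"
    local status_file="${2:-/proc/drbd}"
    until drbd_synced "$minor" "$status_file"; do
        sleep 10
    done
}
```

Usage on the primary: `wait_for_drbd_sync 0 && echo "drbd0 is in sync"`.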

To be sure, we can check this in another way. Here you can also clearly see who’s the primary and who’s the secondary:
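Two ways to see the roles, both from the DRBD 8.4 user-space tools: drbd-overview summarizes every resource on one line, and drbdadm role prints the local and peer role of a single resource.

```shell
# Run on either node.
sudo drbd-overview       # one line per resource: state, roles, disk states
sudo drbdadm role drbd0  # local/peer role, e.g. on node01 Primary comes first
```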

Create a file system on the DRBD resource

Now that the resource is synced over both nodes, we can create the actual file system on the block device. For DRBD, the contents of the file system aren't important; DRBD only cares that the blocks are equal on both sides. So the file system we create on the primary is replicated block by block to the secondary as well:
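On the primary (node01) only; the secondary receives the new file system through replication. ext4 is an assumption here: ext3 or XFS work the same way.

```shell
# Run on node01 (primary) only. Always format the DRBD device,
# never the backing logical volume underneath it.
sudo mkfs.ext4 /dev/drbd0
```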

After creating a filesystem on the DRBD resource, we can start putting data on the FS:
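A quick test on node01. The mount point /mnt and the file name testfile are hypothetical, chosen just for this check.

```shell
# Run on node01 (primary).
sudo mount /dev/drbd0 /mnt
echo "written on node01" | sudo tee /mnt/testfile
ls -l /mnt
```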

Test the failover

To do a (clean) manual failover, we can simply switch the primary and secondary nodes and check if the data got replicated to the second node and back:
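A sketch of the manual switch, reusing the hypothetical /mnt mount point and testfile. The order matters: in single-primary mode, node01 has to be demoted before node02 can be promoted.

```shell
# On node01: release the resource.
sudo umount /mnt
sudo drbdadm secondary drbd0

# On node02: take over and check the replicated data.
sudo drbdadm primary drbd0
sudo mount /dev/drbd0 /mnt
cat /mnt/testfile

# To switch back, repeat the same steps with the roles of the nodes swapped.
```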

In case you try to mount the resource on a node which is considered secondary, you should see a message similar to this:

Add the DRBD resource to our previously configured Pacemaker/Corosync cluster

In my previous post, I created a cluster with Apache to serve webpages in a highly available setup. Both nodes had to have an identical set of webpages in order to serve the exact same content, regardless of which node was the active one. Now we'll add the DRBD resource to the cluster and move the data for the website to it.

To configure DRBD on our cluster, we'll first edit the configuration the way we want it and then push it to the actual, running configuration. To accomplish this, we can save a copy of the CIB (Cluster Information Base) to a file. Basically, it's a file that contains the complete cluster configuration.
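Saving the running CIB to a file could look like this. The file name drbd_cfg is arbitrary; pcs writes it to the current directory.

```shell
# Dump the current, running CIB into the file drbd_cfg (hypothetical name).
sudo pcs cluster cib drbd_cfg
```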

If you look at the contents of the file, you'll find the complete, currently active configuration in there.

Now, let’s add the changes to the cib-file to include our DRBD resource:
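A sketch with the pcs syntax that ships with CentOS 7 (pcs 0.9). The resource names webserver_data and webserver_data_sync are the ones that appear in a reader's pcs status output in the comments below, but treat them as assumptions, as is the CIB file name drbd_cfg. The first command creates the DRBD resource; the second turns it into a master/slave set with two instances and one master.

```shell
# DRBD resource managed by the ocf:linbit:drbd agent, for the resource drbd0.
sudo pcs -f drbd_cfg resource create webserver_data ocf:linbit:drbd \
    drbd_resource=drbd0 op monitor interval=60s
# Master/slave set: two instances (one per node), exactly one master.
sudo pcs -f drbd_cfg resource master webserver_data_sync webserver_data \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
```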

We can query the CIB as we would do with the normal active configuration:
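For example, to list the resources defined in the (hypothetical) drbd_cfg file rather than in the running cluster:

```shell
# -f makes pcs read the file instead of the live cluster.
sudo pcs -f drbd_cfg resource show
```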

When the configuration is as we want it to be, we can activate it by pushing it to the cluster:
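Assuming the CIB copy was saved as drbd_cfg (a hypothetical name), the push is a single command:

```shell
# Replace the running cluster configuration with the edited copy.
sudo pcs cluster cib-push drbd_cfg
```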

When looking at the status of our cluster now, we see that something clearly went wrong:

After a little investigation, it seems that the permissions on the directory /var/lib/pacemaker/cores do not allow DRBD to write to it. I found this by looking at /var/log/audit/audit.log:
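Two ways to pull the relevant denials from the audit log: ausearch comes with the audit package, and the grep variant works on the raw file.

```shell
# AVC denials from the last 10 minutes.
sudo ausearch -m avc -ts recent
# Or, directly from the log file:
sudo grep AVC /var/log/audit/audit.log | tail
```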

The message saying that dac_read_search is denied means that the process doesn't have access to the file through its normal permissions and tried to use the dac_read_search capability to bypass those permission checks.

We’ll change the permissions to world writable (not secure) to fix this on both nodes:
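On both nodes, for example. This is the world-writable, insecure workaround from the text; the mode 777 matches the directory listing a reader posted in the comments.

```shell
# Run on node01 and node02. Insecure: anyone can write to this directory.
sudo chmod 777 /var/lib/pacemaker/cores
ls -ld /var/lib/pacemaker/cores
```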

After restarting the cluster, it's still not working, and we can still see very similar messages in the audit.log:

The difference is that DRBD now has access to the file, but SELinux is preventing the access. To find out whether there is an SELinux boolean that allows this access, and which one, we can pass the denial to audit2allow:
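audit2allow can read the denial from standard input; when an existing boolean covers the denial, it names that boolean in its output instead of only suggesting allow rules. A minimal example using the most recent AVC entry:

```shell
# Explain the most recent AVC denial and suggest how to allow it.
sudo grep AVC /var/log/audit/audit.log | tail -1 | audit2allow
```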

As the output of audit2allow states, we can set daemons_enable_cluster_mode to enabled on both nodes to fix this:
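On both nodes; -P writes the setting to the policy so it survives a reboot, and getsebool shows the current value.

```shell
# Run on node01 and node02.
sudo setsebool -P daemons_enable_cluster_mode 1
getsebool daemons_enable_cluster_mode
```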

When restarting the cluster, things should look a little better. Unfortunately, there is still a problem that prevents us from stopping the cluster. Again, SELinux doesn't allow us to take this action, which is clear when looking at /var/log/audit/audit.log:

We can allow this action by creating a new SELinux module and loading it:
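A sketch on node01 using audit2allow -M on the most recent AVC entry, as a reader does in the comments below. The module name drbd_1 is only an example. Since the module allows whatever the last denial was, it's worth reading the generated .te file before loading it.

```shell
# Run on node01. audit2allow -M writes drbd_1.te (readable policy source)
# and drbd_1.pp (compiled module) to the current directory.
sudo grep AVC /var/log/audit/audit.log | tail -1 | sudo audit2allow -M drbd_1
# Review what the module will allow.
cat drbd_1.te
# Load the module.
sudo semodule -i drbd_1.pp
```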

We’ll copy the module to node02 and load it there too:
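For example, assuming the compiled module is called drbd_1.pp (a hypothetical name) and node02 is reachable over SSH under that name:

```shell
# On node01: copy the compiled module to your home directory on node02.
scp drbd_1.pp node02:
# On node02: load it.
sudo semodule -i drbd_1.pp
```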

Now we can stop and start the cluster again and the output of the status command looks better:
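Stopping and starting every node from one place, then checking the status:

```shell
sudo pcs cluster stop --all
sudo pcs cluster start --all
sudo pcs status
```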

The above output shows that the resource was successfully added and started, but since the DRBD master is still node02, this isn't really how we want it. The master should be the node that owns the virtual IP and runs the webserver.

First, we’ll create a filesystem resource on the cluster, using a new cib-file:
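A sketch that first saves a new copy of the CIB (the file name fs_cfg is hypothetical) and then adds an ocf:heartbeat:Filesystem resource. The resource name webserver_fs, the mount point /var/www/html (Apache's default document root on CentOS) and fstype ext4 are assumptions; adjust them to your setup.

```shell
# Work on a fresh copy of the running CIB.
sudo pcs cluster cib fs_cfg
# Mount /dev/drbd0 on Apache's document root wherever this resource runs.
sudo pcs -f fs_cfg resource create webserver_fs ocf:heartbeat:Filesystem \
    device="/dev/drbd0" directory="/var/www/html" fstype="ext4"
```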

Then we’ll create some constraints for the added resource:
The filesystem of the webserver should be made available on the master:

DRBD should first be started and then the file system should be made available:

Apache and the file system should be running on the same node:

The file system needs to be made available before Apache is started:
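The four constraints above could be expressed with pcs 0.9 as follows, written to the same hypothetical fs_cfg file. webserver_fs is a hypothetical name for the file system resource; webserver and webserver_data_sync are the names that show up in a reader's pcs status output in the comments, so adjust them if yours differ.

```shell
# 1. Mount the file system only on the node where DRBD is master.
sudo pcs -f fs_cfg constraint colocation add webserver_fs with webserver_data_sync INFINITY with-rsc-role=Master
# 2. Promote DRBD first, then mount the file system.
sudo pcs -f fs_cfg constraint order promote webserver_data_sync then start webserver_fs
# 3. Run Apache on the node that has the file system.
sudo pcs -f fs_cfg constraint colocation add webserver with webserver_fs INFINITY
# 4. Mount the file system before starting Apache.
sudo pcs -f fs_cfg constraint order webserver_fs then webserver
```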

The next step is to actually apply the changes which we made in the cib-file to the running configuration:
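Assuming the changes were written to a CIB copy named fs_cfg (a hypothetical name):

```shell
sudo pcs cluster cib-push fs_cfg
```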

Normally, the changes should take effect immediately, but I had to stop and start the cluster to get things working. This should be the result:

As the output shows, the DRBD master has become node01 and all resources were started on that node. Our constraints made sure that before the webserver was started, the virtual IP was available and that the DRBD-resource was mounted and available.

To be sure that the DRBD configuration went fine, we can check whether our filesystem resource did its work:
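On the node that holds the DRBD master, for example; /var/www/html is an assumed mount point, so use the directory your file system resource mounts on.

```shell
# Run on the DRBD master (node01 here).
mount | grep drbd0       # is /dev/drbd0 mounted, and where?
df -h /var/www/html      # size and usage of the mounted DRBD file system
```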

Finally, we can add data for our website to the mount. Let's start by creating an index.html on our DRBD file system:
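For example, on node01, assuming the DRBD file system is mounted on /var/www/html. The restorecon line is only needed if SELinux keeps Apache from reading the new file; it resets the SELinux labels under that path to the policy defaults.

```shell
# Run on node01, where the DRBD file system is mounted.
echo "Served from the DRBD file system" | sudo tee /var/www/html/index.html
# Optional: restore the default SELinux labels for the document root.
sudo restorecon -Rv /var/www/html
```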

When stopping node01, everything should switch to node02 including the data which we just modified on node01:

[jensd@node01 ~]$ sudo pcs cluster stop
Stopping Cluster…

When checking the website, we get the modified webpage:
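From any client, for example; 192.168.202.100 is a placeholder for the address of your virtual IP resource.

```shell
# Fetch the page through the cluster's virtual IP (placeholder address).
curl http://192.168.202.100/
```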

The above example was made for a webserver but can be used for various other services too. Unfortunately, DRBD isn't supported on RHEL or CentOS as well as it could be (at least not for free), so we need to perform some additional steps to get things working.

22 thoughts on “Use DRBD in a cluster with Corosync and Pacemaker on CentOS 7”

  1. Hi, I followed your tutorial, but I am stuck now when initializing the data.

    # sudo drbdadm create-md drbd0
    ‘drbd0’ not defined in your config (for this host).

    Any idea?

    Thanks

    Maik

    • Maik,

      Check the config in /etc/drbd.d/drbd0.res. Maybe a syntax error? My best guess is that the problem is situated there.

      Let me know if that helps.

      • Thanks for your reply.
        I fixed it myself. Next time I should take more care with the hostnames in the config…
        I got it working exactly the way you explained, but I had to stop iptables to get the disk sync working. Maybe you made a mistake with the ports you opened?

  2. Nice … all is working well, I added virtual IP and FileSystem (start file system, then VirtualIP …. and all I need is NFS configuration to have complete HA storage)

    Is there a tutorial or something to add an NFS share (to be highly available) to the configuration in the tutorial here …. I need a PCS cib for NFS, all I can find is crm_shell

    • I don't know if a CIB exists for NFS, but I would think that the IP failover in combination with shared storage (with DRBD) should be enough to make your NFS share highly available. The NFS client shouldn't really experience problems in case of failover, as the IP and data stay the same.

      • I'm actually trying to adapt https://www.howtoforge.com/high_availability_nfs_drbd_heartbeat_p3 to your HA tutorial …. to add NFS failover to another node …. I'm stuck at creating a pcs resource for the NFS server …. I try to start drbd_fs, then VirtualIP, then the NFS share on the primary node…. And when stopping the primary node, all services should fail over to the other node…. Drbd_fs and VirtualIP are automatically started on the other node, …. but I cannot figure out NFS failover….

  3. Fantastic Tutorial! Worked like a charm. Any chance you might have a tutorial on how to setup drbd in a primary/primary pcs cluster?

  4. I'm setting up an ESXi cluster web/file server with a shared HDD. It was working until I went to set up DRBD. I had to restore the 2nd server from backup, and now I'm getting this:

    [root@ATCServer01 ~]# sudo pcs status
    Cluster name: cluster_web
    Last updated: Fri Feb 6 09:05:47 2015
    Last change: Fri Feb 6 06:38:20 2015 via cibadmin on ATCServer01
    Stack: corosync
    Current DC: ATCServer01 (1) - partition with quorum
    Version: 1.1.10-32.el7_0.1-368c726
    2 Nodes configured
    4 Resources configured

    Online: [ ATCServer01 ATCServer02 ]

    Full list of resources:

    virtual_ip (ocf::heartbeat:IPaddr2): Started ATCServer01
    webserver (ocf::heartbeat:apache): Started ATCServer01
    Master/Slave Set: webserver_data_sync [webserver_data]
    webserver_data (ocf::linbit:drbd): FAILED ATCServer02 (unmanaged)
    Masters: [ ATCServer01 ]

    Failed actions:
    webserver_data_stop_0 on ATCServer02 'not installed' (5): call=18, status=complete, last-rc-change='Fri Feb 6 09:00:55 2015', queued=15281ms, exec=0ms

    PCSD Status:
    ATCServer01: Online
    ATCServer02: Online

    Daemon Status:
    corosync: active/enabled
    pacemaker: active/enabled
    pcsd: active/enabled
    [root@ATCServer01 ~]# sudo crm_verify -L -V
    error: unpack_rsc_op: No further recovery can be attempted for webserver_data:1: stop action failed with 'not installed' (5)

    I ran: sudo tail -100 /var/log/audit/audit.log|grep AVC|tail -1|audit2allow -M drdb_1
    The reply is that there is nothing to do

    [root@ATCServer02 ~]# sudo ls -al /var/lib/pacemaker/cores
    total 0
    drwxrwxrwx. 2 hacluster haclient 6 Feb 6 08:36 .
    drwxr-x---. 6 hacluster haclient 57 Feb 5 12:09 ..

    [root@ATCServer02 ~]# sudo drbd-overview
    0:drbd0/0 Unconfigured . .

    [root@ATCServer02 ~]# sudo drbdadm up drbd0
    --== Thank you for participating in the global usage survey ==--
    The server’s response is:
    node already registered

    I have been working on this for 2 days and can't figure it out. I should let you know I'm a noob with Linux. Any help would be much appreciated. Thank you!!

    • Still nothing on HA NFS. But it would be nice to add a similar tutorial for creating active/active DRBD with GFS2… That configuration would be more usable in a cluster environment for KVM/XEN live migrations (I use KVM) … and also a better solution than Primary/Secondary with NFS, since there are fewer network operations: every host accesses the shared storage as a “local” disk, not over NFS (so no need for a shared IP, no need to set up NFS, less network data flow, and performance for two nodes would be better that way). Data on the disks would be qcow images accessed from only one server (no matter which one), and both hosts can have live KVM machines (e.g. 5 KVMs on node1, 5 on node2; if needed, I can migrate all of them to one host, perform a restart or upgrade on the other host, and after the restart live migrate all or some back to the restarted/upgraded host).

      That way a two-node setup would be more usable, and not just a backup copy of the disk. Network bandwidth is better utilised since KVM machines are on both hosts (each has network on its own host) and only disk data is synchronized.

      Primary/Secondary has most network traffic on one host, and the other host only gets the disk synchronized. If I set up NFS in that scenario, all network traffic for NFS shares goes to the primary. Having KVM machines on the secondary host that access that NFS disk creates additional NFS network data to the primary, which syncs that data back to the secondary. Lots of unnecessary network data back and forth…. Using GFS saves bandwidth for other, better uses. So I'd rather try an Active/Active setup….

      It would be nice if someone more experienced than me created a tutorial….

      • I forgot to mention that I already tried GlusterFS, and for some time it was working fine, but I had serious loss of data after an upgrade of the Gluster packages on CentOS 7 (could not mount the Gluster disk; all the blame on forums was on the CentOS gluster packages and their versioning, but there was no usable solution for fixing that and getting it running again), …

        Never lost a byte using DRBD, so Gluster will never see me again.

      • As for Active/Active, I tried this quite intensively with Corosync and Pacemaker, but in a corporate environment I always seem to struggle with the network.

        Active/Active works with a multicast MAC (yes, this isn't a typo, such dinosaurs actually exist). The packets destined for the virtual IP are forwarded to both nodes in the cluster, but an iptables rule (CLUSTERIP) filters them based on a hash of the source IP/port. The multicast MAC seems to confuse a lot of switches at the ARP level. In theory, enabling IGMP snooping should fix this, but I never really got it working to a level where I fully trust it.

  5. webserver (ocf::heartbeat:apache): Stopped

    Failed actions:
    webserver_start_0 on centos01 'unknown error' (1): call=22, status=Timed Out, last-rc-change='Thu Mar 26 19:31:45 2015', queued=40002ms, exec=0ms

    Can anyone help? Which part should I look at for a solution? Apache is running fine…

    • This looks like something is wrong with the status page which you configured for your webserver. Are you able to fetch the page yourself (with wget or curl) using the correct IP?

  6. As per the DRBD documentation:

    For the purposes of this guide, we assume a very simple setup:
    • No other services are using TCP ports 7788 through 7799 on either host.

    Should:
    sudo iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 7788 -j ACCEPT

    be:
    sudo iptables -I INPUT -p tcp -m tcp -m state --dport 7788:7799 --state NEW -j ACCEPT

  7. Thank you for your great tutorial. Unfortunately, I made some mistakes and now I cannot go ahead:
    1- I formatted lv_drbd0 with an ext4 filesystem just after creating it (I then removed that LV and created it again without formatting)
    2- Although you didn't mention it, I ran the below command on both node01 and node02:
    sudo drbdadm create-md drbd0
    3- I didn't stop pcsd.
    The result is that I cannot bring drbd0 “up” on node02, and the below messages are shown:
    No valid meta data found
    Command 'drbdmeta 0 v08 /dev/vg_drbd/lv_drbd0 internal apply-al' terminated with exit code 255
    Please help me to solve the problem.
    Thanks a lot

  8. Thanks for your tutorial. It's really helpful! I searched many websites to find out how to set up DRBD with Pacemaker, but none of them were detailed enough to follow. It's really a wonderful post on your blog.

  9. Pingback: DRBD – CentOS 7 – LVM – Free Software Servers

  10. Hello, what a great tutorial. Do you know how to set this up so that back-end storage like a LUN on a SAN is inactive on the passive node of the cluster?
