Orchestrating Couchbase on OneOps


Couchbase is an open-source, distributed, NoSQL document-oriented database that is well suited to powering high-performance web, mobile, and IoT applications. For authoritative Couchbase use cases and notable users, please visit link1 and link2.

OneOps orchestrates Couchbase through its Couchbase pack, which supports both the “community” and “enterprise” editions. Please see here for a comparison. Recently, Couchbase, Inc. announced support for Couchbase Enterprise deployed and managed by OneOps. This is a win-win for both parties and a good example of the crucial role technology vendors play in the OneOps ecosystem.

In this post, I will introduce the Couchbase OneOps pack from three angles:

  • Deployment
  • Operation
  • Monitoring

Deployment

As we will see later in this post, Couchbase emits metric data to Graphite for monitoring, so we need a running Graphite instance up front. Follow the steps in my previous blog post about Graphite on OneOps to deploy Graphite first (possibly in a different OneOps assembly).

Then, in the Design phase, create a Couchbase platform by choosing the “CouchBase” pack.

Screen Shot 2016-08-11 at 11.38.06 PM.png

After creating a Couchbase platform, there may be several parameters to review:

  • couchbase component: by default, it deploys the “community” edition; the “enterprise” edition is also available if commercial support and additional features are needed. Change the Admin User and Password if the defaults are not suitable (the default password is “password”). In a future post, we may review how to set up an email server to deliver alerts. For this demo, we do not need to change any of these settings.

Screen Shot 2016-08-12 at 12.57.06 AM.png

  • bucket component: one default bucket is created by the Couchbase deployment. The bucket name, password, and number of replicas can be tuned here (the default bucket password is “password”). If we want more buckets, we can create them now, or later when we actually need them.

Screen Shot 2016-08-12 at 12.52.24 AM.png

  • diagnostic_cache component: add the list of Graphite servers (IP or FQDN) in a Graphite cluster, such as graphite_url_1:2003,graphite_url_2:2003. Note that the metric data is sent to the first working Graphite server in the list, so these URLs should all belong to a single Graphite cluster. If we use an FQDN to access a Graphite cluster, it helps to repeat graphite_fqdn:2003 several times on purpose (graphite_fqdn:2003,graphite_fqdn:2003): if the FQDN-to-IP resolution fails the first time (e.g. due to a transient network issue), this gives a few more chances to retry the resolution.
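The “first working server” behavior of the diagnostic_cache list can be sketched in Python. This is a minimal sketch, assuming the list is simply walked in order; it is not the pack's actual code. It uses Graphite's plaintext protocol (one `path value timestamp` line per metric, port 2003 by default), and the helper names and metric path are hypothetical.

```python
import socket
import time


def parse_graphite_servers(setting):
    """Split 'graphite_url_1:2003,graphite_url_2:2003' into (host, port) tuples.
    Repeated entries are kept on purpose: each one is another resolution attempt."""
    servers = []
    for entry in setting.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            # Assumption: fall back to Carbon's default plaintext port.
            host, port = entry, "2003"
        servers.append((host, int(port)))
    return servers


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_to_first_working(servers, payload, timeout=2.0):
    """Send payload to the first server that accepts a connection.
    Returns the (host, port) used, or None if every attempt failed."""
    for host, port in servers:
        try:
            # create_connection resolves the name on every call, so a repeated
            # FQDN entry gives DNS another chance after a transient failure.
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload.encode("ascii"))
            return (host, port)
        except OSError:  # covers DNS errors, refusals, and timeouts
            continue
    return None
```

For example, `send_to_first_working(parse_graphite_servers("graphite_fqdn:2003,graphite_fqdn:2003"), format_metric("couchbase.demo.ops", 42))` tries the same FQDN twice before giving up.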

We can also add our own SSH key to the user-app component so that we can log into the VMs later.

After committing the Design, create a new environment with “Availability Mode = redundant” and choose one cloud as the “primary cloud”. For how to set up a cloud in OneOps, please refer to one of my previous blog posts.

By default, a Couchbase cluster with 3 VMs will be created. The deployment plan will resemble the following (the number of compute instances is 3, meaning 3 VMs will be created):

Screen Shot 2016-08-12 at 1.53.37 AM.png

After the deployment, open a web browser and visit the Couchbase Web Console (your_couchbase_platform_fqdn:8091) to verify the cluster information and even do some operational work (covered later). To get the Couchbase platform FQDN, go to the Operate phase, click your_couchbase_platform_name on the right, find the fqdn component and click into it; the shorter URL is your_couchbase_platform_fqdn.

Screen Shot 2016-08-12 at 2.01.37 AM.png
By default, Username: Administrator, Password: password
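The same cluster information is also exposed by Couchbase's REST API on port 8091. The sketch below is a minimal example, assuming the default credentials above and the `/pools/default` endpoint, whose `nodes` entries carry `hostname`, `status`, and `clusterMembership` fields; `fetch_pool` and `summarize_nodes` are hypothetical helper names.

```python
import base64
import json
import urllib.request


def fetch_pool(fqdn, user="Administrator", password="password", timeout=5.0):
    """GET http://<fqdn>:8091/pools/default with HTTP basic auth; return parsed JSON."""
    url = f"http://{fqdn}:8091/pools/default"
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize_nodes(pool):
    """Return (hostname, status, clusterMembership) for each node in the pool document."""
    return [
        (node.get("hostname"), node.get("status"), node.get("clusterMembership"))
        for node in pool.get("nodes", [])
    ]
```

Calling `summarize_nodes(fetch_pool("your_couchbase_platform_fqdn"))` should list the three nodes of the default deployment; a healthy node normally reports `healthy` and `active`.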

Operation

Typical Couchbase operations can be done in the Couchbase Web Console:

  1. Add Server
  2. Fail over
  3. Remove Server
  4. Rebalance, etc.

Screen Shot 2016-08-12 at 9.51.43 AM

Interestingly, some of the above operations can also be done in the OneOps UI. For example, go to the Operate phase, click your_couchbase_platform_name on the right, find the couchbase component and click into it; there we will find multiple instances of couchbase.

Choose any one of them and click Choose Action To Execute; a drop-down list shows the actions that can run on this couchbase instance.

Screen Shot 2016-08-12 at 10.07.36 AM.png

One distinction between OneOps and some other automation tools is that OneOps provides full flexibility to define the operational actions associated with a pack. Take the Couchbase cookbook as an example: in the “recipes” folder we can find a recipe for each operational action, for instance “add-to-cluster.rb”. The magic that presents those operational actions in the front end is the cookbook metadata file (the actions are typically defined at the bottom of the metadata file).
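To make the “define an action once, expose it by name” idea concrete, here is a hypothetical Python analogy: a registry maps each action name to a UI label and a handler, much as the metadata file maps recipes to the actions the UI lists. It is an illustration only; OneOps itself uses Chef recipes and cookbook metadata, not this code, and the labels and handler bodies are placeholders.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: action name -> (label shown in a UI, handler).
ACTIONS: Dict[str, Tuple[str, Callable[[str], str]]] = {}


def action(name: str, label: str):
    """Decorator that registers a function as a named operational action."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        ACTIONS[name] = (label, func)
        return func
    return register


@action("add-to-cluster", "Add this node to the cluster")
def add_to_cluster(target: str) -> str:
    # Placeholder body; a real action would call the cluster's admin tooling.
    return f"added {target}"


@action("cluster-health-check", "Run the cluster health check")
def cluster_health_check(target: str) -> str:
    # Placeholder body for a cluster-wide action.
    return f"checked {target}"


def list_actions():
    """What a front end would render as a drop-down of buttons."""
    return [(name, label) for name, (label, _) in ACTIONS.items()]


def run_action(name: str, target: str) -> str:
    """Run a registered action by name, as a button click would."""
    _label, handler = ACTIONS[name]
    return handler(target)
```

The design point is that the UI only needs the registry, so adding a new button means writing one new handler and nothing else.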

Another operational highlight is cluster-wide operations. Go to the “Operate” tab, click your_couchbase_platform_name on the right, find the couchbase-cluster component and click into it; we will see only one couchbase cluster instance. The following picture shows the list of operational actions that can run cluster-wide.

Screen Shot 2016-08-12 at 11.24.40 AM

For this demo, we can run cluster-health-check, which checks the following items to make sure the cluster is in a good state:

  1. if automatic failover is enabled
  2. if each node (VM) is in a healthy state
  3. if data is highly available in each bucket (e.g. replicas exist and are spread evenly over all nodes)
  4. if the nodes seen by OneOps are the same ones that are seen by Couchbase
  5. if the buckets seen by OneOps are the same ones that are seen by Couchbase
  6. if a quota reset is not needed
  7. if no two nodes (VMs) sit on the same hypervisor

If the answer to any of the above checks is NO, cluster-health-check shows a failed status and points out which step failed. For example, more than one node/VM could be launched on the same hypervisor, which raises the risk that multiple VMs go offline at the same time when that hypervisor goes down.
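A simplified sketch of several of these checks follows, assuming the inputs have already been fetched (for example, `enabled` from Couchbase's `/settings/autoFailover`, node `status` from `/pools/default`, and `replicaNumber` from `/pools/default/buckets`). The `hypervisor` field is hypothetical, since Couchbase itself does not know where a VM runs and OneOps would have to supply it. The replica check is reduced to “at least one replica and enough nodes to hold it”, and the quota-reset check is omitted, so this is not the pack's implementation.

```python
from collections import Counter


def check_cluster(auto_failover, nodes, buckets, oneops_nodes, oneops_buckets):
    """Return a list of failed check names; an empty list means healthy.

    auto_failover: dict like {"enabled": True}
    nodes: dicts with "hostname", "status", and a hypothetical "hypervisor"
    buckets: dicts with "name" and "replicaNumber"
    """
    failures = []
    if not auto_failover.get("enabled"):
        failures.append("auto-failover disabled")
    if any(n.get("status") != "healthy" for n in nodes):
        failures.append("unhealthy node")
    # Each replica lives on a different node, so replicaNumber + 1 nodes are needed.
    if any(b.get("replicaNumber", 0) < 1 or b["replicaNumber"] + 1 > len(nodes)
           for b in buckets):
        failures.append("bucket without usable replica")
    if {n["hostname"] for n in nodes} != set(oneops_nodes):
        failures.append("node list mismatch")
    if {b["name"] for b in buckets} != set(oneops_buckets):
        failures.append("bucket list mismatch")
    hypervisors = Counter(n.get("hypervisor") for n in nodes)
    if any(count > 1 for count in hypervisors.values()):
        failures.append("nodes share a hypervisor")
    return failures
```

Returning every failed check, rather than stopping at the first, mirrors the idea of pointing out which step failed.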

Screen Shot 2016-08-12 at 1.33.03 PM

If everything looks good, we will not see any red output from the cluster-health-check operation.

Monitoring

A production system cannot live without extensive monitoring. The Couchbase pack is a great example of monitoring and alerting.

We will look at Couchbase monitoring in two parts: Graphite and the OneOps UI Monitor.

Graphite

Recall that the Couchbase deployment needs a Graphite instance up front to present the Couchbase performance metrics. Now let’s look at Graphite and see what we can get from it.

After opening the Graphite dashboard, we can navigate to the folder that contains the Couchbase metrics. See below for an example.

Screen Shot 2016-08-12 at 3.01.34 PM

The root directory contains many metrics about disk and memory usage, node health, and rebalance. Its two sub-directories, buckets and nodes, contain the metrics for all buckets and all nodes, which we can drill down into further. Take a node as an example: to visualize the number of operations (ops) on a certain Couchbase node, pick that node and click the “ops” icon to plot the metric over time.
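The same metrics can also be pulled programmatically through Graphite's render API (`/render?target=...&from=...&format=json`), which returns a list of series, each with a `target` and `datapoints` given as `[value, timestamp]` pairs. A minimal sketch follows; the Graphite base URL and the metric path are hypothetical, so copy the real path from the tree shown in the dashboard.

```python
import json
import urllib.request
from urllib.parse import urlencode


def render_url(graphite_base, target, since="-1h", fmt="json"):
    """Build a Graphite render API URL for one metric path."""
    query = urlencode({"target": target, "from": since, "format": fmt})
    return f"{graphite_base.rstrip('/')}/render?{query}"


def fetch_series(url, timeout=5.0):
    """Return the parsed JSON: a list of {"target": ..., "datapoints": [[value, ts], ...]}."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def latest_value(series):
    """Most recent non-null value in one series (Graphite uses null for gaps)."""
    for value, _timestamp in reversed(series["datapoints"]):
        if value is not None:
            return value
    return None
```

For instance, `render_url("http://graphite.example.com", "couchbase.demo.nodes.node1.ops")` yields a URL whose JSON response can be passed, series by series, to `latest_value`.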

Graphite_couchbase

OneOps UI Monitor

The Couchbase pack also emits some metrics to the OneOps Monitor in the UI. To visualize them, go to the “Operate” tab, click your_couchbase_platform_name on the right, find the diagnostic_cache component, and choose any one of the diagnostic_cache instances (each one corresponds to a Couchbase node, identified by the trailing numbers). Then click the monitors tab, which shows a list of the metrics monitored in the OneOps UI:

Screen Shot 2016-08-17 at 11.08.21 AM

For example, to look at Disk Performance, click Cluster Health Info and scroll down to the corresponding Disk Performance chart.

Screen Shot 2016-08-17 at 11.14.48 AM.png

Alerting can optionally be associated with a monitored metric. For example, if the Cache Miss Ratio is too high (e.g. over 50%), an alert fires: an alert message shows up in the OneOps “Operate” UI (and is sent to the signed-up email account once email notification is enabled).
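As a sketch of the threshold logic only: assume the ratio is computed as background disk fetches (Couchbase's `ep_bg_fetched` stat) divided by get operations (`cmd_get`). That formula is an assumption here rather than something the pack documents, and the function names are hypothetical.

```python
def cache_miss_ratio(gets, background_fetches):
    """Fraction of reads that had to be fetched from disk; 0.0 when there were no reads."""
    if gets <= 0:
        return 0.0
    return background_fetches / gets


def should_alert(ratio, threshold=0.5):
    """Fire when the ratio is strictly above the threshold (50% in the example above)."""
    return ratio > threshold
```

Guarding the zero-gets case keeps an idle bucket from raising a division error or a false alert.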

Another metric that shows whether Couchbase is being used effectively is Docs Resident. By default, if less than 100% of documents reside in memory for more than 5 minutes, the alert fires. Conversely, the alert will “buzz off” after all documents have stayed in memory for more than 5 minutes.
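The “fire after 5 minutes, recover after 5 minutes” behavior is a hold-time (debounce) rule. A minimal sketch follows, assuming samples arrive with timestamps in seconds; the class name and the `docs_resident_breached` helper are hypothetical, and this is not how OneOps implements its thresholds.

```python
def docs_resident_breached(resident_percent):
    """The Docs Resident rule from the text: anything below 100% counts as a breach."""
    return resident_percent < 100.0


class SustainedAlert:
    """Fire after a breach has lasted hold_seconds; recover after it has been
    clear for hold_seconds. Brief flips in between reset the timer."""

    def __init__(self, hold_seconds=300):
        self.hold = hold_seconds
        self.firing = False
        self._since = None  # start of the current run that disagrees with the state

    def update(self, now, breached):
        """Feed one sample; return 'fire', 'recover', or None."""
        if breached != self.firing:
            if self._since is None:
                self._since = now
            if now - self._since >= self.hold:
                self.firing = breached
                self._since = None
                return "fire" if breached else "recover"
        else:
            self._since = None
        return None
```

Because both the threshold and the hold time are parameters, the same class covers stricter or looser criteria without code changes.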

Screen Shot 2016-08-12 at 4.05.15 PM
Alerting Message about “High Active Doc Resident”
Screen Shot 2016-08-12 at 4.03.04 PM
Recovery Message about “High Active Doc Resident”

 

The criteria that trigger an alert can also be customized flexibly, case by case, as shown in the picture below.

Screen Shot 2016-08-17 at 11.17.44 AM

Summary

The Couchbase pack is a great example of an application pack in the OneOps ecosystem that achieves:

  • fully automated deployment
  • “one-click” operational support at the node level and cluster level
  • extensive monitoring and alerting

One huge benefit of OneOps is that it not only automates deployment, as other automation tools already do, but also provides:

  1. an interface to implement operational work once in “code” and present it simply as a button in the UI, so that anyone (e.g. the Ops team or an engineer) can repeatedly launch that work with one click.
  2. 100% flexibility to define and customize any monitor that a specific application or team cares about, and to visualize the metrics on demand in the OneOps UI.
  3. seamless integration between monitoring and alerting, so that any monitored metric can optionally trigger an alert once a threshold is defined.

Given that many infrastructure technologies are based on open-source offerings nowadays, the challenges for many organizations become (1) picking the right technology and (2) operating it well in production.

I hope to see more OneOps application packs with a rich set of monitoring and operational support, which are “must-haves” for any system running in production!

 

 

 
