
Building an ElasticSearch, Logstash, Kibana (ELK) Stack on OneOps

elk.png

ElasticSearch, Logstash, and Kibana are massively popular open source projects that together compose an end-to-end stack delivering actionable insights in real time from almost any type of structured or unstructured data source.

In short:

  • Logstash is a tool for collecting, parsing, and transporting the logs for downstream use.
  • Kibana is a web interface that can be used to search and view the logs that Logstash has indexed.
  • ElasticSearch sits between Logstash and Kibana, storing the logs in a highly scalable, durable, and available manner.

The following picture illustrates the relationship among them:

howELKworks

The OneOps application repository includes all three of Logstash, Kibana, and ElasticSearch, so in this blog I would like to show how to build an ElasticSearch, Logstash, Kibana (ELK) stack on OneOps by reproducing the demo shown in Visualizing Data with ELK.

Deploy Logstash and ElasticSearch

In fact, Logstash ships as an optional component of every application pack on OneOps: although most applications do not require it, it is a generic way to collect and transport application logs, and it can be conveniently enabled whenever needed.

For conciseness, I will deploy ElasticSearch together with Logstash, so that Logstash runs on every ElasticSearch node.

First, in the Design phase, create a new ElasticSearch platform.

Screen Shot 2016-08-26 at 4.25.29 PM

After this, we need to configure the elasticsearch, download, and logstash components.

(1) elasticsearch component: if using a small compute (e.g., less than 2 GB of memory), we may need to set Allocated Memory (MB) to 512; otherwise ElasticSearch may run into JVM out-of-memory issues, because Logstash also runs in the same box (virtual machine) and requires an additional 512 MB of heap to launch its own JVM.

Screen Shot 2016-08-26 at 4.28.38 PM
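
As a side note, once the environment is deployed (we will do this shortly), the heap setting can be verified through the ElasticSearch nodes API. Below is a minimal sketch using only the Python standard library; the FQDN is a placeholder for your own ElasticSearch endpoint:

import json
from urllib.request import urlopen

# Placeholder host: substitute your own ElasticSearch endpoint.
ES = "http://your_elasticsearch_platform_fqdn:9200"

# The _nodes API reports the JVM heap each node was started with.
nodes = json.loads(urlopen(ES + "/_nodes/jvm").read().decode())["nodes"]
for info in nodes.values():
    print(info["name"], info["jvm"]["mem"]["heap_max_in_bytes"] // 2**20, "MB heap")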

(2) download component: since we want to reproduce the demo in Visualizing Data with ELK, the data set used by that demo should be downloaded in advance. Fortunately, OneOps provides the download component so that anything hosted on the internet can be automatically downloaded to every VM during the deployment. (Generally, whenever we need to install a package, library, or dependency right after the VM boots up, the download component does this job.)

Screen Shot 2016-08-29 at 9.35.31 AM

Save the download component and overall it should resemble:

Screen Shot 2016-08-26 at 4.54.57 PM.png
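
For illustration, the download component here is roughly equivalent to running the following on each VM; the data-set URL is a hypothetical placeholder, and the real one comes from Visualizing Data with ELK:

from urllib.request import urlretrieve

# Hypothetical URL: substitute the demo's actual data-set location.
DATA_URL = "http://example.com/stock-data.csv"

# /app/data.csv is the path the logstash component will read from below.
urlretrieve(DATA_URL, "/app/data.csv")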

(3) logstash component: as we will run Logstash in the same box as ElasticSearch, we need to add a logstash component so that it will be deployed together with ElasticSearch. Note that the configuration steps described here also apply to any other application that needs Logstash.

  • add a new logstash component
  • set Inputs to file {path => "/app/data.csv" start_position => "beginning" sincedb_path => "/app/sincedb.iis-logs"}
  • set Filters to csv {separator => "," columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]} mutate {convert => ["High", "float"]} mutate {convert => ["Open", "float"]} mutate {convert => ["Low", "float"]} mutate {convert => ["Close", "float"]} mutate {convert => ["Volume", "float"]}
  • set Outputs to elasticsearch {action => "index" host => "localhost" index => "stock" workers => 1} stdout {}

Once filled in, it looks like:

Screen Shot 2016-08-26 at 5.05.00 PM.png

Save the logstash component and overall it should resemble:
Screen Shot 2016-08-26 at 5.09.53 PM.png
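
To build intuition for what this filter chain does to each line of /app/data.csv, here is a rough per-event equivalent sketched in Python (this is not how Logstash is implemented, just the observable effect; the output section then ships each resulting event to the stock index on the local ElasticSearch):

import csv
from io import StringIO

COLUMNS = ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"]

def apply_filters(line):
    # csv filter: split on the separator and name the columns.
    event = dict(zip(COLUMNS, next(csv.reader(StringIO(line)))))
    # mutate/convert filters: cast the numeric columns to floats.
    for field in ["High", "Open", "Low", "Close", "Volume"]:
        event[field] = float(event[field])
    return event

# Example row in the Date,Open,High,Low,Close,Volume,Adj Close format:
print(apply_filters("2016-08-26,769.0,774.0,766.6,769.5,1247400,769.5"))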

Lastly, we can add our own SSH key to the user-app component so that we can log into the VMs later on.

Now we are ready to deploy ElasticSearch and Logstash. Create a new environment with Availability Mode = redundant and choose 1 cloud as the primary cloud. For how to set up a cloud in OneOps, please refer to one of my previous blogs.

By default, an ElasticSearch cluster with 2 VMs will be created. For serious use cases, a cluster with at least 3 nodes is needed: discovery.zen.minimum_master_nodes should be set to 2 (a majority of 3) to avoid split brain while still tolerating the loss of 1 node. The number of nodes can be adjusted in the Scaling section after clicking your_elasticsearch_platform_name in the Transition phase.

Screen Shot 2016-08-26 at 11.56.13 PM
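
The rule of thumb behind that value is a simple majority quorum of the master-eligible nodes, e.g.:

def minimum_master_nodes(master_eligible):
    # Majority quorum: strictly more than half of the master-eligible nodes.
    return master_eligible // 2 + 1

assert minimum_master_nodes(3) == 2  # 3 nodes: survives losing 1
assert minimum_master_nodes(2) == 2  # 2 nodes: cannot afford to lose any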

The deployment plan will resemble the following (the number of compute instances is 3, meaning 3 VMs and 3 ElasticSearch instances will be created):

Screen Shot 2016-08-27 at 12.03.46 AM.png

After the deployment, ElasticSearch and Logstash should be running automatically.
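
A quick end-to-end check is to query the ElasticSearch HTTP API from any machine that can reach the cluster. A minimal sketch with the Python standard library (placeholder FQDN again):

import json
from urllib.request import urlopen

ES = "http://your_elasticsearch_platform_fqdn:9200"

# Cluster health: "green" or "yellow" means the cluster has formed.
health = json.loads(urlopen(ES + "/_cluster/health").read().decode())
print("cluster status:", health["status"])

# If Logstash has ingested /app/data.csv, the stock index exists and
# its document count roughly matches the number of rows in the CSV.
count = json.loads(urlopen(ES + "/stock/_count").read().decode())
print("documents in stock:", count["count"])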

Deploy Kibana

As introduced, Kibana typically pairs with ElasticSearch to provide a visualization dashboard over the search results.

First, choose to create a Kibana platform in the Design phase.

Screen Shot 2016-08-27 at 12.15.32 AM.png

Then we need to configure the kibana component, and the only field we need to take care of is ElasticSearch Cluster FQDN including PORT. We can get your_elasticsearch_platform_fqdn with the following steps:

In the Transition phase, first choose the ElasticSearch environment; then go to the Operate phase, click your_elasticsearch_platform_name on the right, find the fqdn component and click into it; the shorter URL is your_elasticsearch_platform_fqdn.

Prefixed with “http://” and suffixed with “:9200/”, the ElasticSearch Cluster FQDN including PORT will look like:

http://your_elasticsearch_platform_fqdn:9200/
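
Before pasting this into the kibana component, it is worth confirming that the URL answers from outside the cluster, for example with a one-off check like:

import json
from urllib.request import urlopen

# The exact URL that will go into the kibana component.
url = "http://your_elasticsearch_platform_fqdn:9200/"

# The root endpoint returns the cluster name and the version banner.
banner = json.loads(urlopen(url).read().decode())
print(banner["cluster_name"], banner["version"]["number"])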

The entire configuration section of the kibana component may resemble the following:

Screen Shot 2016-08-27 at 12.21.41 AM

Again, we can add our own SSH key to the user-app component in order to log into the VMs later on.

After saving the platform, we can create an environment and kick off the deployment. Same as before: Availability Mode = redundant, with 1 cloud chosen as the primary cloud.

By default, two independent Kibana instances will be deployed, which provides some redundancy if one Kibana goes down.

The deployment plan will resemble the following (the number of compute instances is 2, meaning 2 VMs and 2 Kibana instances will be created):

Screen Shot 2016-08-27 at 12.54.32 AM.png

After the deployment, we can look up the platform-level FQDN of Kibana and use it to access the Kibana dashboard.

Open a web browser and go to: http://your_kibana_platform_fqdn:5601

Then follow the steps in Visualizing Data with ELK to create the visualization dashboards in Kibana.

Note that the data set used in Visualizing Data with ELK is historical, so we may need to widen Kibana's search time range in order to pull the historical data from ElasticSearch and display it. This can be changed at the top-right corner. For example:

Screen Shot 2016-08-27 at 1.44.12 AM
Click “Last 15 minutes” to change search span

In the following picture, the search span is set to the last 30 years relative to today, so that a visualization similar to the one in Visualizing Data with ELK is shown.

Screen Shot 2016-08-24 at 12.01.35 AM.png
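
If you are unsure how far back to set the range, you can ask ElasticSearch for the span of the data directly. A sketch, assuming the Date column was indexed as a date field in the stock index:

import json
from urllib.request import Request, urlopen

ES = "http://your_elasticsearch_platform_fqdn:9200"

# Min/max aggregations over Date reveal the earliest and latest data points.
query = {"size": 0, "aggs": {"earliest": {"min": {"field": "Date"}},
                             "latest": {"max": {"field": "Date"}}}}
req = Request(ES + "/stock/_search", data=json.dumps(query).encode(),
              headers={"Content-Type": "application/json"})
aggs = json.loads(urlopen(req).read().decode())["aggregations"]
print(aggs["earliest"]["value_as_string"], "->", aggs["latest"]["value_as_string"])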

Summary

In this blog, I showed how to build an ELK stack on OneOps and verified that it works end to end by reproducing the demo in Visualizing Data with ELK. The ELK stack discussed here is still preliminary; in a production environment, it is more scalable and practical to include Filebeat (previously logstash-forwarder) and Redis in the pipeline.

Filebeat is a lightweight tool installed on every node that tails system or application log files and forwards them to Logstash. Redis can serve as a buffer for the huge volume of logs aggregated from all nodes.

ElasticSearch could also follow a better deployment architecture that separates master-eligible nodes from data nodes, potentially with dedicated client nodes for routing requests and aggregating search results. (In this blog, every ElasticSearch instance in the cluster acts as both a master-eligible and a data node.)
