Sep 19, 2016

Setting up a DCOS-Marathon-Cluster on CentOS 7.2



During the two weeks of my student internship at QAware GmbH, my task was to set up a test environment for cloud applications, similar to the two already existing Cloud Suitcases. The novelty was that the cluster should run DC/OS with Marathon on CentOS 7.2 to manage the tasks running on multiple machines.


finished cloud suitcase

Hardware


The hardware QAware GmbH provided was similar to the predecessors of the Cloud Suitcase V3 and consisted of five Intel NUCs, each with 32 GB of RAM and a 250 GB SSD, as well as a FritzBox router and a LAN switch. An innovation was the FritzBox, which serves as an interface between the QAware LAN and the NUCs. It provides Internet access to each NUC and makes SSH access from the LAN to the NUCs easy, which is especially helpful when changes to the system are necessary or programs have to be installed.


CentOS


First of all, CentOS 7.2 had to be installed on all computers. The easiest way to do this was to use a bootable USB flash drive. While the operating system was installing, I could set up the user accounts: each node in the cluster gets a "cloud" user account in addition to the root user.
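For reference, the extra account can also be created from the shell after the installation; a minimal sketch (the user name "cloud" is taken from the setup above, the rest is standard CentOS tooling):

$ sudo useradd cloud      # create the additional user
$ sudo passwd cloud       # set its password interactively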


Network Communication


All PCs are wired as shown in the picture: each NUC is connected to the LAN switch, and the FritzBox is connected to the switch as well, allowing the NUCs to access the local LAN. This also makes Internet access possible, which is very useful for installing packages. Furthermore, the FritzBox is configured so that the master can be reached via the FritzBox's own IP in the local LAN on port 80 for the DC/OS web interface, on port 22 for SSH to make changes to the system efficiently, and on port 8181 for the Exhibitor service UI.

The FritzBox is also very helpful because all nodes need a static IP address for the DC/OS installation. Under CentOS 7.2 this can be done by editing /etc/sysconfig/network-scripts/ifcfg-eno1 on all NUCs (the network interface is no longer called eth0!). The attribute BOOTPROTO has to be changed from dhcp to static and the following lines have to be added:

IPADDR=192.168.1.100    # static IP address (192.168.1.100-104 across the nodes)
NETMASK=255.255.255.0   # netmask
GATEWAY=192.168.1.1     # gateway (FritzBox)
DNS1=192.168.1.1        # DNS (FritzBox)
DNS2=8.8.8.8            # fallback DNS
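After editing the file, the network service has to be restarted for the new settings to take effect; roughly like this on CentOS 7.2:

$ sudo systemctl restart network    # re-read the ifcfg files
$ ip addr show eno1                 # verify that the static address is active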


The FritzBox is configured to allow static IPs from 192.168.1.100 to .255 and to distribute IP addresses via DHCP in the range 192.168.1.20 to .99. With this configuration it is possible to join the network and access the NUCs simply by connecting to the LAN switch, without needing a static IP.

SSH Access


To run DC/OS it is necessary that the nodes can reach each other via passwordless SSH. To make this easier I decided to assign a hostname to each static IP address by editing the /etc/hosts file. I added these lines to the existing ones on all NUCs:


192.168.1.100 node1
192.168.1.101 node2
192.168.1.102 node3
192.168.1.103 node4
192.168.1.104 node5


Subsequently I generated a private/public key pair on each node, created a .ssh directory on the target nodes and copied the public keys over. I also made sure that the access rights on the .ssh directory and the authorized_keys file are correct.

Example (node1 → node2)
$ ssh-keygen -t rsa
$ ssh cloud@node2 mkdir -p .ssh
$ cat .ssh/id_rsa.pub | ssh cloud@node2 'cat >> .ssh/authorized_keys'
$ ssh cloud@node2 "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"


The next step for the DC/OS installation was to install Docker, because all DC/OS services are launched in Docker containers. Since only Docker versions up to and including 1.11.2 are currently supported by DC/OS, it is useful to pin the version so that an automated update can't change it; even the DC/OS installer itself tries to update Docker. To prevent this I used the yum plugin for version pinning:

$ sudo yum install yum-plugin-versionlock
$ sudo yum versionlock add docker-1.11.2
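To double-check that the pin is in place and the expected version is installed, something like this should do (versionlock list is provided by the plugin installed above):

$ yum versionlock list    # shows the pinned packages
$ docker --version        # should report version 1.11.2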

To run the DC/OS installer successfully, the cloud user also needs passwordless sudo access. This can be configured in /etc/sudoers by adding the following rule:

cloud ALL=(ALL) NOPASSWD: ALL
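A slightly safer variant than editing /etc/sudoers directly is to put the rule into its own drop-in file; a minimal sketch:

$ echo 'cloud ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/cloud
$ sudo chmod 440 /etc/sudoers.d/cloud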

DC/OS installation


Finally I could install DC/OS by downloading the shell script for the GUI installer for DC/OS with Marathon onto my bootstrap node and starting it (see the sketch after the list below). The installer serves a web interface on the local network, which can be opened in the browser. At this point the roles of the nodes in the cluster have to be defined:
 

  • (Node1: Bootstrap Node)
  • Node2: Master
  • Node3: Agent
  • Node4: Agent
  • Node5: Agent
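Downloading and launching the GUI installer on the bootstrap node looks roughly like this; the exact download URL depends on the DC/OS release, so treat it as an assumption:

$ curl -O https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh
$ sudo bash dcos_generate_config.sh --web    # serves the installer UI (port 9000 in the versions I know)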

The SSH username is, in our case, "cloud" and has to exist on all nodes of the cluster. SSH uses the default port 22. Into the "Private SSH Key" field I pasted the master's key. I also had to define the FritzBox IP as the upstream DNS server.

The IP detect script you can choose in the installer returns, on each computer of the cluster, that computer's own IP address. None of the suggested scripts, such as the ones for AWS or Azure, worked for me, so I grabbed a script from the DC/OS website and modified it. It now uses "ip addr show" instead of "netstat" (which is not available under CentOS 7.2 by default) and takes into account that the network interface is named eno1 instead of eth0:

#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
echo $(ip addr show eno1 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1)


The advantage of making the bootstrap node a permanent part of the cluster environment, instead of running it in a virtual machine, is that after the GUI installation we can permanently install the DC/OS command line interface on it. The CLI is very useful for configuring the cluster or for installing our own Marathon apps; the web UI also doesn't support all functions yet. To install it I simply ran a shell script from the DC/OS website. From then on the "dcos" command is available in the shell.
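A few examples of what the CLI can do once it is installed (these subcommands exist in the DC/OS CLI; my-app.json is a made-up placeholder for an app definition):

$ dcos node                           # list the nodes of the cluster
$ dcos marathon app list              # show the Marathon apps that are currently running
$ dcos marathon app add my-app.json   # deploy an app definition to Marathon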

After finishing the GUI installer I could reach the DC/OS web interface via the master's IP address in the browser. It shows information about system utilization, installed services and nodes, and gives access to the services' own web UIs such as the Marathon web interface. In addition, you can browse new services in the Universe tab and install them on the cluster.




 

Apache Spark on our cluster


To test whether DC/OS is able to run services, I installed Apache Spark on the cluster. Because I installed the service via Universe, I had to install the Spark CLI manually on my bootstrap node. After that I got all commands of the "dcos spark" family. I then tested the Spark installation with an example from Mesosphere:


$ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi
https://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar"
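For completeness: installing the Spark CLI subcommand on the bootstrap node goes through the package commands; a sketch (the --cli flag is how I remember it, so treat it as an assumption):

$ dcos package install spark --cli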

Kubepad 

 

To visualize and control the tasks on the cluster, I installed the KubePad application written by Mario-Leander Reimer (https://github.com/qaware/kubepad) on my laptop. It uses a Novation Launchpad to change shares and task counts of DC/OS services. To run the app successfully it has to read the dcos.toml file, so I copied that file from my bootstrap node. When I now change the running services on the Launchpad, I can watch these changes directly in my browser.
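Copying the configuration from the bootstrap node to the laptop is a one-liner; a sketch (the path ~/.dcos/dcos.toml is an assumption based on where the CLI kept its config at the time):

$ scp cloud@node1:~/.dcos/dcos.toml ~/.dcos/dcos.toml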






written by 
Jonathan Richter






Aug 19, 2016

Analyzing performance issues in a large scale system with ELK

Application overview

I’m working on a large project here at QAware. Besides a lot of source code, we have our project running on a large number of servers.
The following figure gives you a brief overview. The rectangles are servers and the strange A* items are the best magnifying glasses I can paint myself (they represent analysis points).





All servers exist at least as pairs (up to larger clusters). Ignoring the relational database servers, this leaves us with about 54 servers in production.
The customer has a separate operations team running the application, but in the following case we were asked for help.


Problem report

Users of a 3rd party application using our services reported performance issues. It was not clear on which occasions this happens or which services are affected.

Unfortunately the operations team does not run performance monitoring.


Analysing the problem

Fortunately we were prepared for these kinds of problems:
  • We have log files on each server and ship them centrally to Elasticsearch.
  • We have Kibana running to search and visualize the data.
  • We log performance information on all servers.
  • We can relate log statements from each server to one specific request.
Visualizing this information with charts in Kibana was a huge help to track down the problem. Here are some key figures (I left out some steps).

A1 - Check if we have a problem

I searched for incoming service calls at point A1 (see application overview) and created a pie chart. Each slice represents a duration range for how long the requests took.
A1 is our access point for service calls and therefore the best spot to determine whether services are slow. I chose the pie chart to get a fast overview of all requests and the distribution of their runtime durations.






Only the large green slice represents service calls below 5s duration.

Steps in Kibana:

  • Choose Visualize
  • Choose 'Pie chart'
  • Choose 'From a new search'
  • Enter <Query for all Service on A1>
  • Under 'buckets' click on 'Split slices'
  • Set up slices as follows
    • Aggregation: Histogram
    • Field: <duration in ms field>
    • Interval: 5000
  • Press Play
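Under the hood this corresponds to a plain histogram aggregation in Elasticsearch; roughly like the following query (index pattern and field name are assumptions, the query string stays the placeholder from above):

$ curl -s -H 'Content-Type: application/json' 'http://elasticsearch:9200/logstash-*/_search?size=0' -d '{
    "query": { "query_string": { "query": "<Query for all Service on A1>" } },
    "aggs": {
      "duration_slices": { "histogram": { "field": "duration_ms", "interval": 5000 } }
    }
  }'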

Result:
We clearly had a problem. There are complex business services involved, but a response time above 5s is unacceptable. In the analysed time slot 20% of the service calls took longer!

A2 - Show the search performance

I chose the most basic search (more or less an ID lookup) which is performed inside the application (see point A2 in the application overview) and created a line chart of the request time.
By choosing a point between the application server and the database, I basically split the application in half and checked where the time was lost.

This time I used a line chart with a date histogram to show whether there is any relation between slow service calls and the time of day.




Steps in Kibana:
  • Choose Visualize
  • Choose 'Line chart'
  • Choose 'From a new search'
  • Enter <Query for the basic search on A2>
  • Set up 'metrics' -> 'Y-Axis' as follows
    • Aggregation: Average
    • Field: <duration field>
  • Under 'buckets' click on 'X-Axis'
  • Set up X-Axis as follows
    • Aggregation: Date Histogram
    • Field: @timestamp
    • Interval: Auto
  • Press Play


Result:
As you can see, the duration skyrockets during certain hours, and the same graph shows up on every workday. Conclusion: there is a load problem.

A3 – Check the search performance of different SOLRs

I made another visualization for the different SOLRs we run; we have one per language. I basically took the line chart from A2 and added a sub-bucket. This way you can split up the graph by a new dimension (in our case the language) and see whether it is related to the problem.



Steps in Kibana:

  • Choose Visualize
  • Choose 'Line chart'
  • Choose 'From a new search'
  • Enter <Query for all searches on A3> 
  • Set up 'metrics' -> 'Y-Axis' as follows
    • Aggregation: Average
    • Field: <duration field>
  • Under 'buckets' click on 'X-Axis'
  • Set up X-Axis as follows
    • Aggregation: Date Histogram
    • Field: @timestamp
    • Interval: Auto
  • Click on 'Add sub-buckets'
  • Click on 'Split Lines'
  • Set up 'Split Lines' as follows
    • Aggregation: Terms
    • Field: <language field>
    • Top: 20 (in our case)
  • Press Play


Results:
We could see the load problem equally distributed among all languages, which makes no sense, because we have minor languages that never get much load. A quick look at some query times in the SOLRs confirmed that: the queries themselves were fast.

Result

We knew it was a load problem and that it was not a problem of the SOLRs or the application itself. The possible bottlenecks left were the Apache reverse proxy or the network itself. Neither of them would have been my initial guess.

Shortly afterwards we helped the operations team to track down a misconfigured SOLR reverse proxy. It used file caching on a network device!


Conclusion 


  • Visualizing the data was a crucial help for us in locating the problem. If you only look at a list of independent log entries in text form, it is much harder to draw the correct conclusions.
  • Use different charts depending on the question you want to answer.
  • Use visual log analysis tools like Kibana (ELK stack). You can use them for free and they can definitely help a lot.

Jun 15, 2016

Locking alternatives in Java 8


Abstract

To provide synchronized data cache access, I discuss three alternatives in Java 8: synchronized() blocks, ReadWriteLock and StampedLock (new in Java 8). I show code snippets and compare the performance impact on a real world application.

The use case

Consider the following use case: A data cache that holds key-value pairs and needs to be accessed by several threads concurrently.

One option is to use a synchronized container like ConcurrentHashMap or Collections.synchronizedMap(map). Those have their own considerations, but will not be handled in this article.

In our use case, we want to store arbitrary objects in the cache and retrieve them by Integer keys in the range 0..n. As memory usage and performance are critical in our application, we decided to use a good, old array instead of a more sophisticated container like a Map.
A naive implementation allowing multi-threaded access without any synchronization can cause subtle, hard-to-find data inconsistencies:
  • Memory visibility: Threads may see the array in different states (see explanation).
  • Race conditions: Writing at the same time may cause one thread's change to be lost (see explanation)
Thus, we need to provide some form of synchronization.

To fix the problem of memory visibility, Java's volatile keyword seems to be the perfect fit. However, making the array volatile does not have the desired effect, because it only guarantees visibility of the array reference itself, not of the array's contents.

In case the array's payload is Integer or Long values, you might consider AtomicIntegerArray or AtomicLongArray. But in our case, we want to support arbitrary values, i.e. Objects.

Traditionally, there are two ways in Java to do synchronization: synchronized() blocks and ReadWriteLock. Java 8 provides another alternative called StampedLock. There are probably more exotic ways, but I will focus on these three relatively easy-to-implement and well-understood approaches.

For each approach, I will provide a short explanation and a code snippet for the cache's read and write methods.

Synchronized

synchronized is a Java keyword that can be used to restrict the execution of code blocks or methods to one thread at a time. Using synchronized is straightforward - just make sure not to miss any code that needs to be synchronized. The downside is that you can't differentiate between read and write access (the other two alternatives can). If one thread enters the synchronized block, all others are blocked. On the upside, as a core language feature, it is well optimized in the JVM.

public class Cache {
  private Object[] data;
  private final Object lock = new Object();

  public Object read(int key) {
    synchronized (lock) {
      if (data.length <= key) {
        return null;
      }

      return data[key];
    }
  }

  public void write(int key, Object value) {
    synchronized (lock) {
      ensureRange(key); // enlarges the array if necessary
      data[key] = value;
    }
  }
}

 

ReadWriteLock

ReadWriteLock is an interface. If I say ReadWriteLock, I mean its only standard library implementation, ReentrantReadWriteLock. The basic idea is to have two locks: one for write access and one for read access. While writing locks out everyone else (like synchronized), multiple threads may read concurrently. If there are more readers than writers, this leads to fewer threads being blocked and therefore better performance.

public class Cache {
  private Object[] data;
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  public Object read(int key) {
    lock.readLock().lock();
    try {
      if (data.length <= key) {
        return null;
      }

      return data[key];
    } finally {
      lock.readLock().unlock();
    }
  }
 

  public void write(int key, Object value) {
    lock.writeLock().lock();
    try {
      ensureRange(key); // enlarges the array if necessary
      data[key] = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}


StampedLock

StampedLock is a new addition in Java 8. It is similar to ReadWriteLock in that it also has separate read and write locks. The methods used to acquire locks return a "stamp" (a long value) that represents a lock state. I like to think of the stamp as the "version" of the data in terms of data visibility. This makes a new locking strategy possible: the "optimistic read". An optimistic read means to acquire a stamp (but no actual lock), read without locking and afterwards validate the stamp, i.e. check whether it was OK to read without a lock. If we were too optimistic and it turns out someone else wrote in the meantime, the stamp will be invalid. In this case, we have no choice but to acquire a real read lock and read the value again.

Like ReadWriteLock, StampedLock is efficient if there is more read than write access. It can save a lot of overhead not to have to acquire and release locks for every read access. On the other hand, if reading is expensive, reading twice from time to time may also hurt.

public class Cache {
  private Object[] data;
  private final StampedLock lock = new StampedLock();

  public Object read(int key) {
    long stamp = lock.tryOptimisticRead();

    // Read the value optimistically (may be outdated).
    Object value = null;
    if (data.length > key) {
      value = data[key];
    }

    // Validate the stamp - if it is outdated,
    // acquire a read lock and read the value again.
    if (lock.validate(stamp)) {
      return value;
    } else {
      stamp = lock.readLock();

      try {
        if (data.length <= key) {
          return null;
        }

        return data[key];
      } finally {
        lock.unlock(stamp);
      }
    }
  }

  public void write(int key, Object value) {
    long stamp = lock.writeLock();
    try {
      ensureRange(key); // enlarges the array if necessary
      data[key] = value;
    } finally {
      lock.unlock(stamp);
    }
  }
}


Benchmark

All three alternatives are valid choices for our cache use case, because we expect more reads than writes. To find out which is best, I ran a benchmark with our application. The test machine is an Intel Core i7-5820K CPU, which has 6 physical cores (12 logical cores with hyper-threading). Our application spawns 12 threads that access the cache concurrently. The application is a "loader" that imports data from a database, makes calculations and stores the results into a database. The cache is not under stress 100% of the time; however, it is vital enough to have a significant impact on the application's overall runtime.

As a benchmark, I executed our application with a reduced data set. To get a good average, I ran each locking strategy three times. Here are the results:



In our use case, StampedLock provides the best performance. While a 15% difference to synchronized and a 24% difference to ReadWriteLock may not seem like much, it is relevant enough to make the difference between meeting the nightly batch time frame or not (using the full data). I want to stress that this by no means implies that StampedLock is *the* best option in all cases. Here is a good article with more detailed benchmarks for different reader/writer and thread combinations. Nevertheless, I believe measuring the actual application is the best approach.

Summary

In Java 8, there are at least three good alternatives to handle locking in a concurrent read-write scenario: Synchronized, ReadWriteLock and StampedLock. Depending on the use case, the choice can make a substantial performance difference. As all three variants are quite simple to implement, it is good practice to measure and compare the performance.

Apr 23, 2016

How to use Docker within IntelliJ

A short tutorial on how to use Docker within IntelliJ, with a little help from Gradle. You can find the sample code & the full description here: https://github.com/adersberger/intellij-docker-tutorial

Prerequisites

  • Install the IntelliJ Docker plugin (https://plugins.jetbrains.com/plugin/7724)
  • Check that there is a default Docker Machine: docker-machine ls. If there is no default machine, create one: docker-machine create --driver virtualbox default.
  • Start the default Docker Machine: docker-machine start default.
  • Bind the env vars to the shell: eval "$(docker-machine env default)"
  • Check that everything is correct: docker ps

Using Docker within IntelliJ

1) Set up the Docker cloud provider in the IntelliJ global preferences as shown below.


Tip: You can get the API URL by executing docker-machine ls and using the IP & port shown for the default machine.
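For illustration, the relevant output looks roughly like this (the IP is whatever VirtualBox assigned, so treat it as an assumption); the API URL for IntelliJ is then https://192.168.99.100:2376:

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
default   *        virtualbox   Running   tcp://192.168.99.100:2376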

2) Check the connection to the Docker daemon in the IntelliJ "Docker" tab



3) Create new project from version control using github (https://github.com/adersberger/intellij-docker-tutorial.git)

4) Create a new run configuration to deploy application to Docker as shown on the following screenshots:


Tips: 
  • Don't forget to add the Gradle tasks as a "before launch" action as shown at the very bottom of the screenshot.
  • The same is also possible for Docker Compose files. Just point the "Deployment" dropdown to the Docker Compose file.



5) Run the configuration and inspect the Docker container. A browser will open automagically and point to the REST endpoint. Within IntelliJ you can access the container's console output, environment variables, port bindings, etc.



Mar 18, 2016

Building a Solr-, Spark-, Zookeeper-Cloud with Intel NUC PCs


Part 1 - Hardware

If you work with cluster, grid or cloud technologies like Mesos, Spark, Hadoop, Solr Cloud or Kubernetes as a developer, architect or technical expert, you need your own private datacenter for testing and development. To test real-world scenarios like a failsafe and resilient Zookeeper cluster or a clustered Spark/Hadoop installation, you should have at least three independent machines. For the installation of Mesos/DCOS it is recommended to have five machines in a minimal setup.
There are several ways to build such an environment, each with its own drawbacks:

1) A virtualized environment running on a workstation laptop or PC

You can easily create a bunch of virtual machines and run them on a desktop or workstation. This approach works fine, is fast and cheap, but has some problems:
  1. Your laptop may have only 16 gigabytes of RAM - so each VM could get only 2-3 gigabytes. For frameworks like Apache Spark, which heavily use caching, this does not work well.
  2. The performance of a virtualized environment is not predictable. The problem is that some resources like disk, network or memory access are shared between all VMs. So even if you have a workstation with an octa-core Intel Xeon processor, IO will behave differently.

2) A cloud environment like the AWS EC2

This is the way most people work with these technologies, but it also has some specific disadvantages. If you experience any performance problem, you are likely not able to analyze the details. Cluster software is normally very sensitive to network latency and network performance. Since AWS can't guarantee that all your machines are in the same rack, the performance between some nodes can differ.

3) A datacenter with real hardware

You can build your own cluster, but it is normally far too expensive. And even if you can afford real server hardware, you will have the problem that this solution is not portable. In most enterprises you will not be allowed to run such a cluster. For testing and development it is much better to have your own private cluster that is as portable as your own laptop.


So what is a feasible solution?

I decided to build my own 4 node cluster on Intel NUC mini PCs. Here are the technical facts:
  • NUC 6th Generation - Skylake
  • Intel Core I5 - Dual Core with Hyper-threading 
  • 32 GB DDR4 RAM
  • 256 GB Samsung M.2 SSD
  • Gigabit Ethernet
The Intel NUC has to be equipped with RAM and an M.2 SSD; all these parts have to be ordered separately.

This gives you a cluster with amazing capabilities:
  • 16 Hyper Threading Units (8 Cores)
  • 128 GB DDR4 RAM
  • 1 TB Solid State Disk
Since I needed a portable solution, everything had to be packed into a normal business case. I found a very slim aluminium attaché case at Amazon with the right dimensions to hold the NUC PCs and the network switch.




I decided to include a monitor and a keyboard to get direct access to the first node in the cluster. The monitor is used for visualization and monitoring while the software runs. I ordered a Gechic HDMI monitor which has the right dimensions to mount it in the front of the case.





The NUC package includes screws for mounting, which also work in such a case if you drill a small hole for each screw. For the internal wiring you have to use flexible network cables; otherwise you will get problems with the cabling. You also need a little talent to mount the connectors for power and network in the case, but with a little patience it works.

You can see the final result here:



This case will be my companion for the next year at all conferences and fairs, and even in my office. It is the perfect presenter for any cluster / cloud technology.

In the next part I will describe how to get a DCOS and Solr/Spark/Zeppelin cloud installed and what you can do on top of such hardware.

Have fun. 

Johannes Weigend

Mar 11, 2016

KubeCon 2016: Recap

All things Kubernetes: KubeCon 2016 in London (https://kubecon.io) revealed how attractive Kubernetes is to the community and how fast Kubernetes and its ecosystem are emerging. Evidence: 500 participants, most of them using Kubernetes in dev & production; impressive stats of the open source project and community; and profound talks reflecting real life experiences. My takeaways:

Kubernetes Roadmap 

By the end of March, Kubernetes version 1.2 will be released with the following highlights:
  • New abstraction "Deployment": A deployment groups pod/rc/service definitions with additional deployment metadata. A deployment describes the desired target state of an application on a k8s cluster. When a deployment is applied, k8s drives the current cluster state towards the desired state. This is performed on the server side, not on the client side (unlike in k8s < 1.2).
  • ConfigMaps & Secrets: Kubernetes can now handle configuration files & parameters as well as secrets and certificates cluster-wide. It stores the configs inside etcd and makes them accessible through the k8s API. The configs are exposed by mounting the files into the pods (as tmpfs) and via env vars. They can also be referenced in the YAML files, and they are updated live and atomically (see the kubectl sketch after this list).
  • Brand new web UI: The Kubernetes Dashboard.
  • Improved scalability and support for multiple regions.
  • Better support for third-party extensions.
  • DaemonSet to better support the Sidekick pattern.
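To make the ConfigMap mechanism above a bit more concrete, here is a hedged kubectl sketch (the resource and file names are made up; the commands are available as of k8s 1.2):

$ kubectl create configmap app-config --from-file=application.properties
$ kubectl get configmap app-config -o yaml    # inspect what was stored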
In about 16 weeks there'll be Kubernetes 1.3 with:
  • Better support for legacy applications with mechanisms like IP persistence.
  • Cluster federation (project Ubernetes) to join multiple k8s clusters together.
  • Further improved scalability.
  • Cluster autoscaling (automatically acquiring & releasing resources from the cloud provider).
  • In-Cluster IAM (LDAP / AM integration).
  • Scheduled jobs to better support batch processing on k8s.
  • Public cloud dashboard for Kubernetes-as-a-Service scenarios.
  • ... and more to come / to be discussed in the community.

Hot topics

The hot topics in my opinion were:
  • Higher-level abstractions & tools: Although Kubernetes is a great advance in bridging the gap between devs and ops, there is a need for higher-level abstractions & tools - especially for the devs (quote: "Kubernetes should be an implementation detail for devs"). This is addressed by k8s itself (deployment abstraction) as well as by different approaches like kploy (https://github.com/kubernauts/kploy), Puppet Kubernetes (https://forge.puppetlabs.com/garethr/kubernetes), dgr (https://github.com/blablacar/dgr) or DEIS (http://deis.io). From a high-level point of view, the community is laying bricks on top of Kubernetes, moving towards PaaS.
  • Continuous Delivery: Kubernetes is an enabler of continuous delivery (CD), and developing cloud-native applications on k8s demands CD. There were some industrial experience reports on using Kubernetes as the execution environment for CD workflows. Kubernetes handles scaling the CI/CD servers as well as the application itself. Best practice here is to separate different applications and stages by using k8s namespaces, and to use ChatOps tools like Hubot (https://hubot.github.com) to provide fast feedback to the devs & ops.
  • Stateful services: Kubernetes is great at running stateless microservices. But a lot of applications have to deal with (persistent) state. So how do you run stateful services and even databases on Kubernetes without losing its benefits, or even losing data in case of a re-scheduling? K8s to the rescue! The answer is persistent volumes, which provide cluster-wide, non-ephemeral storage. A couple of different cluster storage providers are available for persistent volumes in k8s: more classic ones like NFS and iSCSI; cloud-native ones like GlusterFS and Ceph; cloud-provider-specific ones for GCE and AWS; and storage abstraction layers like Flocker. The competition is open!
  • Diagnosability: As applications and infrastructure are getting more and more fine-grained and distributed with platforms like k8s, the problem of diagnosing failures and finding optimization potential arises. Time for cluster-aware diagnosis tools like sysdig (http://www.sysdig.org), Scope (https://github.com/weaveworks/scope) and the Kubernetes Dashboard (https://github.com/kubernetes/dashboard)!
Learn more about Kubernetes and other cloud-native technologies on April 21, 2016 at our Cloud Native Night meetup (RSVP), taking place in Mainz alongside the JAX conference (http://www.meetup.com/cloud-native-night).

Dec 30, 2015

WireSpock - Testing REST service client components with Spock and WireMock

In a previous post I wrote about using the Spock framework for the exploratory testing of open source software. In this post I want to showcase a neat technology integration between Spock and the WireMock framework for testing your REST service client components. This is especially useful when testing microservice-based architectures, since you want to test the individual service integrations without firing up all the collaborators.

Introducing WireMock

As stated on its webpage, WireMock is "a web service test double for all occasions". It supports stubbing and mocking of HTTP calls, as well as request verification, record and playback of stubs, fault injection and much more. It actually fires up a small embedded HTTP server, so your code and test interact with it on the protocol level.

The most convenient way to use WireMock in your test cases is via a JUnit 4.x rule that handles the lifecycle of starting and stopping the mock server before and after each test. There is also a class rule available in case it is sufficient to use the same WireMock instance for the lifetime of the whole test case. The official documentation for the rule can be found here.

The good thing is that you can use the WireMock rule in your Spock specification just like in an ordinary JUnit based test. No magic here. Let's have a look at the following example.
class BookServiceClientSpec extends Specification {

    @Rule
    WireMockRule wireMockRule = new WireMockRule(18080)

    @Shared
    def client = new BookServiceClient("http://localhost:18080")

    def "Find all books using a WireMock stub server"() {
        given: "a stubbed GET request for all books"
        // TODO

        when: "we invoke the REST client to find all books"
        def books = client.findAll()

        then: "we expect two books to be found"
        books.size() == 2

        and: "the mock to be invoked exactly once"
        // TODO
    }
}
First, the JUnit WireMock rule is created and initialized to listen on port 18080. Next the REST client component under test is created and configured to access the local wire mock server. The test method itself does not do much yet. For it to work we need to stub the response for the findAll() query and we want to check that the mock has been invoked once. Before we continue, let's have a look at the test dependencies required to compile and run the example.
dependencies {
    testCompile 'junit:junit:4.12'
    testCompile 'org.spockframework:spock-core:1.0-groovy-2.4'

    testCompile 'com.github.tomakehurst:wiremock:1.57'
    testCompile 'com.github.tomjankes:wiremock-groovy:0.2.0'
}

Making WireMock Groovy

The last dependency is a small Groovy binding library for WireMock that plays together nicely with Spock. It allows for a more concise stubbing and verification syntax instead of using WireMock's default static imports API. Have a look at the following example to get the idea.
def wireMock = new WireMockGroovy(18080)

def "Find all books using a WireMock stub server"() {
    given: "a stubbed GET request for all books"
    wireMock.stub {
        request {
            method "GET"
            url "/book"
        }
        response {
            status 200
            body """[
                      {"title": "Book 1", "isbn": "4711"},
                      {"title": "Book 2", "isbn": "4712"}
                    ]
                 """
            headers { "Content-Type" "application/json" }
        }
    }

    when: "we invoke the REST client to find all books"
    def books = client.findAll()

    then: "we expect two books to be found"
    books.size() == 2

    and: "the mock to be invoked exactly once"
    1 == wireMock.count {
        method "GET"
        url "/book"
    }
}
First, we create the WireMock Groovy binding to create stubbed requests and responses. The stub closure takes the definitions of the REST request and response using the WireMock JSON API. As you can see we can even specify the response body as inline JSON multiline GString. Finally, we check that the invocation count for the expected request is correct.

Clearly, specifying the responses inline is not very maintenance-friendly, especially for large response structures. A better alternative is to externalize the response body into a separate file. The file needs to be located in a directory named __files within src/test/resources.
The bodyFileName value is relative to the __files directory, and the file can contain any content. You could even return binary files like JPEGs using this mechanism.
response {
    status 200
    bodyFileName "books.json"
    headers { "Content-Type" "application/json" }
}
A further way of specifying the response body is by using plain Java or Groovy objects that get serialized to JSON automatically.
response {
    status 200
    jsonBody new Book(title: "WireSpock Showcase", isbn: "4713")
    headers { "Content-Type" "application/json" }
}
The stubbing capabilities of WireMock are quite powerful. You can perform different matchings on the URL, request headers, query parameters or the request body to determine the correct response. Have a look at the WireMock stubbing documentation for a complete description of all features.

So there is only one thing left to say: Test long and prosper with Spock!

References