Monitoring and Managing Istio Service Mesh
As application architectures evolve from monoliths to microservices, two areas I have been spending a considerable amount of time on are Observability and Service Mesh. Speaking with customers and developers, the main challenge I've heard is managing the many dependencies and communication pathways that can exist between services; these get difficult to govern and observe as they grow in scale. The first issue is where Service Mesh comes into the picture. I'll provide a brief introduction to Istio below, along with some blog links you can refer to for more details.
On the topic of Observability, we often find that SRE teams need to refer to more than one console or solution stack to get the visibility they need. We've seen teams using a combination of Prometheus for metrics, Zipkin for tracing, and a set of infrastructure monitoring tools thrown in to get a view into the stack. The context switching and the playbooks that SREs need to build to keep track of all of these systems become extremely cumbersome over time. On top of that, maintaining the systems and infrastructure behind these monitoring tools becomes a challenge of its own.
In this blog, we’ll explore the following:
- What is Istio Service Mesh, and where can you find resources to learn more?
- Monitoring Applications that are using Istio Service Mesh
What is Istio Service Mesh?
Istio is an open-source service mesh project that makes it easy to secure, configure, and monitor the services that make up an application. With a Service Mesh and Istio, the non-differentiated services (think networking, security, load balancing, etc.) are abstracted into sidecar proxies that deliver them. This layer is what ultimately takes care of service-to-service communication, observability, and resiliency.
With Istio as part of your application, you can collect metrics that help you understand both the application-centric metrics and how the service mesh itself is performing. Although this Service Mesh layer doesn't come with out-of-the-box tools for visualization and monitoring, there are open-source projects that aid in these areas. You might have also heard of something called Envoy in this space, so how does that relate here? Envoy was originally built by Lyft; it is a high-performance proxy that is used as the forwarding engine for all services in the service mesh, and it is deployed as a sidecar to the service in the Kubernetes pod. To put it simplistically, Istio represents the control-plane functionality whereas the Envoy proxy represents the data-plane components.
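To make the sidecar model a bit more concrete, here is a minimal sketch of how Envoy ends up next to each application container on Kubernetes; the namespace name is just an example:

```
# Enable automatic Envoy sidecar injection for a namespace
# ("default" is just an example namespace)
kubectl label namespace default istio-injection=enabled

# After deploying a workload, each pod should show 2/2 containers ready:
# the application container plus the istio-proxy (Envoy) sidecar
kubectl get pods
kubectl describe pod <pod-name> | grep istio-proxy
```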
Istio and Envoy are ultimately building blocks which have to be combined with metrics platforms that provide a holistic look into the service mesh and beyond. Adopting this methodology is a deviation from traditional monitoring approaches, but this also presents an opportunity to dive into a more full-stack approach to observability.
Before we dive into the visualization topic, I'd recommend some further reading, since we've barely scratched the surface here. Below are some links for reference:
http://www.routetocloud.com/2019/01/the-service-mesh-mystery/
http://cloud-abstract.com/service-mesh-is-just-another-form-of-virtualization/
And of course the official documentation: https://istio.io/
Monitoring Applications that are using Istio Service Mesh
As I mentioned above, providing a singular view into infrastructure and application-level metrics is essential for microservices operations. To that end, we need a platform that can ingest, analyze, and visualize metrics, histograms, and traces for modern applications.
In this blog, we will be leveraging Wavefront by VMware to look at an application that runs on Google Kubernetes Engine and uses Istio Service Mesh. One of the best-kept secrets about Wavefront is that there are several hundred out-of-the-box integrations. These cover all the major cloud providers (AWS, Azure, GCP) and common application frameworks, and include an adapter for Istio.
We’re going to be following the steps that are outlined here in the Google Cloud CodeLabs series for Istio. https://codelabs.developers.google.com/codelabs/cloud-hello-istio/index.html?index=..%2F..index#0
The only change I chose to make was in Step 5. Instead of using Istio 1.0.0, I chose Istio 1.1.2, which was the latest at the time of writing. Please check for the latest version if you choose to use the same steps.
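If you want to pin the version when downloading Istio, something along these lines works, assuming the download helper script honors the ISTIO_VERSION variable as it did in the Istio docs at the time:

```
# Fetch a specific Istio release rather than the version baked into the codelab
curl -L https://git.io/getLatestIstio | ISTIO_VERSION=1.1.2 sh -
cd istio-1.1.2
export PATH=$PWD/bin:$PATH   # makes istioctl available on the PATH
```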
The steps outlined there will deploy one of the most widely used sample applications - BookInfo. Details on the Bookinfo app and its different services are available here: https://codelabs.developers.google.com/codelabs/cloud-hello-istio/index.html?index=..%2F..index#4
To confirm that all the services and pods are running, you can see the output below:
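Assuming the Bookinfo app landed in the default namespace, a quick check looks like this:

```
# Check that the Bookinfo services and pods are up in the default namespace
kubectl get services
kubectl get pods   # each pod should report 2/2 containers (app + Envoy sidecar)
```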
Something that caught me by surprise was how many resources this little application with Istio actually deploys. If you run the command below, you'll see a full list of all the resources, including the Istio components, kube-system, etc. Just keep an eye on the $$ :)
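A command along these lines will show the full footprint:

```
# Everything the lab actually created, across all namespaces
# (istio-system, kube-system, and the application itself)
kubectl get all --all-namespaces
```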
Once the application is deployed, we'll primarily look at three items here:
- How to set up the Google Compute integration in Wavefront
- Gather the Kubernetes metrics
- Use the Wavefront adapter for Istio
- Setting up the Google Compute Integration
Simply follow the setup instructions in the GCP tile under Integrations. There are a couple of steps required in the GCP console: first, create a service account, and second, generate a private key for it. Once you have those details, go through the setup wizard. It takes roughly five minutes after setup for the metrics to start streaming in from GCP.
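If you prefer the CLI over the console, the equivalent steps look roughly like this; the service account name, project ID, and role are placeholders, and the integration tile spells out the exact permissions it needs:

```
# Create a dedicated service account for the Wavefront GCP integration
gcloud iam service-accounts create wavefront-integration \
  --display-name "Wavefront integration"

# Grant read access to monitoring data (example role; check the integration docs)
gcloud projects add-iam-policy-binding MY_PROJECT_ID \
  --member "serviceAccount:wavefront-integration@MY_PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/monitoring.viewer"

# Generate the JSON private key that the Wavefront setup wizard asks for
gcloud iam service-accounts keys create wavefront-key.json \
  --iam-account wavefront-integration@MY_PROJECT_ID.iam.gserviceaccount.com
```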
Once you have the GCP integration set up, we're ready to start leveraging one of the pre-created GCP dashboards. As you can see, there are a number of options to choose from; the one that interests me the most at this time is the Google Container Engine dashboard, which helps monitor GKE metrics.
The GKE dashboard provides complete visibility into each level of the GKE cluster, including cluster, node, namespace, and pods. As you can see, this really was as easy as it gets. You can get pretty granular into each area, such as memory and CPU utilization, and also look at the health of the underlying instances. A few screenshots below address these specific areas.
- Gather the Kubernetes Metrics
In order to gather Kubernetes metrics from GKE, we'll be leveraging the new Wavefront Kubernetes Collector to grab detailed resource metrics about the containers, namespaces, and pods, and send them directly to the Wavefront service. We also have the option to send metrics through a proxy. In addition, we're going to be collecting kube-state metrics using the kube-state-metrics service. You can find detailed installation instructions here: https://github.com/wavefrontHQ/wavefront-kubernetes-collector
The TL;DR version is to clone the repo and set two parameters for direct ingestion: the Wavefront instance and the API token in the collector deployment YAML file.
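To give a sense of what that looks like, here is a rough excerpt of the relevant part of the deployment spec; YOUR_INSTANCE and YOUR_API_TOKEN are placeholders, and the exact flag syntax should be taken from the collector's README:

```yaml
# Excerpt from the collector deployment spec (illustrative; flag names may vary by version)
containers:
- name: wavefront-collector
  image: wavefronthq/wavefront-kubernetes-collector:latest
  command:
  - /wavefront-collector
  # Direct ingestion: send metrics straight to the Wavefront service
  - --sink=wavefront:?server=https://YOUR_INSTANCE.wavefront.com&token=YOUR_API_TOKEN
```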
Once you've done that, deploy the YAML manifests. If you do not see metrics in the Kubernetes dashboard, it's good to check the logs from the collector and proxy pods.
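Sketching that out (the paths and namespace depend on how the repo lays out its manifests):

```
# Apply the collector (and kube-state-metrics) manifests from the cloned repo
kubectl apply -f deploy/kubernetes/

# If metrics don't show up, look at the collector pod logs first
kubectl get pods --all-namespaces | grep -i wavefront
kubectl logs <wavefront-collector-pod> -n <namespace>
```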
There are some out-of-the-box Kubernetes dashboards readily available, as you would expect. These are primarily for monitoring Kubernetes metrics across all namespaces; you can filter down by namespace, monitor kube-state metrics, and also monitor the Wavefront Collector's own metrics.
The Kubernetes metrics dashboard provides cluster-level metrics, which include node, namespace, pod, and container metrics.
Personally, I find myself more often in the kube-state metrics dashboard, looking at detailed metrics on the cluster such as deployments, node conditions, and containers per pod. I've often found it useful to check restarts per container to find containers that are stuck or in a restart loop. This dashboard is extremely useful for cloud operators, giving them a holistic view of the cluster and the infrastructure metrics around it.
- Use the Wavefront adapter for Istio
In the last section of this blog, we'll look at how to set up the Wavefront Adapter for Istio to send metrics to the service, either through the proxy or via direct ingestion. As before, we're showing direct ingestion, so we're not going through the proxy installation. The preferred way to deploy the adapter is by using the Helm chart. If you prefer to deploy manually with standard kubectl, that is totally fine as well. The steps for either approach are well documented here: https://github.com/vmware/wavefront-adapter-for-istio/tree/master/install/wavefront#quick-start
As before, the two main things to set are the Wavefront instance name and the API token. There are also attributes we can set to help identify the source cluster; here I've set that to hello-istio.
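For reference, a direct-ingestion install via Helm looks roughly like this; the value keys and release name are illustrative, so check the quick-start linked above for the exact names (Helm 2 syntax shown):

```
# Illustrative Helm install of the Wavefront adapter for Istio with direct ingestion
helm install install/wavefront/ --name wavefront-adapter \
  --set credentials.direct.server=https://YOUR_INSTANCE.wavefront.com \
  --set credentials.direct.token=YOUR_API_TOKEN \
  --set metrics.source=hello-istio
```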
Once you've gone through the deployment, head over to the Istio dashboards to explore both the Istio dashboard and the Istio Adapter dashboard, which helps monitor the adapter's internal metrics.
The Istio dashboard provides real-time visibility into key metrics such as request/response and operational metrics.
Once we've collected the Istio metrics, we can very easily create a single dashboard that pulls relevant metrics from GCP platform metrics and Kubernetes metrics through to Istio, visualizing end to end how requests flow between services, network activity, and resource consumption from both an application and an infrastructure perspective.
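As a sketch of what the charts on such a dashboard could be built from, here are a couple of illustrative Wavefront queries; the metric names are assumptions, so confirm them in the metrics browser for your environment:

```
# Kubernetes level: CPU usage per pod on the demo cluster (metric name is illustrative)
ts("kubernetes.pod.cpu.usage_rate", cluster="hello-istio")

# Istio level: request count reported by the Wavefront adapter (metric name is illustrative)
ts("istio.requestcount*", source="hello-istio")
```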
Wavefront also provides scalable Distributed Tracing for Istio and Envoy. With a few commands, traces can be redirected from Istio to Wavefront; once they are, Wavefront provides a singular view into tracing as well. You can find more details on Distributed Tracing in the blog here: https://www.wavefront.com/scalable-distributed-tracing-for-istio-envoy-service-mesh/
Summary
In summary, Istio and Service Mesh help offload common services to dedicated components responsible for delivering them, decoupling them from the developers' business logic and code. This also allows us to collect fine-grained metrics and dynamically modify routing flows without interfering with the pod software. This is very much in line with how Wavefront approaches the cloud-native monitoring space, with its minimalistic approach to instrumenting the application for metrics.
Have you tried Istio/Service Mesh and what other topics would you like to see covered? Drop us a line below. Till next time…
Prabhu Barathi, Cloud Native Advocate | VMware