The infrastructure drives the app architecture
A cloud native application is typically designed as a bunch of little components that coordinate with each other over a network. They may use events instead, and while that isn’t the same as point-to-point network communications, it follows the same idea: you have a bunch of independent-ish bundles of code that work together, as needed, instead of just one big chunk of code that does all the work. This is, you know, a distributed application. “Message passing” is one of the dreams of object oriented programming and Internet apps.1
Why you use a service mesh
Anyhow, if you’re doing all of that, you need a way to manage all that network traffic. Each little bit of code has to know how to contact the other bits of code and work with them - so-called “east-west traffic.”2 You need a registry that catalogs all those bits of code. You need to know information about each chunk of code: the version, how to connect to it, how to authenticate with it. You need to somehow make a call over the network, that is, get a network connection. You want it to be secure and encrypted, like, always nowadays (I don’t really know what mTLS is, but EBC decks are fucking rife with it, so it must be great). And then the people running that network want to manage it: if some chunks of code are too chatty and filling up your series of tubes with too much crap, you want to throttle them. You want to gather metrics about your series of tubes and the messages sent down them. You know: network management. And, when you’re using it with Kubernetes, you want it to all think like and work with Kubernetes: how you configure and deploy it (yaml!), how configuration is rolled out, and how drift is dealt with. Etc. Etc. (Check out Ivan McPhee’s service mesh overview for a lot more details and the vendors in the space.)
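To make that list a little less hand-wavy, here is a toy sketch in Python of three of those jobs: the registry, the throttling, and the metrics. This is not how any real service mesh is built (real ones tend to sit in proxies next to your code, and this one does no networking or encryption at all). Every name in it (ToyMesh, ServiceRecord, TokenBucket, the service names and addresses) is made up for illustration, and the throttle is a plain token bucket, which is just one common way to do rate limiting.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    """What the registry knows about one chunk of code (hypothetical fields)."""
    name: str
    version: str
    address: str  # how to connect to it
    auth: str     # how to authenticate with it, e.g. "mtls"


class TokenBucket:
    """Token-bucket throttle: refills at `rate_per_sec`, holds at most `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: int, now: float = 0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, then spend one token if available.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ToyMesh:
    """Registry + per-caller throttling + call metrics. No real networking."""

    def __init__(self):
        self.registry = {}
        self.buckets = {}
        self.metrics = Counter()

    def register(self, record: ServiceRecord, rate_per_sec: float = 1.0, burst: int = 2):
        self.registry[record.name] = record
        self.buckets[record.name] = TokenBucket(rate_per_sec, burst)

    def call(self, caller: str, target: str, now: float):
        """Return the target's record if the call is allowed, None if throttled."""
        bucket = self.buckets.get(caller)
        record = self.registry.get(target)
        if bucket is None or record is None:
            self.metrics[(caller, target, "unknown")] += 1
            raise LookupError(f"{caller} -> {target}: not in the registry")
        if not bucket.allow(now):
            self.metrics[(caller, target, "throttled")] += 1
            return None
        self.metrics[(caller, target, "ok")] += 1
        return record


if __name__ == "__main__":
    mesh = ToyMesh()
    mesh.register(ServiceRecord("cart", "1.2.0", "cart.internal:8443", "mtls"))
    mesh.register(ServiceRecord("pricing", "3.0.1", "pricing.internal:8443", "mtls"))
    for t in (0.0, 0.0, 0.0, 1.0):
        print(t, mesh.call("cart", "pricing", now=t))
    print(dict(mesh.metrics))
```

The clock is passed in as `now` rather than read from the system so the throttling behavior is easy to reason about: a chatty caller burns through its burst, gets told no, and gets a token back as time passes.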
What drives me bonkers about this is that, like, this is what the Internet does. Why don’t we just use Internet primitives to do all of this? Why do we need to layer a whole new network management layer on top of all the layers? Even more maddening, when you go up the stack into the application layer, the developers there have written all of their own stuff that handles all this functionality. You look at something like the projects in Spring Cloud and they’re, you know, doing all of this too. I’ve started to think that each of these layers happens because the people in the layers above you don’t want to talk with the network admins.
Anyhow, back to service meshes. They are handy! They do important things! For example, they help you run your applications across multiple clouds and Kubernetes clusters (is that the right phrasing?), add in customized layers of security, and so forth. Big ol’ enterprises need all of this. I mean, everyone does.
So, what’s up with the whole category of service mesh? Well, Gartner is not so hot on it:
The hype around service mesh software has mostly settled down, and the market has not grown as much as was once anticipated. This raises questions about the usefulness and ROI of service meshes for most organizations. “Market Guide for Service Mesh,” August 2nd, 2023, Gartner.
The report notes that service meshes are used outside of Kubernetes as well. It’s like a whole new marbling of a layer around and inside your existing layers, be they VMs or containers. Yay…? Ivan’s take is a little less dire, simply urging you to take it slow before choosing which service mesh to use:
Avoid adopting a service mesh based purely on consumer trends, industry hype, or widespread adoption. Instead, take the time to understand the problem you’re trying to solve. Explore the potential tradeoffs in terms of performance and resource consumption. Evaluate your support requirements against your in-house resources and skills (many open-source service meshes rely on community support). Once you’ve created a short list, choose a service mesh—and microservices-based application development partner—that works best with your software stack. Ivan McPhee, GigaOm, August 2023.
Filling in the gaps
When I first heard about the notion of a service mesh long ago, my first reaction was basically “wait, I thought Kubernetes already did that?” This was the first in a long series of that reaction over the years. It turns out Kubernetes didn’t do a lot of the things I assumed it did. This was an instance of confusing outcomes with capabilities: for all the praise Kubernetes gets for improving operations and developer productivity, I’d assumed it, like, had those capabilities. But, in fact, many of the outcomes Kubernetes achieves are done by layering in all sorts of other projects, products, and ways of working.3 Ivan’s report does a good job cataloging all those capabilities: your eyes can start to glaze over after a while, so be sure to read the vendor profiles in reverse alphabetical order!
So, you need a service mesh to get all of that basic, distributed app functionality. This is fine! That’s how Kubernetes was designed, whether the overall community over the years treated it as such or not: “platform for building platforms,” “a life of its own,” and all that.
That Gartner report identifies a key trend in the ongoing rollout of Kubernetes. People don’t want to pay for things, and this leads to a lot of unplanned-for work on their part to integrate all the free components together and deal with them:
The current service mesh market is largely dominated by open-source offerings such as Consul, Istio and Linkerd. However, Gartner client inquiries about service meshes consistently show open-source service meshes suffer from difficulty of use, and a lack of sufficient skills for effective engineering, administration and operational upkeep. The lack of mature DevOps practices can increase the operational burden. These challenges substantially increase as the number of deployed container pods and services grows exponentially, especially in a multicloud environment.
Hey, you get what you pay for. For vendors, this does mean one important product management and strategy decision: you need an easy to download, easy to get up and running, and totally free on-ramp to your paid-for product. I mean: that’s just late-2000s open core and early public cloud basics, right?
That Gartner report is good reading if you have access to it.
