Platform engineering problems: can ops actually do product management?
Product management is a core component of platform engineering, so we'd better make sure organizations can introduce that role into the IT department
Are you at a large organization doing platform engineering? Have you been building and/or using a platform? How are you introducing product management in your operations group?
I want to test a theory that’s come up in my conversations a lot this summer: introducing product management into ops and infrastructure organizations is too difficult. It won’t work. There are teams here and there that can do it, and they show up at conferences. But, when you're proposing that you're going to "change the culture" of thousands of large organizations, it's an impossible task. The people making this argument point to DevOps, and even agile: after all these years, have we really done much, or have we just experienced the bad parts of Larman’s Laws?
That sentiment is pretty bleak!
If you're working in one of these large organizations, how do you start up the product management practices and roles for your platform? Is it working? A further filter is: how many apps are you running on your platform, following platform engineering practices?
I really would like to hear from you, even if it's just references to other people talking about it. But, if you're working on platform engineering in a large organization, I especially want to hear from you. Hopefully, I can get enough responses to write up how people are introducing and doing the product management part of platform engineering.
Now, here’s a long explanation of why I’d like to hear from you:
Product management is what makes platform engineering different than "what we're already doing"
I'm focusing on product management because I believe that practicing product management is what separates platform engineering from "what we're already doing." And, over the past year, this feels like it's emerged as the consensus. Once you make product management part of platform engineering, the phrase moves past being a marketing-driven buzzword that relabels what we already have to get people buying new bottles for their old wine.
Product management wasn't always part of platform engineering. A few years ago when the thought leadership around platform engineering started, "platform engineering" just meant putting an internal developer portal in place (Backstage and friends). Then the platform engineering thought leadership train loaded up "making Kubernetes easier to use for developers." This caused a lot of existential angst from us DevOps and PaaS people, especially when Humanitec declared that DevOps was dead. We were all left wondering: how is this different than what we've been talking about for 15+ years? That is: what we’re already doing.
Once the 10+ year old idea of "platform as a product" was reintroduced into the platform conversation, "platform engineering" became a new enough thing that it was worthy of having its own name. It became a thing.
In my world of pre-Kubernetes PaaS, I've worked with large organizations who've been practicing platform as a product for many years, using Cloud Foundry as their platform. You could say that platform engineering is “just” a re-labeling of platform as a product, but I think it’s ended up being more than that. Platform engineering wants to do platform as a product with Kubernetes, not with existing PaaSes.[1]
Platform engineering focuses too much on Kubernetes
This is the second aspect that I think makes platform engineering a real thing: it means using Kubernetes as the basis for your platform. I don't think this is good, and I’d rather we change it so that it doesn't matter what CaaS/IaaS you use. But, that’s currently platform engineering as she is spoken.
Like I said, for 10+ years, there have been a lot of big enterprises that have used, and continue to use, Cloud Foundry to run thousands upon thousands of real-world applications. And, there are other platforms out there, not to mention all of the VM-based ways of running apps, which seem to be how the majority of apps are run. There is always a platform, whether you know it or not. And if you don’t know it, it usually means it’s an accidental platform, and there are hundreds of them in your organization, which is very much not good.
But, the people building and talking about platforms now want Kubernetes, it seems. They just assume there is nothing else. This has been a 7-year distraction from improving how organizations develop and run software, and a great example of how we tech people get too focused on using new and interesting tools for their own sake. And, in that way, a triumph of thought leadership and devrel.
Over those years, after we'd finally focused everyone on the PaaS layer, re-focusing on the CaaS/IaaS layer has sacrificed the never-ending task of improving the "business outcomes" of better developer productivity and all the -ilities in production. You know: becoming an "elite performer" in DORA terms.[2]
(Alright. I've tried to write this post a few times and it always goes dark and negative. So I'll stop it with the "you kids get off my lawn" existential crap.)
So, that's where we are with platform engineering: it's applying product management to building the platform, and building the platform with Kubernetes.
The two platform approaches we'll end up with
There are all sorts of people working on solving the Kubernetes problem(s): that it's complex, you don't want to expose developers to it, and you have to go to the buffet and assemble and then care for your own platform. Kubernetes is not a ready-to-use, out-of-the-box platform. Indeed, when you look at the CNCF cloud native platform definition, Kubernetes doesn’t even show up!
Following historic examples, the problem with building a platform from Kubernetes will resolve itself in two ways:
The Overlay: platform as abstraction layer
There will be a few winners in the "wrap Kubernetes/whatever in a layer to hide it from the users" approach. This is what we're trying to do with the App Engine/Spaces framework, and, as I shallowly understand it, it's what Syntasso/Kratix is trying to do. You’re essentially saying “the APIs and config for Kubernetes aren’t tuned for platform engineers’ needs, so we’re going to make ones that are, and do all the glue work to integrate that back into Kubernetes.” This is one of the most popular patterns in computering: adding an abstraction layer to make the wrapped layer easier to use.
This platform building pattern is trying to give users (platform engineers) the ability to customize the platform to their special needs[3] while still avoiding building everything from scratch, the “DIY platform.” To use PaaS-talk, this approach allows you to form your own opinions rather than (be forced to) use the opinions of your pre-built platform.
Platform engineers define the "API" of the platform components and can then build the platform out of those components. You can also throw in promise theory, contracts, and some aspects of negotiated platforms and SLOs from SRE-think. Colin Humphreys made a good pitch for this approach recently, and I’ve been dragging my feet on interviewing the Tanzu people who’re working on this approach.
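To make the overlay idea a little more concrete, here is a minimal sketch, assuming a made-up developer-facing spec. `AppSpec` and `to_deployment` are hypothetical names, not the API of Kratix, Spaces, or any other product. The platform team owns the small "API" (the few fields developers fill in) and the glue that turns it into a standard Kubernetes `apps/v1` Deployment manifest, so developers never touch the Kubernetes details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical developer-facing "API" for a platform overlay.

    Developers fill in only these fields; every other setting is a
    decision the platform team makes inside to_deployment().
    """
    name: str
    image: str
    instances: int = 2      # platform-chosen default
    memory: str = "512Mi"   # platform-chosen default


def to_deployment(app: AppSpec) -> dict:
    """Glue code: translate the overlay spec into a Kubernetes Deployment.

    Returns a plain dict shaped like an apps/v1 Deployment manifest,
    which could then be serialized to YAML or sent to the API server.
    """
    def labels() -> dict:
        # Separate copies of the label map, so YAML serializers don't
        # emit anchors/aliases for a shared object.
        return {"app": app.name}

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app.name, "labels": labels()},
        "spec": {
            "replicas": app.instances,
            "selector": {"matchLabels": labels()},
            "template": {
                "metadata": {"labels": labels()},
                "spec": {
                    "containers": [
                        {
                            "name": app.name,
                            "image": app.image,
                            "resources": {"limits": {"memory": app.memory}},
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = to_deployment(
        AppSpec(name="checkout", image="registry.example.com/checkout:1.4")
    )
    print(manifest["spec"]["replicas"])  # prints 2, the platform default
```

The code is trivial on purpose: every field left out of `AppSpec` is a choice the platform team made on developers' behalf, and deciding which fields to expose, and when to add new ones, is exactly the product management work this approach requires.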
We've seen what this ends up looking like, good and bad, in historic examples, so we can start to think through some long-horizon strategy moves.
DevOps 1.0 was all about creating an abstraction layer for the mess of configuration and release management with Chef, Puppet, Ansible, Salt, etc. Years later, DevOps got bogged down in solving ops tooling and culture problems (which is great!) and rarely got up to the application developer layer, but the intention was (usually) always there!
Then there's our current example of Kubernetes. The point of Kubernetes was to displace AWS as the standard IaaS model by creating an open standard for how infrastructure was represented, managed, and used. To use the Kubernetes term, it was to create a new “API” for IaaS. It worked!
Like DevOps, Kubernetes also stalled out on the way to delivering a better developer experience. And, in fact, the Kubernetes creators eventually backed off from that ever having been a goal.
For both Kubernetes and DevOps, making the “inner loop” (if you remember that term) better was, perhaps, never the point, and thinking otherwise resulted in over-inflated expectations.
I think what the people working on the platform as abstraction layer approach want is what we hoped the Kubernetes API would be: much higher up the stack, even developer facing, and including a governance framework. Essentially, a developer-ready system with all the enterprise grade blah-blah tools.
This is great! It'll be fantastic if it works!
What’s important is that you have to product manage all of this to figure out (1) what those overlays are, (2) how you customize the overlays for your environment, (3) how you pick and choose which overlays to assemble into a platform, and (4) when you add in new functionality.
The vendor and open source community around the overlay will help with some of that, especially numbers one and four. The vendor/community will gladly tell you what it thinks the defaults should be and even provide you an out-of-the-box, ready-to-use enterprise grade blah blah platform (see the next section) based on those overlays.
But, the whole point of the platform as abstraction layer is customizing the platform to the user’s needs, so, like, the user needs to do that. And product management is how you do that.
PaaS is cool again
Since 2007 we've been through several cycles of people trying to build their own Heroku.
It goes like this:
A new platform comes out that makes it easy for developers to build their app, connect the app to a database, etc., deploy the app to production, and scale it for performance needs all on their own, self-service, using the latest frameworks and services.
It only runs in the public cloud.
Large organizations initially reject it for two reasons: (a) it actually will not scale, and (b) it needs to be on-premises for very important enterprise reasons. "We want that awesome developer experience and velocity," the large organizations say, "but, uh," looks down at notes, "I was told that we work in a 'highly regulated industry.'"
This brings in phase two of the cycle: vendors make on-premises platforms. Strangely, maybe even heroically, Heroku never entered this phase. But, you saw it with things like the container wars of the 2010s, which drove on-premises platforms like Cloud Foundry.
Our attention goes back down the stack to infrastructure instead of the fully built PaaS. PaaS is no longer cool. This last phase of the cycle is usually caused by a disruptive technology coming along and pulling the user and buyer's attention away from the now boring platform that just works. Docker was the first to disrupt PaaSes in the mid-2010s, and then Kubernetes came in. In both cases, the user assumption was that each was a viable PaaS replacement. But, as is typical of capital-D Disruption, neither was a full-on replacement for all the enterprise grade blah blah. And yet, these less feature-ful disruptors drove organizations away from the PaaSes that they once loved, e.g., Heroku and its children.
I've made a satire out of that cycle and the thousands of highly paid professionals who make those decisions along the way.
But, I mean: I'm not sure why people don't just use Heroku or its many on-premises focused descendants like Cloud Foundry.
There's "fashion": people wanting to use something new. I've heard this sentiment many times recently: "Our PaaS is great, but if I don't learn Kubernetes, it'll be harder to find a new job. So, we need to migrate to Kubernetes." With enterprise Kubernetes build-out just at the beginning, that sentiment is optimized for the individual, but not for the organization that already has something in place that works.
Other than "fashion," I think what drives people away from PaaSes is:
Pricing: the first grumblings about Heroku were that it was awesome, but expensive. This has been a common sentiment about all PaaSes. Showing the value of any platform is hard, so it always looks expensive when you start running thousands of applications. All you see is a price, and linking it to the revenue that those thousands of apps drive is not a normal way for traditional organizations to think. They treat IT as a cost center, not a part of the business. So when someone comes and says "we could build that platform ourselves and remove however many millions in licensing/cloud cost," management gets excited.
Customization: difficulty in customizing or "swapping out" components, and a lack of new features. That last one is what introducing product management into platforms is trying to solve: you're actually supposed to talk with developers and deliver new platform features that solve their problems and make them happier.
In this part of the cycle, the Kubernetes problem is solved by hiding or even removing Kubernetes. It doesn’t matter what CaaS/IaaS is at the bottom of the PaaS. Maybe there are even two, or three! You might even introduce a totally new compute, uh, “paradigm.” Maybe serverless will finally fulfill those Wardley-dreams! Unikernels, WASM, whatever! Just a few weeks ago my mind was thoroughly exfoliated in the sauna with the idea of Isolate Cloud. ANYTHING COULD HAPPEN.
The role of product management here is different. If you’re using a PaaS, you’re not given a lot of tools to do all of your own customization. You rely more on the PaaS vendor and community to do much of that work. The overall community does most of the product managering, which means it needs to do a lot of it. It also means you need to upgrade the PaaS when there are new versions. That is a big challenge: people at large organizations don’t like upgrading their stacks.
Maybe one way of looking at it is that a PaaS outsources platform product management. More likely, what you’re product managing (as with the platform as abstraction layer) is the selection and assembly of pre-built components.
Without product management, platform engineering will struggle
This is why I've typed this far: the long-term success of platforms relies on product management. Once you stop adding new features to the platform, people look for new options. If you can't customize it to your needs (real or just made-up enterprise blah blah), you'll start the platform cycle all over again. This means you don't get the full benefits of many years of platforming. You'll start neglecting what you have, focus on migrating your existing apps to a new platform, and experience the ups and downs in achieved benefits that show up in surveys of new tech usage. ROI depends on years of payback, and if you restart in the middle of that period, the math doesn't work.
So! That's what’s driving my interest in learning how organizations are introducing and sustaining product management in their platform groups.
How's it going for you?
Logoff
Colophon
Three things got me thinking about the above, which I want to call out:
First, this from Figma’s write-up of migrating to Kubernetes:
Having users define services directly in YAML can be confusing. Instead, we worked to define a golden path for users and allow customization for special cases. By being explicit about what users can and should customize—and otherwise enforcing consistency by default—you’ll save users time and energy while also simplifying maintenance and future changes.
That seems like a pretty compact explanation of the developer-facing goals of platform engineering.
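Figma's description maps onto a small, familiar pattern: platform-owned defaults, an explicit list of what a service may change, and a hard error for everything else. Here is a minimal sketch, assuming made-up field names; `GOLDEN_PATH_DEFAULTS`, `CUSTOMIZABLE`, and `resolve_service_config` are hypothetical and not Figma's actual implementation.

```python
# Hypothetical golden-path defaults chosen by the platform team.
GOLDEN_PATH_DEFAULTS = {
    "replicas": 3,
    "cpu": "500m",
    "memory": "1Gi",
    "log_format": "json",
    "health_check_path": "/healthz",
}

# Fields users are explicitly allowed to customize. Everything else
# stays consistent across services by default.
CUSTOMIZABLE = {"replicas", "cpu", "memory"}


class CustomizationError(ValueError):
    """Raised when a service asks for something off the golden path."""


def resolve_service_config(overrides: dict) -> dict:
    """Merge a service's overrides onto the golden-path defaults.

    Unknown fields and non-customizable fields are rejected rather than
    silently accepted, so special cases stay visible to the platform team.
    """
    unknown = set(overrides) - set(GOLDEN_PATH_DEFAULTS)
    if unknown:
        raise CustomizationError(f"unknown fields: {sorted(unknown)}")
    forbidden = set(overrides) - CUSTOMIZABLE
    if forbidden:
        raise CustomizationError(
            f"not customizable on the golden path: {sorted(forbidden)}"
        )
    # Build a new dict; the shared defaults are never mutated.
    return {**GOLDEN_PATH_DEFAULTS, **overrides}


if __name__ == "__main__":
    print(resolve_service_config({"replicas": 5}))
```

Which fields go into `CUSTOMIZABLE`, and when that set grows, is a product decision rather than a purely technical one.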
Second, as referenced many times above, Colin’s excellent “let’s make better mistakes this time” piece for platform engineering enthusiasts. I like that he managed to sneak in a link to an old version of a Pivotal paper on platform engineering.
Third, Betty announced that she’s the new CMO at Heroku. When I was looking around at new job opportunities this summer, almost every person told me I should talk to the new people at Heroku. They’re trying to get the band back together, etc. I don’t really know anyone there, and I decided that my current job is just fine. But, it’s exciting to think that Heroku will start doing more in the platform area. I think the timing is right for a public-PaaS-only play: you’d be cutting out a lot of enterprise customers, but over the next five years, I think all that enterprise blah blah will be less of a barrier. I mean, that’s what all the CIOs are saying: in three years they’re planning to almost double the amount of workloads in public cloud. Maybe they’ll actually do it this time.
While writing this, I came across this huge write-up of platform engineering ahead of releasing a new book on the topic. It’s great! I’m looking forward to that book.
Also, we have our big annual conference next week. I’m thinking we’ll have some opinions about the above. You can watch some of it as a live stream, or scrounge for coverage and videos later. I’ll write up anything relevant here. Probably.
There’s something I chopped out of my initial, thankfully thrown-away draft of all this: we don’t really talk about “operator experience” in platform engineering. It’s very developer focused. This is fine! But, more than likely (as with DevOps), the story will turn inward to making the tools and lives of platform engineers better. Put another way, we’ve long had the 12-factor app manifesto, but what is the other side of that, the 12-factor platform manifesto?
[1] I realize this distinction is weird and fuzzy. In my version of things, if Kubernetes hadn’t disrupted the 2010s PaaS cycle, those PaaSes would have persisted and we’d never have introduced “platform engineering” as a concept. Instead, we’d just go along calling it “platform as a product” (which is circa the late 2010s as estimated by one of the people who worked on it back then). Anyhow, let’s see where this weird fuzzy assertion takes us.
[2] Here, I have the feeling I’m throwing out some of the baby with the bathwater. I don’t have first-hand experience with the baby to comment on it. As I get into above, I think Kubernetes did what it set out to do. But, my theory is that the users made a mess out of the bathwater because they thought the baby did more than it actually set out to do. At this point, it doesn’t matter anymore. That’s all yesterday’s shit-posting.
[3] …which I think are mostly made up: “customization” is a synonym for “tech debt.” When you talk with hundreds of enterprises who tell you about their custom needs, you soon learn that everyone has the same custom needs and, thus, they are not custom.