The compliance death curve is an evolving concept I’ve been working on to explain how companies resist compliance, governance, and standardization efforts, which are critical to platform teams and infrastructure operations.
Today we try to decompose some of the mathematics I’ve been using into more universal, more easily understood components. We also built a compliance flywheel that I found fascinating; you can see an example of that work in our podcast description.
It may also help to check out my previously released compliance death curve talk.
TechOps series episode 3 covers how to automate against APIs. We discuss the ways you can use APIs effectively and the ways you can run into trouble. We also discuss how we should consume APIs, both as consumers and as producers of APIs. Many of the ideas come from watching how people consume our own APIs and learning what we can do to make them better and safer.
Enjoy this broader TechOps series, where we dive deep into tips and techniques that improve your journey as an Automator.
Our mini episode today is a short discussion of API delineation and abstractions for platform engineering.
This was a short intro discussion, and it is especially interesting because platform engineering is a major topic we will be exploring in the coming year. We highlight the challenges of finding the right abstraction points as well as building front end and back end automation.
Platform engineering is a topic that seems to be generating a lot of interest going into 2023. It’s sure to be one of those things that enterprises spend a lot of time arguing about and telling each other that they’re doing it wrong.
In this podcast, we dissect why platform engineering seems to be so controversial, and what we can do to help make it more understandable.
We break it down into DevOps components, team components, Dev components, and operations components, and ultimately talk about the long-term trajectory of where all this is heading.
In the December 13th DevOps Lunch and Learn on the Cloud 2030 podcast, Rob Hirschfeld explores the concept of platform engineering emerging from enterprises grappling with the challenges of enabling developers while rationalizing operations. The discussion introduces the idea of operational entropy or infrastructure entropy, emphasizing how platform engineering teams can effectively manage the constant changes, security vulnerabilities, and evolving environments, relieving developers of this burden. By shifting entropy management to a shared and collaborative task, platform engineering teams have the potential to enhance how they function, offering opportunities for improvement across the industry. For those intrigued by these discussions, the full episode is available at the2030.cloud, inviting participation in ongoing conversations.
How do you build GitOps, infrastructure, and systems that rely on events and monitoring when you need to fall back to a polling loop, or augment a polling loop with an event system?
Today, we drill into concrete technical details about events and monitoring. We also offer practical advice on how GitOps works, how systems work, and how you can build a resilient system.
Stick around for a bonus at the end of the discussion, where we talk a little bit about complexity!
In the July 26th DevOps Lunch and Learn episode, Rob Hirschfeld delves into the intricacies of monitoring and events, highlighting the importance of eventing systems for scalability. The discussion explores the intersection between building a resilient standalone system using polling and enhancing responsiveness through eventing to create a comprehensive and adaptable solution. The key takeaway emphasizes the need for systems that can effectively integrate both polling and eventing to ensure durability and improved performance. For a detailed exploration of these concepts, tune in to the full podcast on monitoring and eventing from July 26th at the2030.cloud.
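The hybrid pattern from that episode can be sketched in a few lines. This is a minimal illustration, not code from the episode: it assumes events arrive on an in-process `queue.Queue`, and the names `wait_for_trigger`, `reconcile_loop`, and `poll_interval` are hypothetical. The periodic poll keeps the loop durable if an event is lost; events only make it more responsive.

```python
import queue
import threading
from typing import Callable


def wait_for_trigger(events: "queue.Queue[str]", poll_interval: float) -> str:
    """Block until an event arrives, or until poll_interval seconds pass.

    Returns the event's name, or "poll" when the timer fired instead.
    """
    try:
        return events.get(timeout=poll_interval)
    except queue.Empty:
        return "poll"


def reconcile_loop(
    events: "queue.Queue[str]",
    reconcile: Callable[[str], None],
    poll_interval: float,
    stop: threading.Event,
) -> None:
    """Reconcile on every event, and on a fixed timer when events are quiet.

    The timer is the durable baseline: a dropped event only delays
    convergence until the next poll. Events just make the loop faster.
    """
    while not stop.is_set():
        reconcile(wait_for_trigger(events, poll_interval))
```

In a real GitOps controller the events might be pushed by something like a webhook handler calling `events.put(...)`; that wiring is assumed here and not shown.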
Making automation safe is essential to making it usable at scale. How do we make automation safe? We found a lot of great insights drawing from spacecraft design, aircraft design, and other systems where safety is paramount.
Automation is a force multiplier. If we don’t factor in safety when we build it, then we can cause real harm, ranging from wasteful spending to actual injury. These designs have very real implications.
In the Cloud 2030 Podcast on March 15th, Rob Hirschfeld underscores the critical importance of automation safety in system design. Emphasizing the need for thorough testing, he discusses how safety, especially in complex systems like airplanes and spacecraft, requires continuous testing and monitoring. The conversation delves into the significance of not just completing tasks but also exercising and testing systems in various scenarios to ensure their safety. To explore these insights further, listen to the full March 15th episode at the2030.cloud and join the ongoing discussions.
What is platform engineering, why is it necessary, and how do you make it work compared to DevOps?
In this conversation, we really hit on the challenges of creating automation teams for building automation in scalable ways. Frustratingly, we never really came up with a particularly good answer to “what is a platform team” and why you should care. Strangely, your organization is probably building one.
Rob Hirschfeld, CEO and co-founder of RackN and host of the Cloud 2030 Podcast, reflects on the November 9th DevOps Lunch and Learn session focused on platform engineering. He highlights the challenge of executing platform engineering initiatives despite the straightforward concept of improving automation and tooling at an architectural level. Hirschfeld emphasizes the importance of defining success metrics, empowering teams to enforce standards, and adopting consistent, repeatable patterns and practices to advance the industry’s maturity. He encourages listeners to explore the insightful discussion at the2030.cloud for a deeper understanding of platform engineering’s significance.
Today we announced the availability of Digital Rebar Provision, the industry’s first cloud-native physical provisioning utility. It was available in the Digital Rebar community for a few weeks before we began offering support, and the response has been great!
By releasing their API-driven provisioning tool as a stand-alone component of the larger Digital Rebar suite, RackN helps DevOps teams break automation bottlenecks in their legacy data centers without disrupting current operations. The stand-alone open utility can be deployed in under 5 minutes and fits into any data center design. RackN also announced a $1,000 starter support and consulting package to further accelerate the transition from tools like Cobbler, MaaS or Stacki to the new Golang utility.
“We were seeing SREs suffering from high job turnover,” said Rob Hirschfeld, RackN founder and CEO. “When their integration plans get gridlocked by legacy tooling they quickly either lose patience or political capital. Digital Rebar Provision replaces the legacy tools without process disruption so that everyone can find shared wins early in large SRE initiatives.”
The first cloud-native physical provisioning utility
Data center provisioning is surprisingly complex because it’s caught between cutting-edge hardware and arcane protocols and firmware requirements that are difficult to disrupt. The heart of the system is a fickle combination of specific DHCP options, a firmware bootstrap environment (known as PXE), a very lightweight file transfer protocol (TFTP) and operating system specific templating tools like preseed and kickstart. Getting all these pieces to work together with updated APIs without breaking legacy support has been elusive.
By rethinking physical ops in cloud-native terms, RackN has managed to distill out a powerful provisioning tool for DevOps- and SRE-minded operators who need robust API/CLI, Day 2 Ops, security and control as primary design requirements. By bootstrapping foundational automation with Digital Rebar Provision, DevOps teams lay a foundation for data center operations that improves collaboration between operators and SRE teams: operators enjoy additional control and reuse, and SREs get a doorway into building a fully automated process.
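To make the moving parts above more concrete, here is a minimal sketch of two of them: the DHCP options that point a PXE client at its TFTP server and boot file, and per-machine kickstart templating. This is not how Digital Rebar Provision implements provisioning. `Machine`, `plan_boot`, the inventory layout, and the template contents are hypothetical; only the DHCP option numbers (66 and 67, from RFC 2132) and the kickstart commands shown are standard.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical kickstart fragment; real templates carry many more settings.
KICKSTART = Template(
    "url --url=$repo_url\n"
    "network --bootproto=dhcp --hostname=$hostname\n"
    "rootpw --lock\n"
)


@dataclass
class Machine:
    mac: str
    hostname: str


def dhcp_boot_options(tftp_server: str, bootfile: str) -> dict[int, str]:
    """Option 66 = TFTP server name, option 67 = bootfile name (RFC 2132)."""
    return {66: tftp_server, 67: bootfile}


def render_kickstart(machine: Machine, repo_url: str) -> str:
    # substitute() raises KeyError on an unfilled placeholder, which is
    # safer than silently handing the installer a broken file.
    return KICKSTART.substitute(hostname=machine.hostname, repo_url=repo_url)


def plan_boot(
    inventory: dict[str, Machine],
    mac: str,
    tftp_server: str,
    bootfile: str,
    repo_url: str,
) -> tuple[dict[int, str], str]:
    """Look up a booting machine by MAC and build its DHCP + install plan."""
    machine = inventory[mac]  # unknown MACs raise KeyError
    return dhcp_boot_options(tftp_server, bootfile), render_kickstart(machine, repo_url)
```

Even this toy version shows why the combination is fickle: the DHCP answer, the TFTP boot file, and the rendered install template all have to agree about the same machine.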
A pragmatic path without burning down the data center
“I’m excited to see RackN providing a pragmatic path from physical boot to provisioning without having to start over and rebuild my data center to get there,” said Dave McCrory, an early cloud and data gravity innovator. “It’s time for the industry to stop splitting physical and cloud IT processes because snowflaked, manual processes slow everyone down. I can’t imagine an easier on-ramp than Digital Rebar Provision.”
RackN Digital Rebar makes it easy for Cobbler, Stacki, MaaS and Foreman users to evaluate our RESTful, Golang, template-based PXE provisioning utility. Interested users can evaluate the service in minutes on a laptop or engage with RackN for a more comprehensive trial with expert support. The open Provision service works both independently and as part of Digital Rebar’s full life-cycle hybrid control.
The key takeaways are that portability is hard and we’re still working out the impact of container architecture.
The benefit of the longer interview is that we really dig into the reasons why portability is hard and discuss ways to improve it. My personal SRE posts and those on the RackN blog describe operational processes that improve portability. These are real concerns for all IT organizations because mixed and hybrid models are a fact of life.
If you are not actively making automation that works against multiple infrastructures then you are building technical debt.
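One way to act on this advice is to write automation against a narrow interface rather than a single provider. The sketch below is a minimal illustration assuming Python's `typing.Protocol`; `Infrastructure`, `FakeCloud`, `FakeBareMetal`, and `build_cluster` are hypothetical stand-ins, not Digital Rebar or vendor APIs.

```python
from typing import Protocol


class Infrastructure(Protocol):
    """Hypothetical interface the automation targets instead of one vendor."""

    def create_server(self, name: str) -> str: ...


class FakeCloud:
    def create_server(self, name: str) -> str:
        # A cloud can mint a new VM on demand.
        return f"cloud-vm/{name}"


class FakeBareMetal:
    def __init__(self) -> None:
        self.free = ["rack1-slot3", "rack1-slot4"]

    def create_server(self, name: str) -> str:
        # Physical "creation" means claiming an existing, finite machine.
        if not self.free:
            raise RuntimeError("no free machines")
        return f"metal/{self.free.pop(0)}/{name}"


def build_cluster(infra: Infrastructure, size: int) -> list[str]:
    # The same automation runs against any backend that satisfies the interface.
    return [infra.create_server(f"node-{i}") for i in range(size)]
```

The bare-metal backend hints at why portability is hard: the interface is shared, but the failure modes (finite inventory versus on-demand capacity) are not, so the automation still has to handle both.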
Of course, if you just want the snark, then jump ahead to 24:00, where we talk about the future of Kubernetes, OpenStack and the inverted intersection of the projects.
Containers provide a less restrictive approach with more options.
17:00: What’s the impact on Enterprise? How are developers being impacted?
Service Orientation is a very important thing to consider
Encapsulation from services is very valuable
Companies don’t own all their IT services any more – it’s not monolithic
IT Service Orientation aligns with Business Processes
Rob says the API economy is a big deal
In machine learning, a business’ data may be more valuable than their product
19:30: Services impact?
Services have a business imperative
We’re not ready for all the impacts of a service orientation
Challenge is to mix configuration and services
Magic of Digital Rebar is that it can mix orchestration of both
22:00: We are having issues with simple, how are we going to scale up?
Barriers are very low right now
22:30: Will Kubernetes help us solve governance issues?
Kubernetes is doing a good job building an ecosystem
Smart to focus on just being Kubernetes
It will be chaotic as the core is worked out
24:00: Do you think Kubernetes is going in the right direction?
Rob is bullish for Kubernetes to be the dominant platform because it’s narrow and specific
Google has the right balance of control
Kubernetes really is not that complex for what it does
Mesos is also good but harder to understand for users
Swarm is simple but harder to extend for an ecosystem
Kubernetes is a threat to Amazon because it creates portability and ecosystem outside of their platform
Rob thinks that Kubernetes could create platform services that compete with AWS services like RDS.
It’s likely to level the field, not create a Google advantage
27:00: How does Kubernetes fit into the Digital Rebar picture?
We think of Kubernetes as a great infrastructure abstraction that creates portability
We believe there’s a missing underlay that cannot abstract the infrastructure – that’s what we do.
OpenStack deployments are broken because every data center is custom and different – vendors create a lot of consulting without solving the problem
RackN is creating composability UNDER Kubernetes so that those infrastructure differences do not break operation automation
Kubernetes does not have the constructs in the abstraction to solve the infrastructure problem, that’s a different problem that should not be added into the APIs
Digital Rebar can also then use the Kubernetes abstractions?
30:20: Can OpenStack really be managed/run on top of Kubernetes? That seems complex!
There is a MESS in the message of Kubernetes under OpenStack because it sends the message that Kubernetes is better at managing application than OpenStack
Since OpenStack is just an application and Kubernetes is a good way to manage applications
When OpenStack is already in containers, we can use Kubernetes to do that in a logical way
“I’m super impressed with how it’s working” using OpenStack Helm Packs (still needs work)
Physical environment still has to be injected into the OpenStack on Kubernetes environment
35:05 Does OpenStack have a future?
Yes! But it’s not the big “data center operating system” future that we expected in 2010. Rob thinks it is a good VM management platform.
Rob provides the same caution for Kubernetes. It will work where the abstractions add value but data centers are complex hybrid beasts
Don’t “square peg a data center round hole” – find the best fit
OpenStack should have focused on the things it does well – it has a huge appetite for solving too many problems.
Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)
While the discussion is all about NETWORK DevOps, they do a good job of explaining WHY the current state of system orchestration is so sad – in a word: heterogeneity. It’s not going away because the alternative is lock-in. They also do a good job of describing the difference between automation and orchestration; however, I think there’s a middle tier of resource “scheduling” that better describes OpenStack and Kubernetes.
About 5:00 into the podcast, they effectively describe the composable design of Digital Rebar and the rationale for the way that we’ve abstracted interfaces for automation. If you guys really do want to cash in by consulting with it (at 10 minutes), just contact Rob H.
_____________
Digital Magazine Launch: Increment On-Call https://increment.com/on-call/
Increment is dedicated to covering how teams build and operate software systems at scale, one issue at a time. In this, our inaugural issue, we focus on industry best practices around on-call and incident response.
_____________
INTRO
We wanted to make basic provisioning open, API-driven, secure, scalable and fast. So we carved out the Provision & DHCP services as a standalone unit from the larger open Digital Rebar project. While this Golang service lacks orchestration, it is a complete service that is part of Digital Rebar infrastructure and supports the discovery boot process, templating, security and the extensive image library (Linux, ESX, Windows, … ) from the main project.
TL;DR: FIVE MINUTES TO REPLACE COBBLER? YES.
The project APIs and CLIs are complete for all provisioning functions, with good Swagger definitions and docs. After all, it’s a third-generation capability from the Digital Rebar project. The integrated UX is still evolving.
_____________
UPCOMING EVENTS
Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.