Compliance Death Curve [Working Session 1]

The compliance death curve is an evolving concept I’ve been working on that tries to explain how companies fight compliance, governance, and standardization efforts, all of which are critical to platform teams and infrastructure operations.

Today we try to decompose some of the mathematics that I’ve been using into more universal, more easily understood components. We built a compliance flywheel that I found really fascinating; you can see an example of that work in our podcast description.

It could also be helpful to check out my previously recorded compliance death curve talk that has been released.

Resources:
www.youtube.com/watch?v=4RUKsakKZI0

Transcript: otter.ai/u/k9q5ZZ81Hm-EAAtfkV…?utm_source=copy_url

API Consumption [TechOps 003]

TechOps series episode 3 covers how to automate against APIs. We discuss the ways in which you can use APIs effectively, and the ways you can run into trouble. We also discuss how we should be consuming APIs, both as consumers and as producers of APIs. Many of the ideas discussed came from learning how people consume our APIs and what we can do to help make them better and safer.

Enjoy this broader TechOps series, where we dive deep into tips and techniques that improve your journey as an Automator.

otter.ai/u/5akxcG83FBS1m9PBUn…?utm_source=copy_url
Image by Dall-E

Platform Engineering on API Abstractions

https://soundcloud.com/user-410091210/pt1-devops-ll-230124

Our mini episode today is a short discussion of API delineation and abstractions for platform engineering.

This was a short intro discussion, and it is especially interesting because platform is a major topic we will be exploring in the coming year. We highlight the challenges of finding the right abstraction points as well as building front end and back end automation.

Transcript: otter.ai/u/5gzoEliQ5H7N5LnGSP6sdOFNjv8
Image: www.pexels.com/photo/white-paper…te-table-7897470/

Platform Engineering Makes You Angry?

Platform engineering is a topic that seems to be generating a lot of interest going into 2023. It’s sure to be one of those things that enterprises spend a lot of time arguing about and telling each other that they’re doing it wrong.

In this podcast, we dissect why platform engineering seems to be so controversial, and what we can do to help make it more understandable.

We break it down into DevOps components, team components, Dev components, operations components, and ultimately talk about long term trajectories of how all this stuff is going.

Image: www.pexels.com/photo/person-skat…ard-ramp-1527241/
Transcript: otter.ai/u/SAAMNdHZh9lEeHrBxwcWmUxuDhs


Rob’s Hot Take:

In the December 13th DevOps Lunch and Learn on the Cloud 2030 podcast, Rob Hirschfeld explores the concept of platform engineering emerging from enterprises grappling with the challenges of enabling developers while rationalizing operations. The discussion introduces the idea of operational entropy or infrastructure entropy, emphasizing how platform engineering teams can effectively manage the constant changes, security vulnerabilities, and evolving environments, relieving developers of this burden. By shifting entropy management to a shared and collaborative task, platform engineering teams have the potential to enhance how they function, offering opportunities for improvement across the industry. For those intrigued by these discussions, the full episode is available at the2030.cloud, inviting participation in ongoing conversations.

Events And Monitoring [bonus Complexity chat]

How do you build GitOps, infrastructure and systems relying on events and monitoring, when you need to revert to a polling loop, or augment a polling loop with an event system?

Today, we drill into concrete technical details about events and monitoring. We also offer practical advice on how GitOps works, how systems work, and how you can build a resilient system.

Stick around for a bonus at the end of the discussion, where we talk a little bit about complexity!

Image: www.pexels.com/photo/green-and-b…ug-on-air-905905/
Transcript: otter.ai/u/udK3y3upQMszo2IVtbrdGigmehE

Rob’s Hot Take:

In the July 26th DevOps Lunch and Learn episode, Rob Hirschfeld delves into the intricacies of monitoring and events, highlighting the importance of eventing systems for scalability. The discussion explores the intersection between building a resilient standalone system using polling and enhancing responsiveness through eventing to create a comprehensive and adaptable solution. The key takeaway emphasizes the need for systems that can effectively integrate both polling and eventing to ensure durability and improved performance. For a detailed exploration of these concepts, tune in to the full podcast on monitoring and eventing from July 26th at the2030.cloud.
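
As a rough illustration of the polling-plus-eventing combination described above, here is a minimal Python sketch. It is not taken from the episode: the function name `run_reconciler`, its parameters, and the `max_cycles` limit are all hypothetical, chosen only to show the shape of the idea. The loop wakes immediately when an event arrives, and falls back to a timed poll when none does, so a lost event delays a reconcile but never prevents it.

```python
import queue


def run_reconciler(events, poll_state, reconcile, poll_interval=30.0, max_cycles=None):
    """Reconcile on every event, and also on a timer when no event arrives.

    events:     queue.Queue of change notifications (the eventing path).
    poll_state: callable returning the currently observed state (the polling path).
    reconcile:  callable applied to whatever state was observed.
    max_cycles: hypothetical stop condition so the sketch can terminate.

    Returns a list recording what triggered each cycle: "event" or "poll".
    """
    triggers = []
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            # Fast path: wake as soon as an event shows up.
            events.get(timeout=poll_interval)
            triggers.append("event")
        except queue.Empty:
            # Safety net: nothing arrived within the interval, so poll anyway.
            triggers.append("poll")
        # Either way, re-read the source of truth instead of trusting the event payload.
        reconcile(poll_state())
        cycles += 1
    return triggers
```

The design choice worth noting is that the event only decides when to look, not what the state is. The polling path alone would still converge, and the event path only makes it more responsive.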

Improving Automation Safety

Making automation safe is essential to making it usable at scale. How do we make automation safe? We found a lot of great insights drawing from spacecraft design, aircraft design and other systems where safety is super important.

Automation is a force multiplier. If we don’t factor in safety when we build it, then we could create a lot of harm in systems, from wasteful spending to actual injury. These designs have very real implications.

Transcript: otter.ai/u/p9w4aKOqm3rpHhbDtRTaLgN3GIA
Image: www.pexels.com/photo/toddler-usi…-on-road-1642055/

Rob’s Hot Take:

In the Cloud 2030 Podcast on March 15th, Rob Hirschfeld underscores the critical importance of automation safety in system design. Emphasizing the need for thorough testing, he discusses how safety, especially in complex systems like airplanes and spacecraft, requires continuous testing and monitoring. The conversation delves into the significance of not just completing tasks but also exercising and testing systems in various scenarios to ensure their safety. To explore these insights further, listen to the full episode on March 15th at the2030.cloud and participate in the ongoing discussions.

What is Platform Engineering?

What is platform engineering? Why is it necessary, and how do you make it work compared to DevOps?

In this conversation, we really hit on the challenges of creating automation teams for building automation in scalable ways. Frustratingly, we never really came up with a particularly good answer to “what is a platform team” and why you should care. Strangely, your organization is probably building one.

Transcript: otter.ai/u/zJeQbqXIyD8kZUxfKQdvQAfQGog
Image: www.pexels.com/photo/building-co…chnology-9617733/

Rob’s Hot Take:

Rob Hirschfeld, CEO and co-founder of RackN and host of the Cloud 2030 Podcast, reflects on the November 9th DevOps Lunch and Learn session focused on platform engineering. He highlights the challenge of executing platform engineering initiatives despite the straightforward concept of improving automation and tooling at an architectural level. Hirschfeld emphasizes the importance of defining success metrics, empowering teams to enforce standards, and adopting consistent, repeatable patterns and practices to advance the industry’s maturity. He encourages listeners to explore the insightful discussion at the2030.cloud for a deeper understanding of platform engineering’s significance.

RackN Ends DevOps Gridlock in Data Center [Press Release]

Today we announced the availability of Digital Rebar Provision, the industry’s first cloud-native physical provisioning utility. We’ve had this in the Digital Rebar community for a few weeks before offering support and response has been great!

By releasing their API-driven provisioning tool as a stand-alone component of the larger Digital Rebar suite, RackN helps DevOps teams break automation bottlenecks in their legacy data centers without disrupting current operations. The stand-alone open utility can be deployed in under 5 minutes and fits into any data center design. RackN also announced a $1,000 starter support and consulting package to further accelerate transition from tools like Cobbler, MaaS or Stacki to the new Golang utility.

“We were seeing SREs suffering from high job turnover,” said Rob Hirschfeld, RackN founder and CEO. “When their integration plans get gridlocked by legacy tooling they quickly either lose patience or political capital. Digital Rebar Provision replaces the legacy tools without process disruption so that everyone can find shared wins early in large SRE initiatives.”

The first cloud-native physical provisioning utility

Data center provisioning is surprisingly complex because it’s caught between cutting edge hardware and arcane protocols and firmware requirements that are difficult to disrupt. The heart of the system is a fickle combination of specific DHCP options, a firmware bootstrap environment (known as PXE), a very lightweight file transfer protocol (TFTP) and operating system specific templating tools like preseed and kickstart. Getting all these pieces to work together with updated APIs without breaking legacy support has been elusive.

By rethinking physical ops in cloud-native terms, RackN has managed to distill out a powerful provisioning tool for DevOps and SRE minded operators who need robust API/CLI, Day 2 Ops, security and control as primary design requirements. By bootstrapping foundational automation with Digital Rebar Provision, DevOps teams lay a foundation for data center operations that improves collaboration between operators and SRE teams: operators enjoy additional control and reuse and SREs get a doorway into building a fully automated process.

A pragmatic path without burning down the data center

“I’m excited to see RackN providing a pragmatic path from physical boot to provisioning without having to start over and rebuild my data center to get there,” said Dave McCrory, an early cloud and data gravity innovator. “It’s time for the industry to stop splitting physical and cloud IT processes because snowflaked, manual processes slow everyone down. I can’t imagine an easier on-ramp than Digital Rebar Provision.”

RackN Digital Rebar is making it easy for Cobbler, Stacki, MaaS and Foreman users to evaluate our RESTful, Golang, Template-based PXE Provisioning utility. Interested users can evaluate the service in minutes on a laptop or engage with RackN for a more comprehensive trial with expert support. The open Provision service works both independently and as part of Digital Rebar’s full life-cycle hybrid control.

See specific features at http://rackn.com/provision/drsa.

Want help starting on this journey? Contact us and we can help.

How about a CaaPuccino? Krish and Rob discuss containers, platforms, hybrid issues around Kubernetes and OpenStack.

CaaPuccino: A frothy mix of containers and platforms.

Check out Krish Subramanian’s (@krishnan) Modern Enterprise podcast (audio here) today for a surprisingly deep and thoughtful discussion about how frothy new technologies are impacting Modern Enterprise IT. Of course, we also take some time to throw some fire bombs at the end. You can use my notes below to jump to your favorite topics.

The key takeaways are that portability is hard and we’re still working out the impact of container architecture.

The benefit of the longer interview is that we really dig into the reasons why portability is hard and discuss ways to improve it. My personal SRE posts and those on the RackN blog describe operational processes that improve portability. These are real concerns for all IT organizations because mixed and hybrid models are a fact of life.

If you are not actively making automation that works against multiple infrastructures then you are building technical debt.

Of course, if you just want the snark, then jump forward to 24:00 minutes in where we talk future of Kubernetes, OpenStack and the inverted intersection of the projects.

Krish, thanks for the great discussion!

Rob’s Podcast Notes (39 minutes)

2:37: Rob intros about Digital Rebar & RackN

4:50: Why our Kubernetes is JUST UPSTREAM

5:35: Where are we going in 5 years > why Rob believes in Hybrid

Should not be 1 vendor who owns everything
That’s why we work for portability
Public cloud vision: you should stop caring about infrastructure
Coming to an age when infrastructure can be completely automated
Developer rebellion against infrastructure

8:36: Krish believes that Public cloud will be more decentralized

Public cloud should be part of everyone’s IT plan
It should not be the ONLY thing

9:25: Docker helps create portability, what else creates portability? Will there be a standard?

Containers are a huge change, but it’s not just packaging
Smaller units of work is important for portability
Container schedulers & PaaS are very opinionated, that’s what creates portability
Deeper into infrastructure loses portability (RackN helps)
Rob predicts that Lambda and Serverless creates portability too

11:38: Are new standards emerging?

Some APIs become dominant and create de facto standards
Embedded assumptions break portability – that’s what makes automation fragile
Rob explains why we inject configuration to abstract infrastructure
RackN works to inject attributes instead of allowing scripts to assume settings
For example, networking assumptions break portability
Platforms force people to give up configuration in ways that break portability
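
The point above about injecting attributes instead of letting scripts assume settings can be sketched in a few lines of Python. This is an illustration only, not RackN's or Digital Rebar's actual templating: the template text, the attribute names (`hostname`, `gateway`, `dns`) and the function `render_network_config` are hypothetical. The idea is that every site-specific value arrives as an explicit input, and a missing value is an error instead of a silent default.

```python
from string import Template

# Hypothetical template: every site-specific value is a placeholder,
# so nothing about the network is baked into the automation itself.
NETWORK_TEMPLATE = Template(
    "hostname=$hostname\n"
    "gateway=$gateway\n"
    "dns=$dns\n"
)


def render_network_config(attrs):
    """Render config from injected attributes; fail loudly if one is missing."""
    try:
        return NETWORK_TEMPLATE.substitute(attrs)
    except KeyError as missing:
        # Refuse to guess: an assumed default is exactly what breaks portability.
        raise ValueError(f"missing required attribute: {missing.args[0]}") from None
```

For example, `render_network_config({"hostname": "node1", "gateway": "10.0.0.1", "dns": "10.0.0.2"})` renders all three lines, while omitting `gateway` raises a `ValueError` naming the missing attribute.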

14:50: Why did Platform as a Service not take off?

Rob defends PaaS – thinks that it has accomplished a lot
Challenge of PaaS is that it’s very restrictive by design
Calls out Andrew Clay Shafer’s “don’t call it a PaaS” position
Containers provide a less restrictive approach with more options.

17:00: What’s the impact on Enterprise? How are developers being impacted?

Service Orientation is a very important thing to consider
Encapsulation from services is very valuable
Companies don’t own all their IT services any more – it’s not monolithic
IT Service Orientation aligns with Business Processes
Rob says the API economy is a big deal
In machine learning, a business’ data may be more valuable than their product

19:30: Services impact?

Services have a business imperative
We’re not ready for all the impacts of a service orientation
Challenge is to mix configuration and services
Magic of Digital Rebar is that it can mix orchestration of both

22:00: We are having issues with simple, how are we going to scale up?

Barriers are very low right now

22:30: Will Kubernetes help us solve governance issues?

Kubernetes is doing a good job building an ecosystem
Smart to focus on just being Kubernetes
It will be chaotic as the core is worked out

24:00: Do you think Kubernetes is going in the right direction?

Rob is bullish for Kubernetes to be the dominant platform because it’s narrow and specific
Google has the right balance of control
Kubernetes really is not that complex for what it does
Mesos is also good but harder to understand for users
Swarm is simple but harder to extend for an ecosystem
Kubernetes is a threat to Amazon because it creates portability and ecosystem outside of their platform
Rob thinks that Kubernetes could create platform services that compete with AWS services like RDS
It’s likely to level the field, not create a Google advantage

27:00: How does Kubernetes fit into the Digital Rebar picture?

We think of Kubernetes as a great infrastructure abstraction that creates portability
We believe there’s a missing underlay that cannot abstract the infrastructure – that’s what we do.
OpenStack deployments broken because every data center is custom and different – vendors create a lot of consulting without solving the problem
RackN is creating composability UNDER Kubernetes so that those infrastructure differences do not break operation automation
Kubernetes does not have the constructs in the abstraction to solve the infrastructure problem, that’s a different problem that should not be added into the APIs
Digital Rebar can also then use the Kubernetes abstractions?

30:20: Can OpenStack really be managed/run on top of Kubernetes? That seems complex!

There is a MESS in the message of Kubernetes under OpenStack because it sends the message that Kubernetes is better at managing applications than OpenStack
Since OpenStack is just an application and Kubernetes is a good way to manage applications
When OpenStack is already in containers, we can use Kubernetes to do that in a logical way
“I’m super impressed with how it’s working” using OpenStack Helm Packs (still needs work)
Physical environment still has to be injected into the OpenStack on Kubernetes environment

35:05: Does OpenStack have a future?

Yes! But it’s not the big “data center operating system” future that we expected in 2010. Rob thinks it is a good VM management platform.
Rob provides the same caution for Kubernetes. It will work where the abstractions add value but data centers are complex hybrid beasts
Don’t “square peg a data center round hole” – find the best fit
OpenStack should have focused on the things it does well – it has a huge appetite for solving too many problems.

April 21 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

SRE Items of the Week

Digital Rebar Provision deploys Docker’s LinuxKit Kubernetes

_____________

Install Digital Rebar PXE Provision on a Mac OSX System and Test Boot using Virtual Box

_____________

Packet Pushers 333 Automation & Orchestration in Networking
http://packetpushers.net/podcast/podcasts/show-333-orchestration-vs-automation/

While the discussion is all about NETWORK DevOps, they do a good job of decrying WHY the current state of system orchestration is so sad – in a word: heterogeneity. It’s not going away because the alternative is lock-in. They also do a good job of describing the difference between automation and orchestration; however, I think there’s a middle tier of resource “scheduling” that better describes OpenStack and Kubernetes.

Around 5:00 minutes into the podcast, they effectively describe the composable design of Digital Rebar and the rationale for the way that we’ve abstracted interfaces for automation. If you guys really do want to cash in by consulting with it (at 10 minutes), just contact Rob H.
_____________

Digital Magazine Launch: Increment On-Call
https://increment.com/on-call/

Increment is dedicated to covering how teams build and operate software systems at scale, one issue at a time. In this, our inaugural issue, we focus on industry best practices around on-call and incident response.
_____________

Need PXE? Try out this Cobbler Replacement
https://robhirschfeld.com/2017/04/11/provision-preview/

INTRO
We wanted to make open basic provisioning API-driven, secure, scalable and fast. So we carved out the Provision & DHCP services as a stand alone unit from the larger open Digital Rebar project. While this Golang service lacks orchestration, this complete service is part of Digital Rebar infrastructure and supports the discovery boot process, templating, security and extensive image library (Linux, ESX, Windows, … ) from the main project.

TL;DR: FIVE MINUTES TO REPLACE COBBLER? YES.

The project APIs and CLIs are complete for all provisioning functions with good Swagger definitions and docs. After all, it’s third generation capability from the Digital Rebar project. The integrated UX is still evolving.
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

DevOpsDays Austin : May 4-5, 2017 in Austin TX

CloudNative vs SRE vs DevOps: The Ultimate Server Cage Match
Not Actually a DevOps Talk with Michael Cote (May 4 at 4:50pm)

OpenStack Summit : May 8 – 11, 2017 in Boston, MA

OpenStack and Kubernetes. Combining the best of both worlds – Kubernetes Day

Interop ITX : May 15 – 19, 2017 in Las Vegas, NV

Open Source IT Summit – Tuesday, May 16, 9:00 – 5:00pm : Rob Hirschfeld to speak

Gluecon : May 24 – 25, 2017 in Denver, CO

Surviving Day 2 in Open Source Hybrid Automation – May 23, 2017 : Rob Hirschfeld and Greg Althaus

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) – Issue #68