Balancing Architecture and Ease of Use

What is the right balance between learning curve, architecture, and building things that can scale (while acknowledging the overhead) on one side, and the attitude of “just get it done” on the other? That attitude says: don’t make my tools complex, and let me be very productive quickly, even if the result doesn’t scale. We see this as an ongoing challenge.

Two engineers from RackN led today’s discussion about the balance we try to achieve at RackN as we design our product, with the understanding that, ultimately, scale really does matter.

If users have trouble understanding how the product works at first, that learning curve can push them away before they ever get into the product. That’s why finding the right balance is essential to success.

Transcript: otter.ai/u/DAfKcHVBAiOY5EuReW1krDYsqso
Image: www.pexels.com/photo/anonymous-w…h-outfit-7148032/

Platform Engineering Makes You Angry?

Platform engineering is a topic that seems to be generating a lot of interest going into 2023. It’s sure to be one of those things that enterprises spend a lot of time arguing about, with everyone telling each other that they’re doing it wrong.

In this podcast, we dissect why platform engineering seems to be so controversial, and what we can do to help make it more understandable.

We break it down into DevOps components, team components, Dev components, and operations components, and ultimately talk about the long-term trajectory of where all of this is heading.

Image: www.pexels.com/photo/person-skat…ard-ramp-1527241/
Transcript: otter.ai/u/SAAMNdHZh9lEeHrBxwcWmUxuDhs

Cloud2030, DevOps, Platform Engineering, Automation, Cloud, Infrastructure, IaC, SRE

Rob’s Hot Take:

In the December 13th DevOps Lunch and Learn on the Cloud 2030 podcast, Rob Hirschfeld explores the concept of platform engineering emerging from enterprises grappling with the challenges of enabling developers while rationalizing operations. The discussion introduces the idea of operational entropy or infrastructure entropy, emphasizing how platform engineering teams can effectively manage the constant changes, security vulnerabilities, and evolving environments, relieving developers of this burden. By shifting entropy management to a shared and collaborative task, platform engineering teams have the potential to enhance how they function, offering opportunities for improvement across the industry. For those intrigued by these discussions, the full episode is available at the2030.cloud, inviting participation in ongoing conversations.

Are Platform Teams Good?

How do you build effective, productive platform teams? What should their mission be, what tools do they have, and what dangers do they face?

We start by questioning if there are such things as platform teams and their roles, as well as how they can go awry in modern organizations. 

In the end, we recognize that they do exist and can play a very important role. In this conversation, you will learn the right ways to form a platform team.

Transcript: https://otter.ai/u/Kf-Hi9H6bTmhufavGaa1ae9w_R4

Image: https://www.pexels.com/photo/man-beside-woman-in-train-1970830/

Rob’s Hot Take:

In the August 23rd DevOps Lunch and Learn, Rob Hirschfeld discusses the evolving concept of platform teams as centers of excellence for corporate governance and controls in IT and operational environments. He notes the current diversity in approaches to solving problems in infrastructure but anticipates a consolidation phase where standardization becomes more prominent. Hirschfeld emphasizes the cyclical nature of IT innovation, suggesting that platform teams will play a crucial role in advocating for standards, processes, and best practices, ultimately contributing to the industry’s progress.

Events And Monitoring [bonus Complexity chat]

How do you build GitOps, infrastructure, and systems that rely on events and monitoring? When do you need to fall back to a polling loop, or augment a polling loop with an event system?

Today, we drill into concrete technical details about events and monitoring. We also offer practical advice on how GitOps works, how systems work, and how you can build a resilient system.

Stick around for a bonus at the end of the discussion, where we talk a little bit about complexity!

Image: www.pexels.com/photo/green-and-b…ug-on-air-905905/
Transcript: otter.ai/u/udK3y3upQMszo2IVtbrdGigmehE

Rob’s Hot Take:

In the July 26th DevOps Lunch and Learn episode, Rob Hirschfeld delves into the intricacies of monitoring and events, highlighting the importance of eventing systems for scalability. The discussion explores the intersection between building a resilient standalone system using polling and enhancing responsiveness through eventing to create a comprehensive and adaptable solution. The key takeaway emphasizes the need for systems that can effectively integrate both polling and eventing to ensure durability and improved performance. For a detailed exploration of these concepts, tune in to the full podcast on monitoring and eventing from July 26th at the2030.cloud.
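To make the polling-plus-eventing idea concrete, here is a minimal sketch in Python. It is an illustration only, not anything shown in the episode: the `Watcher` class, the `source` dictionary, and the `poll_interval` parameter are all hypothetical names. The watcher waits on an event queue for responsiveness, but falls back to polling when no event arrives, so a lost event delays a reaction instead of hiding a change.

```python
import queue


class Watcher:
    """Hypothetical watcher: events make it fast, polling makes it durable."""

    def __init__(self, source, events, poll_interval=0.05):
        self.source = source            # stand-in for the real system state
        self.events = events            # queue of change notifications
        self.poll_interval = poll_interval
        self.last_seen = None
        self.log = []                   # (trigger, state) for each change handled

    def step(self):
        try:
            # Fast path: an event wakes the watcher up right away.
            trigger = self.events.get(timeout=self.poll_interval)
        except queue.Empty:
            # Slow path: no event arrived (or it was lost), so poll anyway.
            trigger = "poll"
        # Always read the full state rather than trusting the event payload.
        state = dict(self.source)
        if state != self.last_seen:
            self.last_seen = state
            self.log.append((trigger, state))


source = {"replicas": 1}
events = queue.Queue()
watcher = Watcher(source, events)

events.put("event")        # this change is announced by an event
watcher.step()

source["replicas"] = 3     # this change's event is never delivered
watcher.step()             # the poll fallback still notices it
watcher.step()             # nothing changed, so nothing is logged
```

In this sketch the event is treated only as a wake-up signal. The state is always re-read in full, which is what lets the system stay correct on polling alone.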

Humans vs Code: Governance As Code

Human factors make governance as code a challenge. Today we discuss why looking at things like audit (how we determine what has happened and respond to it in an automated way) may be a great first step toward adding controls to a system.

We talk about many of the human factors that make it hard to create a governance system, or that lead to a biased or unevenly governed system.

We spend the first couple of minutes of this podcast talking about our agenda, and that conversation spells out a lot of interesting topics that we will discuss. So hang in for those first couple of minutes, and then we will get straight to the governance.

Transcript: otter.ai/u/aqx5-wivDgPARqAXwXGCIm-bO5U
Image: www.pexels.com/photo/belgium-fla…-building-532864/

Infrastructure Governance As Code

We continue our Governance as Code discussions in today’s episode.

We started by looking at Governance as Code broadly, but quickly drilled down into a discussion focused on where Infrastructure as Code meets Governance as Code. Understanding that intersection is critical to building something that is both automated and governable.

The topic explored how we audit controls for systems. We also need to make sure that when we build infrastructure, it’s following our policies. The challenge here is making sure that what we’ve automated is conforming to our governance.

Image: www.pexels.com/photo/group-of-pe…tructure-2100942/
Transcript: otter.ai/u/-vI03TkWcLpvTIBRrrKE9DugYvw
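As a rough illustration of checking that automated infrastructure conforms to policy, the sketch below evaluates a list of declared resources against named rules and returns the violations as an auditable record. Everything in it is an assumption made for the example: the resource fields, the policy names, and the `check_policies` function are hypothetical and do not come from the episode or from any particular tool.

```python
def check_policies(resources, policies):
    """Return (resource name, policy name) for every rule a resource fails."""
    violations = []
    for resource in resources:
        for policy_name, rule in policies.items():
            if not rule(resource):
                violations.append((resource["name"], policy_name))
    return violations


# Hypothetical policies: each rule returns True when the resource conforms.
policies = {
    "must-have-owner": lambda r: bool(r.get("tags", {}).get("owner")),
    "no-public-storage": lambda r: not (
        r["type"] == "bucket" and r.get("public", False)
    ),
}

# Hypothetical declared infrastructure, as it might be parsed from IaC files.
resources = [
    {"name": "logs", "type": "bucket", "public": True, "tags": {"owner": "ops"}},
    {"name": "api", "type": "vm", "tags": {}},
    {"name": "db", "type": "vm", "tags": {"owner": "data"}},
]

violations = check_policies(resources, policies)
for name, policy in violations:
    print(f"{name}: violates {policy}")
```

Keeping the rules as data separate from the checker means the same policies can be run before a change is applied and again later against the live system as an audit.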

Orchestration Automation Workflow [with Terraform]

Building reliable automation at scale for infrastructure presents challenges. In this episode, we discuss orchestration, workflow automation, and the reconciler pattern in the context of Terraform.

We refer to the pattern of Terraform, automation, and orchestration systems as “TACOS,” and today we dig into how you test it and check it for drift. These are real operational concerns for anybody building any type of infrastructure.

Transcript: otter.ai/u/w-NA0HBsTc5NRaqWQQwlWUj4Whw
Image: www.pexels.com/photo/person-hold…ith-food-8448079/

Rob’s Hot Take:

In the April 5th Cloud 2030 Podcast episode, Rob Hirschfeld discusses orchestration, automation, and workflow, focusing on Terraform and introducing the “Terraform Automation and Orchestration” (TACO) pattern. The conversation emphasizes that while Terraform is a valuable tool, the broader patterns of reconciliation, GitOps, and event-driven automation are crucial for building and maintaining complex systems over time. Hirschfeld encourages listeners to view tools like Terraform and Ansible as initial steps in a journey, prompting consideration of scaling, building orchestration systems, and understanding the importance of comprehensive system development. For more in-depth discussions, explore the full episode on orchestration, automation, and workflow from April 5th, and join the ongoing conversations at the2030.cloud.
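The reconciler pattern and drift checking mentioned here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Terraform itself is implemented: desired and actual state are plain dictionaries, and the `diff` and `reconcile` functions are hypothetical names chosen for the example.

```python
def diff(desired, actual):
    """Detect drift: list the actions needed to move actual to desired."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def reconcile(desired, actual):
    """Apply the diff to actual in place and return the actions taken."""
    actions = diff(desired, actual)
    for verb, name, spec in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = spec
    return actions


desired = {"web": {"size": "small"}, "db": {"size": "large"}}
actual = {"web": {"size": "medium"}, "cache": {"size": "small"}}

first = reconcile(desired, actual)   # corrects the drift
second = reconcile(desired, actual)  # already converged, nothing to do
```

A useful property to test in any reconciler is idempotence: running it a second time against a converged system should produce no actions, as `second` does here.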

Everything As Code!

What makes Everything as Code and Infrastructure as Code interesting? In today’s episode, we discuss what makes something code-like and the idea of Everything as Code, based on Patrick Debois’ article “In depth research and trends analyzed from 50+ different concepts as code.”

Reference: www.jedi.be/blog/2022/02/23/tre…0-as-code-concepts/

Some of our conclusions were practical: if a concept is a process that is reproducible and auditable, that is what makes it code-like. Another possible conclusion was that “as code” is just marketing, because it makes everything programmable. The reality is somewhere in the middle.

Transcript: otter.ai/u/E1TezO2XutwJyS-vCNetslwWO4A
Image: www.pexels.com/photo/man-in-grey…icky-note-879109/

Rob’s Hot Take:

In the Cloud 2030 Podcast episode on March 29th, Rob Hirschfeld provides insights on the “everything as code” discussion. While acknowledging the term’s playful exaggeration, Hirschfeld emphasizes the underlying desire for reproducibility, auditability, and code-like experiences in various aspects of operational and infrastructure activities. Despite the term’s potential for marketing hype, the aspiration to apply code principles to different facets of infrastructure management remains significant, influencing how we build and manage systems. To delve into this engaging discussion, check out the full episode on March 29th, available on the2030.cloud.

Improving Automation Safety

Making automation safe is essential to making it usable at scale. How do we make automation safe? We found a lot of great insights by drawing on spacecraft design, aircraft design, and other systems where safety is critical.

Automation is a force multiplier. If we don’t factor in safety when we build it, then we could cause a lot of harm, from wasteful spending to actual injury. These designs have very real implications.

Transcript: otter.ai/u/p9w4aKOqm3rpHhbDtRTaLgN3GIA
Image: www.pexels.com/photo/toddler-usi…-on-road-1642055/

Rob’s Hot Take:

In the Cloud 2030 Podcast on March 15th, Rob Hirschfeld underscores the critical importance of automation safety in system design. Emphasizing the need for thorough testing, he discusses how safety, especially in complex systems like airplanes and spacecraft, requires continuous testing and monitoring. The conversation delves into the significance of not just completing tasks but also exercising and testing systems in various scenarios to ensure their safety. To explore these insights further, listen to the full episode on March 15th at the2030.cloud and participate in the ongoing discussions.

Expanding GitOps Beyond K8s

GitOps is a really important way of collaborating and communicating about infrastructure.

But can GitOps escape from Kubernetes? While we did talk about Kubernetes too, we mainly talked about what it takes to implement GitOps outside of Kubernetes. We considered what it takes to build a GitOps architecture and then have people understand and use it. We also covered the fundamental parts of GitOps, like having a reconciler and a set of tools that drive clusters.

Transcript: otter.ai/u/oq4D06Sd_rtUvXBVXC0Wx3KA2sQ
Image: www.pexels.com/photo/people-with…popcorns-7234318/

Rob’s Hot Take:

In the March 8th DevOps Lunch and Learn session on GitOps, Rob Hirschfeld emphasizes the crucial role of immutability in operations. The concept of specifying a fixed state, configuration set, or resource transforms how automation, infrastructure building, and system maintenance are approached. The investment in immutable components enhances change resilience, making it easier to adapt and keep up with changes while ensuring stability. Join the ongoing conversations and roundtables at the2030.cloud to contribute to discussions on these transformative concepts.