Vibe Coding for Ops [TechOps]

In this episode, we do some live vibe coding: using AI to write code. We share tips and tricks for having the best vibe coding experience and avoiding some common pitfalls. You’ll hear what we do, how we discover the steps, and just how easy it is to interact with the system and set up a basic environment. We also start to explore the limitations of vibe coding. We encourage you to listen along and try it on your own!

Transcript: otter.ai/u/CqKdtWZWYb3AdPtcb-…?utm_source=copy_url

TechOps Scaling Challenges

In this episode, we talk about scale and the hard realities of system failure in large tech operations. We explore why rare failures become common at scale, and what it takes to build systems that can handle that pressure. From predictive diagnostics to component redundancy, we share practical insights on keeping high-performance and AI infrastructure resilient. This is not theory; it is grounded in real-world lessons from managing complex environments and learning how to plan, isolate, and adapt when things go wrong.

Transcript: otter.ai/u/X8JYiADfPPLEfQ-gge…?utm_source=copy_url

HA Troubleshooting [Tech Ops]

This episode of the TechOps series goes into high availability troubleshooting. It is not just about high availability, and not just about troubleshooting; we talk through what it takes to manage, maintain, and fix HA systems. This is part of a longer discussion we’ve been having, so there are some really interesting ideas in the middle of it that I hope will shape your thinking as you build high availability systems, diagnostics, and troubleshooting for people working in very complex high availability environments.

Transcript: otter.ai/u/wM__4w1YIzZnhVdgLu…?utm_source=copy_url

References:
status.openai.com/incidents/ctrsv3lwd797

Kubernetes on Prem vs Cloud

We step back in this episode of our Tech Ops series and talk about cloud versus self-managed infrastructure and how you balance the competing concerns. We started from a report that RackN commissioned about on-premises Kubernetes and mixing it into your IT infrastructure.

Can you have a cloud broker? Can you do multi-cloud? These are tried-and-true topics for cloud consideration, but we look at them through a new filter: the repatriation idea of mixing and matching your IT infrastructure.

Transcript: otter.ai/u/FKGuQpV-5bQFVASAYD…?utm_source=copy_url

Resources:
store.repebble.com/
rackn.com/2025/03/18/ready-for…netes-on-bare-metal/
www.reuters.com/technology/cybers…ports-2025-03-18/
gabrielsimmer.com/blog/kubernetes-plus-oneplus

Tags: Cloud2030, Kubernetes, VMware, Virtualization, On Premises, Bare Metal, automation, devops, enterprise IT

Writing Great Test Scripts [TechOps]

We take a deep dive into something that seems very small but has a lot of repercussions for how you manage and run a data center: test scripts for servers.

As you’re going through a production cycle or a provisioning cycle, how do you test? What do you test? This topic came from a Reddit thread that we answered, which then led to a whole hour of conversation about just how important and impactful this type of script is.

Transcript: otter.ai/u/Cb3yac8JHvlM2yqh72…?utm_source=copy_url

High Availability Technology in DRP [TechOps]

Today we dive into RackN high availability technology and what we did to build consensus-based Raft HA capabilities directly into Digital Rebar. This is one of those episodes where we talk specifically and only about Digital Rebar, so it is a vendor-specific conversation from that perspective.

If you are building HA systems, or are interested in how HA systems work, this is a great session to learn firsthand from our experience!

Transcript: otter.ai/u/9lA9djczp5GkJbj12k…?utm_source=copy_url

Gitops and Immutability [TechOps Series]

The Cloud2030 Tech Ops series is an ongoing discussion in which we create what I think of as 200-level content for tech and operations leaders, exploring complex, deep topics in a thoughtful way to extend your knowledge base and capabilities in the data center and infrastructure space.

Today’s episode talks about GitOps and immutability. We connect the operational concepts of controls and desired-state communication with how they get executed in infrastructure, taking an operations approach rather than a developer approach. So if you are interested in how to manage immutability and what that means in infrastructure, this discussion is for you.

Logging [TechOps Series]

We dive deep into logging, tracing, metrics, and observability, with a specific filter for automation, systems, and infrastructure.

The real challenge here is how to capture information from a running system in a way that provides the right information at the right time. That is fundamentally the question we work to answer throughout this fascinating discussion about logging.

Transcript: otter.ai/u/msNO2gn1b0FP2lK7rS…?utm_source=copy_url