TechOps Scaling Challenges

In this episode, we talk about scale and the hard realities of system failure in large tech operations. We explore why rare failures become common at scale, and what it takes to build systems that can handle that pressure. From predictive diagnostics to component redundancy, we share practical insights on keeping high-performance and AI infrastructure resilient. This is not theory; it is grounded in real-world lessons from managing complex environments and learning how to plan, isolate, and adapt when things go wrong.

Transcript: otter.ai/u/X8JYiADfPPLEfQ-gge…?utm_source=copy_url

The Opportunity for OpenShift Infrastructure

Today we tackle the generational infrastructure shift that’s keeping IT leaders awake at night: OpenShift virtualization adoption. We dig into why organizations are struggling to migrate from traditional VM-focused infrastructure to Kubernetes-managed infrastructure. We explore the real hurdles blocking this transition and unpack the strategic positioning that matters when you’re moving to container-orchestrated infrastructure. This isn’t about dumping everything into Kubernetes and calling it done. We examine what it really takes to use Kubernetes as your infrastructure abstraction layer while navigating the operational realities that make or break these migrations.

Transcript: otter.ai/u/IY2Y0a4aFN99ILg9da…?utm_source=copy_url

HA Troubleshooting [Tech Ops]

This episode of the TechOps series goes into high availability troubleshooting. Not just high availability, and not just troubleshooting, but what it actually takes to manage, maintain, and fix HA systems. This is part of a longer discussion we’ve been having, and there are some really interesting ideas in it that I hope will shape your thinking as you build high availability systems, diagnostics, and troubleshooting for people working in very complex high availability environments.

Transcript: otter.ai/u/wM__4w1YIzZnhVdgLu…?utm_source=copy_url

References:
status.openai.com/incidents/ctrsv3lwd797

High Availability Technology in DRP [TechOps]

Today we dive into RackN high availability technology and what we did to build consensus-based Raft HA capabilities directly into Digital Rebar. This is one of those episodes where we are talking specifically and only about Digital Rebar, so it is a vendor-specific conversation from that perspective.

If you are building HA systems, or are interested in how HA systems work, this is a great session to learn firsthand from our experience!

Transcript: otter.ai/u/9lA9djczp5GkJbj12k…?utm_source=copy_url

Why is adding LLM into an App so hard?

We talk about current events: the acquisition of DataStax and the closing of the HashiCorp acquisition by IBM. Later, we dive into the productivity of AI and what’s going on: are companies really getting the benefits they expect from AI chatbot integrations, and what are the challenges?

We also touch on something more infrastructure-focused: I give a preview of work I’ve been doing on separating Kubernetes virtualization from Kubernetes development use cases, which is something that we will be talking about more in the future.

References:
www.windowscentral.com/software-apps…ind-a-paywall
www.ibm.com/new/announcements/i…ise-ai-applications
www.youtube.com/watch?v=Ioc3r70HNLM
www.linkedin.com/posts/dhinchclif…9498138624-jR2R/
20250227

Software Defined Edge

We revisit edge infrastructure and the motivations behind building and managing it, with an unusual take. In this case, we ask ourselves whether all of these edge devices are becoming more software defined or becoming more standardized, off-the-shelf componentry. Will that change how we look at managing and running edge infrastructure? Will we shift compute and operations processes into these ever smarter devices? The answer is going to surprise you.

Transcript: otter.ai/u/tGIcIC1bijvaW4OkJN…?utm_source=copy_url

Silos Vs Systems

Martez Reed and I have an in-depth conversation about the challenges of propagating technology inside enterprises: the core challenge of selling silos and individual technologies, which Martez describes as beneficial tool sprawl, versus building up systems and integrating things into end-to-end technology. The latter is what I’ve been calling infrastructure pipelining. We break down what’s going on in the street related to open source technology, Kubernetes, and other aspects of what’s happening, and how things fit together in an interesting and dynamic way.

Transcript: otter.ai/u/2M4P8U1haMsoT2ahg3…?utm_source=copy_url