TechOps Scaling Challenges

In this episode, we talk about scale and the hard realities of system failure in large tech operations. We explore why rare failures become common at scale, and what it takes to build systems that can handle that pressure. From predictive diagnostics to component redundancy, we share practical insights on keeping high-performance and AI infrastructure resilient. This is not theory; it is grounded in real-world lessons from managing complex environments and learning how to plan, isolate, and adapt when things go wrong.
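As a rough illustration of why rare failures become common at scale: if each component fails independently with a small probability p over some period, the chance that at least one of n components fails is 1 - (1 - p)^n. The sketch below is not taken from the episode. The independence assumption, the function name, and the example numbers are all illustrative assumptions.

```python
def prob_at_least_one_failure(per_component_failure_prob: float, component_count: int) -> float:
    """Chance that at least one of `component_count` components fails.

    Assumes failures are independent and equally likely, which real
    fleets only approximate (hypothetical model, for illustration only).
    """
    if not 0.0 <= per_component_failure_prob <= 1.0:
        raise ValueError("per_component_failure_prob must be between 0 and 1")
    if component_count < 0:
        raise ValueError("component_count must be non-negative")
    # P(no failures) = (1 - p)^n, so P(at least one failure) = 1 - (1 - p)^n
    return 1.0 - (1.0 - per_component_failure_prob) ** component_count


if __name__ == "__main__":
    # Illustrative numbers only: a 1-in-10,000 failure chance per component.
    p = 0.0001
    for n in (1, 100, 10_000, 100_000):
        print(f"{n:>7} components: {prob_at_least_one_failure(p, n):.4f}")
```

Under these assumptions, a failure that is rare for any single component becomes more likely than not once the fleet reaches tens of thousands of components, which is why isolation and redundancy matter more as systems grow.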

Transcript: otter.ai/u/X8JYiADfPPLEfQ-gge…?utm_source=copy_url

Container Driven Architecture

In this episode, we continue our dive into the changing architecture of IT infrastructure and look at how containers and container platforms are changing. We also look at a fundamental shift in what people want to buy, accelerated by Broadcom’s acquisition of VMware, which has made virtualization platforms much less attractive. This episode is based on a presentation that I’ve been giving about the shift toward OpenShift Virtualization and Kubernetes in general.

Transcript: otter.ai/u/BnYKzI0zzOLqqWi45v…?utm_source=copy_url

HA Troubleshooting [Tech Ops]

This episode of the TechOps series goes into high availability troubleshooting: not just high availability, and not just troubleshooting, but what it takes to manage, maintain, and fix HA systems. This is part of a longer discussion we’ve been having, so there are some really interesting ideas in the middle of these discussions that I hope will shape your thinking as you build high availability systems, along with the diagnostics and troubleshooting needed by people who work in very complex high availability environments.

Transcript: otter.ai/u/wM__4w1YIzZnhVdgLu…?utm_source=copy_url

References:
status.openai.com/incidents/ctrsv3lwd797

Why is adding an LLM into an App so hard?

We talk about current events: IBM’s acquisition of DataStax and the closing of its HashiCorp acquisition. Later, we dive into the productivity of AI and what’s going on: are companies really getting the benefits that they expect from AI chatbot integrations, and what are the challenges?

We also touch on something more infrastructure focused: I give a preview of work I’ve been doing on separating Kubernetes virtualization from Kubernetes development use cases, which is something that we will be talking about more in the future.

References:
www.windowscentral.com/software-apps…ind-a-paywall
www.ibm.com/new/announcements/i…ise-ai-applications
www.youtube.com/watch?v=Ioc3r70HNLM
www.linkedin.com/posts/dhinchclif…9498138624-jR2R/
20250227

Virtualization in Containers (KubeVirt, OpenShift Virtualization)

In this episode, we dive deeper into the new architectural trend facing infrastructure designers in the coming decade: a transition from virtualization-first platforms like VMware to container-first platforms. This time, we talk through the use of virtualization in containerized systems, keeping VMs but looking at what changes are necessary to make a containerized virtualization platform dominant instead of a virtualized virtualization platform.

References:
kubevirt.io/user-guide/architecture/
www.redhat.com/en/technologies/c…ft/virtualization

Cloud2030, Virtualization, VMware, Containers, KVM, KubeVirt, OpenShift, IT

Software Defined Edge

We revisit edge infrastructure and the motivations behind building and managing it, with an unusual take. In this case, we ask ourselves whether all of these edge devices are becoming more software defined or are becoming more standardized, off-the-shelf componentry. Will that change how we look at managing and running edge infrastructure? Will we shift compute and operations processes into these ever smarter devices? The answer is going to surprise you.

Transcript: otter.ai/u/tGIcIC1bijvaW4OkJN…?utm_source=copy_url

Silos Vs Systems

Martez Reed and I have an in-depth conversation about the challenges of propagating technology inside enterprises. The core challenge is selling silos and individual technologies, which Martez describes as beneficial tool sprawl, versus building up systems and integrating things into end-to-end technology. This is what I’ve been calling infrastructure pipelining. We break down what’s going on in the field with open source technology, Kubernetes, and other developments, and how things fit together in an interesting and dynamic way.

Transcript: otter.ai/u/2M4P8U1haMsoT2ahg3…?utm_source=copy_url

Time for SBOMs? What’s Ahead for 2024?

After a brief hiatus, the Cloud2030 group is back and deep in tech, talking about what we think is coming on the tech front, sans AI.

In this episode, we take some time to go through Kubernetes, hardware, software bills of materials (SBOMs), and some governance. This includes a smattering of predictions to get your year started off with a bang.

From there, we are going to be moving into our TechOps series. Find more details about that in today’s outro!

Resources:
www.theregister.com/2023/12/27/bruc…erens_post_open
developersalliance.org/open-source-l…ty-is-coming/

Transcript: otter.ai/u/UQyqHKJ9oNd1SquAWW…?utm_source=copy_url
Image by DALLE: cartoon images of a robot reviewing a long bill of materials on a scroll of paper.

Cloud2030, 2024, SBOM, Cloud, Automation, Infrastructure, OSS, Open Source