In this episode, we continue our dive into the changing architecture of IT infrastructure and look at how containers and container platforms are evolving. We also look at the fundamental nature of what people want to buy, a shift accelerated by Broadcom's acquisition of VMware, which has made virtualization platforms much less attractive, and at the shifting landscape here. This is based on a presentation that I’ve been giving about the shift toward OpenShift Virtualization and Kubernetes in general.
We step back in this episode of our TechOps series to talk about cloud versus self-managed infrastructure and how you balance the competing concerns. We started from a report that RackN had commissioned on on-premises Kubernetes and mixing it into your IT infrastructure.
Can you have a cloud broker? Can you do multi-cloud? These are tried-and-true topics for cloud consideration, but seen through a new filter: the repatriation idea of mixing and matching your IT infrastructure.
We springboard from DeepThink AI and have a robust conversation about the impact DeepThink is having on the industry. We also discuss where we see things going and the dilemma facing people building AI infrastructure: working quickly, robustly, and with strong governance. This is necessary to ensure they can quickly update and manage the AI infrastructure they’re spending so much money to build, and it leads into a broader conversation about virtualization, containers, and OpenShift.
We take a deep dive into something seemingly very small but with big repercussions for how you manage and run a data center: test scripts for servers.
As you’re going through a production or provisioning cycle, how do you test? What do you test? This topic came from a Reddit thread that we answered, which turned into a whole hour of conversation about just how important and impactful this type of script is.
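To make the idea concrete, here is a minimal, hypothetical sketch of what a post-provisioning server test script might look like. The check names, thresholds, and structure are illustrative assumptions, not anything specific discussed in the episode or shipped by RackN.

```python
# Hypothetical post-provisioning smoke test: each check is a small function
# returning (name, passed) so failures are easy to collect and report.
import shutil
import socket


def check_disk_free(path="/", min_free_gb=5):
    """Fail if the filesystem has less free space than we expect after install."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return "disk_free", free_gb >= min_free_gb


def check_hostname_resolves():
    """A server that cannot resolve its own hostname will break many agents."""
    try:
        socket.gethostbyname(socket.gethostname())
        return "hostname_resolves", True
    except socket.gaierror:
        return "hostname_resolves", False


def run_smoke_tests(checks):
    """Run every check; the cycle passes only if all checks pass."""
    results = dict(check() for check in checks)
    return results, all(results.values())


if __name__ == "__main__":
    results, ok = run_smoke_tests([check_disk_free, check_hostname_resolves])
    print(results, "PASS" if ok else "FAIL")
```

The value of even a tiny script like this is that it turns "the server looks fine" into a repeatable, auditable gate in the provisioning pipeline.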
Today we dive into RackN’s high availability technology and what we did to build consensus-based Raft HA capabilities directly into Digital Rebar. This is one of those episodes where we are talking specifically and only about Digital Rebar, so it is a vendor-focused conversation from that perspective.
If you are building HA systems, or are interested in how HA systems work, this is a great session to learn firsthand from our experience!
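As background for the discussion, the core arithmetic behind Raft-style consensus is simple: a cluster can keep accepting writes only while a strict majority of members can vote. The sketch below illustrates that quorum rule in general terms; it is not Digital Rebar's implementation.

```python
# Quorum arithmetic behind Raft-style consensus (illustrative only):
# a cluster stays writable only while a strict majority is healthy.

def quorum_size(cluster_size: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, healthy_members: int) -> bool:
    """True if enough members remain to elect a leader and commit entries."""
    return healthy_members >= quorum_size(cluster_size)


# A 3-node cluster tolerates 1 failure; a 5-node cluster tolerates 2.
print(has_quorum(3, 2))  # True
print(has_quorum(5, 2))  # False
```

This is also why HA clusters are built with odd member counts: going from 3 nodes to 4 raises the quorum from 2 to 3 without tolerating any additional failures.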
We revisit edge infrastructure and the motivations behind building and managing it, with an unusual take. In this case, we ask whether all of these edge devices are becoming more software defined, built from more standardized, off-the-shelf componentry. Will that change how we look at managing and running edge infrastructure? Will we shift compute and operations processes into these ever-smarter devices? The answer is going to surprise you.
This TechOps episode explores the challenges of processing events and logs in technical operations.
The discussion covers the importance of understanding the intent and purpose of building systems downstream from eventing and logging systems. Key topics include the trade-offs between real-time and delayed event processing, the principle of least privilege, and strategies for handling event buffering and dropping. The conversation also touches on security concerns related to event and log data.
The episode concludes with plans for future discussions on adding events and logging to scripts to make them more useful.
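One of the buffering-and-dropping strategies the discussion touches on can be sketched as a bounded buffer that evicts the oldest events under backpressure, trading completeness for freshness. This is a generic illustration under my own assumptions, not a pattern prescribed in the episode.

```python
# A bounded event buffer that drops the oldest events when full,
# while counting drops so the loss itself stays observable.
from collections import deque


class EventBuffer:
    def __init__(self, capacity: int):
        self._events = deque(maxlen=capacity)  # deque silently evicts oldest
        self.dropped = 0

    def push(self, event):
        if len(self._events) == self._events.maxlen:
            self.dropped += 1  # record what we lose, for observability
        self._events.append(event)

    def drain(self):
        """Return and clear all buffered events (e.g. for a batch shipper)."""
        events = list(self._events)
        self._events.clear()
        return events


buf = EventBuffer(capacity=3)
for i in range(5):
    buf.push(f"event-{i}")
print(buf.drain(), "dropped:", buf.dropped)  # keeps the 3 newest; dropped: 2
```

The design choice here mirrors the episode's trade-off: dropping old events keeps real-time consumers current, whereas a delayed-processing pipeline would instead block or spill to disk to preserve every event.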
This podcast episode explores the challenges of process improvement in IT operations, using examples from data centers, automotive, and cybersecurity.
The discussion covers the slow evolution of secure boot, the difficulties cloud providers face in translating their processes to the broader market, and the emergence of vehicle-to-anything ecosystems. The group delves into the need for standardization and security in vehicle ecosystems, as well as the policy management and automation challenges enterprises face.
The conversation also examines the balance of trust in technology versus human expertise, particularly around the use of AI and the risks of generative AI. The CrowdStrike incident is analyzed, with debate around the responsibility of CrowdStrike, Microsoft, and Delta’s operational controls. The impact on cyber insurance and the need for broader risk management approaches are also discussed, highlighting the interconnectedness of process improvement and risk management, and the call for greater industry collaboration to address these challenges.
In this episode, we dive deep into a recent and highly sophisticated SSH backdoor attack discovered in the Linux ecosystem. We’ll discuss how the attackers were able to inject a backdoor into a critical compression library, leveraging social-engineering tactics over several years to become a trusted maintainer.
A software bill of materials (SBOM) is the idea that we can define and document exactly what goes into a system. We look at governance today and at SBOMs, from both a software and an operations side.
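To show the basic shape of the idea, here is a hedged sketch that inventories the packages in the current Python environment as a minimal, ad-hoc bill of materials. The field names are illustrative and do not follow a formal SBOM standard such as SPDX or CycloneDX.

```python
# Ad-hoc SBOM sketch: enumerate installed Python distributions with
# their versions, the minimal "what is in this system" question an SBOM answers.
from importlib import metadata


def build_sbom():
    """Return a list of {name, version} records for installed packages."""
    return [
        {"name": dist.metadata["Name"], "version": dist.version}
        for dist in metadata.distributions()
    ]


if __name__ == "__main__":
    sbom = build_sbom()
    print(f"{len(sbom)} components recorded")
```

A real SBOM goes much further (transitive dependencies, hashes, licenses, suppliers), but even this flat inventory is the starting point for the governance questions the episode covers.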