Today we dive deep into the mystery of Kubernetes installation, specifically OpenShift installation. We explore why Kubernetes installs look so strange compared to traditional operations install processes. Where are the playbooks? Where are the scripts? Where are the runbooks describing all the steps you need to take? All of it seems to be missing, and in this podcast we explain why.
DevOps Lunch and Learn focuses on home labbing versus enterprise use cases and why it is so tricky to satisfy the home user with enterprise products. This really is a dilemma, because we'd love to see more crossover, and we're going to talk about why.
We deep dive into something seemingly very small, but with a lot of repercussions for how you manage and run a data center, and that is test scripts for servers.
As you’re going through a production cycle or a provisioning cycle, how do you test? What do you test? This topic was from a Reddit thread that we answered and then had a whole hour conversation about just how important and impactful this type of script is.
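To give a flavor of what a server test script can look like, here is a minimal post-provisioning smoke test sketched in Python. The specific checks, thresholds, and ports are illustrative assumptions, not anything prescribed in the episode:

```python
import shutil
import socket

def check_disk_free(path="/", min_free_gb=5):
    """Verify the filesystem at `path` has at least `min_free_gb` GiB free."""
    free_gb = shutil.disk_usage(path).free / 2**30
    return free_gb >= min_free_gb

def check_port_open(host, port, timeout=2.0):
    """Verify a TCP service is accepting connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def run_smoke_tests(checks):
    """Run each named check and collect pass/fail results for a provisioning report."""
    return {name: fn() for name, fn in checks.items()}
```

A provisioning pipeline might call `run_smoke_tests({"root_disk": lambda: check_disk_free("/", 5), "ssh": lambda: check_port_open("web01", 22)})` as a gate before handing the server to production; the value is less in any single check than in running the same checks consistently on every box.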
We talk about current events: the acquisition of DataStax and the closing of the HashiCorp acquisition by IBM. Later, we dive into the productivity of AI and what's going on: are companies really getting the benefits they expect from AI chatbot integrations, and what are the challenges?
We touch base on a little bit of something more infrastructure focused, where I give a preview of work I’ve been doing on separating Kubernetes virtualization from Kubernetes development use cases, which is something that we will be talking about more in the future.
The Cloud2030 TechOps series is an ongoing discussion where we create what I think of as 200-level content for tech and operations leaders, exploring really complex, deep topics in a thoughtful way to extend your knowledge base and capabilities in the data center and infrastructure space.
Today's episode talks about GitOps and immutability. What we're doing here is connecting the operational concepts between controls and desired-state communication and how that gets executed in infrastructure, in an operations sense. Rather than a developer approach, this takes an operations approach. So if you are interested in how to manage immutability and what that means in infrastructure, this discussion is for you.
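The core mechanic behind GitOps-style desired state can be sketched in a few lines: compare the state declared in Git with the state observed in the environment, and plan actions to close the gap. This is a simplified illustration, not code from the episode; the resource names and spec format are invented for the example:

```python
def plan_reconciliation(desired, actual):
    """Compare declared (Git-held) state with observed state and plan actions.

    With immutable infrastructure, a drifted resource is replaced wholesale
    (build new, swap, destroy old) rather than patched in place.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            # Immutability: never mutate a live resource in place.
            actions.append(("replace", name))
    for name in actual:
        if name not in desired:
            actions.append(("destroy", name))
    return actions
```

Running the planner against a drifted environment, e.g. `plan_reconciliation({"web": {"image": "v2"}}, {"web": {"image": "v1"}, "tmp": {}})`, yields a replace for the drifted resource and a destroy for the one no longer declared; an operator loop then executes those actions repeatedly until observed state converges on the declared state.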
We dive deep into logging, tracing, metrics, and observability, with a specific filter for automation, systems, and infrastructure.
There’s a real challenge here of how you capture information from a running system in a way that provides the right information at the right time. That fundamentally is the question that we are working to answer throughout this really fascinating discussion about logging.
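One common answer to "the right information at the right time" is structured logging: emit each event as machine-parseable JSON with context attached, so downstream tooling can filter and correlate rather than grep prose. A minimal sketch using Python's standard `logging` module (the `ctx` field convention is an assumption for this example, not something defined in the episode):

```python
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON line for downstream processing."""

    def format(self, record):
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Attach any structured context passed via `extra={"ctx": {...}}`.
        ctx = getattr(record, "ctx", None)
        if ctx:
            payload.update(ctx)
        return json.dumps(payload)
```

A script would install this formatter on a handler and then log with context, e.g. `logger.info("disk check passed", extra={"ctx": {"host": "web01", "free_gb": 12}})`, so that a later query like "all disk checks on web01" is a field match instead of a regex.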
This TechOps episode explores the challenges of processing events and logs in technical operations.
The discussion covers the importance of understanding the intent and purpose of building systems downstream from eventing and logging systems. Key topics include the trade-offs between real-time and delayed event processing, the principle of least privilege, and strategies for handling event buffering and dropping. The conversation also touches on security concerns related to event and log data.
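The buffering-and-dropping trade-off mentioned above can be made concrete with a small sketch: a bounded event buffer that must choose, when full, between dropping the newest event or evicting the oldest, while counting drops so the loss itself stays observable. This is an illustrative model, not an implementation from the discussion:

```python
from collections import deque

class EventBuffer:
    """Bounded event buffer with an explicit drop policy.

    When full, either evict the oldest event ("drop_old", favoring fresh
    data) or reject the newest ("drop_new", favoring history). Drops are
    counted so downstream consumers can see that loss occurred.
    """

    def __init__(self, capacity, policy="drop_old"):
        self.buf = deque()
        self.capacity = capacity
        self.policy = policy
        self.dropped = 0

    def push(self, event):
        """Add an event; returns True if nothing was dropped."""
        if len(self.buf) < self.capacity:
            self.buf.append(event)
            return True
        self.dropped += 1
        if self.policy == "drop_old":
            self.buf.popleft()   # evict oldest to make room for the new event
            self.buf.append(event)
        return False             # under "drop_new", the incoming event is lost

    def drain(self):
        """Hand off all buffered events to a downstream consumer."""
        events, self.buf = list(self.buf), deque()
        return events
```

Which policy is right depends on intent: an alerting pipeline usually wants the freshest events (`drop_old`), while an audit trail may prefer to keep the earliest record of a burst (`drop_new`) and flag the gap.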
The episode concludes with plans for future discussions on adding events and logging to scripts to make them more useful.
This podcast episode explores the challenges of process improvement in IT operations, using examples from data centers, automotive, and cybersecurity.
The discussion covers the slow evolution of secure boot, the difficulties cloud providers face in translating their processes to the broader market, and the emergence of vehicle-to-everything (V2X) ecosystems. The group delves into the need for standardization and security in vehicle ecosystems, as well as the policy management and automation challenges enterprises face.
The conversation also examines the balance of trust in technology versus human expertise, particularly around the use of AI and the risks of generative AI. The CrowdStrike incident is analyzed, with debate around the responsibility of CrowdStrike, Microsoft, and Delta’s operational controls. The impact on cyber insurance and the need for broader risk management approaches are also discussed, highlighting the interconnectedness of process improvement and risk management, and the call for greater industry collaboration to address these challenges.
In this episode, we dive deep into a recent and highly sophisticated SSH backdoor attack discovered in the Linux ecosystem. We'll discuss how the attackers were able to inject a backdoor into a critical compression library, leveraging social engineering tactics to become a trusted maintainer over several years.
A software bill of materials (SBOM) is the idea that we can define and document exactly what goes into a system. Today we look at governance and SBOMs, putting them together from both a software and an operations side.