Reading Logs and Events [TechOps]

This TechOps episode explores the challenges of processing events and logs in technical operations.

The discussion covers the importance of understanding the intent and purpose of building systems downstream from eventing and logging systems. Key topics include the trade-offs between real-time and delayed event processing, the principle of least privilege, and strategies for handling event buffering and dropping. The conversation also touches on security concerns related to event and log data.

The episode concludes with plans for future discussions on adding events and logging to scripts to make them more useful.

Process: Good, Bad And Ugly

This podcast episode explores the challenges of process improvement in IT operations, using examples from data centers, automotive, and cybersecurity.

The discussion covers the slow evolution of secure boot, the difficulties cloud providers face in translating their processes to the broader market, and the emergence of vehicle-to-everything (V2X) ecosystems. The group delves into the need for standardization and security in vehicle ecosystems, as well as the policy management and automation challenges enterprises face.

The conversation also examines the balance of trust in technology versus human expertise, particularly around the use of AI and the risks of generative AI. The CrowdStrike incident is analyzed, with debate around the responsibility of CrowdStrike and Microsoft and the role of Delta’s operational controls. The impact on cyber insurance and the need for broader risk management approaches are also discussed. The episode highlights how process improvement and risk management are interconnected and calls for greater industry collaboration to address these challenges.

Transcript: otter.ai/u/93JhNjmekqf0ttX21g…?utm_source=copy_url

Supply Chain Security [TechOps]

In this episode, we dive deep into a recent and highly sophisticated SSH intrusion attack that was discovered in the Linux ecosystem. We’ll discuss how the attackers were able to inject a backdoor into a critical compression library, using social engineering tactics to become a trusted maintainer over several years.

Advanced SSH [TechOps]

SSH (Secure Shell) is one of those topics that people take for granted because it is a ubiquitous way to log in and access systems. True to form for the TechOps series, though, we break it down into much more detailed and granular components.

We talk about how to secure it and what the best practices are. We also discuss how to use it for tunneling, or, more specifically, why not to use it for tunneling, and why all of this matters to your operations environment. Listen to hear about the new things we’re doing that avoid the need for network access at all.

Transcript: otter.ai/u/XSRBfnifZOF0-nlNU5…?utm_source=copy_url

UEFI Trust & Secure Boot Issue

We explore the UEFI certificate issue in which secure boot is potentially compromised. Certificates that are included in most UEFI BIOSes have been compromised in ways that could easily be used as an attack vector. This is a very significant flaw and something that should be on your radar to patch.

We’re going to talk about what the issue is, why it’s important, how secure boot works, and what you can do to mitigate this problem in your own infrastructure. An important episode for anybody running or managing desktops, data centers or any infrastructure of any type.

Transcript: otter.ai/u/H15Z2NZDom8Hta8gHJ…?utm_source=copy_url

Compliance Death Curve [Working Session 1]

The compliance death curve is an evolving concept I’ve been working on that tries to explain how companies fight compliance, governance, and standardization efforts, which are critical to platform teams and infrastructure operations.

Today we try to decompose some of the mathematics that I’ve been using into more universal, more easily understood components. We built a compliance flywheel that I found really fascinating; you can see an example of that work in our podcast description.

It could also be helpful to check out my previously recorded compliance death curve talk that has been released.

Resources:
www.youtube.com/watch?v=4RUKsakKZI0

Transcript: otter.ai/u/k9q5ZZ81Hm-EAAtfkV…?utm_source=copy_url

Data Ops Platforms [Does DevOps work in AI?]

We dive into data operations in today’s episode! We cover the idea that with all of the work we’re doing in AI, ML, and data analytics, you actually have to steward your data.

We also cover process controls like those we have with DevOps in infrastructure, applying similar concepts (governance, controls, automation) to how your data flows through your system.

Transcript: otter.ai/u/pesotDnHCCD5lyPVx7…?utm_source=copy_url
Image by DALL-E

Broadcom Creates Chaos & Opportunity

We dive into the chaos created by Broadcom’s acquisition of VMware. In this episode, we discuss what Broadcom is doing, why it’s a problem, how enterprises are reacting, and what alternatives are on the market.

We cover the whole mess in all its glory, and even provide some love for Broadcom.

Resources:
www.thestack.technology/vmware-is-kil…isor-and-nsx/
www.siderolabs.com/platform/saas-for-kubernetes/

Transcript: otter.ai/u/SO8PD-p8AHwwsKfGsN…?utm_source=copy_url
Image by DALL-E

DevOps and Legacy Buildings

Departing from our typical podcast format, today’s episode is part of a presentation that I’ve been preparing that compares 125-year-old house building architecture to modern DevOps. We also analyze what works and what doesn’t.

There are a lot of home maintenance stories and comparison notes. In the back half of the episode in particular, we get into how this type of challenge relates to operations management.

References: nationalpost.com/news/canada/afte…oard-sewer-pipes

Transcript: otter.ai/u/jf8at50nf0KKQG7Drl…?utm_source=copy_url
Image by DALL-E: Victorian house with the second floor redesigned in a modern style, featuring extensive use of glass, plus a porch with rockers and a poodle.