We dive deep into logging, tracing, metrics, and observability, with a specific focus on automation, systems, and infrastructure.
There’s a real challenge here of how you capture information from a running system in a way that provides the right information at the right time. That fundamentally is the question that we are working to answer throughout this really fascinating discussion about logging.
This TechOps episode explores the challenges of processing events and logs in technical operations.
The discussion covers the importance of understanding the intent and purpose of building systems downstream from eventing and logging systems. Key topics include the trade-offs between real-time and delayed event processing, the principle of least privilege, and strategies for handling event buffering and dropping. The conversation also touches on security concerns related to event and log data.
The episode concludes with plans for future discussions on adding events and logging to scripts to make them more useful.
This podcast episode explores the challenges of process improvement in IT operations, using examples from data centers, automotive, and cybersecurity.
The discussion covers the slow evolution of secure boot, the difficulties cloud providers face in translating their processes to the broader market, and the emergence of vehicle-to-everything (V2X) ecosystems. The group delves into the need for standardization and security in vehicle ecosystems, as well as the policy management and automation challenges enterprises face.
The conversation also examines the balance of trust in technology versus human expertise, particularly around the use of AI and the risks of generative AI. The CrowdStrike incident is analyzed, with debate around the responsibility of CrowdStrike, Microsoft, and Delta’s operational controls. The impact on cyber insurance and the need for broader risk management approaches are also discussed, highlighting the interconnectedness of process improvement and risk management, and the call for greater industry collaboration to address these challenges.
In this episode, we dive deep into a recent and highly sophisticated SSH intrusion attack that was discovered in the Linux ecosystem. We’ll discuss how the attackers were able to inject a backdoor into a critical compression library, leveraging social engineering tactics to become a trusted maintainer over several years.
A software bill of materials (SBOM) is the idea that we can define and document exactly what goes into a system. We look at governance today and how SBOMs fit into it, from both the software side and the operations side.
SSH (Secure Shell) is one of those topics that people take for granted because it is a ubiquitous way to log in and access systems. True to form for the TechOps series, though, we break it down into much more detailed and granular components.
We talk about how to secure it and what the best practices are. We also discuss how to use it for tunneling, or, more specifically, how not to use it for tunneling, and why all of this matters to your operations environment. Listen in to hear about new approaches that avoid the need for network access at all.
Martez Reed and I have an in-depth conversation about the challenges of propagating technology inside enterprises, and the core challenge of selling silos and individual technologies. We contrast what Martez describes as beneficial tool sprawl with building up systems and integrating things into end-to-end technology, which is what I’ve been calling infrastructure pipelining. We break down what’s going on in the street related to open source technology, Kubernetes, and other developments, and how things fit together in an interesting and dynamic way.
The compliance death curve is an evolving concept I’ve been working on that tries to explain how companies fight compliance, governance, and standardization efforts, which are critical to platform teams and infrastructure operations.
Today we try to decompose some of the mathematics that I’ve been using into more universal, more easily understood components. We built a compliance flywheel that I found really fascinating; you can see an example of that work in our podcast description.
It may also help to check out my previously released compliance death curve talk.
TechOps series episode 3 covers how to automate against APIs. We discuss the ways in which you can use APIs effectively, and the ways you can run into trouble. We also discuss how we should be consuming APIs, both as consumers and as producers of APIs ourselves. Many of the ideas discussed came from learning how people consume our APIs and what we can do to help make them better and safer.
Enjoy this broader TechOps series, where we dive deep into tips and techniques that improve your journey as an automator.
How can we understand agility and adaptability? In this discussion, we get very concrete about the differences between agility and adaptability and why that’s important for you as you go on your own innovation journey.
This includes looking for places where standards can be applied to accelerate your team, and places where it’s too early and the learning iterations we would call agile processes are more appropriate. We also discuss how teams get caught in the middle between standardization and agility.
Transcript: otter.ai/u/vsWqEiJpssWnyqlOCm…?utm_source=copy_url
Image by DALL-E