Hardware – Cloud2030 Podcast

Secondary Markets for Infrastructure

In this episode we dive into the practice of recapturing gear from data centers, how that gear can be used in the secondary market, and the ramifications for that market. What started off as a tangent ended up as a rewarding rabbit-hole conversation!

Transcript: otter.ai/u/QR43ePMfKLNrxH1bZo…?utm_source=copy_url

Do AI Export Controls Work?

We discuss whether AI export controls work, with an interesting twist: the conversation turns to manufacturing and innovation. The question is not whether you can control AI chips, but what it actually takes to build innovative products. That’s where export controls face real challenges. Military manufacturing and goods are part of what this AI embargo is about. We talk about how hard it is to build truly innovative manufacturing and what the barriers are.

Transcript: otter.ai/u/k6Thp-TOfwKc_RjXHC…?utm_source=copy_url

Cloud2030 AI Hardware GPUs nVidia Export China

Deflating Cloud Mythology [+ book club]

Is hardware going to be innovative and change? Bryan Cantrill brings up Oxide Computer Company and some of their design motivation.

Today we discuss our skepticism about some of his points, as well as the impacts on cloud, distributed compute, hardware design, mainframes, cloud repatriation, and a whole bunch of topics about next-generation thinking in compute infrastructure management and applications.

We are officially starting our Cloud2030 book group and I hope you will join us – we are going to be reading Data Cartels by Sarah Lamdan, followed by Investments Unlimited by John Willis and crew.

Book Clubs Links:

May 4 > Data Cartels www.amazon.com/Data-Cartels-Comp…ion/dp/1503633713

Early July > Investments Unlimited www.amazon.com/Investments-Unlim…tal/dp/1950508536

Transcript: otter.ai/u/S7CRv2J9fmOjAc8HM_…?utm_source=copy_url
Image: www.pexels.com/photo/a-man-stand…-balloon-9128460/

A Path for Cloud Standardization?

We discuss standards, de facto standards, and cloud standards. It comes down to how we are creating repeatable results for the cloud marketplace.

Ideally, we’re creating marketplaces where standards can be shared. We’d consider Amazon as the primary example, but we also talk about hardware and Kubernetes which have their own marketplaces.

Ultimately, we asked whether we are creating standardized cloud infrastructure. The short answer is no.

Transcript: otter.ai/u/kGT8pGfbslZRgFktM0pE3AifwWI
Image: www.pexels.com/photo/measuring-g…tar-pick-3988555/

Rob’s Hot Take:

Rob Hirschfeld, CEO and co-founder of RackN and host of the Cloud 2030 Podcast, reflects on the November 30th DevOps Lunch and Learn session focused on standards and vendors’ attempts to establish standard operating processes. He highlights the market’s lack of convergence on, or trust in, vendor-driven standards, emphasizing the durability of certain influential industry standards compared to vendor-specific APIs. Hirschfeld underscores the ongoing need for standard operating models and APIs to address market complexity, and encourages listeners to explore the full discussion at the2030.cloud.

Shouldn’t we have Standard Automation for Commodity Infrastructure?

Our focus on SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

An entire chapter of the Google SRE book was dedicated to the benefits of improving data center provisioning via automation; however, the description was abstract with a focus on the importance of validation testing and self-healing. That lack of detail is not surprising: Google’s infrastructure automation is highly specialized and considered a competitive advantage.

Shouldn’t everyone be able to do this?

After all, data centers are built from the same basic components with the same protocols.

Unfortunately, the stack of small (but critical) variations between these components makes it very difficult to build a universal solution. Reasonable variations like hardware configuration, vendor out-of-band management protocol, operating system, support systems and networking topologies add up quickly. Even Google, with their tremendous SRE talent and time investments, only built a solution for their specific needs.

To handle this variation, our SRE teams bake assumptions about their infrastructure directly into their automation. That’s expedient because there’s generally little operational reward for creating generic solutions for specific problems. I see this all the time in data centers that have server naming conventions and IP address schemes that are the automation glue between their tools and processes. While this may be a practical tactic for integration, it is fragile and site specific.

Hard coding your operational environment into automation has serious downsides.

First, it creates operational debt, just like hard coding values in regular development. Please don’t mistake this as a call for yak shaving provisioning scripts into open-ended models! There’s a happy medium where the scripts can be robust about infrastructure details like IPs, NIC ordering, system names and operating system behavior without compromising readability and development time.
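To make that happy medium concrete, here is a minimal sketch in Python. It is not taken from any real tool: the SiteParams fields, the render_host_config function and the sample subnets and NIC names are all hypothetical. The point is only that site details arrive as parameters instead of living as literals inside the script.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteParams:
    """Hypothetical per-site values kept outside the shared script."""
    mgmt_subnet: str      # e.g. "10.1.0.0/24"
    boot_nic: str         # NIC names differ between hardware vendors
    hostname_prefix: str  # the site's naming convention


def render_host_config(site: SiteParams, index: int) -> dict:
    """Build one host's settings from site parameters, not hard-coded literals."""
    network = ipaddress.ip_network(site.mgmt_subnet)
    # Skip the network address (offset 0) and the broadcast address (last offset).
    if not 1 <= index <= network.num_addresses - 2:
        raise ValueError(f"index {index} does not fit in {site.mgmt_subnet}")
    return {
        "hostname": f"{site.hostname_prefix}-{index:03d}",
        "ip": str(network[index]),
        "boot_nic": site.boot_nic,
    }


# The same function serves two sites that differ in subnet, NIC and naming.
site_a = SiteParams("10.1.0.0/24", "eno1", "dc1-node")
site_b = SiteParams("192.168.50.0/28", "eth0", "lab")
host_a = render_host_config(site_a, 7)
host_b = render_host_config(site_b, 3)
```

The script stays short and readable, but moving to a new site means supplying a new SiteParams value rather than editing the code.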

Second, it eliminates reuse because code that works in one place must be forked (or copied) to be used again. Forking creates multiple versions of the truth and adds technical debt. Unlike a shared script, the forked scripts do not benefit from mutual improvements. This is true both for internal use and when external communities advance. I have seen many cases where a company’s decision to fork away from open source code to “adjust it for their needs” caused them to forever lose the benefits accrued in the upstream community.

Consequently, Ops debt is quickly created when these infrastructure-specific items are coded into the scripts because you have to touch a lot of code to make small changes. You also end up with hidden dependencies.

However, until recently, we have not given SRE teams an alternative to site customization.

Of course, the alternative requires some additional investment up front. Hard coding and forking are faster out of the gate; however, the SRE mandate is to aggressively reduce ongoing maintenance tasks wherever possible. When core automation is site customized, Ops loses the benefits of reuse both internally and externally.

That’s why we believe SRE teams should work to reuse automation whenever possible.

Digital Rebar was built from our frustration watching the OpenStack community struggle with exactly this lesson. We felt that having a platform for sharing code was essential; however, we also observed that differences between sites made it impossible to share code. Our solution was to isolate those changes into composable units. That isolation allowed us to take a system integration view that did not break when inevitable changes were introduced.
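As a rough illustration of what isolating site differences can look like, here is a small Python sketch. It is not Digital Rebar’s API or data model; SHARED_DEFAULTS, resolve_params and every value shown are hypothetical. Shared defaults sit in one layer, and each site or host contributes only the values that differ.

```python
from collections import ChainMap

# Hypothetical defaults that every site can share unchanged.
SHARED_DEFAULTS = {
    "os": "ubuntu-22.04",
    "ntp_server": "pool.ntp.org",
    "oob_protocol": "ipmi",
}


def resolve_params(*layers: dict) -> dict:
    """Merge override layers (most specific first) on top of the shared defaults."""
    # ChainMap looks keys up in order, so an earlier layer wins over a later one.
    return dict(ChainMap(*layers, SHARED_DEFAULTS))


# Each layer holds only what is different about that site or host.
site_overrides = {"oob_protocol": "redfish"}
host_overrides = {"os": "rocky-9"}

resolved = resolve_params(host_overrides, site_overrides)
```

Because the overrides are separate units, the shared layer can keep improving without anyone forking it.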

If you are interested in breaking out of the script customization death spiral, then review what the RackN team has done with Digital Rebar.

Even if you don’t use the code, the approach could save your SRE team a lot of heartburn down the road. Of course, if you do want to use it then just contact us at sre@rackn.com.