Mastering API Governance: Inside SIG Architecture's Approach to Scalable System Design

2026-02-12 00:00

This is the fifth installment in our SIG Architecture Spotlight series, exploring the various subprojects within the special interest group. Today we focus on SIG Architecture: API Governance.

For this spotlight, we sat down with Jordan Liggitt, who leads the API Governance subproject.

Introduction

FM: Jordan, thanks for taking the time. Can you introduce yourself and tell us how you became involved with Kubernetes?

JL: I'm Jordan Liggitt—Christian, husband, father of four, software engineer at Google, and amateur musician when no one's looking. Born in Texas, though I've spent most of my life in North Carolina.

I started working on Kubernetes in 2014 while at Red Hat, focusing on authentication and authorization. My first pull request attempted to add an OAuth server to the Kubernetes API server. It never made it past work-in-progress status—I eventually took a different approach that layered on top of the core API server in a separate project, and closed the PR six months later without merging.

Despite that rocky start, I kept at it. I helped build Kubernetes authentication and authorization capabilities and got involved in defining and evolving the core APIs, from early beta versions like v1beta3 through to v1. I became an API reviewer in 2016 and an API approver in 2017.

Today, I co-lead the API Governance and code organization subprojects for SIG Architecture, and serve as a tech lead for SIG Auth.

FM: When did you specifically get involved with API Governance?

JL: Around 2019.

Goals and Scope of API Governance

FM: What are the main goals of the subproject, and what does its scope cover?

JL: The scope encompasses all of Kubernetes' APIs—and there are more than people realize. Beyond the obvious REST API, we also consider command-line flags, configuration files, how binaries execute, how they communicate with backend components like the container runtime, and how they persist data. These are all APIs with different audiences. The REST API has the largest and most diverse audience, but the others still require careful consideration even though their audiences are narrower.

Our goal is to maintain stability while still enabling innovation. Stability is trivial if nothing ever changes, but that rules out evolution and growth, so we have to balance those competing needs.

FM: In terms of ensuring consistency and quality—clearly a core reason this subproject exists—what are the specific quality gates in the lifecycle of a Kubernetes change? Does API Governance get involved during the release cycle, beforehand through guidelines, or both?

JL: We maintain guidelines and conventions covering both general API design and how to change existing APIs. These are living documents that we update as new scenarios emerge. Because they're long and dense, we supplement them with direct involvement at either the design or implementation stage.

Sometimes, because of bandwidth constraints, teams move forward with design work before getting API Review feedback. That's fine, but it means the review happens once implementation begins and may result in substantial changes. We engage whenever a new API is created or an existing one is modified: ideally at design time, though the implementation stage works too.

FM: Does this happen during the Kubernetes Enhancement Proposal process? Since KEPs are mandatory for enhancements, I assume there's overlap with API Governance?

JL: It can. KEPs vary in detail. Some include literal API definitions, which lets us perform API review at the design stage. Implementation then becomes a matter of verifying fidelity to the approved design.

Early involvement is ideal. But some KEPs are conceptual and defer details to implementation. That's not wrong—it just means implementation will be more exploratory, and API Review happens later, potentially recommending structural changes.

There's a tradeoff either way: detailed upfront design versus iterative discovery during implementation. People and teams work differently, and we're flexible about consulting early or later in the process.

FM: This reminds me of Fred Brooks in "The Mythical Man-Month" emphasizing conceptual integrity as central to product quality. Regardless of process structure, someone must ensure that integrity. Since Kubernetes uses APIs everywhere—externally and internally—API Governance is critical to maintaining it. How do you capture that?

JL: The conventions document captures patterns we've learned over time—what to do in various situations. We also have automated linters and checks that enforce correctness around patterns like spec/status semantics. These tools catch issues even when humans miss them.
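To make the spec/status pattern concrete, here is a minimal Go sketch. It assumes the usual convention that spec holds the desired state written by users, status holds the observed state written by controllers, and a write through one path leaves the other half untouched. The `Widget` type and the `UpdateSpec` and `UpdateStatus` helpers are hypothetical names for illustration, not Kubernetes code.

```go
package main

import "fmt"

// WidgetSpec is the desired state, normally written by users.
// Widget and its fields are hypothetical, for illustration only.
type WidgetSpec struct {
	Replicas int
}

// WidgetStatus is the observed state, normally written by a controller.
type WidgetStatus struct {
	ReadyReplicas int
}

// Widget pairs desired state (Spec) with observed state (Status).
type Widget struct {
	Name   string
	Spec   WidgetSpec
	Status WidgetStatus
}

// UpdateSpec mimics a write to the main resource: the incoming spec is
// accepted, but any status in the request is replaced by the stored status.
func UpdateSpec(stored, incoming Widget) Widget {
	incoming.Status = stored.Status
	return incoming
}

// UpdateStatus mimics a write to a status endpoint: only the status is
// taken from the request, and the stored spec is preserved.
func UpdateStatus(stored, incoming Widget) Widget {
	stored.Status = incoming.Status
	return stored
}

func main() {
	stored := Widget{
		Name:   "demo",
		Spec:   WidgetSpec{Replicas: 3},
		Status: WidgetStatus{ReadyReplicas: 1},
	}

	// A user changes the spec and (accidentally) sends a status too.
	userWrite := Widget{
		Name:   "demo",
		Spec:   WidgetSpec{Replicas: 5},
		Status: WidgetStatus{ReadyReplicas: 99},
	}
	stored = UpdateSpec(stored, userWrite)
	fmt.Println(stored.Spec.Replicas, stored.Status.ReadyReplicas) // prints "5 1"

	// A controller reports progress and sends a stale spec along with it.
	controllerWrite := Widget{
		Name:   "demo",
		Spec:   WidgetSpec{Replicas: 0},
		Status: WidgetStatus{ReadyReplicas: 4},
	}
	stored = UpdateStatus(stored, controllerWrite)
	fmt.Println(stored.Spec.Replicas, stored.Status.ReadyReplicas) // prints "5 4"
}
```

Keeping the two write paths separate means a user cannot accidentally overwrite what a controller observed, and a controller cannot accidentally change what the user asked for.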

As new scenarios arise—and they constantly do—we work through how to approach them and fold the results back into our documentation and tooling. Sometimes it takes a few attempts before we settle on an approach that works well.

FM: Right. Each new interaction refines the guidelines.

JL: Exactly. And sometimes the first approach proves wrong. It may take two or three iterations to land on something robust.

The Impact of Custom Resource Definitions

FM: Has there been any particular change, episode, or domain that stands out as especially noteworthy or complex?

JL: The watershed moment was Custom Resources. Before that, every API was handcrafted and fully reviewed by us. There were inconsistencies, but we understood and controlled every type and field.

Custom Resources changed everything—anyone could define anything. The first version didn't even require a schema. That made it extremely powerful and enabled immediate extensibility, but it left us playing catch-up on stability and consistency.

When Custom Resources reached General Availability, schemas became required, though escape hatches remained for backward compatibility. Since then, we've been working to give CRD authors validation capabilities comparable to built-in types. Built-in validation rules for CRDs only reached GA in the last few releases.

CRDs opened the "anything is possible" era. Built-in validation rules represent the second major milestone: restoring consistency.

The three major themes have been defining schemas, validating data, and handling pre-existing invalid data. With ratcheting validation—which allows data to improve without breaking existing objects—we can now guide CRD authors toward conventions without breaking the world.
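The idea behind ratcheting validation can be sketched in a few lines of Go. This is a simplified illustration under stated assumptions, not the actual Kubernetes implementation: the `Widget` type, the ten-character name rule, and the `ValidateCreate` and `ValidateUpdate` functions are all hypothetical. The point is that a rule tightened after objects were stored is enforced on new or changed values, while an unchanged invalid value is tolerated on update.

```go
package main

import (
	"errors"
	"fmt"
)

// Widget is a hypothetical stored object.
type Widget struct {
	Name     string
	Replicas int
}

// validateName is a hypothetical rule that was tightened after some
// objects with longer names had already been persisted.
func validateName(name string) error {
	if len(name) > 10 {
		return errors.New("name must be at most 10 characters")
	}
	return nil
}

// ValidateCreate applies the rule strictly: new objects must be valid.
func ValidateCreate(w Widget) error {
	return validateName(w.Name)
}

// ValidateUpdate ratchets: if the field is unchanged, a pre-existing
// invalid value is tolerated so unrelated edits are not blocked.
// Any changed value must satisfy the current rule.
func ValidateUpdate(old, updated Widget) error {
	if updated.Name == old.Name {
		return nil
	}
	return validateName(updated.Name)
}

func main() {
	legacy := Widget{Name: "a-very-long-legacy-name", Replicas: 1}

	// Creating the same object today would be rejected.
	fmt.Println(ValidateCreate(legacy) == nil) // prints "false"

	// Scaling the legacy object leaves the name untouched, so it is allowed.
	scaled := Widget{Name: legacy.Name, Replicas: 3}
	fmt.Println(ValidateUpdate(legacy, scaled) == nil) // prints "true"

	// Changing the name to another invalid value is rejected.
	renamedBad := Widget{Name: "another-overlong-name", Replicas: 1}
	fmt.Println(ValidateUpdate(legacy, renamedBad) == nil) // prints "false"

	// Changing the name to a valid value is accepted; the data has improved.
	renamedGood := Widget{Name: "short", Replicas: 1}
	fmt.Println(ValidateUpdate(legacy, renamedGood) == nil) // prints "true"
}
```

Once the object has been renamed to a valid value, later updates are held to the stricter rule, which is what lets the data move toward the convention without existing objects breaking.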

API Governance in Context

FM: How does API Governance relate to SIG Architecture and SIG API Machinery?

JL: SIG API Machinery provides the actual code and tools that people build APIs on. They don't review individual APIs for storage, networking, scheduling, and so on.

SIG Architecture sets overall system direction and works with API Machinery to ensure the system supports that direction. API Governance works with other SIGs building on that foundation to define conventions and patterns, ensuring consistent use of what API Machinery provides.

FM: That clarifies the flow. Regarding release cycles: do release phases—enhancements freeze, code freeze—change your workload? Or is API Governance mostly continuous?

JL: We engage at two points: design and implementation. Design involvement increases before enhancements freeze; implementation involvement increases before code freeze. However, many efforts span multiple releases, so there's always some design and implementation happening, even for work targeting future releases. Between those intense periods, we often have time for long-term design work.

An anti-pattern we see is teams thinking about a large feature for months, then presenting it three weeks before enhancements freeze saying, "Here's the design, please review." For big changes with API impact, early involvement with API Governance is much better.

The periods between freezes are ideal for this—that's when people have bandwidth for long-term review work.

Getting Involved

FM: How can someone get involved in API Governance? What should they focus on?

JL: It's usually best to follow a specific change rather than trying to learn everything at once. Pick a small API change—perhaps one someone else is making or one you want to make—and observe the full process: design, implementation, review.

High-bandwidth review—live discussion over video—is often very effective. If you're making or following a change, ask whether there's time to go over the design or PR together. Observing those discussions is extremely instructive.

Start with a small change, then move to a bigger one, then maybe a new API. That builds understanding of conventions as they're applied in practice.

FM: Excellent. Any final thoughts?

JL: Yes—the reason we care so much about compatibility and stability is for our users. It's easy for contributors to see those requirements as painful obstacles preventing cleanup or requiring tedious work, but users have integrated with our system, and we made a promise to them. We want them to trust that we won't break that contract. So even when it requires more work, moves slower, or involves duplication, we choose stability.

We're not trying to be obstructive; we're trying to make life good for users.

Many of our questions focus on the future: you want to do something now—how will you evolve it later without breaking it? We assume we'll know more in the future, and we want the design to leave room for that.

We also assume we'll make mistakes. The question then becomes: how do we leave ourselves avenues to improve while keeping compatibility promises?
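One common way to leave room for evolution is to add new fields as optional, with a default that preserves the earlier behavior. The Go sketch below illustrates that idea under stated assumptions: the `WidgetSpec` type, its `strategy` field, the `"Rolling"` default, and the `Decode` helper are hypothetical and are not taken from any Kubernetes API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WidgetSpec is a hypothetical API type. The first version had only
// Replicas; Strategy was added later as an optional field.
type WidgetSpec struct {
	Replicas int `json:"replicas"`
	// A pointer distinguishes "not set by the client" from an explicit value.
	Strategy *string `json:"strategy,omitempty"`
}

// defaultStrategy is assumed to match what the system did before the
// field existed, so older clients see no change in behavior.
const defaultStrategy = "Rolling"

// Decode parses a request body and fills in the default for clients
// that do not know about the newer field.
func Decode(data []byte) (WidgetSpec, error) {
	var spec WidgetSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return WidgetSpec{}, err
	}
	if spec.Strategy == nil {
		s := defaultStrategy
		spec.Strategy = &s
	}
	return spec, nil
}

func main() {
	// An older client that predates the Strategy field.
	oldClient, err := Decode([]byte(`{"replicas": 2}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(oldClient.Replicas, *oldClient.Strategy) // prints "2 Rolling"

	// A newer client that sets the field explicitly.
	newClient, err := Decode([]byte(`{"replicas": 2, "strategy": "Recreate"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(newClient.Replicas, *newClient.Strategy) // prints "2 Recreate"
}
```

Because the field is optional and defaulted, requests written against the older shape keep working unchanged, which is the kind of compatibility promise described above.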

FM: Exactly. Jordan, thank you—I think we've covered everything. This has been an insightful look into the API Governance project and its role in the wider Kubernetes ecosystem.

JL: Thank you.