Kubernetes v1.35 Brings Job Managed By Feature to General Availability

2025-12-18 10:30

Kubernetes v1.35 brings the .spec.managedBy field to General Availability, enabling external controllers to assume complete ownership of Job management. This capability allows specialized controllers to handle Job lifecycle operations while maintaining compatibility with the Kubernetes API, offering teams greater flexibility in orchestrating batch workloads through custom control planes.

Kubernetes v1.35 introduces a deceptively simple feature that addresses a complex architectural challenge: how to coordinate batch workloads across multiple clusters without creating a tangled mess of custom controllers. The .spec.managedBy field, now graduating to General Availability, lets external controllers take full ownership of Job reconciliation—a capability that becomes critical when you're orchestrating compute-intensive workloads across geographically distributed infrastructure.

At its core, this feature solves a problem that emerged as Kubernetes deployments scaled beyond single clusters. Organizations running machine learning pipelines, data processing jobs, or scientific computing workloads increasingly need to distribute work across multiple clusters—whether for capacity reasons, geographic requirements, or cost optimization. The traditional Kubernetes Job controller wasn't designed for this multi-cluster reality. As enterprises have matured their Kubernetes adoption, the limitations of single-cluster thinking have become increasingly apparent, particularly for organizations processing terabytes of data daily or training large language models that require coordinated GPU resources across multiple availability zones.

The Multi-Cluster Scheduling Problem

Before .spec.managedBy, teams faced an awkward choice when building multi-cluster batch systems. They could disable the built-in Job controller entirely, but this created two significant problems. First, many organizations run Kubernetes on managed control planes from cloud providers where controller manager flags are locked down—you simply can't turn off built-in controllers. Second, most clusters need to operate in a hybrid mode: dispatching heavy workloads remotely while still running smaller administrative jobs locally.

The MultiKueue architecture illustrates why this matters. It separates a Management Cluster—which accepts job submissions and tracks status—from Worker Clusters that execute the actual pods. Users interact with the Management Cluster and see live status updates, but the compute happens elsewhere. Without a clean delegation mechanism, the Management Cluster's Job controller would try to create pods that should never exist there, while the Worker Clusters would lack visibility into the overall job lifecycle.
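The status-mirroring half of this architecture can be sketched in a few lines. This is a hypothetical simplification, not MultiKueue's actual code: Jobs are modeled as plain dictionaries with Kubernetes-style field names, and all client plumbing is elided.

```python
# Hypothetical sketch of MultiKueue-style status mirroring: the Management
# Cluster's controller copies execution status from the mirror Job running
# on a Worker Cluster back onto the locally submitted Job, so users see
# live progress without ever touching the Worker Cluster.

def sync_mirror_status(management_job: dict, worker_job: dict) -> dict:
    """Return a copy of the Management Cluster Job with its status
    replaced by the Worker Cluster mirror Job's status."""
    synced = dict(management_job)
    # The Worker Cluster is the source of truth for execution state.
    synced["status"] = dict(worker_job.get("status", {}))
    return synced

# Example: a job dispatched remotely, with one pod active and two succeeded.
local = {"metadata": {"name": "train-model"},
         "spec": {"managedBy": "kueue.x-k8s.io/multikueue"},
         "status": {}}
remote = {"metadata": {"name": "train-model"},
          "status": {"active": 1, "succeeded": 2}}

print(sync_mirror_status(local, remote)["status"])
```

The key point the sketch captures is directionality: status flows from Worker to Management Cluster, while pod creation happens only on the Worker side.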

This architectural pattern isn't unique to MultiKueue. It reflects a broader shift in how organizations think about Kubernetes infrastructure: not as isolated clusters, but as a fabric of compute resources that can be dynamically allocated based on workload characteristics. Financial services firms use this pattern to route compute-intensive risk calculations to clusters with specialized hardware, while media companies distribute video transcoding jobs across regions to minimize data transfer costs. The pattern has become so common that the lack of native support was creating significant friction in production environments.

How Delegation Actually Works

The .spec.managedBy field operates on a simple principle: when set to anything other than the reserved value kubernetes.io/job-controller, the built-in controller steps aside completely. An external controller—like MultiKueue—can then implement its own reconciliation logic. In MultiKueue's case, this means copying status from mirror jobs running on Worker Clusters back to the Management Cluster, creating the illusion of a single unified job that happens to execute remotely.
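The delegation rule itself is small enough to state as code. The sketch below models the built-in controller's ownership check and a delegated Job manifest as Python dictionaries; field names follow the Kubernetes Job API, but the check is an illustration, not the controller's actual source.

```python
# Sketch of the delegation rule: the built-in Job controller reconciles a
# Job only when .spec.managedBy is unset or equals the reserved value.
# Any other value means an external controller owns the Job.

RESERVED = "kubernetes.io/job-controller"

def built_in_controller_owns(job: dict) -> bool:
    managed_by = job.get("spec", {}).get("managedBy")
    return managed_by is None or managed_by == RESERVED

# A Job delegated to MultiKueue: the built-in controller steps aside.
delegated_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "remote-render"},
    "spec": {
        "managedBy": "kueue.x-k8s.io/multikueue",  # external owner
        "template": {"spec": {
            "containers": [{"name": "render", "image": "renderer:latest"}],
            "restartPolicy": "Never",
        }},
    },
}

print(built_in_controller_owns(delegated_job))  # False: external controller owns it
print(built_in_controller_owns({"spec": {}}))   # True: default built-in ownership
```

Because the check runs before any reconciliation work, the Management Cluster never creates pods for delegated Jobs, which is exactly the behavior the MultiKueue split requires.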

The field is immutable by design, preventing a running job from being transferred between controllers mid-execution. This constraint eliminates an entire class of potential bugs around orphaned pods and resource leaks. Once a job is created with a specific managedBy value, that ownership relationship is permanent for the lifetime of that job object. This design decision prioritizes operational safety over flexibility, reflecting lessons learned from years of production Kubernetes deployments where ownership ambiguity has caused countless incidents.
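A minimal sketch of that immutability constraint, written as an update-validation function in the style of API-server admission checks (the function name and error string are illustrative, not the real implementation):

```python
# Sketch of the immutability rule: an update that changes .spec.managedBy
# is rejected, so ownership of a Job can never move between controllers
# mid-execution.

def validate_update(old_job: dict, new_job: dict) -> list[str]:
    """Return a list of validation errors for a Job update."""
    errors = []
    old_owner = old_job.get("spec", {}).get("managedBy")
    new_owner = new_job.get("spec", {}).get("managedBy")
    if old_owner != new_owner:
        errors.append("spec.managedBy: field is immutable")
    return errors

before = {"spec": {"managedBy": "kueue.x-k8s.io/multikueue"}}
after = {"spec": {"managedBy": "kubernetes.io/job-controller"}}
print(validate_update(before, after))  # rejected: ownership cannot change
print(validate_update(before, before))  # unchanged: no errors
```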

The implementation also includes careful validation logic. When the field is set, the API server requires a properly formatted value: either the reserved kubernetes.io/job-controller—which is equivalent to leaving the field unset and keeps the built-in controller in charge—or a custom, domain-prefixed identifier. External controllers use the reverse-DNS naming convention, such as kueue.x-k8s.io/multikueue. This namespacing ensures that different multi-cluster systems can coexist without conflicts, allowing organizations to experiment with multiple approaches or migrate between solutions over time.
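The shape of that naming convention can be checked with a simple pattern. This is a loose approximation for illustration only; the real API-server validation is stricter than this regex.

```python
# Simplified sketch of the domain-prefixed naming convention for custom
# managedBy values: a reverse-DNS domain, a slash, then a path segment.
# Real Kubernetes validation enforces more rules than this pattern does.

import re

DOMAIN_PREFIXED = re.compile(
    r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?"      # first domain label
    r"(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)+"  # remaining domain labels
    r"/[A-Za-z0-9._-]+$"                   # path segment after the slash
)

def looks_domain_prefixed(value: str) -> bool:
    return bool(DOMAIN_PREFIXED.match(value))

print(looks_domain_prefixed("kueue.x-k8s.io/multikueue"))  # True
print(looks_domain_prefixed("my-controller"))              # False: no domain prefix
```

The domain prefix is what makes coexistence work: two scheduling systems can never claim the same identifier unless they share a DNS domain.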

Implications for the Kubernetes Ecosystem

The graduation of .spec.managedBy to GA status signals a significant maturation of Kubernetes as a multi-cluster platform. This feature legitimizes patterns that were previously implemented through workarounds and custom resource definitions, providing a standardized foundation for the next generation of batch processing systems. Projects beyond MultiKueue—including Volcano, Armada, and various proprietary scheduling systems—can now build on this primitive rather than fighting against the built-in controller.

For platform engineering teams, this feature reduces the operational complexity of running sophisticated batch systems. Instead of maintaining forks of Kubernetes or deploying complex admission webhooks to intercept job creation, teams can now implement clean controller delegation with native API support. This translates directly to reduced maintenance burden and faster time-to-production for multi-cluster initiatives.

The feature also has implications for cloud providers and managed Kubernetes offerings. Providers can now safely expose multi-cluster batch capabilities without requiring customers to sacrifice the simplicity of managed control planes. This could accelerate the adoption of federated computing models where workload placement becomes as dynamic as pod scheduling is today within a single cluster.

Looking Forward

The .spec.managedBy field represents more than a technical enhancement—it's an acknowledgment that Kubernetes has evolved beyond its origins as a single-cluster orchestrator. As organizations continue to scale their infrastructure across regions, clouds, and edge locations, features like this provide the foundation for treating distributed clusters as a unified compute platform. The simplicity of the implementation belies its significance: by getting out of the way at the right moment, Kubernetes enables a new class of sophisticated workload management systems that would have been architecturally awkward just a few releases ago. For teams running batch workloads at scale, this small field opens up possibilities that were previously accessible only to organizations with the resources to build and maintain complex custom solutions.