After more than six years of development, Kubernetes 1.35 brings a capability that fundamentally changes how resource management works in production clusters. In-Place Pod Resize, which allows you to adjust CPU and memory allocations without restarting containers, has reached stable status. For teams running stateful applications, machine learning workloads, or any service where downtime carries real costs, this represents a shift from workarounds to native support.
The feature's journey from alpha in version 1.27 to general availability reflects both its technical complexity and the careful validation required for production readiness. What makes this milestone significant isn't just the feature itself, but what it enables: truly dynamic resource management that responds to actual workload behavior rather than static predictions made at deployment time.
The Resource Immutability Problem
Kubernetes has always treated container resource specifications as immutable once a Pod starts running. If your application needed more memory or CPU, the only option was to delete the Pod and create a new one with updated resource requests. This approach made sense from an architectural purity standpoint—Pods were designed as ephemeral units—but it created friction for real-world operations.
Consider a database Pod that experiences increased load during business hours. Previously, scaling it vertically meant terminating the Pod, losing any in-memory state, and waiting for a new instance to initialize and warm up. For stateful services, this wasn't just inconvenient; it often meant choosing between suboptimal resource allocation and service disruption. Teams compensated by over-provisioning resources, leading to waste, or by building complex external orchestration to manage these transitions.
The cost of this limitation extended beyond individual applications. Cluster-wide resource efficiency suffered because workloads couldn't adapt to changing conditions without disruption. Autoscaling solutions like Vertical Pod Autoscaler existed but were hampered by the need to recreate Pods, making them unsuitable for many production scenarios.
How In-Place Resize Actually Works
The implementation introduces a distinction between desired and actual resources. When you update a Pod's resource specifications, you're now modifying the spec.containers[*].resources field, which represents what you want. The status.containerStatuses[*].resources field shows what's currently allocated. This separation allows Kubernetes to manage the transition between states.
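A minimal Pod manifest makes the "desired" side of this split concrete (the name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo            # illustrative name
spec:
  containers:
  - name: app
    image: nginx               # any long-running image
    resources:                 # desired state: spec.containers[*].resources
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
```

Once the Pod is running, `kubectl get pod resize-demo -o jsonpath='{.status.containerStatuses[0].resources}'` shows the actually allocated values, which may lag behind the spec while a resize is in progress.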
The process uses a new resize subresource that handles the update request. The kubelet evaluates whether the node has capacity for the change and whether the container runtime can apply it without a restart. For CPU adjustments, this typically happens seamlessly. Memory changes are more nuanced—increasing memory allocation is straightforward, but decreasing it requires checking current usage to avoid triggering out-of-memory kills.
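With a recent kubectl (v1.32 or newer), a resize request against the Pod above might look like this; the pod and container names are carried over from the illustrative manifest:

```shell
# Request more CPU through the resize subresource; the kubelet decides
# whether the node can absorb the change without a restart.
kubectl patch pod resize-demo --subresource resize \
  --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"1500m"}}}]}}'
```

Patching the Pod's spec directly without the subresource will be rejected for resource fields, which keeps the resize path explicit and auditable.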
Not all resize requests complete immediately. If a node lacks available resources, the request enters a "Deferred" state. The kubelet now prioritizes these deferred requests based on PriorityClass, Quality of Service class, and how long the request has been waiting. This prevents lower-priority workloads from blocking critical services.
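The retry ordering can be sketched in a few lines. This is an illustrative model of the prioritization described above, not the kubelet's actual implementation; the names and the exact QoS ranking are assumptions:

```python
from dataclasses import dataclass

# Illustrative QoS ranking (lower sorts first); the kubelet's real ordering
# logic may differ in detail.
QOS_RANK = {"Guaranteed": 0, "Burstable": 1, "BestEffort": 2}

@dataclass
class DeferredResize:
    pod: str
    priority: int          # from the Pod's PriorityClass (higher = more important)
    qos: str               # "Guaranteed", "Burstable", or "BestEffort"
    requested_at: float    # unix timestamp; older requests retry first

def retry_order(deferred):
    """Order deferred resize requests for retry: higher PriorityClass first,
    then better QoS class, then longest-waiting."""
    return sorted(
        deferred,
        key=lambda r: (-r.priority, QOS_RANK[r.qos], r.requested_at),
    )
```

The tuple key encodes the stated policy directly: PriorityClass dominates, QoS breaks ties, and age prevents starvation among otherwise equal requests.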
Practical Applications Beyond Basic Scaling
The most obvious use case is reactive scaling—adjusting resources when monitoring shows a workload needs more capacity. But the more interesting applications involve proactive patterns that weren't feasible before.
CPU Startup Boost exemplifies this. Many applications, particularly those running on the JVM or other runtimes that rely on just-in-time compilation, consume significantly more CPU during initialization than during steady-state operation. With in-place resize, you can allocate generous CPU resources for startup, then automatically scale back once the application is warm. This improves startup time without permanently reserving resources you don't need.
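A sketch of the pattern: request generous CPU in the manifest and mark CPU as resizable without a restart, so that once the application reports ready, a controller or script can patch the request back down through the resize subresource. The name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jvm-app                        # illustrative
spec:
  containers:
  - name: app
    image: example.com/jvm-app:latest  # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # allow CPU resize without a restart
    resources:
      requests:
        cpu: "2"                       # generous boost for JIT warm-up
      limits:
        cpu: "4"
```

`NotRequired` is the default resize policy, but stating it explicitly documents the intent that CPU will be adjusted while the container runs.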
Game servers present another compelling scenario. Player count fluctuates throughout the day, and each connected player increases memory and CPU requirements. Traditional approaches either over-provision for peak load or accept degraded performance during busy periods. In-place resize allows the server to scale resources dynamically as players join and leave, optimizing both cost and experience.
Pre-warmed worker pools benefit similarly. You can maintain a set of initialized workers at minimal resource allocation, then inflate them to full capacity when requests arrive. This combines the responsiveness of pre-warming with the efficiency of on-demand scaling.
Integration with Vertical Pod Autoscaler
Vertical Pod Autoscaler now supports in-place updates through its InPlaceOrRecreate mode, which graduated to beta alongside the core feature. This mode attempts in-place resize first and falls back to Pod recreation only when necessary. The result is autoscaling that actually works for stateful workloads.
VPA analyzes historical resource usage and adjusts requests and limits to match actual needs. Previously, applying these recommendations meant disruption. Now, VPA can continuously optimize resource allocation without impacting service availability. For clusters running hundreds or thousands of Pods, this translates to substantial efficiency gains—you're no longer choosing between optimal resource allocation and stability.
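Enabling the behavior is a one-line change in the VPA object, assuming a VPA release that ships the beta mode; the target names here are illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: db-vpa                        # illustrative
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db                          # illustrative workload
  updatePolicy:
    updateMode: "InPlaceOrRecreate"   # try in-place resize, recreate only as a fallback
```

Workloads previously run with `updateMode: "Off"` to avoid disruptive recreations are the natural first candidates to switch over.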
The roadmap includes full support for InPlace mode, which will never fall back to recreation, and CPU Startup Boost integration. These enhancements will make VPA suitable for even more workload types, particularly those where any restart is unacceptable.
What Changed from Beta to Stable
The path to stability involved addressing edge cases and improving operational safety. Memory limit decreases were initially prohibited because reducing available memory could trigger out-of-memory kills if current usage exceeded the new limit. The stable release permits these decreases but implements a best-effort check—the kubelet verifies current usage is below the new limit before applying the change.
This check isn't guaranteed to prevent OOM kills because memory usage can spike between the check and the actual limit change. Future improvements will move this validation into the container runtime itself for better safety. For now, operators should treat memory limit decreases cautiously, particularly for workloads with unpredictable memory patterns.
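The check-then-apply pattern, and why it can't be airtight, fits in a few lines. This is a sketch of the logic described above, with illustrative function names (`read_usage`, `apply_limit`); the real check lives inside the kubelet:

```python
def try_decrease_memory_limit(read_usage, apply_limit, new_limit):
    """Best-effort guard for a memory limit decrease: refuse if current usage
    already exceeds the proposed limit. Usage can still spike between the
    check and apply_limit(), so OOM kills are not fully prevented."""
    usage = read_usage()
    if usage >= new_limit:
        return False           # reject: applying would likely trigger an OOM kill
    apply_limit(new_limit)     # race window: usage may grow before this lands
    return True
```

The gap between `read_usage()` and `apply_limit()` is exactly the window the planned runtime-level validation is meant to close.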
Observability improvements include new kubelet metrics and Pod events specific to resize operations. These additions address a common operational challenge: understanding why a resize succeeded, failed, or was deferred. You can now track resize requests through their lifecycle and correlate resource changes with application behavior.
Alpha support for Pod-level resources appeared in version 1.35, behind its own feature gate. This extends in-place resize beyond individual containers to resources shared across all containers in a Pod, though production use should wait for beta graduation.
Current Limitations and Workarounds
Several restrictions remain. In-place resize doesn't work with swap, the static CPU Manager, or the static Memory Manager. These features rely on fixed resource allocations determined at Pod creation, which conflicts with dynamic resizing. If your workloads depend on these capabilities, you'll need to continue using Pod recreation for resource changes.
Only CPU and memory support in-place modification. Other resource types like ephemeral storage or extended resources remain immutable. This limitation reflects both technical complexity and the need to validate demand for additional resource types before expanding the feature.
Known race conditions exist between the kubelet and scheduler regarding resize operations. These can cause scheduling decisions based on outdated resource information. The Kubernetes team is actively working on resolution, but operators should be aware that edge cases may occur in high-churn environments. Monitoring resize events and node capacity will help identify when these races impact your clusters.
Ecosystem Integration and Future Direction
The Ray autoscaler for machine learning workloads plans to leverage in-place resize for better resource efficiency. ML training jobs often have distinct phases with different resource requirements—data loading, training, and evaluation each have unique profiles. Dynamic resizing allows these workloads to adapt their resource footprint to the current phase without interruption.
Agent-sandbox is exploring "soft-pause" functionality using in-place resize to improve latency for serverless workloads. The concept involves scaling resources down to near-zero for idle functions while maintaining the container, then rapidly scaling up when requests arrive. This could provide faster cold-start times than traditional approaches.
Runtime support remains a challenge. Java and Python runtimes don't currently support memory resizing without restart, limiting the benefit for applications using these languages. Discussions with the OpenJDK team are underway to address this at the runtime level. Until then, memory resizes for JVM-based applications will still require container restarts, though CPU adjustments work seamlessly.
Planned enhancements include workload preemption policies. If a high-priority Pod needs resources that aren't available, Kubernetes could automatically evict lower-priority workloads or trigger node scaling. This would make in-place resize more robust in resource-constrained environments, ensuring critical workloads can always obtain the resources they need.
Adoption Considerations
Moving to in-place resize requires evaluating your workloads and monitoring setup. Applications that maintain significant in-memory state benefit most—databases, caches, and stateful services where restart costs are high. Stateless applications that restart quickly may see less benefit, though they still gain from reduced scheduling overhead and faster scaling response.
Your monitoring and alerting should account for dynamic resource allocation. Traditional approaches that alert on resource requests versus usage may need adjustment when requests change automatically. Focus on actual resource consumption and application performance metrics rather than static thresholds based on initial allocations.
Container images should be tested with varying resource allocations. Some applications make assumptions about available resources at startup that may not hold if resources change during runtime. Verify that your applications handle resource changes gracefully, particularly memory decreases.
For teams using GitOps or declarative configuration management, in-place resize introduces a new consideration: the desired state in your repository may differ from the actual state in the cluster if autoscalers are making adjustments. Establish clear policies about which resources are managed declaratively and which are controlled by autoscalers to avoid conflicts.
The stable status in Kubernetes 1.35 means in-place resize is ready for production use, but adoption should be incremental. Start with non-critical workloads to build operational experience, then expand to more sensitive applications as you validate behavior in your specific environment. The six-year development cycle reflects the feature's complexity—respect that complexity with thoughtful rollout planning.