
Kubernetes 1.36 Workload-Aware Scheduling: Gang Scheduling and Resource Optimization for AI/ML Workloads

Matthias Bruns · 7 min read
kubernetes scheduling ai-workloads resource-management

Kubernetes 1.36 marks a significant step forward in workload-aware scheduling, building on the foundation laid in version 1.35. The new capabilities address gaps in how Kubernetes handles complex AI/ML workloads, distributed training jobs, and batch processing scenarios that require coordinated resource allocation. This is more than an incremental update: it is a shift toward treating groups of related pods as first-class scheduling entities.

The traditional Kubernetes scheduler operates on individual pods, which creates problems for workloads that need multiple pods to start together or not at all. Gang scheduling solves this by ensuring that either all pods in a group get scheduled simultaneously, or none do. Combined with workload-aware preemption and opportunistic batching, these features transform how Kubernetes handles resource-intensive workloads.

Understanding Workload-Aware Scheduling

Kubernetes v1.35 introduced the foundational Workload API alongside basic gang scheduling support, but v1.36 takes this much further. The core concept revolves around treating related pods as a single scheduling unit rather than independent entities.

Traditional pod-by-pod scheduling fails badly with distributed workloads. Consider a distributed training job with four ranks, each needing 8 GB of memory. Without workload-aware scheduling, the scheduler may place three of the four ranks and leave the fourth pending indefinitely because no node has enough free capacity. The job is effectively deadlocked: the three placed ranks hold their resources while producing no useful work.
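The failure mode is easy to reproduce in a toy model. The sketch below is not the real kube-scheduler; it is a minimal first-fit placement loop over hypothetical node names and memory figures, written only to show how placing pods one at a time can strand a job.

```python
def schedule_pod_by_pod(free_gb, pod_requests_gb):
    """Place pods one at a time with first-fit; return (placed, pending)."""
    free = dict(free_gb)  # work on a copy so the caller's view is untouched
    placed, pending = {}, []
    for pod, need in pod_requests_gb.items():
        # First node with enough free memory wins; None if nothing fits.
        node = next((n for n, cap in free.items() if cap >= need), None)
        if node is None:
            pending.append(pod)
        else:
            free[node] -= need
            placed[pod] = node
    return placed, pending


# Hypothetical cluster: 24 GB free in total, but the job needs 32 GB.
nodes = {"node-a": 16, "node-b": 8}
ranks = {f"rank-{i}": 8 for i in range(4)}

placed, pending = schedule_pod_by_pod(nodes, ranks)
print("placed:", placed)
print("pending:", pending)
```

Three ranks end up holding 24 GB while the fourth waits, which is exactly the partial placement that gang scheduling is meant to rule out.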

Workload-aware scheduling addresses this through three key mechanisms:

  • Gang scheduling ensures all-or-nothing pod admission
  • Workload-aware preemption treats pod groups as single entities for eviction decisions
  • Opportunistic batching efficiently processes identical pods together

Gang Scheduling: All-or-Nothing Resource Allocation

Gang scheduling implements the fundamental principle that certain workloads only make sense when all components can run simultaneously. The minCount field defines the quorum: at least that many pods must be schedulable together for the group to be admitted.

This is particularly crucial for AI/ML workloads where distributed training requires all worker nodes to be available. A partially scheduled job not only wastes resources but can also prevent other workloads from being scheduled due to resource fragmentation.
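To make the quorum idea concrete, here is a minimal sketch of all-or-nothing admission. It assumes a simple first-fit trial placement and a hypothetical min_count parameter that mirrors the role of the minCount field; the actual scheduler uses its own plugins and data structures.

```python
def try_admit_gang(free_gb, pod_requests_gb, min_count):
    """Return a placement if at least min_count pods fit together, else None."""
    free = dict(free_gb)  # trial copy: nothing is committed unless the quorum fits
    placement = {}
    for pod, need in pod_requests_gb.items():
        node = next((n for n, cap in free.items() if cap >= need), None)
        if node is not None:
            free[node] -= need
            placement[pod] = node
    # Below quorum the whole group is rejected and no capacity is consumed.
    if len(placement) < min_count:
        return None
    return placement


ranks = {f"rank-{i}": 8 for i in range(4)}

# Only three ranks fit, so the group is not admitted at all.
print(try_admit_gang({"node-a": 16, "node-b": 8}, ranks, min_count=4))

# With enough capacity, all four ranks are placed together.
print(try_admit_gang({"node-a": 16, "node-b": 16}, ranks, min_count=4))
```

First-fit is used here only for brevity; it can miss placements that a smarter packing would find, so treat the sketch as an illustration of the admission rule rather than of placement quality.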

The gang scheduling implementation in Kubernetes 1.36 goes beyond simple all-or-nothing logic. Grouping pods lets controllers, status reporting, future preemption behavior, and future workload-aware features reason about related pods even if those pods do not need strict all-or-nothing admission. This allows more nuanced scheduling policies: when minCount is set below the group size, for example, the quorum is mandatory while the pods above it are effectively optional.

Workload-Aware Preemption

Traditional Kubernetes preemption operates at the pod level, which can create chaos for multi-pod workloads. KEP-5710 brings in “workload-aware preemption,” meaning that groups of related Pods (PodGroups) are now treated as a single entity for both scheduling and preemption.

This change prevents scenarios where the scheduler evicts some but not all pods from a distributed workload, leaving the remaining pods in a useless state. Instead of removing pods one by one, the scheduler now understands the relationships between pods and makes preemption decisions at the workload level.

For AI/ML workloads, this is transformative. When cluster resources become scarce, the scheduler can now intelligently choose between evicting an entire lower-priority distributed training job versus partially disrupting multiple jobs. This leads to better resource utilization and fewer failed training runs.
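The difference between pod-level and workload-level eviction can be sketched in a few lines. The example below is an illustrative model, not the KEP-5710 algorithm: it assumes every pod in a group shares one priority, uses hypothetical group names, and simply evicts whole groups from lowest priority upward until enough memory is freed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    group: str
    priority: int
    mem_gb: int


def pick_victim_groups(running, needed_gb, incoming_priority):
    """Pick whole groups to evict, lowest priority first; None if not enough can be freed."""
    groups = {}
    for pod in running:
        groups.setdefault(pod.group, []).append(pod)
    # Only groups strictly below the incoming priority are eligible victims.
    candidates = sorted(
        (g for g in groups.values() if g[0].priority < incoming_priority),
        key=lambda g: g[0].priority,
    )
    victims, freed = [], 0
    for group in candidates:
        if freed >= needed_gb:
            break
        victims.append(group[0].group)
        freed += sum(p.mem_gb for p in group)  # the group is evicted as a unit
    return victims if freed >= needed_gb else None


running = [
    Pod("a-0", "train-a", 10, 8), Pod("a-1", "train-a", 10, 8),
    Pod("b-0", "train-b", 5, 8), Pod("b-1", "train-b", 5, 8),
]
print(pick_victim_groups(running, needed_gb=16, incoming_priority=20))
```

In this model a request for 16 GB removes all of train-b and leaves train-a fully intact, instead of taking one pod from each job and breaking both.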

Resource Optimization Patterns

Kubernetes 1.36 introduces several patterns for optimizing resource allocation in complex workloads. The most significant is the ability to express resource requirements at the pod level rather than just the container level.

The Pod.spec.resources field accepts cpu, memory, and hugepages-* only; extended resources stay at container scope. This lets you define the resource envelope once for a multi-container pod instead of repeating specifications across containers. Containers can still declare their own requests and limits alongside the pod-level values, which provides flexibility for mixed workload patterns.

This is particularly useful for AI/ML workloads that often combine multiple containers: a training container, a data preprocessing sidecar, and monitoring agents. Instead of calculating and specifying resources for each container individually, you can define the total pod requirements and let Kubernetes handle the distribution.
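A small model helps show what a pod-level envelope changes in the accounting. The sketch below is a simplification with hypothetical resource names and plain integer units: it rejects anything other than cpu, memory, and hugepages-* at pod level, and it assumes the pod-level value is the one counted for a resource when both levels are present. Verify the exact precedence rules against the documentation for your Kubernetes version.

```python
ALLOWED_POD_LEVEL = ("cpu", "memory")


def is_pod_level_resource(name):
    """Only cpu, memory, and hugepages-* may be set at pod level."""
    return name in ALLOWED_POD_LEVEL or name.startswith("hugepages-")


def effective_requests(pod_level, containers):
    """pod_level: dict or None; containers: list of per-container request dicts."""
    bad = [r for r in (pod_level or {}) if not is_pod_level_resource(r)]
    if bad:
        raise ValueError(f"not allowed at pod level: {bad}")
    totals = {}
    for requests in containers:
        for name, amount in requests.items():
            totals[name] = totals.get(name, 0) + amount
    # Assumption of this sketch: a pod-level value replaces the per-container sum.
    totals.update(pod_level or {})
    return totals


# Trainer requests a GPU per container; cpu and memory are set once for the pod.
pod_level = {"cpu": 4, "memory": 16}
containers = [{"example.com/gpu": 1}, {"cpu": 1}, {}]
print(effective_requests(pod_level, containers))
```

The extended resource (a made-up example.com/gpu here) still comes from the container that asks for it, while CPU and memory are stated once for the whole pod.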

Opportunistic Batching for Identical Workloads

Opportunistic batching efficiently processes identical Pods by recognizing when multiple pods have identical resource requirements and scheduling characteristics. This feature is particularly valuable for batch processing workloads where you might have hundreds of identical data processing jobs.

The scheduler can now group these identical pods and make scheduling decisions for the entire batch rather than evaluating each pod separately. This dramatically reduces scheduling latency for large-scale batch workloads and improves cluster efficiency by considering resource allocation patterns across similar workloads.

For machine learning inference workloads, this means faster deployment of model serving pods and more efficient resource packing. The scheduler understands that these pods are functionally identical and can optimize their placement accordingly.
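One way to picture the batching step is as grouping pods by a scheduling signature. The sketch below is an assumption about what such a signature might contain (requests, node selector, priority); the fields the real scheduler compares are not listed in this post, and the pod dictionaries are hypothetical.

```python
from collections import defaultdict


def scheduling_signature(pod):
    """Hashable key over the fields this sketch treats as placement-relevant."""
    return (
        tuple(sorted(pod["requests"].items())),
        tuple(sorted(pod.get("node_selector", {}).items())),
        pod.get("priority", 0),
    )


def batch_identical(pods):
    """Group pod names so each batch can be evaluated once instead of per pod."""
    batches = defaultdict(list)
    for pod in pods:
        batches[scheduling_signature(pod)].append(pod["name"])
    return list(batches.values())


pods = [
    {"name": "w-0", "requests": {"cpu": 2, "memory": 4}},
    {"name": "w-1", "requests": {"memory": 4, "cpu": 2}},  # same values, different key order
    {"name": "big-0", "requests": {"cpu": 8, "memory": 32}},
    {"name": "w-2", "requests": {"cpu": 2, "memory": 4}},
]
print(batch_identical(pods))
```

With hundreds of identical workers, filtering and scoring would run once per batch rather than once per pod, which is where the latency saving would come from in this model.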

Practical Implementation Strategies

When implementing workload-aware scheduling for AI/ML workloads, start by identifying which of your workloads truly require coordinated scheduling. Not every multi-pod application needs gang scheduling. Web applications with multiple replicas, for example, typically benefit from gradual rollouts rather than all-or-nothing deployment.

Distributed training jobs are the obvious candidates, but consider other scenarios:

  • Multi-node inference pipelines where all stages must be available
  • Data processing workflows with strict dependency requirements
  • Batch jobs that require specific node configurations across multiple pods

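For resource optimization, use the new pod-level resource specifications when you have multi-container pods whose containers draw on a shared CPU and memory budget. This is common in AI/ML workloads where a training container runs alongside preprocessing and monitoring sidecars. Because the pod-level field covers only cpu, memory, and hugepages-*, GPUs and other extended resources still have to be requested per container.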

Configuration Best Practices

Configure workload-aware scheduling features gradually. Start with non-critical workloads to understand the behavior and impact on your cluster. The scheduling changes can affect cluster resource utilization patterns, so monitor carefully during initial deployment.

For gang scheduling, set realistic minCount values that reflect the actual requirements of your workloads. Setting the value too high prevents scheduling when some nodes are temporarily unavailable. Setting it too low defeats the purpose of coordinated scheduling.
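A quick capacity check can help when choosing the value. The sketch below assumes identical pods and a hypothetical list of free memory per node, and asks whether the quorum would still fit if the largest nodes were temporarily unavailable.

```python
def schedulable_pods(node_free_gb, pod_gb):
    """How many identical pods fit, counting whole slots per node."""
    return sum(free // pod_gb for free in node_free_gb)


def min_count_survives(node_free_gb, pod_gb, min_count, nodes_down=1):
    """True if the quorum still fits after losing the nodes_down largest nodes."""
    keep = max(len(node_free_gb) - nodes_down, 0)
    remaining = sorted(node_free_gb)[:keep]  # drop the largest nodes (worst case)
    return schedulable_pods(remaining, pod_gb) >= min_count


nodes = [16, 16, 16, 16]  # free GB per node, 8 slots of 8 GB in total

print(min_count_survives(nodes, pod_gb=8, min_count=8))  # False: no headroom
print(min_count_survives(nodes, pod_gb=8, min_count=6))  # True: tolerates one node down
```

A minCount equal to the full cluster capacity leaves no slack for a single node outage, while a value that your training framework can genuinely run with keeps the job schedulable through routine disruptions.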

When using workload-aware preemption, establish clear priority classes for different types of workloads. Interactive workloads might have higher priority than batch processing, but long-running training jobs might need protection from frequent preemption once they’ve started.

Monitoring and Troubleshooting

Workload-aware scheduling introduces new failure modes that require updated monitoring strategies. Traditional pod scheduling metrics don’t capture the complexity of group scheduling decisions. Monitor for scenarios where pod groups are partially scheduled but waiting for additional resources.
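As a starting point for that kind of check, the sketch below flags groups that have some pods scheduled but fewer than their quorum. The input shape is hypothetical: in practice you would build the (group, is_scheduled) pairs from whatever your cluster exposes, and the quorum map from your workload definitions.

```python
from collections import Counter


def partially_scheduled_groups(pods, min_count):
    """pods: iterable of (group, is_scheduled); min_count: dict of group -> quorum."""
    pods = list(pods)
    scheduled = Counter(group for group, ok in pods if ok)
    total = Counter(group for group, _ in pods)
    # A group with no quorum entry is treated as needing all of its pods.
    return sorted(
        g for g in total
        if 0 < scheduled[g] < min_count.get(g, total[g])
    )


observed = [
    ("train-a", True), ("train-a", True), ("train-a", False),
    ("train-b", True), ("train-b", True),
    ("train-c", False),
]
print(partially_scheduled_groups(observed, {"train-a": 3, "train-b": 2}))
```

Groups with nothing scheduled are deliberately excluded: they are simply queued, whereas a group stuck between zero and its quorum is the case worth alerting on.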

Pay attention to resource fragmentation patterns. Gang scheduling can sometimes lead to less efficient resource packing if not properly configured. Monitor cluster utilization to ensure that the coordination benefits outweigh any packing inefficiencies.

For AI/ML workloads specifically, track metrics around training job success rates and time-to-start. These workloads often have strict SLA requirements, and the new scheduling features are intended to improve both metrics, so compare them before and after enabling the features rather than assuming a gain.
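Both metrics are simple to compute once you have timestamps and outcomes. The sketch below assumes a hypothetical record format (a creation time, per-pod start times, and a list of success flags) and defines time-to-start as the delay until the last pod in the job has started.

```python
from datetime import datetime


def time_to_start_seconds(created_at, pod_start_times):
    """Seconds from job creation until the last pod started; None if any pod has not started."""
    if not pod_start_times or any(t is None for t in pod_start_times):
        return None
    return (max(pod_start_times) - created_at).total_seconds()


def success_rate(outcomes):
    """Fraction of finished jobs that succeeded; None when there are no jobs."""
    outcomes = list(outcomes)
    if not outcomes:
        return None
    return sum(1 for ok in outcomes if ok) / len(outcomes)


created = datetime(2025, 1, 1, 12, 0, 0)
starts = [datetime(2025, 1, 1, 12, 0, 30), datetime(2025, 1, 1, 12, 1, 15)]

print(time_to_start_seconds(created, starts))  # 75.0
print(success_rate([True, True, False, True]))  # 0.75
```

Measuring to the last pod rather than the first matches how a gang-scheduled job behaves: it does no useful work until every required member is running.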

Future Considerations

The workload-aware scheduling features in Kubernetes 1.36 represent the beginning of a larger transformation in how Kubernetes handles complex workloads. Future versions will likely expand these capabilities with more sophisticated resource coordination and cross-cluster scheduling awareness.

Consider how these features fit into your broader AI/ML infrastructure strategy. The improved scheduling capabilities enable more efficient use of expensive GPU resources and can reduce the need for dedicated training clusters in some scenarios.

As these features mature, expect to see ecosystem tools that leverage the new APIs for more intelligent workload placement and resource optimization. The foundation laid in 1.36 opens up possibilities for application-aware scheduling that goes far beyond what’s possible with traditional pod-level scheduling.

The workload-aware scheduling improvements in Kubernetes 1.36 address real pain points in running complex, resource-intensive workloads. For organizations running AI/ML workloads at scale, these features can significantly improve resource utilization and reduce the operational complexity of managing distributed training and inference workloads.
