As our application's resource requirements change over time, we can encounter a few challenges when trying to provision an appropriate amount of CPU and memory. If we simply provision enough resources to support the maximum expected capacity, we risk wasting those resources during periods of low utilization. On the other hand, if we under-provision, we risk the server failing to accommodate our user base.
To solve this problem, we need the ability to automate adding compute resources to a workload, and removing them from it, in response to changing demand.
One of the major benefits of a container orchestration platform like Kubernetes is the ability to autoscale workloads based upon key metrics. There are many different features provided by Kubernetes in this category, although here we'll focus on Horizontal Pod Autoscaling (HPA).
HPA specifically refers to autoscaling by adding or removing Pods to/from a given Deployment. This can be in relation to hardware metrics like CPU and memory utilization, or even in relation to custom metrics defined by us and reported by our application. HPA relies upon the Kubernetes Metrics Server being installed, which serves as the source for container metrics. The autoscaler itself consists of a Kubernetes API resource (HorizontalPodAutoscaler) as well as a Controller.
In a typical scenario, HPA works something like this: the Metrics Server gathers resource usage from each node's kubelet, the HPA controller periodically reads those values through the Kubernetes metrics APIs, compares them against the targets defined in the HorizontalPodAutoscaler resource, and adjusts the replica count of the target Deployment accordingly.
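The replica count the controller arrives at each cycle comes from the scaling formula given in the Kubernetes documentation:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

For example, with a target CPU utilization of 60% and a current utilization of 120% across 2 replicas, the controller would scale to ceil(2 * 120 / 60) = 4 replicas.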
For a quick hands-on demo, we'll make use of k3d. k3d allows us to easily create k3s (a lightweight Kubernetes distribution suitable for IoT, local development, etc.) clusters within Docker containers. To follow along, see the official documentation for installing k3d on your chosen platform.
First, let's create a cluster:
$ k3d cluster create hpa-demo
This should proceed relatively quickly, and after k3d has finished provisioning our cluster we can ensure we're connected with this command:
$ kubectl get node
---
NAME                    STATUS   ROLES                  AGE     VERSION
k3d-hpa-demo-server-0   Ready    control-plane,master   2m22s   v1.22.7+k3s1
k3d automatically installs the Metrics Server. If you're using another distribution of Kubernetes that does not install it by default, you can install it by following the official installation instructions.
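At the time of writing, the documented install is a single manifest apply (check the metrics-server releases page for the currently recommended version):

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Either way, we can confirm the Metrics Server is running with this command: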
$ kubectl get pods -l k8s-app=metrics-server -n kube-system
---
NAME                             READY   STATUS    RESTARTS   AGE
metrics-server-ff9dbcb6c-6zrvc   1/1     Running   0          3m26s
We can also use the kubectl top command to track resource utilization of pods and nodes:
$ kubectl top node
---
NAME                    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k3d-hpa-demo-server-0   113m         2%     755Mi           9%
Now that we have the necessary infrastructure in place, let's create an example Deployment and Service.
$ kubectl create deployment nginx --image=nginx:stable-alpine --port=80 && \
kubectl set resources deployment/nginx -c=nginx --limits=cpu=200m,memory=128Mi
---
deployment.apps/nginx created
deployment.apps/nginx resource requirements updated
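Note that we've only set limits here. When requests are omitted, Kubernetes defaults them to the limit values, and it's the request that HPA's CPU percentage targets are measured against. The relevant part of the generated container spec ends up looking roughly like this:

# Excerpt of the Deployment's container spec after kubectl set resources.
# With no explicit requests, Kubernetes copies the limits into the requests.
containers:
  - name: nginx
    image: nginx:stable-alpine
    ports:
      - containerPort: 80
    resources:
      limits:
        cpu: 200m
        memory: 128Mi

Next, let's expose the Deployment with a Service: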
$ kubectl create service clusterip nginx --tcp=80
---
service/nginx created
$ kubectl get deployment,service
---
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   1/1     1            1           2m4s

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP   32m
service/nginx        ClusterIP   10.43.53.169   <none>        80/TCP    61s
Once we have these resources in place, we can create an autoscaling policy:
$ kubectl autoscale deployment nginx --min=1 --max=5 --cpu-percent=60
---
horizontalpodautoscaler.autoscaling/nginx autoscaled
This policy tells the HPA controller to target a CPU utilization of 60% for each of our pods while keeping the replica count between 1 and 5. We can check the status of our policy with:
$ kubectl get hpa
---
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/60%    1         5         1          2m59s
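If you prefer to manage resources declaratively, the manifest equivalent of the kubectl autoscale command above would look something like this (using the autoscaling/v2beta2 API current as of this writing):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60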
Let's then create another Deployment that will call our Service and generate some activity:
$ kubectl create deployment load-test --image=busybox --replicas 6 -- \
/bin/sh -c 'while true; do wget -qO- http://nginx.default.svc; done'
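If this doesn't generate enough traffic to trigger a scale-up on your machine, the number of load generators can be dialed up (or down) as needed:

$ kubectl scale deployment/load-test --replicas=12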
After a minute or two of waiting, we should see HPA start to take action and add more pods to the Deployment:
$ kubectl get events | grep Rescale
---
15m Normal SuccessfulRescale horizontalpodautoscaler/nginx New size: 2; reason: cpu resource utilization (percentage of request) above target
$ kubectl get pods -l app=nginx
---
NAME                     READY   STATUS    RESTARTS   AGE
nginx-64d84d6958-w82mq   1/1     Running   0          31m
nginx-64d84d6958-hr222   1/1     Running   0          15m
Finally, let's remove the load-test Deployment:
$ kubectl delete deployment/load-test
---
deployment.apps "load-test" deleted
We should eventually see the Deployment scale back down:
$ kubectl get hpa nginx
---
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/60%    1         5         1          38m
$ kubectl get events | grep Rescale
---
27m Normal SuccessfulRescale horizontalpodautoscaler/nginx New size: 2; reason: cpu resource utilization (percentage of request) above target
4m9s Normal SuccessfulRescale horizontalpodautoscaler/nginx New size: 1; reason: All metrics below target
$ kubectl get pods -l app=nginx
---
NAME                     READY   STATUS    RESTARTS   AGE
nginx-64d84d6958-w82mq   1/1     Running   0          43m
In addition to scaling on container resource metrics, we also have the option of scaling based on custom metrics. If you're already monitoring your infrastructure with Prometheus, you can use the Prometheus Adapter to expose your Prometheus metrics as custom metrics for use in HPA policies. For the sake of brevity, we'll assume that you already have Prometheus configured for your cluster. To install the Prometheus Adapter, you can follow the official instructions. Note that you can also use the Prometheus Adapter to report resource metrics in place of the Kubernetes Metrics Server.
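As one illustration, a common route is the community Helm chart (the release name and namespace here are just examples; adjust them, and the chart values, for your setup):

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring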
With both Prometheus and Prometheus Adapter added to our cluster, the flow looks something like this: our application exposes metrics, Prometheus scrapes and stores them, Prometheus Adapter queries Prometheus and serves the results through the custom metrics API, and the HPA controller consumes those values when making scaling decisions.
As a practical example of how you might use this, at Minds we use PHP-FPM and have a custom metric defined for scaling based on the number of PHP processes being utilized relative to the maximum allowed processes we've specified per container.
The exporter we use for PHP-FPM exposes two metrics of interest: phpfpm_active_processes and phpfpm_total_processes. We can use PromQL to query Prometheus for the current utilization of the processes with something like this:
(sum(phpfpm_active_processes) / sum(phpfpm_total_processes)) * 100
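Before wiring a query into the adapter, it can be worth sanity-checking it against Prometheus directly. One quick way, assuming a Prometheus service reachable in a monitoring namespace (service names and ports will vary by installation):

$ kubectl -n monitoring port-forward svc/prometheus-server 9090:80 &
$ curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=(sum(phpfpm_active_processes) / sum(phpfpm_total_processes)) * 100'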
With this query, we can add a custom rule to Prometheus Adapter to execute it on our behalf and present the result as a custom metric for use in our autoscaling policies (see the official documentation for more details on the rule configuration syntax for Prometheus Adapter):
rules:
  # Discover the phpfpm series, map their labels to Kubernetes resources,
  # and expose the utilization percentage as engine_process_utilization.
  - seriesQuery: 'phpfpm_total_processes'
    resources:
      template: '<<.Resource>>'
      overrides:
        kubernetes_namespace: {resource: 'namespace'}
        app_minds_io_name: {group: 'apps', resource: 'deployment'}
    name:
      matches: 'phpfpm_total_processes'
      as: 'engine_process_utilization'
    metricsQuery: '(sum(phpfpm_active_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>) / sum(phpfpm_total_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>)) * 100'
We can confirm that our custom metric is available for use by asking the custom metrics API:
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[] | select(.name == "deployments.apps/engine_process_utilization")'
---
{
  "name": "deployments.apps/engine_process_utilization",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}
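We can also ask the same API for live values, the way the HPA controller will (substitute the namespace and Deployment selector for your environment):

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/deployments.apps/*/engine_process_utilization" | jq .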
Now that we have a custom metric in place, we can use it in our HPA policies:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: engine_process_utilization # Custom metric name
        target:
          type: AverageValue
          averageValue: 50 # 50% utilization
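Applying the manifest and then describing the resulting policy lets us watch the controller's decisions as they're made (the filename here is arbitrary):

$ kubectl apply -f example-hpa.yaml
$ kubectl describe hpa example-hpa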
With HPA in place, the service should now be more resilient under heavy load as well as less likely to waste resources during quieter periods. HPA is a powerful tool, especially when used in combination with custom metrics that allow us to be very specific about the conditions under which we autoscale.
Of course, this is only a high-level overview of HPA and how it can be used with Prometheus Adapter. If this interests you, I encourage you to explore the official documentation for these projects, which I've linked throughout the blog.