Workload autoscaling
Workload autoscaling lets Instabase autoscale data services based on demand. Autoscaling optimizes service resources to maximize efficiency and performance for any workload at a given time. Workload autoscaling also removes the need to manually size services and presents cost-saving opportunities.
Autoscaling is performed with Kubernetes HorizontalPodAutoscalers (HPAs) based on CPU usage for `conversion-service`, `ocr-msft-lite`, `ocr-msft-v3`, and `ocr-service`.
See the infrastructure requirements documentation for information on the required Kubernetes components.
Enable workload autoscaling
You can enable workload autoscaling during or after an upgrade or installation.
Workload autoscaling is in public preview and is disabled by default. If you notice performance issues or resource constraints after enabling workload autoscaling, contact Instabase Support.
Enable autoscaling during an upgrade
To enable autoscaling during an upgrade:
1. Before upgrading, edit the `control-plane.yml` file in the release you're upgrading to and enable the `ENABLE_AUTOSCALING` and `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variables:

   1. Unzip the `installation.zip` file for the release you're upgrading to.
   2. On the command line, navigate to and open the `control-plane.yml` file (`installation/control-plane/control-plane.yml`).
   3. Change the value of the `ENABLE_AUTOSCALING` environment variable to `True`.
   4. Change the value of the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable to `True`.
   5. Save your changes. You apply this updated `control-plane.yml` file when updating Deployment Manager at the start of the upgrade or installation process.
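If you prefer to script the two value changes, a small helper like the sketch below can flip the flags. This is an assumption-laden sketch, not part of the official procedure: it assumes each variable appears in `control-plane.yml` as a `name:` line with its `value:` on the following line, which may not match your release's exact layout, so inspect the file before and after running it.

```shell
# Sketch only: flips a boolean env var in a control-plane.yml-style file.
# Assumes the layout 'name: <VAR>' followed by a 'value: ...' line on the
# next line; verify against your actual file before relying on it.
enable_flag() {
  var=$1; file=$2
  # On the line after 'name: <VAR>', replace False with True (keeps a .bak copy).
  sed -i.bak -E "/name: *${var}\$/{n;s/False/True/;}" "$file"
}

# Hypothetical usage against the unzipped release:
# enable_flag ENABLE_AUTOSCALING installation/control-plane/control-plane.yml
# enable_flag IS_CUSTOMER_HOSTED_AUTOSCALING installation/control-plane/control-plane.yml
```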
2. Add the base configurations that enable autoscaling to the release's `base_configs.zip` file:

   1. Unzip the `base_configs.zip` file contained within the release's `installation` folder.
   2. Locate the `autoscaling` folder in the `installation` folder (`installation` > `additional_configs` > `autoscaling`).
   3. Move the config files in the `autoscaling` folder to the unzipped `base_configs` folder.
   4. Select all files in the `base_configs` folder and compress them, creating a new .zip file of base configs.
   5. Rename the file `base_configs.zip`. This updated .zip file is what you upload during the upgrade.
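The unzip, move, and re-zip sequence can also be scripted. The helper below is a hedged sketch that assumes the Info-ZIP `zip` and `unzip` command-line tools are available; the commented usage paths are examples based on the release layout described above, not guaranteed values.

```shell
# Sketch: rebuild a release .zip with extra config files merged in.
# Assumes the Info-ZIP 'zip' and 'unzip' tools are installed.
add_to_zip() {
  archive=$1; extra_dir=$2
  tmp=$(mktemp -d)
  out=$(mktemp -u).zip
  unzip -q "$archive" -d "$tmp"   # unpack the existing archive
  cp "$extra_dir"/* "$tmp"/       # add the extra config files
  (cd "$tmp" && zip -qr "$out" .) # compress everything into a new archive
  mv "$out" "$archive"            # keep the original file name
  rm -rf "$tmp"
}

# Hypothetical usage from the unzipped release's installation folder:
# add_to_zip base_configs.zip additional_configs/autoscaling
```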
3. (Optional) If your deployment uses custom resource sizing, or if you didn't select a resource sizing option during the upgrade, create patches that define the `minReplicas` and `maxReplicas` values for each autoscaled service's corresponding HPA service. For example, a patch targeting `autoscaler-conversion-service` sets the autoscaling range for `conversion-service`. The required steps are:

   1. Calculate the `minReplicas` values for all autoscaled services' corresponding HPA services.
   2. In the release's `installation` folder, locate the `custom-hpa-patches` folder (`installation` > `optional_patches` > `custom-hpa-patches`). This folder contains patches to configure `minReplicas` and `maxReplicas` values for each HPA service. Or, reference the sample patch in this article.
   3. Edit each HPA service patch to define the `minReplicas` and `maxReplicas` values. Use your calculated `minReplicas` values and define the `maxReplicas` values based on your preferred resource sizing.
4. Update the `default_patches.zip` file to include the following patches:

   - All patches contained in the `enable-autoscaling` folder (`installation` > `optional_patches` > `enable-autoscaling`).
   - (Optional) If using custom or undefined resource sizing, any patches used to manually define an HPA service's `minReplicas` and `maxReplicas` values.

   To update the `default_patches.zip` file:

   1. Unzip the `default_patches.zip` file contained within the release's `installation` folder.
   2. Add the `enable-autoscaling` patches (and any optional edited HPA service patches) to the now unzipped `default_patches` folder.
   3. Select all files in the `default_patches` folder and compress them, creating a new .zip file of patches.
   4. Rename the file `default_patches.zip`. This updated .zip file is what you upload during the upgrade.
5. During the upgrade, upload the updated `base_configs.zip` file and the `default_patches.zip` file. Together, these files contain all patches and configurations required to configure autoscaling in your deployment.

   Info: You must upload the `default_patches.zip` file during the upgrade even if you didn't add custom patches to it.
If you selected a resource sizing option during the upgrade process and have previously decommissioned (set replicas to 0) `conversion-service`, `ocr-msft-lite`, `ocr-msft-v3`, or `ocr-service`, after the upgrade you must reset the service's decommissioned state, because selecting a resource sizing option automatically updates a service's replicas count to a non-zero value. You can reset the service's replicas count using a patch or with the following kubectl command: `kubectl scale deployment/<name of decommissioned service> --replicas=0 -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
Enable autoscaling during an installation
Follow the same steps as enabling autoscaling during an upgrade. However, if your deployment uses custom resource sizing, you don't need to create patches that define the `minReplicas` and `maxReplicas` values for each autoscaled service's corresponding HPA service.
Enable autoscaling outside of an upgrade or installation
If your deployment is already on release 23.07 or later, you can enable autoscaling at any time:
1. Enable the `ENABLE_AUTOSCALING` environment variable:

   1. On the command line, run the following command: `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   2. Locate the `ENABLE_AUTOSCALING` environment variable.
   3. Set the value to `True`.
   4. Save your changes.

2. Enable the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable:

   1. On the command line, run the following command: `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   2. Locate the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable.
   3. Set the value to `True`.
   4. Save your changes.
3. From the Deployment Manager Base Configs tab, update your deployment's base configs to include the base configs required for workload autoscaling. (If you already included the workload autoscaling base configs when upgrading or installing, you can skip this step.)

   1. Unzip the `base_configs.zip` file contained within the release's `installation` folder.
   2. Locate the `autoscaling` folder in the `installation` folder (`installation` > `additional_configs` > `autoscaling`).
   3. Move the config files in the `autoscaling` folder to the now unzipped `base_configs` folder.
   4. Select all files in the `base_configs` folder and compress them, creating a new .zip file of base configs.
   5. Rename the file `base_configs.zip`.
   6. From the Deployment Manager Base Configs tab, update your base configs.
4. From the Deployment Manager Configs tab, apply the patches required to enable workload autoscaling. All required patches are in the release's `enable-autoscaling` patches folder (`installation` > `optional_patches` > `enable-autoscaling`).

   The `enable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service, using the following patch template:
```yaml
# This patch associates the HPA with the deployment, allowing the HPA to autoscale the deployment's replica count.
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <name of deployment service>
```
You can verify that all HPAs are working and have the desired `minReplicas` and `maxReplicas` values from the HPAs tab of the Infra Dashboard. If needed, you can adjust the minimum and maximum replica count.
Configure workload autoscaling
For deployments using Instabase standard resource sizing, Deployment Manager automatically determines and applies appropriate `minReplicas` and `maxReplicas` values for all autoscaled services. For deployments using custom resource sizing, however, you must set the `minReplicas` and `maxReplicas` values for the HPA services corresponding to the following autoscaled deployment services:
| Deployment service | Corresponding HPA service |
|---|---|
| ocr-msft-v3 | autoscaler-ocr-msft-v3 |
| ocr-msft-lite | autoscaler-ocr-msft-lite |
| ocr-service | autoscaler-ocr-service |
| conversion-service | autoscaler-conversion-service |
While the `maxReplicas` value can be set based on your preferred resource sizing, the `minReplicas` value must be calculated based on the number of `celery-app-tasks` pods in the deployment.
To adjust a service's `minReplicas` and `maxReplicas` values, apply the following patch to each HPA service.
```yaml
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: <max replicas>
  minReplicas: <min replicas>
```
You can find sample patches for modifying HPA services in the release's `installation.zip` file, in the `custom-hpa-patches` folder (`installation` > `optional_patches` > `custom-hpa-patches`). If you use these sample patches, you must still define the `minReplicas` and `maxReplicas` values.
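For example, a filled-in patch for `autoscaler-conversion-service` might look like the following. The replica values here are illustrative only: the `minReplicas` value must come from the calculation described in the next section, and the `maxReplicas` value from your preferred resource sizing.

```yaml
# target: autoscaler-conversion-service
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: 10  # example value; set per your preferred resource sizing
  minReplicas: 3   # example value; ceil(0.28 * n) for n = 10 celery-app-tasks pods
```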
Calculate HPA minReplicas for autoscaled services
To calculate `minReplicas` for an HPA service, use the following formulas, where n is the number of `celery-app-tasks` pods in your deployment:
The `ceil()` function returns the smallest integer value that's greater than or equal to the calculated number.
| Deployment service | Corresponding HPA service | minReplicas formula |
|---|---|---|
| ocr-msft-v3 | autoscaler-ocr-msft-v3 | ceil(0.28 * n) |
| ocr-msft-lite | autoscaler-ocr-msft-lite | ceil(0.57 * n) |
| ocr-service | autoscaler-ocr-service | ceil(0.28 * n) |
| conversion-service | autoscaler-conversion-service | ceil(0.28 * n) |
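As a worked example, the formulas above can be evaluated with integer arithmetic, since `ceil(p% of n)` equals `(p * n + 99) / 100` under integer division. The snippet below assumes a deployment with 10 `celery-app-tasks` pods; substitute your own pod count.

```shell
# Compute minReplicas for each autoscaled service's HPA, for a deployment
# with n celery-app-tasks pods. ceil(p% of n) == (p*n + 99) / 100 with
# integer division.
n=10  # example: 10 celery-app-tasks pods; use your deployment's count

ceil_pct() { echo $(( ($1 * n + 99) / 100 )); }

echo "autoscaler-ocr-msft-v3:        $(ceil_pct 28)"  # ceil(0.28 * 10) = 3
echo "autoscaler-ocr-msft-lite:      $(ceil_pct 57)"  # ceil(0.57 * 10) = 6
echo "autoscaler-ocr-service:        $(ceil_pct 28)"  # ceil(0.28 * 10) = 3
echo "autoscaler-conversion-service: $(ceil_pct 28)"  # ceil(0.28 * 10) = 3
```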
Disable autoscaling
You can disable autoscaling at any time. The process differs slightly based on whether your deployment uses custom resource sizing or standard Instabase resource sizing.
Deployments with standard resource sizing
To disable autoscaling in deployments using standard Instabase resource sizing:
1. Disable the `ENABLE_AUTOSCALING` environment variable:

   1. On the command line, run the following command: `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   2. Locate the `ENABLE_AUTOSCALING` environment variable.
   3. Set the value to `False`.
   4. Save your changes.
2. Call the Push latest materialized configs to cluster API to redeploy the deployment with autoscaling disabled.

3. Call the Update cluster size API to reset your resource sizing.

4. From the Deployment Manager Configs tab, apply the patches required to disable workload autoscaling. All required patches are in the release's `disable-autoscaling` patches folder (`installation` > `optional_patches` > `disable-autoscaling`). These patches dissociate each previously autoscaled service from its corresponding HPA.

   The `disable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service, using the following patch template:
```yaml
# This patch disassociates the HPA from the deployment, stopping the HPA from autoscaling the deployment's replica count.
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: disassociated
```
Deployments with custom resource sizing
To disable autoscaling in deployments using custom resource sizing:
1. Disable the `ENABLE_AUTOSCALING` environment variable:

   1. On the command line, run the following command: `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   2. Locate the `ENABLE_AUTOSCALING` environment variable.
   3. Set the value to `False`.
   4. Save your changes.
2. Call the Push latest materialized configs to cluster API to redeploy the deployment with autoscaling disabled.

3. From the Deployment Manager Configs tab, apply the patches required to disable workload autoscaling. All required patches are in the release's `disable-autoscaling` patches folder (`installation` > `optional_patches` > `disable-autoscaling`). These patches dissociate each previously autoscaled service from its corresponding HPA.

   The `disable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service, using the following patch template:
```yaml
# This patch disassociates the HPA from the deployment, stopping the HPA from autoscaling the deployment's replica count.
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: disassociated
```
Verifying autoscaling configuration changes
You can verify that patches targeting HPAs have applied successfully using the HPAs tab of the Deployment Manager infra dashboard.
To confirm that an HPA's replica count has updated successfully:

1. Open the Deployment Manager HPAs tab (All apps > Deployment Manager > Infra Dashboard > HPAs).
2. On the Horizontal Pod Autoscalers dashboard, select the updated HPA.
3. Confirm that the General Info section lists the correct Min Replicas and Max Replicas values.
To confirm that an HPA is active:

1. Open the Deployment Manager HPAs tab (All apps > Deployment Manager > Infra Dashboard > HPAs).
2. On the Horizontal Pod Autoscalers dashboard, select the updated HPA.
3. Verify that the Conditions table includes an AbleToScale condition. This condition means that CPU metrics are available and the HPA is active.
Enabling autoscaling controllers
Instabase offers several autoscaling controllers that can help optimize infrastructure costs during periods of low activity. HPA-based autoscaling is not required for these controllers to run.
To enable autoscaling controllers, set the `ENABLE_AUTOSCALING_CONTROLLERS` environment variable to `"true"` on `deployment-control-plane`.
Binary Autoscaler
The binary autoscaling controller scales non-HPA-based resources to 0 replicas when idle, and to the desired number of replicas when the service is needed. This controller is useful for GPU-based services such as `deployment-ray-model-training-worker`: when combined with a node autoscaler, the controller removes the GPU node from the environment when no model training is in progress.
To enable the binary autoscaler for `deployment-ray-model-training-worker`:

1. Set the following environment variables on `deployment-control-plane`:

   - `ENABLE_BINARY_AUTOSCALING`: `"true"`
   - `BINARY_AUTOSCALING_DEPLOYMENT_NAMES`: `"deployment-ray-model-training-worker"`

2. Using Deployment Manager, apply the following patch to `deployment-ray-model-training-worker`:
```yaml
# target: deployment-ray-model-training-worker
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-ray-model-training-worker
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "1"
    autoscaling/queries: "max_over_time(clamp_min(sum(ray_tasks{State!~\"FINISHED|FAILED\"}[15s]), 0)[30m]) or vector(0)"
```
You can also use this controller to scale down `deployment-celery-webdriver-tasks` and `deployment-celery-core-tasks`.
To enable the binary autoscaler for `deployment-celery-webdriver-tasks` and `deployment-celery-core-tasks`:
These instructions assume you previously enabled the binary autoscaler for `deployment-ray-model-training-worker`, including setting the `ENABLE_BINARY_AUTOSCALING` variable to `"true"`.
1. In `deployment-control-plane`, update the `BINARY_AUTOSCALING_DEPLOYMENT_NAMES` variable to `"deployment-ray-model-training-worker,deployment-celery-core-tasks,deployment-celery-webdriver-tasks"`.

2. Using Deployment Manager, apply the following patch to `deployment-celery-core-tasks`:
```yaml
# target: deployment-celery-core-tasks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-celery-core-tasks
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "{SET_TO_CURRENT_REPLICA_COUNT}"  # adjust this
    autoscaling/queries: max_over_time(sum(rabbitmq_queue_messages_unacked{queue="celery-core-tasks"})[30m])&max_over_time(sum(rabbitmq_queue_messages_ready{queue="celery-core-tasks"})[30m])
```
3. Apply the following patch to `deployment-celery-webdriver-tasks`:
```yaml
# target: deployment-celery-webdriver-tasks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-celery-webdriver-tasks
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "{SET_TO_CURRENT_REPLICA_COUNT}"  # adjust this
    autoscaling/queries: max_over_time(sum(rabbitmq_queue_messages_unacked{queue="celery-webdriver-tasks"})[30m])&max_over_time(sum(rabbitmq_queue_messages_ready{queue="celery-webdriver-tasks"})[30m])
```
Scale down to zero controller
The scale down to zero controller scales down HPA-based resources to 0 replicas when idle, and to the desired number of replicas when the service is needed.
To enable the scale down to zero controller:

1. In `deployment-control-plane`, set the `ENABLE_SCALE_DOWN_TO_ZERO_CONTROLLER` environment variable to `"true"`.

2. Onboard the services `deployment-conversion-service`, `deployment-ocr-msft-lite`, `deployment-ocr-msft-v3`, and `deployment-ocr-service` by using Deployment Manager to apply the following patches.

Apply to `deployment-conversion-service`:
```yaml
# target: deployment-conversion-service
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-conversion-service-27979"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-conversion-service-27979"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-conversion-service-27979"}[30s]))'
```
Apply to `deployment-ocr-msft-lite`:
```yaml
# target: deployment-ocr-msft-lite
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}[30s]))'
```
Apply to `deployment-ocr-msft-v3`:
```yaml
# target: deployment-ocr-msft-v3
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}[30s]))'
```
Apply to `deployment-ocr-service`:
```yaml
# target: deployment-ocr-service
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"(cluster-ocr-service-27068|cluster-ocr-service-27090)"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-service-27068|cluster-ocr-service-27090"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"(cluster-ocr-service-27068|cluster-ocr-service-27090)"}[30s]))'
```