Workload autoscaling
Workload autoscaling lets Instabase autoscale data services based on demand. Autoscaling optimizes service resources to maximize efficiency and performance for any workload at a given time. Workload autoscaling also removes the need to manually size services and presents cost-saving opportunities.
Autoscaling is performed with Kubernetes HorizontalPodAutoscalers (HPAs) based on CPU usage for `conversion-service`, `ocr-msft-lite`, `ocr-msft-v3`, and `ocr-service`.
See the infrastructure requirements documentation for information on the required Kubernetes components.
Enable workload autoscaling
You can enable workload autoscaling during or after an upgrade or installation.
Workload autoscaling is in public preview and is disabled by default. If you notice performance issues or resource constraints after enabling workload autoscaling, contact Instabase Support.
Enable autoscaling during an upgrade
To enable autoscaling during an upgrade:
1. Before upgrading, edit the `control-plane.yml` file in the release you're upgrading to and enable the `ENABLE_AUTOSCALING` and `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variables:

   1. Unzip the `installation.zip` file for the release you're upgrading to.
   2. On the command line, navigate to and open the `control-plane.yml` file (`installation/control-plane/control-plane.yml`).
   3. Change the value of the `ENABLE_AUTOSCALING` environment variable to `True`.
   4. Change the value of the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable to `True`.
   5. Save your changes. You apply this updated `control-plane.yml` file when updating Deployment Manager at the start of the upgrade or installation process.
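If you prefer to script the two value changes, a small helper like the sketch below can flip the flags. This is an assumption-laden sketch, not part of the official procedure: it assumes each variable appears in `control-plane.yml` as a `name:` line with its `value:` on the following line, which may not match your release's exact layout, so inspect the file before and after running it.

```shell
# Sketch only: flips a boolean env var in a control-plane.yml-style file.
# Assumes the layout 'name: <VAR>' followed by a 'value: ...' line on the
# next line; verify against your actual file before relying on it.
enable_flag() {
  var=$1; file=$2
  # On the line after 'name: <VAR>', replace False with True (keeps a .bak copy).
  sed -i.bak -E "/name: *${var}\$/{n;s/False/True/;}" "$file"
}

# Hypothetical usage against the unzipped release:
# enable_flag ENABLE_AUTOSCALING installation/control-plane/control-plane.yml
# enable_flag IS_CUSTOMER_HOSTED_AUTOSCALING installation/control-plane/control-plane.yml
```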
2. Add the base configurations that enable autoscaling to the release's `base_configs.zip` file:

   1. Unzip the `base_configs.zip` file contained within the release's `installation` folder.
   2. Locate the `autoscaling` folder in the `installation` folder (`installation` > `additional_configs` > `autoscaling`).
   3. Move the config files in the `autoscaling` folder to the unzipped `base_configs` folder.
   4. Select all files in the `base_configs` folder and compress them, creating a new .zip file of base configs.
   5. Rename the file `base_configs.zip`. This updated .zip file is what you upload during the upgrade.
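The unzip, move, and re-zip sequence can also be scripted. The helper below is a hedged sketch that assumes the Info-ZIP `zip` and `unzip` command-line tools are available; the commented usage paths are examples based on the release layout described above, not guaranteed values.

```shell
# Sketch: rebuild a release .zip with extra config files merged in.
# Assumes the Info-ZIP 'zip' and 'unzip' tools are installed.
add_to_zip() {
  archive=$1; extra_dir=$2
  tmp=$(mktemp -d)
  out=$(mktemp -u).zip
  unzip -q "$archive" -d "$tmp"   # unpack the existing archive
  cp "$extra_dir"/* "$tmp"/       # add the extra config files
  (cd "$tmp" && zip -qr "$out" .) # compress everything into a new archive
  mv "$out" "$archive"            # keep the original file name
  rm -rf "$tmp"
}

# Hypothetical usage from the unzipped release's installation folder:
# add_to_zip base_configs.zip additional_configs/autoscaling
```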
3. (Optional) If your deployment uses custom resource sizing, or if you didn't select a resource sizing option during the upgrade, create patches that define the `minReplicas` and `maxReplicas` values for each autoscaled service's corresponding HPA service. For example, a patch targeting `autoscaler-conversion-service` sets the autoscaling range for `conversion-service`. The required steps are:

   1. Calculate the `minReplicas` values for all autoscaled services' corresponding HPA services.
   2. In the release's `installation` folder, locate the `custom-hpa-patches` folder (`installation` > `optional_patches` > `custom-hpa-patches`). This folder contains patches to configure `minReplicas` and `maxReplicas` values for each HPA service. Or, reference the sample patch in this article.
   3. Edit each HPA service patch to define the `minReplicas` and `maxReplicas` values. Use your calculated `minReplicas` values and define the `maxReplicas` values based on your preferred resource sizing.
4. Update the `default_patches.zip` file to include the following patches:

   - All patches contained in the `enable-autoscaling` folder (`installation` > `optional_patches` > `enable-autoscaling`).
   - (Optional) If using custom or undefined resource sizing, any patches used to manually define an HPA service's `minReplicas` and `maxReplicas` values.

   To update the `default_patches.zip` file:

   1. Unzip the `default_patches.zip` file contained within the release's `installation` folder.
   2. Add the `enable-autoscaling` patches (and any optional edited HPA service patches) to the now unzipped `default_patches` folder.
   3. Select all files in the `default_patches` folder and compress them, creating a new .zip file of patches.
   4. Rename the file `default_patches.zip`. This updated .zip file is what you upload during the upgrade.
5. During the upgrade, upload the updated `base_configs.zip` file and the `default_patches.zip` file. Together, these files contain all patches and configurations required to configure autoscaling in your deployment.

   Info: You must upload the `default_patches.zip` file during the upgrade even if you didn't add custom patches to it.
If you selected a resource sizing option during the upgrade process and have previously decommissioned (set replicas to 0) `conversion-service`, `ocr-msft-lite`, `ocr-msft-v3`, or `ocr-service`, after the upgrade you must reset the service's decommissioned state, because selecting a resource sizing option automatically updates a service's replicas count to a non-zero value. You can reset the service's replicas count using a patch or with the following kubectl command: `kubectl scale deployment/<name of decommissioned service> --replicas=0 -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
Enable autoscaling during an installation
Follow the same steps as enabling autoscaling during an upgrade. However, if your deployment uses custom resource sizing, you don't need to create patches that define the `minReplicas` and `maxReplicas` values for each autoscaled service's corresponding HPA service.
Enable autoscaling outside of an upgrade or installation
If your deployment is already on release 23.07 or later, you can enable autoscaling at any time:
1. Enable the `ENABLE_AUTOSCALING` environment variable:

   1. On the command line, run the following command: `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   2. Locate the `ENABLE_AUTOSCALING` environment variable.
   3. Set the value to `True`.
   4. Save your changes.

2. Enable the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable:

   1. On the command line, run the following command: `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   2. Locate the `IS_CUSTOMER_HOSTED_AUTOSCALING` environment variable.
   3. Set the value to `True`.
   4. Save your changes.
3. From the Deployment Manager Base Configs tab, update your deployment's base configs to include the base configs required for workload autoscaling. (If you already included the workload autoscaling base configs when upgrading or installing, you can skip this step.)

   1. Unzip the `base_configs.zip` file contained within the release's `installation` folder.
   2. Locate the `autoscaling` folder in the `installation` folder (`installation` > `additional_configs` > `autoscaling`).
   3. Move the config files in the `autoscaling` folder to the now unzipped `base_configs` folder.
   4. Select all files in the `base_configs` folder and compress them, creating a new .zip file of base configs.
   5. Rename the file `base_configs.zip`.
   6. From the Deployment Manager Base Configs tab, update your base configs.
4. From the Deployment Manager Configs tab, apply the patches required to enable workload autoscaling. All required patches are in the release's `enable-autoscaling` patches folder (`installation` > `optional_patches` > `enable-autoscaling`).

   The `enable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service, using the following patch template:
```yaml
# This patch associates the HPA with the deployment, allowing the HPA to autoscale the deployment's replica count.
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <name of deployment service>
```
You can verify that all HPAs are working and have the desired `minReplicas` and `maxReplicas` values from the HPAs tab of the Infra Dashboard. If needed, you can adjust the minimum and maximum replica count.
Configure workload autoscaling
For deployments using Instabase standard resource sizing, Deployment Manager automatically determines and applies appropriate `minReplicas` and `maxReplicas` values for all autoscaled services. For deployments using custom resource sizing, however, you must set the `minReplicas` and `maxReplicas` values for the HPA services corresponding to the following autoscaled deployment services:
| Deployment service | Corresponding HPA service |
|---|---|
| ocr-msft-v3 | autoscaler-ocr-msft-v3 |
| ocr-msft-lite | autoscaler-ocr-msft-lite |
| ocr-service | autoscaler-ocr-service |
| conversion-service | autoscaler-conversion-service |
While the `maxReplicas` value can be set based on your preferred resource sizing, the `minReplicas` value must be calculated based on the number of `celery-app-tasks` pods in the deployment.
To adjust a service's `minReplicas` and `maxReplicas` values, apply the following patch to each HPA service.
```yaml
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: <max replicas>
  minReplicas: <min replicas>
```
You can find sample patches for modifying HPA services in the release's `installation.zip` file, in the `custom-hpa-patches` folder (`installation` > `optional_patches` > `custom-hpa-patches`). If you use these sample patches, you must still define the `minReplicas` and `maxReplicas` values.
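For example, a filled-in patch for `autoscaler-conversion-service` might look like the following. The replica values here are illustrative only: the `minReplicas` value must come from the calculation described in the next section, and the `maxReplicas` value from your preferred resource sizing.

```yaml
# target: autoscaler-conversion-service
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: 10  # example value; set per your preferred resource sizing
  minReplicas: 3   # example value; ceil(0.28 * n) for n = 10 celery-app-tasks pods
```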
Calculate HPA minReplicas for autoscaled services
To calculate `minReplicas` for an HPA service, use the following formulas, where n is the number of `celery-app-tasks` pods in your deployment:
The `ceil()` function returns the smallest integer value that's greater than or equal to the calculated number.
| Deployment service | Corresponding HPA service | minReplicas formula |
|---|---|---|
| ocr-msft-v3 | autoscaler-ocr-msft-v3 | ceil(0.28 * n) |
| ocr-msft-lite | autoscaler-ocr-msft-lite | ceil(0.57 * n) |
| ocr-service | autoscaler-ocr-service | ceil(0.28 * n) |
| conversion-service | autoscaler-conversion-service | ceil(0.28 * n) |
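As a worked example, the formulas above can be evaluated with integer arithmetic, since `ceil(p% of n)` equals `(p * n + 99) / 100` under integer division. The snippet below assumes a deployment with 10 `celery-app-tasks` pods; substitute your own pod count.

```shell
# Compute minReplicas for each autoscaled service's HPA, for a deployment
# with n celery-app-tasks pods. ceil(p% of n) == (p*n + 99) / 100 with
# integer division.
n=10  # example: 10 celery-app-tasks pods; use your deployment's count

ceil_pct() { echo $(( ($1 * n + 99) / 100 )); }

echo "autoscaler-ocr-msft-v3:        $(ceil_pct 28)"  # ceil(0.28 * 10) = 3
echo "autoscaler-ocr-msft-lite:      $(ceil_pct 57)"  # ceil(0.57 * 10) = 6
echo "autoscaler-ocr-service:        $(ceil_pct 28)"  # ceil(0.28 * 10) = 3
echo "autoscaler-conversion-service: $(ceil_pct 28)"  # ceil(0.28 * 10) = 3
```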
Disable autoscaling
You can disable autoscaling at any time. The process differs slightly based on whether your deployment uses custom resource sizing or standard Instabase resource sizing.
Deployments with standard resource sizing
To disable autoscaling in deployments using standard Instabase resource sizing:
1. Disable the `ENABLE_AUTOSCALING` environment variable:

   1. On the command line, run the following command: `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   2. Locate the `ENABLE_AUTOSCALING` environment variable.
   3. Set the value to `False`.
   4. Save your changes.
2. Call the Push latest materialized configs to cluster API to redeploy the deployment with autoscaling disabled.

3. Call the Update cluster size API to reset your resource sizing.

4. From the Deployment Manager Configs tab, apply the patches required to disable workload autoscaling. All required patches are in the release's `disable-autoscaling` patches folder (`installation` > `optional_patches` > `disable-autoscaling`). These patches dissociate each previously autoscaled service from its corresponding HPA.

   The `disable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service, using the following patch template:
```yaml
# This patch disassociates the HPA from the deployment, stopping the HPA from autoscaling the deployment's replica count.
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: disassociated
```
Deployments with custom resource sizing
To disable autoscaling in deployments using custom resource sizing:
1. Disable the `ENABLE_AUTOSCALING` environment variable:

   1. On the command line, run the following command: `kubectl edit deployment/deployment-control-plane -n $IB_NS`, where `$IB_NS` is your Instabase namespace.
   2. Locate the `ENABLE_AUTOSCALING` environment variable.
   3. Set the value to `False`.
   4. Save your changes.
2. Call the Push latest materialized configs to cluster API to redeploy the deployment with autoscaling disabled.

3. From the Deployment Manager Configs tab, apply the patches required to disable workload autoscaling. All required patches are in the release's `disable-autoscaling` patches folder (`installation` > `optional_patches` > `disable-autoscaling`). These patches dissociate each previously autoscaled service from its corresponding HPA.

   The `disable-autoscaling` patches folder isn't present in the 23.07 release bundle. You can find the folder in the 23.10 release bundle, or create your own patches for each service, using the following patch template:
```yaml
# This patch disassociates the HPA from the deployment, stopping the HPA from autoscaling the deployment's replica count.
# target: <name of HPA service>
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: disassociated
```
Verifying autoscaling configuration changes
You can verify that patches targeting HPAs have applied successfully using the HPAs tab of the Deployment Manager infra dashboard.
To confirm that an HPA's replica count has updated successfully:

1. Open the Deployment Manager HPAs tab (All apps > Deployment Manager > Infra Dashboard > HPAs).
2. On the Horizontal Pod Autoscalers dashboard, select the updated HPA.
3. Confirm that the General Info section lists the correct Min Replicas and Max Replicas values.
To confirm that an HPA is active:

1. Open the Deployment Manager HPAs tab (All apps > Deployment Manager > Infra Dashboard > HPAs).
2. On the Horizontal Pod Autoscalers dashboard, select the updated HPA.
3. Verify that the Conditions table includes an AbleToScale condition. This condition means that CPU metrics are available and the HPA is active.
Enabling autoscaling controllers
Instabase offers several autoscaling controllers that can help optimize infrastructure costs during periods of low activity. HPA-based autoscaling is not required for these controllers to run.
To enable autoscaling controllers, set the `ENABLE_AUTOSCALING_CONTROLLERS` environment variable to `"true"` on `deployment-control-plane`.
Binary Autoscaler
The binary autoscaling controller scales non-HPA-based resources to 0 replicas when idle, and to the desired number of replicas when the service is needed. This controller is useful for GPU-based services such as `deployment-ray-model-training-worker`: when combined with a node autoscaler, the controller removes the GPU node from the environment when no model training is in progress.
To enable the binary autoscaler for `deployment-ray-model-training-worker`:

1. Set the following environment variables on `deployment-control-plane`:

   - `ENABLE_BINARY_AUTOSCALING`: `"true"`
   - `BINARY_AUTOSCALING_DEPLOYMENT_NAMES`: `"deployment-ray-model-training-worker"`

2. Using Deployment Manager, apply the following patch to `deployment-ray-model-training-worker`:
```yaml
# target: deployment-ray-model-training-worker
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-ray-model-training-worker
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "1"
    autoscaling/queries: "max_over_time(clamp_min(sum(ray_tasks{State!~\"FINISHED|FAILED\"}[15s]), 0)[30m]) or vector(0)"
```
You can also use this controller to scale down `deployment-celery-webdriver-tasks` and `deployment-celery-core-tasks`.
To enable the binary autoscaler for `deployment-celery-webdriver-tasks` and `deployment-celery-core-tasks`:
These instructions assume you previously enabled the binary autoscaler for `deployment-ray-model-training-worker`, including setting the `ENABLE_BINARY_AUTOSCALING` variable to `"true"`.
1. In `deployment-control-plane`, update the `BINARY_AUTOSCALING_DEPLOYMENT_NAMES` variable to `"deployment-ray-model-training-worker,deployment-celery-core-tasks,deployment-celery-webdriver-tasks"`.

2. Using Deployment Manager, apply the following patch to `deployment-celery-core-tasks`:
```yaml
# target: deployment-celery-core-tasks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-celery-core-tasks
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "{SET_TO_CURRENT_REPLICA_COUNT}"  # adjust this
    autoscaling/queries: max_over_time(sum(rabbitmq_queue_messages_unacked{queue="celery-core-tasks"})[30m])&max_over_time(sum(rabbitmq_queue_messages_ready{queue="celery-core-tasks"})[30m])
```
3. Apply the following patch to `deployment-celery-webdriver-tasks`:
```yaml
# target: deployment-celery-webdriver-tasks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-celery-webdriver-tasks
  annotations:
    autoscaling/enabled: "true"
    autoscaling/max_replicas: "{SET_TO_CURRENT_REPLICA_COUNT}"  # adjust this
    autoscaling/queries: max_over_time(sum(rabbitmq_queue_messages_unacked{queue="celery-webdriver-tasks"})[30m])&max_over_time(sum(rabbitmq_queue_messages_ready{queue="celery-webdriver-tasks"})[30m])
```
Scale down to zero controller
The scale down to zero controller scales down HPA-based resources to 0 replicas when idle, and to the desired number of replicas when the service is needed.
To enable the scale down to zero controller:

1. In `deployment-control-plane`, set the `ENABLE_SCALE_DOWN_TO_ZERO_CONTROLLER` environment variable to `"true"`.

2. Onboard the services `deployment-conversion-service`, `deployment-ocr-msft-lite`, `deployment-ocr-msft-v3`, and `deployment-ocr-service` by using Deployment Manager to apply the following patches.

Apply to `deployment-conversion-service`:
```yaml
# target: deployment-conversion-service
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-conversion-service-27979"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-conversion-service-27979"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-conversion-service-27979"}[30s]))'
```
Apply to `deployment-ocr-msft-lite`:
```yaml
# target: deployment-ocr-msft-lite
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-lite-25000"}[30s]))'
```
Apply to `deployment-ocr-msft-v3`:
```yaml
# target: deployment-ocr-msft-v3
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-msft-v3-25001"}[30s]))'
```
Apply to `deployment-ocr-service`:
```yaml
# target: deployment-ocr-service
kind: Deployment
metadata:
  annotations:
    autoscaling/scale_down_to_zero_enabled: "true"
    autoscaling/scale_down_to_zero_query: 'max_over_time(sum(rate(envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"(cluster-ocr-service-27068|cluster-ocr-service-27090)"}))[4d]) or vector(0)&max_over_time(sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"cluster-ocr-service-27068|cluster-ocr-service-27090"}))[4d]) or vector(0)'
    autoscaling/scale_down_to_zero_freshness_query: 'sum(rate(envoy_cluster_upstream_rq_active{envoy_cluster_name=~"(cluster-ocr-service-27068|cluster-ocr-service-27090)"}[30s]))'
```