Cluster Management

This section describes how to deploy a Kubernetes cluster through the Astro platform on your AWS, GCP, or Azure account.

Introduction

There are three ways to manage Kubernetes clusters with the Astro platform:

  • Self-Hosted K8s Clusters: Vanilla Kubernetes clusters that the Astro platform manages on your behalf.
  • Managed K8s Clusters: Kubernetes clusters deployed and managed by the Astro platform using the cloud provider's managed service for the control plane.
  • Bring Your Own Kubernetes Cluster (BYOC): Existing Kubernetes clusters you bring to the platform. See Bring Your Own Kubernetes Cluster.

More specifically, you can choose between the following cloud providers:

  • AWS: Deploy on your own VPC or use the VPC created by Astro on your account.
  • GCP: Deploy on your own VPC or use the VPC created by Astro on your account.
  • Azure: Deploy on your own VNet or use the VNet created by Astro on your subscription.

Self-Hosted Clusters

A self-hosted cluster is a vanilla Kubernetes cluster that the Astro platform manages on your behalf.

AWS

This section describes the process of deploying a Kubernetes cluster on your AWS account.

API Specification
Cluster API Specification
clusterName: aws-k8s-dev # name of the cluster
provider: aws # provider of the cluster
region: us-west-2 # region of the cluster
provisioner:
  type: selfHosted # type of the provisioner
  selfHosted:
    accountId: <your-aws-account-id> # your AWS account ID
    # networkId: vpc-018f2dec8bb5ddfdd # bring your own VPC
    bucketName: <bucket-name> # bucket name to store the cluster configuration
    credentials: # credentials stored in platform vault via 'astroctl cloud aws selfHosted setup'
      type: vault
clusterSpec:
  dataPlane: # specification of the data plane
    nodeGroups: # specification of the node groups
      - name: test-mg # name of the node group
        minNode: 3 # minimum number of nodes; 3 is a good default given we use three availability zones
        maxNode: 6 # maximum number of nodes
        machineTypes: # machine types of the nodes
          - t3.large
        labels: # labels of the nodes
          foo: "bar"
  controlPlane: # specification of the control plane
    nodeGroup:
      name: control-plane
      machineTypes:
        - t3.medium

Prerequisites

Make sure to follow the prerequisites before you start.

For more information about the cluster specification, see

Operations

GCP

This section describes the process of deploying a Kubernetes cluster on your GCP account.

API Specification
Cluster API Specification
clusterName: foo-cluster # name of the cluster
provider: gcp # provider of the cluster
region: us-west2 # region of the cluster
provisioner:
  type: selfHosted # type of the provisioner
  selfHosted:
    accountId: xxxx # your GCP project ID
    # networkId: vpc-018f2dec8bb5ddfdd
    bucketName: <bucket-name> # bucket name to store the cluster configuration
    credentials: # credentials stored in platform vault via 'astroctl cloud gcp selfHosted setup'
      type: vault
clusterSpec:
  dataPlane: # specification of the data plane
    nodeGroups: # specification of the node groups
      - name: test-mg # name of the node group
        minNode: 3 # minimum number of nodes; 3 is a good default given we use three availability zones
        maxNode: 6 # maximum number of nodes
        instanceType: spot # ondemand or spot
        machineTypes: # machine types of the nodes
          - e2-medium
        labels: # labels of the nodes
          hello: "test"
  controlPlane: # specification of the control plane
    nodeGroup: # specification of the node group
      name: control-plane # name of the node group
      machineTypes: # machine types of the nodes
        - e2-medium

For more information about the cluster specification, see

Operations

Managed Clusters

A managed cluster is a Kubernetes cluster that is deployed and managed by the Astro platform but uses the cloud provider's managed service for the control plane.

EKS

This section describes the process of deploying an Amazon Elastic Kubernetes Service (EKS) cluster on your AWS account.

API Specification
Cluster API Specification
clusterName: <cluster-name> # name of the cluster
provider: aws # provider of the cluster
region: us-west-2 # region of the cluster
provisioner:
  type: eks # type of the provisioner
  eks:
    accountId: <your-aws-account-id> # your AWS account ID
    credentials: # credentials to access the cluster
      type: dynamic # type of the credentials
    # approvalWorkflow is set server-side based on your org role.
    # Org owners provision immediately; admins require owner approval.
    # Do not include this field in your request; it is overridden by the backend.
clusterSpec: # specification of the cluster
  dataPlane: # specification of the data plane
    nodeGroups: # specification of the node groups
      - name: test-node-group
        minNode: 3 # minimum number of nodes; 3 is a good default given we use three availability zones
        maxNode: 6 # maximum number of nodes
        machineTypes: # machine types of the nodes
          - t3.medium
        labels: # labels of the nodes
          hello: test

Prerequisites

Make sure to follow the prerequisites before you start.

Please refer to the following documents for more information about the cluster specification:

Operations

GKE

This section describes the process of deploying a Google Kubernetes Engine (GKE) cluster on your GCP project.

API Specification
Cluster API Specification
apiVersion: platform.astropulse.io/v1
kind: K8sCluster
spec:
  clusterName: my-gke-cluster # name of the cluster
  provider: gcp # provider of the cluster
  region: us-central1 # region of the cluster
  provisioner:
    type: gke # type of the provisioner
    gke:
      projectId: <your-gcp-project-id> # your GCP project id
      credentials: # credentials to access the cluster
        type: dynamic # dynamic credentials
  clusterSpec: # specification of the cluster
    dataPlane: # specification of the data plane
      nodeGroups: # specification of the node groups
        - name: default-pool
          minNode: 1 # minimum number of nodes
          maxNode: 3 # maximum number of nodes
          instanceType: ondemand # ondemand or spot
          machineTypes: # machine types of the nodes
            - e2-medium
          labels: # labels of the nodes
            environment: production

Prerequisites

Make sure to follow the prerequisites before you start.

Please refer to the following documents for more information about the cluster specification:

Operations

AKS

This section describes the process of deploying an Azure Kubernetes Service (AKS) cluster on your Azure subscription.

API Specification
Cluster API Specification
apiVersion: platform.astropulse.io/v1
kind: K8sCluster
spec:
  clusterName: my-aks-cluster # name of the cluster
  provider: azure # provider of the cluster
  region: eastus # region of the cluster
  provisioner:
    type: aks # type of the provisioner
    aks:
      subscriptionId: <your-azure-subscription-id> # your Azure subscription ID
      resourceGroup: <your-resource-group> # your Azure resource group
      credentials: # credentials to access the cluster
        type: dynamic # dynamic credentials
  clusterSpec: # specification of the cluster
    dataPlane: # specification of the data plane
      nodeGroups: # specification of the node groups
        - name: default-pool
          minNode: 1 # minimum number of nodes
          maxNode: 3 # maximum number of nodes
          instanceType: ondemand # ondemand or spot
          machineTypes: # machine types of the nodes
            - Standard_D2s_v3
          labels: # labels of the nodes
            environment: production

Prerequisites

Make sure to follow the prerequisites before you start.

Please refer to the following documents for more information about the cluster specification:

Operations

Cluster Operations

These are the operations that you can perform on the cluster.

📈 Apply Cluster Configuration

To apply the updated cluster configuration, execute the following command (admin role required):

astroctl infra k8s apply -f <cluster-config.yaml>

🗑️ Delete Cluster

To delete a cluster, execute the following command (admin role required):

astroctl infra k8s delete <cluster-name>

🛠️ Get Cluster Details

To get the details of a cluster, execute the following command (admin role not required):

astroctl infra k8s get <cluster-name>

🔑 Manage Kubeconfig

To generate the kubeconfig file for a cluster, execute the following command (admin role not required):

astroctl infra k8s generate-kubeconfig <cluster-name>

To set the context for a cluster, execute the following command (admin role not required):

astroctl infra k8s set-context <cluster-name>

📊 Check Cluster Status

To check the status of a cluster, run the get command:

astroctl infra k8s get <cluster-name>

Note: The status subcommand is not available yet. If your cluster is stuck in INPROGRESS, contact the Astro platform team for help.
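Until a dedicated status subcommand exists, one way to wait on a long-running operation is to poll the get command. The following is a minimal sketch, not an official tool: it assumes the output of `astroctl infra k8s get` contains the literal text INPROGRESS while work is still under way (the output format is not documented on this page), and `wait_for_cluster` with its arguments is a hypothetical helper name.

```shell
#!/bin/sh
# wait_for_cluster NAME [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Polls `astroctl infra k8s get` until its output no longer mentions INPROGRESS.
# Assumption: the get output includes the status text; adjust the pattern if not.
wait_for_cluster() {
  name=$1
  max_attempts=${2:-60}
  interval=${3:-30}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Stop immediately if the command itself fails (for example, unknown cluster).
    output=$(astroctl infra k8s get "$name") || return 2
    case $output in
      *INPROGRESS*) ;;                          # still working, keep polling
      *) printf '%s\n' "$output"; return 0 ;;   # no longer in progress
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "cluster $name still INPROGRESS after $max_attempts checks" >&2
  return 1
}
```

For example, `wait_for_cluster <cluster-name> 40 30` checks every 30 seconds for up to 40 attempts and returns a non-zero status if the cluster never leaves INPROGRESS, which is the point at which to contact the Astro platform team.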

🔄 Upgrade Cluster

The Astro platform supports Kubernetes version upgrades for all cluster types (EKS, GKE, AKS, and self-hosted). Upgrades follow a phased approach: control plane first, then node groups.

Basic Upgrade

To upgrade a cluster to a new Kubernetes version:

# List available versions first
astroctl infra k8s upgrade <cluster-name> --list-versions

# Upgrade to specific version
astroctl infra k8s upgrade <cluster-name> 1.30

Dry-Run Validation

Before upgrading, validate the upgrade path without making changes:

astroctl infra k8s upgrade <cluster-name> 1.30 --dry-run

Generate Readiness Report

Get a comprehensive readiness report including capacity analysis:

astroctl infra k8s upgrade <cluster-name> 1.30 --generate-report
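The dry-run and the real upgrade can be chained so that an upgrade never starts after a failed validation. This is a sketch under one assumption that this page does not confirm: that a failed dry-run makes `astroctl` exit with a non-zero status. `safe_upgrade` is a hypothetical wrapper name; the commands it runs are the ones shown above.

```shell
#!/bin/sh
# safe_upgrade CLUSTER VERSION [extra upgrade flags...]
# Runs the documented dry-run first and starts the real upgrade only if it passes.
# Assumption: a failed dry-run makes astroctl exit with a non-zero status.
safe_upgrade() {
  if [ "$#" -lt 2 ]; then
    echo "usage: safe_upgrade CLUSTER VERSION [flags...]" >&2
    return 64
  fi
  cluster=$1
  version=$2
  shift 2
  if ! astroctl infra k8s upgrade "$cluster" "$version" --dry-run; then
    echo "dry-run failed for $cluster -> $version; upgrade not started" >&2
    return 1
  fi
  # Any remaining arguments (for example --max-surge) go to the real upgrade only.
  astroctl infra k8s upgrade "$cluster" "$version" "$@"
}
```

A call such as `safe_upgrade <cluster-name> 1.30 --max-surge 1 --max-unavailable 0` validates first and then upgrades with the given rolling-update flags.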

Understanding Rolling Updates

The --max-surge and --max-unavailable flags control how nodes are upgraded:

| Setting | Description | Capacity Impact |
|---|---|---|
| --max-surge | Extra nodes created DURING upgrade | Requires additional capacity |
| --max-unavailable | Nodes that can be DOWN during upgrade | Reduces cluster capacity temporarily |

Example with Multiple Node Groups:

Cluster: my-production-cluster
├── ng-system: 3 nodes (critical) → maxSurge=1, maxUnavailable=0
├── ng-app: 10 nodes (general) → maxSurge=2, maxUnavailable=1
└── ng-workers: 20 nodes (batch) → maxSurge=10%, maxUnavailable=10%

UPGRADE PROCESS (each node group upgraded sequentially):
1. ng-system: Creates 1 extra node, drains old → needs 4 nodes temporarily
2. ng-app: Creates 2 extra, drains 3 at a time → needs 12 nodes temporarily
3. ng-workers: Creates 2 extra (10%), drains 2 → needs 22 nodes temporarily

TOTAL CAPACITY NEEDED: max(4, 12, 22) = 22 nodes at peak
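The per-group arithmetic in the example above can be reproduced with a few lines of shell. This is an illustrative sketch: `surge_nodes` and `peak_nodes` are hypothetical helper names, and percentages are assumed to round up to whole nodes (the behavior Kubernetes uses for maxSurge on Deployments); the platform's own rounding may differ.

```shell
#!/bin/sh
# surge_nodes NODES MAX_SURGE
# Prints the number of extra nodes for a group.
# MAX_SURGE is either a count ("2") or a percentage ("10%").
# Assumption: percentages round up to the next whole node.
surge_nodes() {
  nodes=$1
  surge=$2
  case $surge in
    *%) pct=${surge%"%"}
        echo $(( (nodes * pct + 99) / 100 )) ;;
    *)  echo "$surge" ;;
  esac
}

# peak_nodes NODES MAX_SURGE
# Prints the group's node count while its surge nodes exist.
peak_nodes() {
  echo $(( $1 + $(surge_nodes "$1" "$2") ))
}

# Node groups from the example above, as "name nodes maxSurge".
for group in "ng-system 3 1" "ng-app 10 2" "ng-workers 20 10%"; do
  set -- $group
  echo "$1: peak $(peak_nodes "$2" "$3") nodes, $(surge_nodes "$2" "$3") extra"
done
```

The loop reports peaks of 4, 12, and 22 nodes, matching the figures above. Because node groups are upgraded one at a time, the extra quota in use at any moment is the surge of the group currently being upgraded, which is at most 2 nodes in this example.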

PRODUCTION (Default - Zero Downtime):

# Creates 1 extra node per group, needs N+1 capacity
astroctl infra k8s upgrade <cluster-name> 1.30 --max-surge 1 --max-unavailable 0

COST-SENSITIVE (No Extra Capacity Needed):

# Upgrades in-place, one node at a time, slower but no extra cost
astroctl infra k8s upgrade <cluster-name> 1.30 --max-surge 0 --max-unavailable 1

FAST UPGRADE (More Parallelism):

# Creates 25% extra nodes, faster but needs more capacity
astroctl infra k8s upgrade <cluster-name> 1.30 --max-surge 25% --max-unavailable 0

Upgrade from YAML File

For complex updates, use a YAML configuration:

kubernetesVersion: "1.30"

rollingUpdate:
  maxUnavailable: "1"
  maxSurge: "1"

nodeGroups:
  - name: "default-pool"
    minNode: 2
    maxNode: 8
    machineTypes: ["t3.medium"] # or ["e2-medium"] for GKE, ["Standard_D2s_v3"] for AKS
    labels:
      workload: "general"

Apply the update:

astroctl infra k8s update <cluster-name> -f upgrade.yaml

Upgrade Workflow Phases

  1. Pre-flight Validation: Version validation, cluster health check, addon compatibility, capacity analysis
  2. Control Plane Upgrade: Cloud provider managed upgrade (EKS/GKE/AKS) or self-hosted control plane update
  3. Node Group Upgrade: Rolling update of nodes with configurable surge and unavailability
  4. Addon Updates: Update managed addons to compatible versions
  5. Post-Upgrade Verification: Verify all nodes at target version and healthy

Important Upgrade Notes

  • Sequential upgrades only: Cannot skip minor versions (e.g., 1.28 to 1.30 is invalid)
  • Control plane cannot be downgraded: Only node groups can be rolled back
  • Test in non-production first: Always validate upgrades in a test environment
  • Check capacity: Ensure your cloud account has quota for surge nodes
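Since minor versions cannot be skipped, reaching a target that is more than one minor version ahead takes several upgrades in a row. The sketch below lists the intermediate steps; `upgrade_path` is a hypothetical helper, and it assumes versions are written as major.minor (for example 1.28) with the same major number.

```shell
#!/bin/sh
# upgrade_path CURRENT TARGET
# Prints each minor version to upgrade through, one per line, ending at TARGET.
# Assumes both versions look like "1.28" and share the same major number.
upgrade_path() {
  cur_major=${1%%.*}; cur_minor=${1#*.}
  tgt_major=${2%%.*}; tgt_minor=${2#*.}
  # Reject cross-major jumps, downgrades, and no-op requests.
  if [ "$cur_major" != "$tgt_major" ] || [ "$tgt_minor" -le "$cur_minor" ]; then
    echo "cannot plan upgrade from $1 to $2" >&2
    return 1
  fi
  minor=$((cur_minor + 1))
  while [ "$minor" -le "$tgt_minor" ]; do
    echo "$cur_major.$minor"
    minor=$((minor + 1))
  done
}
```

Running `upgrade_path 1.28 1.30` prints 1.29 and then 1.30. Each printed version would be passed to `astroctl infra k8s upgrade <cluster-name> <version>` in order, waiting for one upgrade to finish before starting the next.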

Provider-Specific Behavior

| Provider | Control Plane | Node Groups | Rollback |
|---|---|---|---|
| EKS | AWS managed | Rolling update with surge | Node groups only |
| GKE | GCP managed | Sequential node pool upgrades | Node pools only |
| AKS | Azure managed | Rolling agent pool upgrades | Agent pools only |
| Self-Hosted | Managed rolling update | Rolling update with drain | With --allow-downgrade |

Monitoring Upgrade Progress

Track upgrade progress in real-time:

astroctl infra k8s progress stream <cluster-name>