
Cluster creation and resizing

This guide is for the cluster administrator (currently Derek). Since we only need one cluster and Derek is taking care of creating it, this guide is not necessary for the whole team. There is a separate guide on cluster management that is for the whole team.

Key resource referenced in creating this guide: Beginner's Guide to Magnum on Jetstream2

One-time local machine software setup

These instructions will set up your local (Linux, Mac, or WSL) machine to control the cluster through the command line.

Install Python and create virtual environment

Make sure you have a recent Python interpreter and the venv utility, then create a virtual environment for OpenStack management:

sudo apt update
sudo apt install -y python3-full python3-venv
python3 -m venv ~/venv/openstack

Install OpenStack command line tools

# Activate environment
source ~/venv/openstack/bin/activate

# Install OpenStack utilities
pip install -U python-openstackclient python-magnumclient python-designateclient

Install kubectl

Install the Kubernetes control utility kubectl (from the official Kubernetes documentation):

# Install prerequisites
sudo apt install -y apt-transport-https ca-certificates curl gnupg

# Add Kubernetes apt repository
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.33/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
sudo chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo chmod 644 /etc/apt/sources.list.d/kubernetes.list

# Install kubectl
sudo apt update
sudo apt install -y kubectl

Create application credential

In Horizon, go to Identity > Application Credentials. Click Create:

  • Do not change the roles
  • Do not set a secret (one will be generated)
  • Do set an expiration date
  • Do check "unrestricted" (required because Magnum creates additional app credentials for the cluster)

Download the openrc file and store it in the OFO Vaultwarden organization where OFO members can access it.

Copy the application credential onto your local computer (do not put it on a JS2 machine), ideally into ~/.ofocluster/app-cred-ofocluster-openrc.sh (which is where we will assume it is in these docs).
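
For example, assuming the openrc file landed in your downloads folder under a name like app-cred-ofocluster-openrc.sh (adjust the source path to wherever you actually saved it), a minimal sketch:

# Hypothetical download location; adjust as needed
mkdir -p ~/.ofocluster
mv ~/Downloads/app-cred-ofocluster-openrc.sh ~/.ofocluster/
chmod 600 ~/.ofocluster/app-cred-ofocluster-openrc.sh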

Source the application credential (which sets relevant environment variables for the OpenStack command line tools):

source ~/.ofocluster/app-cred-ofocluster-openrc.sh

Create OpenStack keypair

Create a keypair for cluster node access. If you want, you can save the private key that is displayed when you run this command in order to SSH into the cluster nodes later. However, the nodes won't have public IP addresses, so this is mainly to satisfy Magnum's requirements.

# Create a new keypair (displays private key - save if needed)
openstack keypair create my-openstack-keypair-name

Alternatively, specify an existing public key you normally use, in this example ~/.ssh/id_rsa.pub:

# Use existing public key
openstack keypair create my-openstack-keypair-name --public-key ~/.ssh/id_rsa.pub
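
If you go with the first option and want to keep the generated private key, you can redirect the command's output to a file instead of letting it print to the terminal (an illustrative sketch; the path is just an example):

# Save the displayed private key and restrict its permissions
openstack keypair create my-openstack-keypair-name > ~/.ssh/my-openstack-keypair-name.pem
chmod 600 ~/.ssh/my-openstack-keypair-name.pem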

Enable shell completion

# Create directory for completion scripts
mkdir -p ~/.ofocluster

# Generate completion scripts
openstack complete > ~/.ofocluster/openstack-completion.bash
kubectl completion bash > ~/.ofocluster/kubectl-completion.bash

# Add to ~/.bashrc
echo 'source ~/.ofocluster/openstack-completion.bash' >> ~/.bashrc
echo 'source ~/.ofocluster/kubectl-completion.bash' >> ~/.bashrc
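
The completions take effect in new shells; to pick them up in the current session, re-source your ~/.bashrc:

source ~/.bashrc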

Cluster creation

Assuming you're in a fresh shell session, activate the OpenStack venv and source the application credential:

source ~/venv/openstack/bin/activate
source ~/.ofocluster/app-cred-ofocluster-openrc.sh

View available cluster templates

openstack coe cluster template list

Deploy the cluster

Specify the deployment parameters and create the cluster. Choose the most recent Kubernetes version (the highest version number in the template list). The master node can be m3.small, and we'll deploy a cluster with a single worker node that is also m3.small. When we need to scale up, we'll add nodegroups; this initial spec is just the base setup for when we're not running Argo workloads. We need to enable auto-scaling now, even though we don't want it for the default worker nodegroup, because these settings apply to child nodegroups and it appears the max node count cannot be overridden.

# Set deployment parameters
TEMPLATE="kubernetes-1-33-jammy"
KEYPAIR=my-openstack-keypair-name # what you created above

# Network configuration
NETWORK_ID=$(openstack network show --format value -c id auto_allocated_network)
SUBNET_ID=$(openstack subnet show --format value -c id auto_allocated_subnet_v4)

openstack coe cluster create \
    --cluster-template $TEMPLATE \
    --master-count 1 --node-count 1 \
    --master-flavor m3.small --flavor m3.small \
    --merge-labels \
    --labels auto_scaling_enabled=true,min_node_count=1,boot_volume_size=80 \
    --keypair $KEYPAIR \
    --fixed-network "${NETWORK_ID}" \
    --fixed-subnet "${SUBNET_ID}" \
    "ofocluster2"



Check cluster status (optional)

openstack coe cluster list
openstack coe cluster show ofocluster2
openstack coe nodegroup list ofocluster2

Or with formatting that makes it easier to copy the cluster UUID:

openstack coe cluster list --format value -c uuid -c name
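
Cluster creation takes a while. If you prefer to poll rather than re-run the command by hand, one option (assuming watch is available) is:

# Re-run the status listing every 60 seconds until it shows CREATE_COMPLETE
watch -n 60 "openstack coe cluster list --format value -c name -c status"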

Set up kubectl to control Kubernetes

This is required the first time you interact with Kubernetes on the cluster. kubectl is a tool to control Kubernetes (the cluster's software, not its compute nodes/VMs) from your local command line.

Once the openstack coe cluster list status (command above) changes to CREATE_COMPLETE, get the Kubernetes configuration file (kubeconfig) and configure your environment:

# Get cluster configuration
openstack coe cluster config "ofocluster2" --force

# Set permissions and move to appropriate location
chmod 600 config
mkdir -p ~/.ofocluster
mv -i config ~/.ofocluster/ofocluster.kubeconfig

# Set KUBECONFIG environment variable
export KUBECONFIG=~/.ofocluster/ofocluster.kubeconfig
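
A quick sanity check that kubectl can reach the new cluster:

kubectl cluster-info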

Create the Argo namespace

We will install various resources into this namespace in this guide and subsequent ones.

kubectl create namespace argo

Create Kubernetes secrets

The Argo workflows require two Kubernetes secrets to be created:

S3 credentials secret

The Argo workflows upload to and download from Jetstream2's S3-compatible buckets. You need to create a secret to store the S3 access key ID, secret key, provider type, and endpoint URL. Obtain the access key ID and secret access key from the OFO Vaultwarden organization. The credentials were originally created by Derek following the JS2 docs, in particular openstack ec2 credentials create.

kubectl create secret generic s3-credentials \
  --from-literal=provider='Other' \
  --from-literal=endpoint='https://js2.jetstream-cloud.org:8001' \
  --from-literal=access_key='<YOUR_ACCESS_KEY_ID>' \
  --from-literal=secret_key='<YOUR_SECRET_ACCESS_KEY>' \
  -n argo

Agisoft Metashape license secret

The photogrammetry workflow requires access to an Agisoft Metashape floating license server. Create a secret to store the license server address. Obtain the license server IP address from the OFO Vaultwarden organization.

kubectl create secret generic agisoft-license \
  --from-literal=license_server='<LICENSE_SERVER_IP>:5842' \
  -n argo

Replace <LICENSE_SERVER_IP> with the actual IP address from the credentials document.

These secrets only need to be created once per cluster.
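
To confirm both secrets exist (without printing their values):

kubectl get secret s3-credentials agisoft-license -n argo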

Configure CPU node labeling

To prevent CPU workloads from being scheduled on expensive GPU nodes, CPU nodes are labeled based on their nodegroup naming pattern. CPU-only workflow templates use nodeSelector to explicitly target these labeled nodes, while GPU pods use resource requests that naturally constrain them to GPU nodes.

Apply CPU node labels

Label CPU nodes automatically based on their nodegroup naming pattern:

kubectl apply -f setup/k8s/cpu-nodegroup-labels.yaml

This creates a NodeFeatureRule that automatically labels any node with cpu in its name with feature.node.kubernetes.io/workload-type: cpu. The label is applied by Node Feature Discovery (NFD) when the node joins the cluster.

Nodegroup naming requirement

When creating CPU nodegroups, ensure the nodegroup name contains cpu (e.g., cpu-group, cpu-m3xl) so nodes are automatically labeled. See nodegroup creation for details.

Verify CPU node labels

kubectl get nodes -L feature.node.kubernetes.io/workload-type

All nodes with cpu in their name should show feature.node.kubernetes.io/workload-type=cpu.

How it works

  • CPU pods: Use nodeSelector: feature.node.kubernetes.io/workload-type: cpu to explicitly target CPU nodes (see the sketch after this list)
  • GPU pods: Request GPU resources (e.g., nvidia.com/mig-1g.10gb), which naturally constrains them to nodes advertising those resources
  • System pods: DaemonSets run on all nodes as needed
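
To illustrate the CPU-pod pattern, here is a minimal sketch (hypothetical pod name and image; real workloads come from the Argo workflow templates) that pins a throwaway pod onto the labeled CPU nodes:

# Hypothetical demo pod; it will stay Pending if no CPU nodegroup exists yet
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cpu-selector-demo
  namespace: argo
spec:
  nodeSelector:
    feature.node.kubernetes.io/workload-type: cpu
  restartPolicy: Never
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
EOF

# Clean up when done
kubectl delete pod cpu-selector-demo -n argo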

Enable mixed MIG strategy

The GPU Operator defaults to the "single" MIG strategy, which exposes MIG slices as generic nvidia.com/gpu resources. For MIG nodegroups to expose specific (and mixed) resources like nvidia.com/mig-2g.10gb (optionally alongside other MIG profiles or full GPUs exposed as nvidia.com/gpu), enable the "mixed" strategy:

# Add NVIDIA helm repo (if not already added)
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update nvidia

# Check current GPU Operator version
helm list -n gpu-operator

# Enable mixed MIG strategy (use the same version as currently installed)
helm upgrade nvidia-gpu-operator nvidia/gpu-operator \
  -n gpu-operator \
  --version <CURRENT_VERSION> \
  --reuse-values \
  --set mig.strategy=mixed
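
To confirm the setting took effect, you can inspect the operator's ClusterPolicy (assuming the default resource name, cluster-policy):

kubectl get clusterpolicy cluster-policy -o jsonpath='{.spec.mig.strategy}{"\n"}'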

Cluster upgrades

This setting may be reset if the cluster template is upgraded and Magnum redeploys the GPU Operator. Re-run this command after cluster upgrades if MIG resources stop appearing.

Configure MIG (Multi-Instance GPU)

MIG partitions A100 GPUs into isolated slices, allowing multiple pods to share one physical GPU with hardware-level isolation. This is optional - standard GPU nodegroups work without MIG.

MIG profiles

| Nodegroup pattern | MIG profile | Pods/GPU | VRAM each | Compute each |
|-------------------|-------------|----------|-----------|--------------|
| mig1-*            | all-1g.10gb | 4        | 10GB      | 1/7          |
| mig2-*            | all-2g.10gb | 3        | 10GB      | 2/7          |
| mig3-*            | all-3g.20gb | 2        | 20GB      | 3/7          |

Apply MIG configuration rule

kubectl apply -f setup/k8s/mig-nodegroup-labels.yaml

This creates a NodeFeatureRule that automatically labels GPU nodes based on their nodegroup name. The NVIDIA MIG manager watches for these labels and configures the GPU accordingly.

Verify MIG is working

After creating a MIG nodegroup (see MIG nodegroups):

# Check node MIG config label
kubectl get nodes -l nvidia.com/mig.config -o custom-columns='NAME:.metadata.name,MIG_CONFIG:.metadata.labels.nvidia\.com/mig\.config'

# Check MIG resources are available
kubectl get nodes -o custom-columns='NAME:.metadata.name,MIG-1G:.status.allocatable.nvidia\.com/mig-1g\.10gb,MIG-2G:.status.allocatable.nvidia\.com/mig-2g\.10gb,MIG-3G:.status.allocatable.nvidia\.com/mig-3g\.20gb'

How it works

  1. User creates nodegroup with MIG naming (e.g., mig1-group)
  2. Node joins cluster with name containing -mig1-
  3. NFD applies the label nvidia.com/mig.config=all-1g.10gb
  4. The MIG manager detects the label and configures the GPU into 4 partitions
  5. The device plugin exposes nvidia.com/mig-1g.10gb: 4 as allocatable resources
  6. Pods requesting nvidia.com/mig-1g.10gb: 1 each get one partition
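
For reference, a pod requests a MIG slice the same way it would request a full GPU. A minimal sketch (hypothetical pod name; any CUDA-capable image works) that asks for one 1g.10gb slice from a mig1-* nodegroup:

# Hypothetical demo pod requesting a single MIG slice
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mig-demo
  namespace: argo
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1
EOF

# The log should list exactly one MIG device; clean up afterwards
kubectl logs mig-demo -n argo
kubectl delete pod mig-demo -n argo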

Kubernetes management

If you are resuming cluster management after a reboot, you will need to re-set environment variables and source the application credential:

source ~/venv/openstack/bin/activate
export KUBECONFIG=~/.ofocluster/ofocluster.kubeconfig
source ~/.ofocluster/app-cred-ofocluster-openrc.sh

View cluster nodes

kubectl get nodes

Access shell on cluster nodes

Run commands on a node with:

# Start a debug session on a specific node
kubectl debug node/<node-name> -it --image=ubuntu

# Once inside, you have host access via /host
# Check kernel modules
chroot /host modprobe ceph
chroot /host lsmod | grep ceph

Check disk usage

Run a one-off command to check disk usage:

kubectl debug node/<node-name> -it --image=busybox -- df -h

Look for the /dev/vda1 volume. Then delete the debugging pods:

kubectl get pods -o name | grep node-debugger | xargs kubectl delete

Rotating secrets

If you accidentally expose a secret, or for periodic rotation, delete, then re-create. Example for S3 (assuming your S3 is via JS2 Swift):

  1. List credentials to get the ID of the one you want to swap out: openstack ec2 credentials list
  2. Delete it: openstack ec2 credentials delete <your-access-key-id>
  3. Create a new one: openstack ec2 credentials create
  4. Update it in Vaultwarden.
  5. Delete the k8s secret: kubectl delete secret -n argo s3-credentials
  6. Re-create the k8s secret following the instructions above.
  7. If you have already installed Argo on the cluster, restart the workflow controller so it picks up the new creds: kubectl rollout restart deployment workflow-controller -n argo

Cluster resizing

These instructions are for managing which nodes are in the cluster, not what software is running on them.

Resize the default worker group

Resize the cluster by adding or removing nodes from the original worker group (not a later-added nodegroup). We will likely not do this, relying instead on nodegroups for specific runs.

openstack coe cluster resize "ofocluster2" 4

Add, resize, or delete nodegroups

For nodegroup management, see the corresponding user guide.

Delete the cluster

openstack coe cluster delete "ofocluster2"

Monitoring dashboards

Incomplete notes in development.

Grafana dashboard

# Port-forward Grafana to your local machine
kubectl port-forward -n monitoring-system svc/kube-prometheus-stack-grafana 3000:80

Then open http://localhost:3000 in your browser.
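
If you need the Grafana admin password, the kube-prometheus-stack chart typically stores it in a secret (secret name assumed from the service name above):

kubectl get secret -n monitoring-system kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo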

Kubernetes dashboard

# Create service account
kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard

# Create cluster role binding
kubectl create clusterrolebinding dashboard-admin \
    --clusterrole=cluster-admin \
    --serviceaccount=kubernetes-dashboard:dashboard-admin

# Create token (24 hour duration)
kubectl create token dashboard-admin -n kubernetes-dashboard --duration=24h

# Port-forward (if not already running)
kubectl port-forward -n kubernetes-dashboard svc/kubernetes-dashboard 8443:443

Then open https://localhost:8443 in your browser and use the token to log in.

Notes from testing and experimentation attempting to set up autoscaling and fixed nodegroups

It seems impossible to set new (or override existing) labels when adding nodegroups; labels only seem to be intended/used for overall cluster creation. Also, if we deploy one nodegroup that violates the requirement for min_node_count to be specified, we cannot deploy any others (they all fail), even if they would have succeeded otherwise.

By not specifying a max_node_count label upon cluster creation, the default-worker nodegroup will not autoscale. But we still need to set the auto_scaling_enabled label to true upon cluster creation because cluster labels apparently cannot be overridden by nodegroups. This means that all nodegroups will autoscale, and we are required to specify --min-nodes, or nodegroup creation will fail. If you don't specify --max-nodes when creating a nodegroup, it treats --node-count as the max and may scale down to the min.

I tried creating a cluster with no scaling (max nodes 1 and auto_scaling_enabled=false) and then overriding it at the nodegroup level with values that should enable scaling, but it didn't scale (apparently these values get overridden). I also tried not specifying the auto_scaling_enabled label at all and then specifying it for nodegroups, but those nodegroups did not scale. I learned that --node-count needs to be within the range of the min and max (if omitted, it is assumed to be 1).

Testing/monitoring autoscaling behavior

Deploy a bunch of pods that will need to get scheduled:

# Create a test deployment with enough replicas to exceed current capacity
kubectl create deployment scale-test --image=nginx --replicas=20 -- sleep infinity

# Give each pod a resource request large enough to force scheduling pressure
kubectl set resources deployment scale-test --requests=cpu=500m,memory=512Mi

Make sure some become pending (which should trigger a scale up):

kubectl get pods

Monitor the cluster autoscaler status to see if it is planning any scaling up or down:

kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
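
You can also watch nodes join or leave as the autoscaler acts:

kubectl get nodes --watch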

When finished, delete the test deployment:

kubectl delete deployment scale-test