Running the step-based photogrammetry workflow
Recommended Workflow
This is the recommended workflow for photogrammetry processing. It provides optimized resource allocation, cost savings, and better monitoring compared to the original monolithic workflow.
This guide describes how to run the OFO step-based photogrammetry workflow, which splits Metashape processing into 10 individual steps with optimized CPU/GPU node allocation. The workflow uses automate-metashape and performs post-processing steps.
Key Benefits
- GPU steps (match_photos, build_depth_maps, build_mesh) run on expensive GPU nodes only when needed
- CPU steps (align_cameras, build_point_cloud, build_dem_orthomosaic, etc.) run on cheaper CPU nodes
- Disabled steps are completely skipped (no pod creation, no resource allocation)
- Fine-grained monitoring - Track progress of each individual step in the Argo UI
- Flexible GPU usage - Configure whether GPU-capable steps use GPU or CPU nodes
- Cost optimization - Reduce GPU usage by 60-80% compared to the monolithic workflow
Prerequisites
Before running the workflow, ensure you have:
- Installed and set up the `openstack` and `kubectl` utilities
- Installed the Argo CLI
- Added the appropriate type and number of nodes to the cluster
- Set up your `kubectl` authentication env var (part of instructions for adding nodes). Quick reference:
source ~/venv/openstack/bin/activate
source ~/.ofocluster/app-cred-ofocluster-openrc.sh
export KUBECONFIG=~/.ofocluster/ofocluster.kubeconfig
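To confirm authentication is working before submitting anything, you can run a couple of quick checks (illustrative commands; they assume the cluster nodes are up and the `argo` namespace is in use, as elsewhere in this guide):

# Verify kubectl can reach the cluster and list its nodes
kubectl get nodes

# Verify the Argo CLI can talk to the Argo server in the argo namespace
argo list -n argo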
Workflow overview
The step-based workflow executes 10 separate Metashape processing steps as individual containerized tasks, followed by upload and post-processing. Each mission processes sequentially through these steps:
Metashape Processing Steps
- setup (CPU) - Initialize project, add photos, calibrate reflectance
- match_photos (GPU/CPU configurable) - Generate tie points for camera alignment
- align_cameras (CPU) - Align cameras, add GCPs, optimize, filter sparse points
- build_depth_maps (GPU) - Create depth maps for dense reconstruction
- build_point_cloud (CPU) - Generate dense point cloud from depth maps
- build_mesh (GPU/CPU configurable) - Build 3D mesh model
- build_dem_orthomosaic (CPU) - Create DEMs and orthomosaic products
- match_photos_secondary (GPU/CPU configurable, optional) - Match secondary photos if provided
- align_cameras_secondary (CPU, optional) - Align secondary cameras if provided
- finalize (CPU) - Cleanup, generate reports
Post-Processing Steps
- rclone-upload-task - Upload Metashape outputs to S3
- postprocessing-task - Generate CHMs, clip to boundaries, create COGs and thumbnails, upload to S3
Sequential Execution
Steps execute sequentially within each mission to prevent conflicts with shared Metashape project files. However, multiple missions process in parallel, each with its own step sequence.
Conditional Execution
Steps disabled in your config file are completely skipped - no container is created and no resources are allocated. This is more efficient than the original workflow where disabled operations still ran inside a single long-running container.
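For example, an operation is skipped when its section of the config sets `enabled: false` (a minimal sketch; the per-operation config structure is described under "Specify Metashape parameters" below):

build_mesh:
  enabled: false   # step is skipped entirely: no pod is created, no resources allocated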
Setup
Prepare inputs
Before running the workflow, you need to prepare three types of inputs on the cluster's shared storage:
- Drone imagery datasets (JPEG images)
- Metashape configuration files
- A config list file specifying which configs to process
All inputs must be placed in /ofo-share/argo-data/.
Add drone imagery datasets
To add new drone imagery datasets to be processed using Argo, transfer files from your local machine (or the cloud) to the /ofo-share volume. Put the drone imagery datasets to be processed in their own directory in /ofo-share/argo-data/argo-input/datasets (or another folder within argo-input).
One data transfer method is the scp command-line tool:
scp -r <local/directory/drone_image_dataset/> exouser@<vm.ip.address>:/ofo-share/argo-data/argo-input/datasets
Replace <vm.ip.address> with the IP address of a cluster node that has the share mounted.
Specify Metashape parameters
Config Structure Requirement
The step-based workflow requires an updated config structure with:
- Global settings under a `project:` section
- Each operation as a top-level config section with an `enabled` flag
- Separate `match_photos` and `align_cameras` sections (not a combined `alignPhotos`)
- Separate `build_dem` and `build_orthomosaic` sections
See the updated config example for the full structure.
Metashape processing parameters are specified in configuration YAML files which should be placed somewhere within /ofo-share/argo-data/argo-input.
Every project to be processed needs to have its own standalone configuration file.
Setting the `photo_path`: Within the `project:` section of the config YAML, you must specify `photo_path`, which is the location of the drone imagery dataset. When running via Argo workflows, this path refers to the location inside the Docker container. The `/ofo-share/argo-data` directory gets mounted at `/data` inside the container, so, for example, if your drone images are at `/ofo-share/argo-data/argo-input/datasets/dataset_1`, then the `photo_path` should be written as:
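project:
  photo_path: /data/argo-input/datasets/dataset_1  # container path corresponding to the example above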
Resource request configuration
All Argo workflow resource requests (GPU, CPU, memory) are configured in the top-level argo section of your automate-metashape config file. The defaults assume one or more JS2 m3.large CPU nodes and one or more mig1 (7-slice MIG g3.xl) GPU nodes (see cluster access and resizing).
Importantly, using well-selected resource requests may allow more than one workflow step to schedule simultaneously on the same compute node, without substantially extending the compute time of either, thus greatly increasing compute efficiency by requiring fewer compute nodes. The example config YAML includes suggested resource requests we have developed through extensive benchmarking.
GPU scheduling
Three steps support configurable GPU usage via argo.<step>.gpu_enabled parameters:
- `argo.match_photos.gpu_enabled` - If `true`, runs on a GPU node; if `false`, runs on a CPU node (default: `true`)
- `argo.build_mesh.gpu_enabled` - If `true`, runs on a GPU node; if `false`, runs on a CPU node (default: `true`)
- `argo.match_photos_secondary.gpu_enabled` - Inherits from `match_photos` unless explicitly set
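For example, to run the mesh-building step on a CPU node rather than a GPU node (a minimal sketch using only the `gpu_enabled` flag described above):

argo:
  build_mesh:
    gpu_enabled: false  # build_mesh will be scheduled on a CPU node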
The build_depth_maps step always runs on GPU nodes (gpu_enabled cannot be disabled) as it always benefits from GPU acceleration. However, you can configure the GPU resource type and count using gpu_resource and gpu_count.
GPU resource selection (MIG Support)
For GPU steps, you can specify which GPU resource to request using gpu_resource and gpu_count in the argo section. This allows using MIG (Multi-Instance GPU) partitions instead of full GPUs:
argo:
  match_photos:
    gpu_enabled: true
    gpu_resource: "nvidia.com/mig-1g.5gb"  # Use smallest MIG partition
    gpu_count: 2                           # Request 2 MIG slices for more parallelism
  build_depth_maps:
    gpu_resource: "nvidia.com/gpu"         # Explicitly request full GPU (this is the default)
    # gpu_count defaults to 1 if omitted
  build_mesh:
    gpu_enabled: true
    gpu_resource: "nvidia.com/mig-3g.20gb" # Larger MIG partition for mesh building
    gpu_count: 1
Available GPU resources:
| Resource | Description | Pods per GPU |
|---|---|---|
| `nvidia.com/gpu` | Full GPU (default if `gpu_resource` omitted) | 1 |
| `nvidia.com/mig-1g.5gb` | 1/7 compute, 5GB VRAM | 7 |
| `nvidia.com/mig-2g.10gb` | 2/7 compute, 10GB VRAM | 3 |
| `nvidia.com/mig-3g.20gb` | 3/7 compute, 20GB VRAM | 2 |
Use gpu_count to request multiple MIG slices (e.g., gpu_count: 2 with mig-1g.5gb to get 2/7 compute power).
When to use MIG
Use MIG partitions when your GPU steps have low utilization. This allows multiple workflow steps to share a single physical GPU, reducing costs. In extensive benchmarking, we have found that we get the greatest efficiency with mig-1g.5gb nodes, potentially providing more than one slice to GPU-intensive pods.
Nodegroup requirement
MIG resources are only available on MIG-enabled nodegroups. Create a MIG nodegroup with a name containing mig1-, mig2-, or mig3- (see MIG nodegroups).
CPU and memory configuration
You can configure CPU and memory requests for all workflow steps (both CPU and GPU steps) using cpu_request and memory_request parameters in the argo section:
argo:
  # Optional: Set global defaults that apply to all steps
  defaults:
    cpu_request: "10"       # Default CPU cores for all steps
    memory_request: "50Gi"  # Default memory for all steps
  # Override for specific steps
  match_photos:
    cpu_request: "8"        # Override default CPU request for this step
    memory_request: "32Gi"  # Override default memory request for this step
  build_depth_maps:
    cpu_request: "6"
    memory_request: "24Gi"
  align_cameras:
    cpu_request: "15"       # CPU-heavy step
    memory_request: "50Gi"
Default values (if not specified) are hard-coded into the workflow YAML under the CPU and GPU step templates.
Fallback order:
- Step-specific value (e.g., `argo.match_photos.cpu_request`)
- User default from `argo.defaults` (if specified)
- Hardcoded default (based on step type and GPU mode)
Using defaults as a template
You can leave step-level parameters blank/empty to use the defaults, which serves as a visual template:
argo:
  defaults:
    cpu_request: "8"
    memory_request: "40Gi"
  match_photos:
    cpu_request:      # Blank = uses defaults.cpu_request → 8
    memory_request:   # Blank = uses defaults.memory_request → 40Gi
  build_depth_maps:
    cpu_request: "12" # Override: uses 12 instead of defaults
    memory_request:   # Blank = uses defaults.memory_request → 40Gi
Secondary photo processing
The match_photos_secondary and align_cameras_secondary steps inherit resource configuration from their primary steps unless explicitly overridden:
argo:
  match_photos:
    gpu_resource: "nvidia.com/mig-2g.10gb"
    cpu_request: "6"
    memory_request: "24Gi"
  # match_photos_secondary automatically inherits the above settings
  # unless you override them:
  match_photos_secondary:
    gpu_resource: "nvidia.com/mig-1g.5gb"  # Override: use smaller GPU
    # cpu_request and memory_request still inherited from match_photos
This 4-level fallback applies: Secondary-specific → Primary step → User defaults → Hardcoded defaults
Parameters handled by Argo: The `project_path`, `output_path`, and `project_name` configuration parameters are handled automatically by the Argo workflow:

- `project_path` and `output_path` are determined via CLI arguments passed to the automate-metashape container, derived from the `TEMP_WORKING_DIR` Argo workflow parameter (passed by the user on the command line when invoking `argo submit`)
- `project_name` is extracted from `project.project_name` in the config file (or from the filename of the config file if missing in the config) and passed by Argo via CLI to each step to ensure consistent project names per mission

Any values specified for `project_path` and `output_path` in the config.yml will be overridden by the Argo CLI arguments.
Create a config list file
We use a text file, for example config_list.txt, to tell the Argo workflow which config files
should be processed in the current run. This text file should list the paths to each config.yml file
you want to process within the container (for example, use /data/XYZ to specify the path /ofo-share/argo-data/XYZ), one config file path per line.
For example:
/data/argo-input/configs/01_benchmarking-greasewood.yml
/data/argo-input/configs/02_benchmarking-greasewood.yml
/data/argo-input/configs/01_benchmarking-emerald-subset.yml
/data/argo-input/configs/02_benchmarking-emerald-subset.yml
This allows you to organize your config files in subdirectories or different locations. The project name will be automatically derived from the config filename (e.g., /data/argo-input/configs/project-name.yml becomes project project-name), unless it is explicitly set in the config file at project.project_name (which takes priority).
You can create your own config list file and name it whatever you want, placing it anywhere within /ofo-share/argo-data/. Then specify the path to it within the container (using /data/XYZ to refer to /ofo-share/argo-data/XYZ) using the CONFIG_LIST parameter when submitting the workflow.
Determine the maximum number of projects to process in parallel
When tasked with parallelizing across multiple multi-step DAGs, Argo prioritizes breadth first. So when it has a choice, it will start on a new DAG (metashape project) rather than starting the next step of an existing one. This is unfortunately not customizable, and it is undesirable because the workflow involves storing in-process files (including raw imagery, metashape project, outputs) locally during processing. Our shared storage does not have the space to store all files locally at the same time. In addition, we have a limited number of Metashape licenses. So we need to restrict the number of parallel DAGs (metashape projects) it will attempt to run.
The workflow controls this via the parallelism field in the main template (line 66 in
photogrammetry-workflow-stepbased.yaml). To change the max parallel projects, edit this value
directly in the workflow file before submitting. The default is set to 10.
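For context, the field being edited lives on the main (first) template in the workflow spec. An illustrative excerpt (the exact surrounding fields in photogrammetry-workflow-stepbased.yaml may differ):

spec:
  templates:
    - name: main          # main template: templates[0] in the spec
      parallelism: 10     # max missions (DAGs) processed concurrently; edit before submitting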
Why not a command-line parameter?
Argo Workflows doesn't support parameter substitution for integer fields like parallelism,
so this value must be hardcoded in the workflow file. This is a known issue with Argo; we
should watch for it to be resolved so we can expose this as a command-line parameter.
Adjusting parallelism on a running workflow
If you need to increase or decrease parallelism while a workflow is already running, you can patch the workflow in place. First, find your workflow name:
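# List workflows in the argo namespace to find the name of the running workflow
argo list -n argo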
Then patch the main template's parallelism (index 0):
kubectl patch workflow <workflow-name> -n argo --type='json' \
-p='[{"op": "replace", "path": "/spec/templates/0/parallelism", "value": 20}]'
The change takes effect immediately for any new pods that haven't started yet. Already-running pods are not affected.
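You can read the value back to confirm the patch was applied (a quick check using kubectl's JSONPath output):

kubectl get workflow <workflow-name> -n argo \
  -o jsonpath='{.spec.templates[0].parallelism}'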
Note
This only affects the running workflow instance. Future submissions will still use the value from the YAML file.
Submit the workflow
Once your cluster authentication is set up and your inputs are prepared, run:
argo submit -n argo photogrammetry-workflow-stepbased.yaml \
--name "my-run-$(date +%Y%m%d)" \
-p CONFIG_LIST=/data/argo-input/config-lists/config_list.txt \
-p TEMP_WORKING_DIR=/data/argo-output/temp-runs/gillan_june27 \
-p S3_PHOTOGRAMMETRY_DIR=gillan_june27 \
-p PHOTOGRAMMETRY_CONFIG_ID=01 \
-p S3_BUCKET_PHOTOGRAMMETRY_OUTPUTS=ofo-internal \
-p S3_POSTPROCESSED_DIR=jgillan_test \
-p S3_BUCKET_POSTPROCESSED_OUTPUTS=ofo-public \
-p BOUNDARY_DIRECTORY=jgillan_test \
-p POSTPROCESSING_IMAGE_TAG=latest \
-p UTILS_IMAGE_TAG=latest \
-p AUTOMATE_METASHAPE_IMAGE_TAG=latest
Workflow File
Note the different workflow file: photogrammetry-workflow-stepbased.yaml instead of photogrammetry-workflow.yaml
Database parameters (not currently functional):
-p DB_PASSWORD=<password> \
-p DB_HOST=<vm_ip_address> \
-p DB_NAME=<db_name> \
-p DB_USER=<user_name>
Workflow parameters
| Parameter | Description |
|---|---|
| `CONFIG_LIST` | Absolute path to a text file listing Metashape config file paths (each line should be an absolute path starting with `/data/`). Example: `/data/argo-input/config-lists/config_list.txt` |
| `TEMP_WORKING_DIR` | Absolute path for temporary workflow files (both photogrammetry and postprocessing). The workflow creates `photogrammetry/` and `postprocessing/` subdirectories automatically. All files are deleted after successful S3 upload. Example: `/data/argo-output/temp-runs/gillan_june27` |
| `S3_PHOTOGRAMMETRY_DIR` | S3 directory name for raw Metashape outputs. When `PHOTOGRAMMETRY_CONFIG_ID` is set, products upload to `{bucket}/{S3_PHOTOGRAMMETRY_DIR}/photogrammetry_{PHOTOGRAMMETRY_CONFIG_ID}/`. When not set, products go to `{bucket}/{S3_PHOTOGRAMMETRY_DIR}/`. Example: `gillan_june27` |
| `PHOTOGRAMMETRY_CONFIG_ID` | Two-digit configuration ID (e.g., `01`, `02`) used to organize outputs into `photogrammetry_NN` subdirectories in S3 for both raw and postprocessed products. If not specified or set to `NONE`, both raw and postprocessed products are stored without the `photogrammetry_NN` subfolder. |
| `S3_BUCKET_PHOTOGRAMMETRY_OUTPUTS` | S3 bucket where raw Metashape products (orthomosaics, point clouds, etc.) are uploaded (typically `ofo-internal`) |
| `S3_POSTPROCESSED_DIR` | S3 directory name for postprocessed outputs. When `PHOTOGRAMMETRY_CONFIG_ID` is set, products are organized as `{S3_POSTPROCESSED_DIR}/{mission_name}/photogrammetry_{PHOTOGRAMMETRY_CONFIG_ID}/`. When not set, products go to `{S3_POSTPROCESSED_DIR}/{mission_name}/`. Example: `jgillan_test` |
| `S3_BUCKET_POSTPROCESSED_OUTPUTS` | S3 bucket for final postprocessed outputs and where boundary files are stored (typically `ofo-public`) |
| `BOUNDARY_DIRECTORY` | Parent directory in S3 where mission boundary polygons reside (used to clip imagery). Example: `jgillan_test` |
| `POSTPROCESSING_IMAGE_TAG` | Docker image tag for the postprocessing container (default: `latest`). Use a specific branch name or tag to test development versions (e.g., `dy-manila`) |
| `UTILS_IMAGE_TAG` | Docker image tag for the argo-workflow-utils container (default: `latest`). Use a specific branch name or tag to test development versions (e.g., `dy-manila`) |
| `AUTOMATE_METASHAPE_IMAGE_TAG` | Docker image tag for the automate-metashape container (default: `latest`). Use a specific branch name or tag to test development versions |
| `DB_*` | Database parameters for logging Argo status (not currently functional; credentials in OFO credentials document) |
Secrets configuration:

- S3 credentials: S3 access credentials, provider type, and endpoint URL are configured via the `s3-credentials` Kubernetes secret
- Agisoft license: the Metashape floating license server address is configured via the `agisoft-license` Kubernetes secret
These secrets should have been created (within the argo namespace) during cluster creation.
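To confirm the secrets are present before submitting (an optional check; the secret names are those listed above):

kubectl get secret s3-credentials agisoft-license -n argo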
Monitor the workflow
Using the Argo UI
The Argo UI is great for troubleshooting and checking individual step progress. Access it at argo.focal-lab.org, using the credentials from Vaultwarden under the record "Argo UI token".
Navigating the Argo UI
The Workflows tab on the left side menu shows all running workflows. Click a workflow to see a detailed DAG (directed acyclic graph) showing:
- Preprocessing task: The `determine-projects` step that reads config files
- Per-mission columns: Each mission shows as a separate column with all its processing steps
- Individual step status: Each of the 10+ steps shown with color-coded status
Step status colors:
- Green (Succeeded): Step completed successfully
- Blue (Running): Step currently executing
- Gray (Skipped): Step was disabled in config or conditionally skipped
- Red (Failed): Step encountered an error
- Yellow (Pending): Step waiting for dependencies
Click on a specific step to see detailed information including:
- Which VM/node it's running on (CPU vs GPU node)
- Duration of the step
- Real-time logs
- Resource usage
- Input/output parameters
Viewing Step Logs
To view logs for a specific step:
- Click the workflow in Argo UI
- Click on the individual step node (e.g., `match-photos-gpu`, `build-depth-maps`)
- Click the "Logs" tab
- Logs will stream in real-time if the step is running
Multi-mission view
When processing multiple missions, the Argo UI shows all missions side-by-side. This makes it easy to:
- See which missions are at which step
- Identify if one mission is failing while others succeed
- Compare processing times across missions
- Monitor overall workflow progress
Understanding step names
Task names in the Argo UI follow the pattern process-projects-N.<step-name>:
- `process-projects-0.setup` - Setup step for first mission (index 0)
- `process-projects-0.match-photos-gpu` - Match photos on GPU for first mission
- `process-projects-1.build-depth-maps` - Build depth maps for second mission (index 1)
Finding Your Mission
To identify which mission corresponds to which index:
- Check the `determine-projects` step logs to see the order of missions in the JSON output
- Click on any task (e.g., `process-projects-0.setup`) and view the parameters to see the `project-name` value
- The project name appears in all file paths, logs, and processing outputs
GPU-capable steps show either a `-gpu` or `-cpu` suffix depending on config.
Using the CLI
View workflow status from the command line:
# Watch overall workflow progress
argo watch <workflow-name>
# List all workflows
argo list
# Get logs for preprocessing step
argo logs <workflow-name> -c determine-projects
# Get logs for a specific mission's step
# Format: process-projects-<N>.<step-name>
argo logs <workflow-name> -c process-projects-0.setup
argo logs <workflow-name> -c process-projects-0.match-photos-gpu
argo logs <workflow-name> -c process-projects-1.build-depth-maps
# Follow logs in real-time
argo logs <workflow-name> -c process-projects-0.setup -f
Workflow outputs
The final outputs will be written to S3:ofo-public in the following directory structure:
/S3:ofo-public/
└── <OUTPUT_DIRECTORY>/
    ├── dataset1/
    │   ├── images/
    │   ├── metadata-images/
    │   ├── metadata-mission/
    │   │   └── dataset1_mission-metadata.gpkg
    │   ├── photogrammetry_01/
    │   │   ├── full/
    │   │   │   ├── dataset1_cameras.xml
    │   │   │   ├── dataset1_chm-ptcloud.tif
    │   │   │   ├── dataset1_dsm-ptcloud.tif
    │   │   │   ├── dataset1_dtm-ptcloud.tif
    │   │   │   ├── dataset1_log.txt
    │   │   │   ├── dataset1_ortho-dtm-ptcloud.tif
    │   │   │   ├── dataset1_points-copc.laz
    │   │   │   └── dataset1_report.pdf
    │   │   └── thumbnails/
    │   │       ├── dataset1_chm-ptcloud.png
    │   │       ├── dataset1_dsm-ptcloud.png
    │   │       ├── dataset1_dtm-ptcloud.png
    │   │       └── dataset1-ortho-dtm-ptcloud.png
    │   └── photogrammetry_02/
    │       ├── full/
    │       │   ├── dataset1_cameras.xml
    │       │   ├── dataset1_chm-ptcloud.tif
    │       │   ├── dataset1_dsm-ptcloud.tif
    │       │   ├── dataset1_dtm-ptcloud.tif
    │       │   ├── dataset1_log.txt
    │       │   ├── dataset1_ortho-dtm-ptcloud.tif
    │       │   ├── dataset1_points-copc.laz
    │       │   └── dataset1_report.pdf
    │       └── thumbnails/
    │           ├── dataset1_chm-ptcloud.png
    │           ├── dataset1_dsm-ptcloud.png
    │           ├── dataset1_dtm-ptcloud.png
    │           └── dataset1-ortho-dtm-ptcloud.png
    └── dataset2/
This directory structure should already exist prior to running the Argo workflow.