Running the step-based photogrammetry workflow
Recommended Workflow
This is the recommended workflow for photogrammetry processing. It provides optimized resource allocation, cost savings, and better monitoring compared to the original monolithic workflow.
This guide describes how to run the OFO step-based photogrammetry workflow, which splits Metashape processing into 10 individual steps with optimized CPU/GPU node allocation. The workflow uses automate-metashape and performs post-processing steps.
Key Benefits
- GPU steps (match_photos, build_depth_maps, build_mesh) run on expensive GPU nodes only when needed
- CPU steps (align_cameras, build_point_cloud, build_dem_orthomosaic, etc.) run on cheaper CPU nodes
- Disabled steps are completely skipped (no pod creation, no resource allocation)
- Fine-grained monitoring - Track progress of each individual step in the Argo UI
- Flexible GPU usage - Configure whether GPU-capable steps use GPU or CPU nodes
- Cost optimization - Reduce GPU usage by 60-80% compared to the monolithic workflow
Prerequisites
Before running the workflow, ensure you have:
- Installed and set up the `openstack` and `kubectl` utilities
- Installed the Argo CLI
- Added the appropriate type and number of nodes to the cluster
- Set up your `kubectl` authentication env var (part of instructions for adding nodes). Quick reference:
source ~/venv/openstack/bin/activate
source ~/.ofocluster/app-cred-ofocluster-openrc.sh
export KUBECONFIG=~/.ofocluster/ofocluster.kubeconfig
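To confirm authentication is working before submitting anything, you can run a couple of quick checks (illustrative commands; they assume the cluster nodes are up and the `argo` namespace is in use, as elsewhere in this guide):

# Verify kubectl can reach the cluster and list its nodes
kubectl get nodes

# Verify the Argo CLI can talk to the Argo server in the argo namespace
argo list -n argo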
Workflow overview
The step-based workflow executes 10 separate Metashape processing steps as individual containerized tasks, followed by upload and post-processing. Each mission processes sequentially through these steps:
Metashape Processing Steps
- setup (CPU) - Initialize project, add photos, calibrate reflectance
- match_photos (GPU/CPU configurable) - Generate tie points for camera alignment
- align_cameras (CPU) - Align cameras, add GCPs, optimize, filter sparse points
- build_depth_maps (GPU) - Create depth maps for dense reconstruction
- build_point_cloud (CPU) - Generate dense point cloud from depth maps
- build_mesh (GPU/CPU configurable) - Build 3D mesh model
- build_dem_orthomosaic (CPU) - Create DEMs and orthomosaic products
- match_photos_secondary (GPU/CPU configurable, optional) - Match secondary photos if provided
- align_cameras_secondary (CPU, optional) - Align secondary cameras if provided
- finalize (CPU) - Cleanup, generate reports
Post-Processing Steps
- rclone-upload-task - Upload Metashape outputs to S3
- postprocessing-task - Generate CHMs, clip to boundaries, create COGs and thumbnails, upload to S3
Sequential Execution
Steps execute sequentially within each mission to prevent conflicts with shared Metashape project files. However, multiple missions process in parallel, each with its own step sequence.
Conditional Execution
Steps disabled in your config file are completely skipped - no container is created and no resources are allocated. This is more efficient than the original workflow where disabled operations still ran inside a single long-running container.
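For example, an operation is skipped when its section of the config sets `enabled: false` (a minimal sketch; the per-operation config structure is described under "Specify Metashape parameters" below):

build_mesh:
  enabled: false   # step is skipped entirely: no pod is created, no resources allocated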
Setup
Prepare inputs
Before running the workflow, you need to prepare three types of inputs on the cluster's shared storage:
- Drone imagery datasets (JPEG images)
- Metashape configuration files
- A config list file specifying which configs to process
All inputs must be placed in /ofo-share/argo-data/.
Add drone imagery datasets
To add new drone imagery datasets to be processed using Argo, transfer files from your local machine (or the cloud) to the /ofo-share volume. Put the drone imagery datasets to be processed in their own directory in /ofo-share/argo-data/argo-input/datasets (or another folder within argo-input).
One data transfer method is the scp command-line tool:
scp -r <local/directory/drone_image_dataset/> exouser@<vm.ip.address>:/ofo-share/argo-data/argo-input/datasets
Replace <vm.ip.address> with the IP address of a cluster node that has the share mounted.
Specify Metashape parameters
Config Structure Requirement
The step-based workflow requires an updated config structure with:
- Global settings under a `project:` section
- Each operation as a top-level config section with an `enabled` flag
- Separate `match_photos` and `align_cameras` sections (not a combined `alignPhotos`)
- Separate `build_dem` and `build_orthomosaic` sections
See the updated config example for the full structure.
Metashape processing parameters are specified in configuration YAML files which should be placed somewhere within /ofo-share/argo-data/argo-input.
Every project to be processed needs to have its own standalone configuration file.
Setting the `photo_path`: Within the `project:` section of the config YAML, you must specify `photo_path`, which is the location of the drone imagery dataset. When running via Argo workflows, this path refers to the location inside the Docker container. The `/ofo-share/argo-data` directory gets mounted at `/data` inside the container, so, for example, if your drone images are at `/ofo-share/argo-data/argo-input/datasets/dataset_1`, then the `photo_path` should be written as:
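project:
  photo_path: /data/argo-input/datasets/dataset_1  # container path corresponding to the example above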
Resource request configuration
All Argo workflow resource requests (GPU, CPU, memory) are configured in the top-level argo section of your automate-metashape config file. The defaults assume one or more JS2 m3.large CPU nodes and one or more mig1 (7-slice MIG g3.xl) GPU nodes (see cluster access and resizing).
Importantly, using well-selected resource requests may allow more than one workflow step to schedule simultaneously on the same compute node, without substantially extending the compute time of either, thus greatly increasing compute efficiency by requiring fewer compute nodes. The example config YAML includes suggested resource requests we have developed through extensive benchmarking.
GPU scheduling
Three steps support configurable GPU usage via argo.<step>.gpu_enabled parameters:
- `argo.match_photos.gpu_enabled` - If `true`, runs on a GPU node; if `false`, runs on a CPU node (default: `true`)
- `argo.build_mesh.gpu_enabled` - If `true`, runs on a GPU node; if `false`, runs on a CPU node (default: `true`)
- `argo.match_photos_secondary.gpu_enabled` - Inherits from `match_photos` unless explicitly set
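For example, to run the mesh-building step on a CPU node rather than a GPU node (a minimal sketch using only the `gpu_enabled` flag described above):

argo:
  build_mesh:
    gpu_enabled: false  # build_mesh will be scheduled on a CPU node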
The build_depth_maps step always runs on GPU nodes (gpu_enabled cannot be disabled) as it always benefits from GPU acceleration. However, you can configure the GPU resource type and count using gpu_resource and gpu_count.
GPU resource selection (MIG Support)
For GPU steps, you can specify which GPU resource to request using gpu_resource and gpu_count in the argo section. This allows using MIG (Multi-Instance GPU) partitions instead of full GPUs:
argo:
  match_photos:
    gpu_enabled: true
    gpu_resource: "nvidia.com/mig-1g.5gb"  # Use smallest MIG partition
    gpu_count: 2                           # Request 2 MIG slices for more parallelism
  build_depth_maps:
    gpu_resource: "nvidia.com/gpu"         # Explicitly request full GPU (this is the default)
    # gpu_count defaults to 1 if omitted
  build_mesh:
    gpu_enabled: true
    gpu_resource: "nvidia.com/mig-3g.20gb" # Larger MIG partition for mesh building
    gpu_count: 1
Available GPU resources:
| Resource | Description | Pods per GPU |
|---|---|---|
| `nvidia.com/gpu` | Full GPU (default if `gpu_resource` omitted) | 1 |
| `nvidia.com/mig-1g.5gb` | 1/7 compute, 5GB VRAM | 7 |
| `nvidia.com/mig-2g.10gb` | 2/7 compute, 10GB VRAM | 3 |
| `nvidia.com/mig-3g.20gb` | 3/7 compute, 20GB VRAM | 2 |
Use gpu_count to request multiple MIG slices (e.g., gpu_count: 2 with mig-1g.5gb to get 2/7 compute power).
When to use MIG
Use MIG partitions when your GPU steps have low utilization. This allows multiple workflow steps to share a single physical GPU, reducing costs. In extensive benchmarking, we have found that we get the greatest efficiency with mig-1g.5gb nodes, potentially providing more than one slice to GPU-intensive pods.
Nodegroup requirement
MIG resources are only available on MIG-enabled nodegroups. Create a MIG nodegroup with a name containing mig1-, mig2-, or mig3- (see MIG nodegroups).
CPU and memory configuration
You can configure CPU and memory requests for all workflow steps (both CPU and GPU steps) using cpu_request and memory_request parameters in the argo section:
argo:
  # Optional: Set global defaults that apply to all steps
  defaults:
    cpu_request: "10"       # Default CPU cores for all steps
    memory_request: "50Gi"  # Default memory for all steps
  # Override for specific steps
  match_photos:
    cpu_request: "8"        # Override default CPU request for this step
    memory_request: "32Gi"  # Override default memory request for this step
  build_depth_maps:
    cpu_request: "6"
    memory_request: "24Gi"
  align_cameras:
    cpu_request: "15"       # CPU-heavy step
    memory_request: "50Gi"
Default values (if not specified) are hard-coded into the workflow YAML under the CPU and GPU step templates.
Fallback order:
- Step-specific value (e.g., `argo.match_photos.cpu_request`)
- User default from `argo.defaults` (if specified)
- Hardcoded default (based on step type and GPU mode)
Using defaults as a template
You can leave step-level parameters blank/empty to use the defaults, which serves as a visual template:
argo:
  defaults:
    cpu_request: "8"
    memory_request: "40Gi"
  match_photos:
    cpu_request:      # Blank = uses defaults.cpu_request → 8
    memory_request:   # Blank = uses defaults.memory_request → 40Gi
  build_depth_maps:
    cpu_request: "12" # Override: uses 12 instead of defaults
    memory_request:   # Blank = uses defaults.memory_request → 40Gi
Secondary photo processing
The match_photos_secondary and align_cameras_secondary steps inherit resource configuration from their primary steps unless explicitly overridden:
argo:
  match_photos:
    gpu_resource: "nvidia.com/mig-2g.10gb"
    cpu_request: "6"
    memory_request: "24Gi"
  # match_photos_secondary automatically inherits the above settings
  # unless you override them:
  match_photos_secondary:
    gpu_resource: "nvidia.com/mig-1g.5gb"  # Override: use smaller GPU
    # cpu_request and memory_request still inherited from match_photos
This 4-level fallback applies: Secondary-specific → Primary step → User defaults → Hardcoded defaults
Parameters handled by Argo: The `project_path`, `output_path`, and `project_name` configuration parameters are handled automatically by the Argo workflow:

- `project_path` and `output_path` are determined via CLI arguments passed to the automate-metashape container, derived from the `TEMP_WORKING_DIR` Argo workflow parameter (passed by the user on the command line when invoking `argo submit`)
- `project_name` is extracted from `project.project_name` in the config file (or from the filename of the config file if missing in the config) and passed by Argo via CLI to each step to ensure consistent project names per mission

Any values specified for `project_path` and `output_path` in the config.yml will be overridden by the Argo CLI arguments.
Create a config list file
We use a text file, for example config_list.txt, to tell the Argo workflow which config files
should be processed in the current run. This text file should list the paths to each config.yml file
you want to process within the container (for example, use /data/XYZ to specify the path /ofo-share/argo-data/XYZ), one config file path per line.
For example:
/data/argo-input/configs/01_benchmarking-greasewood.yml
/data/argo-input/configs/02_benchmarking-greasewood.yml
/data/argo-input/configs/01_benchmarking-emerald-subset.yml
/data/argo-input/configs/02_benchmarking-emerald-subset.yml
This allows you to organize your config files in subdirectories or different locations. The project name will be automatically derived from the config filename (e.g., /data/argo-input/configs/project-name.yml becomes project project-name), unless it is explicitly set in the config file at project.project_name (which takes priority).
You can create your own config list file and name it whatever you want, placing it anywhere within /ofo-share/argo-data/. Then specify the path to it within the container (using /data/XYZ to refer to /ofo-share/argo-data/XYZ) using the CONFIG_LIST parameter when submitting the workflow.
Determine the maximum number of projects to process in parallel
When tasked with parallelizing across multiple multi-step DAGs, Argo prioritizes breadth first. So when it has a choice, it will start on a new DAG (metashape project) rather than starting the next step of an existing one. This is unfortunately not customizable, and it is undesirable because the workflow involves storing in-process files (including raw imagery, metashape project, outputs) locally during processing. Our shared storage does not have the space to store all files locally at the same time. In addition, we have a limited number of Metashape licenses. So we need to restrict the number of parallel DAGs (metashape projects) it will attempt to run.
The workflow controls this via the parallelism field in the main template (line 66 in
photogrammetry-workflow-stepbased.yaml). To change the max parallel projects, edit this value
directly in the workflow file before submitting. The default is set to 10.
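For context, the field being edited lives on the main (first) template in the workflow spec. An illustrative excerpt (the exact surrounding fields in photogrammetry-workflow-stepbased.yaml may differ):

spec:
  templates:
    - name: main          # main template: templates[0] in the spec
      parallelism: 10     # max missions (DAGs) processed concurrently; edit before submitting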
Why not a command-line parameter?
Argo Workflows doesn't support parameter substitution for integer fields like parallelism,
so this value must be hardcoded in the workflow file. This is a known issue with Argo; we
should watch for it to be resolved so we can expose this as a command-line parameter.
Adjusting parallelism on a running workflow
If you need to increase or decrease parallelism while a workflow is already running, you can patch the workflow in place. First, find your workflow name:
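# List workflows in the argo namespace to find the name of the running workflow
argo list -n argo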
Then patch the main template's parallelism (index 0):
kubectl patch workflow <workflow-name> -n argo --type='json' \
-p='[{"op": "replace", "path": "/spec/templates/0/parallelism", "value": 20}]'
The change takes effect immediately for any new pods that haven't started yet. Already-running pods are not affected.
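You can read the value back to confirm the patch was applied (a quick check using kubectl's JSONPath output):

kubectl get workflow <workflow-name> -n argo \
  -o jsonpath='{.spec.templates[0].parallelism}'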
Note
This only affects the running workflow instance. Future submissions will still use the value from the YAML file.
Submit the workflow
Once your cluster authentication is set up and your inputs are prepared, run:
argo submit -n argo photogrammetry-workflow-stepbased.yaml \
--name "my-run-$(date +%Y%m%d)" \
-p CONFIG_LIST=/data/argo-input/config-lists/config_list.txt \
-p TEMP_WORKING_DIR=/data/argo-output/temp-runs/gillan_june27 \
-p S3_PHOTOGRAMMETRY_DIR=gillan_june27 \
-p PHOTOGRAMMETRY_CONFIG_ID=01 \
-p S3_BUCKET_PHOTOGRAMMETRY_OUTPUTS=ofo-internal \
-p S3_POSTPROCESSED_DIR=jgillan_test \
-p S3_BUCKET_POSTPROCESSED_OUTPUTS=ofo-public \
-p BOUNDARY_DIRECTORY=jgillan_test \
-p POSTPROCESSING_IMAGE_TAG=latest \
-p UTILS_IMAGE_TAG=latest \
-p AUTOMATE_METASHAPE_IMAGE_TAG=latest
Workflow File
Note the different workflow file: photogrammetry-workflow-stepbased.yaml instead of photogrammetry-workflow.yaml
Database parameters (not currently functional):
-p DB_PASSWORD=<password> \
-p DB_HOST=<vm_ip_address> \
-p DB_NAME=<db_name> \
-p DB_USER=<user_name>
Workflow parameters
| Parameter | Description |
|---|---|
| `CONFIG_LIST` | Absolute path to a text file listing Metashape config file paths (each line should be an absolute path starting with `/data/`). Example: `/data/argo-input/config-lists/config_list.txt` |
| `TEMP_WORKING_DIR` | Absolute path for temporary workflow files (both photogrammetry and postprocessing). The workflow creates `photogrammetry/` and `postprocessing/` subdirectories automatically. All files are deleted after successful S3 upload. Example: `/data/argo-output/temp-runs/gillan_june27` |
| `S3_PHOTOGRAMMETRY_DIR` | S3 directory name for raw Metashape outputs. When `PHOTOGRAMMETRY_CONFIG_ID` is set, products upload to `{bucket}/{S3_PHOTOGRAMMETRY_DIR}/photogrammetry_{PHOTOGRAMMETRY_CONFIG_ID}/`. When not set, products go to `{bucket}/{S3_PHOTOGRAMMETRY_DIR}/`. Example: `gillan_june27` |
| `PHOTOGRAMMETRY_CONFIG_ID` | Two-digit configuration ID (e.g., `01`, `02`) used to organize outputs into `photogrammetry_NN` subdirectories in S3 for both raw and postprocessed products. If not specified or set to `NONE`, both raw and postprocessed products are stored without the `photogrammetry_NN` subfolder. |
| `S3_BUCKET_PHOTOGRAMMETRY_OUTPUTS` | S3 bucket where raw Metashape products (orthomosaics, point clouds, etc.) are uploaded (typically `ofo-internal`) |
| `S3_POSTPROCESSED_DIR` | S3 directory name for postprocessed outputs. When `PHOTOGRAMMETRY_CONFIG_ID` is set, products are organized as `{S3_POSTPROCESSED_DIR}/{mission_name}/photogrammetry_{PHOTOGRAMMETRY_CONFIG_ID}/`. When not set, products go to `{S3_POSTPROCESSED_DIR}/{mission_name}/`. Example: `jgillan_test` |
| `S3_BUCKET_POSTPROCESSED_OUTPUTS` | S3 bucket for final postprocessed outputs and where boundary files are stored (typically `ofo-public`) |
| `BOUNDARY_DIRECTORY` | Parent directory in S3 where mission boundary polygons reside (used to clip imagery). Example: `jgillan_test` |
| `POSTPROCESSING_IMAGE_TAG` | Docker image tag for the postprocessing container (default: `latest`). Use a specific branch name or tag to test development versions (e.g., `dy-manila`) |
| `UTILS_IMAGE_TAG` | Docker image tag for the argo-workflow-utils container (default: `latest`). Use a specific branch name or tag to test development versions (e.g., `dy-manila`) |
| `AUTOMATE_METASHAPE_IMAGE_TAG` | Docker image tag for the automate-metashape container (default: `latest`). Use a specific branch name or tag to test development versions |
| `DB_*` | Database parameters for logging Argo status (not currently functional; credentials in OFO credentials document) |
Secrets configuration:

- S3 credentials: S3 access credentials, provider type, and endpoint URL are configured via the `s3-credentials` Kubernetes secret
- Agisoft license: the Metashape floating license server address is configured via the `agisoft-license` Kubernetes secret
These secrets should have been created (within the argo namespace) during cluster creation.
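To confirm the secrets are present before submitting (an optional check; the secret names are those listed above):

kubectl get secret s3-credentials agisoft-license -n argo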
Monitor the workflow
Using the Argo UI
The Argo UI is great for troubleshooting and checking individual step progress. Access it at argo.focal-lab.org, using the credentials from Vaultwarden under the record "Argo UI token".
Navigating the Argo UI
The Workflows tab on the left side menu shows all running workflows. Click a workflow to see a detailed DAG (directed acyclic graph) showing:
- Preprocessing task: The `determine-projects` step that reads config files
- Per-mission columns: Each mission shows as a separate column with all its processing steps
- Individual step status: Each of the 10+ steps shown with color-coded status
Step status colors:
- Green (Succeeded): Step completed successfully
- Blue (Running): Step currently executing
- Gray (Skipped): Step was disabled in config or conditionally skipped
- Red (Failed): Step encountered an error
- Yellow (Pending): Step waiting for dependencies
Click on a specific step to see detailed information including:
- Which VM/node it's running on (CPU vs GPU node)
- Duration of the step
- Real-time logs
- Resource usage
- Input/output parameters
Viewing Step Logs
To view logs for a specific step:
- Click the workflow in Argo UI
- Click on the individual step node (e.g., `match-photos-gpu`, `build-depth-maps`)
- Click the "Logs" tab
- Logs will stream in real-time if the step is running
Multi-mission view
When processing multiple missions, the Argo UI shows all missions side-by-side. This makes it easy to:
- See which missions are at which step
- Identify if one mission is failing while others succeed
- Compare processing times across missions
- Monitor overall workflow progress
Understanding step names
Task names in the Argo UI follow the pattern process-projects-N.<step-name>:
- `process-projects-0.setup` - Setup step for first mission (index 0)
- `process-projects-0.match-photos-gpu` - Match photos on GPU for first mission
- `process-projects-1.build-depth-maps` - Build depth maps for second mission (index 1)
Finding Your Mission
To identify which mission corresponds to which index:
- Check the `determine-projects` step logs to see the order of missions in the JSON output
- Click on any task (e.g., `process-projects-0.setup`) and view the parameters to see the `project-name` value
- The project name appears in all file paths, logs, and processing outputs
GPU-capable steps show either a `-gpu` or `-cpu` suffix depending on config.
Using the CLI
View workflow status from the command line:
# Watch overall workflow progress
argo watch <workflow-name>
# List all workflows
argo list
# Get logs for preprocessing step
argo logs <workflow-name> -c determine-projects
# Get logs for a specific mission's step
# Format: process-projects-<N>.<step-name>
argo logs <workflow-name> -c process-projects-0.setup
argo logs <workflow-name> -c process-projects-0.match-photos-gpu
argo logs <workflow-name> -c process-projects-1.build-depth-maps
# Follow logs in real-time
argo logs <workflow-name> -c process-projects-0.setup -f
Workflow outputs
The final outputs will be written to S3:ofo-public in the following directory structure:
/S3:ofo-public/
└── <OUTPUT_DIRECTORY>/
    ├── dataset1/
    │   ├── images/
    │   ├── metadata-images/
    │   ├── metadata-mission/
    │   │   └── dataset1_mission-metadata.gpkg
    │   ├── photogrammetry_01/
    │   │   ├── full/
    │   │   │   ├── dataset1_cameras.xml
    │   │   │   ├── dataset1_chm-ptcloud.tif
    │   │   │   ├── dataset1_dsm-ptcloud.tif
    │   │   │   ├── dataset1_dtm-ptcloud.tif
    │   │   │   ├── dataset1_log.txt
    │   │   │   ├── dataset1_ortho-dtm-ptcloud.tif
    │   │   │   ├── dataset1_points-copc.laz
    │   │   │   └── dataset1_report.pdf
    │   │   └── thumbnails/
    │   │       ├── dataset1_chm-ptcloud.png
    │   │       ├── dataset1_dsm-ptcloud.png
    │   │       ├── dataset1_dtm-ptcloud.png
    │   │       └── dataset1-ortho-dtm-ptcloud.png
    │   └── photogrammetry_02/
    │       ├── full/
    │       │   ├── dataset1_cameras.xml
    │       │   ├── dataset1_chm-ptcloud.tif
    │       │   ├── dataset1_dsm-ptcloud.tif
    │       │   ├── dataset1_dtm-ptcloud.tif
    │       │   ├── dataset1_log.txt
    │       │   ├── dataset1_ortho-dtm-ptcloud.tif
    │       │   ├── dataset1_points-copc.laz
    │       │   └── dataset1_report.pdf
    │       └── thumbnails/
    │           ├── dataset1_chm-ptcloud.png
    │           ├── dataset1_dsm-ptcloud.png
    │           ├── dataset1_dtm-ptcloud.png
    │           └── dataset1-ortho-dtm-ptcloud.png
    └── dataset2/
This directory structure should already exist prior to running the Argo workflow.