Running the photogrammetry workflow (original monolithic version)

Step-Based Workflow Recommended

A new step-based workflow with optimized resource allocation is now available and recommended for all new processing.

Benefits of step-based workflow:

  • πŸ’° 60-80% reduction in GPU costs
  • πŸ“Š Individual step monitoring and debugging
  • πŸ”§ Configurable GPU vs CPU scheduling
  • ⚑ Disabled steps are completely skipped (no resource allocation)

This page documents the original monolithic workflow for reference only.

This guide describes how to run the original OFO photogrammetry workflow, which processes drone imagery using automate-metashape in a single monolithic container and performs post-processing steps.

Prerequisites

Before running the workflow, ensure you have:

  1. Installed and set up the openstack and kubectl utilities
  2. Installed the Argo CLI
  3. Added the appropriate type and number of nodes to the cluster (see cluster-access-and-resizing.md#cluster-resizing)
  4. Set up your kubectl authentication environment variables (part of the instructions for adding nodes). Quick reference:
source ~/venv/openstack/bin/activate
source ~/.ofocluster/app-cred-ofocluster-openrc.sh
export KUBECONFIG=~/.ofocluster/ofocluster.kubeconfig
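
Before submitting anything, you can confirm that authentication is working with a quick check (standard kubectl and Argo CLI commands; your output will differ):

kubectl get nodes        # should list the cluster nodes, including any you just added
argo list -n argo        # should return without an authentication error (the list may be empty)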

Workflow overview

The workflow performs the following steps:

  1. Pulls raw drone imagery from /ofo-share-2 onto the Kubernetes VM cluster
  2. Processes the imagery with Metashape
  3. Writes the imagery products to /ofo-share-2
  4. Uploads the imagery products to S3:ofo-internal and deletes them from /ofo-share-2
  5. Downloads the imagery products from S3 back to the cluster and performs postprocessing (CHMs, clipping, COGs, thumbnails)
  6. Uploads the final products to S3:ofo-public

Setup

1. Prepare inputs

Before running the workflow, you need to prepare three types of inputs on the cluster's shared storage:

  1. Drone imagery datasets (JPEG images)
  2. Metashape configuration files
  3. A config list file specifying which configs to process

All inputs must be placed in /ofo-share-2/argo-data/.

Directory structure

Here is a schematic of the /ofo-share-2/argo-data directory:

/ofo-share-2/argo-data/
└── argo-input/
    β”œβ”€β”€ datasets/
    β”‚   β”œβ”€β”€ dataset_1/
    β”‚   β”‚   β”œβ”€β”€ image_01.jpg
    β”‚   β”‚   └── image_02.jpg
    β”‚   └── dataset_2/
    β”‚       β”œβ”€β”€ image_01.jpg
    β”‚       └── image_02.jpg
    β”œβ”€β”€ configs/
    β”‚   β”œβ”€β”€ config_dataset_1.yml
    β”‚   └── config_dataset_2.yml
    └── config_list.txt
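
If the argo-input directories do not exist yet, they can be created from any machine with the share mounted. A minimal sketch (config_list.txt is created later, in the steps below):

mkdir -p /ofo-share-2/argo-data/argo-input/datasets
mkdir -p /ofo-share-2/argo-data/argo-input/configs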

Add drone imagery datasets

To add new drone imagery datasets to be processed with Argo, transfer the files from your local machine (or the cloud) to the /ofo-share-2 volume. Place each drone imagery project to be processed in its own directory under /ofo-share-2/argo-data/argo-input/datasets.

One data transfer method is the scp command-line tool:

scp -r <local/directory/drone_image_dataset/> exouser@<vm.ip.address>:/ofo-share-2/argo-data/argo-input/datasets

Replace <vm.ip.address> with the IP address of a cluster node that has the share mounted.
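
rsync is another option, and it can resume interrupted transfers. A sketch using the same placeholder paths as the scp example above:

rsync -av --progress <local/directory/drone_image_dataset/> exouser@<vm.ip.address>:/ofo-share-2/argo-data/argo-input/datasets/drone_image_dataset/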

Specify Metashape parameters

Metashape processing parameters are specified in YAML configuration files, which must be located in /ofo-share-2/argo-data/argo-input/configs/.

Every dataset to be processed needs to have its own standalone configuration file.

Naming convention: Config files should follow the pattern <config_id>_<datasetname>.yml. For example:

  • 01_benchmarking-greasewood.yml
  • 02_benchmarking-greasewood.yml

Setting the photo_path: Within each Metashape config .yml file, you must specify photo_path, the location of the drone imagery dataset to be processed. When running via Argo workflows, this path refers to the location of the images inside the Docker container.

For example, if your drone images were uploaded to /ofo-share-2/argo-data/argo-input/datasets/dataset_1, then the photo_path should be written as:

photo_path: /data/argo-input/datasets/dataset_1
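
As an optional sanity check before submitting, you can list the photo_path values across all configs; each one should point beneath /data/argo-input/ (standard grep, run from a machine with the share mounted):

grep -H '^photo_path:' /ofo-share-2/argo-data/argo-input/configs/*.yml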

Parameters handled by Argo: The output_path, project_path, and run_name configuration parameters are handled automatically by the Argo workflow:

  • output_path and project_path are determined via the arguments passed to the automate-metashape container, which in turn are derived from the RUN_FOLDER workflow parameter passed when invoking argo submit
  • run_name is pulled from the name of the config file (minus the extension) by the Argo workflow

Any values specified for these parameters in the config.yml will be ignored.

Create a config list file

We use a text file, for example config_list.txt, to tell the Argo workflow which config files to process in the current run. This file should list the path to each config .yml file you want to process (relative to /ofo-share-2/argo-data), one path per line.

For example:

argo-input/configs/01_benchmarking-greasewood.yml
argo-input/configs/02_benchmarking-greasewood.yml
argo-input/configs/01_benchmarking-emerald-subset.yml
argo-input/configs/02_benchmarking-emerald-subset.yml

This allows you to organize your config files in subdirectories or other locations. The dataset name is derived automatically from the config filename (e.g., argo-input/configs/dataset-name.yml yields the dataset name dataset-name).

You can create your own config list file and name it whatever you want, placing it anywhere within /ofo-share-2/argo-data/. Then specify the path to it (relative to /ofo-share-2/argo-data) using the CONFIG_LIST parameter when submitting the workflow.
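
One way to build the list is with a shell one-liner. This is just a sketch; the config-lists directory name matches the submission example below but is otherwise arbitrary:

cd /ofo-share-2/argo-data
mkdir -p argo-input/config-lists
ls argo-input/configs/*.yml > argo-input/config-lists/config_list.txt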

Submit the workflow

Once your cluster authentication is set up and your inputs are prepared, run:

argo submit -n argo photogrammetry-workflow.yaml \
-p CONFIG_LIST=argo-input/config-lists/config_list.txt \
-p RUN_FOLDER=gillan_june27 \
-p PHOTOGRAMMETRY_CONFIG_ID=01 \
-p S3_BUCKET_PHOTOGRAMMETRY_OUTPUTS=ofo-internal \
-p S3_BUCKET_POSTPROCESSED_OUTPUTS=ofo-public \
-p OUTPUT_DIRECTORY=jgillan_test \
-p BOUNDARY_DIRECTORY=jgillan_test \
-p WORKING_DIR=/argo-output/temp-working-dir \
-p POSTPROCESSING_IMAGE_TAG=latest

Database parameters (not currently functional) can be appended to the same command:

-p DB_PASSWORD=<password> \
-p DB_HOST=<vm_ip_address> \
-p DB_NAME=<db_name> \
-p DB_USER=<user_name>
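
You can also follow progress from the command line with the standard Argo CLI; <workflow-name> below is a placeholder for the name printed by argo submit:

argo list -n argo                    # all workflows and their current phase
argo get -n argo <workflow-name>     # per-step status for one workflow
argo logs -n argo <workflow-name>    # stream logs from the workflow's pods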

Workflow parameters

  • CONFIG_LIST: Path to the text file listing the paths to the Metashape config files (all paths relative to /ofo-share-2/argo-data)
  • RUN_FOLDER: Name for the parent directory of the Metashape outputs (locally under argo-data/argo-outputs and at the top level of the S3 bucket). Example: photogrammetry-outputs
  • PHOTOGRAMMETRY_CONFIG_ID: Two-digit configuration ID (e.g., 01, 02) used to organize outputs into photogrammetry_NN subdirectories in S3 for both raw and postprocessed products. If not specified, both raw and postprocessed products are stored directly in RUN_FOLDER (no photogrammetry_NN subfolder)
  • S3_BUCKET_PHOTOGRAMMETRY_OUTPUTS: S3 bucket where raw Metashape products (orthomosaics, point clouds, etc.) are uploaded (typically ofo-internal). When PHOTOGRAMMETRY_CONFIG_ID is set, products are uploaded to {bucket}/{RUN_FOLDER}/photogrammetry_{PHOTOGRAMMETRY_CONFIG_ID}/; when not set, products go to {bucket}/{RUN_FOLDER}/
  • S3_BUCKET_POSTPROCESSED_OUTPUTS: S3 bucket for final postprocessed outputs and where boundary files are stored (typically ofo-public)
  • OUTPUT_DIRECTORY: Name of the parent folder where postprocessed products are uploaded. When PHOTOGRAMMETRY_CONFIG_ID is set, products are organized as {OUTPUT_DIRECTORY}/{mission_name}/photogrammetry_{PHOTOGRAMMETRY_CONFIG_ID}/; when not set, products go to {OUTPUT_DIRECTORY}/{mission_name}/
  • BOUNDARY_DIRECTORY: Parent directory where the mission boundary polygons reside (used to clip imagery)
  • WORKING_DIR: Directory within the container for downloading and postprocessing (typically /tmp/processing, which downloads data to the processing computer; can be changed to a persistent volume)
  • POSTPROCESSING_IMAGE_TAG: Docker image tag for the postprocessing container (default: latest). Use a specific branch name or tag to test development versions (e.g., dy-manila)
  • DB_*: Database parameters for logging Argo status (not currently functional; credentials are in the OFO credentials document)

Secrets configuration:

  • S3 credentials: S3 access credentials, provider type, and endpoint URL are configured via the s3-credentials Kubernetes secret
  • Agisoft license: The Metashape floating license server address is configured via the agisoft-license Kubernetes secret

These secrets should have been created (within the argo namespace) during cluster creation.
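
To confirm the secrets exist in the argo namespace (this only checks for their presence, not that their contents are valid):

kubectl get secret s3-credentials agisoft-license -n argo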

Monitor the workflow

Using the Argo UI

The Argo UI is useful for troubleshooting and checking additional logs. Access it at argo.focal-lab.org using the credentials stored in Vaultwarden under the record "Argo UI token".

The Workflows tab in the left-hand menu shows all running workflows. Click a running workflow to see a schematic of its jobs spread across multiple instances:

[Screenshot: Argo workflow overview]

Click on a specific job to see detailed information including which VM it is running on, the duration of the process, and logs:

[Screenshot: Argo job details]

A successful Argo run looks like this:

[Screenshot: a successful Argo run]

Workflow outputs

The final outputs will be written to S3:ofo-public in the following directory structure:

/S3:ofo-public/
└── <OUTPUT_DIRECTORY>/
    β”œβ”€β”€ dataset1/
    β”‚   β”œβ”€β”€ images/
    β”‚   β”œβ”€β”€ metadata-images/
    β”‚   β”œβ”€β”€ metadata-mission/
    β”‚   β”‚   └── dataset1_mission-metadata.gpkg
    β”‚   β”œβ”€β”€ photogrammetry_01/
    β”‚   β”‚   β”œβ”€β”€ full/
    β”‚   β”‚   β”‚   β”œβ”€β”€ dataset1_cameras.xml
    β”‚   β”‚   β”‚   β”œβ”€β”€ dataset1_chm-ptcloud.tif
    β”‚   β”‚   β”‚   β”œβ”€β”€ dataset1_dsm-ptcloud.tif
    β”‚   β”‚   β”‚   β”œβ”€β”€ dataset1_dtm-ptcloud.tif
    β”‚   β”‚   β”‚   β”œβ”€β”€ dataset1_log.txt
    β”‚   β”‚   β”‚   β”œβ”€β”€ dataset1_ortho-dtm-ptcloud.tif
    β”‚   β”‚   β”‚   β”œβ”€β”€ dataset1_points-copc.laz
    β”‚   β”‚   β”‚   └── dataset1_report.pdf
    β”‚   β”‚   └── thumbnails/
    β”‚   β”‚       β”œβ”€β”€ dataset1_chm-ptcloud.png
    β”‚   β”‚       β”œβ”€β”€ dataset1_dsm-ptcloud.png
    β”‚   β”‚       β”œβ”€β”€ dataset1_dtm-ptcloud.png
    β”‚   β”‚       └── dataset1_ortho-dtm-ptcloud.png
    β”‚   └── photogrammetry_02/
    β”‚       β”œβ”€β”€ full/
    β”‚       β”‚   β”œβ”€β”€ dataset1_cameras.xml
    β”‚       β”‚   β”œβ”€β”€ dataset1_chm-ptcloud.tif
    β”‚       β”‚   β”œβ”€β”€ dataset1_dsm-ptcloud.tif
    β”‚       β”‚   β”œβ”€β”€ dataset1_dtm-ptcloud.tif
    β”‚       β”‚   β”œβ”€β”€ dataset1_log.txt
    β”‚       β”‚   β”œβ”€β”€ dataset1_ortho-dtm-ptcloud.tif
    β”‚       β”‚   β”œβ”€β”€ dataset1_points-copc.laz
    β”‚       β”‚   └── dataset1_report.pdf
    β”‚       └── thumbnails/
    β”‚           β”œβ”€β”€ dataset1_chm-ptcloud.png
    β”‚           β”œβ”€β”€ dataset1_dsm-ptcloud.png
    β”‚           β”œβ”€β”€ dataset1_dtm-ptcloud.png
    β”‚           └── dataset1_ortho-dtm-ptcloud.png
    β”œβ”€β”€ dataset2/

This directory structure should already exist prior to running the Argo workflow.
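
If you have an S3-compatible CLI configured for the bucket, you can spot-check the upload. A sketch using rclone, where the remote name ofo is hypothetical and should be replaced with whatever remote points at the object store:

# 'ofo' is a hypothetical rclone remote for the OFO object store
rclone ls ofo:ofo-public/<OUTPUT_DIRECTORY>/dataset1/photogrammetry_01/full/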