Getting Started

How It Works

Understanding how dxflow Hub workflows are structured and deployed

Learn how dxflow Hub workflows are organized, deployed, and managed across multiple schedulers through the dxflow engine.

Workflow Structure

Workflows in the Hub are primarily provided as Docker Compose configurations, which dxflow can deploy on various container runtimes and job schedulers. Here's what a typical workflow looks like:

Basic Workflow Example

version: '3.8'  # optional; recent Compose releases ignore the version field

services:
  app:
    image: workflow-image:latest
    container_name: dxflow-app

    # Environment variables
    environment:
      - PARAM1=value1
      - PARAM2=value2

    # Data volumes
    volumes:
      - ./data:/data
      - ./results:/results

    # Port mappings
    ports:
      - "8080:8080"

    # Resource limits
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G

Key Components

Service Definition: Each workflow defines one or more services (containers) that work together.

Environment Configuration: Variables to customize behavior without changing the workflow file (see the interpolation example after this list).

Volume Mounts: Connect local directories to container paths for data input/output.

Port Mappings: Expose container ports to access web interfaces or APIs.

Resource Limits: Control CPU, memory, and GPU allocation for each service.
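
These settings can also be parameterized instead of hard-coded. Standard Docker Compose interpolates ${VAR} references from the shell environment or from a .env file placed next to the workflow file; whether interpolation applies depends on the runtime dxflow deploys to, and the variable name below is illustrative:

# .env file next to the workflow YAML (illustrative value)
THREADS=4

# Referenced inside the workflow definition:
services:
  app:
    image: workflow-image:latest
    environment:
      - THREADS=${THREADS:-2}  # falls back to 2 if THREADS is unset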

Deployment Process

Step 1: Browse the Hub

Navigate through workflow categories to find tools for your domain:

  • Genomics workflows for DNA/RNA analysis
  • Molecular simulations for MD/chemistry
  • Structural biology for cryo-EM and structure prediction
  • Data science environments for analysis and ML
  • Fluid flow simulations for CFD

Step 2: Select a Workflow

Each workflow page includes:

  • Overview: What the tool does and key features
  • Docker Compose Configuration: Complete YAML definition
  • Usage Instructions: Step-by-step deployment guide
  • Configuration Options: Available parameters and settings
  • System Requirements: Resource recommendations

Step 3: Deploy via dxflow

Option A: Web Interface

1. Open the dxflow web interface (http://localhost)
2. Navigate to "Apps & Pipelines"
3. Click "Hub" or "Templates"
4. Select workflow and click "Deploy"
5. Configure parameters
6. Launch workflow

Option B: Command Line

# Deploy workflow from Hub
dxflow compose create --identity my-workflow workflow.yml

# Start the workflow
dxflow compose start my-workflow

# Monitor progress
dxflow compose logs my-workflow

Step 4: Monitor and Manage

Once deployed, you can:

  • View logs in real-time
  • Check resource usage (CPU, memory, network)
  • Access web interfaces if the workflow provides them
  • Stop/restart workflows as needed
  • Scale services up or down

Data Management

Input Data

Upload your input data before running workflows:

# Create input directory
dxflow fs create /data/input

# Upload files
dxflow fs upload /local/data.csv /data/input/

Output Data

Download results after workflow completion:

# List output files
dxflow fs list /data/output

# Download results
dxflow fs download /data/output/ /local/results/

Data Persistence

Workflows use volume mounts to persist data:

  • Input data remains accessible across workflow restarts
  • Output data is preserved even after stopping workflows
  • Shared volumes allow multiple workflows to access the same data (see the sketch below)
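
For example, a named volume declared at the top level can be mounted by more than one service, so one container's output becomes another's input (an illustrative sketch, not a specific Hub workflow):

services:
  producer:
    image: workflow-a          # writes results into /data
    volumes:
      - shared-data:/data
  consumer:
    image: workflow-b          # reads the same files from /data
    volumes:
      - shared-data:/data

volumes:
  shared-data:                 # persists independently of the containers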

Workflow Lifecycle

States

  • Created: Workflow is defined but not running
  • Starting: Containers are being pulled and initialized
  • Running: Workflow is actively executing
  • Paused: Temporarily suspended (can be resumed)
  • Stopped: Cleanly stopped (can be restarted)
  • Failed: Encountered an error (check logs)

Operations

# Start workflow
dxflow compose start my-workflow

# Stop workflow
dxflow compose stop my-workflow

# Restart workflow
dxflow compose restart my-workflow

# Pause/unpause (keep state in memory)
dxflow compose pause my-workflow
dxflow compose unpause my-workflow

# Remove workflow completely
dxflow compose remove my-workflow

Common Patterns

Interactive Workflows

Some workflows provide web interfaces:

services:
  jupyter:
    image: jupyter/scipy-notebook
    ports:
      - "8888:8888"  # Access at http://localhost:8888

GPU Workflows

For ML/AI and scientific computing:

services:
  ml-training:
    image: tensorflow/tensorflow:latest-gpu
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
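
Once the service is up, a quick way to confirm the GPU was actually passed through is to query TensorFlow from a shell inside the container (how you attach to the container depends on your runtime; an empty list means no GPU is visible):

# Lists the GPUs TensorFlow can see
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"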

Multi-Service Workflows

Complex applications with multiple containers:

services:
  web:
    image: nginx
    ports:
      - "80:80"

  api:
    image: api-server
    depends_on:
      - database

  database:
    image: postgres
    environment:
      - POSTGRES_PASSWORD=changeme  # the official postgres image will not start without one
    volumes:
      - db-data:/var/lib/postgresql/data

# Named volumes must be declared at the top level
volumes:
  db-data:

Best Practices

Resource Management

Right-size your resources:

  • Start with recommended minimums
  • Monitor actual usage
  • Scale up if needed
  • Use resource limits to prevent overallocation (see the example below)
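
Using the deploy syntax from the basic example, a service can reserve a baseline while capping its ceiling; how strictly reservations are enforced depends on the runtime:

deploy:
  resources:
    limits:          # hard ceiling the service may not exceed
      cpus: '4'
      memory: 8G
    reservations:    # baseline the scheduler tries to guarantee
      cpus: '2'
      memory: 4G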

Data Organization

Keep data organized:

  • Separate input/output directories
  • Use meaningful names
  • Document data requirements
  • Back up important results

Workflow Monitoring

Stay informed:

  • Check logs regularly for errors
  • Monitor resource usage
  • Set up alerts for long-running jobs
  • Validate outputs

Cleanup

Maintain your environment:

  • Stop unused workflows (see the commands below)
  • Remove completed workflows
  • Clean up old data
  • Free up disk space regularly
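
The lifecycle commands from earlier cover the first two items directly:

# Stop an unused workflow, then remove it once results are downloaded
dxflow compose stop my-workflow
dxflow compose remove my-workflow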

Troubleshooting

Common Issues

Workflow won't start:

  • Check if ports are already in use (see the check below)
  • Verify sufficient resources available
  • Review workflow logs for errors
  • Ensure Docker is running
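
On Linux hosts, a quick way to check for a port conflict is standard iproute2 tooling (not part of dxflow):

# Show any process already listening on port 8080
ss -ltnp | grep ':8080'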

Out of memory:

  • Increase the memory limit in the workflow config (see the example below)
  • Reduce dataset size for testing
  • Check for memory leaks in application
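
Raising the limit is a one-line change in the workflow's deploy section, for example:

deploy:
  resources:
    limits:
      memory: 16G  # raised from the 8G used in the basic example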

No output files:

  • Verify volume mount paths (see the inspect command below)
  • Check file permissions
  • Review workflow logs
  • Ensure workflow completed successfully
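
If you have direct access to the Docker host, the standard Docker CLI can show what is actually mounted (the container name here is the one from the basic example):

# Print the mounts attached to a container
docker inspect -f '{{ json .Mounts }}' dxflow-app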

Getting Help

Workflow-specific: Check the workflow documentation page

General issues: See the dxflow documentation or report issues on GitHub


Ready to deploy your first workflow? Browse the Hub categories and get started!