Getting Started

How It Works

Understanding how dxflow Hub workflows are structured and deployed

Learn how dxflow Hub workflows are organized, deployed, and managed across multiple schedulers through the dxflow engine.

Workflow Structure

Workflows in the Hub are primarily provided as Docker Compose configurations, which dxflow can deploy on various container runtimes and job schedulers. Here's what a typical workflow looks like:

Basic Workflow Example

version: '3.8'  # optional; recent Compose releases ignore the version field

services:
  app:
    image: workflow-image:latest
    container_name: dxflow-app

    # Environment variables
    environment:
      - PARAM1=value1
      - PARAM2=value2

    # Data volumes
    volumes:
      - ./data:/data
      - ./results:/results

    # Port mappings
    ports:
      - "8080:8080"

    # Resource limits
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G

Key Components

Service Definition: Each workflow defines one or more services (containers) that work together.

Environment Configuration: Variables to customize behavior without changing the workflow file (see the interpolation example after this list).

Volume Mounts: Connect local directories to container paths for data input/output.

Port Mappings: Expose container ports to access web interfaces or APIs.

Resource Limits: Control CPU, memory, and GPU allocation for each service.
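
These settings can also be parameterized instead of hard-coded. Standard Docker Compose interpolates ${VAR} references from the shell environment or from a .env file placed next to the workflow file; whether interpolation applies depends on the runtime dxflow deploys to, and the variable name below is illustrative:

# .env file next to the workflow YAML (illustrative value)
THREADS=4

# Referenced inside the workflow definition:
services:
  app:
    image: workflow-image:latest
    environment:
      - THREADS=${THREADS:-2}  # falls back to 2 if THREADS is unset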

Deployment Process

Step 1: Browse the Hub

Navigate through workflow categories to find tools for your domain:

  • Genomics workflows for DNA/RNA analysis
  • Molecular simulations for MD/chemistry
  • Structural biology for cryo-EM and structure prediction
  • Data science environments for analysis and ML
  • Fluid flow simulations for CFD

Step 2: Select a Workflow

Each workflow page includes:

  • Overview: What the tool does and key features
  • Docker Compose Configuration: Complete YAML definition
  • Usage Instructions: Step-by-step deployment guide
  • Configuration Options: Available parameters and settings
  • System Requirements: Resource recommendations

Step 3: Deploy via dxflow

Option A: Web Interface

1. Open the dxflow web interface (http://localhost)
2. Navigate to "Apps & Pipelines"
3. Click "Hub" or "Templates"
4. Select workflow and click "Deploy"
5. Configure parameters
6. Launch workflow

Option B: Command Line

# Deploy workflow from Hub
dxflow compose create --identity my-workflow workflow.yml

# Start the workflow
dxflow compose start my-workflow

# Monitor progress
dxflow compose logs my-workflow

Step 4: Monitor and Manage

Once deployed, you can:

  • View logs in real-time
  • Check resource usage (CPU, memory, network)
  • Access web interfaces if the workflow provides them
  • Stop/restart workflows as needed
  • Scale services up or down

Data Management

Input Data

Upload your input data before running workflows:

# Create input directory
dxflow fs create /data/input

# Upload files
dxflow fs upload /local/data.csv /data/input/

Output Data

Download results after workflow completion:

# List output files
dxflow fs list /data/output

# Download results
dxflow fs download /data/output/ /local/results/

Data Persistence

Workflows use volume mounts to persist data:

  • Input data remains accessible across workflow restarts
  • Output data is preserved even after stopping workflows
  • Shared volumes allow multiple workflows to access the same data (see the sketch below)
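
For example, a named volume declared at the top level can be mounted by more than one service, so one container's output becomes another's input (an illustrative sketch, not a specific Hub workflow):

services:
  producer:
    image: workflow-a          # writes results into /data
    volumes:
      - shared-data:/data
  consumer:
    image: workflow-b          # reads the same files from /data
    volumes:
      - shared-data:/data

volumes:
  shared-data:                 # persists independently of the containers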

Workflow Lifecycle

States

  • Created: Workflow is defined but not running
  • Starting: Containers are being pulled and initialized
  • Running: Workflow is actively executing
  • Paused: Temporarily suspended (can be resumed)
  • Stopped: Cleanly stopped (can be restarted)
  • Failed: Encountered an error (check logs)

Operations

# Start workflow
dxflow compose start my-workflow

# Stop workflow
dxflow compose stop my-workflow

# Restart workflow
dxflow compose restart my-workflow

# Pause/unpause (keep state in memory)
dxflow compose pause my-workflow
dxflow compose unpause my-workflow

# Remove workflow completely
dxflow compose remove my-workflow

Common Patterns

Interactive Workflows

Some workflows provide web interfaces:

services:
  jupyter:
    image: jupyter/scipy-notebook
    ports:
      - "8888:8888"  # Access at http://localhost:8888

GPU Workflows

For ML/AI and scientific computing:

services:
  ml-training:
    image: tensorflow/tensorflow:latest-gpu
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
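
Once the service is up, a quick way to confirm the GPU was actually passed through is to query TensorFlow from a shell inside the container (how you attach to the container depends on your runtime; an empty list means no GPU is visible):

# Lists the GPUs TensorFlow can see
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"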

Multi-Service Workflows

Complex applications with multiple containers:

services:
  web:
    image: nginx
    ports:
      - "80:80"

  api:
    image: api-server
    depends_on:
      - database

  database:
    image: postgres
    environment:
      - POSTGRES_PASSWORD=changeme  # the official postgres image will not start without one
    volumes:
      - db-data:/var/lib/postgresql/data

# Named volumes must be declared at the top level
volumes:
  db-data:

Best Practices

Resource Management

Right-size your resources:

  • Start with recommended minimums
  • Monitor actual usage
  • Scale up if needed
  • Use resource limits to prevent overallocation (see the example below)
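
Using the deploy syntax from the basic example, a service can reserve a baseline while capping its ceiling; how strictly reservations are enforced depends on the runtime:

deploy:
  resources:
    limits:          # hard ceiling the service may not exceed
      cpus: '4'
      memory: 8G
    reservations:    # baseline the scheduler tries to guarantee
      cpus: '2'
      memory: 4G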

Data Organization

Keep data organized:

  • Separate input/output directories
  • Use meaningful names
  • Document data requirements
  • Back up important results

Workflow Monitoring

Stay informed:

  • Check logs regularly for errors
  • Monitor resource usage
  • Set up alerts for long-running jobs
  • Validate outputs

Cleanup

Maintain your environment:

  • Stop unused workflows (see the commands below)
  • Remove completed workflows
  • Clean up old data
  • Free up disk space regularly
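
The lifecycle commands from earlier cover the first two items directly:

# Stop an unused workflow, then remove it once results are downloaded
dxflow compose stop my-workflow
dxflow compose remove my-workflow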

Troubleshooting

Common Issues

Workflow won't start:

  • Check if ports are already in use (see the check below)
  • Verify sufficient resources available
  • Review workflow logs for errors
  • Ensure Docker is running
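
On Linux hosts, a quick way to check for a port conflict is standard iproute2 tooling (not part of dxflow):

# Show any process already listening on port 8080
ss -ltnp | grep ':8080'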

Out of memory:

  • Increase the memory limit in the workflow config (see the example below)
  • Reduce dataset size for testing
  • Check for memory leaks in application
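
Raising the limit is a one-line change in the workflow's deploy section, for example:

deploy:
  resources:
    limits:
      memory: 16G  # raised from the 8G used in the basic example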

No output files:

  • Verify volume mount paths (see the inspect command below)
  • Check file permissions
  • Review workflow logs
  • Ensure workflow completed successfully
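
If you have direct access to the Docker host, the standard Docker CLI can show what is actually mounted (the container name here is the one from the basic example):

# Print the mounts attached to a container
docker inspect -f '{{ json .Mounts }}' dxflow-app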

Getting Help

Workflow-specific: Check the workflow documentation page

General issues: See the dxflow documentation or report issues on GitHub


Ready to deploy your first workflow? Browse the Hub categories and get started!