FastQC is a quality control tool for high throughput sequence data. It provides a modular set of analyses to give you a quick impression of whether your data has any problems before you do any further analysis.
FastQC can be run in one of two modes: interactive or non-interactive. In non-interactive mode, the program will process all specified files and produce an HTML report for each one.
Key Features:
version: '3.8'
services:
  fastqc:
    image: biocontainers/fastqc:v0.11.9_cv8
    container_name: dxflow-fastqc
    # Working directory
    working_dir: /data
    # Mount volumes for input/output
    volumes:
      - ./input:/data/input:ro
      - ./output:/data/output
    # Command to run FastQC (run through sh -c so the *.fastq.gz glob is expanded)
    command: >
      sh -c "fastqc /data/input/*.fastq.gz
      --outdir /data/output
      --threads 4"
    # Resource limits
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
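The `--threads 4` and `memory: 4G` values above are related: the help text for the FastQC 0.11.x series states that each thread is allocated 250 MB of memory. The following is a minimal sketch for checking that a thread count fits within a container memory limit. The function names and the 512 MB headroom are illustrative assumptions, not part of FastQC or dxflow.

```shell
#!/bin/sh
# Estimate the FastQC heap in MB from the thread count
# (250 MB per thread, as stated in the FastQC 0.11.x help text).
fastqc_heap_mb() {
    threads=$1
    echo $(( threads * 250 ))
}

# Succeed if the estimated heap plus headroom fits within limit_mb.
# The default headroom is an assumed allowance for JVM and OS overhead,
# not a figure published by FastQC.
fits_memory_limit() {
    threads=$1
    limit_mb=$2
    headroom_mb=${3:-512}
    [ $(( $(fastqc_heap_mb "$threads") + headroom_mb )) -le "$limit_mb" ]
}

# Example: 4 threads against the 4G (4096 MB) limit from the compose file.
if fits_memory_limit 4 4096; then
    echo "4 threads fit within 4096 MB"
else
    echo "4 threads may exceed 4096 MB"
fi
```

If the check fails, either lower `--threads` or raise the `memory` limit.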
Upload your FASTQ files:
# Create input and output directories
mkdir -p input output
# Upload your sequencing files
dxflow fs upload /local/sample_R1.fastq.gz input/
dxflow fs upload /local/sample_R2.fastq.gz input/
# Deploy FastQC workflow
dxflow compose create --identity fastqc-analysis fastqc.yml
# Start analysis
dxflow compose start fastqc-analysis
# View logs
dxflow compose logs fastqc-analysis
# Check status
dxflow compose list
# Download HTML reports
dxflow fs download output/ /local/fastqc-results/
| Option | Description | Default |
|---|---|---|
| `--threads` | Number of files processed simultaneously (one thread each) | 1 |
| `--outdir` | Output directory (must already exist) | Same directory as the input file |
| `--extract` | Extract ZIP files | false |
| `--noextract` | Do not extract ZIP files | false |
| Option | Description |
|---|---|
| `--casava` | Files from Casava pipeline |
| `--nofilter` | With `--casava`, keep reads that Casava flagged as poor quality |
| `--format` | Force the input file format (fastq, bam, sam) instead of auto-detecting it |
| `--contaminants` | Custom contaminants file |
| `--adapters` | Custom adapters file |
| `--limits` | Custom limits file |
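
To show how several of the options above combine, here is a small wrapper sketch. It creates the output directory first, because FastQC does not create `--outdir` itself. The function name `run_fastqc` and the `FASTQC_BIN` variable are hypothetical helpers introduced only so the sketch can be dry-run; they are not part of FastQC or dxflow.

```shell
#!/bin/sh
# Run FastQC on the given files with --threads, --outdir and --noextract.
# Usage: run_fastqc OUTDIR THREADS FILE [FILE...]
run_fastqc() {
    outdir=$1
    threads=$2
    shift 2
    # FastQC expects the output directory to exist already.
    mkdir -p "$outdir"
    # FASTQC_BIN is a hypothetical override (e.g. FASTQC_BIN=echo for a dry run);
    # it defaults to the fastqc executable on PATH.
    "${FASTQC_BIN:-fastqc}" --threads "$threads" --outdir "$outdir" --noextract "$@"
}
```

For example, `run_fastqc /data/output 4 /data/input/sample_R1.fastq.gz /data/input/sample_R2.fastq.gz` would process both files at once, keeping the results zipped.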
FastQC generates the following output files:
- `*_fastqc.html` - HTML report with all graphs and tables
- `*_fastqc.zip` - ZIP archive containing detailed data files
- `summary.txt` - Summary of pass/warn/fail for all modules
- `fastqc_data.txt` - Raw data for all analyses

FastQC evaluates several quality metrics:
Complete analysis workflow:
# 1. Upload raw FASTQ files
dxflow fs upload raw_data/ input/
# 2. Deploy and run FastQC
dxflow compose create --identity qc fastqc.yml
dxflow compose start qc
# 3. Wait for completion (monitor logs)
dxflow compose logs -f qc
# 4. Download reports
dxflow fs download output/ results/
# 5. Review HTML reports in browser
# 6. Decide on trimming/filtering based on results
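Steps 5 and 6 above can be partly scripted by reading each report's `summary.txt`, which holds one tab-separated line per module: status, module name, input file. The following is a minimal sketch; the function names are illustrative, and it assumes the summary files are reachable on disk (for instance after running with `--extract`) or are piped in on standard input.

```shell
#!/bin/sh
# Print every module that did not PASS, as: input file, status, module name.
# Reads the given summary.txt files, or standard input if none are given.
report_flagged_modules() {
    awk -F '\t' 'NF >= 3 && $1 != "PASS" { printf "%s\t%s\t%s\n", $3, $1, $2 }' "$@"
}

# Count modules per status across the given summary.txt files.
count_by_status() {
    awk -F '\t' 'NF >= 3 { n[$1]++ }
        END { printf "PASS=%d WARN=%d FAIL=%d\n", n["PASS"], n["WARN"], n["FAIL"] }' "$@"
}

# Usage with extracted reports (the results/ path is an assumption):
#   report_flagged_modules results/*_fastqc/summary.txt
#   count_by_status results/*_fastqc/summary.txt
```

A non-empty result from `report_flagged_modules` is a prompt to open the matching HTML report, not an automatic verdict: some modules are expected to warn or fail for certain library types.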
Minimum:
Recommended:
Optimize Processing:
Batch Processing:
# Process all FASTQ files in parallel
# (GNU parallel must be available in the image; sh -c expands the glob)
command: >
  sh -c "parallel -j 4 fastqc {} --outdir /data/output ::: /data/input/*.fastq.gz"
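The `parallel` command above depends on GNU parallel being installed in the container, which is not guaranteed for every image. As a hedged alternative, the same one-process-per-file pattern can be sketched with `find` and `xargs -P`. The function name `run_per_fastq` is illustrative, and the `-print0`, `-0`, `-P`, and `-r` options assume GNU findutils.

```shell
#!/bin/sh
# Run a command once per *.fastq.gz file directly under input_dir,
# with at most `jobs` processes at a time. The file path is appended
# as the last argument of the command.
# Usage: run_per_fastq INPUT_DIR JOBS COMMAND [ARGS...]
run_per_fastq() {
    input_dir=$1
    jobs=$2
    shift 2
    find "$input_dir" -maxdepth 1 -name '*.fastq.gz' -print0 |
        xargs -0 -r -n 1 -P "$jobs" "$@"
}

# Intended use inside the container (one FastQC process per file):
#   run_per_fastq /data/input 4 fastqc --outdir /data/output
```

FastQC can also process several files concurrently by itself: `--threads` sets how many files are handled simultaneously, so a single `fastqc --threads 4` call over all inputs is often enough.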
Container fails to start:
Out of memory errors:
No output files:
After FastQC analysis:
If you use FastQC in your research, please cite:
Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data.
Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Need help? Check the troubleshooting section or report issues on GitHub.