As a leading supplier of scientific workstations and servers, Exxact has benchmarked RELION Cryo-EM on the new NVIDIA GeForce RTX 3090 GPUs. The results are quite good, especially when the GPUs are paired with an AMD EPYC 7552 processor: we were able to run the Plasmodium ribosome benchmark in under an hour.

RELION GPU Support Summary

With advancements in automation, compute power, and visual technology, the scope and complexity of datasets used in cryo-EM have grown substantially. GPU support and acceleration are essential for flexible resource management, for avoiding memory limitations, and for handling the most computationally intensive cryo-EM steps, such as image classification and high-resolution refinement.

System Specs

Base System Configuration
Nodes: 1
Processor: AMD EPYC 7552
Processor Count: 2
Total Logical Cores: 48
Memory Type: DDR4
Memory Size: 512 GB
Storage: SSD
OS: CentOS 7
CUDA Version: 10.2
RELION Version: 3

RTX 3090 RELION Benchmarks: 2 GPU & 4 GPU Configurations

RTX 3090 Benchmark

Plasmodium ribosome data set: ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion_benchmark.tar.gz

Benchmark Parameters 4 GPU

mpirun -n 5 /usr/local/relion-3/bin/relion_refine_mpi --j 6 --gpu --pool 100 --dont_combine_weights_via_disc --keep_scratch --reuse_scratch --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d

Benchmark Parameters 2 GPU

mpirun -n 3 /usr/local/relion-3/bin/relion_refine_mpi --j 6 --gpu --pool 100 --dont_combine_weights_via_disc --keep_scratch --reuse_scratch --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d
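
The two commands above differ only in the rank count passed to mpirun -n: 3 for 2 GPUs and 5 for 4 GPUs, which is consistent with one coordinating rank plus one worker rank per GPU. The following is a minimal Python sketch of that pattern, not part of the benchmark itself. The names COMMON_ARGS and build_refine_command are hypothetical, and the "GPUs + 1" rule is an assumption inferred from the two commands.

```python
import shlex

# Flags shared by both benchmark runs, copied from the commands above.
COMMON_ARGS = (
    "--j 6 --gpu --pool 100 --dont_combine_weights_via_disc --keep_scratch "
    "--reuse_scratch --i Particles/shiny_2sets.star --ref emd_2660.map:mrc "
    "--firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 "
    "--tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent "
    "--zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 "
    "--offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d"
)


def build_refine_command(num_gpus, relion_bin="/usr/local/relion-3/bin/relion_refine_mpi"):
    """Return the mpirun argument list for a run on num_gpus GPUs.

    Assumption: one extra rank on top of one worker rank per GPU,
    matching -n 3 (2 GPUs) and -n 5 (4 GPUs) in the commands above.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    ranks = num_gpus + 1
    return ["mpirun", "-n", str(ranks), relion_bin] + shlex.split(COMMON_ARGS)


if __name__ == "__main__":
    for gpus in (2, 4):
        print(" ".join(build_refine_command(gpus)))
```

Returning a list rather than a single string lets the result be passed directly to subprocess.run without shell quoting concerns.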

Notes on System Memory

Although a minimum of 64 GB of RAM is recommended to run RELION with small image sizes (e.g. 200×200) on either the original or accelerated versions of RELION, 360×360 problems run best on systems with more than 128 GB of RAM. Systems with 256 GB or more of RAM are recommended for the CPU-accelerated kernels on larger image sizes. Insufficient memory causes individual MPI ranks to be killed, leading to zombie RELION jobs.
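
As a rough illustration, the memory guidance above can be written as a lookup on image size. This is a sketch only: the function names are hypothetical, and the cut-offs between the three tiers (at 200 and 360 pixels) are assumptions, since the text gives example sizes rather than exact boundaries.

```python
def recommended_ram_gb(box_size_px):
    """Return a rough system-RAM floor in GB for square images of the given side.

    Tiers follow the guidance above; the exact boundaries are assumed.
    """
    if box_size_px <= 0:
        raise ValueError("box_size_px must be positive")
    if box_size_px <= 200:
        return 64    # small images, e.g. 200x200
    if box_size_px <= 360:
        return 128   # 360x360 runs best with more than this
    return 256       # larger image sizes


def has_enough_ram(installed_gb, box_size_px):
    """Check installed system RAM against the rough floor for this image size."""
    return installed_gb >= recommended_ram_gb(box_size_px)


if __name__ == "__main__":
    # The benchmark system listed above has 512 GB of RAM.
    print(has_enough_ram(512, 360))
```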

MPI Settings

Some users may want to run more than one MPI rank per GPU, which requires sufficient GPU memory: each MPI slave that shares a GPU increases memory use. In this case, however, we recommend running a single MPI slave per GPU for good performance and stable execution.
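
To make the trade-off concrete, the sketch below estimates how the rank count and the GPU memory available to each rank change when GPUs are shared. It is a simplification under stated assumptions: the function names are hypothetical, GPU memory is assumed to split evenly between the ranks on a GPU, and one additional non-worker rank is assumed, as in the benchmark commands. The 24 GB figure is the memory size of the RTX 3090.

```python
RTX_3090_MEMORY_GB = 24  # memory size of a single RTX 3090


def mpi_rank_count(num_gpus, ranks_per_gpu=1):
    """Total ranks for mpirun -n: one extra rank plus the worker ranks (assumed)."""
    if num_gpus < 1 or ranks_per_gpu < 1:
        raise ValueError("num_gpus and ranks_per_gpu must be at least 1")
    return 1 + num_gpus * ranks_per_gpu


def gpu_memory_per_rank_gb(gpu_memory_gb, ranks_per_gpu=1):
    """Upper bound on GPU memory per rank, assuming an even split."""
    if ranks_per_gpu < 1:
        raise ValueError("ranks_per_gpu must be at least 1")
    return gpu_memory_gb / ranks_per_gpu


if __name__ == "__main__":
    for ranks_per_gpu in (1, 2):
        total = mpi_rank_count(4, ranks_per_gpu)
        share = gpu_memory_per_rank_gb(RTX_3090_MEMORY_GB, ranks_per_gpu)
        print(f"{ranks_per_gpu} rank(s) per GPU: -n {total}, up to {share:.0f} GB per rank")
```

With one rank per GPU, each rank has the whole card to itself, which is the configuration used in the benchmark commands.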

Notes on Scaling

The GPUs tested were Turing/Volta-based and performed similarly. As a result, it is more beneficial to scale out than to scale up. Note also that scaling shows diminishing returns beyond 4 GPUs.

