AlphaFold 2 and AlphaFold 3: Structure Prediction, Inputs, Outputs, and Deployment

In October 2024, the Nobel Prize in Chemistry honored the research behind AlphaFold 2, a deep learning system created by Google DeepMind that can accurately predict a protein’s three-dimensional structure using only its amino acid sequence. Calling this achievement groundbreaking hardly does it justice, considering how essential proteins are to biological systems and how closely a protein’s structure is tied to its function. In the past, scientists often dedicated an entire PhD or many years of professional research to determining the structure of just one protein through experimental techniques such as X-ray crystallography or cryo-electron microscopy. That slow process limited progress across many areas of biology, including the search for strong drug candidates for numerous diseases that are still difficult to treat.

Key Takeaways

AlphaFold 2 is not designed to replace experimental wet-lab approaches. Instead, it generates hypotheses that researchers can test while investigating protein structures. AlphaFold 2 was trained using information from the Protein Data Bank.

AlphaFold 3 extends the capabilities of AlphaFold 2 and can predict the structures of complexes containing nearly all molecular categories represented in the Protein Data Bank, with the exception of water molecules. AlphaFold 3 can work with:

  • Protein complexes that include DNA, RNA, small-molecule ligands, and ions
  • Protein structures that contain post-translational modifications, including glycosylation

Both AlphaFold 2 and AlphaFold 3 take a Multiple Sequence Alignment (MSA) as part of their input. AlphaFold 2 builds MSAs for protein chains only, whereas AlphaFold 3 also builds them for RNA chains.

AlphaFold 2 is available free of charge under the Apache 2.0 license, while AlphaFold 3 is restricted to non-commercial usage.

Prerequisites

Because this article focuses on molecular structure prediction, it is helpful to already understand, or to first become familiar with, biomolecules and related biological terminology such as proteins, RNA, ligands, and similar concepts. To use AlphaFold 3 for non-commercial purposes, you must also request access to the model parameters. Approval is often provided within two to three business days.

In addition, to follow the deployment steps, this guide assumes familiarity with technical topics such as cloud infrastructure with GPU-enabled virtual machines, command-line usage including SSH and git, and containerized workflows with Docker.

Understanding the Inputs

Multiple Sequence Alignment

AlphaFold 2 relies on Multiple Sequence Alignment (MSA) to capture evolutionary relationships among proteins. This approach is effective because amino acids that interact with one another often show correlated mutations when evolutionary changes occur.
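
The covariation signal can be illustrated on a toy alignment. The sketch below is not how AlphaFold processes an MSA (the model learns from it with neural attention). It only computes mutual information between alignment columns, a classic way to expose correlated mutations. The sequences in TOY_MSA and the function names are invented for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Invented toy alignment: columns 1 and 3 always change together
# (K pairs with D, R pairs with E), while column 0 varies independently.
TOY_MSA = [
    "AKLDE",
    "AKLDE",
    "ARLEE",
    "ARLEE",
    "GKLDE",
    "GRLEE",
]

coupled = mutual_information(TOY_MSA, 1, 3)
uncoupled = mutual_information(TOY_MSA, 0, 1)
print(f"columns 1 and 3: {coupled:.2f} bits, columns 0 and 1: {uncoupled:.2f} bits")
```

A high value for a pair of columns suggests the two positions mutate together, which is the kind of evidence that hints at residues being in contact.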

Model Architecture (AF2 vs. AF3)

| Feature | AlphaFold 2 | AlphaFold 3 |
|---|---|---|
| Main Processor | Evoformer: deeply combines MSA and pair features throughout the model. | Pairformer: reduces the complexity of MSA processing and centers on pair representations. |
| 3D Output Engine | Structure Module: applies physical and geometric biases such as frames and torsions. | Diffusion Module: uses generative denoising on raw atomic coordinates. |
| Symmetry Constraints | Strictly preserves rotation and translation invariance. | Removes many explicit geometric constraints to allow more flexibility. |
| Input Versatility | Mainly optimized for amino acid sequences, especially proteins. | Uses a unified token framework for proteins, nucleic acids, and small ligands. |

Understanding the Outputs

Confidence Metrics

AlphaFold reports several confidence metrics, including pLDDT, pTM and ipTM, and PAE.

pLDDT

pLDDT, the predicted local distance difference test, is AlphaFold's estimate of the lDDT (Local Distance Difference Test) score that a prediction would obtain when compared with an experimentally determined structure. It reflects confidence in the local structure around each residue. The score ranges from 0 to 100, with higher values representing stronger confidence and typically better local accuracy.
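
AlphaFold 2 stores the per-residue pLDDT in the B-factor column of the PDB files it writes, so the scores can be read back with a few lines of code. The following is a minimal sketch that parses the fixed-width ATOM records directly. The function name and the two-residue TOY_PDB string (coordinates and scores included) are invented for illustration.

```python
def ca_plddt_from_pdb(pdb_text):
    """Return {(chain, residue_number): pLDDT} read from the CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB records are fixed-width: the atom name sits in columns 13-16,
        # the chain in column 22, the residue number in columns 23-26,
        # and the B-factor (here: pLDDT) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            scores[(chain, residue_number)] = float(line[60:66])
    return scores


# Invented two-residue example in PDB column layout.
TOY_PDB = """\
ATOM      1  CA  MET A   1      27.340  24.430   2.614  1.00 92.15           C
ATOM      2  CA  LYS A   2      30.112  25.007   5.201  1.00 48.30           C
"""

for (chain, number), score in ca_plddt_from_pdb(TOY_PDB).items():
    print(f"chain {chain} residue {number}: pLDDT {score}")
```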

pTM and ipTM

The predicted template modeling score (pTM) and interface predicted template modeling score (ipTM) both derive from the template modeling score, which evaluates the accuracy of the overall structure. A pTM value above 0.5 suggests that the predicted fold of the complex may resemble the actual structure. ipTM measures the predicted relative arrangement of subunits: scores above 0.8 indicate high confidence, values below 0.6 suggest the prediction likely failed, and values between 0.6 and 0.8 remain uncertain. For small structures or short chains under 20 tokens, TM scoring can become overly strict and may produce pTM values below 0.05. In such cases, PAE or pLDDT usually provide a better indication of prediction quality.
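
The ipTM thresholds above translate into a small helper, and the strictness for short chains can be seen in the distance scale d0 of the standard TM-score definition, d0(L) = 1.24 (L - 15)^(1/3) - 1.8 ångströms. The sketch below is illustrative only: the function names are invented, and it does not claim to reproduce how AlphaFold itself handles very short chains.

```python
def interpret_iptm(iptm):
    """Map an ipTM value to the bands described in the text."""
    if iptm > 0.8:
        return "high confidence"
    if iptm < 0.6:
        return "likely failed"
    return "uncertain"


def tm_d0(length):
    """Distance scale d0 (in angstroms) from the standard TM-score formula.

    The formula is only meaningful for chains longer than 15 residues.
    """
    if length <= 15:
        raise ValueError("d0 is not defined by this formula for length <= 15")
    return 1.24 * (length - 15) ** (1 / 3) - 1.8


for length in (19, 50, 100, 300):
    print(f"L={length}: d0={tm_d0(length):.2f} A")
```

Because each residue contributes 1 / (1 + (d / d0)^2) to the TM-score, a d0 well below one ångström for chains of about 20 tokens means that even small deviations are penalized heavily, which is why pTM can collapse for very short chains.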

PAE

PAE, or predicted aligned error, estimates the positional error (reported in ångströms) expected between pairs of residues or tokens, and therefore indicates how confident AlphaFold is about their relative position and orientation within the predicted structure. Larger values indicate higher expected error and therefore lower confidence.
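
A common way to read a PAE matrix is to compare the average error within a domain with the average error between domains. The sketch below does this on an invented 4 x 4 matrix. The values, the domain split, and the function name are assumptions made for illustration, not AlphaFold output.

```python
def mean_pae(pae, rows, cols):
    """Average PAE over the block selected by the given row and column indices."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


# Invented PAE matrix (angstroms) for four residues forming two domains.
TOY_PAE = [
    [0.5, 1.0, 18.0, 20.0],
    [1.0, 0.5, 19.0, 21.0],
    [17.0, 19.0, 0.5, 1.5],
    [20.0, 22.0, 1.5, 0.5],
]
domain_a = [0, 1]
domain_b = [2, 3]

within_a = mean_pae(TOY_PAE, domain_a, domain_a)
between = mean_pae(TOY_PAE, domain_a, domain_b)
print(f"within domain A: {within_a} A, between A and B: {between} A")
```

Low error inside each block combined with high error between blocks suggests that each domain is predicted confidently on its own, while their relative placement is uncertain.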

Should You Use AlphaFold 2 or AlphaFold 3?

AlphaFold 3 improves on AlphaFold 2 in terms of accuracy and can model complexes containing multiple molecule types, but licensing limitations mean AlphaFold 2 still remains highly relevant for many users. AlphaFold 2 is openly available for both academic and commercial work under the Apache 2.0 license. AlphaFold 3, by contrast, is restricted to non-commercial use, which means it cannot be used for commercial research, for training competing machine learning systems, or for generating outputs intended for commercial purposes. In addition, AlphaFold 3 confidence scores for polymers can be strongly affected by surrounding non-polymer context such as ligands or ions. For polymer-focused studies like protein-protein interactions, contextual molecules may need to be added to obtain dependable scores. AlphaFold 2 avoids that extra complexity, although it may deliver slightly lower accuracy. For these reasons, Google DeepMind continues to maintain AlphaFold 2 as an important resource for research and development.

Running AlphaFold 2 in a Cloud GPU Environment

AlphaFold 2 requires roughly 2.5 TB of genetic databases, including UniRef90, MGnify, BFD, and others, so you will need to attach block storage to hold them. Provision a volume with some headroom above 2.5 TB for AlphaFold 2 (the AlphaFold README lists the fully unpacked databases at about 2.62 TB). AlphaFold 3 typically requires about 1 TB.
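
Before starting a multi-terabyte download, it can be worth checking that the attached volume really has the space. This is a minimal sketch using only the Python standard library. The function names and the mount point in the usage comment are hypothetical, and the 2.5 TB figure is simply the one quoted in this article.

```python
import shutil


def free_terabytes(path):
    """Free space on the filesystem that contains `path`, in decimal terabytes."""
    return shutil.disk_usage(path).free / 1e12


def has_room_for_databases(path, required_tb):
    """True if the volume holding `path` has at least `required_tb` TB free."""
    return free_terabytes(path) >= required_tb


# Usage sketch; replace "." with the (hypothetical) mount point of your volume,
# for example "/mnt/alphafold_data".
print(f"free: {free_terabytes('.'):.2f} TB")
print("enough for AlphaFold 2:", has_room_for_databases(".", 2.5))
```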

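Step 1: Environment Setup (AF2)

Select an Ubuntu image prepared for AI and machine learning workloads so that NVIDIA drivers and Docker are already installed.

Connect to your GPU-enabled virtual machine through SSH, replacing the placeholders with your own user name and server address:

ssh <user>@<server_ip>

Then refresh the local package index and upgrade the installed packages so the system is fully up to date:

sudo apt update && sudo apt upgrade -y

Step 2: Clone the Repository and Download the Data

The download script and the Docker files used in the following steps live in the AlphaFold 2 repository, so clone it first and change into it:

git clone https://github.com/google-deepmind/alphafold.git
cd alphafold

Next, download the genetic databases and the model parameters to the attached block storage. The download script relies on aria2c, so install it first if it is missing (sudo apt install aria2). The command below runs in the background, writing progress to download.log and errors to download_error.log. This step can take a while to complete.

scripts/download_all_data.sh /path/to/your/storage > download.log 2> download_error.log &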

Step 3: Build the Docker Image and Install Dependencies

docker build -f docker/Dockerfile -t alphafold .
pip3 install -r docker/requirements.txt

Step 4: Run the Model

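# DOWNLOAD_DIR must point to the directory passed to download_all_data.sh
export DOWNLOAD_DIR=/path/to/your/storage

python3 docker/run_docker.py \
  --fasta_paths=your_protein.fasta \
  --max_template_date=2022-01-01 \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir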
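
The file passed to --fasta_paths is an ordinary FASTA file: a header line starting with ">" followed by the amino acid sequence. The sketch below writes such a file and rejects characters outside the 20 standard amino acids. The function name, the record name, and the example sequence are invented for illustration.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def write_fasta(path, name, sequence, width=60):
    """Write a single-record FASTA file after basic validation."""
    sequence = "".join(sequence.split()).upper()
    unknown = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if not sequence or unknown:
        raise ValueError(f"empty sequence or unexpected characters: {unknown}")
    with open(path, "w") as handle:
        handle.write(f">{name}\n")
        # Wrap the sequence at a fixed width, as is conventional for FASTA.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Usage sketch with an invented sequence:
# write_fasta("your_protein.fasta", "example_protein", "MKTAYIAKQRQISFVKSHFSRQ")
```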

Running AlphaFold 3 in a Cloud GPU Environment

This section explains how to run AlphaFold 3 on a GPU-enabled cloud instance. Keep in mind that the model is restricted to non-commercial use: you must request access to the model parameters, and approval is usually granted within two to three business days.

Step 1: Environment Setup (AF3)

Select an Ubuntu image tailored for AI and machine learning use so that NVIDIA drivers and Docker are already available.

Connect to your GPU-enabled server over SSH, replacing the placeholders with your own user name and server address:

ssh <user>@<server_ip>

Step 2: Clone the Repository

Install git if needed and download the AlphaFold 3 repository:

git clone https://github.com/google-deepmind/alphafold3.git
cd alphafold3

Step 3: Run the AlphaFold 3 Model

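Build the Docker image from the repository root:

docker build -t alphafold3 -f docker/Dockerfile .

Then start a prediction. Place your input file fold_input.json in $HOME/af_input, and replace <MODEL_PARAMETERS_DIR> and <DB_DIR> with the directories that hold the model parameters and the genetic databases:
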
docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DB_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output
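
The fold_input.json file referenced by --json_path describes the molecules to fold. The sketch below builds a minimal input with one protein chain and an optional SMILES ligand. The field names follow the AlphaFold 3 input format as far as I know it and should be checked against the repository's input documentation. The function name, job name, and protein sequence are invented for illustration.

```python
import json


def build_fold_input(name, protein_sequence, ligand_smiles=None, seeds=(1,)):
    """Build a minimal AlphaFold 3 style input dictionary (assumed schema)."""
    sequences = [{"protein": {"id": "A", "sequence": protein_sequence}}]
    if ligand_smiles is not None:
        # Ligands may be given as SMILES strings.
        sequences.append({"ligand": {"id": "B", "smiles": ligand_smiles}})
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": sequences,
        "dialect": "alphafold3",
        "version": 1,
    }


fold_input = build_fold_input(
    "aspirin_demo",                       # hypothetical job name
    "MKTAYIAKQRQISFVKSHFSRQ",             # invented placeholder sequence
    ligand_smiles="CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
)
print(json.dumps(fold_input, indent=2))
```

Writing the printed JSON to $HOME/af_input/fold_input.json makes it visible inside the container at /root/af_input/fold_input.json.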

FAQ

What are the main differences between AlphaFold 2 and AlphaFold 3?

AlphaFold 2 transformed protein structure prediction, while AlphaFold 3 broadens the scope even further. The major differences include:

  • Molecular Scope: AlphaFold 2 concentrates primarily on proteins. AlphaFold 3 can predict complexes that involve DNA, RNA, ligands, and ions.
  • Architecture: AlphaFold 2 uses the Evoformer module, whereas AlphaFold 3 uses a simplified Pairformer together with a diffusion-based head.
  • Licensing: This is the most important distinction for many users. AlphaFold 2 is available under the Apache 2.0 license, which allows commercial use. AlphaFold 3 is currently limited to non-commercial use.

Why is 2.5 TB of storage required for AlphaFold 2?

The model itself does not consume most of that space. The real storage demand comes from the genetic databases. AlphaFold 2 depends on Multiple Sequence Alignment (MSA) to analyze how proteins evolved, and to do that it must search very large datasets such as UniRef90, MGnify, and the Big Fantastic Database (BFD).

Note: If storage is limited, AlphaFold 3 requires a smaller database footprint of roughly 1 TB compared with the full AlphaFold 2 setup.

How should the pLDDT confidence scores be interpreted?

The pLDDT (predicted Local Distance Difference Test) is a residue-level confidence score on a scale from 0 to 100:

  • > 90: High confidence; these regions are likely very accurate and are suitable for detailed structural analysis.
  • 70 – 90: Good confidence; the backbone is likely predicted correctly.
  • 50 – 70: Low confidence; interpret these regions carefully.
  • < 50: Very low confidence; these segments are often intrinsically disordered, meaning they may not adopt a fixed three-dimensional structure on their own.
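
The bands above can be applied programmatically, for example to flag low-confidence regions before further analysis. This is a minimal sketch with an invented function name. How scores of exactly 90, 70, or 50 are assigned is an assumption, since the list does not specify the boundaries.

```python
def plddt_band(score):
    """Map a pLDDT score (0 to 100) to the confidence bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must lie between 0 and 100")
    if score > 90:
        return "high"
    if score >= 70:  # assumption: boundary values fall into the higher band
        return "good"
    if score >= 50:
        return "low"
    return "very low"


# Invented per-residue scores for illustration.
for residue, score in enumerate([96.2, 84.0, 61.5, 33.8], start=1):
    print(f"residue {residue}: {score} -> {plddt_band(score)}")
```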

Can AlphaFold run on a standard virtual machine without a GPU?

In principle, inference could be performed on a CPU, but it would be impractically slow. Complex structures that a GPU can process in minutes might require days or even weeks on a CPU. In addition, the AlphaFold Docker images are optimized for NVIDIA CUDA. For practical research use, a GPU-enabled cloud instance is essentially necessary.
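
A quick way to confirm that an instance actually exposes an NVIDIA GPU is to query nvidia-smi, the utility that ships with the NVIDIA driver. The sketch below wraps that call and returns an empty list when the tool is missing or fails. The function name is invented for illustration.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one 'name, memory' string per visible NVIDIA GPU, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print(gpus if gpus else "No NVIDIA GPU detected")
```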

What is a SMILES string, and why does AlphaFold 3 require it?

SMILES stands for Simplified Molecular-Input Line-Entry System. It is a notation format that expresses chemical structures as text strings. Because AlphaFold 3 can model how proteins interact with small-molecule ligands, it needs a machine-readable description of each ligand, and a SMILES string is one way to supply it, such as CC(=O)OC1=CC=CC=C1C(=O)O for aspirin.

Is the predicted structure the final answer?

Not always. Although AlphaFold is remarkably accurate, it remains a predictive system rather than an experimental measurement. In molecular biology and drug discovery, AlphaFold is most useful for producing testable hypotheses that can direct experimentally validated wet-lab research.

Conclusion

You have reached the end of this guide and, ideally, deployed AlphaFold 2, AlphaFold 3, or both in a GPU-based cloud environment with attached block storage. AlphaFold lowers the barrier to entry in structural biology and allows researchers around the world to gain insights that previously could have required years of costly experimental investigation. At the same time, some scientists warn that relying on AlphaFold predictions alone, especially in drug discovery, may lead to incorrect mechanistic interpretations if those predictions are not validated experimentally. This reinforces the ongoing importance of combining computational predictions with laboratory confirmation. Beyond its immediate scientific value, AlphaFold also demonstrates how artificial intelligence can address highly complex scientific problems and offers a preview of how computational tools may continue to change scientific discovery.

Source: digitalocean.com
