Key Concepts - Amazon Bio Discovery

Projects

Projects are the top-level organizational unit in Amazon Bio Discovery. They provide a centralized workspace for related antibody design activities.

What Projects Include

Experiments: All computational runs and their results
Files: Input data, structures, and generated outputs
Collaborators: Team members with defined access levels
Resources: Shared recipes, modules, and configurations

Project Benefits

Organize related work in one place
Control access and permissions
Track resource usage and costs
Maintain experiment history and lineage

Best Practices

Use descriptive names that reflect the research objective
Include detailed descriptions and objectives
Set up appropriate collaborator permissions early
Regularly review and organize project contents

Recipes

Recipes define computational workflows for antibody design. They specify which modules to use, how they connect, and what parameters to apply.

Recipe Types

Hosted Recipes: Pre-built, validated workflows maintained by AWS
Custom Recipes: User-created workflows using the recipe builder
Shared Recipes: Custom recipes shared within organizations

Recipe Components

Input Requirements: File types and formats needed
Module Chain: Sequence of computational steps
Parameters: Configurable settings for each module
Output Specifications: Types of results generated
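
As a rough illustration of how these four components fit together, a recipe can be modeled as a small data structure. The sketch below is not the Amazon Bio Discovery recipe format: the class names, field names, file-type labels, and the `num_candidates` parameter are hypothetical, and only the two module names are taken from the module list later in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleStep:
    """One step in a recipe's module chain (hypothetical structure)."""
    module: str                                      # module name
    parameters: dict = field(default_factory=dict)   # per-module settings


@dataclass
class Recipe:
    """Hypothetical in-memory model of the four recipe components."""
    name: str
    input_requirements: list[str]      # file types the recipe expects
    module_chain: list[ModuleStep]     # steps, in execution order
    output_specifications: list[str]   # kinds of results produced

    def module_names(self) -> list[str]:
        """Return the module names in the order they would run."""
        return [step.module for step in self.module_chain]


recipe = Recipe(
    name="example-optimization",
    input_requirements=["pdb", "fasta"],
    module_chain=[
        ModuleStep("EvoProtGrad", {"num_candidates": 100}),  # assumed parameter
        ModuleStep("Boltz1"),
    ],
    output_specifications=["candidate_sequences", "property_scores"],
)
print(recipe.module_names())
```

Keeping parameters attached to each step, rather than to the recipe as a whole, mirrors the description above of parameters as settings for each module.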

Common Recipe Patterns

De Novo Design: Generate new antibodies from scratch
Optimization: Improve existing antibody properties
Analysis Only: Evaluate and score existing sequences
Hybrid Workflows: Combine multiple design strategies

Experiments

Experiments are individual executions of recipes with specific input data and parameter configurations.

Experiment Lifecycle

Configuration: Select recipe, upload data, set parameters
Validation: Check inputs and estimate costs
Execution: Run computational workflow
Analysis: Review results and select candidates
Action: Download data or send to wet lab
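
The lifecycle above reads as a linear sequence of stages. A minimal sketch of that ordering follows; the `Stage` enum and `next_stage` helper are hypothetical names chosen for illustration and do not correspond to any documented Amazon Bio Discovery API.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, listed in the order given in this guide."""
    CONFIGURATION = 1
    VALIDATION = 2
    EXECUTION = 3
    ANALYSIS = 4
    ACTION = 5


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    members = list(Stage)              # Enum iteration preserves definition order
    idx = members.index(stage)
    return members[idx + 1] if idx + 1 < len(members) else None


stage = Stage.CONFIGURATION
while stage is not None:
    print(stage.name)
    stage = next_stage(stage)
```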

Experiment Parameters

Input Files: Target structures, seed sequences
Design Goals: Number of candidates, optimization targets
Constraints: Sequence regions to preserve or modify
Scoring Criteria: Properties to evaluate and rank

Cost Considerations

Experiments consume Experiment Units (EU) based on computational complexity:

Simple (0.5 EU): Basic analysis and small-scale design
Standard (1.0 EU): Moderate complexity workflows
Complex (1.5 EU): Large-scale or computationally intensive tasks
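
To make the arithmetic concrete, the sketch below totals the EU for a batch of experiments. It assumes that EU consumption simply adds up across experiments, which this guide does not state explicitly; the tier keys and the `total_eu` function are hypothetical names.

```python
# EU per complexity tier, as listed above.
EU_BY_TIER = {"simple": 0.5, "standard": 1.0, "complex": 1.5}


def total_eu(tiers):
    """Sum the EU for a list of tier names (assumes additive consumption)."""
    return sum(EU_BY_TIER[tier] for tier in tiers)


# One simple, two standard, and one complex experiment:
# 0.5 + 1.0 + 1.0 + 1.5 = 4.0 EU under the additive assumption.
planned = ["simple", "standard", "standard", "complex"]
print(total_eu(planned))
```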

Modules

Modules are the fundamental computational building blocks that perform specific tasks in antibody design workflows.

Module Categories

Design Modules: Generate new antibody sequences
- De novo design algorithms
- Directed evolution methods
- Structure-based design tools
Score Modules: Evaluate antibody properties
- Binding affinity prediction
- Developability assessment
- Immunogenicity risk scoring
Design and Score Modules: Combine sequence generation with property evaluation in a single workflow, enabling iterative design guided by scoring feedback
- IntelliFold
- Chai1
- EvoProtGrad
- Boltz1

Module Properties

Inputs: Required data types and formats
Outputs: Generated results and file types
Parameters: Configurable settings and options
Dependencies: Required upstream modules or data
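
One way to see why the dependencies property matters: once each module declares its upstream modules, a valid execution order can be derived by topological sorting. The sketch below uses Python's standard `graphlib` module. The module names are hypothetical placeholders, and this is not a description of how Amazon Bio Discovery schedules modules internally.

```python
from graphlib import TopologicalSorter

# Map each (hypothetical) module to the set of modules it depends on.
dependencies = {
    "design": set(),
    "structure_prediction": {"design"},
    "scoring": {"design", "structure_prediction"},
}

# static_order() yields every module after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

If the declared dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is a convenient early check for an invalid module chain.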

Custom Modules / Import Modules (Beta)

Advanced users can create custom modules by:

Containerizing algorithms using Docker
Defining input/output schemas
Providing test data and validation
Publishing for team or organization use

Note

The Amazon Bio Discovery Bring Your Own Module and the Customer Model Training features will each be treated as a "Beta Service" under the AWS Service Terms.

AWS recommends consulting with your own legal and other advisors to understand and ensure your import and use of any models in Amazon Bio Discovery complies with the terms or licenses applicable to the models you've selected to import.

Data Flow

Understanding how data flows through Amazon Bio Discovery helps you optimize your workflows.

Input Data

Target Structures: PDB files defining binding targets
Seed Antibodies: FASTA sequences for optimization
Experimental Data: Previous results for training
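
For readers less familiar with FASTA: each record is a header line starting with `>` followed by one or more lines of sequence. The sketch below is a minimal parser for checking a seed file locally before upload. The record names and sequences are made-up placeholders, and the validation performed by the service itself may differ.

```python
def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # start a new record
            records[header] = ""
        elif header is not None:
            records[header] += line       # sequence may span several lines
    return records


# Placeholder records for illustration only.
example = ">seed_heavy\nEVQLVESGGG\nLVQPGGSLRL\n>seed_light\nDIQMTQSPSS\n"
seeds = parse_fasta(example)
for name, sequence in seeds.items():
    print(name, len(sequence))
```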

Processing Pipeline

Data validation and preprocessing
Module execution in defined sequence
Intermediate result storage and passing
Final result compilation and scoring
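
The four pipeline steps above can be sketched as a small loop: validate the input, run each step in order, keep every intermediate result, and return the final one. This is an illustrative toy, not the service's implementation; `run_pipeline` and the two example steps are hypothetical, and the "scoring" step only ranks strings by length.

```python
def run_pipeline(data, steps):
    """Run (name, function) steps in order and keep each intermediate result."""
    if not data:
        raise ValueError("input data is empty")    # validation step
    intermediates = []
    current = data
    for name, func in steps:
        current = func(current)                    # pass output to the next step
        intermediates.append((name, current))      # store the intermediate result
    return current, intermediates


# Toy stand-ins for preprocessing and scoring.
steps = [
    ("uppercase", lambda seqs: [s.upper() for s in seqs]),
    ("rank_by_length", lambda seqs: sorted(seqs, key=len, reverse=True)),
]
final, trace = run_pipeline(["evql", "diqmtq"], steps)
print(final)
```

Keeping the intermediates makes it possible to inspect what each step produced, which is the role the guide gives to intermediate result storage.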

Output Data

Candidate Sequences: Generated antibody designs
Property Scores: Predicted characteristics
Analysis Reports: Summaries and visualizations
Raw Data: Detailed computational outputs

Integration Points

Wet Lab Integration

Amazon Bio Discovery connects computational design with experimental validation:

Direct submission to partner laboratories
Standardized assay protocols
Result integration and analysis
Iterative design-test cycles

External Tools

Amazon Bio Discovery integrates with common research tools:

Structure visualization software
Sequence analysis platforms
Laboratory information systems
Data analysis and plotting tools

Cross-Region Support

Amazon Bio Discovery will automatically select the optimal region within your geography to process your inference requests. This maximizes available compute resources and model availability, and delivers the best customer experience. Your data will remain stored only in the region where the request originated; however, input prompts and output results may be processed outside that region. All data will be transmitted encrypted across Amazon's secure network.

Amazon Bio Discovery will securely route your inference requests to available compute resources within the geographic area where the request originated, as follows:

Inference requests originating in the United States will be processed within the United States.
Inference requests originating in an area not listed will be processed by default within the United States.
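
The routing rule above amounts to a lookup with a default. The sketch below only restates that rule in code. The table and function names are hypothetical, the single entry reflects the one geography listed in this guide, and nothing here describes how routing is actually implemented.

```python
# Origin geography -> processing geography, as listed in this guide.
GEOGRAPHY_ROUTES = {"United States": "United States"}

# Requests from unlisted areas fall back to this geography.
DEFAULT_GEOGRAPHY = "United States"


def processing_geography(origin):
    """Return where a request from `origin` would be processed."""
    return GEOGRAPHY_ROUTES.get(origin, DEFAULT_GEOGRAPHY)


print(processing_geography("United States"))
print(processing_geography("Some Unlisted Area"))
```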