Help / Documentation

How to use the Abalign Web Server

Abalign is a comprehensive multiple sequence alignment platform for B-cell receptor immune repertoires. Follow these steps to analyze your antibody sequences:

Step-by-Step Guide
  1. Input Sequences:
    • Paste your antibody sequences in FASTA format into the text area, OR
    • Upload a FASTA file containing your sequences
    • Click "Load Example" to see the required format
    • Supports both protein and nucleotide sequences
  2. Configure Parameters:
    • Task Name: Optional name to identify your analysis
    • Numbering Scheme: IMGT, Kabat, Chothia, or Martin
    • Species: Select from 20+ supported species, including human, mouse, and macaque
    • Chain Type: Heavy or Light chain
    • Nucleotide Sequence: Check this box if the input is nucleotide sequence (it will be translated to protein)
    • Sequence Deduplication: Remove duplicate sequences
  3. Clonotype Parameters (Optional):
    • CDR3 Identity Threshold: Minimum CDR3 sequence identity required to group sequences into the same clonotype (Range: 0-1, Default: 1.0). A value of 1.0 requires identical CDR3 sequences; lower values (e.g., 0.6) group sequences whose CDR3s share at least 60% identity.
    • Shannon Entropy Threshold: Threshold for the Shannon entropy at each CDR3 position (Range: 0-10, Default: 1.0). Positions with entropy below this threshold are represented with "?" in the consensus sequence. Lower values provide more stringent filtering.
  4. Notification (Optional): Provide an email address to be notified when the job completes
  5. Submit Analysis: Click "Run Abalign" to start processing
  6. Results: You'll be redirected to a processing page. Bookmark the result URL so you can return to it later.
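
To make the two clonotype parameters in step 3 more concrete, here is a minimal Python sketch of one common way to compute CDR3 identity and per-position Shannon entropy. It is an illustration only: this page does not describe how Abalign computes these values internally, so the equal-length comparison, the base-2 logarithm, the 0.9 threshold, and the example CDR3 sequences are all assumptions.

```python
import math
from collections import Counter


def cdr3_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two CDR3 sequences.

    Assumption: CDR3s of different lengths are treated as non-matching (0.0).
    """
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits, an assumed unit) of the residues in one column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical CDR3 sequences of equal length, for illustration only.
cdr3s = ["AKDRLGRGNFDY", "AKDRLGRGNFDY", "AKDRLGKGNFDY"]

# The first and third sequences differ at one of 12 positions.
identity = cdr3_identity(cdr3s[0], cdr3s[2])
same_clonotype = identity >= 0.9  # example threshold, not the server default

# One entropy value per CDR3 position (column of the stacked sequences).
entropies = [column_entropy("".join(col)) for col in zip(*cdr3s)]
```

With these inputs, only the position where R and K both occur has non-zero entropy; every fully conserved position has entropy 0.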

Input Format

FASTA Format Example
>Seq1_Heavy
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDRLGRGNFDYWGQGTLVTVSS
>Seq2_Heavy
EVQLVETGGGLVQPGGSLRLSCAASGFTFSDYYMYWVRQAPGKGLEWVSAINSGGRSTYYPDSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCARGPYYAMDYWGQGTLVTVSS
Requirements
  • Standard FASTA format with sequence headers starting with ">"
  • Both protein and nucleotide sequences are supported
  • Maximum file size: 16 MB
  • No limit on number of sequences, but processing time increases with size
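
Before uploading, the requirements above can be checked locally. The following Python sketch is not part of Abalign: the function names, the byte value used for "16 MB", and the nucleotide heuristic are assumptions made for illustration, and the example sequences are shortened.

```python
MAX_BYTES = 16 * 1024 * 1024  # 16 MB limit; the exact byte count is an assumption


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_like_nucleotide(seq: str) -> bool:
    """Heuristic: True if every character is A, C, G, T, U or N."""
    return bool(seq) and set(seq.upper()) <= set("ACGTUN")


# Shortened example sequences, for illustration only.
example = """>Seq1_Heavy
EVQLVESGGGLVQPGGSLRLSCAAS
>Seq2_Heavy
EVQLVETGGGLVQPGGSLRLSCAAS
"""

records = parse_fasta(example)
size_ok = len(example.encode("utf-8")) <= MAX_BYTES
any_nucleotide = any(looks_like_nucleotide(seq) for _, seq in records)
```

If `any_nucleotide` is true, the "Nucleotide Sequence" option probably needs to be checked. The heuristic can misclassify a short protein made up only of A, C, G, T and N, so treat it as a hint rather than a definitive test.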

Output Files Description

Upon completion, you will receive a ZIP file containing multiple analysis results:

Universal Output Files (All Analyses)
File Name                             Description
alignment_output.fas                  Multiple sequence alignment of antibody variable regions
alignment_output.fas.overview.csv     Main results file - Comprehensive sequence information and statistics
alignment_output.fas.temp.txt         Antibody alignment separated by framework regions (FRs) and complementarity determining regions (CDRs) with * delimiters
alignment_output.fas.number.txt       MSA unified column numbering file based on antibody numbering scheme, providing standardized position numbers for each column in the alignment
v_gene_info.txt                       V and J gene assignment information
v_gene_info.txt.vabundance.txt        V and J gene abundance statistics
v_gene_info.txt.tempv.txt             V and J gene temporary analysis file
v_gene_info.txt.clonotype.csv         Clonotype analysis results
v_gene_info.txt.clonotype_index.csv   Clonotype and corresponding sequence indices
v_gene_info.txt.clonotype_seqs.csv    Comprehensive clonotype and sequence information
Additional Files for Nucleotide Input
File Name                             Description
translated_protein.fasta              Protein sequences translated from nucleotide input using six reading frames
alignment_output.fas.pro              Multiple sequence alignment of translated protein variable regions
alignment_output.fas.temp.txt.pro     Protein alignment separated by FRs and CDRs with * delimiters
alignment_output.fas.number.txt.pro   MSA unified column numbering file for protein sequences based on antibody numbering scheme, providing standardized position numbers for each column in the protein alignment
Recommendation: Start with alignment_output.fas.overview.csv for comprehensive sequence analysis results.
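
As a starting point for working with the downloaded results, the sketch below reads the overview CSV directly from the ZIP file using only the Python standard library. The function name and the suffix-based lookup are assumptions: the folder layout inside the ZIP and the CSV column names are not documented on this page, so the code reports whatever columns it finds instead of assuming any.

```python
import csv
import io
import zipfile


def read_overview(zip_path: str,
                  member_suffix: str = "alignment_output.fas.overview.csv"):
    """Return (column_names, rows) from the overview CSV inside a results ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        # Match by suffix because the folder layout inside the ZIP is an unknown.
        name = next(
            (n for n in archive.namelist() if n.endswith(member_suffix)), None
        )
        if name is None:
            raise FileNotFoundError(f"{member_suffix} not found in {zip_path}")
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            rows = list(reader)
            return reader.fieldnames, rows


# Usage (the ZIP file name here is hypothetical):
# columns, rows = read_overview("abalign_results.zip")
# print(columns, len(rows))
```

Each row is returned as a dictionary keyed by the column names found in the file, so the first step after loading is usually to inspect `columns`.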

Processing Information

  • Processing Time: Varies with the number and length of sequences (typically 1-10 minutes)
  • Concurrent Jobs: A maximum of 4 jobs can run simultaneously
  • Results Storage: Results are stored temporarily (for 14 days) and can be downloaded once the job is complete
  • Status Monitoring: The processing page automatically updates when your job is complete
Note: Make sure to bookmark or save the result URL provided on the processing page, as it contains your unique job identifier.
Important Data Retention Policy
  • Your uploaded data is not permanently stored - Input files are deleted immediately after processing
  • Results are available for 14 days only - All result files will be automatically deleted after 14 days
  • Download your results promptly - We recommend downloading your results immediately after completion
  • No data recovery - Once deleted, results cannot be recovered
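
To keep track of the retention window, the deletion date can be estimated with a short Python sketch. It assumes the 14 days are counted from the time the job completes, which this page does not state explicitly; the completion time used here is hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)  # retention period stated in the policy above


def deletion_deadline(completed_at: datetime) -> datetime:
    """Estimated time after which results are no longer available.

    Assumption: the retention period starts when the job completes.
    """
    return completed_at + RETENTION


# Hypothetical completion time, for illustration only.
completed = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
deadline = deletion_deadline(completed)
```

Since deleted results cannot be recovered, treat the computed deadline as an upper bound and download well before it.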