User guide: igdiscover run_spatial

Overview

IgDiscover includes the run_spatial module, which works with single-cell and spatial BCR and TCR PacBio data processed with the MAS-seq bioinformatics workflow.

To start an analysis, you need:

  1. One or more demultiplexed FASTA files with PacBio HiFi reads, obtained from the last step of the MAS-seq bioinformatics workflow (isoseq3 groupdedup).

  2. A database of V/D/J/C genes (four FASTA files named V.fasta, D.fasta, J.fasta and C.fasta).

  3. A joint reference FASTA file of BCR C genes and TCR C genes.

Description of input parameters

The igdiscover run_spatial module has five mandatory parameters.

smrt_barcodes

SMRT barcodes of the sample(s) that you want to analyse together. As a result, the obtained clonotypes will be shared between these samples. If you specify more than one barcode, separate them with commas and no spaces. Example:

--smrt_barcodes bc1001,bc1002,bc1003

folder

Directory where the analysis will take place. It should contain a subdirectory named reads that holds the FASTA file(s) for the analysis.

receptor_type

Defines which receptor type will be analysed. Accepted values are Ig and TCR.

database

Path to the Ig or TCR database. The folder should include the V, D, J and C FASTA files.

C_fasta

Path to the constant genes reference FASTA file. The file should include all Ig and TCR constant genes.
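
Before starting a run, it can help to confirm that the files described above are in place. The following is a minimal sketch of such a check, not part of IgDiscover: check_spatial_inputs is a hypothetical helper name, and the file layout it tests (reads/*smrt_barcode*.fasta inside the analysis directory, and V.fasta, D.fasta, J.fasta, C.fasta inside the database directory) is taken from this guide.

```shell
# Pre-flight check for igdiscover run_spatial inputs (hypothetical helper,
# not shipped with IgDiscover).
# Usage: check_spatial_inputs BARCODES FOLDER DATABASE C_FASTA
# Prints each missing file to stderr; returns 0 only if nothing is missing.
# Note: variables are not declared local, to stay within plain POSIX sh.
check_spatial_inputs() {
    barcodes=$1
    folder=$2
    database=$3
    c_fasta=$4
    missing=0

    # One reads/<barcode>.fasta file is expected per comma-separated barcode.
    for bc in $(printf '%s' "$barcodes" | tr ',' ' '); do
        if [ ! -f "$folder/reads/$bc.fasta" ]; then
            echo "missing reads file: $folder/reads/$bc.fasta" >&2
            missing=1
        fi
    done

    # The database directory must hold the four gene FASTA files.
    for gene in V D J C; do
        if [ ! -f "$database/$gene.fasta" ]; then
            echo "missing database file: $database/$gene.fasta" >&2
            missing=1
        fi
    done

    # The joint constant-gene reference is a single FASTA file.
    if [ ! -f "$c_fasta" ]; then
        echo "missing constant-gene reference: $c_fasta" >&2
        missing=1
    fi

    return "$missing"
}
```

For example, check_spatial_inputs bc1001,bc1002 ~/path/spatial_experiment ~/path/BCR_db ~/path/ref_C.fasta returns a non-zero status and lists the missing files if anything is absent.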

Description of output files

final

The final subdirectory is created in the analysis directory at the end of the pipeline. It contains two files for each chain (Ig or TCR): *chain*_clonotypes_members.tsv and *chain*_clonotypes.tsv, which are the standard output files of igdiscover clonotypes. Additionally, it includes .tsv files with count matrices for each chain and each SMRT barcode: *smrt_barcode*_*chain*_count_matrix.tsv. The columns of the matrix are the clonotypes and the rows are cell barcodes (already reverse complemented and with ‘-1’ added at the end for compatibility with Cell Ranger output).
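
To look up a cell in a count matrix starting from a barcode that has not yet been transformed, the same two operations (reverse complement, then append ‘-1’) can be applied by hand. The sketch below illustrates this; to_matrix_barcode is a hypothetical helper name, and whether your own barcodes need this conversion depends on where they come from.

```shell
# Convert a raw cell barcode to the row-label form used in the count
# matrices: reverse complement, then append "-1".
# Hypothetical helper, not part of IgDiscover.
to_matrix_barcode() {
    printf '%s\n' "$1" |
        # Reverse the sequence character by character.
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
        # Complement each base (upper and lower case).
        tr 'ACGTacgt' 'TGCAtgca' |
        # Append the "-1" suffix.
        awk '{ print $0 "-1" }'
}
```

For example, to_matrix_barcode AACG prints CGTT-1.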

tmp

Temporary files are stored in the tmp subdirectory. They include minimap2_output.paf, the output of the minimap2 aligner. It is used to split the initial FASTA file(s) into Ig-related and TCR-related sequences based on their alignment scores against the Ig or TCR constant genes (from the C_fasta file). Output files of igblast and igdiscover augment are also stored here.
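
To get a quick impression of how the reads were distributed over the constant genes, the PAF file can be summarised with standard tools. The sketch below only counts alignments per target sequence (column 6 of the tab-separated PAF format); it does not reproduce the rule the pipeline uses to split reads. paf_hits_per_target is a hypothetical helper name, and the file location directly under tmp is an assumption.

```shell
# Count minimap2 alignments per target sequence in a PAF file.
# PAF is tab-separated; column 6 holds the target sequence name.
# Prints "target<TAB>count", sorted by target name.
# Hypothetical helper, not part of IgDiscover.
paf_hits_per_target() {
    awk -F '\t' '{ n[$6]++ } END { for (t in n) print t "\t" n[t] }' "$1" |
        LC_ALL=C sort
}
```

For example: paf_hits_per_target spatial_experiment/tmp/minimap2_output.paf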

Analysis steps

To run an analysis, proceed as follows.

  1. Create the analysis directory.

    First, pick a name for your analysis. We will use spatial_experiment in the following. Then create the corresponding directory with a reads subdirectory inside:

    mkdir spatial_experiment
    mkdir spatial_experiment/reads
    

    Add the demultiplexed FASTA files to the reads subdirectory, naming each one *smrt_barcode*.fasta.

  2. Run the analysis for BCR.

    Once the necessary directories have been created, you can run the analysis itself. To obtain BCR clone count matrices, run igdiscover run_spatial as in the following example:

    igdiscover run_spatial \
    --smrt_barcodes bc2002 \
    --folder ~/path/spatial_experiment \
    --receptor_type Ig \
    --database ~/path/BCR_db \
    --C_fasta ~/path/ref_C.fasta
    
  3. Run the analysis for TCR.

    As with the BCR analysis, run the analysis of TCR clones:

    igdiscover run_spatial \
    --smrt_barcodes bc2002 \
    --folder ~/path/spatial_experiment \
    --receptor_type TCR \
    --database ~/path/TCR_db \
    --C_fasta ~/path/ref_C.fasta
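
Steps 2 and 3 differ only in the receptor type and the database, so they can be wrapped in a small function. This is a sketch under assumptions rather than a recommended workflow: run_both_receptors and the IGDISCOVER variable are hypothetical names, only the options shown in the examples above are used, and both runs share one analysis directory as in those examples.

```shell
# Command used to invoke IgDiscover. Override it for a dry run,
# for example: IGDISCOVER="echo igdiscover"
IGDISCOVER=${IGDISCOVER:-igdiscover}

# Run the BCR and then the TCR analysis for the same barcodes and folder.
# Usage: run_both_receptors BARCODES FOLDER BCR_DB TCR_DB C_FASTA
# Hypothetical wrapper, not part of IgDiscover.
run_both_receptors() {
    barcodes=$1
    folder=$2
    bcr_db=$3
    tcr_db=$4
    c_fasta=$5

    # $IGDISCOVER is left unquoted on purpose so that an override such as
    # "echo igdiscover" is split into a command and its first argument.
    $IGDISCOVER run_spatial \
        --smrt_barcodes "$barcodes" \
        --folder "$folder" \
        --receptor_type Ig \
        --database "$bcr_db" \
        --C_fasta "$c_fasta" || return 1

    $IGDISCOVER run_spatial \
        --smrt_barcodes "$barcodes" \
        --folder "$folder" \
        --receptor_type TCR \
        --database "$tcr_db" \
        --C_fasta "$c_fasta"
}
```

For example: run_both_receptors bc2002 ~/path/spatial_experiment ~/path/BCR_db ~/path/TCR_db ~/path/ref_C.fasta. With IGDISCOVER set to "echo igdiscover", the two commands are printed instead of executed.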