User guide: igdiscover run_spatial
Overview
IgDiscover includes the run_spatial module, which works with single-cell/spatial BCR and TCR PacBio data processed with the MAS-seq bioinformatics workflow.
To start an analysis, you need:
- One or more demultiplexed FASTA files with PacBio HiFi reads, obtained from the last step of the MAS-seq bioinformatics workflow (isoseq3 groupdedup).
- A database of V/D/J/C genes (four FASTA files named V.fasta, D.fasta, J.fasta and C.fasta).
- A joint reference FASTA file of BCR C genes and TCR C genes.
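A quick pre-flight check of the database directory can catch a missing file before the pipeline starts. The following is a minimal sketch: check_database is a hypothetical helper, not part of IgDiscover, and it relies only on the four file names listed above.

```shell
# Hypothetical helper (not part of IgDiscover): verify that a database
# directory contains non-empty V.fasta, D.fasta, J.fasta and C.fasta.
# The body runs in a subshell "( ... )" so its variables stay local.
check_database() (
    db_dir="$1"
    status=0
    for gene in V D J C; do
        if [ ! -s "$db_dir/$gene.fasta" ]; then
            echo "missing or empty: $db_dir/$gene.fasta" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example call (the path is a placeholder):
# check_database ~/path/BCR_db && echo "database looks complete"
```

The function returns a non-zero status if any of the four files is absent or empty, so it can be chained with && in front of the actual run.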
Description of input parameters
The igdiscover run_spatial module has five mandatory parameters.
- smrt_barcodes
SMRT barcodes of the sample(s) that you want to analyse together. Because the samples are analysed together, the resulting clonotypes are shared between them. If you specify more than one barcode, separate them with commas and no spaces. Example:
--smrt_barcodes bc1001,bc1002,bc1003
- folder
Directory where the analysis will take place. It should contain a reads subdirectory with the FASTA file(s) for the analysis.
- receptor_type
Defines which receptor type will be analysed. Takes values Ig or TCR.
- database
Path to the Ig or TCR database. The folder should include the V, D, J and C FASTA files.
- C_fasta
Path to the constant genes reference FASTA file. The file should include all Ig and TCR constant genes.
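The value formats described above can be checked before launching a run. This is a minimal sketch under stated assumptions: both functions are hypothetical helpers, not part of IgDiscover, and treating Ig and TCR as case-sensitive is an assumption rather than documented behaviour.

```shell
# Hypothetical helpers (not part of IgDiscover).

# Accept a comma-separated barcode list with no spaces and no empty
# items, e.g. bc1001,bc1002,bc1003.
valid_barcode_list() {
    case "$1" in
        '' | *' '* | ,* | *, | *,,*) return 1 ;;
        *) return 0 ;;
    esac
}

# Accept only the two receptor types named in this guide.
# Assumption: the values are case-sensitive.
valid_receptor_type() {
    case "$1" in
        Ig | TCR) return 0 ;;
        *) return 1 ;;
    esac
}

# Example calls:
# valid_barcode_list "bc1001,bc1002,bc1003" && echo "barcode list OK"
# valid_receptor_type "Ig" && echo "receptor type OK"
```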
Description of output files
- final
In the analysis directory, the final subdirectory is created at the end of the pipeline. It has two files for each chain (Ig or TCR): *chain*_clonotypes_members.tsv and *chain*_clonotypes.tsv, which are the standard output files of igdiscover clonotypes. Additionally, it includes .tsv files with count matrices for each chain and each SMRT barcode: *smrt_barcode*_*chain*_count_matrix.tsv. The columns of the matrix are the clonotypes and the rows are cell barcodes (already reverse complemented and with ‘-1’ added at the end for compatibility with Cell Ranger output).
- tmp
Temporary files are stored in the tmp subdirectory. They include minimap2_output.paf, the output of the minimap2 aligner. It is used to split the initial FASTA file(s) into Ig-related and TCR-related sequences based on their alignment scores against the Ig or TCR constant genes (from the C_fasta file). Output files of igblast and igdiscover augment are also stored here.
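To get a first impression of a count matrix, it can help to print its raw size. The sketch below is only an illustration: matrix_dims is a hypothetical helper, and because this guide does not state whether the file carries a header row or a leading cell-barcode column, the function reports the plain number of lines and the number of tab-separated fields on the first line without interpreting them.

```shell
# Hypothetical helper (not part of IgDiscover): print
# "<number of lines> <number of tab-separated fields on line 1>"
# for a TSV file. Whether a header row or an index column is included
# is not specified in this guide, so adjust the counts accordingly.
matrix_dims() {
    awk -F '\t' 'NR == 1 { fields = NF } END { print NR, fields + 0 }' "$1"
}

# Example loop over all count matrices of an analysis
# (the directory name is a placeholder):
# for f in spatial_experiment/final/*_count_matrix.tsv; do
#     printf '%s\t%s\n' "$f" "$(matrix_dims "$f")"
# done
```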
Analysis steps
To run an analysis, proceed as follows.
Create the analysis directory.
First, pick a name for your analysis. We will use spatial_experiment in the following. Then create the corresponding directory with a reads subdirectory inside:
    mkdir spatial_experiment
    mkdir spatial_experiment/reads
Add the demultiplexed FASTA files to the reads subdirectory, naming them *smrt_barcode*.fasta (for example, bc2002.fasta for barcode bc2002).
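Before starting a run, you may want to confirm that every barcode you plan to pass has a matching file in the reads subdirectory. The following is a minimal sketch: check_reads is a hypothetical helper, not part of IgDiscover, and it assumes only the reads/*smrt_barcode*.fasta naming described above.

```shell
# Hypothetical helper (not part of IgDiscover): for a comma-separated
# barcode list, verify that <folder>/reads/<barcode>.fasta exists and
# is non-empty. The body runs in a subshell "( ... )" so the IFS and
# globbing changes do not leak into the calling shell.
check_reads() (
    folder="$1"
    barcodes="$2"
    status=0
    IFS=','   # split the barcode list on commas
    set -f    # disable filename globbing while splitting
    for bc in $barcodes; do
        if [ ! -s "$folder/reads/$bc.fasta" ]; then
            echo "missing or empty: $folder/reads/$bc.fasta" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example call (paths and barcodes are placeholders):
# check_reads ~/path/spatial_experiment bc1001,bc1002,bc1003
```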
Run the analysis for BCR
Once the necessary directories have been initialized, you can run the analysis itself. To obtain BCR clone count matrices, run igdiscover run_spatial following the example:
    igdiscover run_spatial \
        --smrt_barcodes bc2002 \
        --folder ~/path/spatial_experiment \
        --receptor_type Ig \
        --database ~/path/BCR_db \
        --C_fasta ~/path/ref_C.fasta
Run the analysis for TCR
As with the BCR analysis, run the analysis of TCR clones:
    igdiscover run_spatial \
        --smrt_barcodes bc2002 \
        --folder ~/path/spatial_experiment \
        --receptor_type TCR \
        --database ~/path/TCR_db \
        --C_fasta ~/path/ref_C.fasta