Advanced topics¶
IgDiscover itself does not (yet) come with all imaginable analysis facilities built into it.
However, it creates many files (mostly with tables) that can be used for custom analysis.
For example, all .tsv
files (in particular assigned.tsv.gz
and
candidates.tsv
)
can be opened and inspected in a spreadsheet application such as LibreOffice. From there,
you can do basic tasks such as sorting from the menu of that application.
Often, these facilities are not enough, however, and some basic understanding of the command-line is helpful. Clearly, this is not as convenient as working in a graphical user interface (GUI), but we do not currently have the resources to provide one for IgDiscover. To alleviate this somewhat, we provide here instructions for a few things that you may want to do with the IgDiscover result files.
Extract all sequences that match any database gene exactly¶
The candidates.tsv
file tells you for each discovered sequence how often an exact match
of that sequence was found in your input reads. A high number of exact matches is a good
indication that the candidate is actually a new gene or allele. In order to find the original
reads that correspond to those matches, you can run this command in the analysis directory,
replacing iteration-01
with the directory in which the filtered.tsv.gz
file is
located:
igdiscover run iteration-01/exact.tsv
This command will extract all rows from iteration-01/filtered.tsv.gz
for which the
V_errors
column is zero.
Extra configuration settings¶
Some configuration settings are not documented in the default igdiscover.yaml
file
since they rarely need to be changed.
# Leave empty or choose a species name supported by IgBLAST:
# human, mouse, rabbit, rat, rhesus_monkey
# This setting is not used anywhere except that it is passed
# to IgBLAST using the -organism option. Since we provide IgBLAST
# with our own gene databases, it seems this has no effect.
species:
# Which program to use for computing multiple alignments. This is used for
# computing consens sequences.
# Choose 'mafft', 'clustalo', 'muscle' or 'muscle-fast'.
# 'muscle-fast' runs muscle with parameters "-maxiters 1 -diags".
#
#multialign_program: muscle-fast