Command Line Tools
sequence_unet also contains several convenience scripts to help work with trained models.
sequence_unet
Generate model predictions from Sequence UNET-like models.
Three modes are supported:
- ClinVar: Use --proteinnet path/to/proteinnet, --clinvar and --tsv path/to/clinvar_tsv, where
the ClinVar TSV is formatted like the output of extract_clinvar.R. The ProteinNet file must contain the PDB IDs specified in the TSV for predictions to be generated for them.
- Fasta: Use --fasta path/to/fasta and optionally --tsv to specify a list of variants
to keep predictions for, where the TSV has columns gene, position, wt, mut and genes match the Fasta IDs.
- ProteinNet: Use --proteinnet path/to/proteinnet and optionally --tsv path/to/tsv to specify
predictions to keep, where the TSV has columns pdb_id, chain, position, wt, mut.
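As an illustration of the optional --tsv input in Fasta mode, the following Python sketch writes a variants table with the columns gene, position, wt, mut. The gene ID P12345, the variant values, and the presence of a header row are assumptions made for the example, not details confirmed by the tool's documentation.

```python
import csv
import io

# Column names taken from the Fasta-mode description above.
COLUMNS = ["gene", "position", "wt", "mut"]

# Hypothetical variants; "gene" values must match the IDs in the Fasta file.
variants = [
    {"gene": "P12345", "position": 10, "wt": "A", "mut": "V"},
    {"gene": "P12345", "position": 42, "wt": "G", "mut": "D"},
]


def variants_to_tsv(rows):
    """Serialise variant rows as tab-separated text with a header row (assumed)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = variants_to_tsv(variants)
print(tsv_text)
```

Writing tsv_text to a file gives a path that could be passed to --tsv alongside --fasta.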
usage: sequence_unet [-h] [--tsv TSV] [--proteinnet PROTEINNET]
[--fasta FASTA] [--contacts] [--layers LAYERS] [--wide]
[--pssm] [--model_dir MODEL_DIR] [--download]
M
Positional Arguments
- M
Model to predict with
Input Data
- --tsv, -t
TSV file containing variants of interest
- --proteinnet, -p
ProteinNet file
- --fasta, -f
Fasta file
Options
- --contacts, -c
Use contact graph input
Default: False
- --layers, -l
Number of layers in bottom UNET model
Default: 6
- --wide, -w
Output a wide table
Default: False
- --pssm, -s
Convert output frequency predictions to PSSMs
Default: False
- --model_dir, -m
Directory to locate/download model files to
- --download, -d
Download model if not located
Default: False
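The flags above can be combined into a single call. The Python sketch below only assembles an argument list from the documented long options and does not run the tool. MODEL_NAME, proteins.fa, variants.tsv, and models are hypothetical placeholders, and build_command is a helper invented for this example.

```python
import shlex


def build_command(model, fasta=None, proteinnet=None, tsv=None,
                  wide=False, pssm=False, model_dir=None, download=False):
    """Assemble a sequence_unet argument list from the documented flags."""
    cmd = ["sequence_unet"]
    if fasta:
        cmd += ["--fasta", fasta]
    if proteinnet:
        cmd += ["--proteinnet", proteinnet]
    if tsv:
        cmd += ["--tsv", tsv]
    if wide:
        cmd.append("--wide")
    if pssm:
        cmd.append("--pssm")
    if model_dir:
        cmd += ["--model_dir", model_dir]
    if download:
        cmd.append("--download")
    # The positional model argument (M) goes last, as in the usage line.
    cmd.append(model)
    return cmd


# Hypothetical file names and model name, for illustration only.
cmd = build_command(
    "MODEL_NAME",
    fasta="proteins.fa",
    tsv="variants.tsv",
    wide=True,
    model_dir="models",
    download=True,
)
print(shlex.join(cmd))
```

Once sequence_unet is installed and the placeholders are replaced with real values, the list could be handed to subprocess.run(cmd, check=True).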
filter_fasta
Filter a Fasta file to only include records based on a list of IDs
usage: filter_fasta [-h] [--gzip] [--uniprot] [--iupac] F I
Positional Arguments
- F
Fasta file
- I
File containing a list of IDs
Named Arguments
- --gzip, -g
Fasta file is gzipped
Default: False
- --uniprot, -u
Fasta file is UniProt formatted and ID list contains UniProt IDs
Default: False
- --iupac, -i
Filter to only include canonical amino acids
Default: False
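To make the filtering behaviour concrete, here is a minimal Python sketch of the same idea: keep only records whose ID appears in a list, optionally dropping sequences with non-canonical residues. It is not the script's actual implementation. The helper names, the choice to use the first whitespace-delimited token of the header as the ID, and the interpretation of --iupac as discarding whole records are all assumptions.

```python
# The 20 canonical amino acids, one-letter codes.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Yield (header, sequence) pairs from Fasta-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(records, keep_ids, canonical_only=False):
    """Keep records whose ID is in keep_ids (ID = first token of the header)."""
    kept = []
    for header, seq in records:
        record_id = header.split()[0]
        if record_id not in keep_ids:
            continue
        # Assumed reading of --iupac: drop records with non-canonical residues.
        if canonical_only and not set(seq) <= CANONICAL:
            continue
        kept.append((header, seq))
    return kept


# Hypothetical input data.
fasta_text = ">seq1 first protein\nACDE\nFGH\n>seq2\nACXB\n>seq3\nMKV\n"
records = list(parse_fasta(fasta_text))
by_id = filter_records(records, {"seq1", "seq2"})
by_id_canonical = filter_records(records, {"seq1", "seq2"}, canonical_only=True)
print([header for header, _ in by_id])
print([header for header, _ in by_id_canonical])
```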
split_fasta
Split a Fasta file into sub-files
usage: split_fasta [-h] [--outdir OUTDIR] [--files FILES] [--seqs SEQS] F
Positional Arguments
- F
Fasta file to split
Named Arguments
- --outdir, -o
Output directory
Default: "."
- --files, -n
Number of files to split into
Default: 0
- --seqs, -s
Number of sequences per file
Default: 0
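The two splitting options can be read as alternative ways of choosing a chunk size. The Python sketch below illustrates that reading and is not the script's own code. The function split_records is hypothetical, and both the treatment of 0 as "not set" and the precedence of the per-file sequence count over the file count are assumptions.

```python
import math


def split_records(records, n_files=0, seqs_per_file=0):
    """Split a list of records into consecutive chunks.

    Assumed semantics: 0 means "not set", and seqs_per_file takes
    precedence over n_files when both are given.
    """
    if seqs_per_file > 0:
        size = seqs_per_file
    elif n_files > 0:
        size = math.ceil(len(records) / n_files)
    else:
        raise ValueError("set either n_files or seqs_per_file")
    size = max(size, 1)  # guard against an empty input list
    return [records[i:i + size] for i in range(0, len(records), size)]


# Hypothetical record IDs standing in for Fasta entries.
ids = ["seq1", "seq2", "seq3", "seq4", "seq5"]
by_files = split_records(ids, n_files=2)
by_seqs = split_records(ids, seqs_per_file=2)
print([len(chunk) for chunk in by_files])
print([len(chunk) for chunk in by_seqs])
```

Each chunk would then be written to its own file in the output directory.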