Command Line Tools
sequence_unet also contains several convenience scripts to help work with trained models.
sequence_unet
Generate predictions from Sequence UNET-like models using Fasta or ProteinNet data.
usage: sequence_unet [-h] [--proteinnet PROTEINNET] [--fasta FASTA]
[--contacts] [--layers LAYERS] [--wide] [--pssm]
[--model_dir MODEL_DIR] [--download]
M
Positional Arguments
- M
Model to predict with
Input Data
- --proteinnet, -p
ProteinNet file
- --fasta, -f
Fasta file
Options
- --contacts, -c
Use contact graph input (required for pregraph models)
Default: False
- --layers, -l
Number of layers in the bottom UNET model (all pretrained models have 6 layers)
Default: 6
- --wide, -w
Output a wide table
Default: False
- --pssm, -s
Convert output frequency predictions to PSSMs
Default: False
- --model_dir, -m
Directory in which to look for model files and into which they are downloaded
- --download, -d
Download the model if it cannot be located
Default: False
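The flags above can be combined into a single invocation. The following is a minimal sketch, not part of the package, of a hypothetical helper (`build_sequence_unet_command`) that assembles an argument list from the documented options. `MODEL_NAME`, `proteins.fa` and `models` are placeholders, not real model or file names.

```python
import shlex


def build_sequence_unet_command(model, fasta=None, proteinnet=None,
                                contacts=False, layers=None, wide=False,
                                pssm=False, model_dir=None, download=False):
    """Assemble an argv list for sequence_unet using only the documented flags.

    Hypothetical helper for illustration; it does not run the tool.
    """
    cmd = ["sequence_unet"]
    if fasta is not None:
        cmd += ["--fasta", str(fasta)]
    if proteinnet is not None:
        cmd += ["--proteinnet", str(proteinnet)]
    if contacts:
        cmd.append("--contacts")
    if layers is not None:
        cmd += ["--layers", str(layers)]
    if wide:
        cmd.append("--wide")
    if pssm:
        cmd.append("--pssm")
    if model_dir is not None:
        cmd += ["--model_dir", str(model_dir)]
    if download:
        cmd.append("--download")
    # The positional argument M goes last, as in the usage line.
    cmd.append(model)
    return cmd


# Placeholder model name and paths, chosen only for the example.
cmd = build_sequence_unet_command("MODEL_NAME", fasta="proteins.fa",
                                  model_dir="models", download=True, wide=True)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where the script is installed.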
filter_fasta
Filter a Fasta file to include only the records whose IDs appear in a list
usage: filter_fasta [-h] [--gzip] [--uniprot] [--iupac] F I
Positional Arguments
- F
Fasta file
- I
File containing a list of IDs
Named Arguments
- --gzip, -g
Fasta file is gzipped
Default: False
- --uniprot, -u
Fasta file is Uniprot formatted and ID list contains Uniprot IDs
Default: False
- --iupac, -i
Filter to only include canonical amino acids
Default: False
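To make the filtering behaviour concrete, here is a minimal sketch of the same idea in plain Python. It is not the script's actual implementation. It assumes that the record ID is the first whitespace-delimited token of the header, that Uniprot headers follow the `db|ACCESSION|NAME` layout, and that "canonical" means the 20 standard amino acid letters. All function names and the example records are made up for illustration.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of Fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header, uniprot=False):
    """Return the ID used for matching (assumed header layout, see above)."""
    fields = header.split()
    first = fields[0] if fields else ""
    if uniprot:
        parts = first.split("|")
        if len(parts) >= 2:
            return parts[1]  # accession is the second pipe-delimited field
    return first


def filter_fasta(lines, ids, uniprot=False, iupac=False):
    """Yield records whose ID is in ids, optionally dropping non-canonical ones."""
    wanted = set(ids)
    for header, seq in read_fasta(lines):
        if record_id(header, uniprot) not in wanted:
            continue
        if iupac and not set(seq.upper()) <= CANONICAL:
            continue
        yield header, seq


# Made-up records for illustration only.
example = [
    ">sp|P00001|EXAMPLE_A first example",
    "MALLHSAR",
    ">sp|P00002|EXAMPLE_B contains a non-canonical X",
    "MKXLV",
    ">sp|P00003|EXAMPLE_C not in the ID list",
    "MGDVEKGK",
]
kept = list(filter_fasta(example, ["P00001", "P00002"], uniprot=True, iupac=True))
print(kept)
```

A gzipped input could be handled in the same sketch by opening the file with `gzip.open(path, "rt")` before passing it to `filter_fasta`.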
split_fasta
Split a Fasta file into sub-files
usage: split_fasta [-h] [--outdir OUTDIR] [--files FILES] [--seqs SEQS] F
Positional Arguments
- F
Fasta file to split
Named Arguments
- --outdir, -o
Output directory
Default: “.”
- --files, -n
Number of files to split into
Default: 0
- --seqs, -s
Number of sequences per file
Default: 0
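The two splitting modes can be illustrated with a short sketch. It is not the script's actual implementation. It assumes that a value of 0 means "not set", that exactly one of the two options is given, and that splitting by file count distributes records as evenly as possible. `split_records` is a hypothetical name.

```python
def split_records(records, files=0, seqs=0):
    """Group records into chunks, either by file count or by chunk size.

    Assumes 0 means "unset" and that exactly one of files/seqs is positive.
    """
    records = list(records)
    if (files > 0) == (seqs > 0):
        raise ValueError("set exactly one of files or seqs to a positive value")
    if seqs > 0:
        # Fixed number of sequences per file; the last chunk may be shorter.
        return [records[i:i + seqs] for i in range(0, len(records), seqs)]
    # Fixed number of files; the first `extra` chunks get one more record.
    base, extra = divmod(len(records), files)
    chunks, start = [], 0
    for k in range(files):
        size = base + (1 if k < extra else 0)
        if size:
            chunks.append(records[start:start + size])
        start += size
    return chunks


records = list(range(7))  # stand-ins for seven Fasta records
by_seqs = split_records(records, seqs=3)
by_files = split_records(records, files=3)
print(by_seqs)   # [[0, 1, 2], [3, 4, 5], [6]]
print(by_files)  # [[0, 1, 2], [3, 4], [5, 6]]
```

Each chunk would then be written to its own file in the output directory.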