Command Line Tools

sequence_unet also includes several convenience scripts for working with trained models.

sequence_unet

Generate predictions from Sequence UNET-like models using Fasta or ProteinNet input data.

usage: sequence_unet [-h] [--proteinnet PROTEINNET] [--fasta FASTA]
                     [--contacts] [--layers LAYERS] [--wide] [--pssm]
                     [--model_dir MODEL_DIR] [--download]
                     M

Positional Arguments

M

Model to predict with

Input Data

--proteinnet, -p

ProteinNet file

--fasta, -f

Fasta file

Options

--contacts, -c

Use contact graph input (required for pregraph models)

Default: False

--layers, -l

Number of layers in the bottom UNET model (all pretrained models have 6 layers)

Default: 6

--wide, -w

Output a wide table

Default: False

--pssm, -s

Convert output frequency predictions to PSSMs

Default: False

--model_dir, -m

Directory to look for model files in, and to download them to

--download, -d

Download the model if it cannot be located

Default: False
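
As a rough illustration of how the flags above combine, the following Python sketch assembles a sequence_unet command line. The helper name build_command and the placeholder values (MODEL, proteins.fa, models) are hypothetical and only stand in for real inputs; the flags themselves are taken from the usage text above.

```python
import shlex


def build_command(model, fasta=None, proteinnet=None, contacts=False,
                  layers=None, wide=False, pssm=False,
                  model_dir=None, download=False):
    """Assemble an argv list for the sequence_unet script (hypothetical helper)."""
    cmd = ["sequence_unet"]
    # Input data: a Fasta file and/or a ProteinNet file
    if fasta is not None:
        cmd += ["--fasta", str(fasta)]
    if proteinnet is not None:
        cmd += ["--proteinnet", str(proteinnet)]
    # Boolean switches are only added when enabled
    if contacts:
        cmd.append("--contacts")
    if layers is not None:
        cmd += ["--layers", str(layers)]
    if wide:
        cmd.append("--wide")
    if pssm:
        cmd.append("--pssm")
    if model_dir is not None:
        cmd += ["--model_dir", str(model_dir)]
    if download:
        cmd.append("--download")
    # The model is the single positional argument (M)
    cmd.append(model)
    return cmd


# Placeholder values, not real model or file names
cmd = build_command("MODEL", fasta="proteins.fa", wide=True,
                    model_dir="models", download=True)
print(shlex.join(cmd))
# sequence_unet --fasta proteins.fa --wide --model_dir models --download MODEL
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the placeholders are replaced with a real model and input file.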

filter_fasta

Filter a Fasta file to include only the records whose IDs appear in a list

usage: filter_fasta [-h] [--gzip] [--uniprot] [--iupac] F I

Positional Arguments

F

Fasta file

I

File containing a list of IDs

Named Arguments

--gzip, -g

Fasta file is gzipped

Default: False

--uniprot, -u

Fasta file is Uniprot-formatted and the ID list contains Uniprot IDs

Default: False

--iupac, -i

Filter to only include canonical amino acids

Default: False
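
The kind of filtering described above can be sketched in plain Python. This is not the script's actual implementation: the function names are hypothetical, the ID is assumed to be the first whitespace-delimited token of the header (or the field between the pipes for Uniprot-style headers), and --iupac is assumed to drop whole records that contain non-canonical residues. The example records are made up.

```python
import io

# The 20 canonical amino acids (one-letter codes)
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header, uniprot=False):
    """Extract the record ID (assumed convention, see lead-in)."""
    parts = header.split()
    token = parts[0] if parts else ""
    if uniprot and token.count("|") >= 2:
        # Uniprot-style header: db|ACCESSION|ENTRY_NAME
        return token.split("|")[1]
    return token


def filter_records(handle, ids, uniprot=False, iupac=False):
    """Yield only records whose ID is in ids, optionally canonical-only."""
    wanted = set(ids)
    for header, seq in read_fasta(handle):
        if record_id(header, uniprot) not in wanted:
            continue
        if iupac and not set(seq.upper()) <= CANONICAL:
            continue
        yield header, seq


# Made-up example input with Uniprot-style headers
example = io.StringIO(
    ">sp|ID1|NAME1 first\nMKV\nLAA\n"
    ">sp|ID2|NAME2 second\nMKXB\n"
    ">sp|ID3|NAME3 third\nGGG\n"
)
kept = list(filter_records(example, ["ID1", "ID2"], uniprot=True, iupac=True))
print(kept)
# [('sp|ID1|NAME1 first', 'MKVLAA')]
```

For gzipped input, the same functions work on a handle opened with gzip.open(path, "rt") from the standard library.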

split_fasta

Split a Fasta file into sub-files

usage: split_fasta [-h] [--outdir OUTDIR] [--files FILES] [--seqs SEQS] F

Positional Arguments

F

Fasta file to split

Named Arguments

--outdir, -o

Output directory

Default: "."

--files, -n

Number of files to split into

Default: 0

--seqs, -s

Number of sequences per file

Default: 0
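
The two splitting modes above (a fixed number of files, or a fixed number of sequences per file) can be sketched as follows. This is an illustration under stated assumptions, not the script's own code: the function names, the contiguous and balanced chunking rule, and the split_N.fa output naming are all hypothetical.

```python
import math
import tempfile
from pathlib import Path


def split_records(records, files=0, seqs=0):
    """Group (header, sequence) records into contiguous chunks.

    files > 0: split into that many chunks with sizes differing by at most one.
    seqs > 0:  split into chunks of at most that many records.
    """
    records = list(records)
    if files > 0:
        base, extra = divmod(len(records), files)
        sizes = [base + 1 if i < extra else base for i in range(files)]
    elif seqs > 0:
        sizes = [seqs] * math.ceil(len(records) / seqs)
    else:
        raise ValueError("set either files or seqs to a positive number")
    chunks, start = [], 0
    for size in sizes:
        chunk = records[start:start + size]
        if chunk:  # skip empty chunks when there are more files than records
            chunks.append(chunk)
        start += size
    return chunks


def write_chunks(chunks, outdir=".", prefix="split"):
    """Write each chunk to its own Fasta file (hypothetical naming scheme)."""
    paths = []
    for i, chunk in enumerate(chunks):
        path = Path(outdir) / f"{prefix}_{i}.fa"
        with open(path, "w") as out:
            for header, seq in chunk:
                out.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths


# Made-up records for demonstration
records = [(f"seq{i}", "ACDE") for i in range(5)]
by_files = split_records(records, files=2)
by_seqs = split_records(records, seqs=2)
print([len(c) for c in by_files])  # [3, 2]
print([len(c) for c in by_seqs])   # [2, 2, 1]

with tempfile.TemporaryDirectory() as tmp:
    paths = write_chunks(by_seqs, outdir=tmp)
    names = [p.name for p in paths]
print(names)  # ['split_0.fa', 'split_1.fa', 'split_2.fa']
```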