sequence_unet.graph_cnn

Simple GraphCNN TensorFlow Keras layer and supporting functions.

class GraphCNN(*args, **kwargs)

Bases: Layer

Graph CNN Keras Layer

A simple GraphCNN Keras layer, which takes a feature matrix and an edge matrix and calculates new features for each node of the graph. Because the calculation is a matrix multiplication, the operation can be performed on an input graph of any size.

units

Number of features calculated in the output layer

Type:

int

activation

Name of activation function applied

Type:

str

activation_function

Activation function applied

Type:

function

Notes

The layer implements a simple GraphCNN, which calculates new features by combining features from neighbouring nodes using the trained weights and weighting by distance to each node.

Expected input/output:

Input
Node features:

(B, N, F) feature arrays with B batches of N x F matrices, representing the F features for each of the N nodes in the graph.

Contact graph:

(B, N, N) edge weight matrix giving the weighting for each edge in the graph.

Output

(B, N, F) array of B batches of new N x F feature arrays.
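
The shapes above can be illustrated with a minimal NumPy sketch. This is not the layer's actual implementation: the function name graph_conv_sketch, the bias term, the ReLU activation and the exact order of operations are assumptions made for illustration. It shows why a single set of trained weights can be applied to graphs with different numbers of nodes.

```python
import numpy as np

def graph_conv_sketch(features, edges, weights, bias):
    """Hypothetical forward pass: aggregate neighbours, then project."""
    # (B, N, N) @ (B, N, F) -> (B, N, F): edge-weighted sum of neighbour features
    aggregated = edges @ features
    # (B, N, F) @ (F, units) -> (B, N, units): linear projection with the weights
    projected = aggregated @ weights + bias
    # ReLU stands in for whichever activation the layer is configured with
    return np.maximum(projected, 0.0)

rng = np.random.default_rng(0)
n_features = 4
weights = rng.normal(size=(n_features, n_features))  # units == F in this sketch
bias = np.zeros(n_features)

# The same weights are reused for a 5-node and a 9-node graph
outputs = {}
for n_nodes in (5, 9):
    features = rng.normal(size=(2, n_nodes, n_features))
    edges = np.full((2, n_nodes, n_nodes), 1.0 / n_nodes)  # uniform, row-normalised
    outputs[n_nodes] = graph_conv_sketch(features, edges, weights, bias)
    print(n_nodes, outputs[n_nodes].shape)
```

The weight matrix only depends on the number of features, never on N, which is what allows the node count to vary between inputs.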

build(input_shape)

Build this layer.

Calculate and initialise layer parameters based on the input tensors.

Parameters:

input_shape (tuple) – Tuple of integer dimensions defining the input shape.

call(inputs)

Call the layer on input tensors.

Parameters:

inputs (list) – List of input tensors, expected to contain B batches of N x F node feature matrices and the paired N x N edge matrices.

Returns:

B x N x F output tensor with new node features.

Return type:

tf.Tensor

get_config()

Generate a configuration dictionary for serialisation.

Create the serialisation dictionary used by Keras save_model for serialising models including this layer.

Returns:

Serialisation dictionary

Return type:

dict

contact_graph(record, binary=False, contact_distance=1000)

Calculate a residue contact graph from a proteinnetpy.record.ProteinNetRecord.

Calculate a contact matrix for the amino acids in a ProteinNet Record. This defines the edges of a weighted graph showing which residues connect to each other. The graph is determined from the protein's distance matrix, keeping only the cells below the specified contact distance cutoff. Self connections are then added, along with any missing connections to the two neighbouring residues in the primary sequence. Finally, each row is normalised to weight connections by inverse distance.

Parameters:
  • record (proteinnetpy.record.ProteinNetRecord) – ProteinNet Record to calculate the contact graph for.

  • binary (bool) – Use binary contacts instead of similarity scaled.

  • contact_distance (numeric) – Maximum distance in nanometers between two residues for them to be considered in contact.

Returns:

A contact graph matrix, where each cell represents the similarity scaled contact between a pair of amino acids.

Return type:

numpy.array
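
The steps described for contact_graph can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the function's real code: it starts from a plain (N, N) distance matrix rather than a ProteinNetRecord, it takes the "neighbouring two residues" to be positions i-1 and i+1, and it uses 1 / (1 + distance) as the similarity so that self connections (distance 0) stay finite. The name contact_graph_sketch is hypothetical.

```python
import numpy as np

def contact_graph_sketch(distances, binary=False, contact_distance=1000):
    """Hypothetical contact graph built from a symmetric (N, N) distance matrix."""
    idx = np.arange(distances.shape[0])
    # Keep residue pairs closer than the cutoff
    contacts = distances < contact_distance
    # Add self connections and links to adjacent residues in the sequence
    contacts[idx, idx] = True
    contacts[idx[:-1], idx[1:]] = True
    contacts[idx[1:], idx[:-1]] = True
    if binary:
        return contacts.astype(float)
    # Assumed similarity: closer residues get larger weights, non-contacts get 0
    weights = np.where(contacts, 1.0 / (1.0 + distances), 0.0)
    # Normalise each row so a node's connection weights sum to 1
    return weights / weights.sum(axis=1, keepdims=True)

# Toy distance matrix for a four-residue chain (values are made up)
toy_distances = np.array([
    [0.0, 500.0, 2000.0, 3000.0],
    [500.0, 0.0, 1500.0, 2500.0],
    [2000.0, 1500.0, 0.0, 800.0],
    [3000.0, 2500.0, 800.0, 0.0],
])
graph = contact_graph_sketch(toy_distances)
binary_graph = contact_graph_sketch(toy_distances, binary=True)
print(graph.round(3))
```

In this toy case residues 1 and 2 are further apart than the cutoff but are still connected because they are adjacent in the sequence, while residues 0 and 3 remain unconnected.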