Building A Graph Convolutional Network for Molecular Property Prediction

Tutorial to make molecular graphs and develop a simple PyTorch-based GCN

Photo by BoliviaInteligente on Unsplash

Artificial intelligence has taken the world by storm. Every week, new models, tools, and applications emerge that promise to push the boundaries of human endeavor. The availability of open-source tools that let users train and deploy complex machine learning models in a modest number of lines of code has truly democratized AI. At the same time, while many of these off-the-shelf models may provide excellent predictive capabilities, using them as black boxes may deprive inquisitive students of AI of a deeper understanding of how they work and why they were developed in the first place. This understanding is particularly important in the natural sciences, where knowing that a model is accurate is not enough — it is also essential to know its connection to other physical theories, its limitations, and its generalizability to other systems. In this article, we will explore the basics of one particular ML model — a graph convolutional network — through the lens of chemistry. This is not meant to be a mathematically rigorous exploration; instead, we will try to compare features of the network with traditional models in the natural sciences and think about why it works as well as it does.

1. The need for graphs and graph neural networks

A model in chemistry or physics is usually a continuous function, say y = f(x₁, x₂, x₃, …, xₙ), in which x₁, x₂, x₃, …, xₙ are the inputs and y is the output. An example of such a model is the equation that determines the electrostatic interaction (or force) between two point charges q₁ and q₂ separated by a distance r in a medium with relative permittivity εᵣ, commonly known as Coulomb’s law.

Figure 1: The Coulomb equation as a model for electrostatic interactions between point charges (Image by author)
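For reference, since the equation itself appears only in the figure, the magnitude of the force between the two charges can be written as

F = \frac{1}{4\pi\varepsilon_0\varepsilon_r}\,\frac{q_1 q_2}{r^2}

where ε₀ is the vacuum permittivity.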

If we did not know this relationship but, hypothetically, had multiple datapoints, each containing the interaction between point charges (the output) and the corresponding inputs, we could fit an artificial neural network to predict the interaction for any pair of point charges at any separation in a medium with a specified permittivity. In the case of this problem, admittedly ignoring some important caveats, creating a data-driven model for a physical problem is relatively straightforward.

Now consider the problem of predicting a particular property, say solubility in water, from the structure of a molecule. First, there is no obvious set of inputs to describe a molecule. You could use various features, such as bond lengths, bond angles, the number of different types of elements, the number of rings, and so forth. However, there is no guarantee that any such arbitrary set will work well for all molecules.

Second, unlike the example of the point charges, the inputs may not necessarily reside in a continuous space. For example, we can think of methanol, ethanol, and propanol as a set of molecules with increasing chain lengths; there is no notion, however, of anything between them — chain length is a discrete parameter and there is no way to interpolate between methanol and ethanol to get other molecules. Having a continuous space of inputs is essential to calculate derivatives of the model, which can then be used for optimization of the chosen property.

To overcome these problems, various methods for encoding molecules have been proposed. One such method is textual representation using schemes such as SMILES and SELFIES. There is a large body of literature on this representation, and I direct the interested reader to this helpful review. The second method involves representing molecules as graphs. While each method has its advantages and shortcomings, graph representations feel more intuitive for chemistry.

A graph is a mathematical structure consisting of nodes connected by edges that represent relationships between the nodes. Molecules fit naturally into this structure — atoms become nodes, and bonds become edges. Each node in the graph is represented by a vector that encodes properties of the corresponding atom. Usually, a one-hot encoding scheme suffices (more on this in the next section). These vectors can be stacked to create a node matrix. Relationships between nodes — denoted by edges — can be delineated through a square adjacency matrix, wherein every element aᵢⱼ is 1 if nodes i and j are connected by an edge and 0 otherwise. The diagonal elements are set to 1, indicating a self-connection, which makes the matrix amenable to convolutions (as you will see in the next section). More complex graph representations can be developed, in which edge properties are also one-hot encoded in a separate matrix, but we shall leave that for another article. These node and adjacency matrices will serve as the inputs to our model.
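As a concrete illustration, here is a minimal numpy sketch of these two matrices for water (H₂O); the node vector length is an arbitrary value chosen purely for illustration:

import numpy as np

# Water (H2O): atoms ordered as [O, H, H]
node_vec_len = 10  # illustrative length of each one-hot node vector

# Node matrix: one-hot encode atomic numbers (O -> index 8, H -> index 1)
node_mat = np.zeros((3, node_vec_len))
node_mat[0, 8] = 1  # oxygen
node_mat[1, 1] = 1  # first hydrogen
node_mat[2, 1] = 1  # second hydrogen

# Adjacency matrix: O is bonded to both H atoms;
# the 1s on the diagonal are self-connections
adj_mat = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
])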

Figure 2: Representation of an acetamide molecule as a graph with one-hot encodings of atomic numbers of nodes (Image by author)

Typically, artificial neural network models accept a 1-dimensional vector of inputs. For multidimensional inputs, such as images, a class of models called convolutional neural networks was developed. In our case too, we have 2-dimensional matrices as inputs and therefore need a modified network that can accept them. Graph neural networks were developed to operate on such node and adjacency matrices and convert them into appropriate 1-dimensional vectors that can then be passed through the hidden layers of a vanilla artificial neural network to generate outputs. There are many types of graph neural networks, such as graph convolutional networks, message passing networks, graph attention networks, and so forth, which primarily differ in terms of the functions that exchange information between the nodes and edges in the graph. We shall take a closer look at graph convolutional networks due to their relative simplicity.

2. Graph convolution and pooling layers

Consider the initial state of your inputs. The node matrix represents the one-hot encoding of each atom in each row. For the sake of simplicity, let us consider a one-hot encoding of atomic numbers, wherein an atom with atomic number n will have a 1 at the nᵗʰ index and 0s everywhere else. The adjacency matrix represents the connections between the nodes. In its current state, the node matrix cannot be used as an input to an artificial neural network for the following reasons: (1) it is 2-dimensional, (2) it is not permutation-invariant, and (3) it is not unique. Permutation-invariance in this case implies that the input should remain the same no matter how you order the nodes; currently, the same molecule can be represented by multiple permutations of the same node matrix (assuming an appropriate permutation in the adjacency matrix as well). This is a problem since the network would treat different permutations as different inputs, when they should be treated as the same.

There is an easy solution to the first two issues — pooling. If the node matrix is pooled along its rows (the atoms), it reduces to a 1-dimensional vector that is permutation-invariant. Typically, this is a simple mean pooling, which means that the final pooled vector contains the mean of every column of the node matrix. However, this still does not solve the third problem — pooling the node matrices of two isomers, such as n-pentane and neo-pentane, will produce the same pooled vector.
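A quick numpy sketch (with a random matrix standing in for a node matrix) shows why mean pooling is permutation-invariant:

import numpy as np

# Any node matrix: rows are atoms, columns are features
node_mat = np.random.rand(5, 10)

# Mean pooling over atoms (rows) produces a single feature vector
pooled = node_mat.mean(axis=0)

# Shuffle the atom ordering and pool again
perm = np.random.permutation(5)
pooled_permuted = node_mat[perm].mean(axis=0)

# The pooled vector is identical regardless of the atom ordering
assert np.allclose(pooled, pooled_permuted)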

To make the final pooled vectors unique, we need to incorporate some neighbor information in the node matrix. In the case of isomers, while their chemical formula is the same, their structure is not. A simple way to incorporate neighbor information is to perform some operation, such as a sum, over each node and its neighbors. This can be represented as the multiplication of the adjacency and node matrices (try it out on paper: the adjacency matrix times the node matrix produces an updated node matrix in which each node vector equals the sum of its own vector and those of its neighbors). Often, this sum is normalized by the degree (or number of neighbors) of each node by pre-multiplying with the inverse of the diagonal degree matrix, making it a mean over neighbors. Lastly, this product is post-multiplied by a weight matrix to make the operation parameterizable. This whole operation is termed a graph convolution. An intuitive and simple form of a graph convolution is shown in Figure 3. A more mathematically rigorous and numerically stable form, with a modified normalization of the adjacency matrix, was provided in Thomas Kipf and Max Welling’s work. The combination of convolution and pooling operations can also be interpreted as a non-linear form of an empirical group contribution method.

Figure 3: Graph convolution for an acetamide molecule (Image by author)
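To make the operation concrete, here is a minimal numpy sketch of one convolution step, D⁻¹ÂNW, on a small made-up graph; the weight matrix and sizes are arbitrary placeholders rather than trained values:

import numpy as np

n_atoms, in_len, out_len = 4, 10, 8

# Node matrix N and adjacency matrix with self-connections, A_hat = A + I
node_mat = np.random.rand(n_atoms, in_len)
adj_mat = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]) + np.eye(n_atoms)

# Inverse degree matrix D^(-1) turns the neighbor sum into a mean
inv_degree_mat = np.diag(1 / adj_mat.sum(axis=1))

# Weight matrix W (random placeholder here; learned by the network in practice)
weight = np.random.rand(in_len, out_len)

# One graph convolution: D^(-1) A_hat N W, followed by a non-linearity (ReLU here)
node_fea = np.maximum(inv_degree_mat @ adj_mat @ node_mat @ weight, 0)
print(node_fea.shape)  # (4, 8)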

The final structure of the graph convolutional network is as follows — first, node and adjacency matrices are calculated for a given molecule. Multiple graph convolutions are applied to these, followed by pooling to produce a single vector containing all the information regarding the molecule. This vector is then passed through the hidden layers of a standard artificial neural network to produce an output. The weights of the convolution and hidden layers are determined simultaneously through backpropagation on a regression loss function such as the mean-squared error.

3. Implementation in code

Having discussed all the key ideas related to graph convolutional networks, we are ready to start building one using PyTorch. While there exists a flexible, high-performance framework for GNNs called PyTorch Geometric, we shall not make use of it, since our goal is to look under the hood and develop our understanding.

The tutorial is split into four major subsections — (1) creating graphs in an automated fashion using RDKit, (2) packaging the graphs into a PyTorch Dataset, (3) building the graph convolutional network architecture, and (4) training the network. The complete code, along with instructions to install and import the required packages, is provided in a GitHub repository with a link at the end of the article.

3.1. Creating graphs using RDKit

RDKit is a cheminformatics library that allows high-throughput access to properties of small molecules. We will need it for two tasks — getting the atomic number of each atom in a molecule to one-hot encode the node matrix and getting the adjacency matrix. We assume that molecules are provided in terms of their SMILES strings (which is true for most cheminformatics data). Additionally, to ensure that the sizes of node and adjacency matrices are uniform across all molecules — which they would not be by default, since the sizes of both are dependent on the number of atoms in a molecule — we pad the matrices with 0s. Finally, we shall try a small modification to the convolution that we have proposed above — we will replace the “1”s in the adjacency matrix with the reciprocals of the corresponding bond lengths. This way, the network will have more information regarding the geometry of the molecule, and it will also weight the convolutions around each node based on the bond lengths of the neighbors.

class Graph:
    def __init__(
        self, molecule_smiles: str,
        node_vec_len: int,
        max_atoms: int = None
    ):
        # Store properties
        self.smiles = molecule_smiles
        self.node_vec_len = node_vec_len
        self.max_atoms = max_atoms

        # Call helper function to convert SMILES to RDKit mol
        self.smiles_to_mol()

        # If a valid mol is created, generate a graph of the mol
        if self.mol is not None:
            self.smiles_to_graph()

    def smiles_to_mol(self):
        # Use MolFromSmiles from RDKit to get molecule object
        mol = Chem.MolFromSmiles(self.smiles)

        # If a valid mol is not returned, set mol as None and exit
        if mol is None:
            self.mol = None
            return

        # Add hydrogens to molecule
        self.mol = Chem.AddHs(mol)

    def smiles_to_graph(self):
        # Get list of atoms in molecule
        atoms = self.mol.GetAtoms()

        # If max_atoms is not provided, n_atoms is the number of atoms
        # in this molecule
        if self.max_atoms is None:
            n_atoms = len(list(atoms))
        else:
            n_atoms = self.max_atoms

        # Create empty node matrix
        node_mat = np.zeros((n_atoms, self.node_vec_len))

        # Iterate over atoms and add to node matrix
        for atom in atoms:
            # Get atom index and atomic number
            atom_index = atom.GetIdx()
            atom_no = atom.GetAtomicNum()

            # Assign to node matrix
            node_mat[atom_index, atom_no] = 1

        # Get adjacency matrix using RDKit
        adj_mat = rdmolops.GetAdjacencyMatrix(self.mol)
        self.std_adj_mat = np.copy(adj_mat)

        # Get distance matrix using RDKit
        dist_mat = molDG.GetMoleculeBoundsMatrix(self.mol)
        dist_mat[dist_mat == 0.] = 1

        # Get modified adjacency matrix with inverse bond lengths
        adj_mat = adj_mat * (1 / dist_mat)

        # Pad the adjacency matrix with 0s
        dim_add = n_atoms - adj_mat.shape[0]
        adj_mat = np.pad(
            adj_mat, pad_width=((0, dim_add), (0, dim_add)), mode="constant"
        )

        # Add an identity matrix to adjacency matrix
        # This will make an atom its own neighbor
        adj_mat = adj_mat + np.eye(n_atoms)

        # Save both matrices
        self.node_mat = node_mat
        self.adj_mat = adj_mat
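As a quick sanity check (assuming the numpy and RDKit imports from the repository are in place), the class can be used on the acetamide molecule from Figure 2; the sizes below are purely illustrative:

# Acetamide (CC(=O)N) with illustrative sizes
graph = Graph("CC(=O)N", node_vec_len=60, max_atoms=10)
print(graph.node_mat.shape)  # (10, 60)
print(graph.adj_mat.shape)   # (10, 10)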

3.2. Packaging graphs in a Dataset

PyTorch provides a handy Dataset class to store and access various kinds of data. We will use that to store the node and adjacency matrices and output for each molecule. Note that it is not mandatory to use this Dataset interface to handle data; nonetheless, using this abstraction makes subsequent steps simpler. We need to define two main methods for our class GraphData that inherits from the Dataset class: a __len__ method to get the size of the dataset and crucially a __getitem__ method to fetch the input and output for a given index.

class GraphData(Dataset):
    def __init__(self, dataset_path: str, node_vec_len: int, max_atoms: int):
        # Save attributes
        self.node_vec_len = node_vec_len
        self.max_atoms = max_atoms

        # Open dataset file
        df = pd.read_csv(dataset_path)

        # Create lists
        self.indices = df.index.to_list()
        self.smiles = df["smiles"].to_list()
        self.outputs = df["measured log solubility in mols per litre"].to_list()

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i: int):
        # Get smile
        smile = self.smiles[i]

        # Create MolGraph object using the Graph abstraction
        mol = Graph(smile, self.node_vec_len, self.max_atoms)

        # Get node and adjacency matrices
        node_mat = torch.Tensor(mol.node_mat)
        adj_mat = torch.Tensor(mol.adj_mat)

        # Get output
        output = torch.Tensor([self.outputs[i]])

        return (node_mat, adj_mat), output, smile

Since we have defined our own customized manner of returning the node and adjacency matrices, outputs, and SMILES strings, we need a custom function to collate the data, that is, to package it into a batch that is then passed on to the network. Training neural networks on batches of data, rather than individual datapoints, with mini-batch gradient descent provides a delicate balance between accuracy and compute efficiency. The collate function defined below collects the individual data objects, separates them into their components, stacks them into lists, converts these into PyTorch tensors, and recombines the tensors so that they are returned in the same format as our GraphData class.

def collate_graph_dataset(dataset: Dataset):
    # Create empty lists of node and adjacency matrices, outputs, and smiles
    node_mats = []
    adj_mats = []
    outputs = []
    smiles = []

    # Iterate over the dataset and assign each component to the correct list
    for i in range(len(dataset)):
        (node_mat, adj_mat), output, smile = dataset[i]
        node_mats.append(node_mat)
        adj_mats.append(adj_mat)
        outputs.append(output)
        smiles.append(smile)

    # Create tensors
    node_mats_tensor = torch.cat(node_mats, dim=0)
    adj_mats_tensor = torch.cat(adj_mats, dim=0)
    outputs_tensor = torch.stack(outputs, dim=0)

    # Return tensors
    return (node_mats_tensor, adj_mats_tensor), outputs_tensor, smiles

3.3. Building the graph convolutional network architecture

Having completed the data processing aspects of the code, we now turn towards building the model itself. We shall build our own convolution and pooling layers for the sake of perspicuity, but more advanced developers among you can easily swap these out with more complex, pre-defined layers from the PyTorch Geometric module. The ConvolutionLayer essentially does three things — (1) calculation of the inverse diagonal degree matrix from the adjacency matrix, (2) multiplication of the four matrices (D⁻¹ANW), and (3) application of a non-linear activation function to the layer output. As with other PyTorch models, we will inherit from the nn.Module base class and define the layer’s behavior in its forward method.

class ConvolutionLayer(nn.Module):
    def __init__(self, node_in_len: int, node_out_len: int):
        # Call constructor of base class
        super().__init__()

        # Create linear layer for node matrix
        self.conv_linear = nn.Linear(node_in_len, node_out_len)

        # Create activation function
        self.conv_activation = nn.LeakyReLU()

    def forward(self, node_mat, adj_mat):
        # Calculate number of neighbors
        n_neighbors = adj_mat.sum(dim=-1, keepdim=True)
        # Create identity tensor
        self.idx_mat = torch.eye(
            adj_mat.shape[-2], adj_mat.shape[-1], device=n_neighbors.device
        )
        # Add new (batch) dimension and expand
        idx_mat = self.idx_mat.unsqueeze(0).expand(*adj_mat.shape)
        # Get inverse degree matrix
        inv_degree_mat = torch.mul(idx_mat, 1 / n_neighbors)

        # Perform matrix multiplication: D^(-1)AN
        node_fea = torch.bmm(inv_degree_mat, adj_mat)
        node_fea = torch.bmm(node_fea, node_mat)

        # Perform linear transformation to node features
        # (multiplication with W)
        node_fea = self.conv_linear(node_fea)

        # Apply activation
        node_fea = self.conv_activation(node_fea)

        return node_fea

Next, let us construct the PoolingLayer. This layer only performs one operation, that is, a mean along the second dimension (number of nodes).

class PoolingLayer(nn.Module):
    def __init__(self):
        # Call constructor of base class
        super().__init__()

    def forward(self, node_fea):
        # Pool the node matrix
        pooled_node_fea = node_fea.mean(dim=1)
        return pooled_node_fea

Finally, we shall create the ChemGCN class containing the definitions of the convolution, pooling, and hidden layers. Typically, this class should have a constructor that defines the structure and ordering of each of these layers, and a forward method that accepts the input (in our case, the node and adjacency matrices) and produces the output. We will apply the LeakyReLU activation function to all of the layer outputs. We shall also use dropout to mitigate overfitting.

class ChemGCN(nn.Module):
    def __init__(
        self,
        node_vec_len: int,
        node_fea_len: int,
        hidden_fea_len: int,
        n_conv: int,
        n_hidden: int,
        n_outputs: int,
        p_dropout: float = 0.0,
    ):
        # Call constructor of base class
        super().__init__()

        # Define layers
        # Initial transformation from node matrix to node features
        self.init_transform = nn.Linear(node_vec_len, node_fea_len)

        # Convolution layers
        self.conv_layers = nn.ModuleList(
            [
                ConvolutionLayer(
                    node_in_len=node_fea_len,
                    node_out_len=node_fea_len,
                )
                for i in range(n_conv)
            ]
        )

        # Pool convolution outputs
        self.pooling = PoolingLayer()
        pooled_node_fea_len = node_fea_len

        # Pooling activation
        self.pooling_activation = nn.LeakyReLU()

        # From pooled vector to hidden layers
        self.pooled_to_hidden = nn.Linear(pooled_node_fea_len, hidden_fea_len)

        # Hidden layer
        self.hidden_layer = nn.Linear(hidden_fea_len, hidden_fea_len)

        # Hidden layer activation function
        self.hidden_activation = nn.LeakyReLU()

        # Hidden layer dropout
        self.dropout = nn.Dropout(p=p_dropout)

        # If hidden layers more than 1, add more hidden layers
        self.n_hidden = n_hidden
        if self.n_hidden > 1:
            self.hidden_layers = nn.ModuleList(
                [self.hidden_layer for _ in range(n_hidden - 1)]
            )
            self.hidden_activation_layers = nn.ModuleList(
                [self.hidden_activation for _ in range(n_hidden - 1)]
            )
            self.hidden_dropout_layers = nn.ModuleList(
                [self.dropout for _ in range(n_hidden - 1)]
            )

        # Final layer going to the output
        self.hidden_to_output = nn.Linear(hidden_fea_len, n_outputs)

    def forward(self, node_mat, adj_mat):
        # Perform initial transform on node_mat
        node_fea = self.init_transform(node_mat)

        # Perform convolutions
        for conv in self.conv_layers:
            node_fea = conv(node_fea, adj_mat)

        # Perform pooling
        pooled_node_fea = self.pooling(node_fea)
        pooled_node_fea = self.pooling_activation(pooled_node_fea)

        # First hidden layer
        hidden_node_fea = self.pooled_to_hidden(pooled_node_fea)
        hidden_node_fea = self.hidden_activation(hidden_node_fea)
        hidden_node_fea = self.dropout(hidden_node_fea)

        # Subsequent hidden layers
        if self.n_hidden > 1:
            for i in range(self.n_hidden - 1):
                hidden_node_fea = self.hidden_layers[i](hidden_node_fea)
                hidden_node_fea = self.hidden_activation_layers[i](hidden_node_fea)
                hidden_node_fea = self.hidden_dropout_layers[i](hidden_node_fea)

        # Output
        out = self.hidden_to_output(hidden_node_fea)

        return out
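As a quick, self-contained sanity check (with arbitrary sizes chosen purely for illustration), the model maps a batch of node and adjacency matrices to one output per molecule:

import torch

# Hypothetical sizes: batch of 8 molecules, up to 10 atoms, 60-length node vectors
model = ChemGCN(node_vec_len=60, node_fea_len=60, hidden_fea_len=60,
                n_conv=4, n_hidden=2, n_outputs=1, p_dropout=0.1)
node_mat = torch.randn(8, 10, 60)  # (batch, max_atoms, node_vec_len)
adj_mat = torch.rand(8, 10, 10)    # (batch, max_atoms, max_atoms)
out = model(node_mat, adj_mat)
print(out.shape)  # torch.Size([8, 1])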

3.4. Training the network

We have now built all the tools required to train our model and make predictions. In this section, we shall write helper functions to train and test the model, and a script that runs the overall workflow: making graphs, building the network, and training the model.

First, we shall define a Standardizer class to standardize our outputs. Neural networks like to deal with relatively small numbers that do not vary wildly from each other. Standardization helps with that.

class Standardizer:
    def __init__(self, X):
        self.mean = torch.mean(X)
        self.std = torch.std(X)

    def standardize(self, X):
        Z = (X - self.mean) / (self.std)
        return Z

    def restore(self, Z):
        X = self.mean + Z * self.std
        return X

    def state(self):
        return {"mean": self.mean, "std": self.std}

    def load(self, state):
        self.mean = state["mean"]
        self.std = state["std"]

Second, we define a function to perform the following steps per epoch:

1. Unpack inputs and outputs from the data loader and transfer them to the GPU (if available).
2. Pass the inputs through the network and get predictions.
3. Calculate the mean-squared loss between the predictions and outputs.
4. Perform backpropagation and update the weights of the network.
5. Repeat the above steps for the other batches.

The function returns the batch-averaged loss and mean absolute error, which can be used to plot a loss curve. A similar function without the backpropagation steps is used to test the model (a sketch is shown after the training function below).

def train_model(
    epoch,
    model,
    training_dataloader,
    optimizer,
    loss_fn,
    standardizer,
    use_GPU,
    max_atoms,
    node_vec_len,
):
    # Create variables to store losses and error
    avg_loss = 0
    avg_mae = 0
    count = 0

    # Switch model to train mode
    model.train()

    # Go over each batch in the dataloader
    for i, dataset in enumerate(training_dataloader):
        # Unpack data
        node_mat = dataset[0][0]
        adj_mat = dataset[0][1]
        output = dataset[1]

        # Reshape inputs
        first_dim = int((torch.numel(node_mat)) / (max_atoms * node_vec_len))
        node_mat = node_mat.reshape(first_dim, max_atoms, node_vec_len)
        adj_mat = adj_mat.reshape(first_dim, max_atoms, max_atoms)

        # Standardize output
        output_std = standardizer.standardize(output)

        # Package inputs and outputs; check if GPU is enabled
        if use_GPU:
            nn_input = (node_mat.cuda(), adj_mat.cuda())
            nn_output = output_std.cuda()
        else:
            nn_input = (node_mat, adj_mat)
            nn_output = output_std

        # Compute output from network
        nn_prediction = model(*nn_input)

        # Calculate loss
        loss = loss_fn(nn_output, nn_prediction)
        avg_loss += loss

        # Calculate MAE
        prediction = standardizer.restore(nn_prediction.detach().cpu())
        mae = mean_absolute_error(output, prediction)
        avg_mae += mae

        # Set zero gradients for all tensors
        optimizer.zero_grad()

        # Do backward prop
        loss.backward()

        # Update optimizer parameters
        optimizer.step()

        # Increase count
        count += 1

    # Calculate avg loss and MAE
    avg_loss = avg_loss / count
    avg_mae = avg_mae / count

    # Print stats
    print(
        "Epoch: [{0}]\tTraining Loss: [{1:.2f}]\tTraining MAE: [{2:.2f}]".format(
            epoch, avg_loss, avg_mae
        )
    )

    # Return loss and MAE
    return avg_loss, avg_mae
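The test_model function referenced above is not reproduced here; a minimal sketch, consistent with train_model and with the call made in the workflow script below (the repository version may differ in its details), could look like this:

def test_model(
    model,
    test_dataloader,
    loss_fn,
    standardizer,
    use_GPU,
    max_atoms,
    node_vec_len,
):
    # Create variables to store losses and error
    test_loss = 0
    test_mae = 0
    count = 0

    # Switch model to evaluation mode
    model.eval()

    # Go over each batch in the dataloader without tracking gradients
    with torch.no_grad():
        for i, dataset in enumerate(test_dataloader):
            # Unpack data and reshape inputs as in train_model
            node_mat = dataset[0][0]
            adj_mat = dataset[0][1]
            output = dataset[1]
            first_dim = int(torch.numel(node_mat) / (max_atoms * node_vec_len))
            node_mat = node_mat.reshape(first_dim, max_atoms, node_vec_len)
            adj_mat = adj_mat.reshape(first_dim, max_atoms, max_atoms)

            # Standardize output
            output_std = standardizer.standardize(output)

            # Package inputs and outputs; check if GPU is enabled
            if use_GPU:
                nn_input = (node_mat.cuda(), adj_mat.cuda())
                nn_output = output_std.cuda()
            else:
                nn_input = (node_mat, adj_mat)
                nn_output = output_std

            # Compute predictions and loss
            nn_prediction = model(*nn_input)
            loss = loss_fn(nn_output, nn_prediction)
            test_loss += loss

            # Calculate MAE in the original (unstandardized) units
            prediction = standardizer.restore(nn_prediction.detach().cpu())
            mae = mean_absolute_error(output, prediction)
            test_mae += mae

            # Increase count
            count += 1

    # Average the loss and MAE over batches and return
    test_loss = test_loss / count
    test_mae = test_mae / count
    return test_loss, test_mae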

Finally, let us write the overall workflow. This script will call everything we have defined above.

#### Fix seeds
np.random.seed(0)
torch.manual_seed(0)
use_GPU = torch.cuda.is_available()

#### Inputs
max_atoms = 200
node_vec_len = 60
train_size = 0.7
batch_size = 32
hidden_nodes = 60
n_conv_layers = 4
n_hidden_layers = 2
learning_rate = 0.01
n_epochs = 50

#### Start by creating dataset
main_path = Path(__file__).resolve().parent
data_path = main_path / "data" / "solubility_data.csv"
dataset = GraphData(dataset_path=data_path, max_atoms=max_atoms,
                    node_vec_len=node_vec_len)

#### Split data into training and test sets
# Get train and test sizes
dataset_indices = np.arange(0, len(dataset), 1)
train_size = int(np.round(train_size * len(dataset)))
test_size = len(dataset) - train_size

# Randomly sample train and test indices
train_indices = np.random.choice(dataset_indices, size=train_size,
                                 replace=False)
test_indices = np.array(list(set(dataset_indices) - set(train_indices)))

# Create dataloaders
train_sampler = SubsetRandomSampler(train_indices)
test_sampler = SubsetRandomSampler(test_indices)
train_loader = DataLoader(dataset, batch_size=batch_size,
                          sampler=train_sampler,
                          collate_fn=collate_graph_dataset)
test_loader = DataLoader(dataset, batch_size=batch_size,
                         sampler=test_sampler,
                         collate_fn=collate_graph_dataset)

#### Initialize model, standardizer, optimizer, and loss function
# Model
model = ChemGCN(node_vec_len=node_vec_len, node_fea_len=hidden_nodes,
                hidden_fea_len=hidden_nodes, n_conv=n_conv_layers,
                n_hidden=n_hidden_layers, n_outputs=1, p_dropout=0.1)
# Transfer to GPU if needed
if use_GPU:
    model.cuda()

# Standardizer
outputs = [dataset[i][1] for i in range(len(dataset))]
standardizer = Standardizer(torch.Tensor(outputs))

# Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Loss function
loss_fn = torch.nn.MSELoss()

#### Train the model
loss = []
mae = []
epoch = []
for i in range(n_epochs):
    epoch_loss, epoch_mae = train_model(
        i,
        model,
        train_loader,
        optimizer,
        loss_fn,
        standardizer,
        use_GPU,
        max_atoms,
        node_vec_len,
    )
    loss.append(epoch_loss)
    mae.append(epoch_mae)
    epoch.append(i)

#### Test the model
# Call test model function
test_loss, test_mae = test_model(model, test_loader, loss_fn, standardizer,
                                 use_GPU, max_atoms, node_vec_len)

#### Print final results
print(f"Training Loss: {loss[-1]:.2f}")
print(f"Training MAE: {mae[-1]:.2f}")
print(f"Test Loss: {test_loss:.2f}")
print(f"Test MAE: {test_mae:.2f}")

That’s it! Running this script should output the training and test losses and errors.

4. Results

A network with the given architecture and hyperparameters was trained on the solubility dataset from the open-source DeepChem repository, which contains water solubilities of ~1000 small molecules. The figure below shows the training loss curve and the parity plot for the test set for one particular train-test split. The mean absolute errors on the training and test sets are 0.59 and 0.58 respectively (in log mol/l), lower than the 0.69 log mol/l of a linear model (based on predictions present in the dataset). It is no surprise that a neural network performs better than a linear regression model; nonetheless, this cursory comparison reassures us that the predictions made by our model are reasonable. Further, we accomplished this by incorporating only basic structural descriptors in the graphs — atomic numbers and bond lengths — and letting the convolution and pooling operations build the more complex relationships between them that lead to accurate predictions of the molecular property.

Figure 4: Training loss curve (left) and parity plot (right) for the test set (Image by author)

5. Final remarks

This is by no means a definitive model for the chosen problem. There are many ways to improve the model including:

- optimizing the hyperparameters
- using an early-stopping strategy to find a model with the lowest validation loss
- using more complex convolution and pooling functions
- collecting more data

Nevertheless, the goal of this tutorial was to expound on the fundamentals of graph convolutional networks for chemistry through a simple example. Having acquainted yourself with the basics, the sky is the limit in your GCN model-building journey.

Repository and helpful references

- The complete code (including scripts to create figures) is provided in a GitHub repository. Instructions to install the requisite modules are also provided there. The dataset used to train the model is from the open-source DeepChem repository, which is under the MIT license (commercial use permitted). The raw dataset filename in the repository is delaney_processed.csv.
- Review on string representations of molecules.
- Research article on graph convolutional networks. The convolution function presented in this article is a simpler and more intuitive form of the function given in that paper.
- Research article on message passing neural networks. These are more generalized and expressive graph neural networks. It can be shown that a graph convolutional network is a message passing neural network with a specific type of message function.
- Online book on deep learning for molecules. This is a great resource for learning the basics of deep learning for chemistry and applying your learnings through hands-on coding exercises.

Building A Graph Convolutional Network for Molecular Property Prediction was originally published in Towards Data Science on Medium.
