Configuration¶
The CGNN program is a Python script, and you can run it with a basic configuration as follows:
python ${CGNN_HOME}/src/cgnn.py \
--num_epochs 100 \
--batch_size 512 \
--lr 0.001 \
--n_node_feat ${NodeFeatures} \
--n_hidden_feat 64 \
--n_graph_feat 128 \
--n_conv 3 \
--n_fc 2 \
--dataset_path ${DATASET} \
--split_file ${DATASET}/split.json \
--target_name formation_energy_per_atom \
--milestones 80 \
--gamma 0.1 \
You can configure your CGNN model and training strategy using the following options:
Device¶
--device
String (Default: cuda)
This string value must be cpu or cuda. If no CUDA device is available, the CPU device will be used.
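For example, to force CPU execution even when a CUDA device is available:
--device cpu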
Features¶
Node Features¶
--n_node_feat
Integer (Default: 4)
This integer value is the number of node features, d_{v}. If one-hot encoding is used, it equals the number of node species, K.
Attention
The value must be equal to the size of the node vectors defined in the configuration file in the database directory (config.json).
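For example, if config.json defines one-hot node vectors over 89 node species (an illustrative number), the matching setting is:
--n_node_feat 89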
Hidden Features¶
--n_hidden_feat
Integer (Default: 16)
This integer value is the number of features of the hidden states, d_{h}.
Graph Features¶
--n_graph_feat
Integer (Default: 32)
This integer value is the number of features of the graph states, d_{g}.
EdgeNet Features¶
--n_edge_net_feat
Integer (Default: 16)
This integer value is the number of features for the EdgeNet layers, d_{e}.
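For example, the three state sizes can be set together as follows (illustrative values):
--n_hidden_feat 64 \
--n_graph_feat 128 \
--n_edge_net_feat 24 \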
Convolution¶
Convolution Blocks¶
--n_conv
Integer (Default: 3)
This integer value is the number of convolution blocks, T.
Node-Level Activation¶
--node_activation
String (Default: none)
This string value is the name of the activation function applied at the end of each convolution block.
Node-Level Batch Normalization¶
--use_node_batch_norm
If this option is used, batch normalization is applied before the node-level activation.
Edge-Level Activation¶
--edge_activation
String (Default: none)
This string value is the name of the activation function used in the gated convolutions.
Edge-Level Batch Normalization¶
--use_edge_batch_norm
If this option is used, batch normalization is applied before the sigmoid function and the edge-level activation in the convolution.
Convolution Type¶
--conv_type
Integer (Default: 0)
If this value is greater than 0, an alternative form of the gated convolution is used instead of the default form, where h_{j}^{\rm in} and e_{ij}^{\rm in} are the input and the output of the EdgeNet, respectively.
Graph-Level MFCNet¶
Graph-Level Layers¶
--n_fc
Integer (Default: 2)
This integer value is the number of layers for the graph-level MFCNet, L_{g}.
Graph-Level Activation¶
--activation
String (Default: softplus)
This string value is the name of the activation function used in the graph-level fully connected layers and the pooling layer.
Graph-Level Batch Normalization¶
--use_batch_norm
If this option is used, batch normalization is applied before the graph-level activation, and the default bias terms are removed except for the one in the linear regression.
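For example, a two-layer graph-level MFCNet with the shifted-softplus activation and batch normalization (illustrative choices) is configured as:
--n_fc 2 \
--activation SSP \
--use_batch_norm \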
EdgeNet¶
Warning
When using an original or aggregate EdgeNet, the hidden and edge state sizes are practically limited to small numbers (e.g., d_{h}=16 and d_{e}=24) because the bilinear transformation used in the EdgeNet is an extremely time-consuming process.
Example
The original EdgeResNet with 2 layers is given by the following configuration:
--n_edge_net_layers 2 \
--use_edge_net_shortcut \
EdgeNet Layers¶
--n_edge_net_layers
Integer (Default: 0)
This integer value is the number of EdgeNet layers, L_{e}.
EdgeNet Activation¶
--edge_net_activation
String (Default: elu)
This string value is the name of the activation function used in the EdgeNet layers.
EdgeNet Batch Normalization¶
--use_edge_net_batch_norm
If this option is used, batch normalization is applied before the EdgeNet activation.
Shortcut Option¶
--use_edge_net_shortcut
If this option is used, the EdgeResNet is employed.
Fast EdgeNet¶
Example
The CGNN paper uses the fast EdgeResNet with only a single layer given by the following configuration:
--n_edge_net_layers 1 \
--use_fast_edge_network \
--fast_edge_network_type 1 \
--use_edge_net_shortcut \
Fast EdgeNet Option¶
--use_fast_edge_network
If this option is used, one of the two fast EdgeNet variants is used in the convolution blocks.
Fast EdgeNet Type¶
--fast_edge_network_type
Integer (Default: 0)
If this value is 0, the original fast EdgeNet is used; otherwise, the modified one is used.
Aggregate EdgeNet¶
Example
The aggregate EdgeResNet with 2 layers for C=12 and d_{b}=2 is given by the following configuration:
--n_edge_net_layers 2 \
--use_aggregated_edge_network \
--edge_net_cardinality 12 \
--edge_net_width 2 \
--use_edge_net_shortcut \
Aggregate EdgeNet Option¶
--use_aggregated_edge_network
If this option is used, the aggregate EdgeNet is employed in the convolution blocks.
EdgeNet Cardinality¶
--edge_net_cardinality
Integer (Default: 32)
The integer value is the number of aggregated transformations (cardinality), C.
EdgeNet Width¶
--edge_net_width
Integer (Default: 4)
The integer value is the feature size for all the bilinear transformations in the aggregate EdgeNet, d_{b}.
Convolution-Block MFCNet¶
Example
In the CGNN paper, the following configuration is used for the default convolution-block MFCNet.
--n_postconv_net_layers 2 \
--use_postconv_net_batch_norm \
CB-MFCNet Layers¶
--n_postconv_net_layers
Integer (Default: 0)
This integer value is the number of layers for the convolution-block MFCNet, L_{c}.
CB-MFCNet Activation¶
--postconv_net_activation
String (Default: elu)
This string value is the name of the activation function used in the convolution-block MFCNet layers.
CB-MFCNet Batch Normalization¶
--use_postconv_net_batch_norm
If this option is used, batch normalization is applied before the activation in every layer of the convolution-block MFCNet.
Bias Terms¶
Note
The following bias terms are not included in the models used in the CGNN paper.
Convolution Bias¶
--conv_bias
If this option is used, a bias term is added to every linear transformation in the gated convolution.
EdgeNet Bias¶
--edge_net_bias
If this option is used, a bias term is added to every bilinear transformation in the original and aggregate EdgeNet, and to every linear transformation in the fast and aggregate EdgeNet.
CB-MFCNet Bias¶
--postconv_net_bias
If this option is used, a bias term is added to the linear transformation in every CB-MFCNet layer.
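For example, to enable all of these optional bias terms at once:
--conv_bias \
--edge_net_bias \
--postconv_net_bias \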
Pooling¶
Example
In the CGNN paper, the following configuration is used for the full and gated pooling.
--full_pooling \
--gated_pooling \
Full Pooling¶
--full_pooling
If this option is used, the pooling layer uses all outputs of the convolution blocks to produce a graph-level state.
Gated Pooling¶
--gated_pooling
If this option is used, the pooling layer employs the gating mechanism.
Optimization¶
Batch Size¶
--batch_size
Integer (Default: 8)
The integer value is the mini-batch size for the stochastic optimization.
Optimization Methods¶
--optim
String (Default: adam)
This string value is the name of the optimizer, which must be one of sgd, adam, and amsgrad. sgd is the stochastic gradient descent with the Nesterov momentum (momentum factor = 0.9). adam and amsgrad use the standard parameters \beta_{1}=0.9, \beta_{2}=0.999, and \epsilon=10^{-8}.
Learning Rate¶
--lr
Float (Default: 10^{-3})
This floating-point value is the learning rate for the stochastic optimization.
Weight Decay¶
--weight_decay
Float (Default: 0)
This floating-point value is the weight decay for the stochastic optimization (i.e., the L2 regularization).
Gradient Clipping¶
--clip_value
Float (Default: 0)
This floating-point value is used for gradient clipping.
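For example, an SGD setup with weight decay and gradient clipping (illustrative values) looks like:
--optim sgd \
--lr 0.01 \
--weight_decay 1e-4 \
--clip_value 1.0 \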
Milestones¶
--milestones
Integer [Integer ...] (Default: 10)
This integer sequence M_{1}, M_{2}, \ldots, M_{n} must satisfy the condition M_{i} < M_{i+1}. At each milestone epoch M_{i}, the learning rate is multiplied by \gamma. If the first value M_{1} is negative, its absolute value is used as the step size for the step LR scheduler.
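For example, the following illustrative schedule multiplies the learning rate by \gamma = 0.5 at epochs 150 and 225 (see --gamma below):
--milestones 150 225 --gamma 0.5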
Learning Rate Decay¶
--gamma
Float (Default: 0.1)
This floating-point value is the learning rate decay, i.e., the \gamma value of the LR scheduler. For example, to use the step LR scheduler:
--milestones -2 --gamma 0.98
This sets the step size to 2 epochs and the \gamma value to 0.98.
Cosine Annealing¶
--cosine_annealing
If this option is used, the cosine annealing scheduler is employed.
\eta_{min} and T_{max} are set by the options --gamma and --milestones, respectively. \eta_{max} is the learning rate set by the option --lr.
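For example, under the mapping described above, the following illustrative setting runs cosine annealing with \eta_{max}=10^{-3}, \eta_{min}=10^{-6}, and T_{max}=100:
--cosine_annealing \
--lr 0.001 \
--milestones 100 \
--gamma 1e-6 \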
Epochs¶
--num_epochs
Integer (Default: 5)
This integer value is the total number of epochs for the stochastic optimization.
Dataset¶
Dataset Path¶
--dataset_path
String
This string value must be a path to the directory containing the dataset files config.json, graph_data.npz, and targets.csv.
Target¶
--target_name
String
This string value must be one of the target names in the header of the target file targets.csv.
Dataset Splitting¶
--split_file
String
This string value must be a path to a split file, split.json.
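For example, assuming targets.csv contains a hypothetical band_gap column, the dataset options could be given as:
--dataset_path ${DATASET} \
--split_file ${DATASET}/split.json \
--target_name band_gap \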
Workers¶
--num_workers
Integer (Default: 0)
This value should be 0.
Random Seed¶
--seed
Integer (Default: 12345)
This value is the seed of the random number generator for PyTorch.
Loading Model¶
--load_model
If this option is used, the initial model weights are loaded from a model file model.pth in the current directory.
Extension¶
--use_extension
If this option is used, the extension layer multiplies the output of the regression layer by the number of nodes. This is usually used for extensive properties.
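For example, assuming targets.csv contains a hypothetical total_energy column holding an extensive property, the extension layer would be enabled as:
--target_name total_energy \
--use_extension \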
Activation Functions¶
The following keywords can be used as activation names.
Softplus¶
Keyword: Softplus
\text{Softplus}(x) = \log(1+\exp(x))
Shifted Softplus¶
Keyword: SSP
\text{SSP}(x) = \text{Softplus}(x) - \text{Softplus}(0)
Exponential Linear Units (ELU)¶
Keyword: ELU or 'ELU(alpha)' (default alpha = 1.0)
\text{ELU}(x) = \max(0,x) + \min(0, \alpha (\exp(x) - 1))
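A parameterized keyword contains parentheses, so it should be quoted on the command line; for example, an ELU with \alpha = 0.5 (an illustrative value) as the edge-level activation:
--edge_activation 'ELU(0.5)'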
Rectified Linear Units (ReLU)¶
Keyword: ReLU
\text{ReLU}(x) = \max(0,x)
Scaled Exponential Linear Units (SELU)¶
Keyword: SELU
\text{SELU}(x) = \lambda (\max(0,x)+\min(0,\alpha (\exp(x)-1)))
Info
Ref: "Self-Normalizing Neural Networks" arXiv
Continuously Differentiable Exponential Linear Units (CELU)¶
Keyword: CELU or 'CELU(alpha)' (default alpha = 1.0)
\text{CELU}(x)=\max(0,x)+\min(0,\alpha (\exp(x/\alpha)-1))
Info
Ref: "Continuously Differentiable Exponential Linear Units" arXiv
The Identity Activation¶
Keyword: None
\text{Identity}(x) = x
This is unavailable for --activation and --postconv_net_activation.