Configuration¶
The CGNN program is a Python script, and you can run it with a basic configuration as follows:
python ${CGNN_HOME}/src/cgnn.py \
--num_epochs 100 \
--batch_size 512 \
--lr 0.001 \
--n_node_feat ${NodeFeatures} \
--n_hidden_feat 64 \
--n_graph_feat 128 \
--n_conv 3 \
--n_fc 2 \
--dataset_path ${DATASET} \
--split_file ${DATASET}/split.json \
--target_name formation_energy_per_atom \
--milestones 80 \
--gamma 0.1 \
You can configure your CGNN model and training strategy using the following options:
Device¶
--device
String (Default: cuda)
This string value must be cpu or cuda. If no CUDA device is available, the CPU device will be used.
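For example, to force CPU execution even when a CUDA device is available:
--device cpu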
Features¶
Node Features¶
--n_node_feat
Integer (Default: 4)
This integer value is the number of node features, d_{v}. If one-hot encoding is used, it equals the number of node species, K.
Attention
The value must be equal to the size of the node vectors defined in the configuration file in the database directory (config.json).
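For example, if config.json defines one-hot node vectors over 89 node species (an illustrative number), the matching setting is:
--n_node_feat 89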
Hidden Features¶
--n_hidden_feat
Integer (Default: 16)
This integer value is the number of features of the hidden states, d_{h}.
Graph Features¶
--n_graph_feat
Integer (Default: 32)
This integer value is the number of features of the graph states, d_{g}.
EdgeNet Features¶
--n_edge_net_feat
Integer (Default: 16)
This integer value is the number of features for the EdgeNet layers, d_{e}.
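For example, the three state sizes can be set together as follows (illustrative values):
--n_hidden_feat 64 \
--n_graph_feat 128 \
--n_edge_net_feat 24 \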
Convolution¶
Convolution Blocks¶
--n_conv
Integer (Default: 3)
This integer value is the number of convolution blocks, T.
Node-Level Activation¶
--node_activation
String (Default: none)
This string value is the name of the activation function applied at the end of each convolution block.
Node-Level Batch Normalization¶
--use_node_batch_norm
If this option is used, batch normalization is applied before the node-level activation.
Edge-Level Activation¶
--edge_activation
String (Default: none)
This string value is the name of the activation function used in the gated convolutions.
Edge-Level Batch Normalization¶
--use_edge_batch_norm
If this option is used, batch normalization is applied before the sigmoid function and the edge-level activation in the convolution.
Convolution Type¶
--conv_type
Integer (Default: 0)
If this value is greater than 0, an alternative form of the gated convolution is used instead of the default form, where h_{j}^{\rm in} and e_{ij}^{\rm in} are the input and the output of the EdgeNet, respectively.
Graph-Level MFCNet¶
Graph-Level Layers¶
--n_fc
Integer (Default: 2)
This integer value is the number of layers for the graph-level MFCNet, L_{g}.
Graph-Level Activation¶
--activation
String (Default: softplus)
This string value is the name of the activation function used in the graph-level fully connected layers and the pooling layer.
Graph-Level Batch Normalization¶
--use_batch_norm
If this option is used, batch normalization is applied before the graph-level activation, and the default bias terms are removed except for the one in the linear regression.
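For example, a two-layer graph-level MFCNet with the shifted-softplus activation and batch normalization (illustrative choices) is configured as:
--n_fc 2 \
--activation SSP \
--use_batch_norm \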
EdgeNet¶
Warning
When using an original or aggregate EdgeNet, the hidden and edge state sizes are practically limited to small numbers (e.g., d_{h}=16 and d_{e}=24) because the bilinear transformation used in the EdgeNet is an extremely time-consuming process.
Example
The original EdgeResNet with 2 layers is given by the following configuration:
--n_edge_net_layers 2 \
--use_edge_net_shortcut \
EdgeNet Layers¶
--n_edge_net_layers
Integer (Default: 0)
This integer value is the number of EdgeNet layers, L_{e}.
EdgeNet Activation¶
--edge_net_activation
String (Default: elu)
This string value is the name of the activation function used in the EdgeNet layers.
EdgeNet Batch Normalization¶
--use_edge_net_batch_norm
If this option is used, batch normalization is applied before the EdgeNet activation.
Shortcut Option¶
--use_edge_net_shortcut
If this option is used, the EdgeResNet is employed.
Fast EdgeNet¶
Example
The CGNN paper uses the fast EdgeResNet with only a single layer given by the following configuration:
--n_edge_net_layers 1 \
--use_fast_edge_network \
--fast_edge_network_type 1 \
--use_edge_net_shortcut \
Fast EdgeNet Option¶
--use_fast_edge_network
If this option is used, one of the two fast EdgeNet variants is used in the convolution blocks.
Fast EdgeNet Type¶
--fast_edge_network_type
Integer (Default: 0)
If this value is 0, the original fast EdgeNet is used; otherwise, the modified one is used.
Aggregate EdgeNet¶
Example
The aggregate EdgeResNet with 2 layers for C=12 and d_{b}=2 is given by the following configuration:
--n_edge_net_layers 2 \
--use_aggregated_edge_network \
--edge_net_cardinality 12 \
--edge_net_width 2 \
--use_edge_net_shortcut \
Aggregate EdgeNet Option¶
--use_aggregated_edge_network
If this option is used, the aggregate EdgeNet is employed in the convolution blocks.
EdgeNet Cardinality¶
--edge_net_cardinality
Integer (Default: 32)
The integer value is the number of aggregated transformations (cardinality), C.
EdgeNet Width¶
--edge_net_width
Integer (Default: 4)
The integer value is the feature size for all the bilinear transformations in the aggregate EdgeNet, d_{b}.
Convolution-Block MFCNet¶
Example
In the CGNN paper, the following configuration is used for the default convolution-block MFCNet.
--n_postconv_net_layers 2 \
--use_postconv_net_batch_norm \
CB-MFCNet Layers¶
--n_postconv_net_layers
Integer (Default: 0)
This integer value is the number of layers for the convolution-block MFCNet, L_{c}.
CB-MFCNet Activation¶
--postconv_net_activation
String (Default: elu)
This string value is the name of the activation function used in the convolution-block MFCNet layers.
CB-MFCNet Batch Normalization¶
--use_postconv_net_batch_norm
If this option is used, batch normalization is applied before the activation in every layer of the convolution-block MFCNet.
Bias Terms¶
Note
The following bias terms are not included in the models used in the CGNN paper.
Convolution Bias¶
--conv_bias
If this option is used, a bias term is added to every linear transformation in the gated convolution.
EdgeNet Bias¶
--edge_net_bias
If this option is used, a bias term is added to every bilinear transformation in the original and aggregate EdgeNet, and to every linear transformation in the fast and aggregate EdgeNet.
CB-MFCNet Bias¶
--postconv_net_bias
If this option is used, a bias term is added to the linear transformation in every CB-MFCNet layer.
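For example, to enable all of these optional bias terms at once:
--conv_bias \
--edge_net_bias \
--postconv_net_bias \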
Pooling¶
Example
In the CGNN paper, the following configuration is used for the full and gated pooling.
--full_pooling \
--gated_pooling \
Full Pooling¶
--full_pooling
If this option is used, the pooling layer uses all outputs of the convolution blocks to produce a graph-level state.
Gated Pooling¶
--gated_pooling
If this option is used, the pooling layer employs the gating mechanism.
Optimization¶
Batch Size¶
--batch_size
Integer (Default: 8)
The integer value is the mini-batch size for the stochastic optimization.
Optimization Methods¶
--optim
String (Default: adam)
This string value is the name of the optimizer, which must be one of sgd, adam, and amsgrad. sgd is the stochastic gradient descent with the Nesterov momentum (momentum factor = 0.9). adam and amsgrad use the standard parameters \beta_{1}=0.9, \beta_{2}=0.999, and \epsilon=10^{-8}.
Learning Rate¶
--lr
Float (Default: 10^{-3})
This floating-point value is the learning rate for the stochastic optimization.
Weight Decay¶
--weight_decay
Float (Default: 0)
This floating-point value is the weight decay for the stochastic optimization (i.e., the L2 regularization).
Gradient Clipping¶
--clip_value
Float (Default: 0)
This floating-point value is used for gradient clipping.
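For example, an SGD setup with weight decay and gradient clipping (illustrative values) looks like:
--optim sgd \
--lr 0.01 \
--weight_decay 1e-4 \
--clip_value 1.0 \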
Milestones¶
--milestones
Integer [Integer ...] (Default: 10)
This integer sequence M_{1}, M_{2}, \ldots, M_{n} must satisfy the condition M_{i} < M_{i+1}. At each milestone epoch M_{i}, the learning rate is multiplied by \gamma. If the first value M_{1} is negative, its absolute value is used as the step size for the step LR scheduler.
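For example, the following illustrative schedule multiplies the learning rate by \gamma = 0.5 at epochs 150 and 225 (see --gamma below):
--milestones 150 225 --gamma 0.5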
Learning Rate Decay¶
--gamma
Float (Default: 0.1)
This floating-point value is the learning rate decay, i.e., the \gamma value of the LR scheduler. For example, to use the step LR scheduler:
--milestones -2 --gamma 0.98
This sets the step size to 2 epochs and the \gamma value to 0.98.
Cosine Annealing¶
--cosine_annealing
If this option is used, the cosine annealing scheduler is employed.
\eta_{min} and T_{max} are set by the options --gamma and --milestones, respectively. \eta_{max} is the learning rate set by the option --lr.
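For example, under the mapping described above, the following illustrative setting runs cosine annealing with \eta_{max}=10^{-3}, \eta_{min}=10^{-6}, and T_{max}=100:
--cosine_annealing \
--lr 0.001 \
--milestones 100 \
--gamma 1e-6 \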
Epochs¶
--num_epochs
Integer (Default: 5)
This integer value is the total number of epochs for the stochastic optimization.
Dataset¶
Dataset Path¶
--dataset_path
String
This string value must be a path to the directory containing the dataset files config.json, graph_data.npz, and targets.csv.
Target¶
--target_name
String
This string value must be one of the target names in the header of the target file targets.csv.
Dataset Splitting¶
--split_file
String
This string value must be a path to a split file, split.json.
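For example, assuming targets.csv contains a hypothetical band_gap column, the dataset options could be given as:
--dataset_path ${DATASET} \
--split_file ${DATASET}/split.json \
--target_name band_gap \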
Workers¶
--num_workers
Integer (Default: 0)
This value should be 0.
Random Seed¶
--seed
Integer (Default: 12345)
This value is the seed of the random number generator for PyTorch.
Loading Model¶
--load_model
If this option is used, the initial model weights are loaded from a model file model.pth in the current directory.
Extension¶
--use_extension
If this option is used, the extension layer multiplies the output of the regression layer by the number of nodes. This is usually used for extensive properties.
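For example, assuming targets.csv contains a hypothetical total_energy column holding an extensive property, the extension layer would be enabled as:
--target_name total_energy \
--use_extension \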
Activation Functions¶
The following keywords can be used as activation names.
Softplus¶
Keyword: Softplus
\text{Softplus}(x) = \log(1+\exp(x))
Shifted Softplus¶
Keyword: SSP
\text{SSP}(x) = \text{Softplus}(x) - \text{Softplus}(0)
Exponential Linear Units (ELU)¶
Keyword: ELU or 'ELU(alpha)' (default alpha = 1.0)
\text{ELU}(x) = \max(0,x) + \min(0, \alpha (\exp(x) - 1))
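A parameterized keyword contains parentheses, so it should be quoted on the command line; for example, an ELU with \alpha = 0.5 (an illustrative value) as the edge-level activation:
--edge_activation 'ELU(0.5)'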
Rectified Linear Units (ReLU)¶
Keyword: ReLU
\text{ReLU}(x) = \max(0,x)
Scaled Exponential Linear Units (SELU)¶
Keyword: SELU
\text{SELU}(x) = \lambda (\max(0,x)+\min(0,\alpha (\exp(x)-1)))
Info
Ref: "Self-Normalizing Neural Networks" arXiv
Continuously Differentiable Exponential Linear Units (CELU)¶
Keyword: CELU or 'CELU(alpha)' (default alpha = 1.0)
\text{CELU}(x)=\max(0,x)+\min(0,\alpha (\exp(x/\alpha)-1))
Info
Ref: "Continuously Differentiable Exponential Linear Units" arXiv
The Identity Activation¶
Keyword: None
\text{Identity}(x) = x
This is unavailable for --activation and --postconv_net_activation.