Architectures¶
Embedding¶
The i-th initial hidden state h_{i}^{(0)} \in \mathbb{R}^{d_{h}} is given by the embedding of the i-th node state v_{i} using the embedding matrix E \in \mathbb{R}^{d_{v} \times d_{h}}. The hidden states \{ h_{i}^{(t)} \}_{t=1}^{T} are sequentially produced by stacked convolution blocks (CB), as shown in the figure of a CGNN architecture below.
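For illustration only, the embedding lookup can be written with `torch.nn.Embedding`, whose weight matrix plays the role of E; the dimensions and the toy node states below are arbitrary, not values used by the library.

```python
import torch

d_v, d_h = 4, 8                    # number of node types and hidden width (illustrative)
E = torch.nn.Embedding(d_v, d_h)   # rows of E serve as the embedding matrix

v = torch.tensor([0, 2, 2, 3])     # node states v_i for a toy 4-node graph
h0 = E(v)                          # initial hidden states h_i^{(0)}, shape (4, d_h)
print(h0.shape)                    # torch.Size([4, 8])
```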
Convolution Block¶
The CB is composed of an edge neural network (EdgeNet), a gated convolution layer, and a multi-layer fully connected neural network (MFCNet), as shown below.
The EdgeNet produces edge states e_{ij} \in \mathbb{R}^{d_{e}}. The CB output h_{i}^{\rm out} is the sum of the shortcut state h_{i}^{\rm in} and the MFCNet output. The EdgeNet and MFCNet are optional components.
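The control flow of a CB can be sketched as follows. This is a schematic only: `gated_conv` and `mfc_net` are placeholders for the components defined in the next subsections, and the toy stand-ins exist solely to make the snippet runnable.

```python
import torch

def conv_block(h_in, gated_conv, mfc_net=None):
    """Schematic CB forward pass: gated convolution, optional MFCNet, shortcut sum.
    The optional EdgeNet is omitted here; when present it supplies edge states to
    the gated convolution instead of raw hidden states."""
    h = gated_conv(h_in)             # message passing over neighbors (see below)
    if mfc_net is not None:
        h = mfc_net(h)
    return h_in + h                  # CB output = shortcut state + MFCNet output

# Toy stand-ins, only to exercise the control flow.
d_h = 8
h_in = torch.randn(5, d_h)
identity_conv = lambda h: h
mfc = torch.nn.Sequential(torch.nn.Linear(d_h, d_h), torch.nn.Softplus())
print(conv_block(h_in, identity_conv, mfc).shape)   # torch.Size([5, 8])
```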
Multilayer Fully Connected Neural Networks¶
The MFCNet is composed of L_{c} layers, each of which is given by
where W_{c} \in \mathbb{R}^{d_{h} \times d_{h}} denotes a weight matrix, and f(\cdot) denotes an activation function.
In the neural network components presented below, f(\cdot) appears repeatedly, but it need not be the same activation function in every component.
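A minimal sketch of an MFCNet under these definitions. Since the text mentions only a weight matrix and an activation per layer, biases are omitted and softplus stands in for f; both choices are assumptions.

```python
import torch

d_h, L_c = 8, 2     # hidden width and number of MFCNet layers (illustrative)

# L_c fully connected layers of width d_h, each followed by the activation f.
layers = []
for _ in range(L_c):
    layers += [torch.nn.Linear(d_h, d_h, bias=False), torch.nn.Softplus()]
mfc_net = torch.nn.Sequential(*layers)

h_in = torch.randn(5, d_h)      # 5 hidden states
print(mfc_net(h_in).shape)      # torch.Size([5, 8])
```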
Gated Convolution¶
For the i-th hidden state, given a sequence of vectors \{ h_{j}^{\rm in} \}_{j \in \mathcal{N}_{i}}, where h_{j}^{\rm in} \in \mathbb{R}^{d_{c}} is either a hidden state (d_{c}=d_{h}) or an edge state (d_{c}=d_{e}), the gated convolution layer outputs h_{i}^{\rm out} \in \mathbb{R}^{d_{h}}, as shown below.
Then, h_{i}^{\rm out} is given by
where W_{cg} \in \mathbb{R}^{d_{c} \times d_{h}} and W_{ch} \in \mathbb{R}^{d_{c} \times d_{h}} denote weight matrices, \sigma(\cdot) denotes the sigmoid function, and \odot element-wise multiplication.
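The display equation is not reproduced here, so the sketch below implements one plausible reading of the definitions above: a sigmoid gate multiplied element-wise with an activated linear transform of each input vector, summed over the neighborhood \mathcal{N}_{i}. Applying f to the second branch, omitting biases, and using softplus for f are assumptions.

```python
import torch
import torch.nn.functional as F

d_c, d_h = 8, 8
W_cg = torch.nn.Linear(d_c, d_h, bias=False)   # gate weights W_cg
W_ch = torch.nn.Linear(d_c, d_h, bias=False)   # candidate weights W_ch

def gated_conv(h_in, neighbors):
    """h_in: (N, d_c) input vectors; neighbors[i]: indices j in N_i."""
    gate = torch.sigmoid(W_cg(h_in))    # sigma(W_cg h_j)
    cand = F.softplus(W_ch(h_in))       # f(W_ch h_j); applying f here is an assumption
    msg = gate * cand                   # element-wise product
    return torch.stack([msg[j].sum(dim=0) for j in neighbors])

h_in = torch.randn(4, d_c)
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2]]
print(gated_conv(h_in, neighbors).shape)   # torch.Size([4, 8])
```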
Edge Neural Networks¶
The EdgeNet is a multi-layer neural network composed of L_{e} layers, as shown below.
Given the i-th hidden state h_{i} and the j-th hidden state h_{j}, where j \in \mathcal{N}_{i}, the EdgeNet outputs an edge state e_{ij} \in \mathbb{R}^{d_{e}}.
Three variants of the EdgeNet layer are presented below.
Original EdgeNet Layer¶
The original EdgeNet layer, shown below, consists of a bilinear transformation.
It is expressed as $$ e_{ij}^{\rm out} = f(\mathcal{B}(h_{i}, e_{ij}^{\rm in})), $$
and the bilinear transformation \mathcal{B}(\cdot,\cdot) is defined by
where B is a weight tensor of order 3.
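A sketch assuming the usual form of a bilinear map parameterized by an order-3 tensor, which `torch.nn.Bilinear` implements; softplus standing in for f and the bias-free setting are assumptions.

```python
import torch
import torch.nn.functional as F

d_h, d_e = 8, 6

# nn.Bilinear computes out_k = sum_{p,q} x_p B_{k,p,q} y_q (plus an optional bias),
# i.e. a bilinear map parameterized by an order-3 weight tensor B.
bilinear = torch.nn.Bilinear(d_h, d_e, d_e, bias=False)

h_i  = torch.randn(1, d_h)     # hidden state of node i
e_in = torch.randn(1, d_e)     # incoming edge state e_ij
e_out = F.softplus(bilinear(h_i, e_in))   # e_out = f(B(h_i, e_in)); softplus stands in for f
print(e_out.shape)             # torch.Size([1, 6])
```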
Fast EdgeNet Layer¶
The second EdgeNet layer is a fast version of \mathcal{B}(\cdot,\cdot), composed of two fully connected layers and an element-wise multiplication, as shown below.
In the fast EdgeNet layer, the weight tensor is decomposed as
where W_{he} and W_{ee} denote weight matrices. Then, this layer is expressed as
Moreover, the activation can be applied just after the two linear transformations, as expressed by
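Since the display equations are not reproduced here, the sketch below shows one common reading: the full bilinear form is approximated by an element-wise product of two linear projections, with the two activation placements described above. Softplus as f and the omission of biases are assumptions.

```python
import torch
import torch.nn.functional as F

d_h, d_e = 8, 6
W_he = torch.nn.Linear(d_h, d_e, bias=False)   # acts on the hidden state h_i
W_ee = torch.nn.Linear(d_e, d_e, bias=False)   # acts on the edge state e_ij

h_i  = torch.randn(1, d_h)
e_in = torch.randn(1, d_e)

# Reading (a): multiply the two linear outputs, then apply the activation.
e_out_a = F.softplus(W_he(h_i) * W_ee(e_in))
# Reading (b): apply the activation just after each linear transformation, then multiply.
e_out_b = F.softplus(W_he(h_i)) * F.softplus(W_ee(e_in))
print(e_out_a.shape, e_out_b.shape)   # torch.Size([1, 6]) torch.Size([1, 6])
```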
Aggregate EdgeNet Layer¶
The last EdgeNet layer is based on aggregated transformations \sum_{l=1}^{C} \mathcal{T}_{l}(h_{i}, e_{ij}^{\rm in}), where C is the cardinality, and \mathcal{T}_{l} is a bilinear transformation block (BTB), as shown below.
As shown in the left panel of the figure above, \mathcal{T}_{l} is given by
where \mathcal{B}_{l} denotes a bilinear transformation \mathbb{R}^{d_{b}} \times \mathbb{R}^{d_{b}} \to \mathbb{R}^{d_{b}} (d_{b} \cdot C \approx d_{e} under normal use), and W_{hb} \in \mathbb{R}^{d_{h} \times d_{b}}, W_{eb} \in \mathbb{R}^{d_{e} \times d_{b}}, and W_{be} \in \mathbb{R}^{d_{b} \times d_{e}} denote weight matrices.
As shown in the right panel, the aggregate EdgeNet layer outputs
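A sketch of one plausible composition of the projections and the d_{b}-dimensional bilinear map named above. The ordering of operations inside \mathcal{T}_{l}, the final activation over the aggregated sum, and softplus as f are assumptions; the `BTB` class name is only a label for the block.

```python
import torch
import torch.nn.functional as F

d_h, d_e, d_b, C = 8, 6, 3, 2      # d_b * C roughly equals d_e under normal use

class BTB(torch.nn.Module):
    """One bilinear transformation block T_l: project to d_b, apply a d_b-dimensional
    bilinear map, and project back to d_e (a plausible composition of the weights
    named in the text)."""
    def __init__(self):
        super().__init__()
        self.W_hb = torch.nn.Linear(d_h, d_b, bias=False)
        self.W_eb = torch.nn.Linear(d_e, d_b, bias=False)
        self.B_l  = torch.nn.Bilinear(d_b, d_b, d_b, bias=False)
        self.W_be = torch.nn.Linear(d_b, d_e, bias=False)
    def forward(self, h_i, e_in):
        return self.W_be(self.B_l(self.W_hb(h_i), self.W_eb(e_in)))

blocks = torch.nn.ModuleList([BTB() for _ in range(C)])
h_i, e_in = torch.randn(1, d_h), torch.randn(1, d_e)
e_out = F.softplus(sum(t(h_i, e_in) for t in blocks))   # aggregate over the C blocks
print(e_out.shape)   # torch.Size([1, 6])
```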
Edge Residual Neural Networks¶
The EdgeNet becomes a residual neural network when every EdgeNet layer is wrapped by the EdgeResNet layer, as shown below.
where W_{s} denotes a weight matrix, and \mathfrak{E}(\cdot, \cdot) an EdgeNet layer.
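The wrapper's display equation is not reproduced here; the sketch below assumes one common residual form, a linear shortcut W_{s} e_{ij}^{\rm in} added to the wrapped layer's output. The `EdgeResNetLayer` class and the inner `nn.Bilinear` layer are chosen only for illustration.

```python
import torch

d_h, d_e = 8, 6

class EdgeResNetLayer(torch.nn.Module):
    """Residual wrapper around an EdgeNet layer E(., .): assumed form
    e_out = W_s e_in + E(h_i, e_in)."""
    def __init__(self, edge_layer):
        super().__init__()
        self.edge_layer = edge_layer
        self.W_s = torch.nn.Linear(d_e, d_e, bias=False)   # shortcut weights W_s
    def forward(self, h_i, e_in):
        return self.W_s(e_in) + self.edge_layer(h_i, e_in)

inner = torch.nn.Bilinear(d_h, d_e, d_e, bias=False)       # any EdgeNet layer fits here
layer = EdgeResNetLayer(inner)
print(layer(torch.randn(1, d_h), torch.randn(1, d_e)).shape)   # torch.Size([1, 6])
```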
Pooling¶
The graph-level representation \Gamma^{(0)} \in \mathbb{R}^{d_{h}} is made from all the hidden states \{ h_{i}^{(t)} \}_{t=1,i=1}^{T,N} except for the initial ones. At each step t, the hidden states \{ h_{i}^{(t)} \}_{i=1}^{N} are pooled with the gating mechanism as
where W_{\gamma}^{(t)} \in \mathbb{R}^{d_{h} \times d_{h}} denotes a weight matrix, and b_{\gamma}^{(t)} \in \mathbb{R}^{d_{h}} a bias vector. If the gating mechanism is not used, they are simply averaged as
Then, the graph-level states \gamma_{1},\ldots,\gamma_{T} are combined by a weighted average as
where W_{\Gamma}^{(t)} \in \mathbb{R}^{d_{h} \times d_{h}} denotes a weight matrix. If only the final graph-level state \gamma_{T} is used, it is simply activated as
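A minimal sketch of one plausible reading of this pooling scheme: a sigmoid gate per node followed by an average over the N nodes, then a per-step linear mix W_{\Gamma}^{(t)} averaged over the T steps and activated. The gate form, the use of averages rather than sums, and softplus as f are assumptions.

```python
import torch
import torch.nn.functional as F

N, T, d_h = 5, 3, 8
h = torch.randn(T, N, d_h)                 # hidden states h_i^{(t)} for t = 1..T

# Gated pooling at each step t: average of sigmoid(W_gamma h + b_gamma) * h over nodes.
gates = torch.nn.ModuleList([torch.nn.Linear(d_h, d_h) for _ in range(T)])
gamma = torch.stack([(torch.sigmoid(gates[t](h[t])) * h[t]).mean(dim=0)
                     for t in range(T)])   # per-step graph states, shape (T, d_h)

# Weighted average of the per-step states via W_Gamma^{(t)}, followed by f.
mix = torch.nn.ModuleList([torch.nn.Linear(d_h, d_h, bias=False) for _ in range(T)])
Gamma0 = F.softplus(torch.stack([mix[t](gamma[t]) for t in range(T)]).mean(dim=0))
print(Gamma0.shape)                        # torch.Size([8])
```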
Graph-Level Neural Networks¶
The graph-level MFCNet is composed of L_{g} layers, each of which outputs \Gamma^{\rm out} \in \mathbb{R}^{d_{g}} given by
where W_{g} denotes a weight matrix, and b_{g} \in \mathbb{R}^{d_{g}} a bias vector. For the first layer \Gamma^{\rm in} = \Gamma^{(0)} and W_{g} \in \mathbb{R}^{d_{h} \times d_{g}} , and otherwise \Gamma^{\rm in} \in \mathbb{R}^{d_{g}} and W_{g} \in \mathbb{R}^{d_{g} \times d_{g}}.
The final layer's output \Gamma^{(L_{g})} is used as the input vector for the linear regression
$$ \hat{y} = w_{r} \cdot \Gamma^{(L_{g})} + b_{r}, $$
where w_{r} \in \mathbb{R}^{d_{g}} denotes a weight vector, and b_{r} \in \mathbb{R} a bias scalar.
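A minimal sketch of the graph-level head under these definitions: the first layer maps d_{h} to d_{g}, the remaining layers map d_{g} to d_{g}, each with a bias and an activation, and `nn.Linear(d_g, 1)` realizes the weight vector w_{r} and scalar bias b_{r}. Softplus standing in for the unspecified activation is an assumption.

```python
import torch

d_h, d_g, L_g = 8, 16, 2

# Graph-level MFCNet: first layer d_h -> d_g, remaining layers d_g -> d_g, each
# followed by the activation f (softplus stands in for f).
dims = [d_h] + [d_g] * L_g
layers = []
for a, b in zip(dims[:-1], dims[1:]):
    layers += [torch.nn.Linear(a, b), torch.nn.Softplus()]
mfc_g = torch.nn.Sequential(*layers)

# Linear regression head: weight vector w_r (length d_g) and scalar bias b_r.
regress = torch.nn.Linear(d_g, 1)

Gamma0 = torch.randn(1, d_h)                # graph-level representation Gamma^{(0)}
y_hat = regress(mfc_g(Gamma0))
print(y_hat.shape)                          # torch.Size([1, 1])
```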
Given true values Y = \{ y_{i} \}_{i=1}^{N} and predicted values \hat{Y} = \{ \hat{y}_{i} \}_{i=1}^{N}, the mean squared error is calculated by
$$ L(Y, \hat{Y}) = \frac{1}{N} \sum_{i=1}^{N} ( y_{i} - \hat{y}_{i} )^{2}, $$
which serves as the loss function for training a CGNN model. The mean absolute error (MAE) is also calculated by
$$ \mathrm{MAE}(Y, \hat{Y}) = \frac{1}{N} \sum_{i=1}^{N} | y_{i} - \hat{y}_{i} |, $$
which is used as the validation metric to determine the best model during training. The root mean squared error \sqrt{L(Y, \hat{Y})} is employed as an evaluation metric in testing, along with the MAE.
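These metrics can be computed directly, for example (the toy values are arbitrary):

```python
import torch

y     = torch.tensor([0.5, 1.2, -0.3])     # true values
y_hat = torch.tensor([0.4, 1.0,  0.1])     # predicted values

mse  = torch.mean((y - y_hat) ** 2)        # training loss L(Y, Y_hat)
mae  = torch.mean(torch.abs(y - y_hat))    # validation metric (MAE)
rmse = torch.sqrt(mse)                     # test metric, sqrt(L(Y, Y_hat))
print(mse.item(), mae.item(), rmse.item())
```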