
TL;DR
- why I read this : the title suggests pulling relations through attention, and the methodology sounded impressive.
- task : graph representation
- problem : Represent graphs with a transformer. A vanilla transformer takes an unordered set as input, i.e. a graph without edges, and unlike GNNs it has no relational inductive bias.
- idea : Concatenate edge and node to get Q, K, V
- architecture : transformer + edge update for neighbor nodes and edges
- baseline : Deep Sets, Graph Attention Network, Message Passing Neural Network, Pointer Graph Networks
- data : CLRS-30, which seems to be a benchmark for executing classical algorithms on graphs.
- result : SOTA
- limitation/things I cannot understand : did this approach really not exist before? lol
Details
Motivation

You can think of a transformer as a model that represents a set, i.e. a graph without edges. The idea here is to add edges to the transformer.
Previous work

https://arxiv.org/pdf/2207.02505.pdf
Graph Neural Networks
Notation
- A graph G consists of nodes $N$ and edges $\varepsilon$.
- $N$ : the set of unordered node vectors $n_i\in \mathbb{R}^{d_n}$
- $\varepsilon$ : the set of edge vectors $e_{ij}\in\mathbb{R}^{d_e}$
- At each layer $l$, take a graph $G^l$ and return $G^{l+1}$ with the same structure.

- $\phi$ : update function
- $\oplus$ : aggregation function
- $L_i$ : neighbors of the i-th node
- In the basic GNN, edges pass through unchanged: $e_{ij}^{l+1}=e_{ij}^l$ for all edges $(i, j)$
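Putting the notation together, a basic message-passing layer might look like the following numpy sketch. The concrete choices here are mine, not the paper's: sum as the aggregation $\oplus$, a single linear layer + ReLU as the update $\phi$, and all variable names are made up.

```python
import numpy as np

def gnn_layer(nodes, edges, neighbors, W_upd):
    """One basic GNN layer in the note's notation (a sketch).

    nodes:     (N, d_n) array of node vectors n_i
    edges:     dict mapping (j, i) -> (d_e,) edge vector e_ji
    neighbors: dict mapping i -> L_i, the neighbors of node i
    Node update: n_i' = phi(n_i, agg), with agg = sum_{j in L_i} [n_j ; e_ji],
    phi here being one linear layer followed by ReLU.
    Edges are returned unchanged: e_ij^{l+1} = e_ij^l.
    """
    N, d_n = nodes.shape
    d_e = next(iter(edges.values())).shape[0]
    new_nodes = np.empty_like(nodes)
    for i in range(N):
        agg = np.zeros(d_n + d_e)                 # aggregation (+) = sum
        for j in neighbors.get(i, []):
            agg += np.concatenate([nodes[j], edges[(j, i)]])
        # update phi: linear layer over [n_i ; agg], then ReLU
        new_nodes[i] = np.maximum(W_upd @ np.concatenate([nodes[i], agg]), 0.0)
    return new_nodes, edges
```

A node with no neighbors simply gets $\phi(n_i, 0)$, since the sum over an empty $L_i$ is zero.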
Relational Transformer
Elegant and simple: concatenate the edge and node vectors and use the concatenation as x in the Q, K, V projections.

To simplify the math, the projection matrix is split into node and edge blocks, like this

To illustrate, here’s what it looks like
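A minimal single-head sketch of the concat idea in numpy: for every node pair $(i, j)$, the query, key, and value come from projecting a node vector concatenated with the edge vector $e_{ij}$, so edges modulate every attention interaction. This is my own toy rendering, not the paper's exact parameterization; edges are stored densely as an $N \times N \times d_e$ array and all weight names are made up.

```python
import numpy as np

def relational_attention(nodes, edges, Wq, Wk, Wv):
    """Single-head relational attention (sketch).

    nodes: (N, d_n) node vectors; edges: (N, N, d_e) dense edge features.
    q_ij = Wq [n_i ; e_ij], k_ij = Wk [n_j ; e_ij], v_ij = Wv [n_j ; e_ij];
    softmax over j of the scaled dot products gives node i's new vector.
    """
    N, d_n = nodes.shape
    d = Wq.shape[0]
    out = np.zeros((N, d))
    for i in range(N):
        scores = np.empty(N)
        vals = np.empty((N, d))
        for j in range(N):
            pair = np.concatenate([nodes[j], edges[i, j]])
            q = Wq @ np.concatenate([nodes[i], edges[i, j]])
            k = Wk @ pair
            vals[j] = Wv @ pair
            scores[j] = q @ k / np.sqrt(d)
        w = np.exp(scores - scores.max())          # stable softmax over j
        w /= w.sum()
        out[i] = w @ vals                          # attention-weighted values
    return out
```

Note the cost: because Q, K, V depend on the pair $(i, j)$ through $e_{ij}$, the projections are per-pair rather than per-node.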

Edge update
Updating every edge against all nodes and all edges would cost $O(n^3)$, so each edge $e_{ij}$ only aggregates messages from four sources: its two endpoint nodes $n_i$ and $n_j$, itself, and the reverse edge $e_{ji}$.
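A sketch of that restricted edge update in numpy. Using a single linear layer + ReLU as the update function is my simplification; the point is that each of the $n^2$ edges touches only a constant number of inputs, so the whole pass is $O(n^2)$ instead of $O(n^3)$.

```python
import numpy as np

def edge_update(nodes, edges, W):
    """Edge update restricted to four sources per edge (sketch).

    nodes: (N, d_n); edges: (N, N, d_e) dense edge features.
    e_ij' = ReLU(W @ [e_ij ; n_i ; n_j ; e_ji]) -- each edge sees only
    itself, its two endpoint nodes, and the reverse edge.
    """
    N, d_n = nodes.shape
    d_e = edges.shape[-1]
    new_edges = np.empty_like(edges)
    for i in range(N):
        for j in range(N):
            feat = np.concatenate([edges[i, j], nodes[i], nodes[j], edges[j, i]])
            new_edges[i, j] = np.maximum(W @ feat, 0.0)
    return new_edges
```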


