
paper

TL;DR

  • why I read this : the title suggests pulling relations through attention, and the methodology sounded impressive.
  • task : Graph representation
  • problem : Representing a graph with a transformer. A transformer takes an unordered set as input (effectively a graph without edges) and, unlike GNNs, has no relational inductive bias.
  • idea : Concatenate edge and node to get Q, K, V
  • architecture : transformer + edge update for neighbor nodes and edges
  • baselines : Deep Sets, Graph Attention Network, Message Passing Neural Network, Pointer Graph Networks
  • data : CLRS-30, a benchmark for executing classical algorithms on graphs.
  • result : SOTA
  • limitation/things I did not understand : Did this approach really not exist before? lol

Details

motivation


You can think of a transformer as a model that represents a set, i.e., a graph without edges. The idea is to add edges to the transformer.

Previous work


https://arxiv.org/pdf/2207.02505.pdf

Graph Neural Networks

Notation:

  • A graph G consists of nodes $N$ and edges $\varepsilon$.
  • $N$ : the set of unordered node vectors $n_i\in \mathbb{R}^{d_n}$
  • $\varepsilon$ : the set of edge vectors $e_{ij}\in\mathbb{R}^{d_e}$
  • At each layer $l$, the model takes a graph $G^l$ and returns $G^{l+1}$ with the same structure.
  • $\phi$ : update function
  • $\oplus$ : aggregation function
  • $L_i$ : neighbors of the i-th node
  • A basic GNN does not update the edges: $e_{ij}^{l+1}=e_{ij}^l$ for all edges $(i, j)$
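The notation above can be sketched as a single GNN layer. This is a minimal illustration, assuming sum as the aggregation $\oplus$ and a tanh-wrapped linear map as the update $\phi$; the weight names `W_msg` and `W_upd` are illustrative, not from the paper.

```python
import numpy as np

def gnn_layer(nodes, edges, neighbors, W_msg, W_upd):
    """One generic GNN layer.
    nodes: (N, d_n) array; edges: dict[(i, j)] -> (d_e,) vector;
    neighbors: dict[i] -> list of neighbor indices (L_i in the notation)."""
    new_nodes = np.zeros_like(nodes)
    for i in range(len(nodes)):
        # aggregate messages from neighbors j in L_i (the ⊕ step)
        agg = sum(
            W_msg @ np.concatenate([nodes[j], edges[(i, j)]])
            for j in neighbors[i]
        )
        # update function φ combines the old node state with the aggregate
        new_nodes[i] = np.tanh(W_upd @ np.concatenate([nodes[i], agg]))
    # edges pass through unchanged: e_ij^{l+1} = e_ij^l
    return new_nodes, edges
```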

Relational Transformer

Elegant and simple: concatenate the edge and node features and use the result as the input $x$ to the Q, K, V projections.

To simplify the math, the projection matrices are truncated into block form.

To illustrate, this is what the resulting attention computation looks like.
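A hedged sketch of single-head relational attention under my reading of the idea: for each directed pair $(i, j)$, the query, key, and value are projections of a node feature concatenated with the edge feature $e_{ij}$. All names are illustrative; the dense $(N, N, d_e)$ edge tensor is an assumption for simplicity.

```python
import numpy as np

def relational_attention(nodes, edges, W_q, W_k, W_v):
    """nodes: (N, d_n); edges: (N, N, d_e) dense edge features.
    Each projection acts on a concatenated [node, edge] vector of size d_n + d_e."""
    N, _ = nodes.shape
    d = W_q.shape[0]
    scores = np.empty((N, N))
    values = np.empty((N, N, W_v.shape[0]))
    for i in range(N):
        for j in range(N):
            q = W_q @ np.concatenate([nodes[i], edges[i, j]])  # receiver + edge
            k = W_k @ np.concatenate([nodes[j], edges[i, j]])  # sender + edge
            values[i, j] = W_v @ np.concatenate([nodes[j], edges[i, j]])
            scores[i, j] = q @ k / np.sqrt(d)
    # softmax over senders j, with the usual max-subtraction for stability
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return np.einsum('ij,ijk->ik', attn, values)  # weighted sum of values per node
```

Note how this reduces to standard self-attention when the edge features are all zero: the edge part of each projection then contributes nothing.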

Edge update

If we updated every edge from all nodes and all edges, the complexity would be $O(n^3)$, so message passing for each edge aggregates only four sources: the two endpoint nodes, the edge itself, and the edge going in the opposite direction.
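The four-source edge update can be sketched as follows. This is a minimal illustration, assuming a dense $(N, N, d_e)$ edge tensor and a single tanh-wrapped linear map `W_e` (an illustrative name, not from the paper); each edge sees only a fixed-size context, so the total cost is $O(n^2)$.

```python
import numpy as np

def edge_update(nodes, edges, W_e):
    """Update each edge e_ij from exactly four sources: its endpoint nodes
    n_i and n_j, itself, and the reverse edge e_ji.
    W_e maps the concatenated (2*d_n + 2*d_e) context back to d_e."""
    N, _, d_e = edges.shape
    new_edges = np.empty_like(edges)
    for i in range(N):
        for j in range(N):
            ctx = np.concatenate([nodes[i], nodes[j], edges[i, j], edges[j, i]])
            new_edges[i, j] = np.tanh(W_e @ ctx)
    return new_edges
```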
