
TL;DR
- why I read this : NeurIPS 2023
- task : graph representation
- problem : message-passing networks (MPNNs) suffer from over-smoothing and over-squashing, a problem analogous to long-term dependencies in NLP. Applying a Transformer directly to the graph gives global attention, but its computation is quadratic in the number of nodes.
- idea : combine MPNN + Transformer, organize the existing positional encodings and structural encodings, and study how each affects MPNN performance.
- architecture : each layer runs an MPNN and global attention in parallel, then combines the two branches (the GPS layer)
- baseline : GCN, GAT, SAN, Graphormer, …
- data : ZINC, PATTERN, CLUSTER, MNIST, CIFAR10, …
- evaluation : MAE, Accuracy, …
- result : SOTA on several benchmarks, competitive on the rest

Details
Related work
first fully transformer-based graph network: https://arxiv.org/pdf/2012.09699.pdf
Positional Encoding (PE)
- local : a node's position within its local cluster of nodes
- global : a node's position within the whole graph
- relative : the relative distance between a pair of nodes
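A common global PE (used in GraphGPS among others) is the Laplacian eigenvector encoding: each node is embedded by its entries in the first few non-trivial eigenvectors of the graph Laplacian. A minimal NumPy sketch, assuming an unweighted symmetric adjacency matrix (the function name and `k` parameter are mine, not from the paper):

```python
import numpy as np

def lap_eig_pe(A, k):
    """Global PE: entries of the k lowest non-trivial eigenvectors
    of the symmetrically normalized graph Laplacian L = I - D^-1/2 A D^-1/2."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]       # skip the trivial constant eigenvector

# 4-cycle graph: every node gets a k-dimensional positional vector
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
pe = lap_eig_pe(A, k=2)  # shape (4, 2)
```

Note the sign ambiguity of eigenvectors: in practice the paper randomly flips signs during training so the model does not overfit to one choice.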
Structural Encoding (SE)
Aims to increase the expressiveness and generalization of GNNs by embedding the structure of graphs or subgraphs.
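One SE the paper uses is the random-walk structural encoding (RWSE): node $i$'s $j$-th feature is the probability that a $j$-step random walk starting at $i$ returns to $i$, which captures local structure such as cycle membership. A minimal NumPy sketch (function name and example graph are my own):

```python
import numpy as np

def rwse(A, k):
    """Random-walk SE: diagonals of the first k powers of the
    random-walk transition matrix P = D^-1 A."""
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.clip(deg, 1, None)        # row-stochastic transition matrix
    feats, Pk = [], np.eye(len(A))
    for _ in range(k):
        Pk = Pk @ P
        feats.append(np.diag(Pk))        # return probability after each step
    return np.stack(feats, axis=1)       # shape (N, k)

# 3-node star: node 0 connected to nodes 1 and 2
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
se = rwse(A, k=3)
# the graph is bipartite, so odd-length walks never return: se[:, 0] == 0
# a 2-step walk from the center always returns: se[0, 1] == 1.0
```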

GPS Layer: an MPNN + Transformer Hybrid

- $A\in\mathbb{R}^{N\times N}$ : adjacency matrix of a graph with $N$ nodes and $E$ edges
- $X^l\in \mathbb{R}^{N\times d_l}$ : $d_l$-dimensional node features
- $E^l\in \mathbb{R}^{E\times d_l}$ : $d_l$-dimensional edge features
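With these definitions, a GPS layer updates $X^l$ with an MPNN branch and a global-attention branch in parallel and sums them before an MLP. A minimal NumPy sketch of one layer, omitting the edge-feature update, residual connections, normalization, and multi-head attention the paper uses (all weight names here are my own placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mpnn(X, A, W):
    # one simple message-passing step: mean-aggregate neighbours, then project
    deg = np.clip(A.sum(axis=1, keepdims=True), 1, None)
    return (A @ X / deg) @ W

def global_attn(X, Wq, Wk, Wv):
    # full O(N^2) self-attention over all node pairs (no adjacency mask)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V

def gps_layer(X, A, p):
    Xm = mpnn(X, A, p["Wm"])                          # local branch
    Xt = global_attn(X, p["Wq"], p["Wk"], p["Wv"])    # global branch
    H = Xm + Xt                                       # combine the two branches
    return np.maximum(0.0, H @ p["W1"]) @ p["W2"]     # 2-layer ReLU MLP

N, d = 5, 8
X = rng.normal(size=(N, d))
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
params = {k: rng.normal(size=(d, d)) * 0.1
          for k in ["Wm", "Wq", "Wk", "Wv", "W1", "W2"]}
out = gps_layer(X, A, params)  # shape (N, d)
```

The key design point is that the two branches run side by side rather than stacked, so the attention branch can relay information between distant nodes even when message passing alone would over-squash it.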
Result

Ablation
