
TL;DR
- why I read this : NeurIPS 2023
- task : graph representation
- problem : message-passing networks (MPNNs) suffer from over-smoothing and over-squashing, a problem analogous to long-term dependencies in NLP. Applying a Transformer directly to the graph gives global attention, but its computation is quadratic in the number of nodes.
- idea : combine MPNN + Transformer, organize the existing positional encodings and structural encodings, and study how each affects MPNN performance.
- architecture : each layer runs an MPNN and global attention in parallel, then combines the two branches (the GPS layer)
- baseline : GCN, GAT, SAN, Graphormer, …
- data : ZINC, PATTERN, CLUSTER, MNIST, CIFAR10, …
- evaluation : MAE, Accuracy, …
- result : SOTA on several benchmarks, competitive on the rest

Details
Related work
first fully transformer-based graph network: https://arxiv.org/pdf/2012.09699.pdf
Positional Encoding (PE)
- local : a node's position within its local cluster of nodes
- global : a node's position within the whole graph
- relative : the relative distance between a pair of nodes
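A common global PE (used in GraphGPS among others) is the Laplacian eigenvector encoding: each node is embedded by its entries in the first few non-trivial eigenvectors of the graph Laplacian. A minimal NumPy sketch, assuming an unweighted symmetric adjacency matrix (the function name and `k` parameter are mine, not from the paper):

```python
import numpy as np

def lap_eig_pe(A, k):
    """Global PE: entries of the k lowest non-trivial eigenvectors
    of the symmetrically normalized graph Laplacian L = I - D^-1/2 A D^-1/2."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]       # skip the trivial constant eigenvector

# 4-cycle graph: every node gets a k-dimensional positional vector
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
pe = lap_eig_pe(A, k=2)  # shape (4, 2)
```

Note the sign ambiguity of eigenvectors: in practice the paper randomly flips signs during training so the model does not overfit to one choice.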
Structural Encoding (SE)
Aims to increase the expressiveness and generalization of GNNs by embedding the structure of graphs or subgraphs.
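One SE the paper uses is the random-walk structural encoding (RWSE): node $i$'s $j$-th feature is the probability that a $j$-step random walk starting at $i$ returns to $i$, which captures local structure such as cycle membership. A minimal NumPy sketch (function name and example graph are my own):

```python
import numpy as np

def rwse(A, k):
    """Random-walk SE: diagonals of the first k powers of the
    random-walk transition matrix P = D^-1 A."""
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.clip(deg, 1, None)        # row-stochastic transition matrix
    feats, Pk = [], np.eye(len(A))
    for _ in range(k):
        Pk = Pk @ P
        feats.append(np.diag(Pk))        # return probability after each step
    return np.stack(feats, axis=1)       # shape (N, k)

# 3-node star: node 0 connected to nodes 1 and 2
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
se = rwse(A, k=3)
# the graph is bipartite, so odd-length walks never return: se[:, 0] == 0
# a 2-step walk from the center always returns: se[0, 1] == 1.0
```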

GPS Layer: an MPNN + Transformer Hybrid

- $A\in\mathbb{R}^{N\times N}$ : adjacency matrix of a graph with $N$ nodes and $E$ edges
- $X^l\in \mathbb{R}^{N\times d_l}$ : $d_l$-dimensional node features
- $E^l\in \mathbb{R}^{E\times d_l}$ : $d_l$-dimensional edge features
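With these definitions, a GPS layer updates $X^l$ with an MPNN branch and a global-attention branch in parallel and sums them before an MLP. A minimal NumPy sketch of one layer, omitting the edge-feature update, residual connections, normalization, and multi-head attention the paper uses (all weight names here are my own placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mpnn(X, A, W):
    # one simple message-passing step: mean-aggregate neighbours, then project
    deg = np.clip(A.sum(axis=1, keepdims=True), 1, None)
    return (A @ X / deg) @ W

def global_attn(X, Wq, Wk, Wv):
    # full O(N^2) self-attention over all node pairs (no adjacency mask)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V

def gps_layer(X, A, p):
    Xm = mpnn(X, A, p["Wm"])                          # local branch
    Xt = global_attn(X, p["Wq"], p["Wk"], p["Wv"])    # global branch
    H = Xm + Xt                                       # combine the two branches
    return np.maximum(0.0, H @ p["W1"]) @ p["W2"]     # 2-layer ReLU MLP

N, d = 5, 8
X = rng.normal(size=(N, d))
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
params = {k: rng.normal(size=(d, d)) * 0.1
          for k in ["Wm", "Wq", "Wk", "Wv", "W1", "W2"]}
out = gps_layer(X, A, params)  # shape (N, d)
```

The key design point is that the two branches run side by side rather than stacked, so the attention branch can relay information between distant nodes even when message passing alone would over-squash it.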
Result

Ablation
