[94] Recipe for a General, Powerful, Scalable Graph Transformer

TL;DR

I read this because.. : NeurIPS 2023
task : graph representation
problem : Message-passing approaches (MPNNs) suffer from over-smoothing and over-squashing, problems similar to long-term dependency in NLP. Feeding a graph directly into a transformer means global attention, so the computation grows quadratically with the number of nodes.
idea : MPNN + Transformer. Organizes the existing Positional Embeddings and Structural Embeddings and examines how much each one affects the MPNN.
architecture : global attention combined with an MPNN
baseline : GCN, GAT, SAN, Graphormer, …
data : ZINC, PATTERN, CLUSTER, MNIST, CIFAR10, …
evaluation : MAE, Accuracy, …
result : SOTA on a few of the benchmarks, decent results overall
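
To make the "global attention + MPNN" idea concrete, here is a minimal NumPy sketch of one hybrid layer. It is an illustration under assumptions, not the paper's implementation: the names `local_mpnn`, `global_attention`, `hybrid_layer`, and the `params` dictionary are hypothetical, the local branch is a plain mean-aggregation step, and edge features, normalization, and multi-head attention are left out. The `N x N` score matrix in the attention branch is where the quadratic cost noted under "problem" comes from.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_mpnn(X, A, W):
    # Local branch (assumed form): mean over each node's neighbours
    # plus itself, followed by a linear map and ReLU.
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.maximum((A_hat / deg) @ X @ W, 0.0)

def global_attention(X, Wq, Wk, Wv):
    # Global branch: single-head dense self-attention over all nodes.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])  # N x N matrix: quadratic in N
    return softmax(scores, axis=-1) @ V

def hybrid_layer(X, A, params):
    # Sum the local and global branches with a residual connection,
    # then apply a small two-layer MLP (also with a residual).
    X_local = local_mpnn(X, A, params["W_local"])
    X_global = global_attention(X, params["Wq"], params["Wk"], params["Wv"])
    H = X + X_local + X_global
    return H + np.maximum(H @ params["W1"], 0.0) @ params["W2"]

rng = np.random.default_rng(0)
N, d = 5, 8

# Toy path graph 0 - 1 - 2 - 3 - 4 (undirected).
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

X = rng.normal(size=(N, d))
names = ["W_local", "Wq", "Wk", "Wv", "W1", "W2"]
params = {name: rng.normal(scale=0.1, size=(d, d)) for name in names}

X_next = hybrid_layer(X, A, params)
print(X_next.shape)  # (5, 8)
```

The local branch only mixes information between adjacent nodes, while the attention branch lets every node see every other node in a single step, which is the motivation for combining the two.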

first fully transformer-based graph network: https://arxiv.org/pdf/2012.09699.pdf

The goal is to increase the expressive power and generalization of GNNs by embedding the structure of a graph or subgraph.

$A\in\mathbb{R}^{N\times N}$ : adjacency matrix of a graph with N nodes and E edges
$X^l\in \mathbb{R}^{N\times d_l}$ : $d_l$-dimensional node features
$E^l\in \mathbb{R}^{E\times d_l}$ : $d_l$-dimensional edge features
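
As a quick sanity check on the notation, the sketch below builds these three objects for a hypothetical toy graph. The graph, the feature values, and the variable names (`edges`, `E_feat`, `num_edges`) are made up for illustration, and it assumes one feature row per node and one feature row per edge.

```python
import numpy as np

# Hypothetical toy graph: an undirected path 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
N = 4                    # number of nodes
num_edges = len(edges)   # number of edges (E in the notation above)
d = 2                    # feature dimension d_l

# Adjacency matrix A: N x N, symmetric for an undirected graph.
A = np.zeros((N, N))
for u, v in edges:
    A[u, v] = 1.0
    A[v, u] = 1.0

# Node features X: one d-dimensional row per node.
X = np.arange(N * d, dtype=float).reshape(N, d)

# Edge features: one d-dimensional row per edge (assumption).
E_feat = np.ones((num_edges, d))

print(A.shape, X.shape, E_feat.shape)  # (4, 4) (4, 2) (3, 2)
```

Note that the undirected adjacency matrix has two nonzero entries per edge, so `A.sum()` equals `2 * num_edges` here.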