Vision GNN: An Image is Worth Graph of Nodes

Code link (not yet released): https://github.com/sanbuphy/CV-Backbones

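I have recently been following applications of GNNs to image recognition, and found that Huawei Noah’s Ark Lab has already done it...

What I am curious about is how they handle the training set itself (what the loss is). Annotating the graph structure after the image is split up should also be a fairly large engineering effort. This will need careful study once the code is released (and probably a few more readings of the Graph Representation of Image and ViG Block sections).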

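Based on a graph representation of images, the paper proposes the vision graph neural network (ViG). It appears to be the first work to apply graph neural networks to vision tasks with strong results: on ImageNet classification it outperforms CNN (ResNet), MLP (CycleMLP), and transformer (Swin-T) models.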

“These parts linked by joints naturally form a graph structure. By analyzing the graph, we are able to recognize the human. Moreover, graph is a generalized data structure that grid and sequence can be viewed as a special case of graph. Viewing an image as a graph is more flexible and effective for visual perception”

Decomposing the image into a graph makes the connections between its parts tighter: regions with similar semantics can be linked to each other even when they are far apart in the image.
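To make the "image as a graph" idea concrete, here is a minimal sketch in NumPy. It assumes that nodes are fixed-size image patches and that each node is connected to its k nearest neighbours in feature space; the function name `image_to_patch_graph`, the patch size, and the use of raw pixels as node features are illustrative choices of mine, not details taken from these notes.

```python
import numpy as np

def image_to_patch_graph(image, patch_size, k):
    """Split an H x W x C image into patches and link each patch to its k nearest neighbours."""
    h, w, c = image.shape
    gh, gw = h // patch_size, w // patch_size
    # Crop to a multiple of the patch size, then cut into a (gh, gw) grid of patches.
    image = image[: gh * patch_size, : gw * patch_size]
    patches = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    nodes = patches.reshape(gh * gw, -1).astype(np.float64)  # one feature vector per node

    # Pairwise squared Euclidean distances between node features.
    sq = (nodes ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * nodes @ nodes.T
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbour

    # Indices of the k closest nodes for every node: shape (N, k).
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return nodes, neighbors

# Hypothetical input: a random 32 x 32 RGB image, 8 x 8 patches, 3 neighbours per node.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
nodes, neighbors = image_to_patch_graph(image, patch_size=8, k=3)
print(nodes.shape, neighbors.shape)  # (16, 192) (16, 3)
```

Because neighbours are chosen by feature similarity rather than by grid position, two patches on opposite sides of the image can end up connected, which is the cross-region linking described above.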

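ViG Block

Huawei Noah’s Ark Lab proposes a dedicated ViG block structure. Compared with plain GCNs, it improves feature diversity and reduces the loss of feature diversity that comes with increasing network depth. The ViG block is the basic building unit of the ViG network and is made of a Grapher module stacked with an FFN module.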
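The Grapher + FFN stacking can be sketched roughly as follows. This is only a forward-pass illustration under my own assumptions: max-relative aggregation inside the graph convolution, linear layers before and after it, GELU activations, and residual connections around both modules. The parameter names (`w_in`, `w_update`, `w_out`, `w1`, `w2`) and the random neighbour table are hypothetical, and normalization layers are left out.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def grapher(x, neighbors, w_in, w_update, w_out):
    """Graph convolution wrapped by two linear layers, plus a residual connection."""
    h = x @ w_in
    # Max-relative aggregation: per node, element-wise max of (neighbour - self).
    relative = h[neighbors] - h[:, None, :]                  # (N, k, D)
    aggregated = relative.max(axis=1)                        # (N, D)
    h = np.concatenate([h, aggregated], axis=1) @ w_update   # (N, 2D) -> (N, D)
    return gelu(h) @ w_out + x

def ffn(x, w1, w2):
    """Two-layer MLP applied to every node independently, plus a residual connection."""
    return gelu(x @ w1) @ w2 + x

def vig_block(x, neighbors, params):
    y = grapher(x, neighbors, params["w_in"], params["w_update"], params["w_out"])
    return ffn(y, params["w1"], params["w2"])

# Hypothetical sizes: 16 nodes, 8 channels, 3 neighbours per node.
rng = np.random.default_rng(0)
n, d, k = 16, 8, 3
x = rng.normal(size=(n, d))
neighbors = rng.integers(0, n, size=(n, k))
params = {
    "w_in": rng.normal(scale=0.1, size=(d, d)),
    "w_update": rng.normal(scale=0.1, size=(2 * d, d)),
    "w_out": rng.normal(scale=0.1, size=(d, d)),
    "w1": rng.normal(scale=0.1, size=(d, 4 * d)),
    "w2": rng.normal(scale=0.1, size=(4 * d, d)),
}
out = vig_block(x, neighbors, params)
print(out.shape)  # (16, 8)
```

The block keeps the node count and channel width unchanged, so blocks of this shape can be stacked one after another.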

The question is how to keep the feature diversity of nodes from dropping as the layers go deeper, i.e.:

“The over-smoothing phenomenon in deep GCNs will decrease the distinctiveness of node features and lead to performance degradation for visual recognition,”
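A small numerical experiment makes the over-smoothing effect visible. The sketch below is not the ViG block: it applies plain, parameter-free mean aggregation on a hypothetical ring graph and tracks a simple diversity measure of my own choosing (the Frobenius distance of the node features from their mean).

```python
import numpy as np

def feature_diversity(x):
    """Frobenius distance of node features from their mean (0 means all nodes are identical)."""
    return float(np.linalg.norm(x - x.mean(axis=0, keepdims=True)))

def mean_aggregate(x, adjacency):
    """One propagation step: each node becomes the mean of itself and its neighbours."""
    a = adjacency + np.eye(adjacency.shape[0])
    return (a / a.sum(axis=1, keepdims=True)) @ x

# Hypothetical graph: 12 nodes on a ring, each linked to its two neighbours.
n = 12
adjacency = np.zeros((n, n))
idx = np.arange(n)
adjacency[idx, (idx + 1) % n] = 1
adjacency[(idx + 1) % n, idx] = 1

rng = np.random.default_rng(0)
x = rng.normal(size=(n, 4))
diversity = [feature_diversity(x)]
for _ in range(20):
    x = mean_aggregate(x, adjacency)
    diversity.append(feature_diversity(x))

print(diversity[0], diversity[-1])
```

The diversity value shrinks with every propagation step, which is the behaviour the quote attributes to deep GCNs: repeated neighbourhood averaging pulls all node features toward the same vector.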