Abstract:

This paper proposes extracting inter-node interaction information by multiplying each neighbor node's attributes with the attributes of the edge connecting it to the central node; it also designs a novel sampling strategy to speed up training.

Key ideas of the method:

Building on the original scheme, which simply adds the (transformed) node and edge attributes:

$$
\begin{aligned}
h^{(i+1)}(u) &= \sigma\left(\sum_{v \in N(u)} \mathbf{W}\begin{bmatrix} h^{(i)}(v) \\ f(e_{u,v}) \end{bmatrix}\right) \\
&= \sigma\left(\mathbf{W}_1 \sum_{v \in N(u)} h^{(i)}(v) + \mathbf{W}_2 \sum_{v \in N(u)} f(e_{u,v})\right)
\end{aligned}
$$
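The equality of the concatenation form and the split form follows from linearity of the matrix product. A minimal NumPy sketch (dimensions chosen arbitrarily for illustration) checks this numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
d_n, d_e, d_out = 4, 3, 5                     # node dim, edge dim, output dim
num_neighbors = 6

h_v = rng.normal(size=(num_neighbors, d_n))   # h^(i)(v) for v in N(u)
f_e = rng.normal(size=(num_neighbors, d_e))   # f(e_{u,v}) for v in N(u)

W = rng.normal(size=(d_out, d_n + d_e))       # W = [W1 | W2]
W1, W2 = W[:, :d_n], W[:, d_n:]

sigma = np.tanh                               # any elementwise nonlinearity

# form 1: concatenate node and edge features per neighbor, transform, sum
out_concat = sigma(sum(W @ np.concatenate([h, e]) for h, e in zip(h_v, f_e)))

# form 2: split W into W1, W2 and transform the two sums separately
out_split = sigma(W1 @ h_v.sum(axis=0) + W2 @ f_e.sum(axis=0))

assert np.allclose(out_concat, out_split)     # the two forms coincide
```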

this formulation can run into heterogeneity problems, since node and edge attributes are transformed and summed independently of each other.

This paper instead proposes multiplying each neighbor node with its corresponding edge:

$$
f\big((v, e_{u,v})\big) := f(v) \otimes f(e_{u,v})
$$
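As an illustration, taking $\otimes$ to be the outer product (one common reading; an elementwise product would be another) shows why this interaction grows in dimension, which is what motivates the dimensionality-reduction step the paper applies:

```python
import numpy as np

rng = np.random.default_rng(1)
f_v = rng.normal(size=4)      # neighbor node attributes f(v)
f_e = rng.normal(size=3)      # edge attributes f(e_{u,v})

# outer product: every node feature interacts with every edge feature
inter = np.outer(f_v, f_e)    # shape (4, 3)
inter_flat = inter.ravel()    # flattened: d_n * d_e = 12 dims per neighbor
```

The per-neighbor dimension is $d_n \times d_e$, which quickly becomes large and redundant.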

In the details, the paper further addresses the resulting problems of high dimensionalities and heavy redundancies.

LASE principles

LASE can be divided into three common modules, namely a gate, an amplifier and an aggregator.

Gate: the gate evaluates v's influence in u's neighborhood.

Amplifier: the amplifier amplifies the node attributes using link information.

Aggregator: the aggregator sums up neighbor embeddings and combines them with the central node embedding using various strategies.

Aggregators proposed in [Hamilton et al., 2017] may also be used in LASE.
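A rough sketch of how the three modules might compose. The exact forms here are assumptions for illustration, not taken from the paper: the gate as a sigmoid over edge features, the amplifier as an elementwise product, and a GraphSAGE-style concatenation for combining with the central node:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
h_u = rng.normal(size=d)                 # central node embedding
h_v = rng.normal(size=(5, d))            # 5 neighbor embeddings
f_e = rng.normal(size=(5, d))            # edge features (assumed same dim)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# gate: one scalar weight per neighbor, here derived from its edge features
g = sigmoid(f_e.sum(axis=1, keepdims=True))        # shape (5, 1)

# amplifier: scale each neighbor's attributes by its link information
amplified = h_v * f_e                              # shape (5, d)

# aggregator: sum gated neighbors, then combine with the central node
neigh = (g * amplified).sum(axis=0)                # shape (d,)
out = np.tanh(np.concatenate([h_u, neigh]))        # GraphSAGE-style combine
```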

Experiments

Datasets

Reddit: nodes are posts; an edge's attribute is the average comment distribution, across different communities, of the users who commented on both posts.

DBLP: papers are nodes, with TF-IDF vectors extracted from each paper as node attributes; edge attributes are one-hot embeddings of the common authors, reduced to 200 dimensions with PCA.

Email and FMobile: nodes are contacts, and edges are the contacts between nodes within each time slice.

Result analysis

Node classification is run on these datasets.

In aggregating neighbor attributes, GCN-style models outperform proximity-based models (LINE, DeepWalk).

In exploiting edge attributes, LASE outperforms the other models.

LASE-RW and LASE-SAGE outperform the naive concatenation variant LASE-concat.

Although there are no original features in the two temporal networks, LASE still outperforms pre-trained features by exploiting edge attributes, while GCN and GraphSAGE fail to capture this additional information and struggle with over-fitting to the proximity-based features.

Contributions:

LASE provides a ubiquitous solution to a wider class of graph data by incorporating link attributes;

It applies to a wide range of datasets.

LASE outperforms strong baselines and naive concatenating implementations by adequately leveraging the information in the link attributes;

It fully exploits the edge attribute information, beating the naive approaches.

LASE adopts a more explainable approach to determining the neural architecture and thus enjoys better explainability.

The neural architecture is chosen in a more interpretable way.

Limitations:

  • The sampling setup is suboptimal and becomes clumsy on very large graphs.
  • To stay general, the model is not especially concise or elegant; it could be streamlined when applied to a specific domain.

Sampling: To control batch scales, we leverage the Monte Carlo method to estimate the summed neighborhood information by sampling a fixed number of neighbors.
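A sketch of this Monte Carlo estimate: sample a fixed number k of neighbors with replacement, average, and rescale by the neighborhood size so the estimate targets the full sum. The sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
neighbors = rng.normal(size=(1000, d))   # embeddings of all neighbors of u

full_sum = neighbors.sum(axis=0)         # exact summed neighborhood info

k = 64                                   # fixed sample budget per node
idx = rng.choice(len(neighbors), size=k, replace=True)

# Monte Carlo estimate: sample mean rescaled by the neighborhood size;
# unbiased for full_sum, with error shrinking as k grows
mc_estimate = len(neighbors) * neighbors[idx].mean(axis=0)
```

This keeps every batch at a fixed size regardless of node degree, at the cost of estimation noise.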