Notes on "Aspect Level Sentiment Classification with Deep Memory Network"

May 11, 2019. Source: bupt_周小瑜

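Preface

A good set of notes on the paper "Aspect Level Sentiment Classification with Deep Memory Network" already exists online; see 西土城搬砖日常.

I am writing my own notes to record the thoughts I had while reading, and to aim for a clearer presentation.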

Overview

These notes cover the paper "Aspect Level Sentiment Classification with Deep Memory Network".

The paper draws on two topics:

Memory networks (Memory Network)
Multi-layer attention

The application is the same as in the previous post: aspect-level sentiment analysis.

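Rough framework

The overall idea is to compute the importance of each context word and, from that, a representation of the text. This is done with a stack of computational layers, each of which combines a memory network (MN) with attention. The attention mechanism in turn comes in two forms: the conventional content attention and the newly proposed location attention…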

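Advantages

Compared with the best existing feature-based SVM system, it reaches state-of-the-art performance
It outperforms the sequence models LSTM and attention-based LSTM
Under the same conditions, it runs 15 times faster than LSTM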

memory network

The memory network was proposed by Jason Weston et al. in 2014. In 2015, Sainbayar Sukhbaatar et al. proposed a way to train memory networks end to end, and obtained good results on QA.

For more on memory networks, see these two papers:

[Weston et al. 2014] Memory Networks
[Sukhbaatar et al. 2015] End-To-End Memory Networks

The general idea:

A memory network consists of a memory m and four components I, G, O and R, where m is an array of objects such as an array of vectors. Among these four components:

I converts input to internal feature representation,
G updates old memories with new input,
O generates an output representation given a new input and the current memory state,
R outputs a response based on the output representation.
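To make the four roles concrete, here is a deliberately tiny sketch. Everything in it is an assumption chosen for illustration: the class name `ToyMemoryNetwork`, the bag-of-words features, the append-only memory update, and the dot-product lookup. In the actual papers these components are learned; here they are fixed rules.

```python
import numpy as np

class ToyMemoryNetwork:
    """Toy I/G/O/R pipeline; every design choice here is illustrative."""

    def __init__(self, vocab):
        self.index = {word: i for i, word in enumerate(vocab)}
        self.memory = []  # m: a list of feature vectors
        self.texts = []   # raw sentences, kept so R can produce a response

    def input_map(self, text):
        """I: convert raw text to an internal bag-of-words feature vector."""
        vec = np.zeros(len(self.index))
        for word in text.lower().split():
            if word in self.index:
                vec[self.index[word]] += 1.0
        return vec

    def generalize(self, text):
        """G: update the memory with a new input (here: simply append)."""
        self.memory.append(self.input_map(text))
        self.texts.append(text)

    def output(self, query_vec):
        """O: pick the memory slot that best matches the query."""
        scores = [float(m @ query_vec) for m in self.memory]
        return int(np.argmax(scores))

    def response(self, slot):
        """R: turn the output representation into a response."""
        return self.texts[slot]

    def answer(self, question):
        return self.response(self.output(self.input_map(question)))

vocab = ["joe", "fred", "went", "to", "the", "kitchen", "office", "where", "is"]
net = ToyMemoryNetwork(vocab)
net.generalize("Joe went to the kitchen")
net.generalize("Fred went to the office")
reply = net.answer("Where is Fred")
```

The question shares only the word "fred" with the second stored sentence and nothing with the first, so the lookup in O selects the second slot and R returns that sentence.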

An example of an MN:

[Figure: example of a memory network]

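One point worth noting: the O component can contain multiple computational layers.
Each computational layer is called a hop. The main motivation is that multiple hops can extract more abstract semantic information.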

Framework design

The overall block diagram:

[Figure: overall architecture of the deep memory network]

word embedding:

The word vectors comprise context vectors and aspect vectors.

aspect vectors:

If the aspect is a single word, the aspect vector is that word's embedding; if the aspect consists of several words, the aspect vector is the average of their embeddings.
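The averaging rule can be sketched in a few lines of NumPy. The function name `aspect_vector` and the toy embedding table are hypothetical, made up for this illustration.

```python
import numpy as np

def aspect_vector(aspect_tokens, embeddings):
    """Return the aspect representation.

    For a single word the mean is just that word's embedding; for a
    multi-word aspect it is the average of the word embeddings.
    """
    vectors = np.stack([embeddings[t] for t in aspect_tokens])  # shape (k, d)
    return vectors.mean(axis=0)                                  # shape (d,)

# Hypothetical toy embedding table with d = 3.
toy_embeddings = {
    "battery": np.array([1.0, 0.0, 2.0]),
    "life": np.array([3.0, 2.0, 0.0]),
}

single = aspect_vector(["battery"], toy_embeddings)
multi = aspect_vector(["battery", "life"], toy_embeddings)
```

Taking the mean in both cases avoids a special branch for single-word aspects, since the mean of one vector is the vector itself.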

context word vectors:

The word embeddings of all words in the sentence other than the aspect word are stacked into a matrix of size d × (n−1); this matrix is the memory in the model. (n is the sentence length.)
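A minimal sketch of building the memory, assuming a single-word aspect at a known position. The helper `build_memory`, the example sentence, and the random toy embeddings are all hypothetical.

```python
import numpy as np

def build_memory(tokens, aspect_index, embeddings):
    """Stack the embeddings of all context words (every token except the
    aspect word) as the columns of a d x (n-1) memory matrix."""
    context = [embeddings[t] for i, t in enumerate(tokens) if i != aspect_index]
    return np.stack(context, axis=1)

rng = np.random.default_rng(0)
tokens = ["great", "food", "but", "slow", "service"]       # n = 5
toy_embeddings = {t: rng.normal(size=4) for t in tokens}   # d = 4

# Treat "food" (position 1) as the aspect word.
memory = build_memory(tokens, aspect_index=1, embeddings=toy_embeddings)
```

With n = 5 and d = 4, `memory` has one column per context word, in sentence order with the aspect word skipped.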

compute layer

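The model consists of several computational layers, each containing an attention layer and a linear layer.
In the first computational layer, the attention layer takes the aspect vector as input and outputs the more important parts of the memory, and the linear layer also takes the aspect vector as input. The outputs of the attention layer and the linear layer are summed, and the sum is the input to the next computational layer.
The other computational layers do the same thing: they take the previous layer's output as input, use attention to pick out the more important information in the memory, and sum it with the linear layer's result to form the input to the next layer.
The output of the last layer is the aspect-aware sentence representation; it serves as the feature for aspect-level sentiment classification and is fed to a softmax.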
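The layer-by-layer computation above can be sketched as follows. Several things here are assumptions: the attention score `tanh(W_att [m_i; x] + b_att)` followed by a softmax is my reading of the paper's content attention (these notes do not spell it out), the linear layer has no bias, location attention is left out, and all weights are random rather than trained.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_attention(memory, query, W_att, b_att):
    """Weight each memory column by its relevance to the query vector."""
    d, k = memory.shape
    # Pair every memory column m_i with the query: shape (2d, k).
    paired = np.vstack([memory, np.tile(query[:, None], (1, k))])
    scores = np.tanh(W_att @ paired + b_att)  # shape (1, k)
    weights = softmax(scores.ravel())         # shape (k,), sums to 1
    return memory @ weights, weights          # weighted sum: shape (d,)

def run_hops(memory, aspect, W_att, b_att, W_lin, num_hops):
    """Apply the same computational layer num_hops times."""
    x = aspect
    for _ in range(num_hops):
        attended, _ = content_attention(memory, x, W_att, b_att)
        x = attended + W_lin @ x  # attention output + linear output
    return x

def classify(features, W_out, b_out):
    """Final softmax over sentiment classes."""
    return softmax(W_out @ features + b_out)

rng = np.random.default_rng(0)
d, k, num_classes = 4, 5, 3
memory = rng.normal(size=(d, k))
aspect = rng.normal(size=d)
W_att, b_att = rng.normal(size=(1, 2 * d)), 0.0
W_lin = rng.normal(size=(d, d))
W_out, b_out = rng.normal(size=(num_classes, d)), np.zeros(num_classes)

features = run_hops(memory, aspect, W_att, b_att, W_lin, num_hops=3)
probs = classify(features, W_out, b_out)
```

Note that `run_hops` reuses the same `W_att`, `b_att`, and `W_lin` at every hop, so adding hops changes the amount of computation but not the set of weights.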

Tip: parameter sharing

It is helpful to note that the parameters of attention and linear layers are shared in different hops. Therefore, the model with one layer and the model with nine layers have the same number of parameters.
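A quick way to see this is to count parameters with and without sharing. The layer shapes below (a 1 × 2d attention weight, a scalar attention bias, and a d × d linear weight without bias) are assumptions for illustration, and the helper names are hypothetical.

```python
import numpy as np

def init_layer(d, rng):
    """Parameters of one computational layer (attention + linear)."""
    return {
        "W_att": rng.normal(size=(1, 2 * d)),  # attention scoring weights
        "b_att": np.zeros(1),                  # attention scoring bias
        "W_lin": rng.normal(size=(d, d)),      # linear layer weights
    }

def count_params(layers):
    """Total number of scalar parameters across a list of layers."""
    return sum(p.size for layer in layers for p in layer.values())

def layer_for_hop(layers, hop):
    """With sharing there is a single layer, reused at every hop."""
    return layers[hop % len(layers)]

rng = np.random.default_rng(0)
d = 4

# Shared: one set of weights, no matter how many hops are run.
shared = [init_layer(d, rng)]
shared_counts = {hops: count_params(shared) for hops in (1, 9)}

# For contrast only: nine hops with separate weights per hop.
unshared_9 = count_params([init_layer(d, rng) for _ in range(9)])
```

With sharing, the count is the same for 1 hop and for 9 hops, while the unshared variant grows linearly with the number of hops.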