Analysis of the WMT15 Sentence-Level Translation Quality Estimation Task

About the baseline

The baseline uses SVM regression with an RBF kernel, and its hyperparameters are set by grid search. It uses the following 17 features:

<http://www.quest.dcs.shef.ac.uk/quest_files/features_blackbox_baseline_17>
1. number of tokens in the source sentence
2. number of tokens in the target sentence
3. average source token length
4. LM probability of source sentence
5. LM probability of target sentence
6. number of occurrences of the target word within the target hypothesis (averaged for all words in the hypothesis - type/token ratio)
7. average number of translations per source word in the sentence (as given by IBM 1 table thresholded such that prob(t|s) > 0.2)
8. average number of translations per source word in the sentence (as given by IBM 1 table thresholded such that prob(t|s) > 0.01) weighted by the inverse frequency of each word in the source corpus
9. percentage of unigrams in quartile 1 of frequency (lower frequency words) in a corpus of the source language (SMT training corpus)
10. percentage of unigrams in quartile 4 of frequency (higher frequency words) in a corpus of the source language
11. percentage of bigrams in quartile 1 of frequency of source words in a corpus of the source language
12. percentage of bigrams in quartile 4 of frequency of source words in a corpus of the source language
13. percentage of trigrams in quartile 1 of frequency of source words in a corpus of the source language
14. percentage of trigrams in quartile 4 of frequency of source words in a corpus of the source language
15. percentage of unigrams in the source sentence seen in a corpus (SMT training corpus)
16. number of punctuation marks in the source sentence
17. number of punctuation marks in the target sentence
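
Several of these features need only the sentence pair itself, while the LM, IBM 1 and frequency-quartile features require language models and corpora that are not shown here. Below is a minimal Python sketch of the resource-free features only (token counts, average source token length, average occurrences per target word, punctuation counts). It assumes whitespace tokenization and treats any character in `string.punctuation` as a punctuation mark; `shallow_features` is a hypothetical helper, not part of QuEst, and QuEst's own tokenization and definitions may differ. Its output would be just one part of the feature vector passed to the SVR.

```python
import string


def shallow_features(source: str, target: str) -> dict:
    """Compute the baseline features that need no LM, alignment table or corpus.

    Assumptions (not taken from QuEst): whitespace tokenization, and a
    punctuation mark is any single character found in string.punctuation.
    """
    src_tokens = source.split()
    tgt_tokens = target.split()
    return {
        # Number of tokens in the source / target sentence.
        "src_num_tokens": len(src_tokens),
        "tgt_num_tokens": len(tgt_tokens),
        # Average source token length in characters.
        "src_avg_token_len": (
            sum(len(t) for t in src_tokens) / len(src_tokens) if src_tokens else 0.0
        ),
        # Average occurrences of each distinct target word, computed here
        # as tokens / types (an assumed reading of the feature definition).
        "tgt_avg_occurrences": (
            len(tgt_tokens) / len(set(tgt_tokens)) if tgt_tokens else 0.0
        ),
        # Punctuation characters in the source / target sentence.
        "src_num_punct": sum(ch in string.punctuation for ch in source),
        "tgt_num_punct": sum(ch in string.punctuation for ch in target),
    }


feats = shallow_features("the cat sat on the mat .", "le chat est assis sur le tapis .")
print(feats)
```

In this toy pair the target repeats "le", so `tgt_avg_occurrences` is 8 tokens / 7 types rather than 1.0.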

About the task

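There are three translation quality estimation tasks: Task 1 is sentence-level, Task 2 is word-level, and Task 3 is document-level.
Below are all participating teams and the tasks they entered; this post focuses only on the sentence-level task (Task 1).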

ID | Tasks | Participating team | Paper
DCU-SHEFF | 2 | Dublin City University, Ireland and University of Sheffield, UK | Logacheva et al., 2015
HDCL | 2 | Heidelberg University, Germany | Kreutzer et al., 2015
LORIA | 1 | Lorraine Laboratory of Research in Computer Science and its Applications, France | Langlois, 2015
RTM-DCU | 1, 2, 3 | Dublin City University, Ireland | Bicici et al., 2015
SAU-KERC | 2 | Shenyang Aerospace University, China | Shang et al., 2015
SHEFF-NN | 1, 2 | University of Sheffield Team 1, UK | Shah et al., 2015
UAlacant | 2 | University of Alicante, Spain | Esplà-Gomis et al., 2015a
UGENT | 1, 2 | Ghent University, Belgium | Tezcan et al., 2015
USAAR-USHEF | 3 | University of Sheffield, UK and Saarland University, Germany | Scarton et al., 2015a
USHEF | 3 | University of Sheffield, UK | Scarton et al., 2015a
HIDDEN | 3 | Undisclosed |

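The task is evaluated in two ways: HTER prediction (scoring) and ranking. For HTER (Human-targeted Translation Error Rate), lower is better, and the scoring metrics are MAE and RMSE. (Ranking, which orders the translated sentences from best to worst, is not considered here.)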
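
MAE and RMSE both compare each sentence's predicted HTER with its gold HTER. A minimal Python sketch follows; the functions `mae` and `rmse` and the example scores are hypothetical, for illustration only:

```python
import math


def mae(predicted, gold):
    """Mean absolute error between predicted and gold HTER scores."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)


def rmse(predicted, gold):
    """Root mean squared error: squares each error before averaging."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold))


# Hypothetical HTER values, for illustration only.
gold = [20.0, 35.0, 0.0, 50.0]
pred = [25.0, 30.0, 10.0, 40.0]
print(mae(pred, gold))   # 7.5
print(rmse(pred, gold))  # sqrt(62.5), about 7.906
```

Because RMSE squares the errors, it is never smaller than MAE and weighs large errors more heavily, so the two metrics can rank systems differently.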

ID | System | MAE↓ | RMSE↓
RTM-DCU | RTM-FS+PLS-SVR | 13.25 | 17.48
LORIA | 17+LSI+MT+FILTRE | 13.34 | 17.35
RTM-DCU | RTM-FS-SVR | 13.35 | 17.68
LORIA | 17+LSI+MT | 13.42 | 17.45
UGENT-LT3 | SCATE-SVM | 13.71 | 17.45
UGENT-LT3 | SCATE-SVM-single | 13.76 | 17.79
SHEF | SVM | 13.83 | 18.01
Baseline | SVM | 14.82 | 19.13
SHEF | GP | 15.16 | 18.97

RTM-DCU and LORIA clearly perform best, so the rest of this post analyzes the work of these two teams.
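
The ranking can be checked directly against the results table above. This small Python sketch copies the scores from that table (the variable names are illustrative), sorts the systems by each metric, and prints each system's MAE gain over the baseline:

```python
# (team, system, MAE, RMSE), copied from the results table; lower is better.
results = [
    ("RTM-DCU", "RTM-FS+PLS-SVR", 13.25, 17.48),
    ("LORIA", "17+LSI+MT+FILTRE", 13.34, 17.35),
    ("RTM-DCU", "RTM-FS-SVR", 13.35, 17.68),
    ("LORIA", "17+LSI+MT", 13.42, 17.45),
    ("UGENT-LT3", "SCATE-SVM", 13.71, 17.45),
    ("UGENT-LT3", "SCATE-SVM-single", 13.76, 17.79),
    ("SHEF", "SVM", 13.83, 18.01),
    ("Baseline", "SVM", 14.82, 19.13),
    ("SHEF", "GP", 15.16, 18.97),
]

by_mae = sorted(results, key=lambda r: r[2])
by_rmse = sorted(results, key=lambda r: r[3])
baseline_mae = next(r[2] for r in results if r[0] == "Baseline")

print("best by MAE :", by_mae[0][:2])
print("best by RMSE:", by_rmse[0][:2])
for team, system, m, _ in by_mae:
    # Positive values mean the system beats the baseline on MAE.
    print(f"{team:10s} {system:18s} MAE gain over baseline: {baseline_mae - m:+.2f}")
```

Sorting this way shows that the two metrics do not fully agree: RTM-DCU's RTM-FS+PLS-SVR has the lowest MAE (13.25), while LORIA's 17+LSI+MT+FILTRE has the lowest RMSE (17.35). By RMSE, SHEF's GP system (18.97) also edges out the baseline (19.13), even though its MAE is worse.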

All the papers are available here: http://www.statmt.org/wmt15/W…

RTM-DCU

In essence, the approach combines transductive learning and active learning to optimize feature selection.

LORIA

    Original author: winterdawn
    Original URL: https://segmentfault.com/a/1190000006796395