OmegaT使用java写的,autoit就不好使了,得用它支持的JavaScript或者Groovy写脚本。
默认已经带了一些Groovy脚本示例了,可以供我们参考。
我这次的目标是把原文片段和译文片段以Taus DQF要求的csv格式存储,并分成30份给同学们使用。
import groovy.json.JsonOutput
files = project.projectFiles;
segment_count=0
fileLoop:
def map1 = [:]
for (i in 0 ..< files.size())
{
fi = files[i];
//console.println(fi.filePath);
for (j in 0 ..< fi.entries.size())
{
if (java.lang.Thread.interrupted()) {
break fileLoop;
}
ste = fi.entries[j];
changer="没有修改者";
source = ste.getSrcText();
target = project.getTranslationInfo(ste) ? project.getTranslationInfo(ste).translation : null;
changer=project.getTranslationInfo(ste).changer;
if (changer==null){
changer="没有修改者";
}
if (target==null){
target="未翻译";
}
num=ste.entryNum()
map1.put(num, [source,target,fi.filePath]) //结果保存在map里,以便导出为json。
segment_count++;
}
}
//获取项目路径,将结果保存为json文件
def prop = project.projectProperties
if (!prop) {
showMessageDialog null, res.getString("noProjectMsg"), res.getString("noProject"), INFORMATION_MESSAGE
return
}
def root = prop.projectRoot;
def srcTextFile = new File(root, 'project_source_content.txt');
def json = JsonOutput.toJson(map1);
console.println(json);
srcTextFile.write(json)
srcTextFile.close();
通过以上groovy代码,我们可以把结果导出。
然后我们按每人多少片段进行分割就可以了。
TAUS的DQF(Dynamic Quality Framework )一般是用来评估机器翻译的,我们也可以用来评价人工翻译。它有几个维度:Fluency、Adequacy和Typology Errors。
以下是具体的介绍:
Fluency: captures to what extent the translation is well-formed grammatically, contains correct spellings, adheres to common use of terms, titles and names, is intuitively acceptable and can be sensibly interpreted by a native speaker.: captures to what extent the translation is well-formed grammatically, contains correct spellings, adheres to common use of terms, titles and names, is intuitively acceptable and can be sensibly interpreted by a native speaker.
Fluency的评分分为四级:Incomprehensible、Disfluent、Good和Flawless。
Adequacy: captures to what extent the meaning in the source text is als expressed in the translation.
Adequacy的评分也分为四级:None、Little、Most和Everything。
Typology Errors更加详细,需要数译文出现的错误。具体见以下表格:
High-level | Granular levels |
---|---|
Accuracy | Addition |
Omission | |
Mistranslation | |
Over-translation | |
Under-translation | |
Untranslated | |
Improper exact TM match | |
Fluency | Punctuation |
Spelling | |
Grammar | |
Grammatical register | |
Inconsistency | |
Link/cross-reference | |
Character encoding | |
Terminology | Inconsistent with termbase |
Inconsistent use of terminology | |
Style | Awkward |
Company style | |
Inconsistent style | |
Third-party style | |
Unidiomatic | |
Locale convention | Address format |
Date format | |
Currency format | |
Measurement format | |
Shortcut key | |
Telephone format |
DQF需要上传翻译记忆文件,tmx或者tab分割的文件都行。生成tmx比较麻烦,我导入后还报错,我就选择的tab分割文本。但是omegat的原文片段可以包含换行,tab等信息,需要进一步处理。这个问题我反馈给了omegat,说片段是可以包含多行内容的。
建立审校项目需要分配给别人或自己,审校完成后,便可以查看报告了。