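OmegaT is written in Java, so AutoIt does not work well with it; scripts have to be written in one of the languages OmegaT supports, JavaScript or Groovy.
It already ships with some sample Groovy scripts that we can use for reference.
My goal this time is to store the source and target segments in the CSV format required by TAUS DQF, and split them into 30 parts for my classmates to use.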
import groovy.json.JsonOutput

// Make sure a project is open before reading its files.
def prop = project.projectProperties
if (!prop) {
    console.println("No project is open.")
    return
}

def files = project.projectFiles
def segmentCount = 0
def map1 = [:]

// The label must sit directly on the outer loop so that "break fileLoop" can leave both loops.
fileLoop:
for (i in 0..<files.size()) {
    def fi = files[i]
    // console.println(fi.filePath)
    for (j in 0..<fi.entries.size()) {
        // Stop cleanly if the script is cancelled.
        if (Thread.interrupted()) {
            break fileLoop
        }
        def ste = fi.entries[j]
        def source = ste.getSrcText()
        def info = project.getTranslationInfo(ste)
        // Fall back to placeholder values when there is no translation or no changer.
        def target = info?.translation
        if (target == null) {
            target = "untranslated"
        }
        def changer = info?.changer
        if (changer == null) {
            changer = "no changer"
        }
        def num = ste.entryNum()
        // Keep the result in a map so it can be exported as JSON.
        // changer is not exported here; append it to the list if it is needed.
        map1.put(num, [source, target, fi.filePath])
        segmentCount++
    }
}

// Get the project path and save the result as a JSON file.
def root = prop.projectRoot
def srcTextFile = new File(root, 'project_source_content.txt')
def json = JsonOutput.toJson(map1)
console.println(json)
// File.write opens and closes the file itself, so no close() call is needed.
srcTextFile.write(json, 'UTF-8')
console.println("Exported ${segmentCount} segments")
With the Groovy code above, we can export the results.
After that, we only need to split them according to how many segments each person gets.
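The splitting step could be sketched roughly as follows. This is only a minimal sketch, not the script I actually ran: the helper name `splitIntoParts` and the inline sample data are made up for illustration, and it assumes the JSON has the shape produced by the export above (entry number mapped to a list of source, target and file path).

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: split a list into `parts` consecutive chunks
// whose sizes differ by at most one.
List<List> splitIntoParts(List items, int parts) {
    int base = items.size().intdiv(parts)   // minimum chunk size
    int extra = items.size() % parts        // the first `extra` chunks get one more item
    List<List> chunks = []
    int start = 0
    for (int k = 0; k < parts; k++) {
        int size = base + (k < extra ? 1 : 0)
        chunks << items.subList(start, start + size)
        start += size
    }
    return chunks
}

// Small inline sample standing in for the exported JSON file.
def sampleJson = '{"1":["Hello","Bonjour","a.txt"],"2":["World","Monde","a.txt"],"3":["Bye","Au revoir","b.txt"]}'

// JSON object keys are strings, so convert them back to numbers and sort by entry number.
def segments = new JsonSlurper().parseText(sampleJson)
        .collect { num, row -> [Integer.parseInt(num)] + row }
        .sort { it[0] }

def chunks = splitIntoParts(segments, 2)
chunks.eachWithIndex { chunk, idx ->
    println "part ${idx + 1}: ${chunk.size()} segments"
}
```

For the real project, the sample string would be replaced by the text of the exported file and `2` by `30`; each chunk can then be written to its own file.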
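TAUS DQF (Dynamic Quality Framework) is generally used to evaluate machine translation, but we can also use it to evaluate human translation. It has several dimensions: Fluency, Adequacy, and Typology Errors.
Here is a more detailed description: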
Fluency: captures to what extent the translation is well-formed grammatically, contains correct spellings, adheres to common use of terms, titles and names, is intuitively acceptable and can be sensibly interpreted by a native speaker.
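Fluency is rated on four levels: Incomprehensible, Disfluent, Good, and Flawless.
Adequacy: captures to what extent the meaning in the source text is also expressed in the translation.
Adequacy is also rated on four levels: None, Little, Most, and Everything.
Typology Errors is more detailed and requires counting the errors that appear in the translation. See the table below: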
| High-level | Granular levels |
|---|---|
| Accuracy | Addition |
| | Omission |
| | Mistranslation |
| | Over-translation |
| | Under-translation |
| | Untranslated |
| | Improper exact TM match |
| Fluency | Punctuation |
| | Spelling |
| | Grammar |
| | Grammatical register |
| | Inconsistency |
| | Link/cross-reference |
| | Character encoding |
| Terminology | Inconsistent with termbase |
| | Inconsistent use of terminology |
| Style | Awkward |
| | Company style |
| | Inconsistent style |
| | Third-party style |
| | Unidiomatic |
| Locale convention | Address format |
| | Date format |
| | Currency format |
| | Measurement format |
| | Shortcut key |
| | Telephone format |
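Counting errors per category is simple to do in Groovy once the reviewer's annotations are available as data. The sketch below is purely illustrative: the `annotations` list and its field names are invented here and are not a DQF export format.

```groovy
// Hypothetical review annotations: one entry per error found,
// tagged with a high-level category and a granular level from the table.
def annotations = [
    [segment: 1, category: 'Accuracy', subcategory: 'Omission'],
    [segment: 1, category: 'Fluency',  subcategory: 'Spelling'],
    [segment: 2, category: 'Accuracy', subcategory: 'Mistranslation'],
    [segment: 3, category: 'Style',    subcategory: 'Awkward']
]

// countBy groups the entries by the closure result and counts each group.
def countsByCategory = annotations.countBy { it.category }

countsByCategory.each { category, count ->
    println "${category}: ${count}"
}
```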
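DQF requires uploading a translation memory file; either TMX or a tab-separated file will do. Generating TMX was more trouble, and the import reported an error anyway, so I went with tab-separated text. However, OmegaT source segments can contain line breaks, tabs, and similar characters, so they need further processing. I raised this with the OmegaT project, and the answer was that a segment can legitimately contain multiple lines.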
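One possible way to handle the embedded tabs and line breaks is sketched below. It assumes that replacing them with a single space is acceptable for review purposes (this loses the original layout), and the helper names `cleanField` and `toTsvLine` are made up for this example. The two-column source/target layout is also an assumption, not something taken from the DQF documentation.

```groovy
// Hypothetical helper: make a segment safe for one line of a tab-separated file
// by replacing line breaks and tabs with spaces.
String cleanField(String text) {
    if (text == null) {
        return ''
    }
    String noBreaks = text.replaceAll(/\r\n|\r|\n/, ' ')  // Windows, old Mac, and Unix line breaks
    String noTabs = noBreaks.replace('\t', ' ')           // tabs would otherwise split the column
    return noTabs.trim()
}

// Hypothetical helper: build one source<TAB>target line.
String toTsvLine(String source, String target) {
    return cleanField(source) + '\t' + cleanField(target)
}

def line = toTsvLine("First line\nsecond line", "Col A\tCol B")
println line
```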
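Setting up a review project requires assigning it to someone else or to yourself; once the review is finished, you can view the report.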