Tuesday, August 09, 2005

The Demand for and Development of Machine Translation

Types of translation demand

When giving any general overview of the development and use of machine translation (MT) systems and translation tools, it is important to distinguish four basic types of translation demand. The first, and traditional one, is the demand for translations of a quality normally expected from human translators, i.e. translations of publishable quality – whether actually printed and sold, or whether distributed internally within a company or organisation. The second basic demand is for translations at a somewhat lower level of quality (and particularly in style), which are intended for users who want to find out the essential content of a particular document – and generally, as quickly as possible. The third type of demand is that for translation between participants in one-to-one communication (telephone or written correspondence) or of an unscripted presentation (e.g. diplomatic exchanges). The fourth area of application is for translation within multilingual systems of information retrieval, information extraction, database access, etc.

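When we discuss the development of MT systems and translation tools, we first need to distinguish four basic types of translation demand. The first is the traditional type, which requires translations as good as those produced by human translators, i.e. of publishable quality. The second type accepts somewhat lower quality, particularly in style: here users are mainly interested in the basic content of a document and therefore want the translation as quickly as possible. The third type is one-to-one communication between two parties (by telephone, or in written exchanges such as Internet chat) or an unscripted presentation (such as exchanges in diplomatic settings). The fourth type is translation within multilingual systems for information retrieval, information extraction, database access and so on.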

The first type of demand illustrates the use of MT for dissemination. It has been satisfied, to some extent, by machine translation systems ever since they were first developed in the 1960s. However, MT systems produce output which must invariably be revised or ‘post-edited’ by human translators if it is to reach the quality required. Sometimes such revision may be substantial, so that in effect the MT system is producing a ‘draft’ translation. As an alternative, the input text may be regularised (or ‘controlled’ in vocabulary and sentence structure) so that the MT system produces few errors which have to be corrected. Some MT systems have, however, been developed to deal with a very narrow range of text content and language style, and these may require little or no preparation or revision of texts.

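The first type of demand is the use of MT for dissemination. Ever since MT systems first appeared, they have met this demand to some extent. However, to reach the quality users require, MT output must always be revised or ‘post-edited’ by human translators. Sometimes this revision is substantial, so that the MT system in effect produces only a ‘draft’ translation. To reduce later revision, the input text can be regularised before translation by ‘controlling’ its vocabulary and sentence structure, so that the MT system does not produce many errors that have to be corrected.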
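To make the idea of a controlled input text more concrete, here is a minimal Python sketch of a pre-editing checker. It assumes a controlled language defined by only two rules, a maximum sentence length and an approved vocabulary; the word list, the 20-word limit and the function names `split_sentences` and `check` are hypothetical illustrations, not taken from any real MT system. Real controlled languages also restrict grammar and word senses.

```python
import re

# Hypothetical approved vocabulary for illustration only; a real controlled
# language would use a large, domain-specific dictionary with word senses.
APPROVED_WORDS = {
    "a", "and", "before", "clean", "cover", "filter", "install",
    "new", "off", "power", "remove", "the", "then", "turn", "you",
}
MAX_WORDS = 20  # assumed maximum sentence length


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check(text: str) -> list[str]:
    """List problems an author should fix before the text goes to MT."""
    problems = []
    for i, sentence in enumerate(split_sentences(text), start=1):
        words = re.findall(r"[a-z]+", sentence.lower())
        if len(words) > MAX_WORDS:
            problems.append(f"sentence {i}: {len(words)} words (limit {MAX_WORDS})")
        for word in words:
            if word not in APPROVED_WORDS:
                problems.append(f"sentence {i}: unapproved word '{word}'")
    return problems


print(check("Turn off the power. Disassemble the cover."))
# prints ["sentence 2: unapproved word 'disassemble'"]
```

In this toy setting the author would be asked to replace ‘disassemble’ with an approved word such as ‘remove’ before the text is sent for translation, which is the kind of pre-editing that reduces later post-editing.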

The second type of demand – the use of MT for assimilation – has been met in the past as, in effect, a by-product of systems designed originally for the dissemination application. Since MT systems did not (and still cannot) produce high quality translations, some users have found that they can extract what they need to know from the unedited output. They would rather have some translation, however poor, than no translation at all. With the coming of cheaper PC-based systems on the market, this type of use has grown rapidly and substantially.

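The second type of demand is the use of MT to find out information (assimilation), and in effect it has been met as a by-product of the first. Since MT systems still cannot produce high-quality translations directly, users find it helpful to be able to pick out, or guess, what they need from the unedited output; after all, a partial translation is better than none. Even though the output of such systems is poor, demand for them has grown substantially as PCs have become cheaper.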

With the third type – MT for interchange – the situation is changing quickly. The demand for translations of electronic texts on the Internet, such as Web pages, electronic mail and even electronic ‘chat’ lists, is developing rapidly. In this context, the possibility of human translation is out of the question. The need is for immediate translation in order to convey the basic content of messages, however poor the input. MT systems are finding a ‘natural’ role, since they can operate virtually or in fact in real-time and on-line and there has been little objection to the inevitable poor quality. Another context for MT in personal interchange is the focus of much research. This is the development of systems for spoken language translation, e.g. in telephone conversations and in business negotiations. The problems of integrating speech recognition and automatic translation are obviously formidable, but progress is nevertheless being made. In the future – still distant, perhaps – we may expect on-line MT systems for the translation of speech in highly restricted domains.

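The third type of demand is MT for interchange. Because messages appear and change so quickly, human translation is out of the question: users need an immediate translation that conveys the basic content of a message. Online translation systems on the Internet, for example, can translate in real time, although the quality is far from satisfactory; MT systems are nevertheless finding a ‘natural’ role here. Another kind of MT for personal interchange is spoken-language translation, for use in telephone conversations, business negotiations and similar settings. Many researchers are currently developing such systems; the main difficulty is integrating speech recognition with automatic translation. Although the problems are formidable and progress is gradual, we may still hope to see on-line spoken-language MT systems in highly restricted domains in the future.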

The fourth type of MT application – as components of information access systems – is the integration of translation software into: (i) systems for the search and retrieval of full texts of documents from databases (generally electronic versions of journal articles in science, medicine and technology), or for the retrieval of bibliographic information; (ii) systems for extracting information (e.g. product details) from texts, in particular from newspaper reports; (iii) systems for summarising texts; and (iv) systems for interrogating non-textual databases. This field is the focus of a number of projects in Europe at the present time, which have the aim of widening access for all members of the European Union to sources of data and information whatever the source language.

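The fourth type of demand comes from information access systems. Here MT software is integrated into a range of systems, including:

1. systems for full-text search and retrieval of documents from databases (generally electronic versions of journals in science, medicine and technology), or for the retrieval of bibliographic information;
2. systems for extracting information from texts, especially newspaper reports;
3. systems for summarising texts;
4. systems for querying non-textual databases.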

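Several projects in this area are currently under way in Europe, with the aim of giving all member states of the European Union access to sources of data and information, whatever the source language.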

Future needs and developments

Despite the recent growth of systems for personal computers and of Internet services, it is still true to say that there is nothing yet really suitable for the independent professional translator, i.e. for those not working for large companies or in translation organizations. It is known that some translators have tried to apply commercial PC-based software to their needs, but the amount of adaptation required and the generally poor output have made them unsatisfactory and uneconomic. More suitable for the independent translator would be a cost-effective translation workstation. However, current workstations on the market are still too expensive for the individual translator. Although there is promise of low-cost computer tools for this potentially large market – e.g. terminology and concordancing software, and perhaps alignment software – there is no doubt that this segment is not being covered as well as many other areas.

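Although MT services for PCs and the Internet have grown in recent years, it is fair to say that there is still no MT system really suited to the independent professional translator, i.e. someone who works neither for a large company nor in a translation organisation. Some translators are known to have tried commercial PC translation software, but the amount of adaptation required was too great and the output too poor to meet their needs. There is hope of low-cost translation aids for this potentially large market, such as terminology and concordancing software and alignment software, but so far this segment remains poorly served.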
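Concordancing software of the kind mentioned here lets a translator see every occurrence of a term together with its surrounding context. A minimal keyword-in-context (KWIC) display can be sketched in Python as below; the function name `kwic`, the regular-expression tokenizer and the fixed three-word window are assumptions made for illustration, and the sketch ignores sentence boundaries, which a real concordancer would usually respect.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines: `width` words either side of each hit.

    Matching is case-insensitive; sentence boundaries are ignored.
    """
    # Words, keeping internal hyphens/apostrophes ("post-editing", "user's").
    tokens = re.findall(r"\w+(?:[-']\w+)*", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the hits line up in a column.
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


corpus = "Post-editing improves MT output. Human post-editing is costly."
for line in kwic(corpus, "post-editing"):
    print(line)
```

Aligning the hits in a single column is the traditional concordance layout: it lets a translator scan how a term is actually used and pick a consistent equivalent.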

Another area at present poorly served is the need for reliable but low-cost translation of documents into unknown foreign languages where users do not want to engage expert bilingual translators. There is no problem with translation into recipients’ own languages – PC systems can give adequate ‘rough’ versions for users to get some idea of the basic message – but for translation into an unknown language there are still no solutions. Recently there have been some cheap Japanese products which serve this specific ‘foreign language authoring’ demand in the case of writing business letters (based on standard phrases and document templates), but for other areas and for longer documents, where there is less ‘stereotyping’, there is nothing as yet. For translation into another language unknown (or poorly known) by the sender, what is really required is software which can be relied upon to provide good quality output (and most PC products are not good enough). A number of research groups are investigating interactive systems, where the sender composes an MT-friendly version of a letter or document in collaboration with the computer. With a sufficiently ‘normalised’ input text, the MT system can guarantee grammatically and stylistically correct output. As yet, however, this work (e.g. at GETA in France) is still at the laboratory stage (Boitet and Blanchon 1995).

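Another area that remains a challenge is translating a user’s text into a foreign language the user knows poorly or not at all, in cases where the user does not want to engage an expert bilingual translator. Translation into the recipient’s own language is not a problem: MT systems can give an adequate ‘rough’ version from which the reader can at least grasp the basic message. For translation into an unknown language, however, there is still no solution. Recently some inexpensive Japanese products have appeared that serve this specific ‘foreign language authoring’ demand for writing business letters (based on standard phrases and document templates), but for other areas and for longer documents, where there is less ‘stereotyping’, nothing is available yet. Several research groups are investigating interactive systems in which the sender composes an MT-friendly version of the document with the computer’s help; if the input text is sufficiently ‘normalised’, the MT system can guarantee grammatically and stylistically correct output.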
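The business-letter products are described only as being based on standard phrases and document templates; their internals are not given. The Python sketch below illustrates that general idea under stated assumptions: a tiny, hypothetical bank of English phrases, each paired with a German equivalent translated in advance by a human, with shared `$` slots filled via the standard library’s `string.Template`. The phrase IDs and the `compose` function are invented for illustration. Because every target phrase is pre-translated, the output is reliable, but only for what the bank covers, which is exactly the ‘stereotyping’ limitation noted above.

```python
from string import Template

# Hypothetical phrase bank: each standard English phrase is paired with a
# German equivalent translated in advance by a human; $-slots are shared.
PHRASES = {
    "greeting": ("Dear $name,", "Sehr geehrte(r) $name,"),
    "thanks_order": ("Thank you for your order of $date.",
                     "Vielen Dank für Ihre Bestellung vom $date."),
    "closing": ("Yours sincerely,", "Mit freundlichen Grüßen"),
}


def compose(phrase_ids: list[str], **slots: str) -> tuple[str, str]:
    """Assemble parallel English/German letters from chosen phrase IDs.

    Slot values (names, dates) are copied verbatim, not localised.
    Raises KeyError for an unknown phrase ID or a missing slot value.
    """
    english, german = [], []
    for pid in phrase_ids:
        en, de = PHRASES[pid]
        english.append(Template(en).substitute(slots))
        german.append(Template(de).substitute(slots))
    return "\n".join(english), "\n".join(german)


en, de = compose(["greeting", "thanks_order", "closing"],
                 name="Schmidt", date="03.03.2005")
print(de)
```

The sender works only in English by choosing phrases and filling slots; nothing is translated at run time, which is why such systems can be trusted for languages the sender cannot check.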

The same is true for software combining MT with information access, information extraction, and summarisation software. There are no commercial systems yet on the market; developments are still at the research stages. The potential and the demand have been recognised: for example, in recent years, most research funds of the European Union have been focused not on MT or ‘pure’ natural language processing (as it was during the 1980s), but on projects for multilingual tools with direct applications in mind; many involve translation of some kind, usually within a restricted subject field and often in controlled conditions (Hutchins 1996; Schütz 1996). As just one example, the AVENTINUS project is developing a system for police forces in the area of drug control and law enforcement: information about drugs, criminals and suspects will be available on databases accessible in any of the European Union languages.

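Likewise, attempts to combine MT with information access, information extraction and summarisation software are still at the research stage. There are no commercial products on the market yet, but the potential market has been recognised. For example, the AVENTINUS project is being developed for police forces working in drug control and law enforcement: information about drugs, criminals and suspects will be accessible on databases in any of the European Union languages. Interest in such cross-language applications is growing worldwide. The most attractive of them is ‘cross-language information retrieval’, which lets users search foreign-language databases in their own language. Here most of the work concentrates on building and operating suitable translation dictionaries, so that query strings can be matched against the words and phrases in the database documents. Commercial software of this kind can be expected in the near future.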
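The dictionary-based matching described here can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular system: it assumes a toy French-to-English dictionary (entries invented for the example), replaces every query word with all of its listed translations (one common and simple strategy for ambiguous words), and ranks documents by word overlap. The names `FR_EN`, `translate_query` and `search` are hypothetical.

```python
# Toy bilingual dictionary (user's language French -> database language
# English); entries are hypothetical. Ambiguous words list every sense.
FR_EN = {
    "traduction": ["translation"],
    "automatique": ["automatic", "machine"],
    "parole": ["speech", "word"],
}


def translate_query(query: str) -> set[str]:
    """Replace each query word with all of its dictionary translations."""
    terms: set[str] = set()
    for word in query.lower().split():
        # Keep words missing from the dictionary (e.g. proper names) as-is.
        terms.update(FR_EN.get(word, [word]))
    return terms


def search(query: str, docs: dict[str, str]) -> list[tuple[str, int]]:
    """Rank documents by how many translated query terms they contain."""
    terms = translate_query(query)
    results = []
    for doc_id, text in docs.items():
        hits = len(terms & set(text.lower().split()))
        if hits:
            results.append((doc_id, hits))
    # Most hits first; ties broken by document ID for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))


docs = {
    "d1": "machine translation of speech",
    "d2": "speech recognition",
    "d3": "weather reports",
}
print(search("traduction automatique", docs))  # prints [('d1', 2)]
```

The sketch also shows why dictionary quality dominates this kind of work: ‘parole’ expands to both ‘speech’ and ‘word’, so a poorly chosen or incomplete entry directly adds noise to, or removes matches from, the results.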

The future application that is probably most desired by the general public is the translation of spoken language. But, from a commercial (and even research) perspective, the prospects for automatic speech translation are still distant (Krauwer et al. 1997). It was only in the 1980s that developments in speech recognition and synthesis made spoken language translation a feasible objective. In Japan a joint government and industry company ATR was established in 1986 near Osaka, and it is now one of the main centres for automatic speech translation. The aim is to develop a speaker-independent real-time telephone translation system for Japanese to English and vice versa, initially for hotel reservation and conference registration transactions. Other speech translation projects have been set up subsequently. The JANUS system is a research project at Carnegie-Mellon University and at Karlsruhe in Germany. The researchers are collaborating with ATR in a consortium (C-STAR), each developing speech recognition and synthesis modules for their own languages (English, German, Japanese). (One by-product of this research was mentioned earlier: the rapid-deployment project for custom-built systems in less-common languages.)

The fourth major effort in speech translation is the long-term VERBMOBIL project funded by the German Ministry for Research and Technology which began in May 1993. The aim is a portable aid for business negotiations as a supplement to users’ own knowledge of the languages (German, Japanese, English). Numerous German university groups are involved in fundamental research on dialogue linguistics, speech recognition and MT design; a prototype is nearing completion, and a demonstration product is targeted for early in the next century.

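Another future application that the public urgently wants is spoken-language translation. From a commercial, or even a research, perspective, however, fully automatic speech translation is still a distant prospect. It was in the 1980s that advances in speech recognition and speech synthesis made spoken-language translation seem a feasible goal. In Japan, ATR has become one of the main centres for automatic speech translation; its aim is a speaker-independent, real-time telephone translation system from Japanese to English and from English to Japanese, initially for hotel reservations and conference registration. Other speech translation projects were set up subsequently. The JANUS system is a joint research project of Carnegie-Mellon University and Karlsruhe in Germany. The researchers work with ATR in a consortium (C-STAR), each group developing recognition and synthesis modules for its own language (English, German, Japanese).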
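Speech translation is perhaps the most innovative area of current MT research, and it attracts the most funding and public attention. Observers, however, do not expect rapid progress in the near future, since it has taken decades for written-language MT to reach its present level. Another effort in speech translation is the VERBMOBIL project, funded by the German Ministry for Research and Technology, which began in May 1993. Its aim is a portable aid for business negotiations, and several German universities are involved in its fundamental research on dialogue linguistics, speech recognition and MT design. A prototype is nearing completion, and a demonstration product is expected soon.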

Comparison of human and machine translation

From this survey it should be apparent that the application of computers to the task of translating natural languages has not been and is unlikely to be a threat to the livelihood of professional translators. Those skills which the human translator can contribute will always continue to be in demand. There is no prospect, for example, that machine translation could ever attempt the translation of literary or legal texts. By contrast, for the rough translation of electronic texts on the Internet, machine translation has no rival: human translators cannot compete in terms of speed, even if they were prepared to undertake poor quality translation of ephemeral material.

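Looking at the development and present state of MT, we can see that using computers to translate natural languages has not threatened, and is unlikely to threaten, the livelihood of professional translators. Translators’ skills will continue to be valued. For example, there is no prospect that machine translation could ever attempt the translation of literary or legal texts. By contrast, nothing can rival machine translation for the rough translation of electronic texts on the Internet: humans cannot match machines for speed, and even translators willing to take on such ephemeral and often badly written material could not compete with MT software.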

We may compare the relative merits of human and machine translation according to the categories of need and use outlined at the beginning of this paper. As far as the dissemination function (production of publishable translations) is concerned, human translation is more satisfactory and less costly overall whenever it is a question of translating one particular text in a unique subject domain (whether scientific, technical, medical, legal or literary). Machine translation demands the costly investment of dictionary maintenance and updating and the costly involvement of post-editing. This can be justifiable (i.e. cost-effective) only when large volumes of documentation within a particular domain are being translated. It is even more justifiable if translation is into more than one target language (when pre-editing and/or vocabulary and grammar control of original texts is possible), and when there is considerable repetition. For such tasks, the human translator would be overwhelmed by the scale of the task, by the boring repetitiveness and by the need to maintain terminological consistency. By contrast, the computer can handle large volumes and can automatically maintain consistency. In brief, machine translation is ideal for large scale and/or rapid translation of (boring) technical documentation, (highly repetitive) software localisation manuals, and real-time translation of weather reports. The human translator is (and will remain) unrivalled for non-repetitive linguistically sophisticated texts (e.g. in literature and law).

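We can compare the relative merits of human and machine translation according to the types of translation demand described above. For dissemination, whenever a particular text in a specific domain (science, technology, medicine, law or literature) is to be translated, human translation is more reliable and costs less. Machine translation, on the other hand, incurs high costs for dictionary maintenance and updating and for post-editing, so it pays off only when large volumes of repetitive documents in a particular domain are to be translated. For such tasks a human translator would be overwhelmed by the sheer volume and repetitiveness of the work and by the need to keep terminology consistent. In short, machine translation is suited to large volumes of highly repetitive technical documentation, software localisation manuals and real-time weather reports, while human translation remains irreplaceable for non-repetitive, linguistically sophisticated texts.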
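The cost argument in this comparison can be summarised as a break-even inequality. This is a deliberately simplified model with assumed symbols, not a formula from the original paper: let V be the number of words to be translated in one domain, h the cost per word of human translation, F the fixed cost of building, maintaining and updating the MT dictionaries for that domain, and p the per-word cost of running the system plus post-editing. Machine translation is then cheaper when

```latex
F + pV < hV
\quad\Longleftrightarrow\quad
V > \frac{F}{h - p},
\qquad \text{assuming } p < h .
```

With purely illustrative, invented figures F = 10,000, h = 0.10 and p = 0.04 (in some currency unit), the break-even volume is 10,000 / 0.06 ≈ 166,667 words. The model also reflects why repetition matters: repetitive or controlled input lowers the post-editing cost p, which widens the margin h − p and so lowers the volume at which MT becomes worthwhile.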

For the translation of texts for assimilation, where the quality of output can be poorer than that for texts to be published, it is clear that machine translation is an ideal solution. Human translators are not prepared (and resent being asked) to produce ‘rough’ translations of scientific and technical documents that may be read by only one person who merely wants to find out the general content and information, is unconcerned whether everything is intelligible or not, and is certainly not deterred by stylistic awkwardness or grammatical errors. Of course, such readers might prefer to have output better than that presently provided by most MT systems, but if the only alternative is no translation at all then machine translation is fully acceptable.

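For translation aimed at finding out information, machine translation is clearly the better choice, since the quality requirements are lower. Human translators are not willing, and resent being asked, to produce ‘rough’ translations of scientific and technical material. When someone only wants a general idea of a document’s content, does not need every detail, and is not deterred by clumsy style or frequent grammatical errors, machine translation fully meets the need.

For interchange, human translation will continue for some time to play the main role in translating business correspondence, especially documents whose content is sensitive or has legal implications. For personal letters, however, machine translation is likely to be used more and more. And for electronic mail, information extraction from Web pages and computer-based information services, machine translation may be the only feasible solution.

For spoken-language translation, human interpreters will continue to dominate the market, since there is no sign that automatic speech translation will replace interpreters in diplomacy or business. Research on telephone translation in highly restricted domains is under way and may eventually succeed, but no system is likely to replace interpreters for the great bulk of telephone conversations.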

Finally, MT systems are opening up new areas where human translation has never featured: the production of draft versions for authors writing in a foreign language, who need assistance in producing an original text; the on-line translation of television subtitles; the translation of information from databases; and, no doubt, more such applications that will appear in the future. In these areas, as in others mentioned, there is no threat to the human translator, because these tasks were never part of professional translation. There is no doubt that MT and human translation can and will co-exist in harmony and without conflict.

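Machine translation systems are opening up areas that human translation has never covered: helping authors who must write in a foreign language to produce drafts, on-line translation of television subtitles, translation of database information, and so on. More new applications may well appear in the future, but they pose no threat to professional translators, because these are areas professional translators have never worked in. Machine translation and human translation will each play their own role and co-exist in harmony.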

Original English text:
http://ourworld.compuserve.com/homepages/wjhutchins/Beijing.pdf
