Computer-aided Translation 電腦（計算機）輔助翻譯

Monday, August 28, 2006

International Conference-cum-Software Exhibition on "Computer-aided Translation: Theory and Practice"

An International Conference-cum-Software Exhibition on "Computer-aided Translation: Theory and Practice" will be held at The Chinese University of Hong Kong on 2 September 2006 to celebrate the 5th anniversary of the M. A. in Computer-aided Translation Programme. The conference provides a platform for researchers and scholars in translation and computing to discuss the theoretical and practical aspects of computer-aided translation in the present age of information technology.

This conference will include keynote speeches, invited talks and software demonstrations by world-class experts on the following topics:

Computer-aided translation
Computer-aided translation systems
Computer translation
Computer applications in translation
The use of corpora in translation

PROGRAMME

List of speakers and their papers:

Prof. Feng Zhiwei -- The Role of Translation Tools in the Information Age

Prof. Anthony Hartley -- Evaluating Machine Translation: Challenges and Techniques

Prof. Jason S. Chang -- Machine Translation Fast Forward

Prof. Zhang Yihua -- Computer-aided Lexicography

Prof. Chan Sin-wai -- A Review of the Literature on Computer(-aided) Translation, 1948-2006

Prof. Lynne Bowker -- Terminography in the Age of Translation Memory Tools: Reflections on Past Developments and Suggestions for the Future

Prof. Shih Chung-ling -- Computer-aided Translation Teaching of the Passive Construction

Prof. Kit Chunyu -- Lexical Resources from the Web for MT

Prof. Shi Xiaodong -- Weaving the Net and CAT Together

Prof. Sun Le -- A User Adaptive Framework for Computer-aided Translation Systems

Prof. Chang Baobao -- Developing CAT Tools for Translating Chinese Scientific Monographs

Ms Kerry Qiu -- New Features of SDL Trados 2006: A Tutorial

POSTER

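Thursday, January 19, 2006

Issue 2 of the CAT Bulletin Published

The CAT Bulletin is the publication through which the Master of Arts in Computer-aided Translation Programme of the Department of Translation, The Chinese University of Hong Kong, communicates with people inside and outside the university. On the one hand, it provides information on programme structure, course content, teachers' qualifications, academic activities, public lectures, publications and research findings. On the other hand, it introduces the latest developments in computer-aided translation in various forms, such as conference papers, talks from the Translation Technology Seminar, and project work by students in the programme, so that readers gain a deeper understanding of both the field of computer-aided translation and the department's MA programme.

Issue 2 of the CAT Bulletin has now been published. This issue contains: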

Orientation Day

Translation Technology Seminar
Global Information Management (GIM)
Mr. George Lu (盧東林)

CAT Courses

Approaches to Computer Translation
Dr. Cecilia Wong (王淑雯)

Computer Translation
Dr. Leung Chi Hong (梁志康)

Teacher Profile - Dr. Tom McArthur

Teaching Resources

Full text download: http://www.cuhk.edu.hk/tra/macat/bulletin/issue2/CATbulletin-issue2.pdf

Inaugural issue of the CAT Bulletin: http://traserver.tra.cuhk.edu.hk/macat/bulletin/issue1/CATbulletin-issue1.pdf

______________________________________
CAT Bulletin
A Quarterly Newsletter of the Master of Arts in Computer-aided Translation Programme
Editors: Professor Chan Sin-wai (陳善偉) and Miss Tiffany Fung (馮恬瑩)
Publisher: Department of Translation, The Chinese University of Hong Kong
Telephone: 2609 8551
Fax: 2603 7843
E-mail: ma-cat@arts.cuhk.edu.hk

Tuesday, October 25, 2005

NEC Develops Speech-to-Speech Translation Software

NEC Develops Speech-to-Speech Translation Software for Low Power Consumption Multi-Core Processors Optimal for Small Devices such as Mobile Phones

Tokyo, October 24, 2005 --- NEC Corporation today announced that it has succeeded in the development of Japanese-English/English-Japanese, automatic speech translation software for single-chip multi-core processors for small devices such as mobile phones, capable of operation at high speeds with low power consumption. NEC verified the high-speed automatic speech translation processing capability of this software on NEC Electronics' MP211 (note 1*) application processor for mobile phones, at an operating frequency of 200MHz, proving that operation of interpretation applications is technologically feasible on small devices like mobile phones.

Supporting a rich 50,000-word vocabulary, this software realizes automatic speech-to-speech interpretation of travel conversation through the development of a new parallel speech recognition method (note 2*) for single-chip processors with several CPU cores, and a compact, lexical-rule-based machine translation engine that unites dictionaries with grammar (note 3*) and is operable on small devices.

The features of this software include:

(1) A parallel, large-vocabulary, continuous speech recognition engine, built on a database covering a wide range of conversational sounds and words, which enables accurate recognition of spoken words.

(2) A lexical-rule-based, machine translation engine, which achieves high-performance translation of spoken words utilizing dictionaries/grammar, compiled from a wide range of language knowledge data.

(3) An advanced wave-concatenative speech synthesis engine, which realizes high-performance reading through a wave-concatenative synthesis method based on a wide range of speech data.

(4) A total integration module that controls collaborative operation of the speech recognition engine, the machine translation engine, and the speech synthesis engine realizing automatic translation on a single processor for mobile phones.
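The four features above describe a three-stage chain (recognition, translation, synthesis) driven by one integration module. As a rough illustration of that shape only, here is a minimal Python sketch. It is not NEC's implementation: the class name SpeechTranslator, the toy engines and the tiny phrase table are all hypothetical stand-ins, and "audio" is faked as UTF-8 bytes so the example can run anywhere.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Hypothetical integration module chaining three engines."""
    recognize: Callable[[bytes], str]    # audio -> source-language text
    translate: Callable[[str], str]      # source text -> target text
    synthesize: Callable[[str], bytes]   # target text -> audio

    def interpret(self, audio: bytes) -> bytes:
        # Run the three engines in sequence, as feature (4) describes.
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (assumptions for illustration, not real engines).
def toy_recognize(audio: bytes) -> str:
    # Pretend the audio bytes already spell out the spoken words.
    return audio.decode("utf-8")


# A tiny word-level dictionary standing in for the translation engine.
PHRASES = {"konnichiwa": "hello", "arigatou": "thank you"}


def toy_translate(text: str) -> str:
    # Look up each word; unknown words pass through unchanged.
    return " ".join(PHRASES.get(word, word) for word in text.split())


def toy_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the synthesized waveform.
    return text.encode("utf-8")


pipeline = SpeechTranslator(toy_recognize, toy_translate, toy_synthesize)
print(pipeline.interpret(b"konnichiwa"))
```

Keeping the engines behind plain callables means any one of them can be swapped without touching the integration logic, which is the design point the press release seems to be making.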

With the advancement of the information society and increased freedom of movement across borders, technology for automatic speech interpretation and translation between different languages is progressing rapidly.

NEC's developments in this area include:

Automatic Japanese-English/English-Japanese translation software for notebook PCs in 1999

Commercial launch of "Tabitsu" (American English version), communication software supporting English travel conversation, in 2001

PDA-operational Japanese-English travel conversation, automatic speech translation software in 2002

The next natural development for NEC was to expand this technology to small, light-weight portable devices that can be used anytime, anywhere. However, in order to achieve this goal it was necessary to realize large CPU power, required for speech recognition, and machine translation technology for interpretation, which are both exceedingly difficult to achieve on low-power multi-core processors for small devices such as mobile phones. NEC has accomplished this development through the synthesis of its proprietary parallel speech recognition technology and its compact machine translation technology with its multi-core processor technology.

NEC will continue to advance research of its speech recognition and language processing technologies toward the realization of a society where communication is possible anytime, anywhere.

-------------------------------------------------

Note:

(1) MP211:

With the combination of three ARM926EJ-S CPU cores and NEC Electronics' digital signal processor, the single-chip MP211 processor offers high-performance parallel processing capabilities optimized for applications such as mobile phones that are sensitive to power consumption requirements.

(2) Parallel speech recognition method (parallel, large-vocabulary, continuous speech recognition engine)

Through adoption of an acoustic look-ahead technique that can reduce word-search space, NEC's proprietary speech recognition method realizes acceleration of the entire interpretation process, dividing it into three steps comprised of recognition processing, reconstruction and maintenance of accuracy.

(3) Lexical-rule-based, machine translation engine:

This is a translation engine based on dictionary storing of each word's lexical rules that can easily expand both general translation rules and individual translation rules for fixed form expression, and which realizes excellent software downsizing and enhancement of translation quality.

Monday, October 17, 2005

Trados 7.0: A First Look

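I. First, software version compatibility.

1. Trados 7.0

The 7.0 build I just downloaded is 7.0.0.615 (see Figure 1) and is activated with a license. I did not uninstall version 6.5 before installing Trados 7.0 directly, and found that the 6.5 version of Trados Translator's Workbench could no longer be used. The TMs I had created earlier, however, were still retained in Trados 7.0.

Figure 1

If TWB is run again, the version information reverts to the original 6.5 information, as shown in Figure 2.

Figure 2

Fortunately, Trados 7.0 comes with a license manager (Figure 3), which can be used to re-license the product.

Figure 3

Version 7.0 can be licensed in three ways: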

1. soft key,

2. dongle,

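3. license.

The earlier 6.5 installation was cracked by means of a dongle. The version-information problem above is probably caused by running the Trados 6.5 TWB again, which makes the program reload the 6.5 dongle information.

When upgrading Trados from 5.5/6.0 to 6.5, if 6.5 is installed first and an earlier version is installed afterwards, the 6.5 dongle must likewise be run again to restore the version to 6.5.

Testing revealed one interesting thing about Trados products: 5.5 can coexist with 6.5, and 5.5 can coexist with 7.0, but 6.5 and 7.0 conflict and cannot be installed on the same system.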

2. TRADOS MultiTerm 7 Desktop

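The current version of MultiTerm 7 is 7.0.1.320, as shown in Figure 4.

Figure 4

During installation, MultiTerm 7 warns that an older version of MultiTerm exists on the system and that the existing MultiTerm IX must be uninstalled before installation can continue. MultiTerm 7 prompts the user to back up the existing termbases first and then uninstall.

The steps for installing MultiTerm 7 are essentially the same as for earlier MultiTerm versions, with one extra step that looks for a license. Because Trados 7.0 had just been installed, MultiTerm 7 loaded the license automatically.

Before installation finishes, MultiTerm 7 also asks whether to restore the backed-up termbases. Choosing "Yes" restores the termbases from MultiTerm IX into the new version. (For some unknown reason, the termbases haha restored could not be used. The error was: "The Microsoft Jet database engine cannot find the input table or query 'mtIndexes'", see Figure 5.)

Figure 5

As with Trados 7.0, MultiTerm 5.5 can coexist with MultiTerm IX, and MultiTerm 5.5 can coexist with MultiTerm 7.0, but MultiTerm IX and MultiTerm 7.0 cannot be installed on the same system.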

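II. Now, on to functionality

Many product features have been enhanced. I will not go through them one by one here, since you can simply browse the product's list of new features. haha only wants to talk about the changes that matter most to us translators, or that impressed haha most.

1. Trados TWB

1) The TWB in Trados 7.0 can now use multiple termbases, as shown in Figure 6.

Figure 6

This is a big step forward for Trados. For years a Trados project could load only one termbase, and switching termbases in the middle of a translation would often crash TWB, something DejaVu X had long been able to laugh at. Now Trados 7.0 can finally hold its head up.

2) The Trados TWB maintenance function can finally "continue searching", as shown in Figure 7.

Figure 7

Maintaining a TM used to be a nuisance: you had to remember which page and which TU you had reached so that you could pick up from there next time. Even so, in a large TM you could still fail to find your place, which was very inconvenient. Trados 7.0 now adds a pointer, so after each maintenance session you can continue the search next time, which is much more convenient.

2. MultiTerm 7.0

Besides a better-looking interface, MultiTerm 7.0 has finally added Entry control buttons to the main interface and toolbar, as shown in Figure 8.

Figure 8

Earlier MultiTerm versions had no Entry control buttons. Users could only add terms manually with F3 and F10, which was very inconvenient and hard for beginners to pick up. MultiTerm 7.0 can now finally be regarded as a standalone application.

3. TagEditor

TagEditor has always appeared as an auxiliary tool for Trados, mainly intended for localization projects, that is, for translating tagged documents other than Microsoft Word files. In Trados 7.0, TagEditor starts to stand on its own: it can now translate Word .doc documents in WYSIWYG ("what you see is what you get") mode, as shown in Figure 9.

Figure 9

Trados's effort to develop TagEditor is also meant to reduce its dependence on Microsoft as far as possible. Of course, Trados has not gone all the way: TWB can still be used together with Word.

Haha, that was a lot of rambling. Trados 7.0 really has improved a great deal. haha has only tested it briefly and shared a few small impressions. Trados 7.0 has many more good features waiting for everyone to discover!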

---------------------------------------------------

Original post by webmaster "haha" of 「翻譯中國」 (Translation China), reproduced here with thanks.
http://www.fane.cn/forum_view.asp?forum_id=42&view_id=14672

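Spoken-Language Machine Translation Systems

• ATR-ITL spoken-language translation system: In recent years, research on automatic translating telephones has begun abroad. The Advanced Telecommunications Research Institute International, Interpreting Telecommunications Research Laboratories (ATR-ITL) was set up in Japan's Kansai region with the aim of applying speech recognition and speech synthesis to machine translation in order to achieve speech-to-speech machine translation. In 1989, ATR developed the SL-TRANS system.

• SpeechTrans and JANUS systems: developed at Carnegie Mellon University (CMU) in the United States.

• KITANO system: In the early 1990s, the Japanese researcher Kitano, while at Kyoto University, carried out speech translation experiments using massively parallel computing and an example-based approach, demonstrating that real-time spoken-language translation at millisecond speeds was achievable.

• Verbmobil project: supported by the German Federal Ministry of Education, Science, Research and Technology (BMBF), with the aim of "securing for Germany an internationally leading position in language technology and its economic applications in the next century, through the cooperation and concentration of as many branches of industry and science as possible".

• Verbmobil drew up a development plan for 1993-2001. The first phase, from 1993 to 1996, involved members from 32 companies and universities in Germany, the United States and Japan, with 46.9 million marks of government funding and 3.1 million marks from industry. The goal of the first phase was to build a speaker-independent spoken-language translation system for appointment-scheduling dialogues.

• C-STAR: The Consortium for Speech Translation Advanced Research (C-STAR) was founded in 1991. C-STAR is an international cooperative organization whose basic research goal is spoken-language translation. It consists of 20 members from 12 countries.

• The core members are seven institutions from seven countries: Carnegie Mellon University (CMU) in the United States, ATR-ITL in Japan, the University of Karlsruhe (UKA) in Germany, the GETA-CLIPS automatic translation research centre at the University of Grenoble in France, the ITC-IRST scientific and technological research institute in Italy, ETRI (advanced network services technology division) in Korea, and the National Laboratory of Pattern Recognition (NLPR) at the Institute of Automation, Chinese Academy of Sciences. Other members include Siemens of Germany and the Hong Kong University of Science and Technology.

• C-STAR treats direct translation of speech across multiple languages as a scientific engineering project. By building platforms and demonstrations it promotes the rapid development of speech translation technology, so that C-STAR becomes the cradle for moving international speech translation technology toward industrial application and removing humanity's language barriers.

• As a core C-STAR member, NLPR at the Institute of Automation, Chinese Academy of Sciences, has built the platforms for an experimental spoken-language translation system, has completed EasySchedule, a prototype Chinese-English speech-to-speech machine translation system for appointment scheduling, and is developing a Chinese-English speech translation system suitable for initial practical use.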

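Sunday, October 2, 2005

A New Breakthrough in Chinese Machine Translation Technology

In the international evaluation of core machine translation technology just concluded, organized by the Consortium for Speech Translation Advanced Research (C-STAR III), the Chinese-to-English translation system submitted by the "Network Content Management and Information Services" team at the Institute of Automation, Chinese Academy of Sciences, achieved the best result, with a BLEU score of 0.528 and a NIST score of 10.25.

Approaching the Level of Human Translation

C-STAR is the earliest international organization devoted to spoken-language translation. This Chinese-English evaluation attracted twelve well-known research institutions, including IBM (United States), ATR (Japan), Aachen University (Germany), ITC (Italy) and NTT (Japan).

Reportedly, human Chinese-to-English translations generally score between 0.5 and 0.6 on BLEU, so a machine translation score of 0.528 indicates that, in the evaluated application scenario, the institute's translation output is already close to the level of human translation.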
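For readers who have not met BLEU before, the rough idea is: count how many of the candidate's word n-grams (here up to 4 words) also appear in a reference translation, combine those precisions with a geometric mean, and penalize candidates that are shorter than the reference. The sketch below is a simplified sentence-level version for illustration only. The function names are my own, it does no smoothing, and it is not the scoring tool used in the C-STAR evaluation, which scores whole test sets.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU on whitespace-split tokens."""
    cand = candidate.split()
    refs = [ref.split() for ref in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # For each n-gram, the most times it occurs in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated words cannot inflate precision.
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference closest in length.
    ref_len = min((abs(len(ref) - len(cand)), len(ref)) for ref in refs)[1]
    penalty = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(bleu("the cat sat on the red mat", [reference]))
```

Because the score depends on exact n-gram overlap with the references, even a good human translation that words things differently will score well below 1, which is why human output landing around 0.5 to 0.6 is plausible.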

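For a long time, translating scientific and technical materials has been one of the important tasks of research institutions, universities, intelligence departments and large enterprises, and as international exchanges increase, document translation is becoming ever more important. For some large imported projects in particular, the foreign-language documentation is often measured in tonnes. The difficulty of translating it by hand alone can be imagined, and manual translation does not scale to mass production, so relying on machine translation becomes very necessary. The history of machine translation shows that, with the development of information technology and the trend toward a unified global network, machine translation technology has steadily improved and the supporting role of translation software has become increasingly evident.

Reportedly, there are currently over a hundred machine translation software products, which can be roughly divided by their translation characteristics into three categories: dictionary translation, localization (Chinese-rendering) translation, and professional translation.

A Fine Piece of Work in the Machine Translation Field

Dictionary-type translation software can be called a fast and economical electronic dictionary: it can quickly look up the meaning of an English word or phrase and provide its pronunciation, which makes it much easier for users to understand words or phrases. The "intelligent Chinese localization integrated environment" of localization software provides a "comprehensive solution to language barriers" for people who know no English or whose English is weak, including internal code conversion, dynamic localization and an electronic dictionary. It serves users well for localizing English software and English web pages into Chinese, understanding English text on screen and producing a first-pass translation of articles, and is of real practical use for obtaining information and grasping the gist of a text. Professional translation systems, by contrast, are aimed specifically at professional or industry users.

According to analysis by international experts, for machine translation to reach the fluency of human translation will take at least another fifteen years of sustained research. In other words, while humans still do not understand "how the human brain performs fuzzy recognition and judgment of language", it is impossible for machine translation to reach 100 percent accuracy. Even so, this newly developed Chinese-English machine translation system from the Institute of Automation, Chinese Academy of Sciences, with its best-in-class result close to human translation, is a fine contribution to the current machine translation field.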

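Thursday, September 8, 2005

Xuan Zang's 10 Steps for Translating Buddhist Sutras

The steps Xuan Zang used to translate Buddhist sutras are still followed today. I found an English article online and am posting it below for reference: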

Xuan Zang, Possibly China's Greatest Translator

His 10-stage quality control process initiated more than 1300 years ago is far more thorough and exacting than any existing today.

Introduction

Every Chinese, young and old, within and outside China, knows the classical-language tale of the exploits of Xuan Zang, the pious Tang dynasty monk, and his three storybook disciples: the indestructible Monkey King, the Great Sage; Brother Pig, the Eight Denials (of Buddhism); and Sand Monk, the third disciple. In real life, Xuan Zang was a truly remarkable Buddhist monk. He travelled on land, across mountains and deserts, through hostile and uncharted territories, to the birthplace of Buddha in the Indian sub-continent and thereafter returned to Chang'an (modern day Xi'an) with a set of Buddhist sutras. The voluminous sutras were written in the extremely difficult Sanskrit language. Together with his senior pupils, he completed the translation of some 75 volumes of the sutras into the equally difficult Chinese language.

The 10 Stages

Buddhist sutras, translated into Chinese earlier than the Tang dynasty, were difficult to read and comprehend because the responsible translators were all Buddhist monks of non-Han Chinese origin. It took China several hundred years to groom its own selection of Buddhist monks who could master both the Sanskrit language and the complex Buddhism doctrines, which were written in Sanskrit. And Xuan Zang was recognized as the foremost among them. He was appointed chief of the Tang dynasty imperial translation centre. It was he who designed and implemented a translation workflow that would guarantee the quality of the final product. The process detailed below is well worth adopting by any modern day translation team.

Stage 1

Master Translator and Buddhism expert to jointly study and interpret the original text written in Sanskrit. It could involve one or more persons.

Stage 2

Members of the translation team to attend a recital of the text in question by the Master Translator. The purpose is to verify the accuracy of the interpretation undertaken in Stage 1. A recital is necessary because the scriptures were originally written for recitation.

Stage 3

A team of junior translators produces the first Chinese draft from the Sanskrit text. The draft includes a transliteration of Sanskrit terms into Chinese equivalents.

Stage 4

Production of a complete Chinese version by a senior Han Chinese Buddhist monk trained to undertake scripture translation. This is the most important stage in the entire process involving a monk with an in-depth knowledge of Chinese culture and language.

Stage 5

Refinement of the complete Chinese version, construction and structure of sentences. This is a necessary stage because of the vast linguistic differences between the source and target language.

Stage 6

Reverse translation of the Chinese version into Sanskrit in order to verify the accuracy in the interpretation of the original text. Mistakes in interpretation are to be promptly rectified in the Chinese version.

Stage 7

Review of the verified Chinese version to identify errors in usage of characters, and refinement in linguistic expressions to improve readability.

Stage 8

Further polishing to improve the literary beauty of the language, adding linguistic colours to the otherwise monotonous writing.

Stage 9

Verification of the audio quality by reciting the translation aloud. The audio effect is important because scriptures are for preaching aloud to an audience.

Stage 10

Final check by the Master Translator.

Conclusion

Xuan Zang did a great job in the translation of the Buddhist sutras. He was not only an outstanding linguist but also wholeheartedly committed to the task as a devout Buddhist with an extraordinary understanding of the sutras' contents. Indeed, he dedicated thirteen years of his life to the task. The task was not simply limited to transforming one language into another. In order to spread Buddhism effectively, the sutras in Chinese would have to be spiritually understood by the Chinese devotees. In today's terms, he had to take the marketplace into account. Was the translation intended for the gentry, on whom Buddhism depended for financial support, or for the less literate populace, to widen market share, or for both?

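Wednesday, August 24, 2005

Machine Translation Test: Google Most Accurate

Search giant Google's ambition to make the Web more international got a boost recently from a machine translation software test run by the U.S. government, in which it beat rivals including academic groups and IBM's software.

In tests of Arabic-to-English and Chinese-to-English translation, Google received the highest scores from the U.S. National Institute of Standards and Technology (NIST). Each test involved translating 100 news articles from sources ranging from Agence France-Presse (AFP) to Xinhua News Agency, dated from December 1, 2004 to January 24, 2005. The test results were published earlier this month.

In the past, the quality of computerized translation was widely criticized, but with increased computing performance and larger data samples, scientists have found ways to improve the accuracy of machine translation.

For example, the startup Language Weaver has written software that can translate Al Jazeera broadcasts. Several universities, including the Language Technologies Institute at Carnegie Mellon University (CMU), have dedicated research in this field (though neither of the two took part in this year's test).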

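Google's machine translation is not perfect, but it was well ahead of its rivals. On a scale with a maximum of 1, Google scored 0.5137 for Arabic and 0.3531 for Chinese. In second place was the University of Southern California's Information Sciences Institute, with 0.4657 for Arabic and 0.3073 for Chinese. IBM ranked third, with 0.4646 for Arabic and 0.2571 for Chinese.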
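To put "well ahead" into numbers, the lead over the runner-up can be worked out directly from the scores quoted in the article. The short Python sketch below does only that arithmetic. The dictionary layout and the function name lead_over_runner_up are my own choices for illustration, and "USC ISI" is simply shorthand for the University of Southern California's Information Sciences Institute.

```python
# Scores as quoted in the article (maximum possible score is 1).
scores = {
    "Google": {"Arabic": 0.5137, "Chinese": 0.3531},
    "USC ISI": {"Arabic": 0.4657, "Chinese": 0.3073},
    "IBM": {"Arabic": 0.4646, "Chinese": 0.2571},
}


def lead_over_runner_up(scores, language):
    """Return the top score's absolute and relative lead over second place."""
    ranked = sorted((entry[language] for entry in scores.values()), reverse=True)
    absolute = ranked[0] - ranked[1]
    relative = absolute / ranked[1]
    return absolute, relative


for language in ("Arabic", "Chinese"):
    absolute, relative = lead_over_runner_up(scores, language)
    print(f"{language}: +{absolute:.4f} ({relative:.1%} above second place)")
```

By this measure the gap is somewhat wider on Chinese than on Arabic in relative terms, even though the absolute scores for Chinese are lower for every participant.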

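Other participants included the University of Edinburgh in the UK and China's Harbin Institute of Technology. NIST said that most of the software in the test came from research labs.

Google's edge may come from the enormous data sources the company has gathered. In general, machine translation software performs differently depending on how much data it is fed. Through its own search business, Google has collected more than a hundred million translated web pages.

Like Yahoo, Google is looking to developing countries for new customers. Google includes some machine translation tools on its own site and also offers multiple international versions.

CNET News: Michael Kanellos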
23/8/2005
