Course Glossaries

2. Machine Translation: Early Modern and Modern History

  • Machine Translation (MT): The process of automatically translating text or speech from one language to another using computer algorithms.

  • Rule-Based Machine Translation (RBMT): An MT approach that uses linguistic rules and bilingual dictionaries to translate text, focusing on syntax, morphology, and grammar.

  • Statistical Machine Translation (SMT): An MT approach that uses statistical models based on bilingual text corpora to predict the probability of a translation.

  • Neural Machine Translation (NMT): An advanced MT approach that uses deep learning models, specifically neural networks, to translate text by analyzing large datasets and capturing context.

  • Example-Based Machine Translation (EBMT): An MT approach that relies on a database of previously translated examples, finding the closest matches to translate new sentences.

  • Hybrid Machine Translation: A combination of different MT approaches, often integrating RBMT and SMT/NMT to leverage the strengths of each method.

  • Bilingual Text Corpora: Large collections of text in two languages, used to train and evaluate MT systems by providing parallel examples of translations.

  • Parallel Corpora: A type of bilingual corpus where texts in two languages are aligned at the sentence level, facilitating the training of SMT and NMT systems.

  • Phrase-Based Machine Translation: A specific type of SMT that breaks down text into phrases rather than individual words, improving the fluency of translations.

  • Sequence-to-Sequence (Seq2Seq) Model: A deep learning model used in NMT that processes sequences of text to generate translations, maintaining the order and context of words.

  • Translation Model: In SMT, a model that predicts the most likely translation of a word or phrase based on bilingual text data.

  • Language Model: A model that assesses the fluency of the translated text by predicting the likelihood of word sequences in the target language.
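
    A minimal sketch of how a language model scores fluency, here a bigram model with add-one smoothing trained on a hypothetical toy corpus (real systems train neural models on millions of sentences):

```python
import math
from collections import Counter

# Hypothetical toy target-language corpus, tokenized into words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_logprob(sentence):
    """Score a candidate translation: sum of log P(word | previous word)."""
    words = sentence.split()
    vocab = len(unigrams)
    score = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing gives unseen bigrams a small nonzero probability.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
        score += math.log(p)
    return score

# A fluent word order scores higher than a scrambled one.
print(bigram_logprob("the cat sat on the mat"))
print(bigram_logprob("cat the on sat mat the"))
```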

  • Decoding Algorithm: In MT, the procedure that searches the space of candidate translations and selects the best hypothesis based on the probabilities generated by the translation and language models.

  • Reordering Model: A component in SMT that predicts the correct word order in the target language, addressing differences in syntax between languages.

  • Neural Networks: Computational models inspired by the human brain, used in NMT to learn patterns and relationships in language data.

  • Attention Mechanism: A technique in NMT that allows the model to focus on specific parts of the input sentence, improving translation accuracy, especially for long sentences.
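
    The core computation can be sketched as dot-product attention over toy 2-dimensional encoder states (the vectors here are illustrative; real models use learned, high-dimensional states):

```python
import math

def softmax(xs):
    """Turn raw similarity scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Dot-product attention: weight each source position by its
    similarity to the decoder query, then mix the value vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Hypothetical encoder states for a 3-word source sentence.
keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]  # decoder state "looking for" the first source word

weights, context = attention(query, keys, values)
print(weights)  # highest weight lands on the positions most similar to the query
```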

  • Encoder-Decoder Architecture: A framework used in NMT where the encoder processes the input text and the decoder generates the translation, often using an attention mechanism.

  • BLEU Score (Bilingual Evaluation Understudy): A metric for evaluating the quality of machine-generated translations by comparing them to one or more reference translations.
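
    A minimal single-reference BLEU sketch: the geometric mean of modified n-gram precisions times a brevity penalty. Production scorers such as sacreBLEU add smoothing, standardized tokenization, and multi-reference support; this version is only for illustration:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Minimal BLEU: modified n-gram precision (counts clipped by the
    reference) combined over n = 1..max_n, with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(zip(*[cand[i:] for i in range(n)]))
        ref_ngrams = Counter(zip(*[ref[i:] for i in range(n)]))
        # "Modified" precision: clip candidate counts by reference counts.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # unsmoothed: any zero precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages translations shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the cat is on the mat"
print(bleu("the cat is on the mat", ref))        # 1.0 for an exact match
print(bleu("the cat is on the mat today", ref))  # lower for a near match
```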

  • Pre-trained Models: In NMT, models that have been trained on large datasets and can be fine-tuned for specific tasks or languages, speeding up the development process.

  • Transfer Learning: The practice of applying knowledge gained from one task (e.g., translating English to French) to another related task (e.g., translating English to Spanish), commonly used in NMT.

  • Back-Translation: A method in NMT training where monolingual target-language data is machine-translated into the source language, creating synthetic parallel data that improves translation quality.

  • Subword Units: Smaller language components, such as prefixes or suffixes, used in NMT to handle rare or compound words more effectively.
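
    A toy sketch of byte-pair encoding (BPE), the most common way subword units are learned: repeatedly merge the most frequent adjacent symbol pair. The four-word "corpus" is hypothetical; real tokenizers such as SentencePiece train on large corpora and handle whitespace markers:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges: start from characters, repeatedly fuse the
    most frequent adjacent pair across the (weighted) vocabulary."""
    vocab = Counter(tuple(w) for w in words)  # each word as a symbol tuple
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])  # fuse the pair
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

# "low", "lower", "lowest" share the subword "low", which BPE discovers.
merges, vocab = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
```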

  • Tokenization: The process of breaking down text into smaller units, such as words or subwords, to facilitate processing in MT systems.
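
    A minimal word-level tokenizer using only the standard library; the regular expression is illustrative and not taken from any particular MT toolkit (production systems typically use subword tokenizers instead):

```python
import re

def tokenize(text):
    """Naive tokenization: runs of word characters, or single
    punctuation marks, become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Don't translate this, please."))
# → ['Don', "'", 't', 'translate', 'this', ',', 'please', '.']
```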

  • Alignment: The process of matching corresponding words or phrases between the source and target languages in a parallel corpus, crucial for training SMT and NMT systems.

  • Word Embeddings: Dense vector representations of words used in NMT to capture semantic meanings and relationships between words in different languages.
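
    Relationships between embeddings are usually measured with cosine similarity. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds of dimensions and are learned from data, sometimes in a shared cross-lingual space:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors:
    near 1.0 = similar meaning, near 0.0 = unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings in a shared English-Spanish space.
embeddings = {
    "cat":       [0.9, 0.1, 0.0],
    "gato":      [0.8, 0.2, 0.1],  # Spanish "cat", nearby in the space
    "democracy": [0.0, 0.1, 0.9],
}

print(cosine_similarity(embeddings["cat"], embeddings["gato"]))       # high
print(cosine_similarity(embeddings["cat"], embeddings["democracy"]))  # low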

  • Domain Adaptation: The process of fine-tuning an MT system to perform better in a specific domain, such as legal or medical translation.

  • Cross-Lingual Transfer: The ability of an MT system to apply knowledge from one language pair to another, enhancing translation quality across multiple languages.

  • Multilingual Translation: An NMT approach that handles multiple languages simultaneously, using a shared model that can translate between any pair of supported languages.

  • Low-Resource Languages: Languages that have limited digital resources, such as corpora or dictionaries, posing challenges for MT development.

  • Out-of-Vocabulary (OOV) Words: Words that are not present in the training data of an MT system, often leading to translation errors.

  • Post-Editing: The process of manually correcting errors in machine-generated translations to improve accuracy and fluency.

  • Syntactic Parsing: The process of analyzing the grammatical structure of sentences, used in RBMT to generate accurate translations.

  • Morphological Analysis: The analysis of the structure of words and their components, such as roots and affixes, used in RBMT to handle morphologically rich (highly inflected) languages.

  • Lexical Disambiguation: The process of determining the correct meaning of a word that has multiple possible interpretations, crucial in MT for accurate translations.

  • Semantic Role Labeling: Identifying the roles played by words in a sentence, such as agent or object, to improve the accuracy of MT systems.

  • Language Pair: The combination of a source language and a target language in MT, such as English to Spanish.

  • Pivot Language: An intermediate language used in MT when direct translation between two languages is difficult due to lack of resources.

  • Contextual Embeddings: Word embeddings that take into account the context in which a word appears, improving translation quality in NMT.

  • Data Augmentation: The process of artificially increasing the size of a training dataset by creating variations of existing data, used to improve MT performance.

  • Beam Search: A decoding algorithm used in NMT that considers multiple translation hypotheses simultaneously to find the most probable translation.
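
    A sketch of beam search over hypothetical per-step word distributions (a stand-in for a real NMT decoder's output): keep the `beam_size` highest-scoring partial translations at each step, rather than only the single best (greedy) or all of them (exhaustive):

```python
import math

def beam_search(step_probs, beam_size=2):
    """Expand each surviving hypothesis with every candidate token,
    score by summed log-probability, and prune to the top beam_size."""
    beams = [([], 0.0)]  # (tokens so far, summed log-probability)
    for probs in step_probs:
        candidates = []
        for tokens, score in beams:
            for token, p in probs.items():
                candidates.append((tokens + [token], score + math.log(p)))
        # Prune: keep only the beam_size best partial translations.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Hypothetical per-step distributions over target words.
steps = [
    {"the": 0.6, "a": 0.4},
    {"cat": 0.5, "dog": 0.4, "fish": 0.1},
    {"sat": 0.7, "ran": 0.3},
]
print(beam_search(steps))  # → ['the', 'cat', 'sat']
```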

  • Dropout: A regularization technique in NMT that prevents overfitting by randomly dropping units in the neural network during training.

  • Parallel Sentence Mining: The process of automatically finding and extracting parallel sentences from large bilingual corpora, used to improve the training of MT systems.

  • Translationese: A term for the distinct linguistic patterns that emerge in translated text, whether human or machine-generated, often detectable by statistical analysis.

  • Interactive MT: An MT approach where human translators interact with the MT system during the translation process, refining the output in real-time.

  • Corpus-Based MT: An approach that relies heavily on large text corpora for training MT systems, typical in SMT and NMT.

  • Phrase Table: In SMT, a table that lists possible translations for phrases in the source language along with their probabilities.

  • Cross-Entropy Loss: A loss function used in NMT training to measure the difference between the predicted translation and the actual translation.
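
    For a single decoding step, cross-entropy reduces to the negative log of the probability the model assigned to the correct target word; training averages this over all words. The vocabulary and probabilities below are hypothetical:

```python
import math

def cross_entropy(predicted_dist, target_word):
    """Loss for one step: -log P(correct word). Confident, correct
    predictions give a loss near 0; low probability on the correct
    word gives a large loss."""
    return -math.log(predicted_dist[target_word])

# Hypothetical model output distribution over a tiny target vocabulary.
dist = {"gato": 0.7, "perro": 0.2, "casa": 0.1}

print(cross_entropy(dist, "gato"))   # low loss: model was confident and right
print(cross_entropy(dist, "perro"))  # higher loss: correct word got low probability
```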

  • Neural Language Model: A type of language model used in NMT that predicts the next word in a sentence based on the context of previous words.

  • Knowledge Distillation: A technique in NMT where a smaller, simpler model is trained to replicate the behavior of a larger, more complex model, improving efficiency.

  • Transfer-Based MT: An MT approach that transfers linguistic structures from the source language to the target language, relying on syntactic and semantic transfer rules.