Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
-
Finetuning on bitext (bilingual finetuning) to translate from one language to another does not leverage the full capacity of multilingual pretraining.
-
Multilingual translation models can be created through multilingual finetuning. Starting from pretrained models incorporates the benefits of large quantities of unlabeled monolingual data, which is particularly important for low-resource languages where bitext is not available.
-
Multilingual translation models are built with multilingual pretraining (on monolingual data) followed by multilingual finetuning (on parallel data).
Core Concepts:
- mBART is trained as a denoising autoencoder: the model is trained to predict the original text from a noised version of it.
- Random span masking and sentence order permutation are used to create the noised input during pretraining (see the noising sketch after this list).
- Instead of training a separate model from language i to language j, a single model is trained to translate between N languages in all directions (see the direction sketch below).
- Trained with temperature upsampling, which upsamples lower-resource pairs so that high-resource languages do not dominate the training data (see the sampling sketch below).
- On average, the multilingual models improve over bilingual baselines by around 5.7 to 7 BLEU points.
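
A minimal sketch of the denoising objective's noising step, assuming whitespace tokenization and a literal "<mask>" string (the actual model works on subword tokens); span lengths drawn from a Poisson distribution and sentence permutation follow the mBART recipe:

```python
import random
import numpy as np

def noise_document(sentences, mask_ratio=0.35, poisson_lambda=3.5):
    # Permute sentence order within the document (order permutation).
    sentences = list(sentences)
    random.shuffle(sentences)

    noised = []
    for sent in sentences:
        tokens = sent.split()
        num_to_mask = int(round(len(tokens) * mask_ratio))
        masked = 0
        while masked < num_to_mask and len(tokens) > 0:
            # Draw a span length and replace the whole span with one "<mask>".
            span = max(1, min(int(np.random.poisson(poisson_lambda)), len(tokens)))
            start = random.randrange(len(tokens) - span + 1)
            tokens[start:start + span] = ["<mask>"]
            masked += span
        noised.append(" ".join(tokens))
    return noised

# The autoencoder is trained to reconstruct the original sentences
# from the noised document produced above.
```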
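A hedged sketch of how many-to-many training examples can be built: each parallel pair is tagged with language-ID tokens (e.g. "en_XX", "fr_XX") so one model covers every direction; the exact token placement here only approximates the mBART convention.

```python
def make_example(src_text, tgt_text, src_lang, tgt_lang):
    # Source side carries its language ID; the decoder is primed with the
    # target language ID so the same parameters translate in any direction.
    encoder_input = f"{src_text} </s> {src_lang}"
    decoder_input = f"{tgt_lang} {tgt_text} </s>"
    return encoder_input, decoder_input

pairs = [
    ("Hello world", "Bonjour le monde", "en_XX", "fr_XX"),
    ("Bonjour le monde", "Hello world", "fr_XX", "en_XX"),  # same model, reverse direction
]
examples = [make_example(*p) for p in pairs]
```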
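A minimal sketch of temperature-based upsampling: sampling probabilities are proportional to (dataset share)^(1/T), so a higher T flattens the distribution and low-resource pairs are sampled more often. The temperature and dataset sizes below are illustrative, not the paper's settings.

```python
def temperature_sampling_probs(sizes, T=5.0):
    # Raise each language pair's data share to the power 1/T, then renormalize.
    total = sum(sizes.values())
    weights = {pair: (n / total) ** (1.0 / T) for pair, n in sizes.items()}
    z = sum(weights.values())
    return {pair: w / z for pair, w in weights.items()}

# Example: one high-resource and one low-resource pair.
print(temperature_sampling_probs({"en-fr": 40_000_000, "en-ne": 500_000}))
# With T > 1, the en-ne share rises well above its raw proportion (~1.2%).
```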
However, multilingual finetuning would mean that the same model capacity must model many directions rather than just one, which could decrease performance.
Ref: