# Training MAMMOTH 101 This example uses the [Europarl parallel corpus](https://www.statmt.org/europarl/) - a multilingual resource extracted from European Parliament proceedings, containing text in 21 European languages. If you use the data in your research, please cite the paper by Philipp Koehn, "Europarl: A Parallel Corpus for Statistical Machine Translation," presented at the MT Summit 2005. The tokenization is done with [sentencepiece](https://github.com/google/sentencepiece). ## Step 0: Download the data and SentencePiece model Download the Release v7 - a further expanded and improved version of the Europarl corpus on 15 May 2012 - from the original website or download the processed data by us: ```bash wget https://mammoth101.a3s.fi/europarl.tar.gz ``` We use a SentencePiece model trained on OPUS Tatoeba Challenge data with 64k vocabulary size. Download the SentencePiece model and the vocabulary: ```bash # Download the SentencePiece model wget https://mammoth101.a3s.fi/opusTC.mul.64k.spm # Download the vocabulary wget https://mammoth101.a3s.fi/opusTC.mul.vocab.onmt ``` ## Step 1: Prepare the data Then, read parallel text data, processes it, and generate output files for training and validation sets. Here's a high-level summary of the main processing steps. For each language in 'langs,' - read parallel data files. - clean the data by removing empty lines. - shuffle the data randomly. - tokenizes the text using SentencePiece and writes the tokenized data to separate output files for training and validation sets. We use a positional argument 'lang' that can accept one or more values, for specifying the languages (e.g., `bg` and `cs` as used in Europarl) to process. You're free to skip this step if you directly download the processed data. For details, see [this page](../prepare_data.md#europarl). ## Step 3: Configuration We can define a configuration for the model, sharing scheme, and training arguments. You can choose to manually write your config in a yaml file, or use our automatic config generation tool. Here, we provide two configuration examples for training a dummy transformer model in single-node and multi-node settings.
Single-node configuration ```yaml src_vocab: 'bg': path_to_vocab/opusTC.mul.vocab.onmt 'cs': path_to_vocab/opusTC.mul.vocab.onmt 'da': path_to_vocab/opusTC.mul.vocab.onmt 'de': path_to_vocab/opusTC.mul.vocab.onmt 'el': path_to_vocab/opusTC.mul.vocab.onmt 'en': path_to_vocab/opusTC.mul.vocab.onmt 'es': path_to_vocab/opusTC.mul.vocab.onmt 'et': path_to_vocab/opusTC.mul.vocab.onmt 'fi': path_to_vocab/opusTC.mul.vocab.onmt 'fr': path_to_vocab/opusTC.mul.vocab.onmt 'hu': path_to_vocab/opusTC.mul.vocab.onmt 'it': path_to_vocab/opusTC.mul.vocab.onmt 'lt': path_to_vocab/opusTC.mul.vocab.onmt 'lv': path_to_vocab/opusTC.mul.vocab.onmt 'nl': path_to_vocab/opusTC.mul.vocab.onmt 'pl': path_to_vocab/opusTC.mul.vocab.onmt 'pt': path_to_vocab/opusTC.mul.vocab.onmt 'ro': path_to_vocab/opusTC.mul.vocab.onmt 'sk': path_to_vocab/opusTC.mul.vocab.onmt 'sl': path_to_vocab/opusTC.mul.vocab.onmt 'sv': path_to_vocab/opusTC.mul.vocab.onmt tgt_vocab: 'bg': path_to_vocab/opusTC.mul.vocab.onmt 'cs': path_to_vocab/opusTC.mul.vocab.onmt 'da': path_to_vocab/opusTC.mul.vocab.onmt 'de': path_to_vocab/opusTC.mul.vocab.onmt 'el': path_to_vocab/opusTC.mul.vocab.onmt 'en': path_to_vocab/opusTC.mul.vocab.onmt 'es': path_to_vocab/opusTC.mul.vocab.onmt 'et': path_to_vocab/opusTC.mul.vocab.onmt 'fi': path_to_vocab/opusTC.mul.vocab.onmt 'fr': path_to_vocab/opusTC.mul.vocab.onmt 'hu': path_to_vocab/opusTC.mul.vocab.onmt 'it': path_to_vocab/opusTC.mul.vocab.onmt 'lt': path_to_vocab/opusTC.mul.vocab.onmt 'lv': path_to_vocab/opusTC.mul.vocab.onmt 'nl': path_to_vocab/opusTC.mul.vocab.onmt 'pl': path_to_vocab/opusTC.mul.vocab.onmt 'pt': path_to_vocab/opusTC.mul.vocab.onmt 'ro': path_to_vocab/opusTC.mul.vocab.onmt 'sk': path_to_vocab/opusTC.mul.vocab.onmt 'sl': path_to_vocab/opusTC.mul.vocab.onmt 'sv': path_to_vocab/opusTC.mul.vocab.onmt overwrite: False tasks: # GPU 0:0 train_bg-en: src_tgt: bg-en enc_sharing_group: [bg] dec_sharing_group: [en] node_gpu: 0:0 path_src: path_to_europarl/bg-en/train.bg-en.bg.sp path_tgt: path_to_europarl/bg-en/train.bg-en.en.sp path_valid_src: path_to_europarl/bg-en/valid.bg-en.bg.sp path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.en.sp transforms: [filtertoolong] train_bg-bg: src_tgt: bg-bg enc_sharing_group: [bg] dec_sharing_group: [bg] node_gpu: 0:0 path_src: path_to_europarl/bg-en/train.bg-en.bg.sp path_tgt: path_to_europarl/bg-en/train.bg-en.bg.sp path_valid_src: path_to_europarl/bg-en/valid.bg-en.bg.sp path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.bg.sp transforms: [filtertoolong, denoising] train_en-bg: src_tgt: en-bg enc_sharing_group: [en] dec_sharing_group: [bg] node_gpu: 0:0 path_src: path_to_europarl/bg-en/train.bg-en.en.sp path_tgt: path_to_europarl/bg-en/train.bg-en.bg.sp path_valid_src: path_to_europarl/bg-en/valid.bg-en.en.sp path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.bg.sp transforms: [filtertoolong] # GPU 0:1 train_cs-en: src_tgt: cs-en enc_sharing_group: [cs] dec_sharing_group: [en] node_gpu: 0:1 path_src: path_to_europarl/cs-en/train.cs-en.cs.sp path_tgt: path_to_europarl/cs-en/train.cs-en.en.sp path_valid_src: path_to_europarl/cs-en/valid.cs-en.cs.sp path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.en.sp transforms: [filtertoolong] train_cs-cs: src_tgt: cs-cs enc_sharing_group: [cs] dec_sharing_group: [cs] node_gpu: 0:1 path_src: path_to_europarl/cs-en/train.cs-en.cs.sp path_tgt: path_to_europarl/cs-en/train.cs-en.cs.sp path_valid_src: path_to_europarl/cs-en/valid.cs-en.cs.sp path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.cs.sp transforms: [filtertoolong, denoising] train_en-cs: src_tgt: en-cs enc_sharing_group: [en] dec_sharing_group: [cs] node_gpu: 0:1 path_src: path_to_europarl/cs-en/train.cs-en.en.sp path_tgt: path_to_europarl/cs-en/train.cs-en.cs.sp path_valid_src: path_to_europarl/cs-en/valid.cs-en.en.sp path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.cs.sp transforms: [filtertoolong] # GPU 0:2 train_da-en: src_tgt: da-en enc_sharing_group: [da] dec_sharing_group: [en] node_gpu: 0:2 path_src: path_to_europarl/da-en/train.da-en.da.sp path_tgt: path_to_europarl/da-en/train.da-en.en.sp path_valid_src: path_to_europarl/da-en/valid.da-en.da.sp path_valid_tgt: path_to_europarl/da-en/valid.da-en.en.sp transforms: [filtertoolong] train_da-da: src_tgt: da-da enc_sharing_group: [da] dec_sharing_group: [da] node_gpu: 0:2 path_src: path_to_europarl/da-en/train.da-en.da.sp path_tgt: path_to_europarl/da-en/train.da-en.da.sp path_valid_src: path_to_europarl/da-en/valid.da-en.da.sp path_valid_tgt: path_to_europarl/da-en/valid.da-en.da.sp transforms: [filtertoolong, denoising] train_en-da: src_tgt: en-da enc_sharing_group: [en] dec_sharing_group: [da] node_gpu: 0:2 path_src: path_to_europarl/da-en/train.da-en.en.sp path_tgt: path_to_europarl/da-en/train.da-en.da.sp path_valid_src: path_to_europarl/da-en/valid.da-en.en.sp path_valid_tgt: path_to_europarl/da-en/valid.da-en.da.sp transforms: [filtertoolong] # GPU 0:3 train_de-en: src_tgt: de-en enc_sharing_group: [de] dec_sharing_group: [en] node_gpu: 0:3 path_src: path_to_europarl/de-en/train.de-en.de.sp path_tgt: path_to_europarl/de-en/train.de-en.en.sp path_valid_src: path_to_europarl/de-en/valid.de-en.de.sp path_valid_tgt: path_to_europarl/de-en/valid.de-en.en.sp transforms: [filtertoolong] train_de-de: src_tgt: de-de enc_sharing_group: [de] dec_sharing_group: [de] node_gpu: 0:3 path_src: path_to_europarl/de-en/train.de-en.de.sp path_tgt: path_to_europarl/de-en/train.de-en.de.sp path_valid_src: path_to_europarl/de-en/valid.de-en.de.sp path_valid_tgt: path_to_europarl/de-en/valid.de-en.de.sp transforms: [filtertoolong, denoising] train_en-de: src_tgt: en-de enc_sharing_group: [en] dec_sharing_group: [de] node_gpu: 0:3 path_src: path_to_europarl/de-en/train.de-en.en.sp path_tgt: path_to_europarl/de-en/train.de-en.de.sp path_valid_src: path_to_europarl/de-en/valid.de-en.en.sp path_valid_tgt: path_to_europarl/de-en/valid.de-en.de.sp transforms: [filtertoolong] # GPU 0:0 train_el-en: src_tgt: el-en enc_sharing_group: [el] dec_sharing_group: [en] node_gpu: 0:0 path_src: path_to_europarl/el-en/train.el-en.el.sp path_tgt: path_to_europarl/el-en/train.el-en.en.sp path_valid_src: path_to_europarl/el-en/valid.el-en.el.sp path_valid_tgt: path_to_europarl/el-en/valid.el-en.en.sp transforms: [filtertoolong] train_el-el: src_tgt: el-el enc_sharing_group: [el] dec_sharing_group: [el] node_gpu: 0:0 path_src: path_to_europarl/el-en/train.el-en.el.sp path_tgt: path_to_europarl/el-en/train.el-en.el.sp path_valid_src: path_to_europarl/el-en/valid.el-en.el.sp path_valid_tgt: path_to_europarl/el-en/valid.el-en.el.sp transforms: [filtertoolong, denoising] train_en-el: src_tgt: en-el enc_sharing_group: [en] dec_sharing_group: [el] node_gpu: 0:0 path_src: path_to_europarl/el-en/train.el-en.en.sp path_tgt: path_to_europarl/el-en/train.el-en.el.sp path_valid_src: path_to_europarl/el-en/valid.el-en.en.sp path_valid_tgt: path_to_europarl/el-en/valid.el-en.el.sp transforms: [filtertoolong] # GPU 0:1 train_es-en: src_tgt: es-en enc_sharing_group: [es] dec_sharing_group: [en] node_gpu: 0:1 path_src: path_to_europarl/es-en/train.es-en.es.sp path_tgt: path_to_europarl/es-en/train.es-en.en.sp path_valid_src: path_to_europarl/es-en/valid.es-en.es.sp path_valid_tgt: path_to_europarl/es-en/valid.es-en.en.sp transforms: [filtertoolong] train_es-es: src_tgt: es-es enc_sharing_group: [es] dec_sharing_group: [es] node_gpu: 0:1 path_src: path_to_europarl/es-en/train.es-en.es.sp path_tgt: path_to_europarl/es-en/train.es-en.es.sp path_valid_src: path_to_europarl/es-en/valid.es-en.es.sp path_valid_tgt: path_to_europarl/es-en/valid.es-en.es.sp transforms: [filtertoolong, denoising] train_en-es: src_tgt: en-es enc_sharing_group: [en] dec_sharing_group: [es] node_gpu: 0:1 path_src: path_to_europarl/es-en/train.es-en.en.sp path_tgt: path_to_europarl/es-en/train.es-en.es.sp path_valid_src: path_to_europarl/es-en/valid.es-en.en.sp path_valid_tgt: path_to_europarl/es-en/valid.es-en.es.sp transforms: [filtertoolong] # GPU 0:2 train_et-en: src_tgt: et-en enc_sharing_group: [et] dec_sharing_group: [en] node_gpu: 0:2 path_src: path_to_europarl/et-en/train.et-en.et.sp path_tgt: path_to_europarl/et-en/train.et-en.en.sp path_valid_src: path_to_europarl/et-en/valid.et-en.et.sp path_valid_tgt: path_to_europarl/et-en/valid.et-en.en.sp transforms: [filtertoolong] train_et-et: src_tgt: et-et enc_sharing_group: [et] dec_sharing_group: [et] node_gpu: 0:2 path_src: path_to_europarl/et-en/train.et-en.et.sp path_tgt: path_to_europarl/et-en/train.et-en.et.sp path_valid_src: path_to_europarl/et-en/valid.et-en.et.sp path_valid_tgt: path_to_europarl/et-en/valid.et-en.et.sp transforms: [filtertoolong, denoising] train_en-et: src_tgt: en-et enc_sharing_group: [en] dec_sharing_group: [et] node_gpu: 0:2 path_src: path_to_europarl/et-en/train.et-en.en.sp path_tgt: path_to_europarl/et-en/train.et-en.et.sp path_valid_src: path_to_europarl/et-en/valid.et-en.en.sp path_valid_tgt: path_to_europarl/et-en/valid.et-en.et.sp transforms: [filtertoolong] # GPU 0:3 train_fi-en: src_tgt: fi-en enc_sharing_group: [fi] dec_sharing_group: [en] node_gpu: 0:3 path_src: path_to_europarl/fi-en/train.fi-en.fi.sp path_tgt: path_to_europarl/fi-en/train.fi-en.en.sp path_valid_src: path_to_europarl/fi-en/valid.fi-en.fi.sp path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.en.sp transforms: [filtertoolong] train_fi-fi: src_tgt: fi-fi enc_sharing_group: [fi] dec_sharing_group: [fi] node_gpu: 0:3 path_src: path_to_europarl/fi-en/train.fi-en.fi.sp path_tgt: path_to_europarl/fi-en/train.fi-en.fi.sp path_valid_src: path_to_europarl/fi-en/valid.fi-en.fi.sp path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.fi.sp transforms: [filtertoolong, denoising] train_en-fi: src_tgt: en-fi enc_sharing_group: [en] dec_sharing_group: [fi] node_gpu: 0:3 path_src: path_to_europarl/fi-en/train.fi-en.en.sp path_tgt: path_to_europarl/fi-en/train.fi-en.fi.sp path_valid_src: path_to_europarl/fi-en/valid.fi-en.en.sp path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.fi.sp transforms: [filtertoolong] # GPU 0:0 train_fr-en: src_tgt: fr-en enc_sharing_group: [fr] dec_sharing_group: [en] node_gpu: 0:0 path_src: path_to_europarl/fr-en/train.fr-en.fr.sp path_tgt: path_to_europarl/fr-en/train.fr-en.en.sp path_valid_src: path_to_europarl/fr-en/valid.fr-en.fr.sp path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.en.sp transforms: [filtertoolong] train_fr-fr: src_tgt: fr-fr enc_sharing_group: [fr] dec_sharing_group: [fr] node_gpu: 0:0 path_src: path_to_europarl/fr-en/train.fr-en.fr.sp path_tgt: path_to_europarl/fr-en/train.fr-en.fr.sp path_valid_src: path_to_europarl/fr-en/valid.fr-en.fr.sp path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.fr.sp transforms: [filtertoolong, denoising] train_en-fr: src_tgt: en-fr enc_sharing_group: [en] dec_sharing_group: [fr] node_gpu: 0:0 path_src: path_to_europarl/fr-en/train.fr-en.en.sp path_tgt: path_to_europarl/fr-en/train.fr-en.fr.sp path_valid_src: path_to_europarl/fr-en/valid.fr-en.en.sp path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.fr.sp transforms: [filtertoolong] # GPU 0:1 train_hu-en: src_tgt: hu-en enc_sharing_group: [hu] dec_sharing_group: [en] node_gpu: 0:1 path_src: path_to_europarl/hu-en/train.hu-en.hu.sp path_tgt: path_to_europarl/hu-en/train.hu-en.en.sp path_valid_src: path_to_europarl/hu-en/valid.hu-en.hu.sp path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.en.sp transforms: [filtertoolong] train_hu-hu: src_tgt: hu-hu enc_sharing_group: [hu] dec_sharing_group: [hu] node_gpu: 0:1 path_src: path_to_europarl/hu-en/train.hu-en.hu.sp path_tgt: path_to_europarl/hu-en/train.hu-en.hu.sp path_valid_src: path_to_europarl/hu-en/valid.hu-en.hu.sp path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.hu.sp transforms: [filtertoolong, denoising] train_en-hu: src_tgt: en-hu enc_sharing_group: [en] dec_sharing_group: [hu] node_gpu: 0:1 path_src: path_to_europarl/hu-en/train.hu-en.en.sp path_tgt: path_to_europarl/hu-en/train.hu-en.hu.sp path_valid_src: path_to_europarl/hu-en/valid.hu-en.en.sp path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.hu.sp transforms: [filtertoolong] # GPU 0:2 train_it-en: src_tgt: it-en enc_sharing_group: [it] dec_sharing_group: [en] node_gpu: 0:2 path_src: path_to_europarl/it-en/train.it-en.it.sp path_tgt: path_to_europarl/it-en/train.it-en.en.sp path_valid_src: path_to_europarl/it-en/valid.it-en.it.sp path_valid_tgt: path_to_europarl/it-en/valid.it-en.en.sp transforms: [filtertoolong] train_it-it: src_tgt: it-it enc_sharing_group: [it] dec_sharing_group: [it] node_gpu: 0:2 path_src: path_to_europarl/it-en/train.it-en.it.sp path_tgt: path_to_europarl/it-en/train.it-en.it.sp path_valid_src: path_to_europarl/it-en/valid.it-en.it.sp path_valid_tgt: path_to_europarl/it-en/valid.it-en.it.sp transforms: [filtertoolong, denoising] train_en-it: src_tgt: en-it enc_sharing_group: [en] dec_sharing_group: [it] node_gpu: 0:2 path_src: path_to_europarl/it-en/train.it-en.en.sp path_tgt: path_to_europarl/it-en/train.it-en.it.sp path_valid_src: path_to_europarl/it-en/valid.it-en.en.sp path_valid_tgt: path_to_europarl/it-en/valid.it-en.it.sp transforms: [filtertoolong] # GPU 0:3 train_lt-en: src_tgt: lt-en enc_sharing_group: [lt] dec_sharing_group: [en] node_gpu: 0:3 path_src: path_to_europarl/lt-en/train.lt-en.lt.sp path_tgt: path_to_europarl/lt-en/train.lt-en.en.sp path_valid_src: path_to_europarl/lt-en/valid.lt-en.lt.sp path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.en.sp transforms: [filtertoolong] train_lt-lt: src_tgt: lt-lt enc_sharing_group: [lt] dec_sharing_group: [lt] node_gpu: 0:3 path_src: path_to_europarl/lt-en/train.lt-en.lt.sp path_tgt: path_to_europarl/lt-en/train.lt-en.lt.sp path_valid_src: path_to_europarl/lt-en/valid.lt-en.lt.sp path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.lt.sp transforms: [filtertoolong, denoising] train_en-lt: src_tgt: en-lt enc_sharing_group: [en] dec_sharing_group: [lt] node_gpu: 0:3 path_src: path_to_europarl/lt-en/train.lt-en.en.sp path_tgt: path_to_europarl/lt-en/train.lt-en.lt.sp path_valid_src: path_to_europarl/lt-en/valid.lt-en.en.sp path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.lt.sp transforms: [filtertoolong] # GPU 0:0 train_lv-en: src_tgt: lv-en enc_sharing_group: [lv] dec_sharing_group: [en] node_gpu: 0:0 path_src: path_to_europarl/lv-en/train.lv-en.lv.sp path_tgt: path_to_europarl/lv-en/train.lv-en.en.sp path_valid_src: path_to_europarl/lv-en/valid.lv-en.lv.sp path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.en.sp transforms: [filtertoolong] train_lv-lv: src_tgt: lv-lv enc_sharing_group: [lv] dec_sharing_group: [lv] node_gpu: 0:0 path_src: path_to_europarl/lv-en/train.lv-en.lv.sp path_tgt: path_to_europarl/lv-en/train.lv-en.lv.sp path_valid_src: path_to_europarl/lv-en/valid.lv-en.lv.sp path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.lv.sp transforms: [filtertoolong, denoising] train_en-lv: src_tgt: en-lv enc_sharing_group: [en] dec_sharing_group: [lv] node_gpu: 0:0 path_src: path_to_europarl/lv-en/train.lv-en.en.sp path_tgt: path_to_europarl/lv-en/train.lv-en.lv.sp path_valid_src: path_to_europarl/lv-en/valid.lv-en.en.sp path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.lv.sp transforms: [filtertoolong] # GPU 0:1 train_nl-en: src_tgt: nl-en enc_sharing_group: [nl] dec_sharing_group: [en] node_gpu: 0:1 path_src: path_to_europarl/nl-en/train.nl-en.nl.sp path_tgt: path_to_europarl/nl-en/train.nl-en.en.sp path_valid_src: path_to_europarl/nl-en/valid.nl-en.nl.sp path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.en.sp transforms: [filtertoolong] train_nl-nl: src_tgt: nl-nl enc_sharing_group: [nl] dec_sharing_group: [nl] node_gpu: 0:1 path_src: path_to_europarl/nl-en/train.nl-en.nl.sp path_tgt: path_to_europarl/nl-en/train.nl-en.nl.sp path_valid_src: path_to_europarl/nl-en/valid.nl-en.nl.sp path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.nl.sp transforms: [filtertoolong, denoising] train_en-nl: src_tgt: en-nl enc_sharing_group: [en] dec_sharing_group: [nl] node_gpu: 0:1 path_src: path_to_europarl/nl-en/train.nl-en.en.sp path_tgt: path_to_europarl/nl-en/train.nl-en.nl.sp path_valid_src: path_to_europarl/nl-en/valid.nl-en.en.sp path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.nl.sp transforms: [filtertoolong] # GPU 0:2 train_pl-en: src_tgt: pl-en enc_sharing_group: [pl] dec_sharing_group: [en] node_gpu: 0:2 path_src: path_to_europarl/pl-en/train.pl-en.pl.sp path_tgt: path_to_europarl/pl-en/train.pl-en.en.sp path_valid_src: path_to_europarl/pl-en/valid.pl-en.pl.sp path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.en.sp transforms: [filtertoolong] train_pl-pl: src_tgt: pl-pl enc_sharing_group: [pl] dec_sharing_group: [pl] node_gpu: 0:2 path_src: path_to_europarl/pl-en/train.pl-en.pl.sp path_tgt: path_to_europarl/pl-en/train.pl-en.pl.sp path_valid_src: path_to_europarl/pl-en/valid.pl-en.pl.sp path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.pl.sp transforms: [filtertoolong, denoising] train_en-pl: src_tgt: en-pl enc_sharing_group: [en] dec_sharing_group: [pl] node_gpu: 0:2 path_src: path_to_europarl/pl-en/train.pl-en.en.sp path_tgt: path_to_europarl/pl-en/train.pl-en.pl.sp path_valid_src: path_to_europarl/pl-en/valid.pl-en.en.sp path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.pl.sp transforms: [filtertoolong] # GPU 0:3 train_pt-en: src_tgt: pt-en enc_sharing_group: [pt] dec_sharing_group: [en] node_gpu: 0:3 path_src: path_to_europarl/pt-en/train.pt-en.pt.sp path_tgt: path_to_europarl/pt-en/train.pt-en.en.sp path_valid_src: path_to_europarl/pt-en/valid.pt-en.pt.sp path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.en.sp transforms: [filtertoolong] train_pt-pt: src_tgt: pt-pt enc_sharing_group: [pt] dec_sharing_group: [pt] node_gpu: 0:3 path_src: path_to_europarl/pt-en/train.pt-en.pt.sp path_tgt: path_to_europarl/pt-en/train.pt-en.pt.sp path_valid_src: path_to_europarl/pt-en/valid.pt-en.pt.sp path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.pt.sp transforms: [filtertoolong, denoising] train_en-pt: src_tgt: en-pt enc_sharing_group: [en] dec_sharing_group: [pt] node_gpu: 0:3 path_src: path_to_europarl/pt-en/train.pt-en.en.sp path_tgt: path_to_europarl/pt-en/train.pt-en.pt.sp path_valid_src: path_to_europarl/pt-en/valid.pt-en.en.sp path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.pt.sp transforms: [filtertoolong] # GPU 0:0 train_ro-en: src_tgt: ro-en enc_sharing_group: [ro] dec_sharing_group: [en] node_gpu: 0:0 path_src: path_to_europarl/ro-en/train.ro-en.ro.sp path_tgt: path_to_europarl/ro-en/train.ro-en.en.sp path_valid_src: path_to_europarl/ro-en/valid.ro-en.ro.sp path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.en.sp transforms: [filtertoolong] train_ro-ro: src_tgt: ro-ro enc_sharing_group: [ro] dec_sharing_group: [ro] node_gpu: 0:0 path_src: path_to_europarl/ro-en/train.ro-en.ro.sp path_tgt: path_to_europarl/ro-en/train.ro-en.ro.sp path_valid_src: path_to_europarl/ro-en/valid.ro-en.ro.sp path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.ro.sp transforms: [filtertoolong, denoising] train_en-ro: src_tgt: en-ro enc_sharing_group: [en] dec_sharing_group: [ro] node_gpu: 0:0 path_src: path_to_europarl/ro-en/train.ro-en.en.sp path_tgt: path_to_europarl/ro-en/train.ro-en.ro.sp path_valid_src: path_to_europarl/ro-en/valid.ro-en.en.sp path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.ro.sp transforms: [filtertoolong] # GPU 0:1 train_sk-en: src_tgt: sk-en enc_sharing_group: [sk] dec_sharing_group: [en] node_gpu: 0:1 path_src: path_to_europarl/sk-en/train.sk-en.sk.sp path_tgt: path_to_europarl/sk-en/train.sk-en.en.sp path_valid_src: path_to_europarl/sk-en/valid.sk-en.sk.sp path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.en.sp transforms: [filtertoolong] train_sk-sk: src_tgt: sk-sk enc_sharing_group: [sk] dec_sharing_group: [sk] node_gpu: 0:1 path_src: path_to_europarl/sk-en/train.sk-en.sk.sp path_tgt: path_to_europarl/sk-en/train.sk-en.sk.sp path_valid_src: path_to_europarl/sk-en/valid.sk-en.sk.sp path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.sk.sp transforms: [filtertoolong, denoising] train_en-sk: src_tgt: en-sk enc_sharing_group: [en] dec_sharing_group: [sk] node_gpu: 0:1 path_src: path_to_europarl/sk-en/train.sk-en.en.sp path_tgt: path_to_europarl/sk-en/train.sk-en.sk.sp path_valid_src: path_to_europarl/sk-en/valid.sk-en.en.sp path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.sk.sp transforms: [filtertoolong] # GPU 0:2 train_sl-en: src_tgt: sl-en enc_sharing_group: [sl] dec_sharing_group: [en] node_gpu: 0:2 path_src: path_to_europarl/sl-en/train.sl-en.sl.sp path_tgt: path_to_europarl/sl-en/train.sl-en.en.sp path_valid_src: path_to_europarl/sl-en/valid.sl-en.sl.sp path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.en.sp transforms: [filtertoolong] train_sl-sl: src_tgt: sl-sl enc_sharing_group: [sl] dec_sharing_group: [sl] node_gpu: 0:2 path_src: path_to_europarl/sl-en/train.sl-en.sl.sp path_tgt: path_to_europarl/sl-en/train.sl-en.sl.sp path_valid_src: path_to_europarl/sl-en/valid.sl-en.sl.sp path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.sl.sp transforms: [filtertoolong, denoising] train_en-sl: src_tgt: en-sl enc_sharing_group: [en] dec_sharing_group: [sl] node_gpu: 0:2 path_src: path_to_europarl/sl-en/train.sl-en.en.sp path_tgt: path_to_europarl/sl-en/train.sl-en.sl.sp path_valid_src: path_to_europarl/sl-en/valid.sl-en.en.sp path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.sl.sp transforms: [filtertoolong] # GPU 0:3 train_sv-en: src_tgt: sv-en enc_sharing_group: [sv] dec_sharing_group: [en] node_gpu: 0:3 path_src: path_to_europarl/sv-en/train.sv-en.sv.sp path_tgt: path_to_europarl/sv-en/train.sv-en.en.sp path_valid_src: path_to_europarl/sv-en/valid.sv-en.sv.sp path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.en.sp transforms: [filtertoolong] train_sv-sv: src_tgt: sv-sv enc_sharing_group: [sv] dec_sharing_group: [sv] node_gpu: 0:3 path_src: path_to_europarl/sv-en/train.sv-en.sv.sp path_tgt: path_to_europarl/sv-en/train.sv-en.sv.sp path_valid_src: path_to_europarl/sv-en/valid.sv-en.sv.sp path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.sv.sp transforms: [filtertoolong, denoising] train_en-sv: src_tgt: en-sv enc_sharing_group: [en] dec_sharing_group: [sv] node_gpu: 0:3 path_src: path_to_europarl/sv-en/train.sv-en.en.sp path_tgt: path_to_europarl/sv-en/train.sv-en.sv.sp path_valid_src: path_to_europarl/sv-en/valid.sv-en.en.sp path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.sv.sp transforms: [filtertoolong] ### Transform related opts: #### Filter src_seq_length: 200 tgt_seq_length: 200 #### Bart src_subword_type: sentencepiece tgt_subword_type: sentencepiece mask_ratio: 0.2 replace_length: 1 # silently ignore empty lines in the data skip_empty_level: silent batch_size: 4096 batch_type: tokens normalization: tokens valid_batch_size: 4096 max_generator_batches: 2 src_vocab_size: 100000 tgt_vocab_size: 100000 encoder_type: transformer decoder_type: transformer model_dim: 512 transformer_ff: 2048 heads: 8 enc_layers: [6] dec_layers: [6] dropout: 0.1 label_smoothing: 0.1 param_init: 0.0 param_init_glorot: true position_encoding: true valid_steps: 10000 warmup_steps: 10000 report_every: 100 save_checkpoint_steps: 5000000 # save_checkpoint_steps: 50000 keep_checkpoint: -1 accum_count: 1 optim: adafactor decay_method: none learning_rate: 3.0 max_grad_norm: 0.0 seed: 3435 model_type: text save_all_gpus: false world_size: 4 gpu_ranks: [0, 1, 2, 3] node_rank: 0 early_stopping: 5 early_stopping_criteria: accuracy ```
Multi-node configuration ```yaml src_vocab: 'bg': path_to_vocab/opusTC.mul.vocab.onmt 'cs': path_to_vocab/opusTC.mul.vocab.onmt 'da': path_to_vocab/opusTC.mul.vocab.onmt 'de': path_to_vocab/opusTC.mul.vocab.onmt 'el': path_to_vocab/opusTC.mul.vocab.onmt 'en': path_to_vocab/opusTC.mul.vocab.onmt 'es': path_to_vocab/opusTC.mul.vocab.onmt 'et': path_to_vocab/opusTC.mul.vocab.onmt 'fi': path_to_vocab/opusTC.mul.vocab.onmt 'fr': path_to_vocab/opusTC.mul.vocab.onmt 'hu': path_to_vocab/opusTC.mul.vocab.onmt 'it': path_to_vocab/opusTC.mul.vocab.onmt 'lt': path_to_vocab/opusTC.mul.vocab.onmt 'lv': path_to_vocab/opusTC.mul.vocab.onmt 'nl': path_to_vocab/opusTC.mul.vocab.onmt 'pl': path_to_vocab/opusTC.mul.vocab.onmt 'pt': path_to_vocab/opusTC.mul.vocab.onmt 'ro': path_to_vocab/opusTC.mul.vocab.onmt 'sk': path_to_vocab/opusTC.mul.vocab.onmt 'sl': path_to_vocab/opusTC.mul.vocab.onmt 'sv': path_to_vocab/opusTC.mul.vocab.onmt tgt_vocab: 'bg': path_to_vocab/opusTC.mul.vocab.onmt 'cs': path_to_vocab/opusTC.mul.vocab.onmt 'da': path_to_vocab/opusTC.mul.vocab.onmt 'de': path_to_vocab/opusTC.mul.vocab.onmt 'el': path_to_vocab/opusTC.mul.vocab.onmt 'en': path_to_vocab/opusTC.mul.vocab.onmt 'es': path_to_vocab/opusTC.mul.vocab.onmt 'et': path_to_vocab/opusTC.mul.vocab.onmt 'fi': path_to_vocab/opusTC.mul.vocab.onmt 'fr': path_to_vocab/opusTC.mul.vocab.onmt 'hu': path_to_vocab/opusTC.mul.vocab.onmt 'it': path_to_vocab/opusTC.mul.vocab.onmt 'lt': path_to_vocab/opusTC.mul.vocab.onmt 'lv': path_to_vocab/opusTC.mul.vocab.onmt 'nl': path_to_vocab/opusTC.mul.vocab.onmt 'pl': path_to_vocab/opusTC.mul.vocab.onmt 'pt': path_to_vocab/opusTC.mul.vocab.onmt 'ro': path_to_vocab/opusTC.mul.vocab.onmt 'sk': path_to_vocab/opusTC.mul.vocab.onmt 'sl': path_to_vocab/opusTC.mul.vocab.onmt 'sv': path_to_vocab/opusTC.mul.vocab.onmt overwrite: False tasks: # GPU 0:0 train_bg-en: src_tgt: bg-en enc_sharing_group: [bg] dec_sharing_group: [en] node_gpu: "0:0" path_src: path_to_europarl/bg-en/train.bg-en.bg.sp path_tgt: path_to_europarl/bg-en/train.bg-en.en.sp path_valid_src: path_to_europarl/bg-en/valid.bg-en.bg.sp path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.en.sp transforms: [filtertoolong] train_bg-bg: src_tgt: bg-bg enc_sharing_group: [bg] dec_sharing_group: [bg] node_gpu: "0:0" path_src: path_to_europarl/bg-en/train.bg-en.bg.sp path_tgt: path_to_europarl/bg-en/train.bg-en.bg.sp path_valid_src: path_to_europarl/bg-en/valid.bg-en.bg.sp path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.bg.sp transforms: [filtertoolong, denoising] train_en-bg: src_tgt: en-bg enc_sharing_group: [en] dec_sharing_group: [bg] node_gpu: "0:0" path_src: path_to_europarl/bg-en/train.bg-en.en.sp path_tgt: path_to_europarl/bg-en/train.bg-en.bg.sp path_valid_src: path_to_europarl/bg-en/valid.bg-en.en.sp path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.bg.sp transforms: [filtertoolong] # GPU 0:1 train_cs-en: src_tgt: cs-en enc_sharing_group: [cs] dec_sharing_group: [en] node_gpu: "0:1" path_src: path_to_europarl/cs-en/train.cs-en.cs.sp path_tgt: path_to_europarl/cs-en/train.cs-en.en.sp path_valid_src: path_to_europarl/cs-en/valid.cs-en.cs.sp path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.en.sp transforms: [filtertoolong] train_cs-cs: src_tgt: cs-cs enc_sharing_group: [cs] dec_sharing_group: [cs] node_gpu: "0:1" path_src: path_to_europarl/cs-en/train.cs-en.cs.sp path_tgt: path_to_europarl/cs-en/train.cs-en.cs.sp path_valid_src: path_to_europarl/cs-en/valid.cs-en.cs.sp path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.cs.sp transforms: [filtertoolong, denoising] train_en-cs: src_tgt: en-cs enc_sharing_group: [en] dec_sharing_group: [cs] node_gpu: "0:1" path_src: path_to_europarl/cs-en/train.cs-en.en.sp path_tgt: path_to_europarl/cs-en/train.cs-en.cs.sp path_valid_src: path_to_europarl/cs-en/valid.cs-en.en.sp path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.cs.sp transforms: [filtertoolong] # GPU 0:2 train_da-en: src_tgt: da-en enc_sharing_group: [da] dec_sharing_group: [en] node_gpu: "0:2" path_src: path_to_europarl/da-en/train.da-en.da.sp path_tgt: path_to_europarl/da-en/train.da-en.en.sp path_valid_src: path_to_europarl/da-en/valid.da-en.da.sp path_valid_tgt: path_to_europarl/da-en/valid.da-en.en.sp transforms: [filtertoolong] train_da-da: src_tgt: da-da enc_sharing_group: [da] dec_sharing_group: [da] node_gpu: "0:2" path_src: path_to_europarl/da-en/train.da-en.da.sp path_tgt: path_to_europarl/da-en/train.da-en.da.sp path_valid_src: path_to_europarl/da-en/valid.da-en.da.sp path_valid_tgt: path_to_europarl/da-en/valid.da-en.da.sp transforms: [filtertoolong, denoising] train_en-da: src_tgt: en-da enc_sharing_group: [en] dec_sharing_group: [da] node_gpu: "0:2" path_src: path_to_europarl/da-en/train.da-en.en.sp path_tgt: path_to_europarl/da-en/train.da-en.da.sp path_valid_src: path_to_europarl/da-en/valid.da-en.en.sp path_valid_tgt: path_to_europarl/da-en/valid.da-en.da.sp transforms: [filtertoolong] # GPU 0:3 train_de-en: src_tgt: de-en enc_sharing_group: [de] dec_sharing_group: [en] node_gpu: "0:3" path_src: path_to_europarl/de-en/train.de-en.de.sp path_tgt: path_to_europarl/de-en/train.de-en.en.sp path_valid_src: path_to_europarl/de-en/valid.de-en.de.sp path_valid_tgt: path_to_europarl/de-en/valid.de-en.en.sp transforms: [filtertoolong] train_de-de: src_tgt: de-de enc_sharing_group: [de] dec_sharing_group: [de] node_gpu: "0:3" path_src: path_to_europarl/de-en/train.de-en.de.sp path_tgt: path_to_europarl/de-en/train.de-en.de.sp path_valid_src: path_to_europarl/de-en/valid.de-en.de.sp path_valid_tgt: path_to_europarl/de-en/valid.de-en.de.sp transforms: [filtertoolong, denoising] train_en-de: src_tgt: en-de enc_sharing_group: [en] dec_sharing_group: [de] node_gpu: "0:3" path_src: path_to_europarl/de-en/train.de-en.en.sp path_tgt: path_to_europarl/de-en/train.de-en.de.sp path_valid_src: path_to_europarl/de-en/valid.de-en.en.sp path_valid_tgt: path_to_europarl/de-en/valid.de-en.de.sp transforms: [filtertoolong] # GPU 1:0 train_el-en: src_tgt: el-en enc_sharing_group: [el] dec_sharing_group: [en] node_gpu: "1:0" path_src: path_to_europarl/el-en/train.el-en.el.sp path_tgt: path_to_europarl/el-en/train.el-en.en.sp path_valid_src: path_to_europarl/el-en/valid.el-en.el.sp path_valid_tgt: path_to_europarl/el-en/valid.el-en.en.sp transforms: [filtertoolong] train_el-el: src_tgt: el-el enc_sharing_group: [el] dec_sharing_group: [el] node_gpu: "1:0" path_src: path_to_europarl/el-en/train.el-en.el.sp path_tgt: path_to_europarl/el-en/train.el-en.el.sp path_valid_src: path_to_europarl/el-en/valid.el-en.el.sp path_valid_tgt: path_to_europarl/el-en/valid.el-en.el.sp transforms: [filtertoolong, denoising] train_en-el: src_tgt: en-el enc_sharing_group: [en] dec_sharing_group: [el] node_gpu: "1:0" path_src: path_to_europarl/el-en/train.el-en.en.sp path_tgt: path_to_europarl/el-en/train.el-en.el.sp path_valid_src: path_to_europarl/el-en/valid.el-en.en.sp path_valid_tgt: path_to_europarl/el-en/valid.el-en.el.sp transforms: [filtertoolong] # GPU 1:1 train_es-en: src_tgt: es-en enc_sharing_group: [es] dec_sharing_group: [en] node_gpu: "1:1" path_src: path_to_europarl/es-en/train.es-en.es.sp path_tgt: path_to_europarl/es-en/train.es-en.en.sp path_valid_src: path_to_europarl/es-en/valid.es-en.es.sp path_valid_tgt: path_to_europarl/es-en/valid.es-en.en.sp transforms: [filtertoolong] train_es-es: src_tgt: es-es enc_sharing_group: [es] dec_sharing_group: [es] node_gpu: "1:1" path_src: path_to_europarl/es-en/train.es-en.es.sp path_tgt: path_to_europarl/es-en/train.es-en.es.sp path_valid_src: path_to_europarl/es-en/valid.es-en.es.sp path_valid_tgt: path_to_europarl/es-en/valid.es-en.es.sp transforms: [filtertoolong, denoising] train_en-es: src_tgt: en-es enc_sharing_group: [en] dec_sharing_group: [es] node_gpu: "1:1" path_src: path_to_europarl/es-en/train.es-en.en.sp path_tgt: path_to_europarl/es-en/train.es-en.es.sp path_valid_src: path_to_europarl/es-en/valid.es-en.en.sp path_valid_tgt: path_to_europarl/es-en/valid.es-en.es.sp transforms: [filtertoolong] # GPU 1:2 train_et-en: src_tgt: et-en enc_sharing_group: [et] dec_sharing_group: [en] node_gpu: "1:2" path_src: path_to_europarl/et-en/train.et-en.et.sp path_tgt: path_to_europarl/et-en/train.et-en.en.sp path_valid_src: path_to_europarl/et-en/valid.et-en.et.sp path_valid_tgt: path_to_europarl/et-en/valid.et-en.en.sp transforms: [filtertoolong] train_et-et: src_tgt: et-et enc_sharing_group: [et] dec_sharing_group: [et] node_gpu: "1:2" path_src: path_to_europarl/et-en/train.et-en.et.sp path_tgt: path_to_europarl/et-en/train.et-en.et.sp path_valid_src: path_to_europarl/et-en/valid.et-en.et.sp path_valid_tgt: path_to_europarl/et-en/valid.et-en.et.sp transforms: [filtertoolong, denoising] train_en-et: src_tgt: en-et enc_sharing_group: [en] dec_sharing_group: [et] node_gpu: "1:2" path_src: path_to_europarl/et-en/train.et-en.en.sp path_tgt: path_to_europarl/et-en/train.et-en.et.sp path_valid_src: path_to_europarl/et-en/valid.et-en.en.sp path_valid_tgt: path_to_europarl/et-en/valid.et-en.et.sp transforms: [filtertoolong] # GPU 1:3 train_fi-en: src_tgt: fi-en enc_sharing_group: [fi] dec_sharing_group: [en] node_gpu: "1:3" path_src: path_to_europarl/fi-en/train.fi-en.fi.sp path_tgt: path_to_europarl/fi-en/train.fi-en.en.sp path_valid_src: path_to_europarl/fi-en/valid.fi-en.fi.sp path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.en.sp transforms: [filtertoolong] train_fi-fi: src_tgt: fi-fi enc_sharing_group: [fi] dec_sharing_group: [fi] node_gpu: "1:3" path_src: path_to_europarl/fi-en/train.fi-en.fi.sp path_tgt: path_to_europarl/fi-en/train.fi-en.fi.sp path_valid_src: path_to_europarl/fi-en/valid.fi-en.fi.sp path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.fi.sp transforms: [filtertoolong, denoising] train_en-fi: src_tgt: en-fi enc_sharing_group: [en] dec_sharing_group: [fi] node_gpu: "1:3" path_src: path_to_europarl/fi-en/train.fi-en.en.sp path_tgt: path_to_europarl/fi-en/train.fi-en.fi.sp path_valid_src: path_to_europarl/fi-en/valid.fi-en.en.sp path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.fi.sp transforms: [filtertoolong] # GPU 2:0 train_fr-en: src_tgt: fr-en enc_sharing_group: [fr] dec_sharing_group: [en] node_gpu: "2:0" path_src: path_to_europarl/fr-en/train.fr-en.fr.sp path_tgt: path_to_europarl/fr-en/train.fr-en.en.sp path_valid_src: path_to_europarl/fr-en/valid.fr-en.fr.sp path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.en.sp transforms: [filtertoolong] train_fr-fr: src_tgt: fr-fr enc_sharing_group: [fr] dec_sharing_group: [fr] node_gpu: "2:0" path_src: path_to_europarl/fr-en/train.fr-en.fr.sp path_tgt: path_to_europarl/fr-en/train.fr-en.fr.sp path_valid_src: path_to_europarl/fr-en/valid.fr-en.fr.sp path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.fr.sp transforms: [filtertoolong, denoising] train_en-fr: src_tgt: en-fr enc_sharing_group: [en] dec_sharing_group: [fr] node_gpu: "2:0" path_src: path_to_europarl/fr-en/train.fr-en.en.sp path_tgt: path_to_europarl/fr-en/train.fr-en.fr.sp path_valid_src: path_to_europarl/fr-en/valid.fr-en.en.sp path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.fr.sp transforms: [filtertoolong] # GPU 2:1 train_hu-en: src_tgt: hu-en enc_sharing_group: [hu] dec_sharing_group: [en] node_gpu: "2:1" path_src: path_to_europarl/hu-en/train.hu-en.hu.sp path_tgt: path_to_europarl/hu-en/train.hu-en.en.sp path_valid_src: path_to_europarl/hu-en/valid.hu-en.hu.sp path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.en.sp transforms: [filtertoolong] train_hu-hu: src_tgt: hu-hu enc_sharing_group: [hu] dec_sharing_group: [hu] node_gpu: "2:1" path_src: path_to_europarl/hu-en/train.hu-en.hu.sp path_tgt: path_to_europarl/hu-en/train.hu-en.hu.sp path_valid_src: path_to_europarl/hu-en/valid.hu-en.hu.sp path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.hu.sp transforms: [filtertoolong, denoising] train_en-hu: src_tgt: en-hu enc_sharing_group: [en] dec_sharing_group: [hu] node_gpu: "2:1" path_src: path_to_europarl/hu-en/train.hu-en.en.sp path_tgt: path_to_europarl/hu-en/train.hu-en.hu.sp path_valid_src: path_to_europarl/hu-en/valid.hu-en.en.sp path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.hu.sp transforms: [filtertoolong] # GPU 2:2 train_it-en: src_tgt: it-en enc_sharing_group: [it] dec_sharing_group: [en] node_gpu: "2:2" path_src: path_to_europarl/it-en/train.it-en.it.sp path_tgt: path_to_europarl/it-en/train.it-en.en.sp path_valid_src: path_to_europarl/it-en/valid.it-en.it.sp path_valid_tgt: path_to_europarl/it-en/valid.it-en.en.sp transforms: [filtertoolong] train_it-it: src_tgt: it-it enc_sharing_group: [it] dec_sharing_group: [it] node_gpu: "2:2" path_src: path_to_europarl/it-en/train.it-en.it.sp path_tgt: path_to_europarl/it-en/train.it-en.it.sp path_valid_src: path_to_europarl/it-en/valid.it-en.it.sp path_valid_tgt: path_to_europarl/it-en/valid.it-en.it.sp transforms: [filtertoolong, denoising] train_en-it: src_tgt: en-it enc_sharing_group: [en] dec_sharing_group: [it] node_gpu: "2:2" path_src: path_to_europarl/it-en/train.it-en.en.sp path_tgt: path_to_europarl/it-en/train.it-en.it.sp path_valid_src: path_to_europarl/it-en/valid.it-en.en.sp path_valid_tgt: path_to_europarl/it-en/valid.it-en.it.sp transforms: [filtertoolong] # GPU 2:3 train_lt-en: src_tgt: lt-en enc_sharing_group: [lt] dec_sharing_group: [en] node_gpu: "2:3" path_src: path_to_europarl/lt-en/train.lt-en.lt.sp path_tgt: path_to_europarl/lt-en/train.lt-en.en.sp path_valid_src: path_to_europarl/lt-en/valid.lt-en.lt.sp path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.en.sp transforms: [filtertoolong] train_lt-lt: src_tgt: lt-lt enc_sharing_group: [lt] dec_sharing_group: [lt] node_gpu: "2:3" path_src: path_to_europarl/lt-en/train.lt-en.lt.sp path_tgt: path_to_europarl/lt-en/train.lt-en.lt.sp path_valid_src: path_to_europarl/lt-en/valid.lt-en.lt.sp path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.lt.sp transforms: [filtertoolong, denoising] train_en-lt: src_tgt: en-lt enc_sharing_group: [en] dec_sharing_group: [lt] node_gpu: "2:3" path_src: path_to_europarl/lt-en/train.lt-en.en.sp path_tgt: path_to_europarl/lt-en/train.lt-en.lt.sp path_valid_src: path_to_europarl/lt-en/valid.lt-en.en.sp path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.lt.sp transforms: [filtertoolong] # GPU 3:0 train_lv-en: src_tgt: lv-en enc_sharing_group: [lv] dec_sharing_group: [en] node_gpu: "3:0" path_src: path_to_europarl/lv-en/train.lv-en.lv.sp path_tgt: path_to_europarl/lv-en/train.lv-en.en.sp path_valid_src: path_to_europarl/lv-en/valid.lv-en.lv.sp path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.en.sp transforms: [filtertoolong] train_lv-lv: src_tgt: lv-lv enc_sharing_group: [lv] dec_sharing_group: [lv] node_gpu: "3:0" path_src: path_to_europarl/lv-en/train.lv-en.lv.sp path_tgt: path_to_europarl/lv-en/train.lv-en.lv.sp path_valid_src: path_to_europarl/lv-en/valid.lv-en.lv.sp path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.lv.sp transforms: [filtertoolong, denoising] train_en-lv: src_tgt: en-lv enc_sharing_group: [en] dec_sharing_group: [lv] node_gpu: "3:0" path_src: path_to_europarl/lv-en/train.lv-en.en.sp path_tgt: path_to_europarl/lv-en/train.lv-en.lv.sp path_valid_src: path_to_europarl/lv-en/valid.lv-en.en.sp path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.lv.sp transforms: [filtertoolong] # GPU 3:1 train_nl-en: src_tgt: nl-en enc_sharing_group: [nl] dec_sharing_group: [en] node_gpu: "3:1" path_src: path_to_europarl/nl-en/train.nl-en.nl.sp path_tgt: path_to_europarl/nl-en/train.nl-en.en.sp path_valid_src: path_to_europarl/nl-en/valid.nl-en.nl.sp path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.en.sp transforms: [filtertoolong] train_nl-nl: src_tgt: nl-nl enc_sharing_group: [nl] dec_sharing_group: [nl] node_gpu: "3:1" path_src: path_to_europarl/nl-en/train.nl-en.nl.sp path_tgt: path_to_europarl/nl-en/train.nl-en.nl.sp path_valid_src: path_to_europarl/nl-en/valid.nl-en.nl.sp path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.nl.sp transforms: [filtertoolong, denoising] train_en-nl: src_tgt: en-nl enc_sharing_group: [en] dec_sharing_group: [nl] node_gpu: "3:1" path_src: path_to_europarl/nl-en/train.nl-en.en.sp path_tgt: path_to_europarl/nl-en/train.nl-en.nl.sp path_valid_src: path_to_europarl/nl-en/valid.nl-en.en.sp path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.nl.sp transforms: [filtertoolong] # GPU 3:2 train_pl-en: src_tgt: pl-en enc_sharing_group: [pl] dec_sharing_group: [en] node_gpu: "3:2" path_src: path_to_europarl/pl-en/train.pl-en.pl.sp path_tgt: path_to_europarl/pl-en/train.pl-en.en.sp path_valid_src: path_to_europarl/pl-en/valid.pl-en.pl.sp path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.en.sp transforms: [filtertoolong] train_pl-pl: src_tgt: pl-pl enc_sharing_group: [pl] dec_sharing_group: [pl] node_gpu: "3:2" path_src: path_to_europarl/pl-en/train.pl-en.pl.sp path_tgt: path_to_europarl/pl-en/train.pl-en.pl.sp path_valid_src: path_to_europarl/pl-en/valid.pl-en.pl.sp path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.pl.sp transforms: [filtertoolong, denoising] train_en-pl: src_tgt: en-pl enc_sharing_group: [en] dec_sharing_group: [pl] node_gpu: "3:2" path_src: path_to_europarl/pl-en/train.pl-en.en.sp path_tgt: path_to_europarl/pl-en/train.pl-en.pl.sp path_valid_src: path_to_europarl/pl-en/valid.pl-en.en.sp path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.pl.sp transforms: [filtertoolong] # GPU 3:3 train_pt-en: src_tgt: pt-en enc_sharing_group: [pt] dec_sharing_group: [en] node_gpu: "3:3" path_src: path_to_europarl/pt-en/train.pt-en.pt.sp path_tgt: path_to_europarl/pt-en/train.pt-en.en.sp path_valid_src: path_to_europarl/pt-en/valid.pt-en.pt.sp path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.en.sp transforms: [filtertoolong] train_pt-pt: src_tgt: pt-pt enc_sharing_group: [pt] dec_sharing_group: [pt] node_gpu: "3:3" path_src: path_to_europarl/pt-en/train.pt-en.pt.sp path_tgt: path_to_europarl/pt-en/train.pt-en.pt.sp path_valid_src: path_to_europarl/pt-en/valid.pt-en.pt.sp path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.pt.sp transforms: [filtertoolong, denoising] train_en-pt: src_tgt: en-pt enc_sharing_group: [en] dec_sharing_group: [pt] node_gpu: "3:3" path_src: path_to_europarl/pt-en/train.pt-en.en.sp path_tgt: path_to_europarl/pt-en/train.pt-en.pt.sp path_valid_src: path_to_europarl/pt-en/valid.pt-en.en.sp path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.pt.sp transforms: [filtertoolong] # GPU 4:0 train_ro-en: src_tgt: ro-en enc_sharing_group: [ro] dec_sharing_group: [en] node_gpu: "4:0" path_src: path_to_europarl/ro-en/train.ro-en.ro.sp path_tgt: path_to_europarl/ro-en/train.ro-en.en.sp path_valid_src: path_to_europarl/ro-en/valid.ro-en.ro.sp path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.en.sp transforms: [filtertoolong] train_ro-ro: src_tgt: ro-ro enc_sharing_group: [ro] dec_sharing_group: [ro] node_gpu: "4:0" path_src: path_to_europarl/ro-en/train.ro-en.ro.sp path_tgt: path_to_europarl/ro-en/train.ro-en.ro.sp path_valid_src: path_to_europarl/ro-en/valid.ro-en.ro.sp path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.ro.sp transforms: [filtertoolong, denoising] train_en-ro: src_tgt: en-ro enc_sharing_group: [en] dec_sharing_group: [ro] node_gpu: "4:0" path_src: path_to_europarl/ro-en/train.ro-en.en.sp path_tgt: path_to_europarl/ro-en/train.ro-en.ro.sp path_valid_src: path_to_europarl/ro-en/valid.ro-en.en.sp path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.ro.sp transforms: [filtertoolong] # GPU 4:1 train_sk-en: src_tgt: sk-en enc_sharing_group: [sk] dec_sharing_group: [en] node_gpu: "4:1" path_src: path_to_europarl/sk-en/train.sk-en.sk.sp path_tgt: path_to_europarl/sk-en/train.sk-en.en.sp path_valid_src: path_to_europarl/sk-en/valid.sk-en.sk.sp path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.en.sp transforms: [filtertoolong] train_sk-sk: src_tgt: sk-sk enc_sharing_group: [sk] dec_sharing_group: [sk] node_gpu: "4:1" path_src: path_to_europarl/sk-en/train.sk-en.sk.sp path_tgt: path_to_europarl/sk-en/train.sk-en.sk.sp path_valid_src: path_to_europarl/sk-en/valid.sk-en.sk.sp path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.sk.sp transforms: [filtertoolong, denoising] train_en-sk: src_tgt: en-sk enc_sharing_group: [en] dec_sharing_group: [sk] node_gpu: "4:1" path_src: path_to_europarl/sk-en/train.sk-en.en.sp path_tgt: path_to_europarl/sk-en/train.sk-en.sk.sp path_valid_src: path_to_europarl/sk-en/valid.sk-en.en.sp path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.sk.sp transforms: [filtertoolong] # GPU 4:2 train_sl-en: src_tgt: sl-en enc_sharing_group: [sl] dec_sharing_group: [en] node_gpu: "4:2" path_src: path_to_europarl/sl-en/train.sl-en.sl.sp path_tgt: path_to_europarl/sl-en/train.sl-en.en.sp path_valid_src: path_to_europarl/sl-en/valid.sl-en.sl.sp path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.en.sp transforms: [filtertoolong] train_sl-sl: src_tgt: sl-sl enc_sharing_group: [sl] dec_sharing_group: [sl] node_gpu: "4:2" path_src: path_to_europarl/sl-en/train.sl-en.sl.sp path_tgt: path_to_europarl/sl-en/train.sl-en.sl.sp path_valid_src: path_to_europarl/sl-en/valid.sl-en.sl.sp path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.sl.sp transforms: [filtertoolong, denoising] train_en-sl: src_tgt: en-sl enc_sharing_group: [en] dec_sharing_group: [sl] node_gpu: "4:2" path_src: path_to_europarl/sl-en/train.sl-en.en.sp path_tgt: path_to_europarl/sl-en/train.sl-en.sl.sp path_valid_src: path_to_europarl/sl-en/valid.sl-en.en.sp path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.sl.sp transforms: [filtertoolong] # GPU 4:3 train_sv-en: src_tgt: sv-en enc_sharing_group: [sv] dec_sharing_group: [en] node_gpu: "4:3" path_src: path_to_europarl/sv-en/train.sv-en.sv.sp path_tgt: path_to_europarl/sv-en/train.sv-en.en.sp path_valid_src: path_to_europarl/sv-en/valid.sv-en.sv.sp path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.en.sp transforms: [filtertoolong] train_sv-sv: src_tgt: sv-sv enc_sharing_group: [sv] dec_sharing_group: [sv] node_gpu: "4:3" path_src: path_to_europarl/sv-en/train.sv-en.sv.sp path_tgt: path_to_europarl/sv-en/train.sv-en.sv.sp path_valid_src: path_to_europarl/sv-en/valid.sv-en.sv.sp path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.sv.sp transforms: [filtertoolong, denoising] train_en-sv: src_tgt: en-sv enc_sharing_group: [en] dec_sharing_group: [sv] node_gpu: "4:3" path_src: path_to_europarl/sv-en/train.sv-en.en.sp path_tgt: path_to_europarl/sv-en/train.sv-en.sv.sp path_valid_src: path_to_europarl/sv-en/valid.sv-en.en.sp path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.sv.sp transforms: [filtertoolong] ### Transform related opts: #### Filter src_seq_length: 200 tgt_seq_length: 200 #### Bart src_subword_type: sentencepiece tgt_subword_type: sentencepiece mask_ratio: 0.2 replace_length: 1 # silently ignore empty lines in the data skip_empty_level: silent batch_size: 4096 batch_type: tokens normalization: tokens valid_batch_size: 4096 max_generator_batches: 2 src_vocab_size: 100000 tgt_vocab_size: 100000 encoder_type: transformer decoder_type: transformer model_dim: 512 transformer_ff: 2048 heads: 8 enc_layers: [6] dec_layers: [6] dropout: 0.1 label_smoothing: 0.1 param_init: 0.0 param_init_glorot: true position_encoding: true valid_steps: 10000 warmup_steps: 10000 report_every: 100 save_checkpoint_steps: 50000 keep_checkpoint: -1 accum_count: 1 optim: adafactor decay_method: none learning_rate: 3.0 max_grad_norm: 0.0 seed: 3435 model_type: text save_all_gpus: false n_nodes: 5 world_size: 20 gpu_ranks: [0, 1, 2, 3] early_stopping: 5 early_stopping_criteria: accuracy ```
### Data Configuration: - Vocabularies for the source and target languages is need to be specified. In the example, we used a shared vocabulary. - Specifies options related to data transformation, including filtering and BART-specific denoising parameters. ### Task Configuration: - Translation tasks are defined in this section, such as `bg-en` for Bulgarian to English translation. - Each task includes details such as source and target file paths, sharing groups, GPU assignments, and data transforms. - For GPU assignments, the task defines the ranks of nodes and GPUs. For example, `4:0` indicates the first GPU on the fifth node. ### Training Configuration: - Batch size, normalization, and other training parameters are set. - Model parameters such as dimensions, transformer layers, dropout, label smoothing, and more are specified. - The training uses the Adafactor optimizer with a learning rate of 3.0 and no gradient clipping. - Early stopping is enabled with a criterion of accuracy and a patience of 5 steps. - The training process is distributed across 4 GPUs (`world_size: 4`, `gpu_ranks: [0, 1, 2, 3]`) on a single node (`node_rank: 0`) for single node job. For the 5-node job, job is distributed across 20 GPUs. ## Step 4: Train your MAMMOTH model Finally, we can start the training process now. Here we provide an example script that sets several environment variables, creates necessary directories, and then runs a training job for a MAMMOTH machine translation model. ```bash export PYTHONUSERBASE=/path_to_your_env/mammoth/ # pointer to codebase export MAMMOTH=/path_to_codebase/mammoth # pointer to config file export CONFIG_DIR=path_to_europarl/config # pointer to slurm multinode wrapper. export SCRIPT_DIR=path_to_europarl/scripts/ # info for model and log saving export SAVE_DIR=your_path/models/europarl export LOG_DIR=${SAVE_DIR}/logs export EXP_ID=example-1-node mkdir -p ${SAVE_DIR}/{logs,models} srun ${SCRIPT_DIR}/wrapper.sh -u ${MAMMOTH}/train.py \ -config ${CONFIG_DIR}/europarl-1node-4gpu.yml \ -save_model ${SAVE_DIR}/models/${EXP_ID} \ -master_port 9973 \ -tensorboard -tensorboard_log_dir ${LOG_DIR}/${EXP_ID} ``` ### Environment Variable Setup: - `PYTHONUSERBASE`: Specifies the base directory for Python user-specific packages. You can also specify the python environment in your favorite way and check the installation guide for more information. - `MAMMOTH`: Points to the codebase directory for a project named "mammoth." - `CONFIG_DIR`: Points to a directory containing configuration files. - `SCRIPT_DIR`: Points to a directory containing Slurm multinode wrapper scripts. - `SAVE_DIR`: Specifies the base directory for saving model-related files. - `LOG_DIR`: Specifies the directory for saving logs related to the model training. - `EXP_ID`: Represents an experiment identifier, set to "example-1-node." ### Directory Creation: - Creates the "logs" and "models" directories inside `SAVE_DIR` if they do not already exist. You will find the logs and saved models there. ### Training Job Submission: - We utilize Slurm for resource allocation. `srun`: Initiates a Slurm job. - `${SCRIPT_DIR}/wrapper.sh`: Calls a wrapper script for managing Slurm settings, monitoring GPU usage, and etc. An example of wrapper script can be: ```bash export CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-smi dmon -s mu -d 5 -o TD > "${LOG_DIR}/gpu_load-${EXP_ID}-${PPID}.log" & echo python -u "$@" --node_rank $SLURM_NODEID python -u "$@" --node_rank $SLURM_NODEID ``` - `-u ${MAMMOTH}/train.py`: Specifies the Python script for training, located in the "mammoth" codebase. - `-config ${CONFIG_DIR}/europarl-1node-4gpu.yml`: Specifies the configuration file for the training job. - `-save_model ${SAVE_DIR}/models/${EXP_ID}`: Specifies the directory to save the trained model. - `-master_port 9973`: Specifies the master port for communication. - `-tensorboard -tensorboard_log_dir ${LOG_DIR}/${EXP_ID}`: Enables TensorBoard logging and specifies the directory for TensorBoard logs. Hooray! Take a moment to celebrate the progress you've made. Wait for hours and the model training should be completed soon.