Training MAMMOTH 101

This example uses the Europarl parallel corpus, a multilingual resource extracted from European Parliament proceedings that contains text in 21 European languages. If you use the data in your research, please cite the paper by Philipp Koehn, “Europarl: A Parallel Corpus for Statistical Machine Translation,” presented at MT Summit 2005. Tokenization is done with SentencePiece.

Step 0: Download the data and SentencePiece model

Download Release v7 of the Europarl corpus (a further expanded and improved version, released on 15 May 2012) from the original website, or download the data we have already processed:

wget https://mammoth101.a3s.fi/europarl.tar.gz

We use a SentencePiece model with a 64k vocabulary, trained on OPUS Tatoeba Challenge data. Download the SentencePiece model and the vocabulary:

# Download the SentencePiece model
wget https://mammoth101.a3s.fi/opusTC.mul.64k.spm
# Download the vocabulary
wget https://mammoth101.a3s.fi/opusTC.mul.vocab.onmt

Step 1: Prepare the data

Next, read the parallel text data, process it, and generate output files for the training and validation sets. Here’s a high-level summary of the main processing steps. For each language in ‘langs’:

  • read parallel data files.

  • clean the data by removing empty lines.

  • shuffle the data randomly.

  • tokenize the text using SentencePiece and write the tokenized data to separate output files for the training and validation sets.
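
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing script: `prepare_pair`, `n_valid`, and `seed` are hypothetical names, and holding out the first `n_valid` shuffled pairs for validation is an assumption about how the split is made. The tokenizer is passed in as a plain callable so the sketch runs without a model file; with the `sentencepiece` package it could be built from `SentencePieceProcessor(model_file=...)` and `lambda line: sp.encode(line, out_type=str)`.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def prepare_pair(
    src_lines: Sequence[str],
    tgt_lines: Sequence[str],
    encode: Callable[[str], List[str]],
    n_valid: int = 1000,  # hypothetical validation size
    seed: int = 0,        # hypothetical seed for a reproducible shuffle
) -> Tuple[List[Pair], List[Pair]]:
    """Clean, shuffle, tokenize and split one language pair (illustrative only)."""
    # Clean: keep only sentence pairs where both sides are non-empty.
    pairs = [
        (s.strip(), t.strip())
        for s, t in zip(src_lines, tgt_lines)
        if s.strip() and t.strip()
    ]
    # Shuffle source and target together so the alignment is preserved.
    random.Random(seed).shuffle(pairs)
    # Tokenize each side and join the pieces with spaces, one sentence per line.
    tokenized = [(" ".join(encode(s)), " ".join(encode(t))) for s, t in pairs]
    # Split: the first n_valid shuffled pairs become the validation set.
    return tokenized[n_valid:], tokenized[:n_valid]


def write_pairs(pairs: Sequence[Pair], src_path: str, tgt_path: str) -> None:
    """Write aligned source and target sentences to two parallel files."""
    with open(src_path, "w", encoding="utf-8") as f_src, \
         open(tgt_path, "w", encoding="utf-8") as f_tgt:
        for s, t in pairs:
            f_src.write(s + "\n")
            f_tgt.write(t + "\n")
```

For a quick dry run without SentencePiece, `str.split` can stand in for the tokenizer, e.g. `prepare_pair(src, tgt, str.split, n_valid=1)`.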

We use a positional argument ‘lang’, which accepts one or more values, to specify the languages to process (e.g., bg and cs, as used in Europarl).
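
A positional argument that takes one or more values can be declared with `argparse` using `nargs="+"`. The snippet below is a sketch of that idea only; the description and help strings are made up for illustration, and the script name in the comment is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Prepare Europarl data for the given languages."
    )
    # nargs="+" makes the positional accept one or more language codes.
    parser.add_argument("lang", nargs="+", help="language codes to process, e.g. bg cs")
    return parser


# Equivalent to running: python prepare_data.py bg cs  (hypothetical script name)
args = build_parser().parse_args(["bg", "cs"])
langs = args.lang  # a list of language codes
```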

You can skip this step if you downloaded the processed data directly. For details, see this page.
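
If you rely on the processed download, it may be worth confirming that the files referenced by the configurations in the next step are in place. A small sketch, assuming the archive unpacks to one `<lang>-en` directory per language with the `train.`/`valid.` file naming seen in the config paths; `expected_files` and `missing_files` are hypothetical helpers.

```python
from pathlib import Path
from typing import List


def expected_files(root: Path, lang: str) -> List[Path]:
    """Tokenized train/valid files expected for one <lang>-en pair."""
    pair = f"{lang}-en"
    return [
        root / pair / f"{split}.{pair}.{side}.sp"
        for split in ("train", "valid")
        for side in (lang, "en")
    ]


def missing_files(root: Path, langs: List[str]) -> List[Path]:
    """Return every expected file that is not present under root."""
    return [p for lang in langs for p in expected_files(root, lang) if not p.is_file()]
```

An empty list from `missing_files(Path("path_to_europarl"), ["bg", "cs"])` would indicate that the files for those two pairs are present.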

Step 2: Configuration

The configuration defines the model, the sharing scheme, and the training arguments. You can write the config by hand in a YAML file or use our automatic config generation tool. Here, we provide two example configurations for training a dummy transformer model: one for a single-node setting and one for a multi-node setting.
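
Because the `tasks` section of these configs is highly regular, it can also be produced with a few lines of code. The sketch below is not the project's config generation tool; `make_tasks` and its arguments are hypothetical. It only mirrors the pattern used in the single-node example: for each non-English language, one task into English, one denoising task on the language itself, and one task out of English, with languages assigned to GPUs round-robin on node 0.

```python
from typing import Dict, List


def make_tasks(
    langs: List[str],
    n_gpus: int,
    data_root: str = "path_to_europarl",
) -> Dict[str, dict]:
    """Build a tasks mapping shaped like the example configs (illustrative only)."""
    tasks: Dict[str, dict] = {}
    for i, lang in enumerate(langs):
        pair = f"{lang}-en"
        node_gpu = f"0:{i % n_gpus}"  # single node; GPUs assigned round-robin

        def path(split: str, side: str) -> str:
            return f"{data_root}/{pair}/{split}.{pair}.{side}.sp"

        for src, tgt in ((lang, "en"), (lang, lang), ("en", lang)):
            transforms = ["filtertoolong"]
            if src == tgt:
                # Monolingual tasks additionally apply the denoising transform.
                transforms.append("denoising")
            tasks[f"train_{src}-{tgt}"] = {
                "src_tgt": f"{src}-{tgt}",
                "enc_sharing_group": [src],
                "dec_sharing_group": [tgt],
                "node_gpu": node_gpu,
                "path_src": path("train", src),
                "path_tgt": path("train", tgt),
                "path_valid_src": path("valid", src),
                "path_valid_tgt": path("valid", tgt),
                "transforms": transforms,
            }
    return tasks


tasks = make_tasks(["bg", "cs", "da", "de", "el"], n_gpus=4)
```

The resulting dictionary can be written out under the `tasks:` key with a YAML library of your choice.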

Single-node configuration
src_vocab:
  'bg': path_to_vocab/opusTC.mul.vocab.onmt
  'cs': path_to_vocab/opusTC.mul.vocab.onmt
  'da': path_to_vocab/opusTC.mul.vocab.onmt
  'de': path_to_vocab/opusTC.mul.vocab.onmt
  'el': path_to_vocab/opusTC.mul.vocab.onmt
  'en': path_to_vocab/opusTC.mul.vocab.onmt
  'es': path_to_vocab/opusTC.mul.vocab.onmt
  'et': path_to_vocab/opusTC.mul.vocab.onmt
  'fi': path_to_vocab/opusTC.mul.vocab.onmt
  'fr': path_to_vocab/opusTC.mul.vocab.onmt
  'hu': path_to_vocab/opusTC.mul.vocab.onmt
  'it': path_to_vocab/opusTC.mul.vocab.onmt
  'lt': path_to_vocab/opusTC.mul.vocab.onmt
  'lv': path_to_vocab/opusTC.mul.vocab.onmt
  'nl': path_to_vocab/opusTC.mul.vocab.onmt
  'pl': path_to_vocab/opusTC.mul.vocab.onmt
  'pt': path_to_vocab/opusTC.mul.vocab.onmt
  'ro': path_to_vocab/opusTC.mul.vocab.onmt
  'sk': path_to_vocab/opusTC.mul.vocab.onmt
  'sl': path_to_vocab/opusTC.mul.vocab.onmt
  'sv': path_to_vocab/opusTC.mul.vocab.onmt
tgt_vocab:
  'bg': path_to_vocab/opusTC.mul.vocab.onmt
  'cs': path_to_vocab/opusTC.mul.vocab.onmt
  'da': path_to_vocab/opusTC.mul.vocab.onmt
  'de': path_to_vocab/opusTC.mul.vocab.onmt
  'el': path_to_vocab/opusTC.mul.vocab.onmt
  'en': path_to_vocab/opusTC.mul.vocab.onmt
  'es': path_to_vocab/opusTC.mul.vocab.onmt
  'et': path_to_vocab/opusTC.mul.vocab.onmt
  'fi': path_to_vocab/opusTC.mul.vocab.onmt
  'fr': path_to_vocab/opusTC.mul.vocab.onmt
  'hu': path_to_vocab/opusTC.mul.vocab.onmt
  'it': path_to_vocab/opusTC.mul.vocab.onmt
  'lt': path_to_vocab/opusTC.mul.vocab.onmt
  'lv': path_to_vocab/opusTC.mul.vocab.onmt
  'nl': path_to_vocab/opusTC.mul.vocab.onmt
  'pl': path_to_vocab/opusTC.mul.vocab.onmt
  'pt': path_to_vocab/opusTC.mul.vocab.onmt
  'ro': path_to_vocab/opusTC.mul.vocab.onmt
  'sk': path_to_vocab/opusTC.mul.vocab.onmt
  'sl': path_to_vocab/opusTC.mul.vocab.onmt
  'sv': path_to_vocab/opusTC.mul.vocab.onmt

overwrite: False
tasks:
  # GPU 0:0
  train_bg-en:
    src_tgt: bg-en
    enc_sharing_group: [bg]
    dec_sharing_group: [en]
    node_gpu: 0:0
    path_src: path_to_europarl/bg-en/train.bg-en.bg.sp
    path_tgt: path_to_europarl/bg-en/train.bg-en.en.sp
    path_valid_src: path_to_europarl/bg-en/valid.bg-en.bg.sp
    path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.en.sp
    transforms: [filtertoolong]
  train_bg-bg:
    src_tgt: bg-bg
    enc_sharing_group: [bg]
    dec_sharing_group: [bg]
    node_gpu: 0:0
    path_src: path_to_europarl/bg-en/train.bg-en.bg.sp
    path_tgt: path_to_europarl/bg-en/train.bg-en.bg.sp
    path_valid_src: path_to_europarl/bg-en/valid.bg-en.bg.sp
    path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.bg.sp
    transforms: [filtertoolong, denoising]
  train_en-bg:
    src_tgt: en-bg
    enc_sharing_group: [en]
    dec_sharing_group: [bg]
    node_gpu: 0:0
    path_src: path_to_europarl/bg-en/train.bg-en.en.sp
    path_tgt: path_to_europarl/bg-en/train.bg-en.bg.sp
    path_valid_src: path_to_europarl/bg-en/valid.bg-en.en.sp
    path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.bg.sp
    transforms: [filtertoolong]
  # GPU 0:1
  train_cs-en:
    src_tgt: cs-en
    enc_sharing_group: [cs]
    dec_sharing_group: [en]
    node_gpu: 0:1
    path_src: path_to_europarl/cs-en/train.cs-en.cs.sp
    path_tgt: path_to_europarl/cs-en/train.cs-en.en.sp
    path_valid_src: path_to_europarl/cs-en/valid.cs-en.cs.sp
    path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.en.sp
    transforms: [filtertoolong]
  train_cs-cs:
    src_tgt: cs-cs
    enc_sharing_group: [cs]
    dec_sharing_group: [cs]
    node_gpu: 0:1
    path_src: path_to_europarl/cs-en/train.cs-en.cs.sp
    path_tgt: path_to_europarl/cs-en/train.cs-en.cs.sp
    path_valid_src: path_to_europarl/cs-en/valid.cs-en.cs.sp
    path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.cs.sp
    transforms: [filtertoolong, denoising]
  train_en-cs:
    src_tgt: en-cs
    enc_sharing_group: [en]
    dec_sharing_group: [cs]
    node_gpu: 0:1
    path_src: path_to_europarl/cs-en/train.cs-en.en.sp
    path_tgt: path_to_europarl/cs-en/train.cs-en.cs.sp
    path_valid_src: path_to_europarl/cs-en/valid.cs-en.en.sp
    path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.cs.sp
    transforms: [filtertoolong]
  # GPU 0:2
  train_da-en:
    src_tgt: da-en
    enc_sharing_group: [da]
    dec_sharing_group: [en]
    node_gpu: 0:2
    path_src: path_to_europarl/da-en/train.da-en.da.sp
    path_tgt: path_to_europarl/da-en/train.da-en.en.sp
    path_valid_src: path_to_europarl/da-en/valid.da-en.da.sp
    path_valid_tgt: path_to_europarl/da-en/valid.da-en.en.sp
    transforms: [filtertoolong]
  train_da-da:
    src_tgt: da-da
    enc_sharing_group: [da]
    dec_sharing_group: [da]
    node_gpu: 0:2
    path_src: path_to_europarl/da-en/train.da-en.da.sp
    path_tgt: path_to_europarl/da-en/train.da-en.da.sp
    path_valid_src: path_to_europarl/da-en/valid.da-en.da.sp
    path_valid_tgt: path_to_europarl/da-en/valid.da-en.da.sp
    transforms: [filtertoolong, denoising]
  train_en-da:
    src_tgt: en-da
    enc_sharing_group: [en]
    dec_sharing_group: [da]
    node_gpu: 0:2
    path_src: path_to_europarl/da-en/train.da-en.en.sp
    path_tgt: path_to_europarl/da-en/train.da-en.da.sp
    path_valid_src: path_to_europarl/da-en/valid.da-en.en.sp
    path_valid_tgt: path_to_europarl/da-en/valid.da-en.da.sp
    transforms: [filtertoolong]
  # GPU 0:3
  train_de-en:
    src_tgt: de-en
    enc_sharing_group: [de]
    dec_sharing_group: [en]
    node_gpu: 0:3
    path_src: path_to_europarl/de-en/train.de-en.de.sp
    path_tgt: path_to_europarl/de-en/train.de-en.en.sp
    path_valid_src: path_to_europarl/de-en/valid.de-en.de.sp
    path_valid_tgt: path_to_europarl/de-en/valid.de-en.en.sp
    transforms: [filtertoolong]
  train_de-de:
    src_tgt: de-de
    enc_sharing_group: [de]
    dec_sharing_group: [de]
    node_gpu: 0:3
    path_src: path_to_europarl/de-en/train.de-en.de.sp
    path_tgt: path_to_europarl/de-en/train.de-en.de.sp
    path_valid_src: path_to_europarl/de-en/valid.de-en.de.sp
    path_valid_tgt: path_to_europarl/de-en/valid.de-en.de.sp
    transforms: [filtertoolong, denoising]
  train_en-de:
    src_tgt: en-de
    enc_sharing_group: [en]
    dec_sharing_group: [de]
    node_gpu: 0:3
    path_src: path_to_europarl/de-en/train.de-en.en.sp
    path_tgt: path_to_europarl/de-en/train.de-en.de.sp
    path_valid_src: path_to_europarl/de-en/valid.de-en.en.sp
    path_valid_tgt: path_to_europarl/de-en/valid.de-en.de.sp
    transforms: [filtertoolong]
  # GPU 0:0
  train_el-en:
    src_tgt: el-en
    enc_sharing_group: [el]
    dec_sharing_group: [en]
    node_gpu: 0:0
    path_src: path_to_europarl/el-en/train.el-en.el.sp
    path_tgt: path_to_europarl/el-en/train.el-en.en.sp
    path_valid_src: path_to_europarl/el-en/valid.el-en.el.sp
    path_valid_tgt: path_to_europarl/el-en/valid.el-en.en.sp
    transforms: [filtertoolong]
  train_el-el:
    src_tgt: el-el
    enc_sharing_group: [el]
    dec_sharing_group: [el]
    node_gpu: 0:0
    path_src: path_to_europarl/el-en/train.el-en.el.sp
    path_tgt: path_to_europarl/el-en/train.el-en.el.sp
    path_valid_src: path_to_europarl/el-en/valid.el-en.el.sp
    path_valid_tgt: path_to_europarl/el-en/valid.el-en.el.sp
    transforms: [filtertoolong, denoising]
  train_en-el:
    src_tgt: en-el
    enc_sharing_group: [en]
    dec_sharing_group: [el]
    node_gpu: 0:0
    path_src: path_to_europarl/el-en/train.el-en.en.sp
    path_tgt: path_to_europarl/el-en/train.el-en.el.sp
    path_valid_src: path_to_europarl/el-en/valid.el-en.en.sp
    path_valid_tgt: path_to_europarl/el-en/valid.el-en.el.sp
    transforms: [filtertoolong]
  # GPU 0:1
  train_es-en:
    src_tgt: es-en
    enc_sharing_group: [es]
    dec_sharing_group: [en]
    node_gpu: 0:1
    path_src: path_to_europarl/es-en/train.es-en.es.sp
    path_tgt: path_to_europarl/es-en/train.es-en.en.sp
    path_valid_src: path_to_europarl/es-en/valid.es-en.es.sp
    path_valid_tgt: path_to_europarl/es-en/valid.es-en.en.sp
    transforms: [filtertoolong]
  train_es-es:
    src_tgt: es-es
    enc_sharing_group: [es]
    dec_sharing_group: [es]
    node_gpu: 0:1
    path_src: path_to_europarl/es-en/train.es-en.es.sp
    path_tgt: path_to_europarl/es-en/train.es-en.es.sp
    path_valid_src: path_to_europarl/es-en/valid.es-en.es.sp
    path_valid_tgt: path_to_europarl/es-en/valid.es-en.es.sp
    transforms: [filtertoolong, denoising]
  train_en-es:
    src_tgt: en-es
    enc_sharing_group: [en]
    dec_sharing_group: [es]
    node_gpu: 0:1
    path_src: path_to_europarl/es-en/train.es-en.en.sp
    path_tgt: path_to_europarl/es-en/train.es-en.es.sp
    path_valid_src: path_to_europarl/es-en/valid.es-en.en.sp
    path_valid_tgt: path_to_europarl/es-en/valid.es-en.es.sp
    transforms: [filtertoolong]
  # GPU 0:2
  train_et-en:
    src_tgt: et-en
    enc_sharing_group: [et]
    dec_sharing_group: [en]
    node_gpu: 0:2
    path_src: path_to_europarl/et-en/train.et-en.et.sp
    path_tgt: path_to_europarl/et-en/train.et-en.en.sp
    path_valid_src: path_to_europarl/et-en/valid.et-en.et.sp
    path_valid_tgt: path_to_europarl/et-en/valid.et-en.en.sp
    transforms: [filtertoolong]
  train_et-et:
    src_tgt: et-et
    enc_sharing_group: [et]
    dec_sharing_group: [et]
    node_gpu: 0:2
    path_src: path_to_europarl/et-en/train.et-en.et.sp
    path_tgt: path_to_europarl/et-en/train.et-en.et.sp
    path_valid_src: path_to_europarl/et-en/valid.et-en.et.sp
    path_valid_tgt: path_to_europarl/et-en/valid.et-en.et.sp
    transforms: [filtertoolong, denoising]
  train_en-et:
    src_tgt: en-et
    enc_sharing_group: [en]
    dec_sharing_group: [et]
    node_gpu: 0:2
    path_src: path_to_europarl/et-en/train.et-en.en.sp
    path_tgt: path_to_europarl/et-en/train.et-en.et.sp
    path_valid_src: path_to_europarl/et-en/valid.et-en.en.sp
    path_valid_tgt: path_to_europarl/et-en/valid.et-en.et.sp
    transforms: [filtertoolong]
  # GPU 0:3
  train_fi-en:
    src_tgt: fi-en
    enc_sharing_group: [fi]
    dec_sharing_group: [en]
    node_gpu: 0:3
    path_src: path_to_europarl/fi-en/train.fi-en.fi.sp
    path_tgt: path_to_europarl/fi-en/train.fi-en.en.sp
    path_valid_src: path_to_europarl/fi-en/valid.fi-en.fi.sp
    path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.en.sp
    transforms: [filtertoolong]
  train_fi-fi:
    src_tgt: fi-fi
    enc_sharing_group: [fi]
    dec_sharing_group: [fi]
    node_gpu: 0:3
    path_src: path_to_europarl/fi-en/train.fi-en.fi.sp
    path_tgt: path_to_europarl/fi-en/train.fi-en.fi.sp
    path_valid_src: path_to_europarl/fi-en/valid.fi-en.fi.sp
    path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.fi.sp
    transforms: [filtertoolong, denoising]
  train_en-fi:
    src_tgt: en-fi
    enc_sharing_group: [en]
    dec_sharing_group: [fi]
    node_gpu: 0:3
    path_src: path_to_europarl/fi-en/train.fi-en.en.sp
    path_tgt: path_to_europarl/fi-en/train.fi-en.fi.sp
    path_valid_src: path_to_europarl/fi-en/valid.fi-en.en.sp
    path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.fi.sp
    transforms: [filtertoolong]
  # GPU 0:0
  train_fr-en:
    src_tgt: fr-en
    enc_sharing_group: [fr]
    dec_sharing_group: [en]
    node_gpu: 0:0
    path_src: path_to_europarl/fr-en/train.fr-en.fr.sp
    path_tgt: path_to_europarl/fr-en/train.fr-en.en.sp
    path_valid_src: path_to_europarl/fr-en/valid.fr-en.fr.sp
    path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.en.sp
    transforms: [filtertoolong]
  train_fr-fr:
    src_tgt: fr-fr
    enc_sharing_group: [fr]
    dec_sharing_group: [fr]
    node_gpu: 0:0
    path_src: path_to_europarl/fr-en/train.fr-en.fr.sp
    path_tgt: path_to_europarl/fr-en/train.fr-en.fr.sp
    path_valid_src: path_to_europarl/fr-en/valid.fr-en.fr.sp
    path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.fr.sp
    transforms: [filtertoolong, denoising]
  train_en-fr:
    src_tgt: en-fr
    enc_sharing_group: [en]
    dec_sharing_group: [fr]
    node_gpu: 0:0
    path_src: path_to_europarl/fr-en/train.fr-en.en.sp
    path_tgt: path_to_europarl/fr-en/train.fr-en.fr.sp
    path_valid_src: path_to_europarl/fr-en/valid.fr-en.en.sp
    path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.fr.sp
    transforms: [filtertoolong]
  # GPU 0:1
  train_hu-en:
    src_tgt: hu-en
    enc_sharing_group: [hu]
    dec_sharing_group: [en]
    node_gpu: 0:1
    path_src: path_to_europarl/hu-en/train.hu-en.hu.sp
    path_tgt: path_to_europarl/hu-en/train.hu-en.en.sp
    path_valid_src: path_to_europarl/hu-en/valid.hu-en.hu.sp
    path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.en.sp
    transforms: [filtertoolong]
  train_hu-hu:
    src_tgt: hu-hu
    enc_sharing_group: [hu]
    dec_sharing_group: [hu]
    node_gpu: 0:1
    path_src: path_to_europarl/hu-en/train.hu-en.hu.sp
    path_tgt: path_to_europarl/hu-en/train.hu-en.hu.sp
    path_valid_src: path_to_europarl/hu-en/valid.hu-en.hu.sp
    path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.hu.sp
    transforms: [filtertoolong, denoising]
  train_en-hu:
    src_tgt: en-hu
    enc_sharing_group: [en]
    dec_sharing_group: [hu]
    node_gpu: 0:1
    path_src: path_to_europarl/hu-en/train.hu-en.en.sp
    path_tgt: path_to_europarl/hu-en/train.hu-en.hu.sp
    path_valid_src: path_to_europarl/hu-en/valid.hu-en.en.sp
    path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.hu.sp
    transforms: [filtertoolong]
  # GPU 0:2
  train_it-en:
    src_tgt: it-en
    enc_sharing_group: [it]
    dec_sharing_group: [en]
    node_gpu: 0:2
    path_src: path_to_europarl/it-en/train.it-en.it.sp
    path_tgt: path_to_europarl/it-en/train.it-en.en.sp
    path_valid_src: path_to_europarl/it-en/valid.it-en.it.sp
    path_valid_tgt: path_to_europarl/it-en/valid.it-en.en.sp
    transforms: [filtertoolong]
  train_it-it:
    src_tgt: it-it
    enc_sharing_group: [it]
    dec_sharing_group: [it]
    node_gpu: 0:2
    path_src: path_to_europarl/it-en/train.it-en.it.sp
    path_tgt: path_to_europarl/it-en/train.it-en.it.sp
    path_valid_src: path_to_europarl/it-en/valid.it-en.it.sp
    path_valid_tgt: path_to_europarl/it-en/valid.it-en.it.sp
    transforms: [filtertoolong, denoising]
  train_en-it:
    src_tgt: en-it
    enc_sharing_group: [en]
    dec_sharing_group: [it]
    node_gpu: 0:2
    path_src: path_to_europarl/it-en/train.it-en.en.sp
    path_tgt: path_to_europarl/it-en/train.it-en.it.sp
    path_valid_src: path_to_europarl/it-en/valid.it-en.en.sp
    path_valid_tgt: path_to_europarl/it-en/valid.it-en.it.sp
    transforms: [filtertoolong]
  # GPU 0:3
  train_lt-en:
    src_tgt: lt-en
    enc_sharing_group: [lt]
    dec_sharing_group: [en]
    node_gpu: 0:3
    path_src: path_to_europarl/lt-en/train.lt-en.lt.sp
    path_tgt: path_to_europarl/lt-en/train.lt-en.en.sp
    path_valid_src: path_to_europarl/lt-en/valid.lt-en.lt.sp
    path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.en.sp
    transforms: [filtertoolong]
  train_lt-lt:
    src_tgt: lt-lt
    enc_sharing_group: [lt]
    dec_sharing_group: [lt]
    node_gpu: 0:3
    path_src: path_to_europarl/lt-en/train.lt-en.lt.sp
    path_tgt: path_to_europarl/lt-en/train.lt-en.lt.sp
    path_valid_src: path_to_europarl/lt-en/valid.lt-en.lt.sp
    path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.lt.sp
    transforms: [filtertoolong, denoising]
  train_en-lt:
    src_tgt: en-lt
    enc_sharing_group: [en]
    dec_sharing_group: [lt]
    node_gpu: 0:3
    path_src: path_to_europarl/lt-en/train.lt-en.en.sp
    path_tgt: path_to_europarl/lt-en/train.lt-en.lt.sp
    path_valid_src: path_to_europarl/lt-en/valid.lt-en.en.sp
    path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.lt.sp
    transforms: [filtertoolong]
  # GPU 0:0
  train_lv-en:
    src_tgt: lv-en
    enc_sharing_group: [lv]
    dec_sharing_group: [en]
    node_gpu: 0:0
    path_src: path_to_europarl/lv-en/train.lv-en.lv.sp
    path_tgt: path_to_europarl/lv-en/train.lv-en.en.sp
    path_valid_src: path_to_europarl/lv-en/valid.lv-en.lv.sp
    path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.en.sp
    transforms: [filtertoolong]
  train_lv-lv:
    src_tgt: lv-lv
    enc_sharing_group: [lv]
    dec_sharing_group: [lv]
    node_gpu: 0:0
    path_src: path_to_europarl/lv-en/train.lv-en.lv.sp
    path_tgt: path_to_europarl/lv-en/train.lv-en.lv.sp
    path_valid_src: path_to_europarl/lv-en/valid.lv-en.lv.sp
    path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.lv.sp
    transforms: [filtertoolong, denoising]
  train_en-lv:
    src_tgt: en-lv
    enc_sharing_group: [en]
    dec_sharing_group: [lv]
    node_gpu: 0:0
    path_src: path_to_europarl/lv-en/train.lv-en.en.sp
    path_tgt: path_to_europarl/lv-en/train.lv-en.lv.sp
    path_valid_src: path_to_europarl/lv-en/valid.lv-en.en.sp
    path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.lv.sp
    transforms: [filtertoolong]
  # GPU 0:1
  train_nl-en:
    src_tgt: nl-en
    enc_sharing_group: [nl]
    dec_sharing_group: [en]
    node_gpu: 0:1
    path_src: path_to_europarl/nl-en/train.nl-en.nl.sp
    path_tgt: path_to_europarl/nl-en/train.nl-en.en.sp
    path_valid_src: path_to_europarl/nl-en/valid.nl-en.nl.sp
    path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.en.sp
    transforms: [filtertoolong]
  train_nl-nl:
    src_tgt: nl-nl
    enc_sharing_group: [nl]
    dec_sharing_group: [nl]
    node_gpu: 0:1
    path_src: path_to_europarl/nl-en/train.nl-en.nl.sp
    path_tgt: path_to_europarl/nl-en/train.nl-en.nl.sp
    path_valid_src: path_to_europarl/nl-en/valid.nl-en.nl.sp
    path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.nl.sp
    transforms: [filtertoolong, denoising]
  train_en-nl:
    src_tgt: en-nl
    enc_sharing_group: [en]
    dec_sharing_group: [nl]
    node_gpu: 0:1
    path_src: path_to_europarl/nl-en/train.nl-en.en.sp
    path_tgt: path_to_europarl/nl-en/train.nl-en.nl.sp
    path_valid_src: path_to_europarl/nl-en/valid.nl-en.en.sp
    path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.nl.sp
    transforms: [filtertoolong]
  # GPU 0:2
  train_pl-en:
    src_tgt: pl-en
    enc_sharing_group: [pl]
    dec_sharing_group: [en]
    node_gpu: 0:2
    path_src: path_to_europarl/pl-en/train.pl-en.pl.sp
    path_tgt: path_to_europarl/pl-en/train.pl-en.en.sp
    path_valid_src: path_to_europarl/pl-en/valid.pl-en.pl.sp
    path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.en.sp
    transforms: [filtertoolong]
  train_pl-pl:
    src_tgt: pl-pl
    enc_sharing_group: [pl]
    dec_sharing_group: [pl]
    node_gpu: 0:2
    path_src: path_to_europarl/pl-en/train.pl-en.pl.sp
    path_tgt: path_to_europarl/pl-en/train.pl-en.pl.sp
    path_valid_src: path_to_europarl/pl-en/valid.pl-en.pl.sp
    path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.pl.sp
    transforms: [filtertoolong, denoising]
  train_en-pl:
    src_tgt: en-pl
    enc_sharing_group: [en]
    dec_sharing_group: [pl]
    node_gpu: 0:2
    path_src: path_to_europarl/pl-en/train.pl-en.en.sp
    path_tgt: path_to_europarl/pl-en/train.pl-en.pl.sp
    path_valid_src: path_to_europarl/pl-en/valid.pl-en.en.sp
    path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.pl.sp
    transforms: [filtertoolong]
  # GPU 0:3
  train_pt-en:
    src_tgt: pt-en
    enc_sharing_group: [pt]
    dec_sharing_group: [en]
    node_gpu: 0:3
    path_src: path_to_europarl/pt-en/train.pt-en.pt.sp
    path_tgt: path_to_europarl/pt-en/train.pt-en.en.sp
    path_valid_src: path_to_europarl/pt-en/valid.pt-en.pt.sp
    path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.en.sp
    transforms: [filtertoolong]
  train_pt-pt:
    src_tgt: pt-pt
    enc_sharing_group: [pt]
    dec_sharing_group: [pt]
    node_gpu: 0:3
    path_src: path_to_europarl/pt-en/train.pt-en.pt.sp
    path_tgt: path_to_europarl/pt-en/train.pt-en.pt.sp
    path_valid_src: path_to_europarl/pt-en/valid.pt-en.pt.sp
    path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.pt.sp
    transforms: [filtertoolong, denoising]
  train_en-pt:
    src_tgt: en-pt
    enc_sharing_group: [en]
    dec_sharing_group: [pt]
    node_gpu: 0:3
    path_src: path_to_europarl/pt-en/train.pt-en.en.sp
    path_tgt: path_to_europarl/pt-en/train.pt-en.pt.sp
    path_valid_src: path_to_europarl/pt-en/valid.pt-en.en.sp
    path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.pt.sp
    transforms: [filtertoolong]
  # GPU 0:0
  train_ro-en:
    src_tgt: ro-en
    enc_sharing_group: [ro]
    dec_sharing_group: [en]
    node_gpu: 0:0
    path_src: path_to_europarl/ro-en/train.ro-en.ro.sp
    path_tgt: path_to_europarl/ro-en/train.ro-en.en.sp
    path_valid_src: path_to_europarl/ro-en/valid.ro-en.ro.sp
    path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.en.sp
    transforms: [filtertoolong]
  train_ro-ro:
    src_tgt: ro-ro
    enc_sharing_group: [ro]
    dec_sharing_group: [ro]
    node_gpu: 0:0
    path_src: path_to_europarl/ro-en/train.ro-en.ro.sp
    path_tgt: path_to_europarl/ro-en/train.ro-en.ro.sp
    path_valid_src: path_to_europarl/ro-en/valid.ro-en.ro.sp
    path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.ro.sp
    transforms: [filtertoolong, denoising]
  train_en-ro:
    src_tgt: en-ro
    enc_sharing_group: [en]
    dec_sharing_group: [ro]
    node_gpu: 0:0
    path_src: path_to_europarl/ro-en/train.ro-en.en.sp
    path_tgt: path_to_europarl/ro-en/train.ro-en.ro.sp
    path_valid_src: path_to_europarl/ro-en/valid.ro-en.en.sp
    path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.ro.sp
    transforms: [filtertoolong]
  # GPU 0:1
  train_sk-en:
    src_tgt: sk-en
    enc_sharing_group: [sk]
    dec_sharing_group: [en]
    node_gpu: 0:1
    path_src: path_to_europarl/sk-en/train.sk-en.sk.sp
    path_tgt: path_to_europarl/sk-en/train.sk-en.en.sp
    path_valid_src: path_to_europarl/sk-en/valid.sk-en.sk.sp
    path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.en.sp
    transforms: [filtertoolong]
  train_sk-sk:
    src_tgt: sk-sk
    enc_sharing_group: [sk]
    dec_sharing_group: [sk]
    node_gpu: 0:1
    path_src: path_to_europarl/sk-en/train.sk-en.sk.sp
    path_tgt: path_to_europarl/sk-en/train.sk-en.sk.sp
    path_valid_src: path_to_europarl/sk-en/valid.sk-en.sk.sp
    path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.sk.sp
    transforms: [filtertoolong, denoising]
  train_en-sk:
    src_tgt: en-sk
    enc_sharing_group: [en]
    dec_sharing_group: [sk]
    node_gpu: 0:1
    path_src: path_to_europarl/sk-en/train.sk-en.en.sp
    path_tgt: path_to_europarl/sk-en/train.sk-en.sk.sp
    path_valid_src: path_to_europarl/sk-en/valid.sk-en.en.sp
    path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.sk.sp
    transforms: [filtertoolong]
  # GPU 0:2
  train_sl-en:
    src_tgt: sl-en
    enc_sharing_group: [sl]
    dec_sharing_group: [en]
    node_gpu: 0:2
    path_src: path_to_europarl/sl-en/train.sl-en.sl.sp
    path_tgt: path_to_europarl/sl-en/train.sl-en.en.sp
    path_valid_src: path_to_europarl/sl-en/valid.sl-en.sl.sp
    path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.en.sp
    transforms: [filtertoolong]
  train_sl-sl:
    src_tgt: sl-sl
    enc_sharing_group: [sl]
    dec_sharing_group: [sl]
    node_gpu: 0:2
    path_src: path_to_europarl/sl-en/train.sl-en.sl.sp
    path_tgt: path_to_europarl/sl-en/train.sl-en.sl.sp
    path_valid_src: path_to_europarl/sl-en/valid.sl-en.sl.sp
    path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.sl.sp
    transforms: [filtertoolong, denoising]
  train_en-sl:
    src_tgt: en-sl
    enc_sharing_group: [en]
    dec_sharing_group: [sl]
    node_gpu: 0:2
    path_src: path_to_europarl/sl-en/train.sl-en.en.sp
    path_tgt: path_to_europarl/sl-en/train.sl-en.sl.sp
    path_valid_src: path_to_europarl/sl-en/valid.sl-en.en.sp
    path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.sl.sp
    transforms: [filtertoolong]
  # GPU 0:3
  train_sv-en:
    src_tgt: sv-en
    enc_sharing_group: [sv]
    dec_sharing_group: [en]
    node_gpu: 0:3
    path_src: path_to_europarl/sv-en/train.sv-en.sv.sp
    path_tgt: path_to_europarl/sv-en/train.sv-en.en.sp
    path_valid_src: path_to_europarl/sv-en/valid.sv-en.sv.sp
    path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.en.sp
    transforms: [filtertoolong]
  train_sv-sv:
    src_tgt: sv-sv
    enc_sharing_group: [sv]
    dec_sharing_group: [sv]
    node_gpu: 0:3
    path_src: path_to_europarl/sv-en/train.sv-en.sv.sp
    path_tgt: path_to_europarl/sv-en/train.sv-en.sv.sp
    path_valid_src: path_to_europarl/sv-en/valid.sv-en.sv.sp
    path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.sv.sp
    transforms: [filtertoolong, denoising]
  train_en-sv:
    src_tgt: en-sv
    enc_sharing_group: [en]
    dec_sharing_group: [sv]
    node_gpu: 0:3
    path_src: path_to_europarl/sv-en/train.sv-en.en.sp
    path_tgt: path_to_europarl/sv-en/train.sv-en.sv.sp
    path_valid_src: path_to_europarl/sv-en/valid.sv-en.en.sp
    path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.sv.sp
    transforms: [filtertoolong]

        
### Transform related opts:
#### Filter
src_seq_length: 200
tgt_seq_length: 200
#### Bart
src_subword_type: sentencepiece
tgt_subword_type: sentencepiece
mask_ratio: 0.2
replace_length: 1

# silently ignore empty lines in the data
skip_empty_level: silent

batch_size: 4096
batch_type: tokens
normalization: tokens
valid_batch_size: 4096
max_generator_batches: 2
src_vocab_size: 100000
tgt_vocab_size: 100000
encoder_type: transformer
decoder_type: transformer
model_dim: 512
transformer_ff: 2048
heads: 8
enc_layers: [6]
dec_layers: [6]
dropout: 0.1
label_smoothing: 0.1
param_init: 0.0
param_init_glorot: true
position_encoding: true
valid_steps: 10000
warmup_steps: 10000
report_every: 100
save_checkpoint_steps: 5000000
# save_checkpoint_steps: 50000
keep_checkpoint: -1
accum_count: 1
optim: adafactor
decay_method: none
learning_rate: 3.0
max_grad_norm: 0.0
seed: 3435
model_type: text
save_all_gpus: false

world_size: 4
gpu_ranks: [0, 1, 2, 3]
node_rank: 0

early_stopping: 5
early_stopping_criteria: accuracy
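
A quick consistency check on a config like the one above is to confirm that every `node_gpu` value points at a GPU listed in `gpu_ranks`, and to count how many tasks each GPU receives. The sketch below works on an already-parsed config (a plain dict, for instance as returned by a YAML loader). `tasks_per_gpu` is a hypothetical helper, not part of MAMMOTH, and reading `node_gpu` as `<node index>:<local GPU index>` is an interpretation of the examples here.

```python
from collections import Counter
from typing import Dict


def tasks_per_gpu(config: dict) -> Dict[str, int]:
    """Count tasks per node:gpu slot and reject GPUs missing from gpu_ranks."""
    allowed = set(config["gpu_ranks"])
    counts: Counter = Counter()
    for name, task in config["tasks"].items():
        # node_gpu is expected to look like "0:3" (node index, local GPU index).
        node, gpu = (int(part) for part in str(task["node_gpu"]).split(":"))
        if gpu not in allowed:
            raise ValueError(f"{name}: GPU {gpu} is not in gpu_ranks {sorted(allowed)}")
        counts[f"{node}:{gpu}"] += 1
    return dict(counts)
```

For the single-node example, 20 non-English languages spread over four GPUs with three tasks each should come out as 15 tasks per GPU. Quoting `node_gpu` values, as the multi-node example does, also keeps YAML 1.1 parsers from reading a value such as `1:0` as a base-60 integer.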

Multi-node configuration
src_vocab:
  'bg': path_to_vocab/opusTC.mul.vocab.onmt
  'cs': path_to_vocab/opusTC.mul.vocab.onmt
  'da': path_to_vocab/opusTC.mul.vocab.onmt
  'de': path_to_vocab/opusTC.mul.vocab.onmt
  'el': path_to_vocab/opusTC.mul.vocab.onmt
  'en': path_to_vocab/opusTC.mul.vocab.onmt
  'es': path_to_vocab/opusTC.mul.vocab.onmt
  'et': path_to_vocab/opusTC.mul.vocab.onmt
  'fi': path_to_vocab/opusTC.mul.vocab.onmt
  'fr': path_to_vocab/opusTC.mul.vocab.onmt
  'hu': path_to_vocab/opusTC.mul.vocab.onmt
  'it': path_to_vocab/opusTC.mul.vocab.onmt
  'lt': path_to_vocab/opusTC.mul.vocab.onmt
  'lv': path_to_vocab/opusTC.mul.vocab.onmt
  'nl': path_to_vocab/opusTC.mul.vocab.onmt
  'pl': path_to_vocab/opusTC.mul.vocab.onmt
  'pt': path_to_vocab/opusTC.mul.vocab.onmt
  'ro': path_to_vocab/opusTC.mul.vocab.onmt
  'sk': path_to_vocab/opusTC.mul.vocab.onmt
  'sl': path_to_vocab/opusTC.mul.vocab.onmt
  'sv': path_to_vocab/opusTC.mul.vocab.onmt
tgt_vocab:
  'bg': path_to_vocab/opusTC.mul.vocab.onmt
  'cs': path_to_vocab/opusTC.mul.vocab.onmt
  'da': path_to_vocab/opusTC.mul.vocab.onmt
  'de': path_to_vocab/opusTC.mul.vocab.onmt
  'el': path_to_vocab/opusTC.mul.vocab.onmt
  'en': path_to_vocab/opusTC.mul.vocab.onmt
  'es': path_to_vocab/opusTC.mul.vocab.onmt
  'et': path_to_vocab/opusTC.mul.vocab.onmt
  'fi': path_to_vocab/opusTC.mul.vocab.onmt
  'fr': path_to_vocab/opusTC.mul.vocab.onmt
  'hu': path_to_vocab/opusTC.mul.vocab.onmt
  'it': path_to_vocab/opusTC.mul.vocab.onmt
  'lt': path_to_vocab/opusTC.mul.vocab.onmt
  'lv': path_to_vocab/opusTC.mul.vocab.onmt
  'nl': path_to_vocab/opusTC.mul.vocab.onmt
  'pl': path_to_vocab/opusTC.mul.vocab.onmt
  'pt': path_to_vocab/opusTC.mul.vocab.onmt
  'ro': path_to_vocab/opusTC.mul.vocab.onmt
  'sk': path_to_vocab/opusTC.mul.vocab.onmt
  'sl': path_to_vocab/opusTC.mul.vocab.onmt
  'sv': path_to_vocab/opusTC.mul.vocab.onmt

overwrite: False
tasks:
  # GPU 0:0
  train_bg-en:
    src_tgt: bg-en
    enc_sharing_group: [bg]
    dec_sharing_group: [en]
    node_gpu: "0:0"
    path_src: path_to_europarl/bg-en/train.bg-en.bg.sp
    path_tgt: path_to_europarl/bg-en/train.bg-en.en.sp
    path_valid_src: path_to_europarl/bg-en/valid.bg-en.bg.sp
    path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.en.sp
    transforms: [filtertoolong]
  train_bg-bg:
    src_tgt: bg-bg
    enc_sharing_group: [bg]
    dec_sharing_group: [bg]
    node_gpu: "0:0"
    path_src: path_to_europarl/bg-en/train.bg-en.bg.sp
    path_tgt: path_to_europarl/bg-en/train.bg-en.bg.sp
    path_valid_src: path_to_europarl/bg-en/valid.bg-en.bg.sp
    path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.bg.sp
    transforms: [filtertoolong, denoising]
  train_en-bg:
    src_tgt: en-bg
    enc_sharing_group: [en]
    dec_sharing_group: [bg]
    node_gpu: "0:0"
    path_src: path_to_europarl/bg-en/train.bg-en.en.sp
    path_tgt: path_to_europarl/bg-en/train.bg-en.bg.sp
    path_valid_src: path_to_europarl/bg-en/valid.bg-en.en.sp
    path_valid_tgt: path_to_europarl/bg-en/valid.bg-en.bg.sp
    transforms: [filtertoolong]
  # GPU 0:1
  train_cs-en:
    src_tgt: cs-en
    enc_sharing_group: [cs]
    dec_sharing_group: [en]
    node_gpu: "0:1"
    path_src: path_to_europarl/cs-en/train.cs-en.cs.sp
    path_tgt: path_to_europarl/cs-en/train.cs-en.en.sp
    path_valid_src: path_to_europarl/cs-en/valid.cs-en.cs.sp
    path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.en.sp
    transforms: [filtertoolong]
  train_cs-cs:
    src_tgt: cs-cs
    enc_sharing_group: [cs]
    dec_sharing_group: [cs]
    node_gpu: "0:1"
    path_src: path_to_europarl/cs-en/train.cs-en.cs.sp
    path_tgt: path_to_europarl/cs-en/train.cs-en.cs.sp
    path_valid_src: path_to_europarl/cs-en/valid.cs-en.cs.sp
    path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.cs.sp
    transforms: [filtertoolong, denoising]
  train_en-cs:
    src_tgt: en-cs
    enc_sharing_group: [en]
    dec_sharing_group: [cs]
    node_gpu: "0:1"
    path_src: path_to_europarl/cs-en/train.cs-en.en.sp
    path_tgt: path_to_europarl/cs-en/train.cs-en.cs.sp
    path_valid_src: path_to_europarl/cs-en/valid.cs-en.en.sp
    path_valid_tgt: path_to_europarl/cs-en/valid.cs-en.cs.sp
    transforms: [filtertoolong]
  # GPU 0:2
  train_da-en:
    src_tgt: da-en
    enc_sharing_group: [da]
    dec_sharing_group: [en]
    node_gpu: "0:2"
    path_src: path_to_europarl/da-en/train.da-en.da.sp
    path_tgt: path_to_europarl/da-en/train.da-en.en.sp
    path_valid_src: path_to_europarl/da-en/valid.da-en.da.sp
    path_valid_tgt: path_to_europarl/da-en/valid.da-en.en.sp
    transforms: [filtertoolong]
  train_da-da:
    src_tgt: da-da
    enc_sharing_group: [da]
    dec_sharing_group: [da]
    node_gpu: "0:2"
    path_src: path_to_europarl/da-en/train.da-en.da.sp
    path_tgt: path_to_europarl/da-en/train.da-en.da.sp
    path_valid_src: path_to_europarl/da-en/valid.da-en.da.sp
    path_valid_tgt: path_to_europarl/da-en/valid.da-en.da.sp
    transforms: [filtertoolong, denoising]
  train_en-da:
    src_tgt: en-da
    enc_sharing_group: [en]
    dec_sharing_group: [da]
    node_gpu: "0:2"
    path_src: path_to_europarl/da-en/train.da-en.en.sp
    path_tgt: path_to_europarl/da-en/train.da-en.da.sp
    path_valid_src: path_to_europarl/da-en/valid.da-en.en.sp
    path_valid_tgt: path_to_europarl/da-en/valid.da-en.da.sp
    transforms: [filtertoolong]
  # GPU 0:3
  train_de-en:
    src_tgt: de-en
    enc_sharing_group: [de]
    dec_sharing_group: [en]
    node_gpu: "0:3"
    path_src: path_to_europarl/de-en/train.de-en.de.sp
    path_tgt: path_to_europarl/de-en/train.de-en.en.sp
    path_valid_src: path_to_europarl/de-en/valid.de-en.de.sp
    path_valid_tgt: path_to_europarl/de-en/valid.de-en.en.sp
    transforms: [filtertoolong]
  train_de-de:
    src_tgt: de-de
    enc_sharing_group: [de]
    dec_sharing_group: [de]
    node_gpu: "0:3"
    path_src: path_to_europarl/de-en/train.de-en.de.sp
    path_tgt: path_to_europarl/de-en/train.de-en.de.sp
    path_valid_src: path_to_europarl/de-en/valid.de-en.de.sp
    path_valid_tgt: path_to_europarl/de-en/valid.de-en.de.sp
    transforms: [filtertoolong, denoising]
  train_en-de:
    src_tgt: en-de
    enc_sharing_group: [en]
    dec_sharing_group: [de]
    node_gpu: "0:3"
    path_src: path_to_europarl/de-en/train.de-en.en.sp
    path_tgt: path_to_europarl/de-en/train.de-en.de.sp
    path_valid_src: path_to_europarl/de-en/valid.de-en.en.sp
    path_valid_tgt: path_to_europarl/de-en/valid.de-en.de.sp
    transforms: [filtertoolong]
  # GPU 1:0
  train_el-en:
    src_tgt: el-en
    enc_sharing_group: [el]
    dec_sharing_group: [en]
    node_gpu: "1:0"
    path_src: path_to_europarl/el-en/train.el-en.el.sp
    path_tgt: path_to_europarl/el-en/train.el-en.en.sp
    path_valid_src: path_to_europarl/el-en/valid.el-en.el.sp
    path_valid_tgt: path_to_europarl/el-en/valid.el-en.en.sp
    transforms: [filtertoolong]
  train_el-el:
    src_tgt: el-el
    enc_sharing_group: [el]
    dec_sharing_group: [el]
    node_gpu: "1:0"
    path_src: path_to_europarl/el-en/train.el-en.el.sp
    path_tgt: path_to_europarl/el-en/train.el-en.el.sp
    path_valid_src: path_to_europarl/el-en/valid.el-en.el.sp
    path_valid_tgt: path_to_europarl/el-en/valid.el-en.el.sp
    transforms: [filtertoolong, denoising]
  train_en-el:
    src_tgt: en-el
    enc_sharing_group: [en]
    dec_sharing_group: [el]
    node_gpu: "1:0"
    path_src: path_to_europarl/el-en/train.el-en.en.sp
    path_tgt: path_to_europarl/el-en/train.el-en.el.sp
    path_valid_src: path_to_europarl/el-en/valid.el-en.en.sp
    path_valid_tgt: path_to_europarl/el-en/valid.el-en.el.sp
    transforms: [filtertoolong]
  # GPU 1:1
  train_es-en:
    src_tgt: es-en
    enc_sharing_group: [es]
    dec_sharing_group: [en]
    node_gpu: "1:1"
    path_src: path_to_europarl/es-en/train.es-en.es.sp
    path_tgt: path_to_europarl/es-en/train.es-en.en.sp
    path_valid_src: path_to_europarl/es-en/valid.es-en.es.sp
    path_valid_tgt: path_to_europarl/es-en/valid.es-en.en.sp
    transforms: [filtertoolong]
  train_es-es:
    src_tgt: es-es
    enc_sharing_group: [es]
    dec_sharing_group: [es]
    node_gpu: "1:1"
    path_src: path_to_europarl/es-en/train.es-en.es.sp
    path_tgt: path_to_europarl/es-en/train.es-en.es.sp
    path_valid_src: path_to_europarl/es-en/valid.es-en.es.sp
    path_valid_tgt: path_to_europarl/es-en/valid.es-en.es.sp
    transforms: [filtertoolong, denoising]
  train_en-es:
    src_tgt: en-es
    enc_sharing_group: [en]
    dec_sharing_group: [es]
    node_gpu: "1:1"
    path_src: path_to_europarl/es-en/train.es-en.en.sp
    path_tgt: path_to_europarl/es-en/train.es-en.es.sp
    path_valid_src: path_to_europarl/es-en/valid.es-en.en.sp
    path_valid_tgt: path_to_europarl/es-en/valid.es-en.es.sp
    transforms: [filtertoolong]
  # GPU 1:2
  train_et-en:
    src_tgt: et-en
    enc_sharing_group: [et]
    dec_sharing_group: [en]
    node_gpu: "1:2"
    path_src: path_to_europarl/et-en/train.et-en.et.sp
    path_tgt: path_to_europarl/et-en/train.et-en.en.sp
    path_valid_src: path_to_europarl/et-en/valid.et-en.et.sp
    path_valid_tgt: path_to_europarl/et-en/valid.et-en.en.sp
    transforms: [filtertoolong]
  train_et-et:
    src_tgt: et-et
    enc_sharing_group: [et]
    dec_sharing_group: [et]
    node_gpu: "1:2"
    path_src: path_to_europarl/et-en/train.et-en.et.sp
    path_tgt: path_to_europarl/et-en/train.et-en.et.sp
    path_valid_src: path_to_europarl/et-en/valid.et-en.et.sp
    path_valid_tgt: path_to_europarl/et-en/valid.et-en.et.sp
    transforms: [filtertoolong, denoising]
  train_en-et:
    src_tgt: en-et
    enc_sharing_group: [en]
    dec_sharing_group: [et]
    node_gpu: "1:2"
    path_src: path_to_europarl/et-en/train.et-en.en.sp
    path_tgt: path_to_europarl/et-en/train.et-en.et.sp
    path_valid_src: path_to_europarl/et-en/valid.et-en.en.sp
    path_valid_tgt: path_to_europarl/et-en/valid.et-en.et.sp
    transforms: [filtertoolong]
  # GPU 1:3
  train_fi-en:
    src_tgt: fi-en
    enc_sharing_group: [fi]
    dec_sharing_group: [en]
    node_gpu: "1:3"
    path_src: path_to_europarl/fi-en/train.fi-en.fi.sp
    path_tgt: path_to_europarl/fi-en/train.fi-en.en.sp
    path_valid_src: path_to_europarl/fi-en/valid.fi-en.fi.sp
    path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.en.sp
    transforms: [filtertoolong]
  train_fi-fi:
    src_tgt: fi-fi
    enc_sharing_group: [fi]
    dec_sharing_group: [fi]
    node_gpu: "1:3"
    path_src: path_to_europarl/fi-en/train.fi-en.fi.sp
    path_tgt: path_to_europarl/fi-en/train.fi-en.fi.sp
    path_valid_src: path_to_europarl/fi-en/valid.fi-en.fi.sp
    path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.fi.sp
    transforms: [filtertoolong, denoising]
  train_en-fi:
    src_tgt: en-fi
    enc_sharing_group: [en]
    dec_sharing_group: [fi]
    node_gpu: "1:3"
    path_src: path_to_europarl/fi-en/train.fi-en.en.sp
    path_tgt: path_to_europarl/fi-en/train.fi-en.fi.sp
    path_valid_src: path_to_europarl/fi-en/valid.fi-en.en.sp
    path_valid_tgt: path_to_europarl/fi-en/valid.fi-en.fi.sp
    transforms: [filtertoolong]
  # GPU 2:0
  train_fr-en:
    src_tgt: fr-en
    enc_sharing_group: [fr]
    dec_sharing_group: [en]
    node_gpu: "2:0"
    path_src: path_to_europarl/fr-en/train.fr-en.fr.sp
    path_tgt: path_to_europarl/fr-en/train.fr-en.en.sp
    path_valid_src: path_to_europarl/fr-en/valid.fr-en.fr.sp
    path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.en.sp
    transforms: [filtertoolong]
  train_fr-fr:
    src_tgt: fr-fr
    enc_sharing_group: [fr]
    dec_sharing_group: [fr]
    node_gpu: "2:0"
    path_src: path_to_europarl/fr-en/train.fr-en.fr.sp
    path_tgt: path_to_europarl/fr-en/train.fr-en.fr.sp
    path_valid_src: path_to_europarl/fr-en/valid.fr-en.fr.sp
    path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.fr.sp
    transforms: [filtertoolong, denoising]
  train_en-fr:
    src_tgt: en-fr
    enc_sharing_group: [en]
    dec_sharing_group: [fr]
    node_gpu: "2:0"
    path_src: path_to_europarl/fr-en/train.fr-en.en.sp
    path_tgt: path_to_europarl/fr-en/train.fr-en.fr.sp
    path_valid_src: path_to_europarl/fr-en/valid.fr-en.en.sp
    path_valid_tgt: path_to_europarl/fr-en/valid.fr-en.fr.sp
    transforms: [filtertoolong]
  # GPU 2:1
  train_hu-en:
    src_tgt: hu-en
    enc_sharing_group: [hu]
    dec_sharing_group: [en]
    node_gpu: "2:1"
    path_src: path_to_europarl/hu-en/train.hu-en.hu.sp
    path_tgt: path_to_europarl/hu-en/train.hu-en.en.sp
    path_valid_src: path_to_europarl/hu-en/valid.hu-en.hu.sp
    path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.en.sp
    transforms: [filtertoolong]
  train_hu-hu:
    src_tgt: hu-hu
    enc_sharing_group: [hu]
    dec_sharing_group: [hu]
    node_gpu: "2:1"
    path_src: path_to_europarl/hu-en/train.hu-en.hu.sp
    path_tgt: path_to_europarl/hu-en/train.hu-en.hu.sp
    path_valid_src: path_to_europarl/hu-en/valid.hu-en.hu.sp
    path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.hu.sp
    transforms: [filtertoolong, denoising]
  train_en-hu:
    src_tgt: en-hu
    enc_sharing_group: [en]
    dec_sharing_group: [hu]
    node_gpu: "2:1"
    path_src: path_to_europarl/hu-en/train.hu-en.en.sp
    path_tgt: path_to_europarl/hu-en/train.hu-en.hu.sp
    path_valid_src: path_to_europarl/hu-en/valid.hu-en.en.sp
    path_valid_tgt: path_to_europarl/hu-en/valid.hu-en.hu.sp
    transforms: [filtertoolong]
  # GPU 2:2
  train_it-en:
    src_tgt: it-en
    enc_sharing_group: [it]
    dec_sharing_group: [en]
    node_gpu: "2:2"
    path_src: path_to_europarl/it-en/train.it-en.it.sp
    path_tgt: path_to_europarl/it-en/train.it-en.en.sp
    path_valid_src: path_to_europarl/it-en/valid.it-en.it.sp
    path_valid_tgt: path_to_europarl/it-en/valid.it-en.en.sp
    transforms: [filtertoolong]
  train_it-it:
    src_tgt: it-it
    enc_sharing_group: [it]
    dec_sharing_group: [it]
    node_gpu: "2:2"
    path_src: path_to_europarl/it-en/train.it-en.it.sp
    path_tgt: path_to_europarl/it-en/train.it-en.it.sp
    path_valid_src: path_to_europarl/it-en/valid.it-en.it.sp
    path_valid_tgt: path_to_europarl/it-en/valid.it-en.it.sp
    transforms: [filtertoolong, denoising]
  train_en-it:
    src_tgt: en-it
    enc_sharing_group: [en]
    dec_sharing_group: [it]
    node_gpu: "2:2"
    path_src: path_to_europarl/it-en/train.it-en.en.sp
    path_tgt: path_to_europarl/it-en/train.it-en.it.sp
    path_valid_src: path_to_europarl/it-en/valid.it-en.en.sp
    path_valid_tgt: path_to_europarl/it-en/valid.it-en.it.sp
    transforms: [filtertoolong]
  # GPU 2:3
  train_lt-en:
    src_tgt: lt-en
    enc_sharing_group: [lt]
    dec_sharing_group: [en]
    node_gpu: "2:3"
    path_src: path_to_europarl/lt-en/train.lt-en.lt.sp
    path_tgt: path_to_europarl/lt-en/train.lt-en.en.sp
    path_valid_src: path_to_europarl/lt-en/valid.lt-en.lt.sp
    path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.en.sp
    transforms: [filtertoolong]
  train_lt-lt:
    src_tgt: lt-lt
    enc_sharing_group: [lt]
    dec_sharing_group: [lt]
    node_gpu: "2:3"
    path_src: path_to_europarl/lt-en/train.lt-en.lt.sp
    path_tgt: path_to_europarl/lt-en/train.lt-en.lt.sp
    path_valid_src: path_to_europarl/lt-en/valid.lt-en.lt.sp
    path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.lt.sp
    transforms: [filtertoolong, denoising]
  train_en-lt:
    src_tgt: en-lt
    enc_sharing_group: [en]
    dec_sharing_group: [lt]
    node_gpu: "2:3"
    path_src: path_to_europarl/lt-en/train.lt-en.en.sp
    path_tgt: path_to_europarl/lt-en/train.lt-en.lt.sp
    path_valid_src: path_to_europarl/lt-en/valid.lt-en.en.sp
    path_valid_tgt: path_to_europarl/lt-en/valid.lt-en.lt.sp
    transforms: [filtertoolong]
  # GPU 3:0
  train_lv-en:
    src_tgt: lv-en
    enc_sharing_group: [lv]
    dec_sharing_group: [en]
    node_gpu: "3:0"
    path_src: path_to_europarl/lv-en/train.lv-en.lv.sp
    path_tgt: path_to_europarl/lv-en/train.lv-en.en.sp
    path_valid_src: path_to_europarl/lv-en/valid.lv-en.lv.sp
    path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.en.sp
    transforms: [filtertoolong]
  train_lv-lv:
    src_tgt: lv-lv
    enc_sharing_group: [lv]
    dec_sharing_group: [lv]
    node_gpu: "3:0"
    path_src: path_to_europarl/lv-en/train.lv-en.lv.sp
    path_tgt: path_to_europarl/lv-en/train.lv-en.lv.sp
    path_valid_src: path_to_europarl/lv-en/valid.lv-en.lv.sp
    path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.lv.sp
    transforms: [filtertoolong, denoising]
  train_en-lv:
    src_tgt: en-lv
    enc_sharing_group: [en]
    dec_sharing_group: [lv]
    node_gpu: "3:0"
    path_src: path_to_europarl/lv-en/train.lv-en.en.sp
    path_tgt: path_to_europarl/lv-en/train.lv-en.lv.sp
    path_valid_src: path_to_europarl/lv-en/valid.lv-en.en.sp
    path_valid_tgt: path_to_europarl/lv-en/valid.lv-en.lv.sp
    transforms: [filtertoolong]
  # GPU 3:1
  train_nl-en:
    src_tgt: nl-en
    enc_sharing_group: [nl]
    dec_sharing_group: [en]
    node_gpu: "3:1"
    path_src: path_to_europarl/nl-en/train.nl-en.nl.sp
    path_tgt: path_to_europarl/nl-en/train.nl-en.en.sp
    path_valid_src: path_to_europarl/nl-en/valid.nl-en.nl.sp
    path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.en.sp
    transforms: [filtertoolong]
  train_nl-nl:
    src_tgt: nl-nl
    enc_sharing_group: [nl]
    dec_sharing_group: [nl]
    node_gpu: "3:1"
    path_src: path_to_europarl/nl-en/train.nl-en.nl.sp
    path_tgt: path_to_europarl/nl-en/train.nl-en.nl.sp
    path_valid_src: path_to_europarl/nl-en/valid.nl-en.nl.sp
    path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.nl.sp
    transforms: [filtertoolong, denoising]
  train_en-nl:
    src_tgt: en-nl
    enc_sharing_group: [en]
    dec_sharing_group: [nl]
    node_gpu: "3:1"
    path_src: path_to_europarl/nl-en/train.nl-en.en.sp
    path_tgt: path_to_europarl/nl-en/train.nl-en.nl.sp
    path_valid_src: path_to_europarl/nl-en/valid.nl-en.en.sp
    path_valid_tgt: path_to_europarl/nl-en/valid.nl-en.nl.sp
    transforms: [filtertoolong]
  # GPU 3:2
  train_pl-en:
    src_tgt: pl-en
    enc_sharing_group: [pl]
    dec_sharing_group: [en]
    node_gpu: "3:2"
    path_src: path_to_europarl/pl-en/train.pl-en.pl.sp
    path_tgt: path_to_europarl/pl-en/train.pl-en.en.sp
    path_valid_src: path_to_europarl/pl-en/valid.pl-en.pl.sp
    path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.en.sp
    transforms: [filtertoolong]
  train_pl-pl:
    src_tgt: pl-pl
    enc_sharing_group: [pl]
    dec_sharing_group: [pl]
    node_gpu: "3:2"
    path_src: path_to_europarl/pl-en/train.pl-en.pl.sp
    path_tgt: path_to_europarl/pl-en/train.pl-en.pl.sp
    path_valid_src: path_to_europarl/pl-en/valid.pl-en.pl.sp
    path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.pl.sp
    transforms: [filtertoolong, denoising]
  train_en-pl:
    src_tgt: en-pl
    enc_sharing_group: [en]
    dec_sharing_group: [pl]
    node_gpu: "3:2"
    path_src: path_to_europarl/pl-en/train.pl-en.en.sp
    path_tgt: path_to_europarl/pl-en/train.pl-en.pl.sp
    path_valid_src: path_to_europarl/pl-en/valid.pl-en.en.sp
    path_valid_tgt: path_to_europarl/pl-en/valid.pl-en.pl.sp
    transforms: [filtertoolong]
  # GPU 3:3
  train_pt-en:
    src_tgt: pt-en
    enc_sharing_group: [pt]
    dec_sharing_group: [en]
    node_gpu: "3:3"
    path_src: path_to_europarl/pt-en/train.pt-en.pt.sp
    path_tgt: path_to_europarl/pt-en/train.pt-en.en.sp
    path_valid_src: path_to_europarl/pt-en/valid.pt-en.pt.sp
    path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.en.sp
    transforms: [filtertoolong]
  train_pt-pt:
    src_tgt: pt-pt
    enc_sharing_group: [pt]
    dec_sharing_group: [pt]
    node_gpu: "3:3"
    path_src: path_to_europarl/pt-en/train.pt-en.pt.sp
    path_tgt: path_to_europarl/pt-en/train.pt-en.pt.sp
    path_valid_src: path_to_europarl/pt-en/valid.pt-en.pt.sp
    path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.pt.sp
    transforms: [filtertoolong, denoising]
  train_en-pt:
    src_tgt: en-pt
    enc_sharing_group: [en]
    dec_sharing_group: [pt]
    node_gpu: "3:3"
    path_src: path_to_europarl/pt-en/train.pt-en.en.sp
    path_tgt: path_to_europarl/pt-en/train.pt-en.pt.sp
    path_valid_src: path_to_europarl/pt-en/valid.pt-en.en.sp
    path_valid_tgt: path_to_europarl/pt-en/valid.pt-en.pt.sp
    transforms: [filtertoolong]
  # GPU 4:0
  train_ro-en:
    src_tgt: ro-en
    enc_sharing_group: [ro]
    dec_sharing_group: [en]
    node_gpu: "4:0"
    path_src: path_to_europarl/ro-en/train.ro-en.ro.sp
    path_tgt: path_to_europarl/ro-en/train.ro-en.en.sp
    path_valid_src: path_to_europarl/ro-en/valid.ro-en.ro.sp
    path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.en.sp
    transforms: [filtertoolong]
  train_ro-ro:
    src_tgt: ro-ro
    enc_sharing_group: [ro]
    dec_sharing_group: [ro]
    node_gpu: "4:0"
    path_src: path_to_europarl/ro-en/train.ro-en.ro.sp
    path_tgt: path_to_europarl/ro-en/train.ro-en.ro.sp
    path_valid_src: path_to_europarl/ro-en/valid.ro-en.ro.sp
    path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.ro.sp
    transforms: [filtertoolong, denoising]
  train_en-ro:
    src_tgt: en-ro
    enc_sharing_group: [en]
    dec_sharing_group: [ro]
    node_gpu: "4:0"
    path_src: path_to_europarl/ro-en/train.ro-en.en.sp
    path_tgt: path_to_europarl/ro-en/train.ro-en.ro.sp
    path_valid_src: path_to_europarl/ro-en/valid.ro-en.en.sp
    path_valid_tgt: path_to_europarl/ro-en/valid.ro-en.ro.sp
    transforms: [filtertoolong]
  # GPU 4:1
  train_sk-en:
    src_tgt: sk-en
    enc_sharing_group: [sk]
    dec_sharing_group: [en]
    node_gpu: "4:1"
    path_src: path_to_europarl/sk-en/train.sk-en.sk.sp
    path_tgt: path_to_europarl/sk-en/train.sk-en.en.sp
    path_valid_src: path_to_europarl/sk-en/valid.sk-en.sk.sp
    path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.en.sp
    transforms: [filtertoolong]
  train_sk-sk:
    src_tgt: sk-sk
    enc_sharing_group: [sk]
    dec_sharing_group: [sk]
    node_gpu: "4:1"
    path_src: path_to_europarl/sk-en/train.sk-en.sk.sp
    path_tgt: path_to_europarl/sk-en/train.sk-en.sk.sp
    path_valid_src: path_to_europarl/sk-en/valid.sk-en.sk.sp
    path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.sk.sp
    transforms: [filtertoolong, denoising]
  train_en-sk:
    src_tgt: en-sk
    enc_sharing_group: [en]
    dec_sharing_group: [sk]
    node_gpu: "4:1"
    path_src: path_to_europarl/sk-en/train.sk-en.en.sp
    path_tgt: path_to_europarl/sk-en/train.sk-en.sk.sp
    path_valid_src: path_to_europarl/sk-en/valid.sk-en.en.sp
    path_valid_tgt: path_to_europarl/sk-en/valid.sk-en.sk.sp
    transforms: [filtertoolong]
  # GPU 4:2
  train_sl-en:
    src_tgt: sl-en
    enc_sharing_group: [sl]
    dec_sharing_group: [en]
    node_gpu: "4:2"
    path_src: path_to_europarl/sl-en/train.sl-en.sl.sp
    path_tgt: path_to_europarl/sl-en/train.sl-en.en.sp
    path_valid_src: path_to_europarl/sl-en/valid.sl-en.sl.sp
    path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.en.sp
    transforms: [filtertoolong]
  train_sl-sl:
    src_tgt: sl-sl
    enc_sharing_group: [sl]
    dec_sharing_group: [sl]
    node_gpu: "4:2"
    path_src: path_to_europarl/sl-en/train.sl-en.sl.sp
    path_tgt: path_to_europarl/sl-en/train.sl-en.sl.sp
    path_valid_src: path_to_europarl/sl-en/valid.sl-en.sl.sp
    path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.sl.sp
    transforms: [filtertoolong, denoising]
  train_en-sl:
    src_tgt: en-sl
    enc_sharing_group: [en]
    dec_sharing_group: [sl]
    node_gpu: "4:2"
    path_src: path_to_europarl/sl-en/train.sl-en.en.sp
    path_tgt: path_to_europarl/sl-en/train.sl-en.sl.sp
    path_valid_src: path_to_europarl/sl-en/valid.sl-en.en.sp
    path_valid_tgt: path_to_europarl/sl-en/valid.sl-en.sl.sp
    transforms: [filtertoolong]
  # GPU 4:3
  train_sv-en:
    src_tgt: sv-en
    enc_sharing_group: [sv]
    dec_sharing_group: [en]
    node_gpu: "4:3"
    path_src: path_to_europarl/sv-en/train.sv-en.sv.sp
    path_tgt: path_to_europarl/sv-en/train.sv-en.en.sp
    path_valid_src: path_to_europarl/sv-en/valid.sv-en.sv.sp
    path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.en.sp
    transforms: [filtertoolong]
  train_sv-sv:
    src_tgt: sv-sv
    enc_sharing_group: [sv]
    dec_sharing_group: [sv]
    node_gpu: "4:3"
    path_src: path_to_europarl/sv-en/train.sv-en.sv.sp
    path_tgt: path_to_europarl/sv-en/train.sv-en.sv.sp
    path_valid_src: path_to_europarl/sv-en/valid.sv-en.sv.sp
    path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.sv.sp
    transforms: [filtertoolong, denoising]
  train_en-sv:
    src_tgt: en-sv
    enc_sharing_group: [en]
    dec_sharing_group: [sv]
    node_gpu: "4:3"
    path_src: path_to_europarl/sv-en/train.sv-en.en.sp
    path_tgt: path_to_europarl/sv-en/train.sv-en.sv.sp
    path_valid_src: path_to_europarl/sv-en/valid.sv-en.en.sp
    path_valid_tgt: path_to_europarl/sv-en/valid.sv-en.sv.sp
    transforms: [filtertoolong]

        
### Transform related opts:
#### Filter
src_seq_length: 200
tgt_seq_length: 200
#### Bart
src_subword_type: sentencepiece
tgt_subword_type: sentencepiece
mask_ratio: 0.2
replace_length: 1

# silently ignore empty lines in the data
skip_empty_level: silent

batch_size: 4096
batch_type: tokens
normalization: tokens
valid_batch_size: 4096
max_generator_batches: 2
src_vocab_size: 100000
tgt_vocab_size: 100000
encoder_type: transformer
decoder_type: transformer
model_dim: 512
transformer_ff: 2048
heads: 8
enc_layers: [6]
dec_layers: [6]
dropout: 0.1
label_smoothing: 0.1
param_init: 0.0
param_init_glorot: true
position_encoding: true
valid_steps: 10000
warmup_steps: 10000
report_every: 100
save_checkpoint_steps: 50000
keep_checkpoint: -1
accum_count: 1
optim: adafactor
decay_method: none
learning_rate: 3.0
max_grad_norm: 0.0
seed: 3435
model_type: text
save_all_gpus: false

n_nodes: 5
world_size: 20
gpu_ranks: [0, 1, 2, 3]

early_stopping: 5
early_stopping_criteria: accuracy

Data Configuration:

  • Vocabularies for the source and target languages need to be specified. In this example, we use a shared vocabulary.

  • Options related to data transforms are also set here, including the length filter and the BART-style denoising parameters.
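As a rough illustration of the length filter, the sketch below mimics what a transform such as filtertoolong does with src_seq_length and tgt_seq_length: a sentence pair is dropped when either side has more tokens than its limit. The helper name keep_pair and the whitespace-based token count are assumptions made for this example; this is not MAMMOTH's implementation, and the real transform may handle the boundary case differently.

```python
def keep_pair(src: str, tgt: str,
              src_seq_length: int = 200,
              tgt_seq_length: int = 200) -> bool:
    """Return True if both sides fit within their token limits.

    Tokens are counted by splitting on whitespace, which suits
    SentencePiece-tokenized files where pieces are space-separated.
    """
    return (len(src.split()) <= src_seq_length
            and len(tgt.split()) <= tgt_seq_length)


# Two toy pairs: the second one has a 201-token source and is dropped.
pairs = [
    ("▁Hello ▁world", "▁Hallo ▁Welt"),
    (" ".join(["▁x"] * 201), "▁short"),
]
kept = [pair for pair in pairs if keep_pair(*pair)]
print(f"kept {len(kept)} of {len(pairs)} pairs")
```

With the limits from the config (200 tokens on each side), only the first pair survives.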

Task Configuration:

  • Translation tasks are defined in this section, such as bg-en for Bulgarian to English translation.

  • Each task includes details such as source and target file paths, sharing groups, GPU assignments, and data transforms.

  • For GPU assignments, each task gives a node rank and a GPU rank in the form node:gpu, both counted from zero. For example, 4:0 indicates the first GPU on the fifth node.
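The task entries above follow a regular pattern, so they can be generated rather than written by hand. The following is a minimal sketch, assuming 4 GPUs per node and one language per GPU as in the 5-node example; the function name make_tasks and the constants are hypothetical and are not part of MAMMOTH's config generation tool.

```python
LANGS = ["bg", "cs", "da", "de", "el", "es", "et", "fi", "fr", "hu",
         "it", "lt", "lv", "nl", "pl", "pt", "ro", "sk", "sl", "sv"]
GPUS_PER_NODE = 4  # assumption: matches gpu_ranks: [0, 1, 2, 3]


def make_tasks(lang: str, index: int, root: str = "path_to_europarl") -> dict:
    """Build the three tasks (xx-en, xx-xx, en-xx) for one language."""
    # Languages are assigned to GPUs in order: index 0 -> "0:0", 4 -> "1:0", ...
    node_gpu = f"{index // GPUS_PER_NODE}:{index % GPUS_PER_NODE}"
    pair = f"{lang}-en"  # all files for a language live under the xx-en pair
    tasks = {}
    for src, tgt in [(lang, "en"), (lang, lang), ("en", lang)]:
        transforms = ["filtertoolong"]
        if src == tgt:
            # monolingual tasks are trained with the denoising objective
            transforms.append("denoising")
        tasks[f"train_{src}-{tgt}"] = {
            "src_tgt": f"{src}-{tgt}",
            "enc_sharing_group": [src],
            "dec_sharing_group": [tgt],
            "node_gpu": node_gpu,
            "path_src": f"{root}/{pair}/train.{pair}.{src}.sp",
            "path_tgt": f"{root}/{pair}/train.{pair}.{tgt}.sp",
            "path_valid_src": f"{root}/{pair}/valid.{pair}.{src}.sp",
            "path_valid_tgt": f"{root}/{pair}/valid.{pair}.{tgt}.sp",
            "transforms": transforms,
        }
    return tasks


all_tasks = {}
for i, lang in enumerate(LANGS):
    all_tasks.update(make_tasks(lang, i))

print(len(all_tasks))                       # 60
print(all_tasks["train_en-sv"]["node_gpu"])  # 4:3
```

The resulting dictionary can be written out under the tasks key with any YAML library of your choice.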

Training Configuration:

  • Batch size, normalization, and other training parameters are set.

  • Model parameters such as dimensions, transformer layers, dropout, label smoothing, and more are specified.

  • The training uses the Adafactor optimizer with a learning rate of 3.0 and no gradient clipping.

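  • Early stopping is enabled with accuracy as the criterion and a patience of 5: training stops if validation accuracy has not improved for 5 consecutive validation runs (validation runs every valid_steps: 10000 training steps).

  • For the single-node job, training is distributed across 4 GPUs (world_size: 4, gpu_ranks: [0, 1, 2, 3]) on one node (node_rank: 0). For the 5-node job, training is distributed across 20 GPUs (n_nodes: 5, world_size: 20).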
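Before submitting a job, it can help to check that the distributed settings agree with the node_gpu values used in the tasks. The sketch below is a standalone sanity check, assuming that world_size equals n_nodes times the number of entries in gpu_ranks, as in the examples here; check_layout is a hypothetical helper and not a MAMMOTH function.

```python
def check_layout(n_nodes: int, gpu_ranks: list, world_size: int,
                 node_gpus: list) -> int:
    """Validate the settings and return the number of distinct devices used."""
    if n_nodes * len(gpu_ranks) != world_size:
        raise ValueError(
            f"world_size={world_size} does not match "
            f"{n_nodes} nodes x {len(gpu_ranks)} GPUs"
        )
    used = set()
    for node_gpu in node_gpus:
        # each assignment has the form "node:gpu", both zero-indexed
        node, gpu = (int(part) for part in node_gpu.split(":"))
        if not 0 <= node < n_nodes:
            raise ValueError(f"node rank out of range in {node_gpu!r}")
        if gpu not in gpu_ranks:
            raise ValueError(f"GPU rank not in gpu_ranks in {node_gpu!r}")
        used.add((node, gpu))
    return len(used)


# The 5-node example: every node:gpu slot from 0:0 to 4:3 is used.
assignments = [f"{node}:{gpu}" for node in range(5) for gpu in range(4)]
print(check_layout(5, [0, 1, 2, 3], 20, assignments))  # 20
```

A value such as "5:0" would raise an error here, because node ranks run from 0 to 4 when n_nodes is 5.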

Step 4: Train your MAMMOTH model

Finally, we can start the training process. Here we provide an example script that sets several environment variables, creates the necessary directories, and then runs a training job for a MAMMOTH machine translation model.

export PYTHONUSERBASE=/path_to_your_env/mammoth/

# pointer to codebase
export MAMMOTH=/path_to_codebase/mammoth

# pointer to config file
export CONFIG_DIR=path_to_europarl/config

# pointer to slurm multinode wrapper.
export SCRIPT_DIR=path_to_europarl/scripts/

# info for model and log saving
export SAVE_DIR=your_path/models/europarl
export LOG_DIR=${SAVE_DIR}/logs
export EXP_ID=example-1-node

mkdir -p ${SAVE_DIR}/{logs,models}

srun ${SCRIPT_DIR}/wrapper.sh -u ${MAMMOTH}/train.py \
    -config ${CONFIG_DIR}/europarl-1node-4gpu.yml \
    -save_model ${SAVE_DIR}/models/${EXP_ID} \
    -master_port 9973 \
    -tensorboard -tensorboard_log_dir ${LOG_DIR}/${EXP_ID}

Environment Variable Setup:

  • PYTHONUSERBASE: Specifies the base directory for Python user-specific packages. You can also set up the Python environment in whatever way you prefer; see the installation guide for more information.

  • MAMMOTH: Points to the directory containing the MAMMOTH codebase.

  • CONFIG_DIR: Points to a directory containing configuration files.

  • SCRIPT_DIR: Points to a directory containing Slurm multinode wrapper scripts.

  • SAVE_DIR: Specifies the base directory for saving model-related files.

  • LOG_DIR: Specifies the directory for saving logs related to the model training.

  • EXP_ID: Represents an experiment identifier, set to “example-1-node.”

Directory Creation:

  • Creates the “logs” and “models” directories inside SAVE_DIR if they do not already exist. You will find the logs and saved models there.

Training Job Submission:

  • We use Slurm for resource allocation. srun: Launches the command on the resources allocated by Slurm.

  • ${SCRIPT_DIR}/wrapper.sh: Calls a wrapper script that manages Slurm settings, monitors GPU usage, and so on. An example wrapper script:

# expose the four GPUs of this node to the training process
export CUDA_VISIBLE_DEVICES=0,1,2,3
# log GPU memory and utilization every 5 seconds in the background
nvidia-smi dmon -s mu -d 5 -o TD > "${LOG_DIR}/gpu_load-${EXP_ID}-${PPID}.log" &
# print the command, then run it with this node's rank
echo python -u "$@" --node_rank "$SLURM_NODEID"
python -u "$@" --node_rank "$SLURM_NODEID"

  • -u ${MAMMOTH}/train.py: Specifies the Python script for training, located in the MAMMOTH codebase.

  • -config ${CONFIG_DIR}/europarl-1node-4gpu.yml: Specifies the configuration file for the training job.

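  • -save_model ${SAVE_DIR}/models/${EXP_ID}: Specifies the path under which the trained model checkpoints are saved; here they go into the “models” directory, named after the experiment identifier.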

  • -master_port 9973: Specifies the master port for communication.

  • -tensorboard -tensorboard_log_dir ${LOG_DIR}/${EXP_ID}: Enables TensorBoard logging and specifies the directory for TensorBoard logs.

Hooray! Take a moment to celebrate the progress you’ve made. Training will take several hours to complete.