MAMMOTH Sharing Schemes

MAMMOTH is designed as a flexible modular system, allowing users to configure, train, and test various sharing schemes. This tutorial will guide you through the process of setting up and experimenting with different sharing schemes, including:

  • fully shared

  • fully unshared

  • encoder shared

  • decoder shared

The configuration for each scheme is managed through YAML files, ensuring a seamless and customizable experience.

Dataset

For this tutorial, we use the UNPC dataset, which consists of manually translated UN documents covering 25 years (1990 to 2014) for the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish.

Before diving into the sharing schemes, the data needs to be preprocessed. You can download the already preprocessed data with the following command:

wget https://mammoth-share.a3s.fi/unpc.tar

We also need the corresponding vocabularies for the dataset. Download them with the following command:

wget https://mammoth-share.a3s.fi/vocab.tar.gz
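
After downloading, unpack both archives. The target directory below is only an example; adjust it to your own setup:

# create a working directory and extract the data and vocabularies into it
mkdir -p data
tar -xf unpc.tar -C data
tar -xzf vocab.tar.gz -C data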

Now, let’s look at an overview of the sharing schemes to better understand how they differ.

Sharing Schemes Overview

The four schemes differ in which encoder and decoder parameters are shared across languages, each offering a different configuration of the modular system.

1. Fully Unshared:

  • Each language maintains a distinct set of parameters for both encoder and decoder.

  • No parameter sharing occurs between languages.

  train_ar-ar:
    dec_sharing_group:
    - ar
    enc_sharing_group:
    - ar
  • train_ar-ar: This denotes the training configuration for the Arabic-to-Arabic language pair.

  • dec_sharing_group: Specifies the decoder sharing group, indicating which languages share decoder parameters. In this case, only Arabic (ar) is included, meaning no sharing with other languages for decoding.

  • enc_sharing_group: Denotes the encoder sharing group, signifying which languages share encoder parameters. Here, it’s also set to only Arabic (ar), indicating no encoder parameter sharing with other languages.
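
  • For a cross-lingual task the same principle applies: the encoder group names the source language and the decoder group names the target language. For example, an Arabic-to-English task under this scheme could look as follows (a sketch for illustration; the exact entries in the downloadable configs may differ):

  train_ar-en:
    dec_sharing_group:
    - en
    enc_sharing_group:
    - ar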

2. Shared Encoder, Separate Decoder:

  • Encoder parameters are shared across all languages.

  • Each language has a separate set of parameters for the decoder.

  train_ar-ar:
    dec_sharing_group:
    - ar
    enc_sharing_group:
    - all

3. Separate Encoder, Shared Decoder:

  • Each language has a separate set of parameters for the encoder.

  • Decoder parameters are shared across all languages.

  train_ar-en:
    dec_sharing_group:
    - all
    enc_sharing_group:
    - ar

4. Fully Shared:

  • Both encoder and decoder parameters are shared across all languages.

  • The entire model is shared among all language pairs.

  train_ar-ar:
    dec_sharing_group:
    - all
    enc_sharing_group:
    - all
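
In a complete configuration there is one such block per training task, i.e., per translation direction, and every block follows the chosen scheme. As an illustrative sketch (task names and language codes are examples only), the shared-encoder scheme from option 2 would cover two directions like this:

  train_ar-en:
    dec_sharing_group:
    - en
    enc_sharing_group:
    - all
  train_fr-es:
    dec_sharing_group:
    - es
    enc_sharing_group:
    - all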

You can conveniently download the complete configurations using the following command:

wget https://mammoth-share.a3s.fi/configs.tar.gz

These configurations provide a solid foundation for configuring, training, and testing the various sharing schemes in the MAMMOTH framework. Be sure to adjust the file paths to match your own environment, and feel free to experiment with and tailor the settings to your needs.

Training Modular Systems

1. Setup:

To train MAMMOTH’s modular systems, start by setting the necessary environment variables:

export MAMMOTH=/path/to/mammoth
export CONFIG=/path/to/configs/config.yaml

2. Training Command:

Execute the following command to commence training:

srun /path/to/wrapper.sh $MAMMOTH/train.py \
    -config $CONFIG \
    -master_ip $SLURMD_NODENAME \
    -master_port 9969

For the wrapper script, a minimal example looks like the following:

#!/bin/bash
python -u "$@" --node_rank $SLURM_NODEID

This tutorial uses SLURM for job scheduling and multi-node training, but you can adapt the commands to other job schedulers or to a standalone setup. Ensure that the config.yaml file specifies the desired sharing scheme.

Training can also be run on a single GPU, in which case the wrapper is not necessary and you can train with the following command:

python -u $MAMMOTH/train.py -config $CONFIG

3. Inference Command:

After training, use the following command to test the model:

python3 -u $MAMMOTH/translate.py \
    --config $CONFIG \
    --model "$checkpoint" \
    --task_id train_$sl-$tl \
    --src $processed_data/$lp/$lp.$sl.sp \
    --output $out_path/$sl-$tl.${base}hyp.sp \
    --gpu 0 --shard_size 0 \
    --batch_size 512

Remember to replace $checkpoint, $sl (source language), $tl (target language), $lp (language pair), $processed_data, and $out_path with appropriate values.
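
For example, for an Arabic-to-English run, the variables could be set along these lines (all paths are placeholders):

sl=ar
tl=en
lp=$sl-$tl
processed_data=/path/to/data/processed   # directory where the preprocessed UNPC data was extracted
out_path=/path/to/outputs                # directory for the generated hypotheses
base=""                                  # optional tag used in the output file name
checkpoint=/path/to/model/checkpoint     # a checkpoint produced by training

If the data was segmented with SentencePiece (as the .sp suffixes suggest), the generated hypotheses are also in piece form and can be converted back to plain text, for instance with spm_decode; the model path below is a placeholder for whichever SentencePiece model was used to preprocess the target language:

spm_decode --model=/path/to/spm/$tl.model --input_format=piece \
    < $out_path/$sl-$tl.${base}hyp.sp > $out_path/$sl-$tl.${base}hyp.txt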

We provide a model checkpoint trained with the encoder-shared scheme described above:

wget https://mammoth-share.a3s.fi/encoder-shared-models.tar.gz
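
As with the other archives, unpack it before use:

tar -xzf encoder-shared-models.tar.gz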

Notes:

  • Make sure to adapt the paths and variables to your specific directory structure.

  • Adjust the --gpu flag in the testing command based on your GPU availability.

  • Ensure that the configuration file (config.yaml) contains the correct sharing scheme based on your experiment.

This tutorial serves as a general guide; refer to the specific configuration files for additional details and customization options. Feel free to explore and adapt the commands to your own training and testing requirements, regardless of the job scheduling system you choose to employ.