Translation

Translations

class mammoth.translate.Translation(src, src_raw, pred_sents, attn, pred_scores, tgt_sent, gold_score, word_aligns)[source]

Bases: object

Container for a translated sentence.

Variables
  • src (LongTensor) – Source word IDs.

  • src_raw (List[str]) – Raw source words.

  • pred_sents (List[List[str]]) – Words from the n-best translations.

  • pred_scores (List[List[float]]) – Log-probs of n-best translations.

  • attns (List[FloatTensor]) – Attention distribution for each translation.

  • gold_sent (List[str]) – Words from gold translation.

  • gold_score (List[float]) – Log-prob of gold translation.

  • word_aligns (List[FloatTensor]) – Word alignment distribution for each translation.

log(sent_number)[source]

Log translation.
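
A minimal usage sketch, assuming trans is a Translation instance produced elsewhere (for example by a TranslationBuilder); the attribute names follow the variable list above, and the helper name show_nbest is hypothetical:

    def show_nbest(trans, n_best=1):
        # Print the source, the n-best hypotheses and, if present, the gold target.
        print("SRC :", " ".join(trans.src_raw))
        for rank in range(n_best):
            hyp = " ".join(trans.pred_sents[rank])   # tokens of the rank-th hypothesis
            score = trans.pred_scores[rank]          # its log-probability
            print(f"HYP {rank}: {hyp} (score: {score})")
        if trans.gold_sent is not None:              # only set when gold targets were supplied
            print("GOLD:", " ".join(trans.gold_sent), "score:", trans.gold_score)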

Translator Class

class mammoth.translate.Translator(model, vocabs, src_file_path, tgt_file_path=None, gpu=-1, n_best=1, min_length=0, max_length=100, ratio=0.0, beam_size=30, random_sampling_topk=0, random_sampling_topp=0.0, random_sampling_temp=1.0, stepwise_penalty=None, dump_beam=False, block_ngram_repeat=0, ignore_when_blocking=frozenset({}), replace_unk=False, ban_unk_token=False, tgt_prefix=False, phrase_table='', data_type='text', verbose=False, report_time=False, copy_attn=False, global_scorer=None, out_file=None, report_align=False, report_score=True, logger=None, seed=-1, task=None)[source]

Bases: mammoth.translate.translator.Inference

translate_batch(batch, src_vocabs, attn_debug)[source]

Translate a batch of sentences.

class mammoth.translate.TranslationBuilder(data, vocabs, n_best=1, replace_unk=False, has_tgt=False, phrase_table='')[source]

Bases: object

Build a word-based translation from the batch output of translator and the underlying dictionaries.

Replacement based on “Addressing the Rare Word Problem in Neural Machine Translation” [LSL+15]

Parameters
  • data (mammoth.inputters.ParallelCorpus) – Data.

  • vocabs (dict[str, mammoth.inputters.Vocab]) – Data vocabularies.

  • n_best (int) – Number of translations produced.

  • replace_unk (bool) – Replace unknown words using attention.

  • has_tgt (bool) – Whether the batch will contain gold targets.
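
A hedged end-to-end sketch of wiring raw translator output back into word-level Translation objects. The from_batch call follows the OpenNMT-style builder API that mammoth derives from; the surrounding names (corpus, vocabs, translation_batch) are assumed to exist in the calling code:

    import mammoth.translate

    builder = mammoth.translate.TranslationBuilder(
        data=corpus,        # a mammoth.inputters.ParallelCorpus
        vocabs=vocabs,      # dict[str, mammoth.inputters.Vocab]
        n_best=2,
        replace_unk=True,   # substitute <unk> tokens using the attention argmax
        has_tgt=False,
    )
    translations = builder.from_batch(translation_batch)  # raw output of Translator.translate_batch
    for i, trans in enumerate(translations):
        print(trans.log(sent_number=i))                    # human-readable report per sentence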

Decoding Strategies

class mammoth.translate.DecodeStrategy(pad, bos, eos, unk, batch_size, parallel_paths, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, ban_unk_token)[source]

Bases: object

Base class for generation strategies.

Parameters
  • pad (int) – Magic integer in output vocab.

  • bos (int) – Magic integer in output vocab.

  • eos (int) – Magic integer in output vocab.

  • unk (int) – Magic integer in output vocab.

  • batch_size (int) – Current batch size.

  • parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.

  • min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.

  • max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).

  • ban_unk_token (Boolean) – Whether the unk token is forbidden.

  • block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.

  • exclusion_tokens (set[int]) – If a gram contains any of these tokens, it may repeat.

  • return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.

Variables
  • pad (int) – See above.

  • bos (int) – See above.

  • eos (int) – See above.

  • unk (int) – See above.

  • predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.

  • scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.

  • attention (list[list[FloatTensor or list[]]]) – For each batch, holds a list of attention sequence tensors (or empty lists) having shape (step, inp_seq_len) where inp_seq_len is the length of the sample (not the max length of all inp seqs).

  • alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().

  • is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.

  • alive_attn (FloatTensor or NoneType) – If tensor, shape is (step, B x parallel_paths, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.

  • target_prefix (LongTensor or NoneType) – If tensor, shape is (B x parallel_paths, prefix_seq_len), where prefix_seq_len is the (max) length of the pre-fixed prediction.

  • min_length (int) – See above.

  • max_length (int) – See above.

  • ban_unk_token (Boolean) – See above.

  • block_ngram_repeat (int) – See above.

  • exclusion_tokens (set[int]) – See above.

  • return_attention (bool) – See above.

  • done (bool) – True once decoding has finished for every sequence in the batch.

advance(log_probs, attn)[source]

DecodeStrategy subclasses should override advance().

Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.
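
An illustrative decoding loop (not mammoth's actual driver code) showing how initialize(), advance() and update_finished() are intended to interact. decode_strategy is any DecodeStrategy subclass instance and one_decoder_step is a hypothetical stand-in for one forward step of the model; the real loop additionally reorders decoder state for the surviving paths:

    def run_decoding(decode_strategy, memory_bank, src_lengths, one_decoder_step):
        decode_strategy.initialize(memory_bank, src_lengths)
        for step in range(decode_strategy.max_length):
            decoder_input = decode_strategy.alive_seq[:, -1:]   # last emitted tokens, (B*paths, 1)
            log_probs, attn = one_decoder_step(decoder_input, memory_bank, step)
            decode_strategy.advance(log_probs, attn)            # grow alive_seq, flag finished paths
            if decode_strategy.is_finished.any():
                decode_strategy.update_finished()               # move finished paths to predictions/scores
                if decode_strategy.done:
                    break
        return decode_strategy.predictions, decode_strategy.scores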

block_ngram_repeats(log_probs)[source]

We prevent the beam from going in any direction that would repeat any ngram of size <block_ngram_repeat> more than once.

The way we do it: we maintain a list of all ngrams of size <block_ngram_repeat> that is updated each time the beam advances, and manually set any token that would lead to a repeated ngram to 0 probability.

This improves on the previous version’s complexity:
  • previous version’s complexity: batch_size * beam_size * len(self)

  • current version’s complexity: batch_size * beam_size

This improves on the previous version’s accuracy:
  • The previous version blocks the whole beam, whereas here we only block specific tokens.

  • Previously, translation would fail when all beams contained repeated ngrams. That can no longer happen here.
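
A standalone illustration of the blocking idea (not the library's internal implementation): for every alive path we collect the ngrams seen so far and push the log-probability of any token that would complete one of them again to minus infinity, so only the offending continuations are blocked rather than the whole beam. Assumes n >= 2:

    def block_repeats(log_probs, alive_seq, n):
        # log_probs: (num_paths, vocab_size); alive_seq: (num_paths, step), both torch tensors
        for path, seq in enumerate(alive_seq.tolist()):
            if len(seq) < n:
                continue
            seen = {}
            for i in range(len(seq) - n + 1):
                prefix, last = tuple(seq[i:i + n - 1]), seq[i + n - 1]
                seen.setdefault(prefix, set()).add(last)
            current_prefix = tuple(seq[-(n - 1):])
            for tok in seen.get(current_prefix, ()):
                log_probs[path, tok] = -float("inf")   # forbid only the repeating continuations
        return log_probs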

initialize(memory_bank, src_lengths, src_map=None, device=None, target_prefix=None)[source]

DecodeStrategy subclasses should override initialize().

initialize should be called before all actions. It is used to prepare the necessary ingredients for decoding.

maybe_update_forbidden_tokens()[source]

We complete and reorder the list of forbidden_tokens.

maybe_update_target_prefix(select_index)[source]

We update / reorder target_prefix for the alive paths.

target_prefixing(log_probs)[source]

Fix the first part of predictions with self.target_prefix.

Parameters

log_probs (FloatTensor) – logits of size (B, vocab_size).

Returns

modified logits in (B, vocab_size).

Return type

log_probs (FloatTensor)
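
An illustrative re-implementation of the prefix-forcing step (the library's own code may differ in details such as how scores are preserved): while decoding is still inside a known target prefix, every token except the prescribed one is masked out, so the prediction is forced to follow the prefix. The function name force_prefix and the pad_idx handling are assumptions for the sketch:

    def force_prefix(log_probs, target_prefix, step, pad_idx):
        # log_probs: (B, vocab_size); target_prefix: (B, prefix_seq_len), padded with pad_idx
        if step >= target_prefix.size(1):
            return log_probs                      # decoding is already past the prefix
        forced = target_prefix[:, step]           # token each row must emit at this step
        rows = forced.ne(pad_idx)                 # rows whose prefix is still active
        out = log_probs.clone()
        out[rows] = -float("inf")                 # forbid everything ...
        out[rows, forced[rows]] = 0.0             # ... except the prescribed token
        return out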

update_finished()[source]

DecodeStrategy subclasses should override update_finished().

update_finished is used to update self.predictions, self.scores, and other “output” attributes.

class mammoth.translate.BeamSearch(beam_size, batch_size, pad, bos, eos, unk, n_best, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, stepwise_penalty, ratio, ban_unk_token)[source]

Bases: mammoth.translate.beam_search.BeamSearchBase

Beam search for seq2seq/encoder-decoder models

initialize(memory_bank, src_lengths, src_map=None, device=None, target_prefix=None)[source]

Initialize for decoding. Repeat src objects beam_size times.
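
What "repeat src objects beam_size times" amounts to, sketched with plain torch calls; the real initialize() also handles src_map, device placement, and tuple-valued memory banks:

    import torch

    src_len, batch_size, hidden, beam_size = 7, 2, 16, 5
    memory_bank = torch.randn(src_len, batch_size, hidden)       # encoder states, (src_len, B, H)
    src_lengths = torch.randint(1, src_len + 1, (batch_size,))   # (B,)

    # each batch entry is repeated beam_size times along the batch dimension,
    # giving one copy of the encoder states per beam hypothesis
    tiled_memory_bank = memory_bank.repeat_interleave(beam_size, dim=1)  # (src_len, B*beam, H)
    tiled_lengths = src_lengths.repeat_interleave(beam_size, dim=0)      # (B*beam,)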

mammoth.translate.greedy_search.sample_with_temperature(logits, sampling_temp, keep_topk, keep_topp)[source]

Select next tokens randomly from the top k possible next tokens.

Samples from a categorical distribution over the keep_topk words using the category probabilities logits / sampling_temp.

Parameters
  • logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)

  • sampling_temp (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.

  • keep_topk (int) – This many words could potentially be chosen. The other logits are set to have probability 0.

  • keep_topp (float) – Keep the most likely words until their cumulative probability exceeds p. If used together with keep_topk, both conditions are applied.

Returns

  • topk_ids: Shaped (batch_size, 1). These are the sampled word indices in the output vocab.

  • topk_scores: Shaped (batch_size, 1). These are essentially (logits / sampling_temp)[topk_ids].

Return type

(LongTensor, FloatTensor)
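
A minimal sketch of the temperature and top-k part of this sampling scheme (the real function additionally supports keep_topp nucleus filtering and the shapes/returns described above):

    import torch

    def sample_sketch(logits, sampling_temp=1.0, keep_topk=-1):
        # logits: (batch_size, vocab_size); sampling_temp and keep_topk as documented above
        logits = logits / sampling_temp                      # <1 sharpens, >1 flattens the distribution
        if keep_topk > 0:
            kth_best = torch.topk(logits, keep_topk, dim=-1).values[:, -1:]
            logits = logits.masked_fill(logits < kth_best, -float("inf"))  # zero out non-top-k words
        dist = torch.distributions.Categorical(logits=logits)
        topk_ids = dist.sample().unsqueeze(1)                # (batch_size, 1) sampled word indices
        topk_scores = logits.gather(1, topk_ids)             # (logits / sampling_temp)[topk_ids]
        return topk_ids, topk_scores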

class mammoth.translate.GreedySearch(pad, bos, eos, unk, batch_size, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, sampling_temp, keep_topk, keep_topp, beam_size, ban_unk_token)[source]

Bases: mammoth.translate.decode_strategy.DecodeStrategy

Select next tokens randomly from the top k possible next tokens.

The scores attribute’s lists contain the score, after applying temperature, of the final prediction (either EOS or the final token if max_length is reached).

Parameters
  • pad (int) – See base.

  • bos (int) – See base.

  • eos (int) – See base.

  • unk (int) – See base.

  • batch_size (int) – See base.

  • global_scorer (mammoth.translate.GNMTGlobalScorer) – Scorer instance.

  • min_length (int) – See base.

  • max_length (int) – See base.

  • ban_unk_token (Boolean) – See base.

  • block_ngram_repeat (int) – See base.

  • exclusion_tokens (set[int]) – See base.

  • return_attention (bool) – See base.

  • sampling_temp (float) – See sample_with_temperature().

  • keep_topk (int) – See sample_with_temperature().

  • keep_topp (float) – See sample_with_temperature().

  • beam_size (int) – Number of beams to use.

advance(log_probs, attn)[source]

Select next tokens randomly from the top k possible next tokens.

Parameters
  • log_probs (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)

  • attn (FloatTensor) – Shaped (1, B, inp_seq_len).

initialize(memory_bank, src_lengths, src_map=None, device=None, target_prefix=None)[source]

Initialize for decoding.

update_finished()[source]

Finalize scores and predictions.

Scoring

class mammoth.translate.penalties.PenaltyBuilder(cov_pen, length_pen)[source]

Bases: object

Returns the Length and Coverage Penalty function for Beam Search.

Parameters
  • length_pen (str) – option name of length pen

  • cov_pen (str) – option name of cov pen

Variables
  • has_cov_pen (bool) – Whether a coverage penalty is configured; if it is not, applying it is a no-op. Note that the converse isn’t true: setting beta to 0 also forces the coverage penalty to be a no-op.

  • has_len_pen (bool) – Whether a length penalty is configured; if it is not, applying it is a no-op. Note that the converse isn’t true: setting alpha to 0 also forces the length penalty to be a no-op.

  • coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.

  • length_penalty (callable[[int, float], float]) – Calculates the length penalty.

coverage_none(cov, beta=0.0)[source]

Returns zero as the penalty.

coverage_summary(cov, beta=0.0)[source]

Our summary penalty.

coverage_wu(cov, beta=0.0)[source]

GNMT coverage re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16]. cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.

length_average(cur_len, alpha=0.0)[source]

Returns the current sequence length.

length_none(cur_len, alpha=0.0)[source]

Returns unmodified scores.

length_wu(cur_len, alpha=0.0)[source]

GNMT length re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16].
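
The two “wu” penalties follow the formulas from the GNMT paper; the sketch below spells out the arithmetic (the exact sign convention used when combining them with beam scores may differ in the library):

    import torch

    def length_wu_sketch(cur_len, alpha=0.0):
        # lp(Y) = ((5 + |Y|) / 6) ** alpha; hypothesis scores are normalized by this factor
        return ((5 + cur_len) / 6.0) ** alpha

    def coverage_wu_sketch(cov, beta=0.0):
        # cov: accumulated attention over source positions, shape (..., seq_len)
        # cp(X, Y) = beta * sum_i log(min(cov_i, 1.0)); penalizes source words left unattended
        return beta * torch.min(cov, torch.ones_like(cov)).log().sum(-1)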

class mammoth.translate.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)[source]

Bases: object

NMT re-ranking.

Parameters
  • alpha (float) – Length parameter.

  • beta (float) – Coverage parameter.

  • length_penalty (str) – Length penalty strategy.

  • coverage_penalty (str) – Coverage penalty strategy.

Variables