Translation

Translations

class mammoth.translate.Translation(src, src_raw, pred_sents, attn, pred_scores, tgt_sent, gold_score, word_aligns)[source]

Bases: object

Container for a translated sentence.

Variables
  • src (LongTensor) – Source word IDs.

  • src_raw (List[str]) – Raw source words.

  • pred_sents (List[List[str]]) – Words from the n-best translations.

  • pred_scores (List[List[float]]) – Log-probs of n-best translations.

  • attns (List[FloatTensor]) – Attention distribution for each translation.

  • gold_sent (List[str]) – Words from gold translation.

  • gold_score (List[float]) – Log-prob of gold translation.

  • word_aligns (List[FloatTensor]) – Word alignment distribution for each translation.

log(sent_number)[source]

Log translation.
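
A minimal usage sketch, assuming trans is a Translation instance produced elsewhere (for example by a TranslationBuilder); the attribute names follow the variable list above, and the helper name show_nbest is hypothetical:

    def show_nbest(trans, n_best=1):
        # Print the source, the n-best hypotheses and, if present, the gold target.
        print("SRC :", " ".join(trans.src_raw))
        for rank in range(n_best):
            hyp = " ".join(trans.pred_sents[rank])   # tokens of the rank-th hypothesis
            score = trans.pred_scores[rank]          # its log-probability
            print(f"HYP {rank}: {hyp} (score: {score})")
        if trans.gold_sent is not None:              # only set when gold targets were supplied
            print("GOLD:", " ".join(trans.gold_sent), "score:", trans.gold_score)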

Translator Class

class mammoth.translate.Translator(model, vocabs, src_file_path, tgt_file_path=None, gpu=-1, n_best=1, min_length=0, max_length=100, ratio=0.0, beam_size=30, random_sampling_topk=0, random_sampling_topp=0.0, random_sampling_temp=1.0, stepwise_penalty=None, dump_beam=False, block_ngram_repeat=0, ignore_when_blocking=frozenset({}), replace_unk=False, ban_unk_token=False, tgt_prefix=False, phrase_table='', data_type='text', verbose=False, report_time=False, copy_attn=False, global_scorer=None, out_file=None, report_align=False, report_score=True, logger=None, seed=-1, task=None)[source]

Bases: mammoth.translate.translator.Inference

translate_batch(batch, src_vocabs, attn_debug)[source]

Translate a batch of sentences.

class mammoth.translate.TranslationBuilder(data, vocabs, n_best=1, replace_unk=False, has_tgt=False, phrase_table='')[source]

Bases: object

Build a word-based translation from the batch output of translator and the underlying dictionaries.

Replacement based on “Addressing the Rare Word Problem in Neural Machine Translation” [LSL+15]

Parameters
  • data (mammoth.inputters.ParallelCorpus) – Data.

  • vocabs (dict[str, mammoth.inputters.Vocab]) – Data vocabularies.

  • n_best (int) – Number of translations produced.

  • replace_unk (bool) – Replace unknown words using attention.

  • has_tgt (bool) – Whether the batch will contain gold targets.
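
A hedged end-to-end sketch of wiring raw translator output back into word-level Translation objects. The from_batch call follows the OpenNMT-style builder API that mammoth derives from; the surrounding names (corpus, vocabs, translation_batch) are assumed to exist in the calling code:

    import mammoth.translate

    builder = mammoth.translate.TranslationBuilder(
        data=corpus,        # a mammoth.inputters.ParallelCorpus
        vocabs=vocabs,      # dict[str, mammoth.inputters.Vocab]
        n_best=2,
        replace_unk=True,   # substitute <unk> tokens using the attention argmax
        has_tgt=False,
    )
    translations = builder.from_batch(translation_batch)  # raw output of Translator.translate_batch
    for i, trans in enumerate(translations):
        print(trans.log(sent_number=i))                    # human-readable report per sentence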

Decoding Strategies

class mammoth.translate.DecodeStrategy(pad, bos, eos, unk, batch_size, parallel_paths, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, ban_unk_token)[source]

Bases: object

Base class for generation strategies.

Parameters
  • pad (int) – Magic integer in output vocab.

  • bos (int) – Magic integer in output vocab.

  • eos (int) – Magic integer in output vocab.

  • unk (int) – Magic integer in output vocab.

  • batch_size (int) – Current batch size.

  • parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.

  • min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.

  • max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).

  • ban_unk_token (Boolean) – Whether the unk token is forbidden.

  • block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.

  • exclusion_tokens (set[int]) – If a gram contains any of these tokens, it may repeat.

  • return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.

Variables
  • pad (int) – See above.

  • bos (int) – See above.

  • eos (int) – See above.

  • unk (int) – See above.

  • predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.

  • scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.

  • attention (list[list[FloatTensor or list[]]]) – For each batch, holds a list of attention sequence tensors (or empty lists) having shape (step, inp_seq_len) where inp_seq_len is the length of the sample (not the max length of all inp seqs).

  • alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().

  • is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.

  • alive_attn (FloatTensor or NoneType) – If tensor, shape is (step, B x parallel_paths, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.

  • target_prefix (LongTensor or NoneType) – If tensor, shape is (B x parallel_paths, prefix_seq_len), where prefix_seq_len is the (max) length of the pre-fixed prediction.

  • min_length (int) – See above.

  • max_length (int) – See above.

  • ban_unk_token (Boolean) – See above.

  • block_ngram_repeat (int) – See above.

  • exclusion_tokens (set[int]) – See above.

  • return_attention (bool) – See above.

  • done (bool) – True once decoding has finished for every sequence in the batch.

advance(log_probs, attn)[source]

DecodeStrategy subclasses should override advance().

Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.
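
An illustrative decoding loop (not mammoth's actual driver code) showing how initialize(), advance() and update_finished() are intended to interact. decode_strategy is any DecodeStrategy subclass instance and one_decoder_step is a hypothetical stand-in for one forward step of the model; the real loop additionally reorders decoder state for the surviving paths:

    def run_decoding(decode_strategy, memory_bank, src_lengths, one_decoder_step):
        decode_strategy.initialize(memory_bank, src_lengths)
        for step in range(decode_strategy.max_length):
            decoder_input = decode_strategy.alive_seq[:, -1:]   # last emitted tokens, (B*paths, 1)
            log_probs, attn = one_decoder_step(decoder_input, memory_bank, step)
            decode_strategy.advance(log_probs, attn)            # grow alive_seq, flag finished paths
            if decode_strategy.is_finished.any():
                decode_strategy.update_finished()               # move finished paths to predictions/scores
                if decode_strategy.done:
                    break
        return decode_strategy.predictions, decode_strategy.scores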

block_ngram_repeats(log_probs)[source]

We prevent the beam from going in any direction that would repeat any ngram of size <block_ngram_repeat> more than once.

The way we do it: we maintain a list of all ngrams of size <block_ngram_repeat> that is updated each time the beam advances, and manually set any token that would lead to a repeated ngram to 0 probability.

This improves on the previous version’s complexity:
  • previous version’s complexity: batch_size * beam_size * len(self)

  • current version’s complexity: batch_size * beam_size

This improves on the previous version’s accuracy:
  • The previous version blocks the whole beam, whereas here we only block specific tokens.

  • Previously, translation would fail when all beams contained repeated ngrams. That can no longer happen here.
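
A standalone illustration of the blocking idea (not the library's internal implementation): for every alive path we collect the ngrams seen so far and push the log-probability of any token that would complete one of them again to minus infinity, so only the offending continuations are blocked rather than the whole beam. Assumes n >= 2:

    def block_repeats(log_probs, alive_seq, n):
        # log_probs: (num_paths, vocab_size); alive_seq: (num_paths, step), both torch tensors
        for path, seq in enumerate(alive_seq.tolist()):
            if len(seq) < n:
                continue
            seen = {}
            for i in range(len(seq) - n + 1):
                prefix, last = tuple(seq[i:i + n - 1]), seq[i + n - 1]
                seen.setdefault(prefix, set()).add(last)
            current_prefix = tuple(seq[-(n - 1):])
            for tok in seen.get(current_prefix, ()):
                log_probs[path, tok] = -float("inf")   # forbid only the repeating continuations
        return log_probs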

initialize(memory_bank, src_lengths, src_map=None, device=None, target_prefix=None)[source]

DecodeStrategy subclasses should override initialize().

initialize should be called before all actions. It is used to prepare the necessary ingredients for decoding.

maybe_update_forbidden_tokens()[source]

We complete and reorder the list of forbidden_tokens.

maybe_update_target_prefix(select_index)[source]

We update / reorder target_prefix for the alive paths.

target_prefixing(log_probs)[source]

Fix the first part of predictions with self.target_prefix.

Parameters

log_probs (FloatTensor) – logits of size (B, vocab_size).

Returns

modified logits in (B, vocab_size).

Return type

log_probs (FloatTensor)
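
An illustrative re-implementation of the prefix-forcing step (the library's own code may differ in details such as how scores are preserved): while decoding is still inside a known target prefix, every token except the prescribed one is masked out, so the prediction is forced to follow the prefix. The function name force_prefix and the pad_idx handling are assumptions for the sketch:

    def force_prefix(log_probs, target_prefix, step, pad_idx):
        # log_probs: (B, vocab_size); target_prefix: (B, prefix_seq_len), padded with pad_idx
        if step >= target_prefix.size(1):
            return log_probs                      # decoding is already past the prefix
        forced = target_prefix[:, step]           # token each row must emit at this step
        rows = forced.ne(pad_idx)                 # rows whose prefix is still active
        out = log_probs.clone()
        out[rows] = -float("inf")                 # forbid everything ...
        out[rows, forced[rows]] = 0.0             # ... except the prescribed token
        return out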

update_finished()[source]

DecodeStrategy subclasses should override update_finished().

update_finished is used to update self.predictions, self.scores, and other “output” attributes.

class mammoth.translate.BeamSearch(beam_size, batch_size, pad, bos, eos, unk, n_best, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, stepwise_penalty, ratio, ban_unk_token)[source]

Bases: mammoth.translate.beam_search.BeamSearchBase

Beam search for seq2seq/encoder-decoder models

initialize(memory_bank, src_lengths, src_map=None, device=None, target_prefix=None)[source]

Initialize for decoding. Repeat src objects beam_size times.
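
What "repeat src objects beam_size times" amounts to, sketched with plain torch calls; the real initialize() also handles src_map, device placement, and tuple-valued memory banks:

    import torch

    src_len, batch_size, hidden, beam_size = 7, 2, 16, 5
    memory_bank = torch.randn(src_len, batch_size, hidden)       # encoder states, (src_len, B, H)
    src_lengths = torch.randint(1, src_len + 1, (batch_size,))   # (B,)

    # each batch entry is repeated beam_size times along the batch dimension,
    # giving one copy of the encoder states per beam hypothesis
    tiled_memory_bank = memory_bank.repeat_interleave(beam_size, dim=1)  # (src_len, B*beam, H)
    tiled_lengths = src_lengths.repeat_interleave(beam_size, dim=0)      # (B*beam,)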

mammoth.translate.greedy_search.sample_with_temperature(logits, sampling_temp, keep_topk, keep_topp)[source]

Select next tokens randomly from the top k possible next tokens.

Samples from a categorical distribution over the keep_topk words using the category probabilities logits / sampling_temp.

Parameters
  • logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)

  • sampling_temp (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.

  • keep_topk (int) – This many words could potentially be chosen. The other logits are set to have probability 0.

  • keep_topp (float) – Keep the most likely words until their cumulative probability exceeds p. If used together with keep_topk, both conditions are applied.

Returns

  • topk_ids: Shaped (batch_size, 1). These are the sampled word indices in the output vocab.

  • topk_scores: Shaped (batch_size, 1). These are essentially (logits / sampling_temp)[topk_ids].

Return type

(LongTensor, FloatTensor)
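
A minimal sketch of the temperature and top-k part of this sampling scheme (the real function additionally supports keep_topp nucleus filtering and the shapes/returns described above):

    import torch

    def sample_sketch(logits, sampling_temp=1.0, keep_topk=-1):
        # logits: (batch_size, vocab_size); sampling_temp and keep_topk as documented above
        logits = logits / sampling_temp                      # <1 sharpens, >1 flattens the distribution
        if keep_topk > 0:
            kth_best = torch.topk(logits, keep_topk, dim=-1).values[:, -1:]
            logits = logits.masked_fill(logits < kth_best, -float("inf"))  # zero out non-top-k words
        dist = torch.distributions.Categorical(logits=logits)
        topk_ids = dist.sample().unsqueeze(1)                # (batch_size, 1) sampled word indices
        topk_scores = logits.gather(1, topk_ids)             # (logits / sampling_temp)[topk_ids]
        return topk_ids, topk_scores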

class mammoth.translate.GreedySearch(pad, bos, eos, unk, batch_size, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, sampling_temp, keep_topk, keep_topp, beam_size, ban_unk_token)[source]

Bases: mammoth.translate.decode_strategy.DecodeStrategy

Select next tokens randomly from the top k possible next tokens.

The scores attribute’s lists contain the score, after applying temperature, of the final prediction (either EOS or the final token if max_length is reached).

Parameters
  • pad (int) – See base.

  • bos (int) – See base.

  • eos (int) – See base.

  • unk (int) – See base.

  • batch_size (int) – See base.

  • global_scorer (mammoth.translate.GNMTGlobalScorer) – Scorer instance.

  • min_length (int) – See base.

  • max_length (int) – See base.

  • ban_unk_token (Boolean) – See base.

  • block_ngram_repeat (int) – See base.

  • exclusion_tokens (set[int]) – See base.

  • return_attention (bool) – See base.

  • sampling_temp (float) – See sample_with_temperature().

  • keep_topk (int) – See sample_with_temperature().

  • keep_topp (float) – See sample_with_temperature().

  • beam_size (int) – Number of beams to use.

advance(log_probs, attn)[source]

Select next tokens randomly from the top k possible next tokens.

Parameters
  • log_probs (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)

  • attn (FloatTensor) – Shaped (1, B, inp_seq_len).

initialize(memory_bank, src_lengths, src_map=None, device=None, target_prefix=None)[source]

Initialize for decoding.

update_finished()[source]

Finalize scores and predictions.

Scoring

class mammoth.translate.penalties.PenaltyBuilder(cov_pen, length_pen)[source]

Bases: object

Returns the Length and Coverage Penalty function for Beam Search.

Parameters
  • length_pen (str) – option name of length pen

  • cov_pen (str) – option name of cov pen

Variables
  • has_cov_pen (bool) – Whether a coverage penalty is configured; if it is not, applying it is a no-op. Note that the converse isn’t true: setting beta to 0 also forces the coverage penalty to be a no-op.

  • has_len_pen (bool) – Whether a length penalty is configured; if it is not, applying it is a no-op. Note that the converse isn’t true: setting alpha to 0 also forces the length penalty to be a no-op.

  • coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.

  • length_penalty (callable[[int, float], float]) – Calculates the length penalty.

coverage_none(cov, beta=0.0)[source]

Returns zero as the penalty.

coverage_summary(cov, beta=0.0)[source]

Our summary penalty.

coverage_wu(cov, beta=0.0)[source]

GNMT coverage re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16]. cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.

length_average(cur_len, alpha=0.0)[source]

Returns the current sequence length.

length_none(cur_len, alpha=0.0)[source]

Returns unmodified scores.

length_wu(cur_len, alpha=0.0)[source]

GNMT length re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16].
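
The two “wu” penalties follow the formulas from the GNMT paper; the sketch below spells out the arithmetic (the exact sign convention used when combining them with beam scores may differ in the library):

    import torch

    def length_wu_sketch(cur_len, alpha=0.0):
        # lp(Y) = ((5 + |Y|) / 6) ** alpha; hypothesis scores are normalized by this factor
        return ((5 + cur_len) / 6.0) ** alpha

    def coverage_wu_sketch(cov, beta=0.0):
        # cov: accumulated attention over source positions, shape (..., seq_len)
        # cp(X, Y) = beta * sum_i log(min(cov_i, 1.0)); penalizes source words left unattended
        return beta * torch.min(cov, torch.ones_like(cov)).log().sum(-1)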

class mammoth.translate.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)[source]

Bases: object

NMT re-ranking.

Parameters
  • alpha (float) – Length parameter.

  • beta (float) – Coverage parameter.

  • length_penalty (str) – Length penalty strategy.

  • coverage_penalty (str) – Coverage penalty strategy.

Variables