Translation
Translations
class mammoth.translate.Translation(src, src_raw, pred_sents, attn, pred_scores, tgt_sent, gold_score, word_aligns)
Bases: object

Container for a translated sentence.
Variables:
- src (LongTensor) – Source word IDs.
- src_raw (List[str]) – Raw source words.
- pred_sents (List[List[str]]) – Words from the n-best translations.
- pred_scores (List[List[float]]) – Log-probs of the n-best translations.
- attns (List[FloatTensor]) – Attention distribution for each translation.
- gold_sent (List[str]) – Words from the gold translation.
- gold_score (List[float]) – Log-prob of the gold translation.
- word_aligns (List[FloatTensor]) – Word alignment distribution for each translation.
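A minimal sketch of consuming these fields, assuming translations is a list of Translation objects already produced by a translator:

    # Sketch: `translations` is assumed to be a List[Translation] from a translator.
    for trans in translations:
        best_tokens = trans.pred_sents[0]    # best hypothesis (List[str])
        best_score = trans.pred_scores[0]    # its log-probability
        print(" ".join(trans.src_raw), "->", " ".join(best_tokens), best_score)
        if trans.gold_sent:                  # populated only when gold targets exist
            print("gold:", " ".join(trans.gold_sent), trans.gold_score)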
Translator Class
class mammoth.translate.Translator(model, vocabs, src_file_path, tgt_file_path=None, gpu=-1, n_best=1, min_length=0, max_length=100, ratio=0.0, beam_size=30, random_sampling_topk=0, random_sampling_topp=0.0, random_sampling_temp=1.0, stepwise_penalty=None, dump_beam=False, block_ngram_repeat=0, ignore_when_blocking=frozenset({}), replace_unk=False, ban_unk_token=False, tgt_prefix=False, phrase_table='', data_type='text', verbose=False, report_time=False, copy_attn=False, global_scorer=None, out_file=None, report_align=False, report_score=True, logger=None, seed=-1, task=None)
Bases: mammoth.translate.translator.Inference
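A hedged construction sketch: model, vocabs, and task are assumed to come from a loaded MAMMOTH checkpoint, only a handful of the keyword arguments above are shown, and the penalty option names ("avg", "none") are assumptions.

    from mammoth.translate import Translator, GNMTGlobalScorer

    # Assumed available: model, vocabs, task (from a loaded checkpoint).
    translator = Translator(
        model, vocabs, src_file_path="test.src",
        n_best=3, beam_size=5, max_length=100,
        global_scorer=GNMTGlobalScorer(alpha=0.7, beta=0.0,
                                       length_penalty="avg", coverage_penalty="none"),
        task=task,
    )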
class mammoth.translate.TranslationBuilder(data, vocabs, n_best=1, replace_unk=False, has_tgt=False, phrase_table='')
Bases: object
Builds a word-based translation from the batch output of the translator and the underlying dictionaries.

Replacement is based on "Addressing the Rare Word Problem in Neural Machine Translation" [LSL+15].
Parameters:
- data (mammoth.inputters.ParallelCorpus) – Data.
- vocabs (dict[str, mammoth.inputters.Vocab]) – Data vocabs.
- n_best (int) – Number of translations produced.
- replace_unk (bool) – Replace unknown words using attention.
- has_tgt (bool) – Whether the batch includes gold targets.
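A sketch of the intended usage, assuming batch-level output from the translator; the from_batch method name follows OpenNMT-style builders and is an assumption here:

    builder = TranslationBuilder(data, vocabs, n_best=3, replace_unk=True, has_tgt=False)
    # `batch_output` is assumed to be the result of a translate_batch call.
    translations = builder.from_batch(batch_output)   # -> List[Translation]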
Decoding Strategies
class mammoth.translate.DecodeStrategy(pad, bos, eos, unk, batch_size, parallel_paths, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, ban_unk_token)
Bases: object
Base class for generation strategies.
Parameters:
- pad (int) – Magic integer in output vocab.
- bos (int) – Magic integer in output vocab.
- eos (int) – Magic integer in output vocab.
- unk (int) – Magic integer in output vocab.
- batch_size (int) – Current batch size.
- parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.
- min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.
- max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).
- ban_unk_token (bool) – Whether the unk token is forbidden.
- block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.
- exclusion_tokens (set[int]) – If an n-gram contains any of these tokens, it may repeat.
- return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.
Variables:
- pad (int) – See above.
- bos (int) – See above.
- eos (int) – See above.
- unk (int) – See above.
- predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.
- scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.
- attention (list[list[FloatTensor or list[]]]) – For each batch, holds a list of attention sequence tensors (or empty lists) of shape (step, inp_seq_len), where inp_seq_len is the length of the sample (not the max length of all input sequences).
- alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().
- is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.
- alive_attn (FloatTensor or NoneType) – If a tensor, shape is (step, B x parallel_paths, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.
- target_prefix (LongTensor or NoneType) – If a tensor, shape is (B x parallel_paths, prefix_seq_len), where prefix_seq_len is the (max) length of the prefixed prediction.
- min_length (int) – See above.
- max_length (int) – See above.
- ban_unk_token (bool) – See above.
- block_ngram_repeat (int) – See above.
- exclusion_tokens (set[int]) – See above.
- return_attention (bool) – See above.
- done (bool) – Whether decoding has finished for every sequence in the batch.
advance(log_probs, attn)
DecodeStrategy subclasses should override advance(). Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.
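A skeleton of the override contract, as a sketch only (greedy argmax stands in for a real strategy, and update_finished is left unimplemented):

    import torch
    from mammoth.translate import DecodeStrategy

    class ArgmaxStrategy(DecodeStrategy):
        def advance(self, log_probs, attn):
            # log_probs: (B x parallel_paths, vocab_size)
            topk_ids = log_probs.argmax(dim=-1, keepdim=True)
            self.alive_seq = torch.cat([self.alive_seq, topk_ids], dim=-1)
            self.is_finished = topk_ids.eq(self.eos)

        def update_finished(self):
            # Move finished hypotheses from alive_seq into
            # self.predictions / self.scores here.
            raise NotImplementedError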
block_ngram_repeats(log_probs)
We prevent the beam from going in any direction that would repeat any n-gram of size block_ngram_repeat more than once.

The way we do it: we maintain a list of all n-grams of size block_ngram_repeat that is updated each time the beam advances, and manually set the probability of any token that would lead to a repeated n-gram to 0.

This improves on the previous version's complexity:
- previous version's complexity: batch_size * beam_size * len(self)
- current version's complexity: batch_size * beam_size

This improves on the previous version's accuracy:
- The previous version blocked the whole beam, whereas here only specific tokens are blocked.
- Previously, translation would fail when all beams contained repeated n-grams; that can never happen here.
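The idea, as a standalone sketch (not the actual MAMMOTH code): keep, per path, the set of tokens that would complete an already-generated n-gram, and push their log-probabilities to -inf.

    import torch

    def block_repeats(log_probs, alive_seq, seen_ngrams, n):
        # log_probs: (num_paths, vocab); alive_seq: (num_paths, step)
        # seen_ngrams[path]: dict mapping an (n-1)-token prefix to the set of
        # tokens that would complete an n-gram already generated on that path.
        for path in range(alive_seq.size(0)):
            if alive_seq.size(1) < n - 1:
                continue
            prefix = tuple(alive_seq[path, -(n - 1):].tolist())
            for tok in seen_ngrams[path].get(prefix, ()):
                log_probs[path, tok] = -float("inf")
        return log_probs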
initialize(memory_bank, src_lengths, src_map=None, device=None, target_prefix=None)
DecodeStrategy subclasses should override initialize(). initialize() should be called before all actions; it prepares the necessary ingredients for decoding.
target_prefixing(log_probs)
Fix the first part of predictions with self.target_prefix.

Parameters:
- log_probs (FloatTensor) – Logits of size (B, vocab_size).

Returns:
- log_probs (FloatTensor) – Modified logits of size (B, vocab_size).
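What prefix forcing amounts to, as a sketch under assumed semantics: while the step index is still inside target_prefix, every vocabulary entry except the prescribed token is masked out.

    import torch

    def force_prefix_step(log_probs, prefix_tok):
        # log_probs: (B, vocab_size); prefix_tok: (B,) forced token ids for this step
        forced = torch.full_like(log_probs, -float("inf"))
        forced.scatter_(1, prefix_tok.unsqueeze(1), 0.0)  # log-prob 0 == prob 1
        return forced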
update_finished()
DecodeStrategy subclasses should override update_finished(). update_finished is used to update self.predictions, self.scores, and other "output" attributes.
class mammoth.translate.BeamSearch(beam_size, batch_size, pad, bos, eos, unk, n_best, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, stepwise_penalty, ratio, ban_unk_token)
Bases: mammoth.translate.beam_search.BeamSearchBase

Beam search for seq2seq/encoder-decoder models.
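A hypothetical construction mirroring the signature above; the magic token ids stand in for values taken from the target vocab, and the scorer settings are placeholders.

    from mammoth.translate import BeamSearch, GNMTGlobalScorer

    strategy = BeamSearch(
        beam_size=5, batch_size=32,
        pad=pad_id, bos=bos_id, eos=eos_id, unk=unk_id,   # assumed vocab ids
        n_best=1, global_scorer=GNMTGlobalScorer(0.0, 0.0, "none", "none"),
        min_length=0, max_length=100, return_attention=False,
        block_ngram_repeat=0, exclusion_tokens=set(),
        stepwise_penalty=False, ratio=0.0, ban_unk_token=False,
    )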
-
mammoth.translate.greedy_search.
sample_with_temperature
(logits, sampling_temp, keep_topk, keep_topp)[source]¶ Select next tokens randomly from the top k possible next tokens.
Samples from a categorical distribution over the
keep_topk
words using the category probabilitieslogits / sampling_temp
.- Parameters
- logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)
- sampling_temp (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.
- keep_topk (int) – This many words could potentially be chosen. The other logits are set to have probability 0.
- keep_topp (float) – Keep the most likely words until the cumulative probability is greater than p. If used together with keep_topk, both conditions are applied.
Returns:
- topk_ids – Shaped (batch_size, 1). These are the sampled word indices in the output vocab.
- topk_scores – Shaped (batch_size, 1). These are essentially (logits / sampling_temp)[topk_ids].

Return type: (LongTensor, FloatTensor)
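A self-contained sketch of this sampling scheme (temperature scaling followed by top-k filtering; the top-p branch is omitted for brevity):

    import torch

    def sample_with_temp(logits, sampling_temp=1.0, keep_topk=10):
        logits = logits / sampling_temp
        if keep_topk > 0:
            # Keep only the k largest logits; everything else gets probability 0.
            kth_best = torch.topk(logits, keep_topk, dim=-1).values[:, -1:]
            logits = logits.masked_fill(logits < kth_best, -float("inf"))
        dist = torch.distributions.Categorical(logits=logits)
        topk_ids = dist.sample().unsqueeze(-1)              # (batch_size, 1)
        topk_scores = logits.gather(dim=1, index=topk_ids)  # (batch_size, 1)
        return topk_ids, topk_scores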
class mammoth.translate.GreedySearch(pad, bos, eos, unk, batch_size, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, sampling_temp, keep_topk, keep_topp, beam_size, ban_unk_token)
Bases: mammoth.translate.decode_strategy.DecodeStrategy

Select next tokens randomly from the top k possible next tokens.

The scores attribute's lists hold the score, after applying temperature, of the final prediction (either EOS, or the final token if max_length is reached).

Parameters:
- pad (int) – See base.
- bos (int) – See base.
- eos (int) – See base.
- unk (int) – See base.
- batch_size (int) – See base.
- global_scorer (mammoth.translate.GNMTGlobalScorer) – Scorer instance.
- min_length (int) – See base.
- max_length (int) – See base.
- ban_unk_token (bool) – See base.
- block_ngram_repeat (int) – See base.
- exclusion_tokens (set[int]) – See base.
- return_attention (bool) – See base.
- sampling_temp (float) – See sample_with_temperature().
- keep_topk (int) – See sample_with_temperature().
- keep_topp (float) – See sample_with_temperature().
- beam_size (int) – Number of beams to use.
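A hypothetical construction for pure sampling (beam_size=1); as in the BeamSearch sketch above, token ids and the scorer are assumed values.

    from mammoth.translate import GreedySearch

    strategy = GreedySearch(
        pad=pad_id, bos=bos_id, eos=eos_id, unk=unk_id,
        batch_size=32, global_scorer=scorer,
        min_length=0, block_ngram_repeat=0, exclusion_tokens=set(),
        return_attention=False, max_length=100,
        sampling_temp=0.8, keep_topk=10, keep_topp=0.95,
        beam_size=1, ban_unk_token=False,
    )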
advance(log_probs, attn)
Select next tokens randomly from the top k possible next tokens.

Parameters:
- log_probs (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)
- attn (FloatTensor) – Shaped (1, B, inp_seq_len).
Scoring
class mammoth.translate.penalties.PenaltyBuilder(cov_pen, length_pen)
Bases: object
Returns the length and coverage penalty functions for beam search.

Parameters:
- length_pen (str) – Option name of the length penalty.
- cov_pen (str) – Option name of the coverage penalty.
Variables:
- has_cov_pen (bool) – Whether the coverage penalty is None (in which case applying it is a no-op). Note that the converse isn't true: setting beta to 0 should also force the coverage penalty to be a no-op.
- has_len_pen (bool) – Whether the length penalty is None (in which case applying it is a no-op). Note that the converse isn't true: setting alpha to 1 should also force the length penalty to be a no-op.
- coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.
- length_penalty (callable[[int, float], float]) – Calculates the length penalty.
coverage_wu(cov, beta=0.0)
GNMT coverage re-ranking score. See "Google's Neural Machine Translation System" [WSC+16].

cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.
class mammoth.translate.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)
Bases: object

NMT re-ranking.
Parameters:
- alpha (float) – Length parameter.
- beta (float) – Coverage parameter.
- length_penalty (str) – Length penalty strategy.
- coverage_penalty (str) – Coverage penalty strategy.
Variables:
- alpha (float) – See above.
- beta (float) – See above.
- length_penalty (callable) – See penalties.PenaltyBuilder.
- coverage_penalty (callable) – See penalties.PenaltyBuilder.
- has_cov_pen (bool) – See penalties.PenaltyBuilder.
- has_len_pen (bool) – See penalties.PenaltyBuilder.
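A sketch of how these pieces are typically combined into a re-ranking score, following Wu et al.; the exact call sites live inside the beam search, so this is illustrative only, and the callable signatures follow the PenaltyBuilder documentation above.

    def rescore(log_prob_sum, hyp_len, cov, scorer):
        # e.g. the wu length penalty is ((5 + hyp_len) / 6) ** alpha
        lp = scorer.length_penalty(hyp_len, scorer.alpha)
        cp = scorer.coverage_penalty(cov, scorer.beta)
        return log_prob_sum / lp - cp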