Data Loaders
Dataset
Data loading
class mammoth.inputters.dataloader.DynamicDatasetIter(task_queue_manager, opts, corpora_info, transforms_cls, vocabs_dict, is_train, batch_type, batch_size, batch_size_multiple, data_type='text', pool_size=2048, n_buckets=1024, skip_empty_level='warning')
Bases: object
Yields batches from one or more plain-text corpora.
- Parameters
corpora (dict[str, ParallelCorpus]) – the collection of corpora to iterate over;
corpora_info (dict[str, dict]) – per-corpus information corresponding to corpora;
transforms (dict[str, Transform]) – transforms that the corpora may use;
fields (dict[str, Field]) – fields used to convert corpora into tensors;
is_train (bool) – True when generating data for training;
batch_type (str) – the unit in which batch size is counted, choices=[tokens, sents];
batch_size (int) – number of examples in a batch;
batch_size_multiple (int) – make the batch size a multiple of this value;
data_type (str) – input data type, currently only text;
pool_size (int) – number of examples to accumulate in a dynamic dataset;
skip_empty_level (str) – how strictly to react when an empty line is encountered;
stride (int) – iterate over data files with this stride;
offset (int) – iterate over data files with this offset.
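The stride and offset parameters can be pictured with a minimal sketch. It assumes the common reading that a reader keeps every stride-th line, starting at line offset, so that several readers can share one file without overlap. The helper shard_lines is hypothetical and is not part of mammoth.

```python
from itertools import islice


def shard_lines(lines, stride=1, offset=0):
    """Keep every `stride`-th item of `lines`, starting at index `offset`.

    Hypothetical helper for illustration; not mammoth's implementation.
    """
    if stride < 1 or not 0 <= offset < stride:
        raise ValueError("need stride >= 1 and 0 <= offset < stride")
    return islice(lines, offset, None, stride)


# Three readers, each with its own offset, split a ten-line corpus.
corpus = [f"line {i}" for i in range(10)]
shards = [list(shard_lines(corpus, stride=3, offset=k)) for k in range(3)]
```

Under this assumption the shards are disjoint and together cover every line of the corpus.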
- Variables
dataset_adapter (DatasetAdapter) – adapter that converts raw corpus examples into tensors;
mixer (MixingStrategy) – the strategy used to iterate over the corpora.
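To make the batch_type, batch_size and batch_size_multiple parameters more concrete, here is a small self-contained sketch of one way such batching could work. It is an illustration under stated assumptions, not the class's actual code: batch_examples is a hypothetical helper, and in tokens mode it assumes the cost of a batch is its padded size (longest example times number of examples).

```python
def batch_examples(examples, batch_size, batch_type="sents", batch_size_multiple=1):
    """Group tokenized examples into batches.

    Hypothetical helper for illustration; not mammoth's implementation.
    """
    if batch_type not in ("sents", "tokens"):
        raise ValueError("batch_type must be 'sents' or 'tokens'")

    def cost(batch):
        if batch_type == "sents":
            return len(batch)
        # Assumed token cost: padded size of the batch.
        return max(len(ex) for ex in batch) * len(batch)

    batch = []
    for ex in examples:
        if batch and cost(batch + [ex]) > batch_size:
            # Trim to a multiple of batch_size_multiple where possible;
            # the trimmed examples are carried over to the next batch.
            keep = len(batch) - len(batch) % batch_size_multiple
            if keep == 0:
                keep = len(batch)
            yield batch[:keep]
            batch = batch[keep:]
        batch.append(ex)
    if batch:
        # The final batch may be smaller than the others.
        yield batch


# Five sentences of three tokens each.
sentences = [["tok"] * 3 for _ in range(5)]
by_sents = [len(b) for b in batch_examples(sentences, 2, "sents")]
by_tokens = [len(b) for b in batch_examples(sentences, 9, "tokens")]
```

With the same data, counting in sentences caps each batch at two examples, while counting in tokens lets three short examples share a nine-token budget.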