Custom preprocessors

Similarly to filters, you can import your own preprocessors by defining the module key in the preprocessor configuration entries.
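The lookup that the module key enables can be sketched roughly as follows. This is an illustration, not OpusFilter's actual loader. It assumes that each configuration entry maps exactly one class name to that class's keyword arguments, with an optional module key alongside, as for filters. The helper name build_preprocessor is hypothetical.

```python
import importlib
from typing import Any, Dict


def build_preprocessor(entry: Dict[str, Any]):
    """Hypothetical helper: instantiate the class named in one config entry.

    Assumed entry shape: {ClassName: {param: value, ...}, 'module': 'modname'}.
    """
    entry = dict(entry)  # copy so the caller's configuration is not mutated
    module_name = entry.pop('module', None)
    if len(entry) != 1:
        raise ValueError(f'expected exactly one class name, got {list(entry)}')
    (class_name, params), = entry.items()
    if module_name is None:
        # Built-in preprocessors would be resolved differently; out of scope here
        raise ValueError('this sketch only handles entries with a module key')
    # Import the user's module by name and fetch the class from it
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**(params or {}))
```

For example, build_preprocessor({'Fraction': {'numerator': 1, 'denominator': 2}, 'module': 'fractions'}) returns Fraction(1, 2). The standard-library fractions.Fraction class stands in here for a user-defined preprocessor class in any importable module.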

Custom preprocessors should inherit from the abstract base class PreprocessorABC in the opusfilter package and implement its abstract method process. The process method is a generator: it takes an iterator over segments and yields the preprocessed (modified) text segments. It also takes an additional argument, f_idx, which is the index of the file currently being processed by the preprocess step. This argument makes it possible to preprocess parallel files in the same step even when the preprocessing options (such as the language code for a tokenizer) differ between files.
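A minimal sketch of such a preprocessor follows. The class LanguageTokenPrefixer and its prefixes parameter are hypothetical. They only show how f_idx can select a per-file option. The sketch assumes that process receives the segments of one file as plain strings, as described above. If opusfilter is not installed, the code defines a small stand-in for PreprocessorABC so the example still runs. The real base class may provide more than this stand-in does.

```python
from typing import Iterable, Iterator, List

try:
    # The base class named in the documentation
    from opusfilter import PreprocessorABC
except ImportError:
    import abc

    class PreprocessorABC(metaclass=abc.ABCMeta):
        """Minimal stand-in used only when opusfilter is not installed."""

        def __init__(self, **kwargs):
            pass

        @abc.abstractmethod
        def process(self, segments, f_idx):
            """Yield preprocessed segments for the file with index f_idx."""


class LanguageTokenPrefixer(PreprocessorABC):
    """Hypothetical preprocessor: prepend a per-file token to every segment."""

    def __init__(self, prefixes: List[str], **kwargs):
        # One prefix per parallel file, e.g. ['<2de>', ''] for source and target
        self.prefixes = prefixes
        super().__init__(**kwargs)

    def process(self, segments: Iterable[str], f_idx: int) -> Iterator[str]:
        # f_idx selects the option that applies to the current file
        prefix = self.prefixes[f_idx]
        for segment in segments:
            segment = segment.strip()
            # Add a separating space only when there is a prefix to add
            yield f'{prefix} {segment}' if prefix else segment


if __name__ == '__main__':
    pre = LanguageTokenPrefixer(prefixes=['<2de>', ''])
    print(list(pre.process(['Hello world ', 'Good morning'], 0)))
    # ['<2de> Hello world', '<2de> Good morning']
    print(list(pre.process(['Hallo Welt'], 1)))
    # ['Hallo Welt']
```

Because process is a generator, segments are handled lazily one at a time, so large files need not be loaded into memory at once.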