OpusFilter 3.2
Get started
Installation
Basic usage
Automatic configuration generation
Command line tools for analysis
Available functions
Downloading and selecting data
Preprocessing text
Filtering and scoring
Using score files
Training language and alignment models
Training and using classifiers
Available filters
Length filters
Script and language identification filters
Special character and similarity filters
Language model filters
Alignment model filters
Sentence embedding filters
Custom filters
Available preprocessors
Tokenizer
Detokenizer
WhitespaceNormalizer
RegExpSub
MonolingualSentenceSplitter
BPESegmentation
MorfessorSegmentation
Custom preprocessors
Other information
Citing and references
Contributing
Changelog
Index