Preprocessing
Table of contents
- Tokenization Exercise
- Characteristics of Tokenization Styles
- Pipeline
- Subword Segmentations
- Subword Segmentation Exercise
- Detokenizations
- Detokenization Exercise
- Parallel Corpus Aligning
- Parallel Corpus Aligning Exercise
- Tips on Preprocessing
- Mini-batchify
- TorchText
- Wrap-up
- Crawling
- Cleaning
- Regular Expressions
- Cleaning with RegEx Exercise
- Labeling
- Tokenizations