Raw data can contain errors and missing values, and can be high-dimensional
Pre-processing is often the most time-consuming step, and it requires a lot of experimentation and fine-tuning
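
As a rough illustration of the three problems named above (errors, missing values, high dimensionality), here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for the example: the function name `preprocess`, the `valid_range` and `min_variance` parameters, the choice of median imputation, and the idea of dropping constant columns as a stand-in for dimensionality reduction. Real pipelines usually try several alternatives for each step, which is where the experimentation comes in.

```python
from statistics import median, pvariance

def preprocess(rows, valid_range=(0.0, 1000.0), min_variance=1e-9):
    """Hypothetical helper. rows: list of equal-length lists; None = missing."""
    lo, hi = valid_range
    n_cols = len(rows[0])

    # 1. Errors: treat out-of-range entries as invalid and mark them missing.
    cleaned = [
        [v if v is not None and lo <= v <= hi else None for v in row]
        for row in rows
    ]

    # 2. Missing values: fill with the median of the observed column values.
    for j in range(n_cols):
        observed = [row[j] for row in cleaned if row[j] is not None]
        fill = median(observed) if observed else 0.0
        for row in cleaned:
            if row[j] is None:
                row[j] = fill

    # 3. Dimensionality: drop columns that are (near-)constant after cleaning.
    keep = [
        j for j in range(n_cols)
        if pvariance([row[j] for row in cleaned]) > min_variance
    ]
    return [[row[j] for j in keep] for row in cleaned], keep

# Made-up toy data: one missing value, one sentinel error, one constant column.
raw = [
    [1.0, 5.0, 7.0],
    [2.0, None, 7.0],
    [-999.0, 6.0, 7.0],
    [4.0, 7.0, 7.0],
]
data, kept = preprocess(raw)
print(kept)  # indices of the columns that survived
print(data)
```

Each of the three choices here (the valid range, the imputation rule, the variance threshold) is a tunable decision, and changing any one of them can change the downstream result.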