Idil Ismiguzel
Nov 29, 2022

Hi Etienne, thanks! It is hard to pick one technique over another, since the choice depends on the data distribution, the degree of contamination by local and/or global outliers, and the overall data size. Since this is a toy dataset with only around 200 instances, I would probably start by removing only the extreme outliers with IQR and compare the classifier's performance before and after removal. Then, if I wanted to improve performance further, I would test other techniques such as LOF or Isolation Forest (on top of IQR) with different parameters and check whether they increase model performance. I would iterate this way and decide which technique(s) to use based on the improvement in model performance.
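To make the first step concrete, here is a minimal sketch of IQR-based removal of extreme outliers, using only the Python standard library. The function names `iqr_bounds` and `keep_mask`, the multiplier `k=3.0` (a common convention for "extreme" outliers, as opposed to 1.5 for mild ones), and the tiny `rows`/`labels` data are all assumptions for illustration, not taken from the article.

```python
from statistics import quantiles


def iqr_bounds(values, k=3.0):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def keep_mask(rows, k=3.0):
    """Return one boolean per row: True if every feature is inside its column's fences."""
    columns = list(zip(*rows))  # transpose rows into feature columns
    bounds = [iqr_bounds(col, k) for col in columns]
    return [
        all(lo <= value <= hi for value, (lo, hi) in zip(row, bounds))
        for row in rows
    ]


# Hypothetical toy data: the last row has an extreme value in the first feature.
rows = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0], [5.0, 14.0], [100.0, 12.0]]
labels = [0, 0, 1, 1, 0, 1]

mask = keep_mask(rows)
clean_rows = [row for row, keep in zip(rows, mask) if keep]
clean_labels = [label for label, keep in zip(labels, mask) if keep]
```

From here, the same classifier would be trained once on `rows`/`labels` and once on `clean_rows`/`clean_labels`, and both versions scored on the same held-out data. Only if that comparison leaves room for improvement would it make sense to apply something like scikit-learn's `LocalOutlierFactor` or `IsolationForest` to the cleaned data and repeat the comparison.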


Written by Idil Ismiguzel

Data Scientist | Writing articles on Data Science & Machine Learning | MSc, MBA | https://de.linkedin.com/in/idilismiguzel
