Before running any cutting-edge Machine Learning algorithms, we need labelled data – lots of it – and preferably in high quality. This data collection process is often very tedious and time consuming. It is usually an iterative process, between data labelling and model training, that needs to be repeated multiple times.
That’s what techniques such as active learning or weak supervision are tackling. However it is not clear when active learning works, on which data, and which techniques perform the best. To fully investigate this, we are building an extensive benchmarking of active learning techniques on a large collection of datasets.