All AI projects start with data — no matter how simple your idea is, you cannot develop machine learning algorithms without examples to train them on. And after the first prototype, when the chase for better metrics begins, you discover that both the amount and the quality of your data matter. That is when a good data labeling pipeline can help you a lot.
In this talk, we give an introduction to building data labeling pipelines and present real-life use cases from areas such as search relevance, content moderation, voice assistants, and self-driving cars. We will explain how to fight concept drift in machine learning, how to build complex products using a human-in-the-loop model, and how to remove people management from the data labeling process.
Lecturer: Magdalena Konkiewicz, a Data Evangelist at Toloka, a global data labeling company serving the needs of approximately 2,000 large and small businesses worldwide.
Toloka helps its customers generate machine learning data at scale by harnessing the wisdom of the crowd from around the world.
Toloka is used by organizations in e-commerce, R&D, banking, autonomous vehicles, web services, and more.
Toloka relies on a geographically diverse crowd of several million registered users — 200,000 of whom are active monthly, on average. The company is incorporated in Switzerland and has its global headquarters in the USA. Prior to joining Toloka, Magdalena worked in many different sectors in technical roles such as NLP Engineer, Developer, and Data Scientist. She has also been involved in teaching and mentoring Data Scientists. Additionally, she contributes to Towards Data Science, one of the biggest Medium publications, writing about Machine Learning tools and best practices.