Cogito: Speech Data Annotation for Machine Learning

Overview

Worked on annotation and quality assurance for speech and language datasets used to train machine learning models, with a focus on improving model performance and annotation consistency.

My Role

Annotated speech data for emotion, engagement, and speech patterns
Designed and refined annotation approaches across projects
Conducted prompt engineering to improve model outputs
Tested pre-trained language models and suggested calibration improvements

Data & Workflows

Processed audio and text data for ML training pipelines
Built and validated annotated datasets for internal and external clients
Handled changing annotation requests across teams
Contributed to workflow improvements and QA processes

Impact

Improved annotation consistency across datasets
Contributed to higher-quality training data for ML models
Provided insights that informed model behavior and UX decisions

Machine Learning, NLP, Data Annotation

Authors

Lee-Ann Vidal Covas (she/her)

Language Scientist (PhD, Boston University) with expertise in sociolinguistic research, dataset curation, and applied data science.
