Cogito: Speech Data Annotation for Machine Learning

Overview
Worked on annotation and quality assurance for speech and language datasets used to train machine learning models, with a focus on improving model performance and annotation consistency.
My Role
- Annotated speech data for emotion, engagement, and speech patterns
- Designed and refined annotation approaches across projects
- Engineered prompts to improve model outputs
- Tested pre-trained language models and suggested calibration improvements
Data & Workflows
- Processed audio and text data for ML training pipelines
- Built and validated annotated datasets for internal and external clients
- Handled changing annotation requests from multiple teams
- Contributed to workflow improvements and QA processes
Impact
- Improved annotation consistency across datasets
- Contributed to higher-quality training data for ML models
- Provided insights that informed model behavior and UX decisions
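
The summary above does not say how annotation consistency was measured. As an illustration only, one common way to quantify agreement between two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal version under that assumption; the function name `cohens_kappa` and the emotion labels are hypothetical and are not taken from the project.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration; not the project's actual metric.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items where the two labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Both annotators used a single identical label for every item.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Made-up emotion labels for six audio clips (illustrative data only).
annotator_1 = ["happy", "neutral", "angry", "neutral", "happy", "neutral"]
annotator_2 = ["happy", "neutral", "neutral", "neutral", "happy", "angry"]

print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

A value near 1 indicates strong agreement and a value near 0 indicates agreement no better than chance, so tracking a score like this across annotation rounds is one way a consistency improvement could be checked.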

Authors
Lee-Ann Vidal Covas
(she/her)
Language Scientist (PhD, Boston University) with expertise in sociolinguistic research, dataset curation, and applied data science.