Cogito: Speech Data Annotation for Machine Learning

Overview

Worked on annotation and quality assurance for speech and language datasets used to train machine learning models, with a focus on annotation consistency and model performance.

My Role

  • Annotated speech data for emotion, engagement, and speech patterns
  • Designed and refined annotation approaches across projects
  • Conducted prompt engineering to improve model outputs
  • Tested pre-trained language models and suggested calibration improvements

Data & Workflows

  • Processed audio and text data for ML training pipelines
  • Built and validated annotated datasets for internal and external clients
  • Handled changing annotation requests from multiple teams
  • Contributed to workflow improvements and QA processes

Impact

  • Improved annotation consistency across datasets
  • Contributed to higher-quality training data for ML models
  • Provided insights that informed model behavior and UX decisions

Author

Lee-Ann Vidal Covas
Language Scientist (PhD, Boston University) with expertise in sociolinguistic research, dataset curation, and applied data science.