OpenTrain AI

Audio Segmentation & Transcription | Expert English Writers

OpenTrain AI · Remote · Worldwide · Posted Mar 2, 2026

Hourly · $8/hr

About OpenTrain

OpenTrain aggregates data-labeling and AI-training jobs from many companies and platforms so you can discover this work in one place instead of searching dozens of sites. Creating an OpenTrain account is free and applying takes only a few minutes.

About AI training (what this work is)

AI training (also called data labeling or annotation) is the human side of building AI: people prepare and review examples that help models learn and behave better. Tasks include transcribing audio, tagging emotion, and selecting meaningful segments — work that is frequently remote, flexible, and accessible to people with strong attention to detail and language skills.

The role

You will listen to short (1–2 minute) real-world audio recordings and identify 10–12 second segments with strong emotional expression. For each segment you will rate emotional valence and arousal on the provided Likert scales and write two transcriptions: a detailed verbatim transcription and a concise summary. This is an entry-level, part-time contractor role paid at $8.00 USD per hour, and it typically requires fewer than 20 hours per week.

  • Employment: Contractor, Part-time
  • Pay: $8.00 USD per hour
  • Hours: Fewer than 20 hours per week
  • Experience level: Entry level
  • Data type: Audio (real-world recordings)
  • Label types: Segmentation and text generation (detailed + brief transcriptions)
  • Labeling software: Other (not named in this posting)

What you'll do day to day

This project asks you to use careful judgment to locate emotionally expressive clips and to capture both words and emotional delivery in writing. Instructions are intentionally broad to allow flexibility, so you'll need to make thoughtful annotation choices and justify them in your transcriptions.

  • Listen to 1–2 minute audio clips and identify 10–12 second segments with strong emotional expression.
  • Avoid neutral or low-arousal content; prioritize clear emotional cues such as tone, intensity, pitch changes, and pauses.
  • Rate emotional valence (positive/negative) and arousal (intensity) using provided Likert scales.
  • Produce two transcriptions per clip: a detailed verbatim transcription with emotional/contextual notes, and a short summary.
  • Describe speaker characteristics (e.g., perceived age or affect) when applicable, based only on what the voice conveys.
  • Work independently and make judgment calls under broad experimental instructions.
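
For applicants who find a concrete picture helpful, the deliverable described above can be sketched as a small record with a few sanity checks. This is only an illustration, not the project's actual format or tooling: the field names, the `validate` helper, and the 1–5 Likert range are all assumptions, since the posting does not specify the scale or the labeling software.

```python
from dataclasses import dataclass

# Assumption: the posting does not state the Likert range; 1-5 is illustrative only.
LIKERT_MIN, LIKERT_MAX = 1, 5


@dataclass
class SegmentAnnotation:
    """Hypothetical record for one emotionally expressive segment of a clip."""
    start_s: float               # segment start, in seconds from the start of the clip
    end_s: float                 # segment end, in seconds from the start of the clip
    valence: int                 # positive/negative rating on the Likert scale
    arousal: int                 # intensity rating on the Likert scale
    detailed_transcription: str  # verbatim words plus emotional/contextual notes
    summary: str                 # short summary of the segment


def validate(a: SegmentAnnotation, clip_length_s: float) -> list[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems = []
    if not (0 <= a.start_s < a.end_s <= clip_length_s):
        problems.append("segment falls outside the clip")
    if not (10 <= a.end_s - a.start_s <= 12):
        problems.append("segment should be 10-12 seconds long")
    for name, score in (("valence", a.valence), ("arousal", a.arousal)):
        if not (LIKERT_MIN <= score <= LIKERT_MAX):
            problems.append(f"{name} is outside the Likert range")
    if not a.detailed_transcription.strip():
        problems.append("detailed transcription is empty")
    if not a.summary.strip():
        problems.append("summary is empty")
    return problems


# Placeholder text only; not taken from any real recording.
example = SegmentAnnotation(
    start_s=31.0,
    end_s=42.5,
    valence=2,
    arousal=4,
    detailed_transcription="[placeholder: verbatim words with notes on tone and pauses]",
    summary="[placeholder: one-sentence summary]",
)
print(validate(example, clip_length_s=90.0))
```

A record that passes these checks has a segment of the requested length, both ratings, and both transcriptions; whether the chosen segment is actually emotionally expressive remains a judgment call that no check like this can make.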

Requirements and qualifications

You must meet the following qualifications to perform this work accurately and reliably.

  • Excellent writing and transcription skills in English.
  • Strong listening ability and attention to detail for capturing nuanced emotional delivery.
  • Ability to assess emotional valence and arousal using Likert scales supplied by the project.
  • Comfortable working with experimental data where instructions may change over time.
  • Ability to create both detailed verbatim transcriptions and concise summaries.
  • Familiarity with handling audio content and extracting relevant segments.
  • Prior experience with audio transcription, labeling, annotation, or emotion tagging is preferred but not required.

Test task and interview instructions (mandatory)

A sample test must be completed during the interview. You will be asked to transcribe the first 10 seconds of the specified YouTube clip, focusing on exact wording and emotional delivery. Your transcription must be received before the interview is completed, so the interview will not be ended until you have submitted it.

  • Test video (required): https://www.youtube.com/watch?v=8SxE_NfUX6w (you must open this link to access the test).
  • Task: Transcribe the first 10 seconds of the clip, producing a long (verbatim with emotional cues) transcription and a short summary.
  • Long transcription guidelines: transcribe verbatim; note tone, pauses, pitch, and any emotional indicators; describe the speaker's voice if applicable; ignore background sounds and music.
  • Submission: at the end of the interview you will be asked to complete the test and submit your transcription in the chat; the interview stays open until it has been received.

How to apply and what to expect

Apply through OpenTrain by submitting your profile and any requested samples. If invited to an interview, be prepared to complete the mandatory transcription test during that session. Projects are contract-based and schedule-flexible, with work assigned through the project's annotation platform (the posting does not name the software).

  • Open to applicants worldwide.
  • Work is typically remote and flexible; you choose hours up to the project cap.
  • You will be paid hourly at the stated rate for time spent on approved tasks.
  • Keep samples and prior transcription examples ready if available.