Urgent English Audio Recording — Short Contract

Native US/Canadian English speakers needed to record 11 trigger words (12 takes each = 132 clips) with quiet and reflective takes plus four emotional states; complete within 2–4 days. Part-time contractor work at $15/hr, under 20 hrs/week.

Audio Speech

100% Remote Hourly · $15/hr

Compensation: $15/hr

Eligibility: Worldwide

Experience: Entry

Posted: Aug 23, 2024

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. We connect contributors with short-term and ongoing projects that directly shape how modern AI systems behave.

This listing is hosted through OpenTrain’s marketplace of audio and speech projects — a fast-growing area where people with good microphones and clear speech can earn flexible, remote pay by recording or annotating audio.

About AI training and this work

AI training (also called data labeling or annotation) is the human work that shapes how voice-enabled and conversational AI systems understand and respond. Recording high-quality voice samples in specific tones and environments helps models learn natural speech patterns and emotional cues.

These roles are frequently remote, flexible, and accessible to entry-level contributors who can follow clear instructions and deliver consistent, clean audio files.

The role — short, urgent audio recording

We’re seeking native English speakers from the USA and Canada to complete a time-sensitive audio recording task within the next 2–4 days. This is entry-level, contract, part-time work (under 20 hrs/week) at $15/hour.

You will record 11 specified trigger words, with each word recorded 12 times under defined conditions for a total of 132 recordings.

Project length: complete all recordings within 2–4 days (time-sensitive).
Employment type: Contractor, Part-time; expected workload under 20 hours/week.
Pay: $15 per hour.

What you’ll record and how

Each of the 11 trigger words must be recorded 12 times, for a total of 132 clips. The 12 takes per word are split as follows:

Four takes in a quiet indoor environment, with variations in speed and tone.
Four takes in a reflective environment (for example, a bathroom or hallway), with variations in speed and tone.
Four emotional takes in a quiet setting: angry, scared/panicked, worried, and excited.

Each clip must include one second of silence before and after the spoken word.
No background noise is allowed — recordings must be clear and clean.
Vary speed, tone, and emotional delivery exactly as instructed for each set of takes.
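If it helps to track progress, the breakdown above can be turned into a simple checklist. The sketch below is only an illustration: the word names are placeholders (the real trigger-word list is sent to selected applicants), and because the listing does not say which speed and tone variations to use, those takes are simply numbered.

```python
from itertools import product

# Placeholder words: the real trigger-word list is only sent to selected applicants.
WORDS = [f"word{i:02d}" for i in range(1, 12)]  # 11 hypothetical placeholders

# The 12 takes per word, as described in the listing. The specific speed/tone
# variations are not given there, so those takes are just numbered 1-4.
TAKES = (
    [("quiet", f"variation {n}") for n in range(1, 5)]
    + [("reflective", f"variation {n}") for n in range(1, 5)]
    + [("quiet", emotion) for emotion in ("angry", "scared/panicked", "worried", "excited")]
)


def build_checklist(words=WORDS, takes=TAKES):
    """Return one (word, environment, delivery) entry per required clip."""
    return [(word, env, delivery) for word, (env, delivery) in product(words, takes)]


if __name__ == "__main__":
    checklist = build_checklist()
    print(len(checklist))  # 11 words x 12 takes = 132
    # Show the 12 takes required for the first word.
    for word, env, delivery in checklist[:12]:
        print(f"{word}: {env} room, {delivery}")
```

Ticking entries off as you record makes it easier to confirm that all 132 clips exist before you upload.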

Requirements

You must meet the following to apply. These come from the project specifications and are required for acceptance and payment.

Native English speaker from the USA or Canada.
Reliable equipment: good-quality microphone or headphones with a built-in mic.
Access to both a quiet indoor space and a reflective space (bathroom, hallway, etc.).
Attention to detail: follow the timing instructions (1 second of silence before and after each word) and the file naming and format instructions.
Availability to finish all recordings within the 2–4 day window.
Basic technical skills to record audio and save files in the specified format.

Who should apply

This project is a good fit if you can follow precise instructions, have reliable recording equipment, and are available immediately. Entry-level contributors are welcome. Apply only if you can meet the 2–4 day deadline and deliver 132 clean clips.

Ideal for people seeking short-term, remote, flexible work.
Not suitable if you cannot guarantee a quiet environment or the quick turnaround.

How it works — technical and submission notes

If selected, you’ll receive the trigger-word list and exact recording instructions including required file naming and the audio format to use. Follow the format and naming instructions precisely to avoid rejection.

Submit recordings through the project’s upload process within the deadline. Incomplete or noisy files will be rejected, so check your audio quality before submission.

Save every file in the audio format given in your instructions.
Double-check each clip for the required 1 second of silence before and after the word.
Expect brief quality checks; follow feedback quickly if any files need re-recording.
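One way to double-check the silence requirement before uploading is a small script like the sketch below. It is not part of the project's tooling and rests on assumptions: it only reads 16-bit PCM WAV files (the actual required format is given in the project instructions), and the amplitude threshold that counts as "silence" is a guess you would tune to your own microphone.

```python
import array
import sys
import wave

SILENCE_SECONDS = 1.0  # from the listing: 1 second before and after the word
THRESHOLD = 500        # assumed cutoff for "silence" (16-bit peak is 32767)


def edge_silence(path, threshold=THRESHOLD):
    """Return (leading, trailing) silence in seconds for a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    total_frames = len(samples) // channels
    # Positions of every sample louder than the threshold.
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        # Nothing above the threshold: the whole clip counts as silence.
        return total_frames / rate, total_frames / rate

    first_frame = loud[0] // channels
    last_frame = loud[-1] // channels
    leading = first_frame / rate
    trailing = (total_frames - 1 - last_frame) / rate
    return leading, trailing


def has_required_silence(path, seconds=SILENCE_SECONDS):
    """True if the clip starts and ends with at least `seconds` of silence."""
    leading, trailing = edge_silence(path)
    return leading >= seconds and trailing >= seconds
```

Calling `has_required_silence("clip.wav")` on each file (the filename here is hypothetical) flags clips that need trimming or re-recording. Background noise above the threshold will shorten the measured silence, so a failing result can also hint at a noisy take.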

Apply

To apply, confirm that you are a native English speaker from the USA or Canada, that you have suitable equipment, and that you are available to complete the work in 2–4 days. Only apply if you can commit to the full task and delivery window.

Include a short note confirming availability and a description of your recording setup when you apply.
Selected applicants will receive the trigger-word list and detailed instructions immediately.