LLM Data Labelling and Annotation

OpenTrain AI · Remote · Worldwide · Posted Mar 5, 2026

About OpenTrain

OpenTrain aggregates data-labeling and AI-training opportunities from many companies and platforms into a single job board so contributors can find work without hunting dozens of sites.

Creating an OpenTrain account is free, and applying takes only a few minutes. This listing is posted through OpenTrain to help you find and apply to this LLM labeling project quickly.

About AI training (data labeling) work

AI training is the human side of building intelligent systems: people prepare, review, and rate examples so models learn to respond accurately and safely. Typical tasks include classifying text, tagging entities, and rating model outputs.

This role focuses on text-based English documents and LLM responses. The work is fully remote and flexible. AI-training work is often accessible without specialist credentials, but this project requires expert-level annotation experience.

The role

You will help create and verify a dataset of approximately 500–1,000 rows of LLM response data. Each row includes context, a query, and an auto-generated label that must be checked and corrected as needed.

Work is performed as a contractor on a part-time schedule (20+ hours per week). Tasks are completed in AWS SageMaker. Pay is USD $5 per hour.

Data type: plain text (English).
Label types required: classification, entity (NER) + classification, and evaluation/rating tasks.
Platform: AWS SageMaker.
Employment: Contractor, Part-time; worldwide applicants accepted.

What you'll do

Annotate and verify LLM responses according to project guidelines, using AWS SageMaker to apply labels and record decisions.

Participate in the required multi-stage human review process to ensure label accuracy and consistency.

Perform classification tasks on model responses and context.
Tag named entities (NER) and apply relevant classification labels.
Rate model outputs using provided evaluation rubrics.
Check each auto-generated label against the row's context and query; confirm it, or correct and refine it when needed.
Follow annotation guidelines precisely and document ambiguous cases for escalation.

Review and quality-assurance workflow

Every row goes through a two-step human review: two independent agents label the same row first, then a third agent performs quality assurance.

For rows containing technical computer science or financial content, every agent labels on a best-effort basis, and at least one of the three reviewers assigned to the row must be an expert able to assess that material accurately.

Step 1: Two agents independently label the same row and record their decisions.
Step 2: A third agent conducts QA on the row and resolves conflicts or flags for expert review.
Technical/financial rows require at least one expert among the three reviewers; non-expert agents should do best-effort labeling and escalate when unsure.
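
For readers who think in code, the workflow above can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual tooling: the names Review and resolve_row, the status strings, and the rule that the QA agent's label is final when reviewers disagree are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Review:
    reviewer: str    # hypothetical reviewer identifier
    label: str       # label this reviewer chose for the row
    is_expert: bool  # True if qualified for technical/financial content


def resolve_row(first: Review, second: Review, qa: Review,
                needs_expert: bool) -> Tuple[Optional[str], str]:
    """Return (final_label, status) for one row.

    Assumption: when reviewers disagree, the QA agent's label is final.
    """
    reviews = [first, second, qa]

    # Technical/financial rows need at least one expert among the three.
    if needs_expert and not any(r.is_expert for r in reviews):
        return None, "flag_for_expert_review"

    # Step 1 labels match and QA confirms them.
    if first.label == second.label == qa.label:
        return qa.label, "agreed"

    # Step 2: QA resolves the conflict (assumed tie-break rule).
    return qa.label, "resolved_by_qa"
```

For example, if two non-expert agents and a non-expert QA agent all handle a financial row, this sketch returns no label and flags the row for expert review instead of accepting it.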

Requirements

This role is labeled "Expert": you should have strong prior experience in annotation, review, or related AI-training work, and be comfortable making judgment calls on text content.

Fluent English is mandatory. Experience with technical computer science data and/or financial documents is a plus, and is expected of expert reviewers on those items.

Availability: 20+ hours per week.
Experience level: Expert (previous annotation or QA experience expected).
Language: Fluent English is required.
Domain experience: Technical CS and/or financial document experience is a plus and required for expert reviewer assignments.
Tooling: Tasks are completed in AWS SageMaker; prior experience with SageMaker is helpful but not strictly required.

Compensation, schedule, and how to apply

Compensation is USD $5 per hour. This is a part-time contract role with flexible scheduling, and contributors may work remotely from anywhere.

To apply, create a free OpenTrain account and submit your application through the posting. Applications typically take only a few minutes to complete.

Pay type: Hourly, USD $5 per hour.
Work model: Contractor, Part-time, Remote, Worldwide.
Dataset size: ~500–1,000 rows of LLM response data to label and QA.
Apply via OpenTrain (creating an account is free).