AI Safety LLM Evaluator (French/English, Red Team)

Join as a remote contractor to score, annotate, and red-team LLM outputs in English and French; 20+ hrs/week, $24–$36/hr (typical $30/hr). Use your Trust & Safety and hands-on red-teaming experience to help shape model safety standards.

Generative AI · RLHF

100% Remote · Hourly
Compensation: $24–$36/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Apr 3, 2026

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. Creating an OpenTrain account is free — discover projects, build a profile, and apply in minutes.

Working through OpenTrain connects you to meaningful, remote work that directly shapes how state-of-the-art AI systems behave. This role is offered as a part-time contractor opportunity on the OpenTrain platform.

Why AI Training Matters

AI training (data labeling and human feedback work) is the human side of building artificial intelligence. Contributors annotate, evaluate, and red-team model outputs so that models behave more safely and reliably.

This work is highly flexible, accessible, and cutting-edge — suitable for those who want remote, part-time work that directly improves real-world AI systems.

The Role

We are hiring an AI Safety LLM Evaluator to review model-generated responses and create safety-focused evaluation content in both French and English. This is a fully remote contractor role, expected to run at 20+ hours per week.

You will score, annotate, and evaluate outputs with an emphasis on clear reasoning, policy alignment, and safety. You will also curate red-team training cases across nuanced and potentially explicit content areas to help reduce toxic or unsafe outputs.

Employment type: Contractor, Part-time
Time commitment: 20+ hours/week
Data type: Text
Label types: Evaluation rating, RLHF, Red teaming
Labeling software: Other (platform/tool specified by project)
Worldwide applicants accepted

What You’ll Do

You will evaluate LLM responses against written safety policies, assign scores, provide annotations, and document your reasoning for edge cases and ambiguous cases. You will also design and curate adversarial red-team prompts and test cases to probe safety boundaries.

Work includes reviewing and classifying content across safety-sensitive categories, documenting adversarial patterns, and helping establish robust labeling and safety standards used to improve leading AI models.

Score and annotate model outputs for safety, accuracy, and policy alignment
Curate and document red-team test cases, including adversarial prompts
Identify and describe failure modes and adversarial strategies
Explain labeling decisions clearly and consistently, including in ambiguous scenarios
Collaborate with project leads to refine labeling guidelines and standards

Requirements

Candidates must meet the following hard requirements; applications that do not meet them will not be considered.

Near-native or native French proficiency in reading and writing
Minimum C1 English proficiency in reading and writing
Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience
Proven experience in Trust & Safety, content moderation, policy enforcement, risk operations, investigations, or safety evaluation
Hands-on LLM red-teaming experience, including probing safety boundaries and documenting adversarial patterns
Strong knowledge of safety categories: hate & harassment, sexual content, suicide & self-harm, violence, bias, illegal goods/services, malicious activities, malicious code, and misinformation
Ability to apply written safety policies consistently and explain decisions clearly in ambiguous cases
Comfortable reviewing explicit, toxic, violent, sexual, or psychologically disturbing content as part of daily work
Practical experience using tools such as Perplexity, Gemini, ChatGPT, or similar AI systems

Preferred Qualifications

The following are advantages, not strict requirements. Preference may be given to applicants who bring this experience.

Prior experience with AI data training, annotation, or evaluation workflows
Experience producing reproducible red-team reports or risk assessments
Familiarity with RLHF workflows and evaluation best practices

Compensation & Logistics

Pay is hourly in USD. The project lists a typical rate of $30/hr with an allowable range of $24–$36/hr depending on experience and demonstrated red-team skills.

This is a part-time contract position. You will work remotely and submit evaluations and annotations via the project's chosen tooling. The role requires regular, consistent availability to meet review and iteration cycles.

Who Should Apply

Apply if you have demonstrated Trust & Safety or content-moderation experience and hands-on LLM red-teaming skills, have native or near-native French and at least C1 English, and are comfortable working with sensitive and explicit content.

This role is a strong fit for specialists in safety, policy, or investigations who want flexible, impactful remote work improving how major LLMs behave in real-world contexts.

How It Works / How to Apply

Create a free OpenTrain profile, complete any requested skill checks, and submit your application for this project. Successful applicants will be invited to a short red-team evaluation or interview to demonstrate practical skills.

Selected contractors will receive project-specific onboarding, labeling guidelines, and access to the tooling required to perform evaluations and red-team tasks.

Application: build a profile on OpenTrain and apply to this posting
Selection may include a short practical red-team test or interview
Onboarding provides guidelines, examples, and access to the project’s tools