AI Safety LLM Evaluator (French/English, Red Team)
Join as a remote contractor to score, annotate, and red-team LLM outputs in English and French; 20+ hrs/week, $24–$36/hr (typical $30/hr). Use your Trust & Safety and hands-on red-teaming experience to help shape model safety standards.
Generative AI RLHF
Compensation: $24–$36/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Apr 3, 2026
About OpenTrain
OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. Creating an OpenTrain account is free — discover projects, build a profile, and apply in minutes.
Working through OpenTrain connects you to meaningful, remote work that directly shapes how state-of-the-art AI systems behave. This role is offered as a part-time contractor opportunity on the OpenTrain platform.
Why AI Training Matters
AI training (data labeling and human feedback work) is the human side of building artificial intelligence. Contributors annotate, evaluate, and red-team model outputs so that models behave more safely and reliably.
This work is flexible, accessible, and cutting-edge, and it suits those who want remote, part-time work that directly improves real-world AI systems.
The Role
We are hiring an AI Safety LLM Evaluator to review model-generated responses and create safety-focused evaluation content in both French and English. This is a fully remote contractor role, expected to run at 20+ hours per week.
You will score, annotate, and evaluate outputs with an emphasis on clear reasoning, policy alignment, and safety. You will also curate red-team training cases across nuanced and potentially explicit areas to reduce toxic or unsafe outputs.
- Employment type: Contractor, Part-time
- Time commitment: 20+ hours/week
- Data type: Text
- Label types: Evaluation rating, RLHF, Red teaming
- Labeling software: Other (platform/tool specified by project)
- Worldwide applicants accepted
What You’ll Do
You will evaluate LLM responses against written safety policies, assign scores, provide annotations, and document your reasoning for edge cases and ambiguous cases. You will also design and curate adversarial red-team prompts and cases to probe safety boundaries.
Work includes reviewing and classifying content across safety-sensitive categories, documenting adversarial patterns, and helping establish robust labeling and safety standards used to improve leading AI models.
- Score and annotate model outputs for safety, accuracy, and policy alignment
- Curate and document red-team test cases, including adversarial prompts
- Identify and describe failure modes and adversarial strategies
- Explain labeling decisions clearly and consistently, including in ambiguous scenarios
- Collaborate with project leads to refine labeling guidelines and standards
Requirements
Candidates must meet the following hard requirements; applications that do not meet them will not be considered.
- Near-native or native French proficiency in reading and writing
- Minimum C1 English proficiency in reading and writing
- Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience
- Proven experience in Trust & Safety, content moderation, policy enforcement, risk operations, investigations, or safety evaluation
- Hands-on LLM red-teaming experience, including probing safety boundaries and documenting adversarial patterns
- Strong knowledge of safety categories: hate & harassment, sexual content, suicide & self-harm, violence, bias, illegal goods/services, malicious activities, malicious code, and misinformation
- Ability to apply written safety policies consistently and explain decisions clearly in ambiguous cases
- Comfortable reviewing explicit, toxic, violent, sexual, or psychologically disturbing content as part of daily work
- Practical experience using tools such as Perplexity, Gemini, ChatGPT, or similar AI systems
Preferred Qualifications
The following are advantages but not strict requirements. Preference may be given to applicants who bring this experience.
- Prior experience with AI data training, annotation, or evaluation workflows
- Experience producing reproducible red-team reports or risk assessments
- Familiarity with RLHF workflows and evaluation best practices
Compensation & Logistics
Pay is hourly in USD. The project lists a typical rate of $30/hr with an allowable range of $24–$36/hr depending on experience and demonstrated red-team skills.
This is a contract, part-time position. You will work remotely and submit evaluations and annotations via the project's chosen tooling. The role requires regular, consistent availability to meet review and iteration cycles.
Who Should Apply
Apply if you have demonstrated Trust & Safety or content-moderation experience and hands-on LLM red-teaming skills, are fluent in French and advanced in English, and are comfortable working with sensitive and explicit content.
This role is a strong fit for specialists in safety, policy, or investigations who want flexible, impactful remote work improving how major LLMs behave in real-world contexts.
How It Works / How to Apply
Create a free OpenTrain profile, complete any requested skill checks, and submit your application for this project. Successful applicants will be invited to a short red-team evaluation or interview to demonstrate practical skills.
Selected contractors will receive project-specific onboarding, labeling guidelines, and access to the tooling required to perform evaluations and red-team tasks.
- Application step: build a profile on OpenTrain and apply to this posting
- Selection may include a short practical red-team test or interview
- Onboarding provides guidelines, examples, and access to the project’s tools