AI Safety LLM Trainer (Korean C1+, English C1+)
Remote, part-time contractor role evaluating AI-generated Korean and English text for safety, policy alignment, and factual reasoning. The commitment is 20+ hours per week, paid at $28–$38/hr (typical $32/hr). Work includes RLHF-style reviews, red-teaming, and cross-lingual moderation.
Generative AI, RLHF
- Compensation: $28–$38/hr
- Eligibility: Worldwide
- Experience: Intermediate
- Posted: Apr 3, 2026
About OpenTrain
OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. Contributors use the platform to discover projects, build a profile, and apply quickly.
This role is posted through OpenTrain’s platform and connects you to paid, remote AI training work where your evaluations directly improve how major AI systems behave.
About AI training and trust & safety work
AI training (data labeling / annotation / human feedback) is the human side of building modern AI: people review, correct, and judge model outputs so models learn to be safer and more accurate.
Trust & safety work focuses on policy alignment, harm reduction, and evaluating edge cases. It is an opportunity to shape the behavior of state-of-the-art language models.
The role
You will work as an AI Safety Data Reviewer evaluating and labeling AI-generated text for safety, policy compliance, and reasoning quality. Tasks span Korean and English content and feed into model safety improvements used by major AI companies.
This is a remote, hourly contractor position. Work is part-time (20+ hours/week) and may expose you to explicit or otherwise sensitive content, which you will handle in a secure remote environment.
- Employment type: Contractor, Part-time
- Time commitment: 20+ hours per week
- Data type: Text — tasks include evaluation, question answering, text generation review, and RLHF-style ratings
- Labeling software: Proprietary annotation tools
What you'll do
Your day-to-day tasks will center on reviewing model outputs and applying safety policies consistently across Korean and English content. Clear, reproducible rationales for each decision are required.
- Rate multiple model outputs for safety, factuality, and reasoning quality
- Assess policy alignment and supervise moderation decisions
- Identify methodological or conceptual errors and flag edge cases for red-teaming
- Provide written feedback and mitigation recommendations to improve model behavior
- Handle cross-lingual nuance, slang, coded language, and cultural context when evaluating content
Requirements
Candidates must meet all of the following mandatory requirements; we cannot consider applicants who do not meet them.
- Near-native or native Korean proficiency (reading and writing)
- Minimum C1-level English proficiency (reading and writing)
- Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience
- Senior-level experience in Trust & Safety, content moderation, policy operations, risk, compliance, investigations, or related safety functions
- Proven LLM red-teaming or adversarial testing experience, including identifying edge cases and recommending mitigations
- Strong knowledge of safety domains: hate & harassment, sexual content, self-harm, violence, bias, illegal goods/services, malicious activity, malicious code, and misinformation
- Experience applying policy standards consistently across Korean and English content, including cultural nuance and slang
- Strong analytical writing skills with clear, reproducible rationales for moderation decisions
- Comfortable handling explicit, toxic, violent, sexual, or psychologically disturbing content in a secure remote environment
Preferred but not required
These skills make you a stronger candidate but are not strict requirements.
- Localization or translation experience, especially preserving meaning, severity, and intent across languages
- Prior experience with RLHF workflows, instruction-following evaluation, or similar model-alignment projects
Compensation, schedule, and logistics
Compensation is hourly, paid in USD: $28–$38 per hour (typical rate listed as $32/hr). This is contractor pay and does not include employee benefits.
You will be expected to work at least 20 hours per week. The role is remote and open to applicants worldwide. You will use the project’s provided annotation tools and follow secure handling procedures for sensitive content.
- Hourly pay: $28–$38 USD (typical $32/hr)
- Minimum weekly hours: 20
- Worldwide applicants welcome; contractor engagement
How to apply and next steps
Apply now through the OpenTrain platform. Your application should clearly state your Korean and English proficiency levels and summarize relevant Trust & Safety or red-teaming experience.
If selected, expect a skills assessment and onboarding that includes project-specific policy training, test tasks, and secure-environment instructions.
- Prepare examples of moderation or red-team reports if available
- Be ready for a practical evaluation in both Korean and English