AI Safety LLM Trainer (Korean C1+, English C1+)

Remote, part-time contractor role evaluating AI-generated Korean and English text for safety, policy alignment, and factual reasoning. The commitment is 20+ hrs/week at $28–$38/hr (typical $32/hr). Work includes RLHF-style reviews, red-teaming, and cross-lingual moderation.

Generative AI RLHF

100% Remote · Hourly

Compensation: $28–$38/hr
Eligibility: Worldwide
Experience: Intermediate
Posted: Apr 3, 2026

About OpenTrain

OpenTrain is the #1 platform for finding and building careers in AI training and data labeling. Contributors use the platform to discover projects, build a profile, and apply quickly.

This role is posted through OpenTrain’s platform and connects you to paid, remote AI training work where your evaluations directly improve how major AI systems behave.

About AI training and trust & safety work

AI training (data labeling / annotation / human feedback) is the human side of building modern AI: people review, correct, and judge model outputs so models learn to be safer and more accurate.

Trust & safety work focuses on policy alignment, harm reduction, and evaluating edge cases. It gives reviewers a direct hand in shaping the behavior of state-of-the-art language models.

The role

You will work as an AI Safety Data Reviewer evaluating and labeling AI-generated text for safety, policy compliance, and reasoning quality. Tasks span Korean and English content and feed into model safety improvements used by major AI companies.

This is a remote, hourly, contractor position. Work is part-time (20+ hours/week) and is carried out in a secure remote environment. It may expose you to explicit or otherwise sensitive content.

Employment type: Contractor, Part-time
Time commitment: 20+ hours per week
Data type: Text — tasks include evaluation, question answering, text generation review, and RLHF-style ratings
Labeling software: Other / proprietary annotation tools

What you'll do

Your day-to-day tasks will center on reviewing model outputs and applying safety policies consistently across Korean and English content. Clear, reproducible rationales for each decision are required.

Rate multiple model outputs for safety, factuality, and reasoning quality
Assess policy alignment and supervise moderation decisions
Identify methodological or conceptual errors and flag edge cases for red-teaming
Provide written feedback and mitigation recommendations to improve model behavior
Handle cross-lingual nuance, slang, coded language, and cultural context when evaluating content

Requirements

Candidates must meet all of the following mandatory requirements; we cannot consider applicants who do not meet them.

Near-native or native Korean proficiency (reading and writing)
Minimum C1-level English proficiency (reading and writing)
Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience
Senior-level experience in Trust & Safety, content moderation, policy operations, risk, compliance, investigations, or related safety functions
Proven LLM red-teaming or adversarial testing experience, including identifying edge cases and recommending mitigations
Strong knowledge of safety domains: hate & harassment, sexual content, self-harm, violence, bias, illegal goods/services, malicious activity, malicious code, and misinformation
Experience applying policy standards consistently across Korean and English content, including cultural nuance and slang
Strong analytical writing skills with clear, reproducible rationales for moderation decisions
Comfortable handling explicit, toxic, violent, sexual, or psychologically disturbing content in a secure remote environment

Preferred but not required

These skills make you a stronger candidate but are not strict requirements.

Localization or translation experience, especially preserving meaning, severity, and intent across languages
Prior experience with RLHF workflows, instruction-following evaluation, or similar model-alignment projects

Compensation, schedule, and logistics

Compensation is hourly, paid in USD: $28–$38 per hour (typical rate: $32/hr). This is contractor pay and does not include employee benefits.

You will be expected to work at least 20 hours per week. The role is remote and open to applicants worldwide. You will use the project’s provided annotation tools and follow secure handling procedures for sensitive content.

Hourly pay: $28–$38 USD (typical $32/hr)
Minimum weekly hours: 20
Eligibility: worldwide applicants welcome
Engagement: contractor
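
As a rough illustration of the figures above, the sketch below computes gross weekly pay at the stated 20-hour minimum. The helper name `weekly_gross` is hypothetical, and the result ignores taxes, fees, and any hours beyond the minimum, none of which the posting specifies.

```python
# Hypothetical helper (not part of any OpenTrain tool): gross weekly pay in USD,
# before taxes or fees, which the posting does not describe.
def weekly_gross(hourly_rate_usd: float, hours_per_week: float = 20) -> float:
    return hourly_rate_usd * hours_per_week


# Rates taken from the posting: $28 (low), $32 (typical), $38 (high).
low, typical, high = (weekly_gross(rate) for rate in (28, 32, 38))
print(f"Weekly gross at 20 hrs: ${low:.0f} to ${high:.0f} (typical ${typical:.0f})")
```

At the 20-hour minimum this works out to $560–$760 per week, or $640 at the typical rate. Working more hours scales the figure linearly.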

How to apply and next steps

Apply now through the OpenTrain platform. Your application should clearly state your Korean and English proficiency levels and summarize relevant Trust & Safety or red-teaming experience.

If selected, expect a skills assessment and onboarding that includes project-specific policy training, test tasks, and secure-environment instructions.

Prepare examples of moderation or red-team reports if available
Be ready for a practical evaluation in both Korean and English