AI Safety Data Reviewer (Japanese/English)
OpenTrain AI · Remote · Worldwide · Posted Apr 3, 2026
About OpenTrain
OpenTrain is a central job hub for AI-training and data-labeling roles. We aggregate openings from many AI companies and labeling platforms so contributors can discover and apply to relevant work in one place.
Creating an OpenTrain account is free, and applying takes only a few minutes.
- Find remote AI-training roles across language, safety, annotation, and evaluation work
- Quick applications and a single place to track contract opportunities
About AI training and safety work
AI training (data labeling / annotation / human feedback) is the human side of teaching models how to behave. Reviewers like you evaluate outputs, apply policy, and shape how models respond to real-world prompts.
Safety-focused work helps prevent models from producing harmful, adversarial, or misleading outputs by identifying edge cases, inconsistent reasoning, and policy gaps across languages and cultures.
- Your decisions directly influence model safety, accuracy, and trustworthiness
- This work is often remote, flexible, and essential to real-world deployments
The role
This remote, hourly-paid contract position asks you to review AI-generated content and safety decisions in both Japanese and English. You will evaluate reasoning quality, step-by-step problem solving, and policy alignment, providing clear feedback and reproducible rationales.
Be aware: this work may expose you to potentially disturbing content, including sexual or violent material. Reviewing such content helps ensure models are deployed safely in the real world.
- Employment type: Contractor, part-time
- Time requirement: 20+ hours per week
- Pay: hourly, USD $27–$31/hr (typical rate shown: $30/hr)
What you'll do
You will assess model outputs for correctness, clarity, and safety; spot methodological or conceptual errors; and rate or compare multiple responses based on policy alignment and risk.
Work emphasizes clear, reproducible reasoning so moderation decisions can be applied consistently across reviewers and languages.
- Evaluate solutions for correctness, logic, and clarity
- Identify methodological, factual, or conceptual errors and recommend fixes
- Fact-check content as needed across Japanese and English
- Rate or compare multiple responses on safety, policy alignment, and severity
- Document reproducible rationales that others can follow
- Identify adversarial/edge-case inputs and recommend mitigations
Requirements
Candidates must meet all listed requirements, except where an item is marked as preferred, and be comfortable working in a secure remote environment where explicit or disturbing content may appear.
- Near-native or native Japanese proficiency in reading and writing
- Minimum C1 English proficiency in reading and writing
- Bachelor’s degree or higher in a relevant field (Communications, Linguistics, Psychology, Law/Policy, Security Studies) or equivalent professional experience
- Senior-level experience in Trust & Safety, content moderation, policy operations, risk, compliance, investigations, or related safety functions
- Proven LLM red-teaming or adversarial testing experience, including identifying edge cases and recommending mitigations
- Strong knowledge of safety domains: hate and harassment, sexual content, self-harm, violence, bias, illegal goods/services, malicious activities, malicious code, misinformation
- Experience applying policy standards consistently across Japanese and English content, including cultural nuance, slang, coded language, and context shifts
- Localization or translation experience preferred (ability to preserve meaning, severity, and intent across languages)
- Strong analytical writing skills with clear, reproducible rationales for moderation or safety decisions
- Comfortable handling explicit, toxic, violent, sexual, or psychologically disturbing content in a secure remote work environment
Who should apply
Apply if you have Trust & Safety or moderation experience and strong bilingual skills in Japanese and English, plus hands-on exposure to LLM red-teaming or policy enforcement.
This role suits professionals who can make nuanced judgments across languages, document decisions clearly, and flag adversarial risks.
- Trust & Safety specialists, policy operators, and senior moderators
- Bilingual LLM red-teamers and adversarial testers
- Localization professionals with safety/policy experience
How it works
This is a remote contract position open worldwide. The work focuses on TEXT data and uses evaluation rating and text-generation labeling tasks. Labeling software is listed as OTHER, meaning the platform may use a proprietary or third-party tool.
To apply, create an OpenTrain account (free) and submit your application. If selected, you'll work as a contractor on a part-time schedule (20+ hrs/week) and be paid hourly in USD at the stated rate range.
- Data type: TEXT; label types: EVALUATION_RATING and TEXT_GENERATION
- Labeling software: OTHER (proprietary or third-party tool)
- Worldwide applicants accepted; role is remote
- Contractor engagement and part-time hours (20+ hrs/week)