AI Safety Data Reviewer (Japanese/English)

OpenTrain AI · Remote · Worldwide · Posted Apr 3, 2026

About OpenTrain

OpenTrain is a central job hub for AI-training and data-labeling roles. We aggregate openings from many AI companies and labeling platforms so contributors can discover and apply to relevant work in one place.

Creating an OpenTrain account is free, and applying takes only a few minutes.

Find remote AI-training roles across language, safety, annotation, and evaluation work
Quick applications and a single place to track contract opportunities

About AI training and safety work

AI training (data labeling / annotation / human feedback) is the human side of teaching models how to behave. Reviewers like you evaluate outputs, apply policy, and shape how models respond to real-world prompts.

Safety-focused work helps prevent models from producing harmful, adversarial, or misleading outputs. Reviewers do this by identifying edge cases, inconsistent reasoning, and policy gaps across languages and cultures.

Your decisions directly influence model safety, accuracy, and trustworthiness
This work is often remote, flexible, and essential to real-world deployments

The role

This remote, hourly-paid contract position asks you to review AI-generated content and safety decisions in both Japanese and English. You will evaluate reasoning quality, step-by-step problem solving, and policy alignment, providing clear feedback and reproducible rationales.

Be aware: this work may expose you to potentially disturbing content, including sexual or violent material. Reviewing such content is part of helping models deploy safely in the real world.

Employment type: Contractor, part-time
Time requirement: 20+ hours per week
Pay: hourly, USD $27–$31/hr (typical rate shown: $30/hr)

What you'll do

You will assess model outputs for correctness, clarity, and safety; spot methodological or conceptual errors; and rate or compare multiple responses based on policy alignment and risk.

The work emphasizes clear, reproducible reasoning so that moderation decisions can be applied consistently across reviewers and languages.

Evaluate solutions for correctness, logic, and clarity
Identify methodological, factual, or conceptual errors and recommend fixes
Fact-check content as needed across Japanese and English
Rate or compare multiple responses on safety, policy alignment, and severity
Document reproducible rationales that others can follow
Identify adversarial/edge-case inputs and recommend mitigations

Requirements

Candidates must meet the requirements listed below (localization or translation experience is preferred rather than required) and be comfortable working in a secure remote environment where explicit or disturbing content may appear.

Near-native or native Japanese proficiency in reading and writing
Minimum C1 English proficiency in reading and writing
Bachelor’s degree or higher in a relevant field (Communications, Linguistics, Psychology, Law/Policy, Security Studies) or equivalent professional experience
Senior-level experience in Trust & Safety, content moderation, policy operations, risk, compliance, investigations, or related safety functions
Proven LLM red-teaming or adversarial testing experience, including identifying edge cases and recommending mitigations
Strong knowledge of safety domains: hate and harassment, sexual content, self-harm, violence, bias, illegal goods/services, malicious activities, malicious code, misinformation
Experience applying policy standards consistently across Japanese and English content, including cultural nuance, slang, coded language, and context shifts
Localization or translation experience preferred (ability to preserve meaning, severity, and intent across languages)
Strong analytical writing skills with clear, reproducible rationales for moderation or safety decisions
Comfortable handling explicit, toxic, violent, sexual, or psychologically disturbing content in a secure remote work environment

Who should apply

Apply if you have Trust & Safety or content moderation experience, strong bilingual skills in Japanese and English, and hands-on exposure to LLM red-teaming or policy enforcement.

This role suits professionals who can make nuanced judgments across languages, document decisions clearly, and flag adversarial risks.

Trust & Safety specialists, policy operators, and senior moderators
Bilingual LLM red-teamers and adversarial testers
Localization professionals with safety/policy experience

How it works

This is a remote contract position open worldwide. The work focuses on TEXT data and consists of evaluation-rating and text-generation labeling tasks. Labeling software is listed as OTHER, meaning the work may be done in a proprietary or third-party tool.

To apply, create an OpenTrain account (free) and submit your application. If selected, you'll work as a contractor on a part-time schedule (20+ hrs/week) and be paid hourly in USD at the stated rate range.

Data type: TEXT; label types: EVALUATION_RATING and TEXT_GENERATION
Labeling software: OTHER (proprietary or third-party tool)
Worldwide applicants accepted; role is remote
Contractor engagement and part-time hours (20+ hrs/week)