LLM Safety Evaluator (Hebrew & English Required)
OpenTrain AI · Remote · Worldwide · Posted Apr 3, 2026
About OpenTrain
OpenTrain is a central job board for data-labeling and AI-training work. We aggregate opportunities from many AI companies and labeling platforms so you can find training, evaluation, and annotation roles in one place. Creating an OpenTrain account is free and applying takes only a few minutes.
About AI training and safety work
AI training (also called data labeling, annotation, or human feedback work) is the human side of building intelligent systems. People prepare, review, and rate model outputs so models become accurate, safe, and useful.
Safety and trust work focuses on identifying harmful outputs, designing adversarial examples, and helping models avoid producing unsafe or misleading content. This work directly shapes how large language models behave in real-world use.
- Work is typically fully remote and flexible; many projects allow you to choose hours and workload.
- Projects range from beginner-friendly annotation to specialist safety or red-teaming work that pays more for relevant expertise.
The role — what this job is
This is a fully remote, hourly contractor position (part-time) for an LLM Safety Evaluator working in Hebrew and English. You will review AI-generated responses, curate adversarial and safety-sensitive examples, score model outputs, and document safety failures.
You will work 20 or more hours per week and be paid hourly, typically $32 per hour (range: USD $26–$38 per hour). The position is open worldwide. Contractors are engaged by a global AI data services organization and follow the provided safety policies and evaluation guidelines.
- Employment type: Contractor, Part-time.
- Time requirement: 20+ hours/week.
- Pay: USD $26–$38 per hour (typical $32/hr).
- Data type: Text. Label types: EVALUATION_RATING, TEXT_GENERATION, RLHF.
- Labeling software: Other (proprietary/platform tools supplied).
What you'll do day-to-day
You will generate, review, and rate safety-focused content in both Hebrew and English, probe model boundaries, and document weaknesses and adversarial patterns. Tasks sometimes involve explicit, toxic, violent, sexual, or otherwise disturbing material as part of stress-testing models.
- Curate and label adversarial or safety-sensitive training examples in Hebrew and English.
- Review and score model outputs against policy and quality criteria.
- Document and report safety failures, edge cases, and recurring adversarial patterns.
- Stress-test models to reveal policy gaps and ambiguous behavior.
- Explain and justify decisions clearly in written form, especially in ambiguous cases.
Requirements — must-haves
Candidates must meet the language, education, and experience requirements below and be comfortable handling explicit or sensitive content as part of regular work.
- Near-native or native Hebrew proficiency in reading and writing.
- Minimum C1 English proficiency in reading and writing.
- Bachelor’s degree or higher in Communications, Linguistics, Psychology, Law/Policy, Security Studies, or equivalent professional experience.
- Proven experience in Trust & Safety, content moderation, policy enforcement, risk operations, investigations, or safety evaluation.
- Hands-on LLM red-teaming experience (required), including probing safety boundaries and documenting adversarial patterns.
- Strong knowledge of safety categories: hate and harassment, sexual content, suicide and self-harm, violence, bias, illegal goods/services, malicious activities, malicious code, and misinformation.
- Ability to apply written safety policies consistently and to explain decisions clearly in ambiguous cases.
- Comfortable reviewing explicit, toxic, violent, sexual, or psychologically disturbing content as part of daily work.
- Strong practical experience using tools such as Perplexity, Gemini, ChatGPT, or similar AI systems.
- Prior experience with AI data training, annotation, or evaluation workflows (preferred, not required).
Who should apply
Apply if you have real-world Trust & Safety or red-teaming experience, are fluent in Hebrew and advanced in English, and want to influence how LLMs handle sensitive content. This role suits candidates who can write clear rationales, follow policy, and are comfortable reviewing difficult material.
- Good fit: content moderators, trust & safety analysts, investigators, policy reviewers, and AI evaluators with red-teaming experience.
- Not a fit: applicants who lack the required language skills or are unable to review explicit or disturbing content.
How the process works
Apply through OpenTrain by creating a free account and submitting your details; the application typically takes only a few minutes. Qualified candidates will be contacted by the global AI data services organization for onboarding, training on policy and tools, and contractor setup.
You will receive project-specific guidelines and access to the annotation/evaluation platform (labeling software is provided by the client). Your evaluations will directly inform model safety improvements.
- OpenTrain aggregates jobs so you can find and apply to them quickly.
- Onboarding includes training on safety policies and evaluation standards before live tasks.