Python Infrastructure Engineer — LLM Training & Agent Tooling [US‑CA]

OpenTrain AI · Remote · United States and Canada · Posted Jun 4, 2026

About OpenTrain

OpenTrain is a central job board for data-labeling and AI-training roles. We aggregate positions from many AI companies and labeling platforms so contributors can discover work in one place and apply quickly. Creating an OpenTrain account is free and applying takes only a few minutes.

This posting is for a role working on infrastructure for LLM training and agent tooling. OpenTrain helps match qualified engineers with projects like this across the industry.

About AI training and why it matters

AI training (also called data labeling, annotation, or human-feedback work) is the human side of building machine learning systems. Engineers and contributors prepare examples, evaluation pipelines, and test harnesses that teach models to behave accurately and safely.

Work in this category is typically remote and flexible. Many projects require domain skills rather than formal experience, and contributions directly affect model performance, reliability, and safety.

The role

We’re seeking a Python Infrastructure Engineer with a senior mindset to own the infrastructure underpinning LLM-training workflows and agent evaluation tooling. The role centers on building secure sandboxes, task frameworks, scoring pipelines, and developer environments that let researchers iterate quickly.

This is a remote, part-time contractor role open to candidates located in the United States and Canada, with a time commitment of 20+ hours per week. Experience level: Intermediate, with a senior mindset for system ownership and collaboration.

Location: Remote (United States and Canada only).
Time: 20+ hours/week.
Employment: Contractor, Part-time.
Pay tiers: Junior $34/hr | Mid-level $37/hr | Senior $42/hr (USD).

What you’ll deliver

You will build reusable repositories, automated evaluation and scoring pipelines, secure sandbox environments, and developer setups so AI researchers can run, test, and iterate on agent tasks quickly and safely.

Reusable repo templates and modular service patterns for FastAPI/Flask back-ends.
Automated CI/CD pipelines (GitHub Actions preferred) that lint, test, build, and deploy, with managed secrets and caching.
Secure containerized sandboxes and hardened Docker images, plus integration of scanners such as Trivy or Snyk in CI.
Test-driven toolchains: unit, integration, and functional tests driven by pytest with high coverage targets.
Developer environments: devcontainers, Makefiles, pre-commit hooks, and .env workflows for quick onboarding.

Requirements

You must meet the core technical requirements and be ready to complete screening assessments within the timetable described below.

5+ years of professional Python: production-grade code, async I/O, packaging, and refactoring.
CS/Engineering degree or equivalent hands-on experience.
Test-driven mindset with pytest: unit, integration, and functional tests with high coverage targets.
Linux power-user skills: bash, grep, curl, jq, permissions, and basic networking.
Docker expertise: multi-stage Dockerfiles, image optimization, docker-compose; Kubernetes is a plus.
CI/CD ownership: designing GitHub Actions (or similar) pipelines and managing secrets and caching.
FastAPI or Flask proficiency: modular REST/async services, Pydantic validation, structured logging.
Dev-environment setup experience: devcontainers, Makefiles, .env workflows, pre-commit hooks.
LLM/agent infrastructure exposure: experience building sandboxes, scoring pipelines, or evaluation frameworks for AI agents.
Familiarity with AI coding assistants (e.g., Copilot, Cursor, Claude Code) and responsible use in workflows.
Security awareness: least-privilege principles, Docker hardening, integrating scanners in CI.
Version-control discipline: semantic commits, branch hygiene, and thorough code reviews.
Clear English communication, collaborative mindset, and ability to support and guide researchers.
Screening readiness: able to complete a timed HackerRank assessment plus a platform coding test within 48 hours of invite.

Who should apply

Apply if you enjoy technical ownership of infrastructure that enables AI research and evaluation, and if you are comfortable partnering closely with researchers to make tools reliable and usable.

Candidates with prior experience on platforms such as Remotasks, Outlier, DataAnnotation, or Alignerr—especially on coding-focused tasks—are a strong fit, but relevant infrastructure experience is the core requirement.

You value testability, repeatability, and secure defaults.
You can document clearly, pair-program, and mentor researchers on tooling usage.
You can commit to 20+ hours/week and work as a contractor in the US or Canada.

How the hiring process works

Short-listed candidates will be asked to complete a timed HackerRank assessment and a platform coding test. You must be able to finish these within 48 hours of receiving the invite. Successful candidates progress to recruiter-led interviews.

We do not require in-person presence; all interviews and assessments are remote. Compensation is hourly and tiered by experience as stated above.

Apply through OpenTrain with your resume; create an OpenTrain account if needed.
Complete HackerRank + platform coding test within 48 hours when invited.
If shortlisted, attend recruiter interviews to discuss fit and next steps.