Python Infrastructure Engineer — LLM Training & Agent Tooling [AS‑L]

OpenTrain AI · Remote · Worldwide · Posted Jun 9, 2026

About OpenTrain

OpenTrain aggregates data‑labeling and AI‑training jobs from many companies into one searchable place, so contributors can find this kind of work without hunting through dozens of sites. Creating an OpenTrain account is free, and applying typically takes only a few minutes.

We surface roles that power modern AI — from annotation to tooling — and connect skilled contractors with projects that need their expertise.

About AI Training Work

This role sits squarely in AI training: the human and engineering work that makes models reliable, safe, and useful. For coding and agent projects, that means building robust infrastructure for tests, scoring, and safe execution.

Work in AI training is often remote and flexible, and it directly shapes how models behave in real‑world use cases.

You will enable researchers and evaluators by delivering reproducible environments and pipelines.
Typical tasks include sandboxing, evaluation tooling, CI/CD, and instrumentation for agent experiments.

The Role

We are hiring senior‑minded Python engineers to design and maintain infrastructure for LLM training and agent tooling. You will deliver reusable repositories, scoring pipelines, and developer environments that researchers rely on to evaluate agent performance.

This is a fully remote, part‑time contractor role open to candidates located in the Asia‑Low region (Afghanistan through Vietnam). Expect 20+ hours per week.

Write clean, test‑driven Python code (pytest) and package production‑grade services.
Architect secure sandboxes and task environments for running and evaluating agents.
Build CI/CD pipelines that lint, test, build, and deploy; integrate security scanners and caching.

What You'll Do

You will be hands‑on across code, containers, CI, and developer tooling to make agent evaluation reproducible and safe for researchers.

Design and implement secure sandboxes and task environments for agent execution and scoring.
Author multi‑stage Dockerfiles and docker‑compose setups, and debug containerized workflows (Kubernetes experience is a plus).
Build and own CI/CD (GitHub Actions or similar) that handles linting, testing, builds, secrets, caching, and security scans (Trivy/Snyk).
Create developer environments: devcontainer.json, Makefiles, .env workflows, pre‑commit hooks, and clear startup docs.
Develop modular FastAPI/Flask back‑ends with schema validation (Pydantic), auth, and robust logging.
Deliver scoring pipelines, reusable repositories, and automation that researchers can run locally or in CI.
Mentor teammates, pair‑program with researchers, and write concise technical documentation.

Requirements

You must meet all of the following core qualifications and be prepared for a short timed screening during the selection process.

5+ years of professional Python experience, including production‑grade code, packaging, async I/O, and refactoring legacy modules.
Testing mindset: writes unit/integration/functional tests with pytest and focuses on coverage and reliability.
Linux power‑user: daily CLI use (bash, grep, curl, jq, systemd), basic networking and permissions troubleshooting.
Container expertise: author and debug multi‑stage Dockerfiles; comfortable with docker‑compose (Kubernetes is a plus).
CI/CD ownership: designs GitHub Actions or similar pipelines that lint, test, build, and deploy; manages secrets and caching.
FastAPI or Flask proficiency: builds modular REST/async services, implements auth, validation (Pydantic), and logging.
Developer environment setup: creates devcontainer.json, Makefiles, .env workflows, and pre‑commit hooks.
Experience with LLM/agent infrastructure: built sandboxes, scoring pipelines, or evaluation frameworks for agents.
Familiarity with AI coding assistants (Cursor, Claude Code, Copilot, etc.) and with how to use them safely.
Security awareness: applies least‑privilege, hardens images, and integrates scanners (Trivy/Snyk) into CI.
Version‑control discipline: semantic commits, PR templates, and code‑review best practices.
Screening readiness: able to complete a timed HackerRank assessment plus a platform coding test within 48 hours of invite.

Compensation, Hours, and Hiring Process

This role is offered as a part‑time contractor engagement with a minimum of 20 hours per week. Work is fully remote for candidates in the Asia‑Low region.

Hourly rates are tiered by experience: Junior $9 USD/hr, Middle $12 USD/hr, Senior $16 USD/hr. Shortlisted candidates complete a timed HackerRank assessment and a platform coding test before recruiter interviews are scheduled.

Employment type: Contractor, Part‑time.
Time requirement: 20+ hours/week.
Screening: timed HackerRank + platform coding test; please be prepared to complete tests within 48 hours of invite.

Who Should Apply

Apply if you are a senior‑minded Python engineer who enjoys building developer tooling and infrastructure that supports research. You should be comfortable mentoring others, documenting systems concisely, and pairing with researchers to ship reliable evaluation tooling.

Prior experience training or evaluating AI systems on annotation or evaluation platforms (Remotasks, Outlier, DataAnnotation, Alignerr) is especially welcome but not required.

You thrive in remote, asynchronous teams and take ownership of tooling and quality.
You value security, reproducibility, and developer experience when crafting infrastructure.

How It Works

Create a free OpenTrain account and submit your application — applying usually takes only a few minutes. If shortlisted, you'll receive an invite for a timed HackerRank assessment plus a short platform coding test.

Once you complete the tests (ideally within 48 hours of the invite), you will be scheduled for recruiter interviews. If selected, you'll start as a contractor with agreed hours and hourly pay according to experience tier.

Sign up on OpenTrain and apply to this listing.
Complete HackerRank + platform coding test when invited.
Interview with a recruiter; onboarding and project assignment follow if selected.