Databricks Specialist with Python, Java, and/or Spark Expertise

OpenTrain AI · Remote · Worldwide · Posted Feb 21, 2026

About OpenTrain

OpenTrain is a central job board for AI-training and data-labeling work. We aggregate openings from many AI companies and labeling platforms so you can discover suitable projects in one place.

Creating an OpenTrain account is free and applying typically takes only a few minutes.

A free account lets you apply quickly to multiple AI-training opportunities.
This posting is for a part-time contractor role working with Databricks and Apache Spark.

About AI Training Work

AI training (data labeling, annotation, or human feedback) is the human side of building AI systems: people prepare, review, and evaluate examples that models learn from.

This role focuses on computer-code and programming work: designing and optimizing data pipelines, reviewing code, and improving large-scale data processing so models and systems run correctly and efficiently.

Work is fully remote and often flexible in hours.
Many projects require domain skills (here: Databricks, Spark, and programming) rather than formal credentials.

The Role

We are seeking skilled data engineers with hands-on Databricks experience and deep Apache Spark expertise to develop efficient data workflows and optimize large-scale data processing systems.

This is a contractor, part-time position for 20+ hours per week, paid at $12 USD per hour. The role is worldwide/remote.

Subject matter: Databricks (Python, Java, Apache Spark).
Employment type: Contractor, Part-time.
Time requirement: 20+ hours/week.
Pay: $12 USD per hour.
Experience level (posting field): Entry level — see requirements below for actual experience expectations.

What You'll Do

Design, build, and optimize Databricks data pipelines and ETL processes for large datasets.

Perform advanced data analysis, debug and test large code bases, and tune performance for Spark jobs running in Databricks.

Develop efficient workflows and Spark jobs in Databricks.
Profile and optimize performance to prevent memory and compute issues.
Analyze, debug, and test large code bases and notebooks.
Document findings and navigate complex technical documentation.
Collaborate with a remote team and communicate progress clearly.
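
As a rough illustration of the sizing arithmetic behind the memory and compute tuning listed above, here is a minimal sketch in plain Python. The function name estimate_partition_count and its parameters are hypothetical and are not part of Databricks or Spark. The 128 MiB target is an assumption borrowed from the default value of Spark's spark.sql.files.maxPartitionBytes setting; a real job may need a different target depending on executor memory and data skew.

```python
import math

# Assumption: aim for about 128 MiB per partition, mirroring the default of
# Spark's spark.sql.files.maxPartitionBytes. Adjust for your own workload.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def estimate_partition_count(
    dataset_bytes: int,
    target_partition_bytes: int = TARGET_PARTITION_BYTES,
) -> int:
    """Return a rough partition count so each partition holds about the target size.

    Hypothetical helper for illustration only; it does not call Spark.
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if target_partition_bytes <= 0:
        raise ValueError("target_partition_bytes must be positive")
    # Round up so no partition exceeds the target, and never return zero.
    return max(1, math.ceil(dataset_bytes / target_partition_bytes))


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    print(estimate_partition_count(ten_gib))  # 80
```

An estimate like this is only a starting point. In practice you would compare it against what the Spark UI reports for task sizes and spill before changing a job's partitioning.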

Requirements

You must meet all substantive requirements listed below. Candidates are asked to state exact years of experience for each programming language they know and their weekly availability.

We are hiring multiple specialists and intend to hire up to five people per programming language listed below; candidates who are proficient in multiple languages can be assigned to multiple language tracks.

Minimum of 5 years of experience working with Databricks.
Deep expertise in Apache Spark, including building, optimizing, and troubleshooting Spark-based data processing systems.
5+ years of experience in at least one of the following: Python, Java, SQL, Scala, or Spark. In your application, list which of these you know and the exact number of years of experience with each.
Strong experience building and optimizing data pipelines and ETL processes.
Hands-on experience with data processing, analysis, and performance optimization in Databricks.
Experience analyzing, debugging, and testing large code bases and navigating complex documentation.
Familiarity with cloud platforms like Azure or AWS is preferred.
Ability to work independently, solve complex technical problems, and collaborate in a remote team environment.
English: B1 or B2 level required.
Availability: at least 20 hours per week; state exactly how many hours per week you can commit (this will be included in the interview summary).

Application & Interview Process

When you apply, please include: the Databricks projects you’ve worked on, the core Spark components you know (e.g., RDDs, DataFrames, Structured Streaming, Catalyst optimizer, partitioning/shuffle strategies), and for each programming language you listed, the exact number of years of experience.

You will participate in a live chat interview during which you must answer the test questions below before the interview ends; your answers will be evaluated for correctness and completeness and included in the interview score.

Provide a brief interview summary that includes how many hours per week you are available.
List which language tracks you are applying for (Python, Java, SQL, Scala, Spark) and the years of experience for each.
Preferred candidates will demonstrate concrete Databricks troubleshooting and optimization experience.

Test Questions (Answered Live)

You will be asked the following test questions during the live chat interview. The interviewer will not end the session until you have answered both questions; answers are scored for correctness and completeness.

Prepare clear, specific steps and examples for your answers—practical debugging steps, Databricks-specific configuration changes, and code or documentation improvements are expected.

Test Question 1 (Databricks Debugging and Optimization): You are given a PySpark job in Databricks that processes a large dataset, but it keeps failing with an OutOfMemoryError. What steps would you take to debug and resolve this issue? Provide specific adjustments or optimizations you would apply i

Test Question 2 (Code Review and Documentation): Review this Python code written for a Databricks notebook:

    data = spark.read.csv("/path/to/file.csv", header=True)
    filtered_data = data.filter(data["column"] > 100)
    result = filtered_data.groupBy("category").count()
    result.show()

Identify any issues o

How It Works

OpenTrain connects you with AI-training projects like this one. After applying, you may be invited to a live interview where you’ll demonstrate your Databricks and Spark skills.

This role focuses on code and programming work, spanning both labeling and engineering: you’ll be building, reviewing, or improving code and data workflows that support AI systems.

Create a free OpenTrain account and apply — the application takes only a few minutes.
If selected, you’ll be hired as a contractor for a part-time engagement at $12/hr and will work remotely.