Fine-Tune Qwen3-Coder on JUCE/C++ Audio DSP — Dataset Curation & QLoRA Training
OpenTrain AI · Remote · Worldwide · Posted Jun 9, 2026
About OpenTrain
OpenTrain is a central job board for AI-training and data-labeling work. We aggregate opportunities across the industry so you can discover projects like this one in a single place instead of hunting across many sites.
Creating an OpenTrain account is free and applying takes only a few minutes.
About AI training and why it matters
AI systems learn from examples prepared and reviewed by people. For coding models, that means curated instruction-response examples, cleaned code snippets, and careful quality filtering so the model learns correct patterns and best practices.
This role directly shapes how a specialized coder model behaves on real-time audio DSP, plugin architecture, and JUCE/C++ idioms.
Role overview
You will build a supervised fine-tuning (SFT) dataset for JUCE/C++ audio DSP and run QLoRA fine-tuning on Qwen3-Coder using Unsloth. The project goal is to create 3,000–5,000 high-quality ChatML-formatted instruction-response examples focused on audio plugin development and DSP.
Work is contract, part-time, remote, open to worldwide applicants. The engagement is fixed-price: USD 1,000 total. Expect 20+ hours/week availability during the contract.
- Position type: Contractor, Part-time
- Pay: Fixed price USD 1,000
- Time commitment: 20+ hours per week
- Location: Remote, worldwide
What you'll do (high-level tasks)
Follow a provided resource document and clone script to extract DSP-relevant C++ code and create clean, instruction-response training examples. Your pipeline will combine manual curation with LLM-assisted generation and multi-pass quality filtering.
- Extract DSP-relevant C++ functions from 40+ open-source GitHub repos (examples: Surge, ChowDSP, Airwindows, Vital, JUCE framework).
- Generate high-quality instruction-response pairs using LLM-assisted pipelines (Bespoke Curator or Distilabel with Claude/GPT-4).
- Convert blog posts, tutorials, forum Q&A, and free textbook content into clean ChatML-formatted training examples.
- Perform quality filtering with a second LLM pass, deduplication, and manual review.
- Run QLoRA fine-tuning on Qwen3-Coder using Unsloth and produce a ready-to-evaluate SFT model.
Deliverables and targets
Deliverables include a curated dataset (ChatML examples), a comprehensive resource document, the fine-tuned Qwen3-Coder model checkpoint (via Unsloth workflow), and a short report describing curation decisions and evaluation notes.
Target scope: 3,000–5,000 examples covering key DSP topics listed below.
- Target examples: 3,000–5,000 ChatML-formatted pairs
- Coverage areas: processBlock, AudioBuffer, juce_dsp filters, oscillators, delay lines, reverb, virtual analog modeling, plugin architecture, real-time DSP best practices
- Provide a comprehensive resource document with all repo URLs, blog links, textbook references, and the clone script (a resource doc and clone script will be provided to the hired candidate).
Requirements & qualifications
This is an expert-level role. You must be comfortable working with C++ audio plugin code, DSP concepts, and LLM-assisted data pipelines. Preserve code correctness and real-time audio best practices in every example.
- Experience level: Expert (C++ audio plugin development and digital signal processing required).
- Hands-on experience with the JUCE framework is strongly preferred; if you have JUCE projects or example audio projects, include them with your application.
- Familiarity with building instruction-response SFT datasets and working with ChatML formatting.
- Experience with LLM-assisted tools or pipelines (examples named in scope: Bespoke Curator, Distilabel, Claude, GPT-4).
- Practical knowledge of QLoRA fine-tuning workflows and Unsloth-based training is required or demonstrable.
Who should apply
Apply if you are an expert C++ audio/DSP engineer who has worked with JUCE and can translate code examples, tutorials, and documentation into clean training data. This role suits people who can balance automation with careful manual curation.
If you have public JUCE or audio projects, please share links when applying — they will be very helpful for consideration.
How it works and how to apply
OpenTrain aggregates this opportunity; creating an account is free and applying takes only a few minutes. The selected contractor will receive the full resource pack (repo list, blog/textbook links, and a clone script) to begin work.
We will coordinate milestones for dataset delivery and the QLoRA fine-tuning run. All technical details around datasets, formatting, and training scripts will be specified in the project onboarding materials.
- Labeling type: Fine-tuning, Computer programming/coding, Prompt/response SFT
- Data type: Computer code / programming
- Labeling software: Other (project uses bespoke pipelines and Unsloth for training)