Data Engineer (Voice Data Experience, Mid-Level)

📍 Remote | Full-Time | Reports to Head of AI

About Geny

At Geny Labs, we’re building a multilingual, voice-powered AI assistant for skilled service businesses, starting with beauty, wellness, and home services across the U.S., Canada, and emerging markets. From managing schedules and payments to growing their online presence, Geny helps entrepreneurs run their businesses just by speaking.

We’re looking for a Data Engineer who loves working with real-world speech data, messy accents, and diverse voices. You’ll help structure and process the raw fuel that powers Geny’s voice intelligence.

What You’ll Do

- Collect, clean, and organize speech-to-text (STT) data from real users across multiple regions and accents

- Annotate transcripts with intent, speaker, and task labels to create training datasets

- Cluster and group user utterances (e.g., “reschedule my 3PM” = UPDATE_APPOINTMENT)

- Evaluate ASR/STT accuracy, track error cases, and suggest improvements

- Build data pipelines for ingesting, cleaning, and storing voice data

- Test fallback handling: measure how well Geny recovers from ambiguous or failed inputs

- Collaborate with product & AI teams to create regional prompt libraries for key workflows (scheduling, payments, social media)

Tools & Skills

- Python (pandas, json, regex, scikit-learn)

- NLP: spaCy, sentence-transformers, OpenAI embeddings, BERT, LSTM, GRU, word2vec, Gensim, etc.

- Familiarity with speech data workflows (Whisper, Coqui, AWS/GCP Speech APIs)

- Basic clustering and similarity search (KMeans, cosine similarity, Faiss)

- SQL/NoSQL (PostgreSQL, Redis) for data storage

- Bonus: experience with annotation tools (Label Studio, Prodigy)

- Data visualization experience with Tableau and Power BI; ETL pipeline experience with Athena, Glue, Spark, and S3

First 3–6 Months

- Help seed Geny’s initial voice dataset: transcribe & annotate ~10K utterances from early pilots

- Build scripts to cluster similar commands and link them to intents

- Evaluate Geny’s performance across accents and languages (US, Canadian French, Spanish, African, Asian, English)

- Create baseline metrics dashboards (WER, intent accuracy, fallback rate)

- Support senior engineers in retraining/fine-tuning models with your datasets

Work Culture & Growth

- Small, global, remote-first team (US, Africa, Brazil, India)

- Fast-paced, collaborative, and async-friendly: deliverables matter more than hours

- Mentorship from senior NLP/voice engineers

- Growth path from IC2 → IC3 → IC4 (leading features, mentoring others)

Compensation & Benefits

- Salary: $30,000 – $60,000 USD annually (depending on experience and location)

- Equity: early-stage ESOP (stock options), giving you a share in Geny’s growth

- Benefits: Flexible hours, fully remote, wellness/remote stipend, paid time off, professional development budget