Job Title: Senior Data Engineer
Job Description:
A well-established and reputable technology firm based in Abu Dhabi is seeking an experienced Senior Data Engineer to join their team. The ideal candidate will have a strong background in building data pipelines, managing datasets for AI/LLM workflows, and handling complex unstructured data. This is a key role that supports the data infrastructure required for fine-tuning large language models (LLMs) and deploying advanced AI systems.
Key Responsibilities:
- Prepare, manage, and version datasets used for LLM fine-tuning and AI processes.
- Design and implement robust ingestion pipelines for both structured and unstructured data using Python.
- Clean, normalize, and format data into suitable structures (e.g., JSONL, CSV) for LLM training.
- Build and curate high-quality, task-specific datasets for model evaluation and testing.
- Apply data version control using tools like DVC or LakeFS for traceability and reproducibility.
- Generate vector embeddings using HuggingFace or Sentence Transformers libraries.
- Manage and optimize vector databases (e.g., FAISS, Weaviate) for high-performance data retrieval.
- Tokenize and chunk long-form text data so that it fits within LLM context windows and supports model performance.
Required Skills & Qualifications:
- Minimum 10 years of experience in a data engineering role.
- At least 2 years of hands-on experience in an AI or LLM-adjacent data role.
- Proven expertise in managing datasets on storage systems such as MinIO (object storage) and NFS (network file storage).
- Advanced proficiency in Python, pandas, and modern text/data processing libraries.
- Experience with tokenization tools such as HuggingFace Tokenizers and SentencePiece.
- Strong understanding of LLM-related data constraints like prompt formatting and context window limits.
- Candidates must be available for in-person interviews in Abu Dhabi.
