Role Overview:
The LLM Engineer will join an existing development team to build and ship LLM-powered features in a complex, production application used at scale. This is a hands-on, full-stack role spanning backend services, APIs, and the LLM systems (retrieval, agents, evaluation) that power them. We are looking for a skilled AI Engineer/Software Engineer with experience in Python, Java, Elasticsearch, and GenAI. You are expected to work as an agentic engineer, using AI coding tools and autonomous agents to write code, automate workflows, and optimize how the team delivers. You will collaborate with global teams across multiple time zones and own features end to end.
In this role, you will:
Bring senior-level Python and LLM engineering expertise to the team
Develop and maintain scalable applications using Python and Java
Design and implement RAG-based solutions using LLMs and vector search
Build and optimize search and retrieval systems using Elasticsearch
Develop document ingestion, indexing, embedding, and retrieval pipelines
Integrate GenAI frameworks and APIs into enterprise applications
Execute both planning and hands-on technical work independently
Collaborate effectively with Product Owners and other stakeholders to solve complex problems
Work cross-functionally to deliver impactful solutions across teams
Continuously develop your technical expertise and stay current with new technologies
Bring curiosity and drive to expand your skills and knowledge
Use a data-driven approach to solve technical challenges and make informed decisions
Apply systems-level thinking that integrates data science and engineering principles
Take full ownership of the features and projects you work on, delivering high-quality solutions independently
Ensure application performance, scalability, and reliability
Must-Have Skills:
Hands-on experience in developing RAG (Retrieval-Augmented Generation) solutions, integrating LLMs, and implementing enterprise search capabilities
Designing and implementing RAG systems end to end: vector databases, semantic search, retrieval quality, and chunking strategy
Hands-on, daily use of AI-assisted and agentic coding tools (e.g., Claude Code, Cursor, GitHub Copilot, autonomous coding agents) to write and refactor code, automate workflows, and optimize engineering processes
Strong experience with cloud platforms (AWS/Azure/GCP) and REST APIs
Grounding in NLP and machine learning as they relate to building LLM systems
Strong experience working with major LLM provider APIs (e.g., OpenAI, Anthropic)
Experience building, deploying, and securing MCP servers at scale
Understanding of multi-agent systems and their applications in complex problem-solving scenarios
Experience with distributed search and indexing technologies (e.g., Elasticsearch, OpenSearch, Solr)
Experience with prompt writing for various use cases
Experience with Java or confidence in agentic coding skills to develop in Java
Experience shipping generative AI solutions to production at scale, beyond proofs of concept
Proficiency with server-sent events (SSE), event-driven architectures, and messaging systems
Strong critical thinking and systems thinking skills, with experience debugging, optimizing, and making sound engineering decisions across complex backend systems, not just solving isolated problems
Solid understanding of security best practices for backend systems, including authentication and data protection
Knowledge of embeddings, vector search, semantic search, and prompt engineering
Familiarity with frameworks such as LangChain, LlamaIndex, or similar
Other Qualifications:
2+ years of experience developing and experimenting with LLMs
8+ years of software development experience
Experience with vector databases (e.g., Pinecone, FAISS, Weaviate, Elasticsearch vector search)
Understanding of NLP, LLM evaluation, and AI application deployment
Nice-to-Have Skills:
Experience with LLM guardrails
Experience with LLM frameworks (e.g., LangChain, LlamaIndex)
Experience with LLM monitoring and observability
Experience developing AI/ML technologies within large, business-critical applications
Building evaluation into LLM systems: eval harnesses, regression suites, LLM-as-judge, and offline/online quality metrics