TL;DR: We’re looking for an AI-native intern or working student excited to experiment with new prompt/context-engineering methods, burn tokens, build benchmarks, and set up evals, all to push the performance of our GenAI and agentic capabilities.
Your Responsibilities
- Build and Run the End-to-End Evaluation Workbench: Design, implement, and manage the full prompt-evaluation lifecycle, including prompt development, test-batch definition, model execution, and rigorous performance scoring against established metrics.
- Develop and Refine Robust Evaluation Metrics and Frameworks: Select, implement, and, where needed, create custom quantitative and qualitative metrics (evals) that objectively measure the quality, accuracy, and desired behavior of model outputs against defined benchmarks.
- Systematically Drive Prompt Optimization (Closed-Loop): Conduct in-depth analysis of evaluation results, identify performance gaps, and iteratively refine and update prompts (e.g., system prompts, few-shot examples, chain-of-thought) to maximize model performance and maintain alignment with goals.
- Pioneer Automation for Continuous Improvement: Contribute to the design and initial implementation of a closed-loop feedback system, with the long-term goal of autonomously updating prompts and/or model parameters based on evaluation outcomes to steadily improve accuracy.
- Document and Communicate Key Performance Insights: Clearly document the evaluation methodology, results, and prompt versions, and provide actionable recommendations and conclusions to stakeholders, focusing on trade-offs and the path to optimal performance.
Your Profile
- Academic Background: Currently pursuing a Bachelor’s or Master’s degree in Data Science, Computational Linguistics, Artificial Intelligence, Computer Science, or a closely related quantitative field.
- Technical Proficiency: Strong proficiency in Python is mandatory, including experience with data-manipulation libraries such as Pandas and NumPy; experience with modern LLM frameworks/libraries (e.g., Hugging Face, OpenAI API/SDKs, LangChain, LlamaIndex) is a plus.
- Prompt Engineering & LLM Knowledge: Demonstrated foundational understanding of Large Language Models (LLMs), including the principles of prompt engineering (e.g., few-shot, chain-of-thought) and different LLM evaluation techniques (e.g., human-in-the-loop vs. automated evals).
- Analytical & Problem-Solving Aptitude: Proven ability to approach complex, unstructured problems (like performance comparison and optimization) with a systematic, data-driven methodology; capable of translating evaluation results into concrete prompt refinements.
- Logistical Requirement: Must be able to work on-site in Zurich, Switzerland for the duration of the internship.
What We Offer
- Direct impact, working with clients, and shipping fast
- Significant equity on top of salary compensation
- Working closely with founders who have built and scaled fintech and analytics products before
- A development budget, free to spend on whatever gives you a level-up
- Regular team activities and dinners
- Office in the heart of Zurich, a new MacBook Pro, and more