AI Evaluation Specialist
Binance is a renowned global blockchain ecosystem that operates the largest cryptocurrency exchange platform worldwide, serving over 280 million individuals across 100+ countries. Our solid reputation is built on industry-leading security standards, transparent user fund management, a fast trading engine, extensive liquidity, and an unparalleled array of digital asset products. Binance's diverse offerings span trading, financial services, education, research, payments, institutional support, Web3 features, and more. We harness the potential of digital assets and blockchain technology to establish an inclusive financial ecosystem that promotes financial freedom on a global scale.
We are currently seeking a dedicated AI Evaluation Specialist to design, implement, and oversee comprehensive evaluation frameworks covering every stage of the LLM agent lifecycle. You will play a pivotal part in Binance's AI integration journey by ensuring the dependability, adaptability, and regulatory compliance of AI agents deployed across domains such as Customer Service, Growth, and Compliance.
Responsibilities:
- Engage in the complete software development lifecycle, encompassing requirements analysis, test planning, execution, defect tracking, product release, and maintenance.
- Serve as a primary contact for AI agent evaluation and continuous monitoring.
- Develop effective test strategies and conduct hands-on testing to guarantee the accuracy, reliability, and performance of AI and data applications.
- Conduct root cause analysis of test failures and product issues, and facilitate optimization for future enhancements.
- Design and implement internal tools utilizing AI technology to enhance engineering and testing efficiency.
Requirements:
- Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Data Science, or a related field.
- Profound understanding of Large Language Models (LLMs), autonomous AI agents, and their system architectures.
- Experience with AI evaluation methodologies encompassing offline benchmarking, online monitoring, and hybrid human-AI evaluation approaches.
- Knowledge of software engineering best practices such as Test-Driven Development (TDD) and Behavior-Driven Development (BDD) in AI contexts.
- Ability to craft adaptive, lifecycle-spanning evaluation frameworks incorporating quantitative and qualitative metrics.
- Prior experience with evaluation tools and frameworks is beneficial.
- Proficiency in analyzing complex system-level behaviors, including reasoning pipelines, tool integrations, and agent actions.
- Strong analytical abilities with a background in data-driven diagnostics and root cause analysis.
- Excellent communication skills for clear documentation of evaluation plans, results, and recommendations.
- Experience collaborating in cross-functional teams and managing feedback loops between evaluation and development.
- Previous exposure to working with infrastructure or platform teams to enhance AI tooling and automation platforms.
Binance offers a dynamic work environment where you can:
- Play a pivotal role in shaping the future of the world's foremost blockchain ecosystem.
- Collaborate with top-tier professionals in a user-centric, globally-distributed organization with a flat structure.
- Engage in unique, fast-paced projects autonomously in an innovative setting.
- Grow your career and continuously learn within a results-oriented workplace.
- Enjoy competitive compensation and company benefits.
- Benefit from remote work arrangements depending on team-specific work requirements.
At Binance, we are committed to fostering an inclusive work environment by promoting diversity within our workforce, as we believe it is essential for our continued success. By applying for a job at Binance, you acknowledge that you have reviewed and agreed to our Candidate Privacy Notice.
