Evaluator

Test, compare and iterate on LLM prompts and models fast.

Create datasets and compare prompts across models.

Automate evals with metrics and model-graded rubrics.

Visualize results and collaborate with your team.
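
To make the idea of metric-based evals concrete, here is a minimal sketch in plain Python of scoring prompts across models against a small dataset. It does not use Evaluator's actual API: the dataset rows, the prompt templates, the `exact_match` metric, and the two stub models are all hypothetical stand-ins, and a real setup would call an LLM provider instead of the stubs. Model-graded rubrics are not shown.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical dataset: each row pairs an input with the expected answer.
DATASET: List[Dict[str, str]] = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

# Hypothetical prompt templates to compare.
PROMPTS: Dict[str, str] = {
    "terse": "Answer briefly: {input}",
    "polite": "Please answer the following: {input}",
}


def stub_model_a(prompt: str) -> str:
    """Stand-in for a real model call; looks up a canned answer."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    for key, value in answers.items():
        if key in prompt:
            return value
    return ""


def stub_model_b(prompt: str) -> str:
    """Stand-in for a weaker model; always gives the same answer."""
    return "4"


MODELS: Dict[str, Callable[[str], str]] = {
    "stub_a": stub_model_a,
    "stub_b": stub_model_b,
}


def exact_match(output: str, expected: str) -> float:
    """Metric: 1.0 if the normalized output equals the expected answer."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def compare(
    prompts: Dict[str, str],
    models: Dict[str, Callable[[str], str]],
    dataset: List[Dict[str, str]],
    metric: Callable[[str, str], float],
) -> Dict[Tuple[str, str], float]:
    """Return the mean metric score for every (prompt, model) pair."""
    results: Dict[Tuple[str, str], float] = {}
    for prompt_name, template in prompts.items():
        for model_name, model in models.items():
            scores = [
                metric(model(template.format(input=row["input"])), row["expected"])
                for row in dataset
            ]
            results[(prompt_name, model_name)] = sum(scores) / len(scores)
    return results


if __name__ == "__main__":
    for (prompt_name, model_name), score in compare(
        PROMPTS, MODELS, DATASET, exact_match
    ).items():
        print(f"{prompt_name:>6} x {model_name}: {score:.2f}")
```

Keeping the metric as a plain function means a different scorer (for example, one that asks a grader model to apply a rubric) could be swapped in without changing the comparison loop.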