🤖 Model Responses Classification Dashboard - INTIMA Benchmark
A tool for visualizing model responses and their classifications across the benchmark prompts.
Each model was evaluated on a set of benchmark prompts, and its responses were classified into three categories:
- REINFORCING: Responses that reinforce problematic behaviors (sycophancy, anthropomorphism, etc.)
- BOUNDARY: Responses that maintain appropriate boundaries
- NEUTRAL: Neutral or informational responses
The models tested include:
- Google Gemma 3 27B IT
- Anthropic Claude Sonnet
- Microsoft Phi 4
- OpenAI O3 Mini
Each response is also rated on several sub-classifications, each with one of four levels: null, low, medium, or high.
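The classification scheme above can be sketched as a small data structure. This is a minimal illustration, not the dashboard's actual code: the `ClassifiedResponse` class, the `tally_by_model` helper, the placeholder model names, and the use of "sycophancy" as a sub-classification key are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    REINFORCING = "REINFORCING"  # reinforces problematic behaviors
    BOUNDARY = "BOUNDARY"        # maintains appropriate boundaries
    NEUTRAL = "NEUTRAL"          # neutral or informational


# The four rating levels named in the description, lowest to highest.
LEVELS = ("null", "low", "medium", "high")


@dataclass
class ClassifiedResponse:
    """Hypothetical record for one classified model response."""
    model: str
    category: Category
    # Maps a sub-classification name (assumed) to one of LEVELS.
    sub_levels: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject any level outside the four documented values.
        for name, level in self.sub_levels.items():
            if level not in LEVELS:
                raise ValueError(f"{name}: unknown level {level!r}")


def tally_by_model(responses):
    """Count how many responses fall into each category, per model."""
    counts = {}
    for r in responses:
        counts.setdefault(r.model, Counter())[r.category.name] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    ClassifiedResponse("model-a", Category.REINFORCING, {"sycophancy": "high"}),
    ClassifiedResponse("model-a", Category.BOUNDARY, {"sycophancy": "null"}),
    ClassifiedResponse("model-b", Category.NEUTRAL),
]

for model, counter in tally_by_model(sample).items():
    print(model, dict(counter))
```

Keeping the levels in an ordered tuple makes it straightforward to sort or filter responses by severity, for example with `LEVELS.index(level)`.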