🤖 Model Responses Classification Dashboard - INTIMA Benchmark

A tool for visualizing model responses and how they were classified across the different benchmark prompts.

Each model was evaluated on a range of benchmark prompts, and its responses were classified into three categories:

  • REINFORCING: Responses that reinforce problematic behaviors (sycophancy, anthropomorphism, etc.)
  • BOUNDARY: Responses that maintain appropriate boundaries
  • NEUTRAL: Neutral or informational responses
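The three categories can be treated as a small closed set, which makes per-model tallies straightforward. The sketch below assumes a hypothetical record layout (`ClassifiedResponse` with `model`, `benchmark_code` and `category` fields) and uses made-up placeholder rows. It is not the dashboard's actual data format.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The three top-level response categories."""
    REINFORCING = "REINFORCING"
    BOUNDARY = "BOUNDARY"
    NEUTRAL = "NEUTRAL"


@dataclass(frozen=True)
class ClassifiedResponse:
    # Hypothetical record layout; field names are illustrative assumptions.
    model: str
    benchmark_code: str
    category: Category


def category_counts(responses, model):
    """Count how many of one model's responses fall into each category."""
    counts = Counter(r.category for r in responses if r.model == model)
    # Report zero for categories the model never produced.
    return {c: counts.get(c, 0) for c in Category}


# Placeholder rows for illustration only, not benchmark results.
rows = [
    ClassifiedResponse("model-a", "code-1", Category.BOUNDARY),
    ClassifiedResponse("model-a", "code-2", Category.REINFORCING),
    ClassifiedResponse("model-b", "code-1", Category.NEUTRAL),
]
print(category_counts(rows, "model-a"))
```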

The models tested include:

  • Google Gemma 3 27B IT
  • Anthropic Claude Sonnet
  • Microsoft Phi 4
  • OpenAI O3 Mini

Each response is also rated on several sub-classifications, and each rating takes one of four levels: null, low, medium, or high.
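Because the four levels are ordered, they map naturally onto integers. That makes it easy to sort or threshold responses by a given sub-classification. The sketch below rests on a few assumptions. Each record is taken to be a dict whose `"ratings"` entry maps sub-classification names to raw level labels. A missing or `"null"` label is treated as the lowest level. The name `"sycophancy"` is used only as a hypothetical sub-classification key.

```python
from enum import IntEnum


class Level(IntEnum):
    """Sub-classification levels, ordered from lowest to highest."""
    NULL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def parse_level(raw):
    """Map a raw label such as 'medium' (or None) to a Level; assumed encoding."""
    if raw is None or str(raw).strip().lower() in ("", "null", "none"):
        return Level.NULL
    # Raises KeyError for labels outside the four known levels.
    return Level[str(raw).strip().upper()]


def sort_by_level(records, sub_classification):
    """Return records sorted from highest to lowest level for one sub-classification.

    Records missing the sub-classification are treated as null; ties keep
    their original order because Python's sort is stable.
    """
    return sorted(
        records,
        key=lambda r: parse_level(r["ratings"].get(sub_classification)),
        reverse=True,
    )


# Hypothetical records for illustration only.
records = [
    {"id": 1, "ratings": {"sycophancy": "low"}},
    {"id": 2, "ratings": {"sycophancy": "high"}},
    {"id": 3, "ratings": {}},
]
print([r["id"] for r in sort_by_level(records, "sycophancy")])  # [2, 1, 3]
```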
