Creating effective AI agents requires thorough testing to ensure they provide accurate, helpful, and appropriate responses. Prisme.ai provides comprehensive testing capabilities to validate your agents before deployment and continuously improve them over time.
Testing Approaches
Prisme.ai supports multiple testing methodologies to ensure your agents meet your organization’s standards:
- Manual Testing
- Automated Evaluation
- Human-in-the-Loop
- Custom Evaluation
Evaluation Framework
Prisme.ai uses a straightforward evaluation system that makes it easy to assess agent performance. Each response is scored on three criteria (a minimal data-model sketch follows this list):
- Response Quality: 0 (Poor), 1 (Adequate), 2 (Excellent)
- Context Quality: 0 (Poor), 1 (Adequate), 2 (Excellent)
- Hallucination Check: 0 (Significant), 1 (Minor), 2 (None)
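To make the rubric concrete, here is a minimal sketch of how scored results could be recorded, assuming one record per evaluated response. The `EvalResult` name and fields are illustrative, not a Prisme.ai API:

```python
from dataclasses import dataclass

# Illustrative record type (not a Prisme.ai API); each criterion uses
# the 0/1/2 scale described above.
@dataclass
class EvalResult:
    question: str
    response_quality: int  # 0 = Poor, 1 = Adequate, 2 = Excellent
    context_quality: int   # 0 = Poor, 1 = Adequate, 2 = Excellent
    hallucination: int     # 0 = Significant, 1 = Minor, 2 = None

    def __post_init__(self) -> None:
        for name in ("response_quality", "context_quality", "hallucination"):
            if getattr(self, name) not in (0, 1, 2):
                raise ValueError(f"{name} must be 0, 1, or 2")

    @property
    def total(self) -> int:
        """Unweighted total, 0-6; higher is better on all three criteria."""
        return self.response_quality + self.context_quality + self.hallucination
```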
Automated Evaluation Process
The automated evaluation process uses LLMs as judges to assess agent performance:
Configure Evaluation Parameters
- Which LLM will serve as the evaluator
- Which LLM will serve as the evaluator
- Evaluation frequency (daily, weekly, on-demand)
- Evaluation criteria weighting (a configuration sketch follows this list)
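As a sketch of how these parameters might be captured together, assuming the three 0-2 criterion scores are combined as a weighted average; the `EvaluationConfig` name, the default weights, and the judge model string are placeholders, not Prisme.ai settings:

```python
from dataclasses import dataclass, field

# Hypothetical configuration object; field names mirror the parameters
# listed above, not an actual Prisme.ai schema.
@dataclass
class EvaluationConfig:
    judge_model: str = "gpt-4o"  # which LLM serves as evaluator (placeholder name)
    frequency: str = "weekly"    # "daily" | "weekly" | "on-demand"
    weights: dict[str, float] = field(default_factory=lambda: {
        "response_quality": 0.4,
        "context_quality": 0.3,
        "hallucination": 0.3,
    })

    def weighted_score(self, scores: dict[str, int]) -> float:
        """Combine 0-2 criterion scores into a single figure in [0, 1]."""
        return sum(self.weights[k] * scores[k] / 2 for k in self.weights)

config = EvaluationConfig()
print(config.weighted_score(
    {"response_quality": 2, "context_quality": 1, "hallucination": 2}
))  # 0.4*1.0 + 0.3*0.5 + 0.3*1.0 ≈ 0.85
```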
Review Results
- Overall performance scores
- Performance trends over time
- Breakdowns by question type
- Detailed analysis of retrieved contexts (a score roll-up sketch follows this list)
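The roll-up below illustrates the kind of breakdown this review step implies: averaging weighted scores per question type so weak areas stand out. The input shape is an assumption:

```python
from collections import defaultdict
from statistics import mean

# `results` is assumed to be (question_type, weighted_score) pairs,
# e.g. the output of the weighted scoring sketched earlier.
def breakdown_by_type(results: list[tuple[str, float]]) -> dict[str, float]:
    buckets: dict[str, list[float]] = defaultdict(list)
    for question_type, score in results:
        buckets[question_type].append(score)
    return {qtype: round(mean(scores), 3) for qtype, scores in buckets.items()}

print(breakdown_by_type([
    ("simple", 0.95), ("simple", 0.90),
    ("complex", 0.55), ("complex", 0.70),
]))  # {'simple': 0.925, 'complex': 0.625}
```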
Human-in-the-Loop Evaluation
Combine automated testing with human expertise for comprehensive quality control. Human reviewers can:
- Review and override automated evaluation scores
- Provide qualitative feedback on responses
- Identify subtle issues that automated systems miss
- Add new test questions based on emerging needs
- Validate context quality and relevance (a sketch of the override rule follows this list)
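The override behavior reduces to a simple rule: a human score, when present, replaces the automated one. A sketch of that rule, not platform code:

```python
from typing import Optional

def effective_score(automated: int, human_override: Optional[int] = None) -> int:
    """Return the human score when a reviewer provided one, else the automated score."""
    if human_override is not None:
        if human_override not in (0, 1, 2):
            raise ValueError("override must use the 0/1/2 scale")
        return human_override
    return automated

assert effective_score(2) == 2                     # no reviewer input
assert effective_score(2, human_override=1) == 1   # reviewer downgrades
```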
Custom Evaluation with Webhooks
For specialized evaluation needs, you can implement custom processes using Webhooks and AI Builder:
Implement Custom Evaluation Logic
- Domain-specific quality metrics
- Domain-specific quality metrics
- Compliance and regulatory checks
- Industry terminology validation
- Integration with existing quality systems (an endpoint sketch follows this list)
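As an illustration, a custom endpoint for a compliance-style check might look like the Flask sketch below. The payload fields, the forbidden-term list, and the response shape are all assumptions, not the documented Prisme.ai webhook contract:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical forbidden-term list standing in for a real compliance rule set.
FORBIDDEN_TERMS = {"guaranteed returns", "medical diagnosis"}

@app.route("/evaluate", methods=["POST"])
def evaluate():
    payload = request.get_json(force=True)
    answer = payload.get("answer", "").lower()
    violations = sorted(term for term in FORBIDDEN_TERMS if term in answer)
    return jsonify({
        "compliant": not violations,
        "violations": violations,
        "score": 0 if violations else 2,  # reuse the 0/1/2 scale
    })

if __name__ == "__main__":
    app.run(port=8080)
```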
Strategic Benefits of Testing
Comprehensive testing delivers significant benefits beyond simple quality control:
Monitor Data Source Changes
Detect when changes to underlying data sources affect response quality.
This allows you to:
- Prevent regressions when content is updated
- Identify when knowledge gaps emerge
- Maintain consistency across content updates (a regression-check sketch follows this list)
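A minimal regression gate could compare per-question scores from before and after a content update, as sketched below; the threshold and data shapes are placeholders to tune against your own test set:

```python
# Flag questions whose score dropped by more than `threshold` after an update.
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.15,  # arbitrary placeholder
) -> list[str]:
    return [
        question
        for question, before in baseline.items()
        if question in current and before - current[question] > threshold
    ]

baseline = {"return policy?": 0.90, "support contact?": 0.80}
current = {"return policy?": 0.60, "support contact?": 0.85}
print(find_regressions(baseline, current))  # ['return policy?']
```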
Optimize LLM Selection
Evaluate performance across different LLM providers and models.
This enables you to:
- Select more cost-efficient models
- Reduce energy consumption
- Use specialized or self-hosted models when appropriate
- Make data-driven model migration decisions (a cost-comparison sketch follows this list)
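One simple data-driven comparison is cost per quality point on the same test set, sketched below with made-up model names and prices:

```python
# Candidate models evaluated on the same test set; figures are invented.
candidates = {
    "model-large": {"avg_score": 0.92, "usd_per_1k_runs": 12.00},
    "model-small": {"avg_score": 0.88, "usd_per_1k_runs": 1.50},
}

for name, stats in candidates.items():
    cost_per_point = stats["usd_per_1k_runs"] / stats["avg_score"]
    print(f"{name}: {cost_per_point:.2f} USD per quality point per 1k runs")
# A small quality drop can be worth an order-of-magnitude cost saving.
```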
Engage Business Stakeholders
Foster ownership of content quality among domain experts.
This helps to:
- Demonstrate the impact of quality source material
- Create accountability for knowledge accuracy
- Build trust in AI system outputs
- Drive continuous content improvement
Establish Tech-Business Alignment
Create a shared understanding of performance metrics and goals.
This leads to:
- Clear performance contracts between teams
- Shared optimization targets
- Better resource allocation
- Transparent communication about capabilities
Testing Methodology: Start Simple
We recommend an iterative testing approach that builds from foundational tests to more complex scenarios:
Initial Test Set (15 Questions)
Start with a manageable set of diverse test cases (a versionable sketch of this set follows the list):
- 5 Simple Questions, for example:
  - “What is our company’s return policy?”
  - “Who is the contact person for technical support?”
  - “What are the operating hours for customer service?”
- 5 Moderate Questions
- 5 Complex Questions
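Keeping the starter set as plain, versionable data makes it easy to grow and re-run. In the sketch below, the moderate and complex questions are invented examples:

```python
# Illustrative starter test set, tagged by difficulty.
TEST_SET = [
    {"difficulty": "simple",   "question": "What is our company's return policy?"},
    {"difficulty": "simple",   "question": "Who is the contact person for technical support?"},
    {"difficulty": "simple",   "question": "What are the operating hours for customer service?"},
    {"difficulty": "moderate", "question": "How does the return policy differ for sale items?"},
    {"difficulty": "complex",  "question": "Compare the warranty terms across our product lines."},
    # ...extend to 5 questions per difficulty level
]

counts: dict[str, int] = {}
for case in TEST_SET:
    counts[case["difficulty"]] = counts.get(case["difficulty"], 0) + 1
print(counts)  # {'simple': 3, 'moderate': 1, 'complex': 1}; target is 5 of each
```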
Iterative Optimization
After initial testing, systematically adjust and retest to improve performance:
Adjust LLM Parameters
- Prompt engineering adjustments
- Temperature and creativity settings
- Different models or model versions (a parameter-sweep sketch follows this list)
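A small parameter sweep, sketched below, re-runs the test set across a grid of settings. `run_test_set` is a hypothetical stand-in for whatever executes your evaluation, stubbed here so the example runs:

```python
import itertools

def run_test_set(model: str, temperature: float) -> float:
    # Stub: a real implementation would re-run the evaluation and
    # return the average weighted score for this configuration.
    return 0.9 - 0.1 * temperature

grid = itertools.product(["model-a", "model-b"], [0.0, 0.3, 0.7])
results = {(model, temp): run_test_set(model, temp) for model, temp in grid}
best = max(results, key=results.get)
print(best, results[best])  # ('model-a', 0.0) 0.9
```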
Refine RAG Configuration
- Chunking strategies
- Indexing methods
- Retrieval mechanisms
- Context handling (a chunking sketch follows this list)
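As one example from this list, fixed-size chunking with overlap is a common starting strategy; the sizes below are placeholders to tune against your retrieval scores:

```python
# Split text into overlapping fixed-size chunks.
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = "x" * 1200
print([len(c) for c in chunk_text(doc)])  # [500, 500, 300]
```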
Integrate Tools
- Calculators for numerical questions
- Structured data tools for comparisons
- Visualization tools for complex data (a tool-routing sketch follows this list)
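A toy routing rule, sketched below, shows the idea behind the first item: send purely arithmetic questions to a calculator instead of the bare model. Real agents use richer tool-selection logic:

```python
import re

def route(question: str) -> str:
    # Match questions like "What is 12 * (3 + 4)?" that are pure arithmetic.
    expr = re.fullmatch(r"\s*what is ([\d\s+\-*/().]+)\?\s*", question.lower())
    if expr:
        # eval() on an arithmetic-only string; fine for a sketch,
        # not for production input handling.
        return str(eval(expr.group(1)))
    return "answer with the LLM and retrieved context"

print(route("What is 12 * (3 + 4)?"))       # 84
print(route("What is our return policy?"))  # falls through to the LLM
```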
Best Practices
Test Creation
- Base test questions on actual user queries when possible
- Include a mix of simple, moderate, and complex questions
- Create test cases that cover all key knowledge domains
- Update test sets as user needs and content evolve
- Include edge cases and potential failure scenarios
Evaluation Approach
- Use automated evaluation for regular monitoring
- Incorporate human review for high-stakes applications
- Test both positive scenarios (what the agent should do) and negative scenarios (what it shouldn’t do)
- Establish clear evaluation criteria before testing
- Compare performance across different agent configurations
Continuous Improvement
- Schedule regular re-evaluation of agent performance
- Analyze patterns in low-scoring responses
- Document configuration changes and their impact
- Establish feedback loops with end users
- Create a prioritization framework for addressing issues
Team Collaboration
- Include both technical and business stakeholders in test creation
- Share testing results transparently across teams
- Establish clear ownership for different aspects of quality
- Create shared performance goals and targets
- Celebrate improvements in agent quality
