🐢 Open-Source Evaluation & Testing library for LLM Agents
-
Updated
Mar 27, 2026 - Python
🐢 Open-Source Evaluation & Testing library for LLM Agents
Deliver safe & effective language models
MIT-licensed Framework for LLMs, RAGs, Chatbots testing. Configurable via YAML and integrable into CI pipelines for automated testing.
52-week journey from QA/SDET to GenAI Testing - learning in public with weekly mini-projects, code, and honest documentation of struggles and wins.
A Python library for verifying code properties using natural language assertions.
🚀 First multimodal AI-powered visual testing plugin for Claude Code. AI that can SEE your UI! 10x faster frontend development with closed-loop testing, browser automation, and Claude 4.5 Sonnet vision.
Open-source framework for stress-testing LLMs and conversational AI. Identify hallucinations, policy violations, and edge cases with scalable, realistic simulations. Join the discord: https://discord.gg/ssd4S37WNW
Statistical evaluation framework for AI agents
Turn plain English into Robot Framework files with AI. No dependencies, no hassle — just validated, ready-to-run tests
Ethical AI Governance Platform | Bias Detection | Compliance | Fairness Testing for ML, LLM & Multimodal AI | Open Source
Prompture is an API-first library for requesting structured JSON output from LLMs (or any structure), validating it against a schema, and running comparative tests between models.
The "Cloudflare for AI Agents". 6-layer security interceptor, real-time observability dashboard, and automated reliability testing for MCP and AI tool chains. Prevent hallucinations, prompt injection, and destructive tool calls.
AI 测试用例生成系统。基于 DeepSeek + 百炼部署的 RAG 知识库,包含需求分析、测试用例生成、智能运维助手、产品指南等内容
🚀 ARM64 Browser Automation for Claude Code - SaaS testing on 80 Raspberry Pi budget. The first solution that works where Playwright/Puppeteer fail on ARM64. Autonomous testing without human debugging.
Multi-agent simulation using LLMs. Agents autonomously decide actions for survival, reproduction, and social behavior in a grid world.This project aims to replicate a paper published in 2025 (arXiv:2508.12920).
AI Execution Management for Test Automation — 5-layer Selenium architecture with self-building, self-improving enforcement via the Isagawa Kernel
pytest for LLM apps - Test for grounding failures, prompt injection, safety violations, and regressions
Integration of OpenAI with Pytest to automate API test generation.
A unified benchmarking framework for evaluating Voice AI agents across conversational quality, audio realism, latency metrics, and safety guardrails with scalable multi-language stress testing.
Add a description, image, and links to the ai-testing topic page so that developers can more easily learn about it.
To associate your repository with the ai-testing topic, visit your repo's landing page and select "manage topics."