Long Context Coding Evaluation Dataset for Gemini CLI
Letian Li
This project builds a long-context, complex-reasoning coding evaluation dataset for Gemini CLI. It focuses on multi-file, multi-language software...
Behavioral evals, Quality, and the OSS Community
VedantxMahajan
This proposal focuses on improving the behavioral evaluation (eval) infrastructure of Gemini CLI to ensure reliable and safe AI tool usage. The...