Submitted by
Thanawat Lodkaew
Papers
Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests
How Can I Publish My LLM Benchmark Without Giving the True Answers Away?