Knowing Which of Your AI Agents' "Successes" Are Real