Creator of mcpbr, schemaflux, and musegpt. MS CS (ML) at Georgia Tech. Researching agentic evaluations.
Researching how LLM-based agents use tools, with a focus on evaluation methodology and benchmark design. Building open-source infrastructure for measuring AI.
- Georgia, USA
- UTC -05:00
- greynewell.com
- https://orcid.org/0009-0001-0714-3800
- @greynewell
- in/greynewell
Highlights
- Pro
Pinned
- evaldriven.org (Public): A manifesto for evaluation-driven AI development. Every AI system needs deterministic, automated evaluation as a first-class engineering practice.




