FrontierMath, a new benchmark from Epoch AI, challenges advanced AI systems with complex math problems, revealing how far AI still has to go before achieving true human-level reasoning.
While today's AI models tend to perform well on other mathematical benchmarks such as GSM8K and MATH, according to ... and real analysis to abstract questions in algebraic geometry and ...
On Friday, research organization Epoch AI released FrontierMath, a new mathematics benchmark ... differs from many existing AI benchmarks because the problem set remains private and unpublished ...