Top artificial intelligence systems now ace many textbook-style math questions, yet they still fall apart on genuinely new problems. The gap between polished performance on familiar benchmarks and ...