This article on Freshmeat goes over some myths about tuning performance in GCC, and some of the improvements in later versions. I’ve seen it too… Turn on -O69 and your programs will fly, right? Nope. The author even quotes some source code as he walks through some of the optimization options and what they really mean, including the differences between -march and -mcpu.
Unfortunately, the author neglects to explain some of the terms, such as loop unrolling (he’s got a section on when it comes into play and when it doesn’t, but never actually tells you what it is). Say you’ve got a loop that cycles through a fixed array from 1..10 and does a calculation on each element. On every pass, the computer has to update the index variable and check it to see whether it should keep looping. If the loop is unrolled, the compiler gets rid of the loop and simply replicates the calculation 10 times. The code is bigger, but the CPU does less work because the per-pass counter updates, comparisons, and branches are gone.
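To make that concrete, here’s a rough sketch in C of what unrolling looks like if you did it by hand. The function names, the array size `N`, and the multiply-by-a-factor calculation are all made up for illustration; they don’t come from the article, and the compiler’s real output won’t look exactly like this.

```c
/* Number of elements; fixed and known at compile time (illustrative value). */
#define N 10

/* Rolled version: every pass increments i and tests it against N. */
void scale_rolled(int data[N], int factor)
{
    for (int i = 0; i < N; i++) {
        data[i] *= factor;
    }
}

/* Hand-unrolled version: same calculation on each element, written out
   ten times, with no index variable and no per-pass test or branch. */
void scale_unrolled(int data[N], int factor)
{
    data[0] *= factor;
    data[1] *= factor;
    data[2] *= factor;
    data[3] *= factor;
    data[4] *= factor;
    data[5] *= factor;
    data[6] *= factor;
    data[7] *= factor;
    data[8] *= factor;
    data[9] *= factor;
}
```

Both functions leave the array in the same state; the second one just trades code size for fewer instructions executed. In practice you’d normally write the first form and let the compiler decide. GCC has a `-funroll-loops` option for this, and whether it actually unrolls a given loop depends on the optimization level and on the loop itself.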