PHOTOOG Photography writings by Olivier Giroux


Writing Performance-Sensitive Code

Someone I work with closely recently shared some pain he felt while optimizing a tool for speed. He offered this conclusion so that others need not suffer as he just had:

For performance code, avoid std::copy at all costs, especially if you’re really using it to perform a memcpy.

Because this was broadcast on a rather large distribution list, I took a long time to figure out what my response ought to be. In part, I want to believe that C++ compilers can make std::copy as fast as (or close to) memcpy. I'm also not sure that everyone reacts in the best way to such conclusions, or how they get integrated into individuals' toolboxes. Allow me to expand.

Everyone at NVIDIA likes to think that their code is performance-sensitive. Nobody answers yes to the question "do you like to write slow programs?" Yet in the wider body of programmers (including our own) virtually nobody can tell a priori what is slow or fast, in spite of the mass delusion that we can.

What we have above is a very valuable meme that I just wish you would not remember as "std::copy is for people who like it slow". Memes like this one are useful pieces of programmer wisdom to put into your book of tricks, but the daily application of such memes alone enables maybe 1% of programmers to write fast code by accident. While this way of working best shows off one’s elite status, it’s completely inappropriate.

When it comes to writing performance-sensitive code, there’s only one golden rule to ensure success:

If you’re writing performance-sensitive code, use a profiler. Early. Often.

The thoughtful application of the memes you collect from experience (and discussions!) complements this rule by reducing the gross inefficiencies that are there to begin with. This is absolutely a non-negligible effect that we want to capture in our software too, but it's not how you build fast software.
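To make "measure, don't guess" concrete for the std::copy versus memcpy claim, here is a minimal sketch of a timing harness. It is a crude stand-in for a real profiler, not a replacement for one. The names `best_time_us`, `CopyTimings`, `compare_copies` and `print_timings` are hypothetical helpers invented for this illustration, and the numbers you get will depend on your compiler, optimization flags, standard library and hardware, so treat any single run as a hint rather than a verdict.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the post): runs f several times and
// returns the best wall-clock time observed, in microseconds.
template <class F>
double best_time_us(F&& f, int repeats = 20) {
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        const double us =
            std::chrono::duration<double, std::micro>(t1 - t0).count();
        if (best < 0.0 || us < best) best = us;
    }
    return best;
}

// Hypothetical result record for one comparison run.
struct CopyTimings {
    double std_copy_us;   // best time for std::copy
    double memcpy_us;     // best time for std::memcpy
    bool results_match;   // true if both destinations equal the source
};

// Copies n ints with both routines and times each one.
// Assumption: n > 0, so that data() is a valid pointer for memcpy.
CopyTimings compare_copies(std::size_t n) {
    std::vector<int> src(n), a(n, -1), b(n, -1);
    for (std::size_t i = 0; i < n; ++i) src[i] = static_cast<int>(i);

    CopyTimings t{};
    t.std_copy_us = best_time_us(
        [&] { std::copy(src.begin(), src.end(), a.begin()); });
    t.memcpy_us = best_time_us(
        [&] { std::memcpy(b.data(), src.data(), n * sizeof(int)); });

    // Reading the destinations back checks that both routines did the same
    // job, and gives the copied data an observable use.
    t.results_match = (a == src) && (b == src);
    return t;
}

// Prints one comparison in a human-readable form.
void print_timings(std::size_t n) {
    const CopyTimings t = compare_copies(n);
    std::printf("n=%zu  std::copy=%.1f us  memcpy=%.1f us  match=%s\n",
                n, t.std_copy_us, t.memcpy_us,
                t.results_match ? "yes" : "no");
}
```

Calling `print_timings` from `main` with a few sizes, in both debug and optimized builds, is usually enough to show whether the gap exists in your configuration at all. A real profiler is still what tells you whether the copy matters to the program as a whole.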

I hope you found value in this.

Filed under: Uncategorized
Comments (2)
  1. Agreed.

    Which profiler do you recommend then? I like AQTime a lot 🙂

  2. I run the Visual C++ profiler on my code on a regular basis. With Visual C++ 2005 there is no reason not to, because it’s been made so easy.

    Basically, if your app is set up to run under the debugger, you are within ~4 clicks of profiling it.

    In the past I also used:
    1. AMD’s CodeAnalyst – in my opinion it’s a random number generator (sic). And it doesn’t work on Intel processors.
    2. Intel’s VTune – it totally rocks but I’ve had the worst luck with it, where it would often segfault while trying to crunch on the sample data. And it doesn’t work on AMD processors.
    3. Gprof – it’s decent for C code, but it’s unfit for use with STL-heavy C++ because it doesn’t report samples along individual call stacks (the ‘call stack’ feature exists but is too limited).
    4. My own homegrown profiler – it was ok but it only supported VC 2003, not 2005.

    All in all, I think the VC2005 profiler clears the “good enough” bar and offers simplicity of use unmatched by the others. It does have some issues on longer-running processes, however, due to what I perceive to be algorithmic inefficiency.
