When duplication is not duplication

Posted by Jamis on March 06, 2008 @ 03:56 PM

I was looking through some C code today, and stumbled across this lovely little gem:

1
2
3
4
5
tmp = "\"#";
while (*tmp) {
  FD_SET(*tmp, url_encode_map);
  tmp++;
}

Now, be honest. I don’t care how good you are at C, it takes you a few brain cycles to process that and figure out that it is just setting two bits in a bit field. It really should have been written like this:

1
2
FD_SET('"', url_encode_map);
FD_SET('#', url_encode_map);

This raises the question: why wasn’t it? I’ll tell you why:

Programmers have this burning desire to avoid code duplication. We’re taught, almost since the cradle, to abhor duplicated code and to avoid it all cost. Duplicating code is evil, it leads to unmaintainable code, and propogates bugs. Never, ever, do it!!!

Allow me to let you in on a little secret.

Calling the same function twice is NOT duplicating code. Not if the arguments change between calls.

Even calling the same function three times in a row is kosher. Four times, even. At some point, you might want to consider a loop, if the arguments can be determined functionally, but only do so when the list of similar function calls is harder to read and understand than the loop is. This is often when the loop takes fewer lines of code than the function calls do:

1
2
3
4
for (i = 127; i < 256; i++) {
  FD_SET(i, hdr_encode_map);
  FD_SET(i, url_encode_map);
}

There. Had to get that off my chest. Now, back to work.

Posted in Essays and Rants

Comments

Have something to add? Click here to leave a comment.

06 Mar 2008

1. Nate Klaiber said...

I would agree. Even if your FD_SET method changed it’s arguments, it’s likely you would have to change it in other spots as well. As you said, there is a point where running it through a loop would be beneficial – but for just a few calls there is no need to run a loop, almost seems like overkill.

2. Daniel Berger said...

Where is that code from? Just curious.

Dan

07 Mar 2008

3. Joachim Bengtsson said...

/This/ is duplication of code: http://overooped.com/post/27931909 . Ugggh. I hate working with this code base…

4. Neil Wilson said...

That’s the kind way of looking at it. The unkind reason is that (certain) programmers like to be clever clogs – particularly those using a character code as an array index as in this example.

I wonder what trick they’d use to mask out FD 257?

5. Vidar Hokstad said...

Ignoring the loop, this is a bizarrely beautiful “abuse” of FD_SET – I’d have to admit I’ve never contemplated using it as a generic bitfield before…

Neil, this isn’t an attempt to mask out any filedescriptors – it’s using FD_* as a lookup bitmap for characters only. From the name it sounds like the idea is to use it to loop over a string and check the fd_set to decide whether or not a character should be url encoded.

A bit wasteful (on my system fd_set is 128 bytes vs. the 32 bytes needed, but it can be far larger), but I doubt it’d be that tricky to understand in context, if it hadn’t been for that loop.

Vidar

6. Leon Breedt said...

Like Vidar, I had to do a double take when I saw FD_SET was being used, pointer incrementing to walk over a string is pretty standard in C, but FD_SET for this completely out-of-context use?

Clever, but I’d probably take the responsible programmer out back and shoot him, if he was working for me…After marveling at the thought process which led him to use it :)

7. Vidar Hokstad said...

I guess It’s a bit like Duff’s device (look it up if you haven’t come across it) – evil and beautiful (in the way H.R. Giger pictures are beautiful…) at the same time.

08 Mar 2008

8. Mario Gleichmann said...

Who said that calling the same function in a row (even with the same arguments) is code duplication?

I would say, code duplication is rather about placing the same block of ‘logic’ twice or more within your codebase (same functionality located on different locations, thus violating DRY principle)

Greetings

Mario

9. Ricky Clarkson said...

List(’”’,’#’) foreach FdSet(_, hdrEncodeMap))

This is a Scala equivalent – in this case I don’t think it’s ridiculous to use this to get rid of the duplication. The _ in there is an unnamed lambda parameter (I changed the other identifiers to camelCase so that the _ would not be lost in the noise; presumably if you needed the underscore_convention you would pick a different character than _ for the lambda parameter).

I’m not touting Scala here, just saying that the language/libs you use might affect whether removing code duplication is seen as worth it.. whenever it’s not, consider finding/writing a better language/libs.

10. Alex C said...

“Make things as simple as possible, and no simpler.” – Probably Not Einstein

11 Mar 2008

11. QB said...

Dear Buckblog,

It doesn’t “beg the question”, it “raises the question”. Learn what question begging is before throwing the phrase around. Do you use the word “penultimate” to mean “the best”? I hope not.

There. Had to get that off my chest. Now, back to work.

12. Jamis said...

To make reader @QB happy, I’ve changed “begs” to “raised”, although I’m of the opinion that such was nitpicking on his part. :) (For the curious, begging the question, since QB only wanted to cry foul and not educate, apparently!)

12 Mar 2008

13. Stacy said...

The proper tool to remove duplication is abstraction, if no suitable abstraction can be found then the duplication should be left alone. I think the code above would be fine if it were encapsulated in a function:

FD_SET_MANY(”\”#”, url_encode_map);

The primary concern when refactoring should not be to remove duplication, this is ridiculously low level, textual goal. The primary concern when refactoring should be the creation of appropriate abstractions.

14 Mar 2008

14. JK said...

If they were bent on reducing replication the middle line could have been: FD_SET(*tmp++, url_encode_map); :p

15. Jaosn said...

Daniel Berger: Google tells me it’s in haproxy.c from the HAProxy project.

16 Mar 2008

16. dmnd said...

From the article:

“Calling the same function twice is NOT duplicating code. Not if the arguments change between calls.”

The arguments in this case did not all change. Only one of them did. What if there were 5 arguments to that function, 4 of which remained the same between successive calls? Would you call that duplication?

This example does have some duplication and I don’t really have a problem with abstracting it away. If I wrote this code I’d probably end up following Stacey’s advice (comment 13).

19 Mar 2008

17. Ernie said...

Agreed. I’m relatively new to the Rails community, but one of the things I like most about it is that, as a general rule, Rails programmers seem to allow transparency of intent to trump other considerations when coding. I’ve felt this way about code for a long time.

Why needlessly abbreviate variable and function/method/subroutine/what-have-you names to near-meaninglessness when not forced to by the language you’re implementing in?

Why not try writing 10 lines of clear code instead of 3 lines filled with “cute” tricks?

So many times it seems like people optimize for CPU efficiency instead of developer efficiency, when CPU cycles come cheap. Not to say there aren’t some times that it’s necessary, just that people tend to jump the gun on optimizations… heck, with today’s interpreter/compiler optimizations, you may well end up with identical bytecode/binaries anyway.

Jamis, your articles over at The Rails Way have come in really handy as I’ve been learning, by the way. Thanks for your contribution to the community.