Code Kata 1

Practice doesn’t just make perfect; it makes what you know tacit. You can stop focusing on the details, and instead work on the big issues.

·Code Kata: How It Started

(This is a long one; it explains how I discovered that something I do almost every day to improve my coding is actually a little ritual that has much in common with practice in the martial arts…)

This all starts with the way RubLog does searching (but it isn’t an article about searching, or about bit twiddling. Trust me). Because I eventually want to use cosine-based comparisons to find similar articles, I build vectors mapping word occurrences to each document in the blog. I basically end up with a set of 1,200-1,500 bits for each document. To implement a basic ranked search, I needed to be able to perform a bitwise AND between each of these vectors and a vector containing the search terms, and then count the number of one-bits in the result.
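As a rough illustration of that AND-then-count idea, here is a minimal sketch. RubLog's real code isn't shown in this post, so the vocabulary, the document names, and the helpers `bit_vector` and `score` are all hypothetical, invented just for this example.

```ruby
# Hypothetical vocabulary: each distinct word is assigned one bit position.
VOCAB = %w[ruby search kata karate bit vector].freeze

# Build an Integer whose set bits mark which vocabulary words appear.
# Words outside the vocabulary are ignored.
def bit_vector(words)
  words.reduce(0) do |bits, w|
    idx = VOCAB.index(w)
    idx ? bits | (1 << idx) : bits
  end
end

# Count one-bits the obvious way, one bit position at a time.
def count_bits(word, max_bit)
  count = 0
  max_bit.times { |i| count += word[i] }
  count
end

# A document's score is the number of search terms it contains:
# AND the two vectors, then count the surviving one-bits.
def score(doc_bits, query_bits)
  count_bits(doc_bits & query_bits, VOCAB.size)
end

# Hypothetical documents and query.
docs = {
  "post_a" => bit_vector(%w[ruby search bit vector]),
  "post_b" => bit_vector(%w[kata karate])
}
query = bit_vector(%w[ruby bit kata])

# Highest score first.
ranked = docs.sort_by { |_name, bits| -score(bits, query) }.map(&:first)
puts ranked.inspect
```

The ranking here is just the count of matching terms; anything smarter (weighting, the cosine-based comparison mentioned above) would build on the same vectors.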

I had a spare 45 minutes last night (I took Zachary to his karate lesson, but there was no room in the parents’ viewing area), so I thought I’d play with this. First I tried storing the bit vectors different ways: it turns out (to my surprise) that storing them as an Array of Fixnums has almost exactly the same speed as storing them as the bits in a Bignum. (Also, rather pleasingly, changing between the two representations involves altering just two lines of code.) Because it marshals far more compactly, I decided to stick with the Bignum.

Then I had fun playing with the bit counting itself. The existing code uses the obvious algorithm:

      max_bit.times { |i| count += word[i] }

Just how slow was this? I wrote a simple testbed that generated one hundred 1,000-bit vectors, each with 20 random bits set, and timed the loop. For all 100, it took about 0.4s.
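The testbed itself isn't reproduced here, so the following is only a sketch of what one might look like. The constant names, the `random_vector` helper, and the fixed seed are assumptions for the example, and timings on a modern machine will not match the figure quoted above.

```ruby
MAX_BIT  = 1_000  # bits per vector
SET_BITS = 20     # random bits set in each vector
VECTORS  = 100    # number of vectors to generate

# Build one Integer with exactly `set_bits` distinct random bits set.
def random_vector(max_bit, set_bits, rng)
  positions = (0...max_bit).to_a.sample(set_bits, random: rng)
  positions.reduce(0) { |v, i| v | (1 << i) }
end

# The obvious algorithm: look at every bit position in turn.
def naive_count(word, max_bit)
  count = 0
  max_bit.times { |i| count += word[i] }
  count
end

rng = Random.new(42) # fixed seed (an assumption) so runs are repeatable
vectors = Array.new(VECTORS) { random_vector(MAX_BIT, SET_BITS, rng) }

# Time the counting loop over all vectors with a monotonic clock.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
vectors.each { |v| naive_count(v, MAX_BIT) }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("naive count over %d vectors: %.4fs", VECTORS, elapsed)
```

Generating the vectors outside the timed region keeps the measurement focused on the counting loop alone.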

Then I tried using the ‘divide-and-conquer’ counting algorithm from chapter 5 of Hacker’s Delight (modified to deal with chunking the Bignum into 30-bit words).

     ((max_bit+29)/30).times do |offset|
       x = (word >> (offset*30)) & 0x3fffffff
       x = x - ((x >> 1) & 0x55555555)
       x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
       x = (x + (x >> 4)) & 0x0f0f0f0f
       x = x + (x >> 8)
       x = x + (x >> 16)
       count += x & 0x3f
     end

This ran about 2.5 times faster than the naive algorithm.
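When playing with a trick like this, it is reassuring to check it against the obvious version before trusting any timings. The sketch below wraps the chunked loop above in a method and compares it with the naive count over vectors of several densities. The method names, the `random_vector` helper, the seed, and the choice of densities are assumptions for the example.

```ruby
# Naive reference: inspect every bit position.
def naive_count(word, max_bit)
  count = 0
  max_bit.times { |i| count += word[i] }
  count
end

# Divide-and-conquer count over 30-bit chunks, same steps as the loop above.
def chunked_count(word, max_bit)
  count = 0
  ((max_bit + 29) / 30).times do |offset|
    x = (word >> (offset * 30)) & 0x3fffffff
    x = x - ((x >> 1) & 0x55555555)                 # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0f0f0f0f                 # 8-bit partial sums
    x = x + (x >> 8)
    x = x + (x >> 16)
    count += x & 0x3f                               # total for this chunk
  end
  count
end

# Build an Integer with exactly `set_bits` distinct random bits set.
def random_vector(max_bit, set_bits, rng)
  (0...max_bit).to_a.sample(set_bits, random: rng).reduce(0) { |v, i| v | (1 << i) }
end

rng = Random.new(1) # fixed seed (an assumption) for repeatability
densities = [0, 1, 20, 100, 900, 1000]
samples = densities.flat_map do |n|
  Array.new(10) { random_vector(1000, n, rng) }
end

mismatches = samples.reject { |w| naive_count(w, 1000) == chunked_count(w, 1000) }
puts "mismatches: #{mismatches.size}"
```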

Then I realized that when I was comparing a set of search terms with just a few words in it, the bit vector would be very sparse. Each chunk in the loop above would be likely to be zero. So I added a test:

     ((max_bit+29)/30).times do |offset|
       x = (word >> (offset*30)) & 0x3fffffff
       next if x.zero?
       x = x - ((x >> 1) & 0x55555555)
       x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
       x = (x + (x >> 4)) & 0x0f0f0f0f
       x = x + (x >> 8)
       x = x + (x >> 16)
       count += x & 0x3f
     end

Now I was seven times faster than the original. But my testbed was using vectors containing 20 set bits (out of 1,000). I changed it to generate timings with vectors containing 1, 2, 5, 10, 20, 100, and 900 set bits. This got even better: I was up to a factor of 15 faster with 1 or 2 bits in the vector.

But if I could speed things up this much by eliminating zero chunks in the bit-twiddling algorithm, could I do the same in the simple counting algorithm? I tried a third algorithm:

     ((max_bit+29)/30).times do |offset|
       x = (word >> (offset*30)) # & 0x3fffffff
       next if x.zero?
       30.times {|i| count += x[i]}
     end

The inner loop here is the same as for the original count, but I now count the bits in chunks of 30.

This code performs identically to the bit-twiddling code for 1 set bit, and only slightly worse for 2 set bits. However, once the number of bits starts to grow (past about 5 for my given chunk size), the performance starts to tail off. At 100 bits it’s slower than the original naive count.

So for my particular application, I could probably choose either of the chunked counting algorithms. Because the bit-twiddling one scales up to larger numbers of bits, and because I’ll need that later on if I ever start doing the cosine-based matching of documents, I went with it.

So what’s the point of all this?

Yesterday I posted a blog entry about the importance of verbs. It said “Often the true value of a thing isn’t the thing itself, but instead is the activity that created it.” Chad Fowler picked this up and wrote a wonderful piece showing how this was true for musicians. And Brian Marick picked up on Chad’s piece to emphasize the value of practice when learning a creative process.

At the same time, Andy and I had been discussing a set of music tapes he had. Originally designed to help musicians practice scales and arpeggios, these had been so popular that they now encompassed a whole spectrum of practice techniques. We were bemoaning the fact that it seemed unlikely that we’d be able to get developers to do the same: to buy some aid to help them practice programming. We just felt that practicing was not something that programmers did.

Skip forward to this morning. In the shower, I got to thinking about this, and realized that my little 45-minute exploration of bit counting was actually a practice session. I wasn’t really worried about the performance of bit counting in the blog’s search algorithm; in reality it comes back pretty much instantaneously. Instead, I just wanted to play with some code and experiment with a technique I hadn’t used before. I did it in a simple, controlled environment, and I tried many different variations (more than I’ve listed here). And I’ve still got some more playing to do: I want to mess around with the effect of varying the chunk size, and I want to see if any of the other bit counting algorithms perform faster.

What made this a practice session? Well, I had some time without interruptions. I had a simple thing I wanted to try, and I tried it many times. I looked for feedback each time so I could work to improve. There was no pressure: the code was effectively throwaway. It was fun: I kept making small steps forward, which motivated me to continue. Finally, I came out of it knowing more than when I went in.

Ultimately it was having the free time that allowed me to practice. If the pressure was on, if there was a deadline to deliver the blog search functionality, then the existing performance would have been acceptable, and the practice would never have taken place. But those 45 pressure-free minutes let me play.

So how can we do this in the real world? How can we help developers do the practicing that’s clearly necessary in any creative process? I don’t know, but my gut tells me we need to do two main things.

The first is to take the pressure off every now and then. Provide a temporal oasis where it’s OK not to worry about some approaching deadline. It has to be acceptable to relax, because if you aren’t relaxed you’re not going to learn from the practice.

The second is to help folks learn how to play with code: how to make mistakes, how to improvise, how to reflect, and how to measure. This is hard: we’re trained to try to do things right, to play to the score, rather than improvise. I suspect that success here comes only after doing.

So, my challenge for the day: see if you can carve out 45 to 60 minutes to play with a small piece of code. You don’t necessarily have to look at performance: perhaps you could play with the structure, or the memory use, or the interface. In the end it doesn’t matter. Experiment, measure, improve.

Practice.

·Kata, Kumite, Koan, and Dreyfus

A week or so ago I posted a piece called Code Kata, suggesting that as developers we need to spend more time just practicing: writing throwaway code just to get the experience of writing it. I followed this up with a first exercise, an experiment in supermarket pricing.

Those articles generated some interest, both on other blogs and in e-mail. In particular, I’ve had a couple of wonderful exchanges with Bob Harwood. In turn, these have led to a bit of research, and an interesting confluence of ideas.

Kata (Japanese for form or pattern) are an exercise where the novice repeatedly tries to emulate a master. In karate, these kata are a sequence of basic moves (kicks, blocks, punches, and so on), strung together in a way that makes sense. You’ll never be attacked in such a way that you could repeat a kata to defend yourself: that isn’t the idea. Instead the idea is to practice the feel and to internalize the moves. (Interestingly, kata are not just used in the martial arts. Calligraphers also learn using kata, copying their masters’ brush strokes.)

Kata and other artificial exercises form a large part of the work done by a karate novice. They practice for hour after hour. (Interestingly, I was talking about this with my son’s karate sensei, and he explained that as well as the standard combinations of moves in kata, he often has his classes do combinations that don’t feel natural, or where the body isn’t correctly positioned at the end of one to enter the next. He believes that teaching what doesn’t work is an effective way to help them improvise what does work later.)

Once you get some way into your training, you start kumite, or sparring. Kumite is a supervised exercise between two students, or between a student and a master. Here they learn to assemble the basic moves into coherent sequences, combining offensive and defensive elements into something that works. While kata could be considered static, repeating the same sequence over and over, kumite is dynamic. Sparring continues throughout the rest of your training. It is interesting to watch the development of sparring as folks progress through the belt ranks. Beginners often seem to fall into the trap of being very rigid in their choice of moves. If a kick worked for them last time, then they’ll continue to use that kick over and over again. Similarly, some beginners attack and forget to defend, or spend all their time defending. After they become more experienced, their repertoire increases, and they learn to use appropriate moves which are strung together almost like a jazz improvisation: responding to their opponent but at the same time expressing their own plan of attack. Watching good black belts spar is fascinating; they manage to combine attacks and defenses in the same move, executing with both power and a great deal of subtlety.

Then, to quote Bob Harwood: “Once a kata has been learned, then the kata needs to be forgotten. That is, at some point in studying Karate, typically the black belt level, the time comes to transcend the motions and seek meaning in the kata. The student discovers how his/her view of the world is reflected in their performance of kata, and (if lucky) they learn how to adapt the kata to new interpretations. As a student learns to do this, kata becomes more and more a part of their kumite (sparring). …the skill of self-discovery becomes part of their daily life. The study of koan is often used to promote this learning.” Koans are questions without absolute answers which are used to break down assumptions and reveal underlying truths. The goal of a koan is not the answer, but thinking about the question. In the supermarket pricing example, when talking about “buy two, get the third free,” the question “does the third item have a price?” is something of a (minor league) koan.

All of which brings us back to the Dreyfus model of skills acquisition (and you thought the title of this blog entry was the name of a law firm). The Dreyfus model suggests that there are five stages in the acquisition of mastery. We start at novice: unsure and with no experience. We don’t particularly want to know the “why,” we just want to be shown what to do. We need to know the rules, because we just want to achieve some goal. As we get more experience, and progress through the next three levels, we start to move beyond this immediate, mechanical level. We gain more understanding and start to be able to formulate our own action plans. Finally, when we achieve mastery, we have all that experience internalized, and we can work from intuition. We no longer need the rules to support us; instead we write the new rules. Andy has a great talk about this (Herding Racehorses and Racing Sheep, available from the JAOO website).

There’s a lot of obvious similarity between Dreyfus and the way people become masters of karate. The kata is rote learning, copying the master. Kumite is where you get to start applying the skills on your own. And then mastery, where you teach others, and where you use koan to attempt to discover underlying principles for yourself.

So, I’m planning to change my taxonomy of challenges somewhat. I think that as developers we need all three of these levels: kata for the things we’re only just starting to learn, kumite for the things we think we know, and koan for when we want to dig deeper. To quote from Andy’s talk, “Experience comes from practice”.
