The Curious Schemer

The following sentence is false. The preceding sentence is true.

Archive for the ‘Technology’ Category

When A Synchronized Class Isn’t Threadsafe

with 24 comments

Every Java programmer has heard this advice before: “Prefer ArrayList over Vector. Vector is fully synchronized, and as such you’re paying the synchronization penalty even when you don’t need it.”

ArrayList is not synchronized, so when you need it you need to perform synchronization yourself, or alternatively, as the ArrayList javadoc entry says: “… the list should be “wrapped” using the Collections.synchronizedList method.” Something like this:

List list = Collections.synchronizedList(new ArrayList());

The resulting List will be synchronized, and therefore can be considered safe.
Or is it?
Not really. Consider the very contrived example below.

final List<String> list = Collections.synchronizedList(new ArrayList<String>());
final int nThreads = 1;
ExecutorService es = Executors.newFixedThreadPool(nThreads);
for (int i = 0; i < nThreads; i++) {
    es.execute(new Runnable() {
        public void run() {
            while(true) {
                try {
                    list.clear();
                    list.add("888");
                    list.remove(0);
                } catch(IndexOutOfBoundsException ioobe) {
                    ioobe.printStackTrace();
                }
            }
        }
    });
}

As long as nThreads is 1, everything runs just fine. However, increase nThreads to 2, and you start getting this:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(Unknown Source)
    at java.util.ArrayList.remove(Unknown Source)
    at java.util.Collections$SynchronizedList.remove(Unknown Source)

Changing the synchronized List to Vector doesn't help either. What happened here? Well, individual method calls of synchronized List and Vector are synchronized. But list.add() and list.remove() can be called in any order between the 2 threads. So if you print list.size() after list.add(), the output is not always 1. Sometimes it's 0, sometimes it's 2. Likewise, thread 1 may call list.add(), but before it gets a chance to call list.remove(), thread 2 gets into action and calls list.clear(). Boom, you get IndexOutOfBoundsException.
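If the interleaving is hard to picture, here is a minimal sketch that replays one unlucky ordering by hand on a single thread. The class and method names (InterleavingReplay, replay) are made up for illustration, and the "thread 1" and "thread 2" labels are only comments: nothing here is actually concurrent, it just performs the calls in the order a bad schedule could produce.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InterleavingReplay {

    // Replays, on one thread, an ordering that two threads could produce.
    // Returns true if remove(0) blew up with IndexOutOfBoundsException.
    static boolean replay() {
        List<String> list = Collections.synchronizedList(new ArrayList<String>());
        list.clear();          // "thread 1"
        list.add("888");       // "thread 1"
        list.clear();          // "thread 2" sneaks in between add() and remove()
        try {
            list.remove(0);    // "thread 1" resumes, but the list is empty again
            return false;
        } catch (IndexOutOfBoundsException ioobe) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("remove(0) failed: " + replay());
    }
}
```

Every individual call above is synchronized, and it still fails, because nothing stops another caller from slipping in between two calls.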

In that example above, the 3 calls to List's methods have to be atomic. They must happen together as one unit, no interference from other threads, or else we'll get the IndexOutOfBoundsException again. The fact that the individual methods are synchronized is irrelevant. In fact, we can go back to using the non-synchronized ArrayList, and the program will work, as long as we synchronize properly to make the 3 calls happen as one atomic, indivisible unit of execution:

synchronized (list) {
    list.clear();
    list.add("888");
    list.remove(0);
}
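For completeness, here is a bounded, runnable sketch of the fixed version, assuming a plain ArrayList guarded by synchronized (list) as described above. The class name AtomicSequenceDemo, the run() helper, and the failure counter are my own additions for illustration; the loop is finite so the program actually terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicSequenceDemo {

    // Runs the clear/add/remove sequence on nThreads threads and returns
    // how many times remove(0) threw IndexOutOfBoundsException.
    static int run(int nThreads, final int iterations) {
        final List<String> list = new ArrayList<String>(); // deliberately NOT synchronized
        final AtomicInteger failures = new AtomicInteger();
        ExecutorService es = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++) {
            es.execute(new Runnable() {
                public void run() {
                    for (int n = 0; n < iterations; n++) {
                        try {
                            // The three calls form one atomic unit.
                            synchronized (list) {
                                list.clear();
                                list.add("888");
                                list.remove(0);
                            }
                        } catch (IndexOutOfBoundsException ioobe) {
                            failures.incrementAndGet();
                        }
                    }
                }
            });
        }
        es.shutdown();
        try {
            es.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + run(2, 100000));
    }
}
```

Note that every thread locks on the same object (the list itself). Locking on different objects in different threads would bring the exception right back.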

The moral of the story is that just because a class is fully synchronized, doesn't mean it's threadsafe (UPDATE: as in doesn't mean your code will be threadsafe from using it--thanks Alex). You still have to be on the lookout for those sequences of method calls that have to occur atomically, because method-level synchronization won't help in this regard. In other words, watch what you're doing. (And yes, we should still prefer ArrayList over Vector.)

Threadsafe Iteration & ConcurrentModificationException

Sometimes it's not so obvious when exactly we're supposed to synchronize our use of Collections. Ever encountered a ConcurrentModificationException before? I bet it's probably because your code looks something like this (a.k.a.: why the for-each loop isn't such a great idea actually):

final List<String> list = new ArrayList<String>();
list.add("Test1");
list.add("Test2");
list.add("Test3");
for(String s : list) {
    if(s.equals("Test1")) {
        list.remove(s);
    }
}

ConcurrentModificationException will be thrown in this case, even when there's only a single thread running. To fix this problem, we can't use the for-each loop since we have to use the remove() method of the iterator, which is not accessible within the for-each loop. Instead we have to do this:

for(Iterator<String> iter = list.iterator(); iter.hasNext();) {
    String s = iter.next();
    if(s.equals("Test1")) {
        iter.remove();
    }
}

The point is that iteration is something that we'd probably want to happen atomically--that is, while we're iterating over a collection, we don't want other threads to be modifying that collection. If that happens, most probably something is wrong with our design.

This is why, if you look into the JDK source code, the implementation of Iterator usually checks the expected modification count (i.e.: how many times is this collection supposed to have been modified?) against the collection's current modification count (how many times this collection has actually been modified). If they don't tally, the Iterator assumes that another thread has modified the collection while the iteration is going on, so it throws the ConcurrentModificationException. (This is exactly what happens in our single-threaded case above too, by the way: the call to list.remove() increases the modification count such that it no longer tallies with the one that the iterator holds, whereas iter.remove() updates the iterator's expected count so they still tally.) ConcurrentModificationException is a useful exception--it informs us of a probable fault in our design.
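Here is a small single-threaded sketch of that mod count check in action. The class and method names are hypothetical, and the behaviour described is what I'd expect from the OpenJDK ArrayList; other List implementations may differ. It also shows a quirk worth knowing: removing the second-to-last element inside a for-each loop slips past the check, because the loop ends before next() gets a chance to compare the counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class ModCountDemo {

    static List<String> sample() {
        return new ArrayList<String>(Arrays.asList("Test1", "Test2", "Test3"));
    }

    // Removes `target` via list.remove() inside a for-each loop.
    // Returns true if ConcurrentModificationException was thrown.
    static boolean removeInForEachThrows(String target) {
        List<String> list = sample();
        try {
            for (String s : list) {
                if (s.equals(target)) {
                    list.remove(s); // bumps the list's mod count behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException cme) {
            return true;
        }
    }

    // Removes `target` via iter.remove(), which keeps the two counts in step.
    static List<String> removeViaIterator(String target) {
        List<String> list = sample();
        for (Iterator<String> iter = list.iterator(); iter.hasNext();) {
            if (iter.next().equals(target)) {
                iter.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println("for-each, remove Test1 throws: " + removeInForEachThrows("Test1"));
        System.out.println("for-each, remove Test2 throws: " + removeInForEachThrows("Test2"));
        System.out.println("iterator removal leaves: " + removeViaIterator("Test1"));
    }
}
```

So the check is a best-effort bug detector, not something to build program logic on.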

When Volatile Fails

However, ConcurrentModificationException is not 100% reliable. The implementation of Iterator.next() may look something like this:

public E next() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    // other stuff
}

That is, ConcurrentModificationException is supposed to be thrown when the mod counts don't tally. But it may not get thrown because the thread on which the modCount check is running is seeing a stale value of modCount. That is, let's say you have thread A iterating through a collection. Whenever you call iter.next(), it checks that modCount == expectedModCount. But modCount may have been modified by thread B, and yet A is still seeing the unmodified value. If you remember, this is what the volatile keyword is about--it is to guarantee that a thread will always see the most recent value of a variable marked as such.

So why didn't Joshua Bloch (or whoever took his place at Sun to take care of the Collections API) mark the modCount volatile? That would at least make the concurrent modification detection mechanism more reliable, yes? Well... no. Actually marking modCount volatile won't help, because although volatile guarantees up-to-dateness, it doesn't guarantee atomicity.

What does that mean? Well, if you examine the implementation of ArrayList, you'll see that methods that modify the list increment the modCount (non-volatile) variable by one (i.e.: modCount++). So theoretically, if we mark modCount as volatile, whenever thread B says modCount++, thread A should always immediately see the new value and throw ConcurrentModificationException.

But there is a problem: the increment operator (++) is not atomic. It is actually a compound read-modify-write sequence. So while thread B is in the middle of executing modCount++, it's entirely possible that the thread scheduler will kick thread B out and decide to run thread A, which then checks for modCount before B has a chance to write back the new value of modCount.
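To see that ++ on a volatile field really is a read-modify-write sequence, here is a minimal sketch that has several threads increment a volatile int and an AtomicInteger side by side. The names (VolatileCounterDemo, hammer) are invented for this example, and AtomicInteger is just my choice of an atomic alternative for comparison; it is not what ArrayList uses.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileCounterDemo {

    static volatile int volatileCount = 0;                 // always visible, but ++ is not atomic
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Each of nThreads threads increments both counters `increments` times.
    static void hammer(int nThreads, final int increments) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int n = 0; n < increments; n++) {
                        volatileCount++;               // read, add one, write back: three steps
                        atomicCount.incrementAndGet(); // one indivisible step
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        hammer(2, 100000);
        System.out.println("volatile: " + volatileCount + ", atomic: " + atomicCount.get());
    }
}
```

The atomic counter always ends up at nThreads * increments. The volatile one can never exceed that, and on a multi-core machine it will often fall short, because two threads can read the same old value and both write back the same new one.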

Hidden Iteration

As if things aren't hairy enough as they are, it's not always obvious when an iteration over a collection is happening. Sure, it's probably pretty easy to spot iteration code we've written ourselves, so we can synchronize those. However, much less obvious are the iterations that happen within the harmless-looking methods of the Collections API. If you examine the source code of the java.util.AbstractCollection class, for example, you'll see that methods like contains(), containsAll(), addAll(), removeAll(), retainAll(), clear()... practically all of them trigger an iteration over some collection. Iteration suddenly becomes a LOT harder to spot!
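You don't have to take my word for it. Below is a small sketch of a collection that counts how many times its iterator() method gets called. CountingCollection and the callsFor...() helpers are hypothetical names; the only thing borrowed from the JDK is java.util.AbstractCollection, whose inherited methods do the hidden iterating. Concrete classes such as ArrayList override some of these methods with their own loops, so treat this as an illustration of the base class only.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class HiddenIteration {

    // A read-only collection that records every request for an iterator.
    static class CountingCollection extends AbstractCollection<String> {
        private final List<String> items = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        int iteratorCalls = 0;

        @Override
        public Iterator<String> iterator() {
            iteratorCalls++;
            return items.iterator();
        }

        @Override
        public int size() {
            return items.size();
        }
    }

    static int callsForContains() {
        CountingCollection c = new CountingCollection();
        c.contains("b");
        return c.iteratorCalls;
    }

    static int callsForContainsAll() {
        CountingCollection c = new CountingCollection();
        c.containsAll(Arrays.asList("a", "c"));
        return c.iteratorCalls;
    }

    static int callsForToString() {
        CountingCollection c = new CountingCollection();
        c.toString();
        return c.iteratorCalls;
    }

    public static void main(String[] args) {
        System.out.println("contains():    " + callsForContains() + " iterator() call(s)");
        System.out.println("containsAll(): " + callsForContainsAll() + " iterator() call(s)");
        System.out.println("toString():    " + callsForToString() + " iterator() call(s)");
    }
}
```

Even an innocent toString() (think of a debug log statement) walks the whole collection, and any of these walks can collide with a modification on another thread.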

So What Do We Do?

Sounds pretty hopeless, doesn't it? Well... nah. A very, very smart CS professor named Doug Lea has figured it out for the rest of us. He came up with concurrent collection classes, which handle the problems listed above for the most common cases. These concurrent collections have been part of the standard Java API, in the java.util.concurrent package, since Java 5. For most cases, they are drop-in replacements for the corresponding non-threadsafe classes in the java.util package, and if you haven't taken a good look at them, it's time that you did!
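As a taste, here is a minimal sketch using two of those classes, CopyOnWriteArrayList and ConcurrentHashMap. The demo class and method names are my own. The first method repeats the earlier for-each removal, which is fine here because the iterator works on a snapshot of the list. The second uses putIfAbsent(), a check-then-act sequence that the map performs as one atomic operation.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ConcurrentCollectionsDemo {

    // Removing through the list while a for-each loop is running:
    // no ConcurrentModificationException, the loop iterates over a snapshot.
    static List<String> removeWhileIterating() {
        List<String> list = new CopyOnWriteArrayList<String>();
        list.add("Test1");
        list.add("Test2");
        list.add("Test3");
        for (String s : list) {
            if (s.equals("Test1")) {
                list.remove(s);
            }
        }
        return list;
    }

    // "Add only if the key is missing" as a single atomic call,
    // instead of a separate containsKey() followed by put().
    static String firstWriterWins() {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<String, String>();
        map.putIfAbsent("key", "first");
        map.putIfAbsent("key", "second"); // ignored, the key is already present
        return map.get("key");
    }

    public static void main(String[] args) {
        System.out.println(removeWhileIterating());
        System.out.println(firstWriterWins());
    }
}
```

CopyOnWriteArrayList copies its backing array on every write, so it suits lists that are read far more often than they are modified. As usual, pick the class that matches your access pattern.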

Written by rayfd

November 11, 2007 at 7:45 am

Posted in Java, Technology

Don’t Force Premature Processing in Your Logging Statements

with one comment

My transformation into a nagging old man becomes more complete every day I see something like this sprinkled liberally throughout the code:

// some code
log.debug("The result: " + doSomethingReallyExpensive());
// some code

The reason is that even when the log level is not DEBUG and that string ends up not being used at all, doSomethingReallyExpensive() is still evaluated, and its result is still concatenated with the string “The result: ”, only to be discarded soon after. In other words, we’re wasting cycles evaluating something that is not going to be used at all, except in DEBUG mode. I’ve worked on a project where fixing this wasteful premature processing improved the performance by more than 30%.

(This doesn’t apply only to Commons Logging, which I used in the examples in this post. Here’s a similar entry in Log4j FAQ. It’s a general case of Java evaluating a method’s arguments first before executing the method itself. The same applies to C# applications using log4net, for instance.)

Fortunately, fixing it is easy. Just check whether we’re in DEBUG mode first:

if(log.isDebugEnabled()) {
    log.debug("The result: " + doSomethingReallyExpensive());
}

This way, doSomethingReallyExpensive() is only evaluated when it is needed, that is, in DEBUG mode. (Of course, it’s good to check whether doSomethingReallyExpensive() has side effects first! Or else other parts that depend on its side effects may stop working because it is no longer called when we turn off DEBUG. But then again, anybody who relies on the evaluation of logging arguments to get side effects should really be sent to Timbuktu. No, not the real Timbuktu. The one Donald Duck often goes to.)
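Here is a self-contained sketch that counts how often the expensive method actually runs. To keep it runnable without extra jars, it assumes java.util.logging instead of Commons Logging, so isLoggable(Level.FINE) stands in for isDebugEnabled() and fine() for debug(). The class name, the call counter, and the dummy return value are made up for the demo. The third variant passes a Supplier, which java.util.logging accepts from Java 8 onwards and only invokes if the message will really be logged.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingDemo {

    static final Logger log = Logger.getLogger("LazyLoggingDemo");
    static int expensiveCalls = 0;

    // Stand-in for real work; it just counts its own invocations.
    static String doSomethingReallyExpensive() {
        expensiveCalls++;
        return "42";
    }

    // Argument is built eagerly, even though FINE is disabled.
    static int callsWithoutGuard() {
        log.setLevel(Level.INFO);
        expensiveCalls = 0;
        log.fine("The result: " + doSomethingReallyExpensive());
        return expensiveCalls;
    }

    // Guarded: the argument is never built when FINE is disabled.
    static int callsWithGuard() {
        log.setLevel(Level.INFO);
        expensiveCalls = 0;
        if (log.isLoggable(Level.FINE)) {
            log.fine("The result: " + doSomethingReallyExpensive());
        }
        return expensiveCalls;
    }

    // Supplier: the logger decides whether to build the message at all.
    static int callsWithSupplier() {
        log.setLevel(Level.INFO);
        expensiveCalls = 0;
        log.fine(() -> "The result: " + doSomethingReallyExpensive());
        return expensiveCalls;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + callsWithoutGuard());
        System.out.println("with guard:    " + callsWithGuard());
        System.out.println("with supplier: " + callsWithSupplier());
    }
}
```

With the level set to INFO, only the unguarded version pays for the expensive call.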

Written by rayfd

July 5, 2007 at 12:45 am

Posted in Java, Technology

3 Simple Questions You Should Ask Yourself If You Wanna Go Places

with 5 comments

I’ve been thinking a lot lately, about why some people rise like meteors to the top, and others are stuck doing the same thing over and over again, toiling day after day from their 20s until they die. It’s amazing, when you see a group of people from similar background, how far some of them can go, leaving the rest of the group in the dust.

What makes them successful? There are many factors, of course, but what I found is that they all ask these three questions after they’ve finished a piece of work, a project, or anything of significance, really:

  1. What went well?
  2. What went not so well?
  3. How can I make it (even) better next time?

The ones who stay in the same spot year after year after year after year don’t ask themselves questions like these, unless it went REALLY wrong. Of course most people aren’t so bad that they get it really wrong all the time, so probably they’re doing OK, which means nobody will fault them–after all they are doing their job “just fine”. But most probably that also means nobody will promote them either.

Those who keep asking these questions, especially the 3rd one, are the ones who get better every time you meet them. It’s fun meeting them once in a while just to give yourself a “whip” to do better yourself. Those 3 questions are hard questions. They force us to think. But to me, they’re really worth the trouble. I’d really be depressed if one year from now, I find that I haven’t improved at the things I’m doing. It’s like wasting one year of my life, really.

Written by rayfd

June 10, 2007 at 4:16 am

Posted in Mind, Technology

10 Eclipse Navigation Shortcuts Every Java Programmer Should Know

with 138 comments

Man, I’m such an impatient guy. I cringe whenever I see somebody squint and frown, looking for a JSP file in Eclipse by browsing painfully through the gazillion JSPs in multiple folders in the Package Explorer. I squirm whenever I see somebody looking for a Java class by clicking through packages, one by one, backtracking if it’s the wrong package, and so on, until he sees the correct Java class.

I mean, any resource in the workspace is literally seconds away. Ditto for classes (and interfaces, and members, and so on). Why waste time and brain cycles wading through countless lines in countless files? I thought that every Eclipse user knows this; in fact, if you’re reading this, most probably you already know it too. But thousands of Eclipse JDT users who never bother to read tech blogs in all probability will also never bother to find out what Eclipse can do for them. And it’s a pity, really, because they’re really missing out on a lot. So maybe if you know one, you can forward this to them or something. Make them more productive or something, ya know. 30 seconds saved for every file can add up to really a lot!

So without further ado, let’s say you want to:

  • Open any file quickly without browsing for it in the Package Explorer: Ctrl + Shift + R. This shortcut opens a dialog box that accepts the name of the file you’re looking for. It even accepts wildcard characters, yo. Typing *-conversion.properties will give you the list of all files that end with -conversion.properties. So every time you want to open a file–stop that hand from going to the mouse, and press Ctrl + Shift + R instead!

Opening a resource in Eclipse

  • Open a type (e.g.: a class, an interface) without clicking through an interminable list of packages: Ctrl + Shift + T. If what you want is a Java type, this shortcut will do the trick. Unlike the previous shortcut, this even works when you don’t have the Java source file in your workspace (e.g.: when you’re opening a type from the JDK).

Opening a type in Eclipse

  • Go directly to a member (method, variable) of a huge class file, especially when a lot of methods are named similarly: Ctrl + O. Say, you’re browsing through a file which has 500+ lines of code. How do you look for a method? Don’t use Ctrl + F and then type the method name. Use Ctrl + O, which gives you a list of candidates that match what you’ve typed so far. Select the member you want using the arrow keys, and press Enter. (Alternatively, if you just want to jump from one member to the next (or previous), you can use Ctrl + Shift + ↓ or Ctrl + Shift + ↑, respectively.) UPDATE: As Nick pointed out in the comments section, pressing Ctrl + O again shows the inherited members. Thanks Nick! :)

Browse Member

  • Go to line number N in the source file: Ctrl + L, enter line number. Of course if the stack trace is in the Eclipse console, you can just click the hyperlink. But if it’s in a log file or something, just use this shortcut to go to the line in a jiffy.

Go to a line number

  • Go to the last edit location: Ctrl + Q. If you have a big file, it’s annoying to jump from one location in line 1000+ to 2000+ only to realize after looking at line 2017 that you’ve made a mistake in that location near line 1000+ just now. This shortcut brings you right to where you last edited a file. Very handy in a big file. Gone are the days of “let’s see… where did I edit it again… nope, nope… ah there it is”. (This even works when you’re already looking at a different file.)
  • Go to a supertype/subtype: Ctrl + T. Before I found this, if I wanted to go to the superclass of a class, I’d go to the very top of the file, hover my mouse over its superclass, hold Ctrl, and click. Disgusting. Now I just press Ctrl + T and I get the dialog below, which toggles between supertypes and subtypes when you press Ctrl + T again.

Subtype hierarchy view

Supertype hierarchy view

  • Go to other open editors: Ctrl + E. I know you can cycle through the editors using Ctrl + F6 as well, but I prefer Ctrl + E because Ctrl + F6 has this annoying behaviour of requiring you to keep the Ctrl key down, and the distance between Ctrl and F6 is so far I have to twist my left hand to do that. Just press Ctrl + E, and either use the arrow buttons, or type the name of the file you’re editing.

Open editor

  • Move from one problem (i.e.: error, warning) to the next (or previous) in a file: Ctrl + . for the next, and Ctrl + , for the previous problem. No need to lift your hands off the keyboard to click on that red or yellow stripe.
  • Hop back and forth through the files you have visited: Alt + ← and Alt + →, respectively. I have to admit I don’t find myself using these two often, though.
  • Go to a type declaration: F3. Alternatively, you can hold Ctrl down and click the hyperlinked variable or class or whatever it is the declaration of which you want to see–but why lift your hand off the keyboard? Just press F3 and Eclipse will bring you to the declaration of whatever is at the cursor at that moment.

OK, that’s it for this post. There are tons of other Eclipse shortcuts not covered by this article. To see the whole list, just open up your Eclipse (I’m assuming Eclipse 3.2 here–in older or more recent versions this may differ slightly), go to Help → Help Contents → Java Development User Guide → Reference → Menus and Actions. The whole mother lode is there, from generating comments, correcting indentation, surrounding with, and so on.

The point I’m trying to get across is: Eclipse has a LOT of shortcuts to make things real easy for you. Java (or heck, any software) development is hard. We shouldn’t make it harder on ourselves by fighting our tools! Let our tools help us as much as possible, so we all can go back on the dot and spend more time with our family, lovers, or whatever it is we want to spend more time on. There’s no honour in working hard inefficiently. Only disgrace.

Written by rayfd

May 20, 2007 at 6:22 am

Posted in Java, Technology

The Truth About Positive Thinking

with one comment

I always cringe when positive thinking gurus go around telling people, “There are no limits! You are only limited by your own thoughts! You can do anything and achieve anything as long as you think you can!”

Because I, the skeptics, and the negative thinkers know that the world doesn’t work that way.

For instance, I know that no matter how hard I train, how many techniques I practice and drill to perfection, how determined and positive I am, I’ll never be able to beat Mike Tyson at his peak (heck, or Mike Tyson NOW) at boxing. My talent, my muscularity, my reaction time to evade and time punches, and so on, are simply not there. If I have to beat Iron Mike in something, I’ll choose writing software. Somehow I’m quite confident I’m better than him in that one.

Similarly, no matter how correct their diet, how spotless their routines, how flawless their lifting techniques, 99.9999% of the guys out there will never have biceps as big as Arnold Schwarzenegger’s. Their genes simply don’t permit that. Whereas the ones with the right genes go on to become the next Ronnie Coleman or Dorian Yates.

Have you ever tried so hard at something, only to see a “natural” breeze through all your obstacles in a tenth of the time you needed to get through them? Of course you have. It’s annoying, I know. But that’s also a fact of life. I think an important part of growing up is finding out that you WILL find a ceiling, and being OK with that and accepting it. But an even more important thing is to really find out whether the ceiling you are seeing and hitting against is your true ceiling, your true limitation, or a false, fake, self-imposed one.

That is the truth of positive thinking. To find out your true ceiling. It is NOT true that there are no limits. That’s bullcrap. Everybody has limitations. But you’d better be damn sure that you’ve developed yourself to your own full potential, instead of hitting a ceiling that is put there by your insecurity, your fear, your laziness, your negative thinking, or whatever.

This can be quite disheartening at first. Because if you’re like most people, as you grow up, you get to witness your (perhaps unrealistic) childhood dreams being shattered one by one. And in the process, it’s easy to fall into the so-what’s-the-point-I-might-as-well-stop-improving-right-now thinking pattern.

Why do you think parents are so often guilty of telling their children to be “realistic” and to “forget their big dreams”? Because 99.9999% of the parents have gone through this before. Only 0.0001% have continued to develop themselves and find that their limits are higher than everyone else’s in the world–they become Olympic gold medalists, world-class musicians, Tiger Woods, Warren Buffett, and so on.

In fact, before I figured this out, I used to be quite negative about this whole thing. I used to think: what’s the point of even trying when the best I could ever do, to my full potential, is most probably only mediocre?

And now I think I have the answer.

  • Cos until you’ve tried it and done it to the best of your ability, you won’t know. For all you know, your true ceiling might be the highest in the world. You might be world-class at something, you just don’t know it yet. Say, can you memorize the first 21 digits of Pi? Then you’re already world-class.
  • You keep doing it because you like it, even when you’ve found that you’re far from world-class level. I love programming, and it’s something that I’ll still do when I’m 60. Even if I’ll probably never be the world’s best programmer. But I don’t care. I like it.

(Or probably I am the world’s best programmer. Since I’m so humble.)

So what the heck are those positive thinking, supposedly self-improving gurus for? The good ones help you find out your true capacity. Because I suspect we never really get to know how high the true ceiling of our potential really is. It’s not limitless, surely. But it may also be much higher than we’ve ever dreamed of. And we need the gurus to lie to us and keep telling us there’s no limit so we won’t stop until we’ve hit our real limit. And that probably means that we shouldn’t stop, ever.

Written by rayfd

May 13, 2007 at 1:11 am

Posted in Mind, Technology