Whew, Quay 1.1fc1 is finally out. Too tired to explain details just now. Film at 11.
Whew, Quay 1.1fc1 is finally out. Too tired to explain details just now. Film at 11.
Release of Quay 1.1 has been stuck at the “one more bug” stage, as I’ve mentioned. Today I’ve finally unterstood what is happening, and a fix should be easy – I only have to rebuild most of my inter-thread communication. Herewith the cautionary tale.
Quay 1.01 was (as its help file boasted) “just a simple Cocoa app”. Indeed, clicking on a Quay item in the Dock opened the “QuayMenu” background application (which uses the NSUIElement flag). QuayMenu was optimized for quick startup – it read that item to get the folder it was pointing to, read the folder contents, showed a popup menu with them, and quit again when the menu was closed. Nothing easier; a very linear process.
Then in 1.1, all this changed. Now QuayMenu runs all the time in the background. There are two threads (plus a few more that don’t matter for this discussion). One thread listens (via Quartz Event Taps) for clicks on the Dock’s icons, finds out which icon it was, and if Quay should handle it, does a performSelectorOnMainThread: passing data about the icon to the main thread.
The main thread, in its turn, waits until it gets such an event and then pretty much does what the original 1.01 did: puts up a menu with the folder contents. It also shows an arrow over the clicked icon, both to show which icon it was and to allow the user to distinguish Quay menus from Dock menus.
Of course the arrow has to be in a window to be shown onscreen. 1.01 had an invisible window, since a popup menu has to be associated with a window – but now the window has to be shown only when the menu is onscreen, and hidden again when the menu is taken down.
Then around 1.1b2 I got complaints that, after a menu was canceled, the application that was active at the time of the initial wasn’t properly restored to the background – easy to fix by hiding the QuayMenu app after the menu went down. Soon after that, I thought that it would be prudent to imitate the Dock behavior when successive icons were clicked; that is, clicking on one icon, then on another, would produce two successive menus without any need to cancel the first one. And it also would be nice to be able to cancel a long menu while it was being built.
OK, so I introduced some inter-thread communication – mostly some flags and a cancel message – to take care of that. And after some fiddling it seemed to work, some conveniently ignored exceptions excepted. Then my MacBook Air arrived (my first multi-core Mac), and I noticed that the exceptions were more frequent; in fact, about 1 in 10 times, when I clicked on a second icon the first memu was taken down, and the second one either didn’t appear at all or it only flickered onscreen very briefly.
Well, this of course indicated a threading problem – as my experience with interthread communication was a little limited, that was to be expected – and I tried to find out where I was doing something dumb like not properly locking a shared resource or whatever. Since the bug never happened when I tried to break somewhere and single-step from there, it was indeed related to timing. But where?
Tons of NSLog()s were sprinkled about and everything seemed to be working properly. The messages from the first thread left at the proper time, the main thread caught them, the first menu was closed when necessary, the second menu was popped up as expected, but it mysteriously closed down immediately (1 in 10 attempts) or not (9 in 10 attempts). Everything else was as expected. Moreover, no canceling message was sent, or left over, that could close the second menu! It… just curled up and closed. Argh.
Finally – and it was really dumb of me not to think of that immediately – it occurred to me to look at the value returned to my Carbon Menu event handler for the kEventMenuEndTracking event. (Yes, I have a Carbon event handler because for complex UI reasons I need more control – and a shorter build time – than a Cocoa menu could give me.) Turns out that 9 out of 10 times the menu gets closed by kHIMenuDismissedBySelection (when the user picks a menu item); by kHIMenuDismissedByUserCancel (when the user closes the menu without selecting any item); or by kHIMenuDismissedByCancelMenuTracking (when my first thread canceled the menu because the user clicked on another Dock icon).
However – and here was the smoking gun – 1 out of 10 times the menu turned out to close by kHIMenuDismissedByAppSwitch (meaning QuayMenu went into hiding to put another application into the foreground). Now, that should indeed be happening briefly after the first menu was closed, but before the second menu was opened QuayMenu should jump into the foreground again. Now that I considered it, that’s a somewhat costly process that should be optimized out. But why was the application hiding after the second menu popped up instead of before? The sequence of events shouldn’t allow that at all!
Well, at least now I had a clue of where to put my NSLog()s and after some hours of sweating I had the answer: putting all that window hiding and showing in with application hiding and showing, and trying to optimize that by balancing the load over two threads – in brief, putting all urgent stuff into the first thread and all the potentially slow stuff and UI into the main thread – was fundamentally broken. It shouldn’t have worked at all, ever! So why was it failing only 1 in 10 times?
At this point I was visiting my mother and I was attempting to explain to her what was going on. Now, my mom is 95 – she’s still very sharp but knows absolutely zilch about programming. On the other hand, she’s an excellent work optimizer. I suppose 80+ years of kitchen duty will do that to you. At any rate, I was trying to explain what I was doing by resorting to a metaphor.
Assume two blind cooks working in a restaurant. The specialty is pancakes. An order for a pancake comes in to the first cook. He checks out (by Braille I suppose) what goes into the pancake, throws the ingredient into his pan, waves them briefly over his stove, then with a practiced flip of his hand he flicks the pancake to the second blind cook who’s supposed to finish the pancake.
Since they’ve practiced this a lot, and worked out some sort of signals, the second cook knows when to put up his frying pan just in time and in the right position to catch the pancake flying into his direction, and after that it’s easy. He tosses the pancake to a waiter who in turn serves it to the client.
Now, what was happening is that when a second order comes in before the first pancake is finished the second cook has to clean his pan. OK, we may have ridden the metaphor here until it breaks down (hehe) but let’s assume that he usually manages to do so, and proceeds to finish the second pancake in peace. Maybe the client has changed his mind about what sort of pancake he wanted, or something.
Only 1 times in 10 something weird happens and the second pancake is also thrown out unfinished. The first cook notices nothing amiss; he’s getting orders, setting up, and throwing the pancakes. The second cook notices nothing amiss; he’s catching the pancakes like he’s supposed to, cleaning his pan and all when necessary. It’s just 1 in 10 clients that complain because they never get their pancake!
Well, said my mom, so who was stealing the pancake then? The only explanation I found was that the kitchen shouldn’t have worked at all – it was working just by coincidence! In other words, the second cook wasn’t tossing the pancake to the waiter; he was tossing it out of the window! It just happened that at that crucial time a providential windmill wing swung by in front of the window, the pancake bounced off that and, who’da thought, right on the waiter’s plate!
Of course such a thing can’t work reliably and now and then one of the cook’s timing was off just enough to make the pancake miss the windmill wing, and the pancake would be gone with none of them noticing.
So my mom and I had a good laugh (“I can’t believe you actually think of your work that way!”, she said) and I went home. Tomorrow I must redo the whole kitchen routine. Stand by for news.
In other news, I had a couple more requests for handling “Smart Folders” in Quay. I’d looked briefly at the option before – I don’t use them myself – and had pushed it off to the next version. This time, however, curiosity made me look up how I might implement them if I had time to do so – and less than 2 hours later I had them working in the upcoming 1.1! Wouldn’t have been possible without Cocoa, of course.
Regarding 1.1, I’ve been stuck at the “just one more bug” stage for over a week. Meanwhile several smaller bugs were solved, the French localization (courtesy of Ronald Leroux) is finally finished and revised, and I really truly hope to release 1.1 over this weekend. Stay tuned.
Based on a question from Matt Legend Gemmell, we seized the occasion to write some source code to quickly sweep over a given folder and subfolders, getting file attributes and even contents where necessary. See details on my source code page.
This is based on some code I had handy in Quay, generalized for easy adoption. As usually happens with these side tracks, it also helped me find a couple of bugs in my original code, and I learned some new tricks.
Work on Quay itself is still going on, and I’m very near the 1.1b4 release, or perhaps I should call it 1.1fc1. I had some delay because of problems with my main development machine – the trusty iMac G5, now getting long in the tooth – since its occasional habit of turning off spontaneously suddenly got way too frequent.
Fortunately Jim Correia, a fellow developer, pointed me at an Apple note about this very problem, and it seems my machine is covered under the repair program. If all goes well I’ll have a new power supply next week, and that should solve it. Thanks Jim!
I am the Supreme Nerd God
Yes, it’s official: And what’s more: So I’m dorkier and nerdier than Rainer Brockerhoff, except in history/literature where Rainer beats me by 10%, but I think that’s more than made up for by the 23% extra on sci-fi/comic. Result!…