One of the reasons for my taking a week or two off to work mostly on the just-released Klicko was that I like to rework, and group together, code snippets that worked well for me in earlier applications, and see if I can update them to conform to my slowly growing experience. I’m also prone to digress; one such digression (several months!) resulted in the release of the unexpectedly popular RBSplitView.
Both Klicko and Quay use code that, like RBSplitView, was destined for XRay II, the supposed successor of XRay, which sadly is not reliable under Leopard. Alas, at this writing, it sounds like XRay II will remain in the freezer, its mummy mined for code snippets and general philosophical experience… but the basic idea persists, and something quite equivalent (but also quite different in detail) is already being conceived.
Both Quay and Klicko do part of their seeming magic with a technology called “Quartz Event Taps” (PDF file). This was introduced in Tiger, and perfected in Leopard. Briefly, an event tap is a C callback routine that is called to filter low-level user input events at various points in the system’s event processing, which is actually quite complex. Events can be generated, examined, modified or even suppressed before they’re delivered to an application. Since user input events are usually routed to the foreground window (that is, to the foreground application, even if it has no window), this makes event taps quite powerful.
You can make a global event tap, or a per-process tap. Quay sets up a tap on the Dock process to intercept clicks on Dock icons. Klicko uses a global tap to check for clicks on background windows.
Tapping one application is, in principle, easy: you locate the application to be tapped by its PSN (Process Serial Number), set up the tap, tie it to your application’s main run loop, and that’s it. Here’s what a bare-bones implementation would look like:
// This is the callback routine, called for every tapped event.
CGEventRef ProcessEvent(CGEventTapProxy tapProxy, CGEventType type, CGEventRef event, void *refcon) {
    switch (type) {
        case kCGEventLeftMouseDown:
            // process a mouse down
            break;
        case kCGEventLeftMouseUp:
            // process a mouse up
            break;
    }
    // return the tapped event (it might have been modified);
    // returning NULL instead means the event isn't passed forward
    return event;
}
// Here's how you set up the tap: we're catching mouse-down and mouse-up
...
ProcessSerialNumber psn;
// get the PSN for the app to be tapped; usually with the routines in <Processes.h>
...
CFMachPortRef tapg = CGEventTapCreateForPSN(&psn, kCGTailAppendEventTap, kCGEventTapOptionDefault,
    CGEventMaskBit(kCGEventLeftMouseDown)|CGEventMaskBit(kCGEventLeftMouseUp),
    ProcessEvent, NULL);
if (!tapg) { // bail out if the tap couldn't be created
    NSLog(@"application tap failed");
    [NSApp terminate:nil];
}
CFRunLoopSourceRef source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tapg, 0);
if (!source) { // bail out if the run loop source couldn't be created
    NSLog(@"runloop source failed");
    [NSApp terminate:nil];
}
CFRelease(tapg); // can release the tap here as the source will retain it; see below, however
CFRunLoopAddSource(CFRunLoopGetCurrent(), source, kCFRunLoopCommonModes);
CFRelease(source); // can release the source here as the run loop will retain it
After that, all should work, in principle. The devil is in the details. Here’s how you locate a running application by its application ID (that is, its bundle identifier) and return its PSN:
BOOL GetPSNForApplicationID(NSString* appid, ProcessSerialNumber* outPSN) {
    // start the enumeration from "no process"
    outPSN->highLongOfPSN = 0;
    outPSN->lowLongOfPSN = kNoProcess;
    while (GetNextProcess(outPSN)==noErr) {
        NSDictionary* pdict = [(NSDictionary*)ProcessInformationCopyDictionary(outPSN,
            kProcessDictionaryIncludeAllInformationMask) autorelease];
        if ([[pdict objectForKey:(id)kCFBundleIdentifierKey] isEqualToString:appid]) {
            return YES; // outPSN now holds the PSN of the matching process
        }
    }
    return NO;
}
To make a global tap, you don’t need a PSN. Just use the following tap creation call instead:
CFMachPortRef tapg = CGEventTapCreate(kCGAnnotatedSessionEventTap, kCGTailAppendEventTap, kCGEventTapOptionDefault,
    CGEventMaskBit(kCGEventLeftMouseDown)|CGEventMaskBit(kCGEventLeftMouseUp),
    ProcessEvent, NULL);
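A note on that mask argument, since it looks a bit magical: each event type is a small integer, and CGEventMaskBit turns it into a single bit of a 64-bit mask, so a tap only sees the types whose bits are set. Here is a minimal sketch in plain C. The SK_ names are stand-ins I made up so the snippet compiles without the Quartz headers, and the numeric values are what I believe CGEventTypes.h defines; treat them as an assumption and check the header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for CGEventMask (hypothetical name, for this sketch only). */
typedef uint64_t SketchEventMask;

/* Stand-ins for kCGEventLeftMouseDown, kCGEventLeftMouseUp and
   kCGEventMouseMoved; the values are assumed, not taken from the headers. */
enum {
    SK_LEFT_MOUSE_DOWN = 1,
    SK_LEFT_MOUSE_UP   = 2,
    SK_MOUSE_MOVED     = 5
};

/* Same shape as CGEventMaskBit: one bit per event type. */
#define SK_MASK_BIT(type) ((SketchEventMask)1 << (type))

/* The mask used in the tap creation calls: mouse-down and mouse-up only. */
SketchEventMask ClickMask(void) {
    return SK_MASK_BIT(SK_LEFT_MOUSE_DOWN) | SK_MASK_BIT(SK_LEFT_MOUSE_UP);
}

/* True if a tap created with this mask would be called for this event type. */
bool MaskIncludes(SketchEventMask mask, int type) {
    return (mask & SK_MASK_BIT(type)) != 0;
}
```

To watch more event types, OR in more bits; anything left out of the mask never reaches the callback.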
More details. If you’re tapping an application, it may not be running; CGEventTapCreateForPSN will return NULL in that case. Or it may quit while you have the tap set up. You probably want to monitor that process and either quit, or rerun the application, or wait for it to come back up. In the latter cases, you’ll have to back out of the now-dead tap carefully:
CFMachPortInvalidate(tapg);
CFRunLoopRemoveSource(CFRunLoopGetCurrent(),source,kCFRunLoopCommonModes); // same mode the source was added with
CFRelease(tapg); // this CFRelease has been moved from before the CFRunLoopAddSource
supposing, of course, that you have held on to those two variables. Note how the CFRelease(tapg) should, in such a case, happen only after the source has been removed from the run loop; otherwise invalidating the tap will cause the run loop to crash. You can use the same technique to close a global event tap, though usually there’s no need; if your app crashes or quits, the tap will be closed automatically.
However, there’s a serious problem while debugging an event tap. If you’re tapping a single application, and set a breakpoint inside yours (or break into the debugger anywhere because of a crash or exception), both applications will stop. If the same happens while a global tap is active, the entire system stops accepting user input! The only way to recover is to ssh/telnet in from another machine, and kill Xcode. So even if you prefer NSLog/printf calls to breakpoints, this will be very inconvenient for all but the simplest callback code.
The solution I found was to always use an application tap while debugging. An easy way is to define, as I always do, a special macro in the main project build configuration panel, for the debug configuration only: in “Preprocessor Macros Not Used In Precompiled Headers” (aka GCC_PREPROCESSOR_DEFINITIONS_NOT_USED_IN_PRECOMPS), write “DEBUG”. Then, instead of the global tap, compile in an application tap on some always-present application (like the Finder) by using #ifdef DEBUG/#else/#endif statements.
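The shape of that conditional is roughly as follows. This is only a sketch: SketchTapKind and ChooseTapKind are hypothetical names that stand in for the two tap creation calls, so the snippet compiles without the Quartz headers.

```c
/* Hypothetical stand-in for the two ways of creating a tap. */
typedef enum {
    TAP_APPLICATION,  /* would call CGEventTapCreateForPSN on, say, the Finder */
    TAP_GLOBAL        /* would call CGEventTapCreate */
} SketchTapKind;

SketchTapKind ChooseTapKind(void) {
#ifdef DEBUG
    /* Debug configuration: tap a single always-present application, so that
       stopping in the debugger cannot block input for the whole session. */
    return TAP_APPLICATION;
#else
    /* Release configuration: the real global tap. */
    return TAP_GLOBAL;
#endif
}
```

In real code the two branches would contain the tap creation calls themselves; everything after that (run loop source, callback) stays the same for both.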
Even that isn’t always sufficient, as Xcode 3 notoriously may invoke the debugger (even on your release build!) if your app crashes. You must either get used to never clicking on “Build & Go” for your release build, or you must make a runtime check for the debugger. The latter will prevent inadvertent freezes, but if you forget to take it out before deployment, your application will behave oddly if a curious user runs it under a debugger.
This post is already too long, so I’ll talk only briefly about what you can do inside the event tap callback itself. Every possible execution path should be short and contain no long loops or wait points. If you’re just watching events, always return the same event passed in as parameter. Return NULL if you want to suppress an event; however, be careful to suppress entire event groups. For instance, if you decide to suppress a mouse-down, store the event number and also suppress subsequent mouse-dragged and mouse-up events with the same number; otherwise the destination application may behave oddly. Some apps may behave oddly when tapped, by the way.
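The bookkeeping for suppressing a whole event group can be sketched like this. It is plain C with made-up stand-in types (SketchEventType, SuppressState) rather than the real Quartz ones, and it is not the code Klicko or Quay use. In an actual callback I would expect the event number to come from CGEventGetIntegerValueField(event, kCGMouseEventNumber), and mouse-dragged events would have to be added to the tap's mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the Quartz event types (hypothetical names). */
typedef enum {
    EV_MOUSE_DOWN,
    EV_MOUSE_DRAGGED,
    EV_MOUSE_UP,
    EV_OTHER
} SketchEventType;

/* Remembers which click, if any, is currently being swallowed. */
typedef struct {
    bool    active;       /* true while a suppressed group is in flight */
    int64_t eventNumber;  /* event number of the suppressed mouse-down */
} SuppressState;

/* Returns true if the event should be dropped (the callback would return NULL).
   wantSuppress is the caller's decision and only matters for a mouse-down. */
bool ShouldSuppress(SuppressState *st, SketchEventType type,
                    int64_t eventNumber, bool wantSuppress) {
    switch (type) {
    case EV_MOUSE_DOWN:
        st->active = wantSuppress;
        if (wantSuppress)
            st->eventNumber = eventNumber;
        return wantSuppress;
    case EV_MOUSE_DRAGGED:
        /* drop drags that belong to the suppressed click */
        return st->active && st->eventNumber == eventNumber;
    case EV_MOUSE_UP:
        if (st->active && st->eventNumber == eventNumber) {
            st->active = false;  /* the group is complete */
            return true;
        }
        return false;
    default:
        return false;
    }
}
```

The callback would keep one SuppressState around (for instance via the refcon pointer) and return NULL whenever ShouldSuppress says so, and the unmodified event otherwise.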
Update: I previously said here that to intercept or generate keyboard events, your application must run with setgid 0 (the “wheel” group). I was mistaken; my apologies. Your application must run setuid root to make an event tap at the third tap point (which I didn’t mention here), which is where the events enter the window server (kCGHIDEventTap).