Mouse follows window GNOME extension

tl;dr

I wrote a GNOME extension to make focus follows mouse play nicely with keyboard window manipulation shortcuts: the Mouse follows window GNOME extension. This post gives a bit of background and walks you through the implementation.

Building stuff to make my life easier is fun! You can do it, too!

The problem

When I changed my employer at the beginning of the year, I was happy that they let me have a laptop with Linux on it. It came with Ubuntu 22.04 preinstalled, which is not the most recent version, but since this is the only Linux distribution and version supported by the IT department, I'm living with it. Ubuntu 22.04 comes with GNOME 42.9 by default. At first, I thought I might install i3wm, which is what I use on my personal machine, but actually, GNOME isn't too bad, and some things just work out of the box, so I decided to stick with it.

One thing I really like about i3wm, a tiling window manager, is the ability to shuffle windows around with keyboard shortcuts. And focus follows mouse. I really can't use a desktop environment where I have to click on windows to focus them. I want to point my mouse at a window and start typing into it. When using the laptop keyboard, this is just a quick movement with my thumb over the touchpad.

Luckily, GNOME supports focus follows mouse and has shortcuts for what they call ‘split left’ and ‘split right’, which means make a window cover the left (or right) half of the screen, as well as maximizing and minimizing windows and moving them between monitors. So far, so good.

Unfortunately, these two features do not play well together. When I am working in a window on the left half of the screen and split it to the right (e.g., because I want to look at something in a window underneath the active window), the mouse pointer stays on the left half of the screen, and the focus switches to whatever is underneath it now. This is typically not what I want. What I really want is for the mouse pointer to move with the window – and the focus to stay in the window. This is the behaviour of i3wm, and I want to replicate it in GNOME.

Also, sometimes splitting a window to a half of the screen that is already occupied by another window results in the moved window being hidden behind the other window. I guess the compositor doesn't change the z index when moving a window this way. This annoys me, and I want to fix this, too.

The solution

After some searching around, I decided that what I needed was a GNOME extension that watched for these window ‘split’ operations and would then warp the mouse pointer so it stayed in the window.

Another option would have been to implement the ‘split’ operation myself and move the window and the mouse pointer together. I may still do that because it would allow me to add custom window positions, but for the moment, left and right half are enough for me, and I want to keep things simple.

Before I come to the implementation, there is one important detail to consider: the system is running Wayland.

Interlude: Mouse pointer warping in Wayland

Wayland is more secure than X11, and one consequence of this is that moving the mouse pointer around programmatically is not that simple. Basically, you have to emulate an input device, which is a bit more work than I wanted to take on for such a small feature.

Fortunately, others have already done the work. There are tools that do exactly this. They emulate input devices and generate input events. One such tool is ydotool. Another is input-emulator. Both offer a CLI that accepts coordinates and moves the mouse pointer.

For my use case, ydotool would have been better because it supports absolute positioning: give it an x/y coordinate, and it will warp the mouse pointer to this position. When I tried ydotool, however, it always moved the mouse pointer to the left margin of the screen. The vertical position (y) was correct, but x was always 0, no matter how I called it. When I tried ydotoold, a daemon belonging to ydotool that provides a persistent virtual input device in the background, it segfaulted on me. Rather than trying to analyze this, I decided to go with input-emulator instead, which did work. The only downside is that input-emulator only supports relative positioning, i.e., I cannot simply say, ‘go there’. I have to calculate the target position relative to the current position of the mouse pointer.

Now that I know how to control the mouse pointer, I can implement the extension.

The implementation

You can find the implementation at github.com/sebhans/mfw-gnome-extension. It is JavaScript and basically consists of a little boilerplate and a single class with about 100 SLOC.

The idea is to track all window geometries (positions and sizes) and which window is focused. When the geometry of the focused window changes by at least 50 pixels, we assume the window has been moved by a keyboard shortcut[1] and check the position of the mouse pointer. If the mouse pointer is still inside the window, we do nothing (since the window will still have the focus, there is no need to make the pointer jump around). If the mouse pointer is outside the window, however, we move it to the center of the window. We also ensure that the window is in the foreground to get rid of the ‘hidden window’ phenomenon mentioned above.

Watching for position/size jumps instead of just checking whether the window covers half the screen lets the extension handle other window movements that would result in focus loss, like unmaximizing a window and moving a window to another monitor, without extra effort.

Here is a walkthrough of its methods (leaving out some debug statements).

constructor() {
  this.windowCreatedSignal = null;
  this.windowSignals = new Map();
  this.windowGeometries = new Map();
  this.focusedWindowId = null;
}

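The constructor initializes the data structures we are going to use: the ID of the global 'window-created' signal handler, a map from each tracked window to its signal handler IDs, a map from each window to its last known geometry, and the ID of the currently focused window.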

enable() {
  Util.spawn(['/usr/local/bin/input-emulator',
              'start', 'mouse',
              '--x-max', '5000',
              '--y-max', '5000']);
  this.windowCreatedSignal =
    global.display.connect(
      'window-created',
      this._onWindowCreated.bind(this));
  this._connectAllWindows();
}

This method is called when the extension is enabled. It sets up input-emulator to emulate a mouse. This starts a background process that handles the virtual device and allows us to perform pointer movements later on. The parameters --x-max and --y-max specify the maximum coordinates to use. The default is too small for my monitors. 5000 for each is somewhat arbitrary, but I use the laptop in various setups with external monitors with different resolutions, and 5000 works for me.

The method also registers _onWindowCreated() as a signal handler that will be called whenever a new window is created (so we can track it) and immediately begins to track all already existing windows with the helper method _connectAllWindows().

_connectAllWindows() {
  global.get_window_actors().forEach(actor => {
    this._connectWindowSignals(actor.meta_window);
  });
}

_connectWindowSignals(window) {
  if (this.windowSignals.has(window)) {
    return;
  }

  let signals = [
    window.connect(
      'focus',
      this._onFocusWindowChanged.bind(this)),
    window.connect(
      'position-changed',
      this._onWindowChanged.bind(this)),
    window.connect(
      'size-changed',
      this._onWindowChanged.bind(this)),
  ];

  this.windowSignals.set(window, signals);
}

Together, these two methods set up signal handlers for every existing window. As a result, the handler _onFocusWindowChanged() will be called whenever another window receives the focus, and the handler _onWindowChanged() will be called whenever a window changes its position or size. The size is relevant because the window could already be at the correct position, but still lose focus, e.g., because it changes from maximized (at position 0/0) to unmaximized, still at position 0/0, but much smaller, so that the mouse pointer is now outside the window.
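To make the unmaximize case concrete, here is a minimal sketch that uses plain objects in place of the rectangles returned by get_frame_rect(). The helper name containsPoint, the screen size and the window sizes are made up for illustration; they are not taken from the extension.

```javascript
// Plain objects stand in for the rectangles returned by get_frame_rect().
// containsPoint is a hypothetical helper, not part of the extension.
function containsPoint(frame, x, y) {
  return x >= frame.x && x < frame.x + frame.width &&
         y >= frame.y && y < frame.y + frame.height;
}

// Illustrative numbers: a 1920x1080 screen, pointer near the bottom right.
const pointer = { x: 1500, y: 900 };
const maximized = { x: 0, y: 0, width: 1920, height: 1080 };
const unmaximized = { x: 0, y: 0, width: 800, height: 600 };

// The position is identical in both states; only the size differs.
const insideBefore = containsPoint(maximized, pointer.x, pointer.y);
const insideAfter = containsPoint(unmaximized, pointer.x, pointer.y);
console.log(insideBefore, insideAfter); // true false
```

A handler that listened only to 'position-changed' would miss this transition, because x and y stay at 0 throughout.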

The handler IDs returned by the connect() method are saved in the map windowSignals so we can disconnect them later (as is the ID of the global 'window-created' handler in the enable() method above).

disable() {
  if (this.windowCreatedSignal) {
    global.display.disconnect(
      this.windowCreatedSignal);
    this.windowCreatedSignal = null;
  }
  this._disconnectAllWindows();
  this.windowGeometries.clear();
  Util.spawn(['/usr/local/bin/input-emulator',
              'stop', 'mouse']);
}

_disconnectAllWindows() {
  this.windowSignals.forEach(
    (signals, window) => {
      signals.forEach(signalId =>
        window.disconnect(signalId));
    }
  );
  this.windowSignals.clear();
}

The method disable() is called when the extension is disabled. It is supposed to free all resources used by the extension, which in our case means disconnecting all signal handlers we set up in enable() and _connectWindowSignals() and stopping the input-emulator background process responsible for our virtual mouse.
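The connect/disconnect bookkeeping can be tried outside GNOME Shell with a small stand-in. The sketch below is an illustration under assumptions, not code from the extension: FakeWindow, track and untrackAll are hypothetical names, and FakeWindow only imitates the part of the behaviour that matters here, namely that connect() returns a handler ID and disconnect() takes one.

```javascript
// FakeWindow is a hypothetical stand-in for a window object: connect()
// returns a numeric handler ID, disconnect() removes that handler.
class FakeWindow {
  constructor() {
    this.handlers = new Map();
    this.nextId = 1;
  }
  connect(signal, callback) {
    const id = this.nextId++;
    this.handlers.set(id, { signal, callback });
    return id;
  }
  disconnect(id) {
    this.handlers.delete(id);
  }
}

// Same bookkeeping idea as windowSignals: window -> array of handler IDs.
const tracked = new Map();

function track(window) {
  if (tracked.has(window)) return; // already connected, avoid duplicates
  tracked.set(window, [
    window.connect('focus', () => {}),
    window.connect('position-changed', () => {}),
    window.connect('size-changed', () => {}),
  ]);
}

function untrackAll() {
  tracked.forEach((ids, window) => ids.forEach(id => window.disconnect(id)));
  tracked.clear();
}

const w = new FakeWindow();
track(w);
track(w); // second call is a no-op thanks to the has() guard
const connectedCount = w.handlers.size; // three handlers, not six
untrackAll();
const remainingCount = w.handlers.size; // nothing left behind
console.log(connectedCount, remainingCount); // 3 0
```

Keeping the IDs per window is what makes the teardown a simple loop: nothing has to be looked up or guessed when the extension is disabled.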

So far, so good. The extension will now come up, register all necessary handlers to be notified of window changes, and shut down gracefully. But up to now, it can only handle windows that existed when the extension was enabled (which will probably be none if the extension is loaded at the start of the GNOME session). We need to change that. This is what the _onWindowCreated() handler is for.

_onWindowCreated(display, window) {
  switch (window.get_window_type()) {
    case Meta.WindowType.MENU:
    case Meta.WindowType.DROPDOWN_MENU:
    case Meta.WindowType.POPUP_MENU:
    case Meta.WindowType.TOOLTIP:
    case Meta.WindowType.NOTIFICATION:
    case Meta.WindowType.COMBO:
      return;
  }
  this._connectWindowSignals(window);
  this.windowGeometries.set(
    window, window.get_frame_rect());
}

This handler is called whenever a new window is created. We want to set up our window handler for this new window, too (and remember its geometry while we are at it). There's just one minor complication: there are several types of windows, and not all of them are relevant for us. In my first experiments, I didn't pay attention to the window type and was surprised when the mouse pointer jumped around while I tried to delete a row in a table in LibreOffice Writer via the context menu. I don't know exactly why, but menus, tooltips and the like tend to trigger our ‘jumping window’ reaction, and since shooting those kinds of windows around the screen via keyboard doesn't make sense anyway, we may as well filter them out right from the start.
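Footnote 3 mentions that the same filter could also be applied to the windows that already exist when the extension is enabled. One possible way to do that is to pull the check into a predicate shared by both code paths. The sketch below is not the code from the repository: the strings stand in for Meta.WindowType values, and the names IGNORED_TYPES and isTrackable are hypothetical.

```javascript
// Strings stand in for the Meta.WindowType values filtered in
// _onWindowCreated(); in the extension these would be the enum members.
const IGNORED_TYPES = new Set([
  'MENU', 'DROPDOWN_MENU', 'POPUP_MENU', 'TOOLTIP', 'NOTIFICATION', 'COMBO',
]);

// Hypothetical predicate shared by the "new window" and the
// "already existing windows" code paths.
function isTrackable(windowType) {
  return !IGNORED_TYPES.has(windowType);
}

// Illustrative list of already existing windows and their types.
const existing = [
  { title: 'Terminal', type: 'NORMAL' },
  { title: 'Context menu', type: 'POPUP_MENU' },
  { title: 'Preferences', type: 'DIALOG' },
];

const trackedTitles = existing
  .filter(win => isTrackable(win.type))
  .map(win => win.title);
console.log(trackedTitles); // [ 'Terminal', 'Preferences' ]
```

A deny list like this keeps the original behaviour for every window type that is not explicitly named, which matches the switch statement above.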

_onFocusWindowChanged(window) {
  this.focusedWindowId =
    window.get_id();
}

This handler is called whenever a window receives the focus and just remembers its ID for later.

_onWindowChanged(window) {
  let frame = window.get_frame_rect();
  if (window.get_id() != this.focusedWindowId) {
    this.windowGeometries.set(
      window, frame);
    return;
  }

  let oldFrame =
    this.windowGeometries.get(window);
  if (oldFrame &&
      this._hasWarped(frame, oldFrame)) {
    this._ensureMouseIsIn(window);
  }
  this.windowGeometries.set(
    window, frame);
}

This handler is called whenever a window changes its position or size. We only need to do anything interesting if it is the focused window (this check is the reason we keep track of the focused window's ID). If it is not, we just remember its new geometry and return immediately.

If it is, we check to see whether the window has jumped (‘warped’ I called it in the code) and if so, we call a helper method to move the mouse pointer if necessary. And we also remember its new geometry.

const WARP_DISTANCE = 50;

_hasWarped(frame, oldFrame) {
  return Math.abs(frame.x - oldFrame.x) >= WARP_DISTANCE
      || Math.abs(frame.y - oldFrame.y) >= WARP_DISTANCE
      || Math.abs(frame.width - oldFrame.width) >= WARP_DISTANCE
      || Math.abs(frame.height - oldFrame.height) >= WARP_DISTANCE;
}

_hasWarped() implements our warp detection. We consider the window to have warped if any of its coordinates or dimensions has changed by at least our 50 pixel threshold.
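A short worked example shows how the threshold separates a keyboard split from a mouse drag. The check is repeated here as a standalone function on plain objects so it runs outside GNOME Shell; the frame values are illustrative numbers for a 1920 pixel wide screen, not measurements.

```javascript
const WARP_DISTANCE = 50;

// Standalone version of the check for plain {x, y, width, height} objects.
function hasWarped(frame, oldFrame) {
  return Math.abs(frame.x - oldFrame.x) >= WARP_DISTANCE
      || Math.abs(frame.y - oldFrame.y) >= WARP_DISTANCE
      || Math.abs(frame.width - oldFrame.width) >= WARP_DISTANCE
      || Math.abs(frame.height - oldFrame.height) >= WARP_DISTANCE;
}

// Illustrative frames: left half, right half, and one small drag step.
const leftHalf = { x: 0, y: 0, width: 960, height: 1080 };
const rightHalf = { x: 960, y: 0, width: 960, height: 1080 };
const draggedABit = { x: 12, y: 7, width: 960, height: 1080 };

const splitDetected = hasWarped(rightHalf, leftHalf);  // x changed by 960
const dragDetected = hasWarped(draggedABit, leftHalf); // x by 12, y by 7
console.log(splitDetected, dragDetected); // true false
```

Because _onWindowChanged() stores the new geometry after every change, a drag is compared step by step against the previous position rather than against where it started, so a long drag made of small steps stays below the threshold.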

_ensureMouseIsIn(window) {
  const frame = window.get_frame_rect();
  let [mouse_x, mouse_y] = global.get_pointer();
  if (mouse_x >= frame.x &&
      mouse_x < frame.x + frame.width &&
      mouse_y >= frame.y &&
      mouse_y < frame.y + frame.height)
    return;

  window.activate(global.get_current_time());

  let target_x = frame.x + frame.width / 2;
  let target_y = frame.y + frame.height / 2;
  let dx = target_x - mouse_x;
  let dy = target_y - mouse_y;
  Util.spawn(['/usr/local/bin/input-emulator',
              'mouse', 'move',
              `${dx}`, `${dy}`]);
}

This is the real meat of the extension. This function ensures that the mouse pointer is in the newly warped window. First, we check whether the pointer is still inside by comparing its coordinates with the area covered by the window[2]. If so, there is nothing to do and we return immediately.

If the coordinates of the mouse pointer are outside the window, we first activate the window, which includes bringing it to the foreground so that it cannot be hidden behind another window. Then we calculate the target coordinates (where we want the mouse pointer to be) at the center of the window and the difference between this position and the current position of the mouse pointer (how far we have to move it). Lastly, we call input-emulator to perform the actual movement.
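The arithmetic of the last step can be tried in isolation. The sketch below uses plain objects, and the function name relativeMoveTo is hypothetical. It also rounds the result, which the extension's code does not do: frame.width / 2 is fractional for odd widths, and the assumption here is that whole-pixel offsets are the safer thing to hand to a command-line tool.

```javascript
// Hypothetical helper: how far the pointer has to travel to reach the
// center of the frame. Rounding is an addition made for this sketch.
function relativeMoveTo(frame, pointer) {
  const targetX = frame.x + frame.width / 2;
  const targetY = frame.y + frame.height / 2;
  return {
    dx: Math.round(targetX - pointer.x),
    dy: Math.round(targetY - pointer.y),
  };
}

// Illustrative numbers: window on the right half, pointer left behind.
const frame = { x: 960, y: 0, width: 960, height: 1080 };
const pointer = { x: 300, y: 400 };
const move = relativeMoveTo(frame, pointer);
console.log(move); // { dx: 1140, dy: 140 }

// The offsets converted to strings, as in the argument list shown above.
const args = ['mouse', 'move', `${move.dx}`, `${move.dy}`];
```

Negative offsets fall out of the same subtraction: if the pointer is to the right of or below the target, dx or dy is negative and the pointer moves left or up.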

That's it for the code. I noticed some improvements I could have made while writing this post[3], but I'm letting it stand for now. I've been using the extension for a whole workday now, and I'm pretty pleased with it.

Limitations

I wrote this extension to scratch an itch with minimal effort. As a result, it is narrowly tailored to my use case. It is also the first GNOME extension I have ever written, which is certainly not without consequence. These are the limitations I am currently aware of:

Resources

Here is a collection of links I found useful while I tried to get the extension working (in addition to those mentioned above):

Conclusion

While the extension is very basic and limited in several ways, it has improved my daily life a lot – and I write this on the first day of using it. I'm glad I went to the trouble. Now that I know how GNOME extensions work, I'm probably going to tweak my desktop even more.

I firmly believe that computers are there to support people and not the other way around. If my interaction with the computer is not satisfactory, it is the computer that has to change, not me. And the best thing about being a developer in an open source environment is that I can change it.

And so can you. What's your itch, and will you scratch it?


Footnotes

  1. Experimentation has shown that the changes when dragging or resizing a window with the mouse are reported at a pretty fine granularity, even when I am moving the mouse as fast as I can. In practice, 50 pixels work very well as a threshold. ↩︎

  2. For the window frame, x and y are the coordinates of the top left corner, so x+width and y+height are the coordinates of the bottom right corner. ↩︎

  3. Use const more, for example, and apply the window type filter to the initial connection, too. ↩︎

  4. But not trivial. input-emulator requires access to /dev/uinput, which is restricted to root by default. See the instructions that come with input-emulator for how to set this up. ↩︎