Vim hanging while running VMware: fixed

I’ve just fixed a bug on my Linux desktop that had been annoying me for a while. Since little seems to have been written about it online, here’s a blog post to help future Googlers.

Here are the symptoms: while VMware is running, your Vim editing sessions freeze for 20 seconds or so, roughly every 5 minutes. The editor is entirely hung.

If you attach strace -p to the Vim process ID before the hang occurs, you’ll see something like this:

select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
_llseek(7, 4096, [4096], SEEK_SET)      = 0
write(7, "tp\21\0\377\0\0\0\2\0\0\0|\0\0\0\1\0\0\0\1\0\0\0\6\0\0"..., 4096) = 4096
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost -isig -icanon -echo ...}) = 0
select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
_llseek(7, 20480, [20480], SEEK_SET)    = 0
write(7, "ad\0\0\245\4\0\0\341\5\0\0\0\20\0\0J\0\0\0\250\17\0\0\247"..., 4096) = 4096
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost -isig -icanon -echo ...}) = 0
select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
fsync(

In other words, the hung process is sitting in an fsync() call, attempting to flush changed data for the current file to disk.

Investigation turned up the following: a KernelTrap thread about disk activity, reports of poor responsiveness with Firefox 3.0b3 on Linux, and a Vim bug report about its syncing behaviour interfering with laptop-mode and spun-down hard disks.

VMware must be issuing lots of unsynced I/O, so when Vim issues its fsync() or sync() call, it has to wait for the VMware I/O to complete before it can return, even though the machine is otherwise idle. A bit of a Linux kernel (or specifically, ext3) misfeature, it seems.
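If you want to check whether fsync() is really where the time goes on your own machine, a rough sketch like the following may help. It is Python, not anything Vim does internally. The time_write helper and the probe.bin file name are made up for illustration. It writes one 4 KiB block with and without an fsync() and prints how long each took.

```python
import os
import tempfile
import time


def time_write(path, data, sync):
    """Write data to path and return the elapsed wall-clock seconds.

    When sync is True, os.fsync() is called before the file is closed,
    so the call blocks until the kernel reports the data as flushed.
    """
    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                 # push Python's own buffer to the kernel
        if sync:
            os.fsync(f.fileno())  # wait for the kernel to flush to disk
    return time.monotonic() - start


if __name__ == "__main__":
    payload = b"x" * 4096  # one 4 KiB block, like the writes in the trace
    # Assumption: the temp directory lives on the filesystem you care
    # about. If it is on tmpfs, pass dir=... pointing somewhere on disk.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "probe.bin")
        for sync in (False, True):
            elapsed = time_write(path, payload, sync)
            print(f"fsync={sync}: {elapsed:.4f}s")
```

Run it once on an idle machine and once while the VM is busy. If the theory above holds, only the fsync=True figure should grow noticeably. The actual numbers will depend on your disk and filesystem.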

Combining details from those threads yields this fix: edit your ~/.vimrc and add the following lines:

set swapsync=
set nofsync

This will inhibit use of both fsync() and sync() by Vim, and the problem is avoided nicely.

Update: one of the Firefox developers discusses how this affects FF 3.0.


12 Comments

  1. David Malone
    Posted March 12, 2008 at 20:28 | Permalink

    Will that prevent vim from calling fsync after you :w a file? If so, that’s potentially dangerous – I’ve frequently done things like this:

    vim module.c

    make && make install

    kldload module

    Resulting in a kernel panic ‘cos I didn’t get the code quite right. Without an fsync of the module.c file, there’s a good chance that you’ll end up with an empty module.c file, ‘cos vim truncated it before rewriting it and the writes haven’t happened but the truncate has.

  2. Posted March 12, 2008 at 21:27 | Permalink

    No matter how many times I stared at the code I couldn’t see a face? What am I doing wrong?

    select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
    select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
    select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
    _llseek(7, 4096, [4096], SEEK_SET)      = 0
    write(7, "tp\21\0\377\0\0\0\2\0\0\0|\0\0\0\1\0\0\0\1\0\0\0\6\0\0"..., 4096) = 4096
    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost -isig -icanon -echo ...}) = 0
    select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
    _llseek(7, 20480, [20480], SEEK_SET)    = 0
    write(7, "ad\0\0\245\4\0\0\341\5\0\0\0\20\0\0J\0\0\0\250\17\0\0\247"..., 4096) = 4096
    ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost -isig -icanon -echo ...}) = 0
    select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
    fsync(

  3. Posted March 12, 2008 at 21:46 | Permalink

    Dave, sounds like you might have the one scenario where you’re better off with the fsync(). Me — I’m not expecting kernel panics in my code, so not so much ;)

  4. David Malone
    Posted March 12, 2008 at 21:54 | Permalink

    I guess I may not have described typical vim usage ;-)

    I did once panic a machine with a perl script – I never understood quite how, but…

  5. Posted March 12, 2008 at 23:31 | Permalink

    Looks from the thread that noatime or relatime could be a big help.

    What if you want to sync when you actually :w or :wq, but not when Vim decides on its own to write to a file?

  6. Nix
    Posted March 13, 2008 at 01:32 | Permalink

    Um, what about everything else that fsync()s? It’s a pretty common operation: all sorts of things do it from databases to MTAs to some games (!) and they all still freeze solid…

  7. Posted March 14, 2008 at 21:06 | Permalink

    Don: I’m using noatime, but it doesn’t help. If it was to sync at :wq time, it’d still hang while that was happening…

    Nix: that’s true — however, Vim is a very interactive app, and I haven’t run into any other app on my desktop that uses fsync() noticeably.

  8. Posted May 21, 2008 at 23:04 | Permalink

    Hey Justin, I’ve started getting mails for comments on posts on your site that I haven’t replied to (and so haven’t ticked the “mail me follow-ups”), including this one!

  9. Posted May 22, 2008 at 09:39 | Permalink

    Jon: oh great ;) I’ll see if I can spot anything…

  10. Posted June 28, 2008 at 16:11 | Permalink

    Nix: then report them as bugs (except for the databases, but if you’re running a database, you know to use the data=… options to mount and/or more appropriate filesystems and filesystem settings, don’t you?), because they are bugs.

    The user knows best. If they are about to modprobe an experimental kernel module, then let them sync manually — that’s what I do. When I’m editing certain files in xemacs, and I save them when I’m running on laptop_mode, then xemacs doesn’t sync and I don’t get a HD spinup and I remain happy. But if I’m editing my thesis, I’m on the road, and I just made a big rearrangement of text, I’ll issue a sync in the nearest xterm.

  11. Nix
    Posted June 28, 2008 at 17:11 | Permalink

    TimC, it’s not considered a bug for things to fsync(). If it is best for the machine that it not spin its disks up even when fsync()ing, because it’s a laptop and systemwide data integrity is considered less important than power savings, then that is a systemwide policy decision best made by flipping some kernel knob to deactivate fsync() or delay it appropriately (and the Linux kernel has a knob, laptop_mode, that could have this behaviour added to it).

    The bug is not that the app fsync()s; it’s that fsync() is equivalent to a full disk-wide sync() on ext3. That’s bad.

    The right thing to do is probably to force such things into the journal ASAP, cooperating with the block layer to make sure that the syncs, and necessary preceding block allocations, get there as fast as possible. Unfortunately doing this properly feels to me very much like softupdates (`work out the dependencies between fs operations and arrange them accordingly’ being the core of softupdates), and softupdates are such a bastard to get working right that only one filesystem under one OS has ever managed it.

  12. Posted June 29, 2008 at 02:57 | Permalink

    You’re getting close with the systemwide policy. Indeed, the default systemwide policy is that data starts heading towards the disk after 5 seconds anyway. What does fsync add, other than forcing everyone’s vi to force a flush even when systemwide policy says otherwise? It simply waits for the commit to finish, and will take exactly the same amount of time (modulo those 5 seconds of headstart) to finish as if it was just left to its own devices. Really, I suggest you look at the data=* mount-time options if you really expect your kernel to crash seconds after saving a file in vi. And issue a sync manually if you are playing with experimental module loading. Most of us aren’t doing that, and shouldn’t be hobbled by a badly thought out application of fsync() in each individual application.

    It is definitely not a good thing to add some kludge to delay fsync() in the kernel (and there’s a big discussion on LKML about this between Ingo and everyone else, regarding the crappy ordering of ext3 operations when CFQ is used), because sometimes the user really does want their HD to spin up and commit something.