Links for 2013-05-15

  • Rusty’s API Design Manifesto

    This classic came up in discussions yesterday…

    In the Linux Kernel community Rusty Russell came up with an API rating scheme to help us determine if our API is sensible, or not.  It’s a rating from -10 to 10, where 10 is perfect and -10 is hell. Unfortunately there are too many examples at the wrong end of the scale.

    (tags: rusty-russell quality coding kernel linux apis design code-reviews code)

  • Sup relaunched

    Hooray! Command-line gmailish goodness returns. And with a signed gem, to boot.

    (tags: gems ruby sup mail gmail mua)

  • Martin Thompson, Luke “Snabb Switch” Gorrie etc. review the C10M presentation from ShmooCon

    on the mechanical-sympathy mailing list. Some really interesting discussion on handling insane quantities of TCP connections using low volumes of hardware:

    This talk has some good points and I think the subject is really interesting.  I would take the suggested approach with serious caution.  For starters the Linux kernel is nowhere near as bad as it made out.  Last year I worked with a client and we scaled a single server to 1 million concurrent connections with async programming in Java and some sensible kernel tuning.  I’ve heard they have since taken this to over 5 million concurrent connections.

    BTW Open Onload is an open source implementation.  Writing a network stack is a serious undertaking.  In a previous life I wrote a network probe and had to reassemble TCP streams and kept getting tripped up by edge cases.  It is a great exercise in data structures and lock-free programming.  If you need very high-end performance I’d talk to the Solarflare or Mellanox guys before writing my own.

    There are some errors and omissions in this talk.  For example, his range of ephemeral ports is not quite right, and atomic operations are only 15 cycles on Sandy Bridge when hitting local cache.  A big issue for me is when he defined C10M he did not mention the TIME_WAIT issue with closing connections.  Creating and destroying 1 million connections per second is a major issue.  A protocol like HTTP is very broken in that the server closes the socket and therefore has to retain the TCB until the specified timeout occurs to ensure no older packet is delivered to a new socket connection.
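
    To get a rough feel for the TIME_WAIT point in the quote, here is a back-of-the-envelope sketch. It assumes a 60-second TIME_WAIT period (the usual Linux value, and my assumption rather than a figure from the thread) and that the server is the side closing every connection. The function name is made up for illustration.

```python
def time_wait_backlog(closes_per_second, time_wait_seconds=60):
    """Steady-state number of control blocks parked in TIME_WAIT.

    Each connection the server closes lingers for time_wait_seconds,
    so the backlog is simply rate multiplied by duration.
    The 60-second default is an assumption, not a number from the quote.
    """
    return closes_per_second * time_wait_seconds


if __name__ == "__main__":
    # 1 million server-side closes per second, as in the quoted scenario.
    print(time_wait_backlog(1_000_000))
```

    Under those assumptions the server would be holding on the order of 60 million control blocks at any moment, which is why connection churn is a separate problem from holding many idle connections open.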

    (tags: mechanical-sympathy hardware scaling c10m tcp http scalability snabb-switch martin-thompson)

  • ec2-consistent-snapshot

    This program creates an EBS snapshot for an Amazon EC2 EBS volume. To help ensure consistent data in the snapshot, it tries to flush and freeze the filesystem(s) first as well as flushing and locking the database, if applicable. Filesystems can be frozen during the snapshot. Prior to Linux kernel 2.6.29, XFS must be used for freezing support. While frozen, a filesystem will be consistent on disk and all writes will block. There are a number of timeouts to reduce the risk of interfering with the normal database operation while improving the chances of getting a consistent snapshot. If you have multiple EBS volumes in a RAID configuration, you can specify all of the volume ids on the command line and it will create snapshots for each while the filesystem and database are locked. Note that it is your responsibility to keep track of the resulting snapshot ids and to figure out how to put these back together when you need to restore the RAID setup.
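
    The ordering described above (freeze, snapshot every volume, always thaw) can be sketched roughly as follows. This is not the tool's actual implementation: `run` and `snapshot` are hypothetical callables I have introduced so the ordering can be shown without touching a real system, and the use of the util-linux `fsfreeze` command is an assumption. The database flush-and-lock step and the timeouts are left out.

```python
from contextlib import contextmanager


@contextmanager
def frozen(mountpoint, run):
    """Freeze a filesystem for the duration of the block, then thaw it.

    `run` is a hypothetical callable that executes a command list,
    for example subprocess.check_call on a real host.
    """
    run(["fsfreeze", "--freeze", mountpoint])
    try:
        yield
    finally:
        # Thaw even if a snapshot call fails, so writes never stay blocked.
        run(["fsfreeze", "--unfreeze", mountpoint])


def consistent_snapshot(mountpoint, volume_ids, run, snapshot):
    """Snapshot every volume while the filesystem is frozen.

    `snapshot` is a hypothetical callable taking a volume id and
    returning a snapshot id. Returns the snapshot ids in volume order,
    since keeping track of them is the caller's job.
    """
    snapshot_ids = []
    with frozen(mountpoint, run):
        for volume_id in volume_ids:
            snapshot_ids.append(snapshot(volume_id))
    return snapshot_ids
```

    Passing the command runner and the snapshot call in as arguments keeps the sketch independent of any particular AWS client. The `finally` clause is the important part: a frozen filesystem blocks all writes, so the thaw has to happen on every exit path.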

    (tags: ubuntu ec2 aws linux ebs snapshots ops tools alestic)

  • Measuring & Optimizing I/O Performance

    Another good writeup on iostat and EBS, from Ilya Grigorik

    (tags: io optimization sysadmin performance iostat ebs aws ops)

  • AWS forum post on interpreting iostat output for EBS

    Great post from [email protected] on interpreting iostat output on EBS volumes — from 2009, but still looks reasonable enough

    (tags: iostat ebs disks hardware aws ops)

  • Operations is Dead, but Please Don’t Replace it with DevOps

    This is so damn spot on.

    Functional silos (and a standalone DevOps team is a great example of one) decouple actions from responsibility. Functional silos allow people to ignore, or at least feel disconnected from, the consequences of their actions. DevOps is a cultural change that encourages, rewards and exposes people taking responsibility for what they do, and what is expected from them. As Werner Vogels from Amazon Web Services says, “you build it, you run it”. So a “DevOps team” is a risky and ultimately doomed strategy. Sure there are some technical roles, specifically related to the enablement of DevOps as an approach and these roles and tools need to be filled and built. Self service platforms, collaboration and communication systems, tool chains for testing, deployment and operations are all necessary. Sure someone needs to deliver on that stuff. But those are specific technical deliverables and not DevOps. DevOps is about people, communication and collaboration. Organizations ignore that at their peril.

    (tags: devops teams work ops silos collaboration organisations)
