Links for 2018-11-19

  • Java’s ByteBuffer native memory “leak”

    Well this is suboptimal:

    The Java NIO APIs use ByteBuffers as the source and destination of I/O calls, and come in two flavours. Heap ByteBuffers wrap a byte[] array, allocated in the garbage collected Java heap. Direct ByteBuffers wrap memory allocated outside the Java heap using malloc. Only “native” memory can be passed to operating system calls, so it won’t be moved by the garbage collector. This means that when you use a heap ByteBuffer for I/O, it is copied into a temporary direct ByteBuffer. The JDK caches one temporary buffer per thread, without any memory limits. As a result, if you call I/O methods with large heap ByteBuffers from multiple threads, your process can use a huge amount of additional native memory, which looks like a native memory leak. This can cause your process to unexpectedly run into memory limits and get killed.
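    The two flavours can be told apart at runtime: a heap buffer exposes its backing byte[] array, a direct buffer does not. A minimal sketch (class name is my own; the copy into the cached temporary direct buffer happens invisibly inside the JDK's channel I/O code when the heap variant is used):

```java
import java.nio.ByteBuffer;

public class BufferFlavours {
    public static void main(String[] args) {
        // Heap ByteBuffer: wraps a byte[] in the garbage-collected Java heap
        ByteBuffer heap = ByteBuffer.allocate(1024 * 1024);
        System.out.println("heap:   direct=" + heap.isDirect()
                + " hasArray=" + heap.hasArray());   // direct=false hasArray=true

        // Direct ByteBuffer: wraps malloc'd native memory, safe to hand to the OS.
        // Passing `heap` to a channel instead forces a copy into a per-thread
        // cached temporary buffer like this one.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024 * 1024);
        System.out.println("direct: direct=" + direct.isDirect()
                + " hasArray=" + direct.hasArray()); // direct=true hasArray=false
    }
}
```

    Later JDKs added a `-Djdk.nio.maxCachedBufferSize` system property that caps the size of buffers the per-thread cache will retain, which mitigates the behaviour described above.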

    (tags: jvm performance java memory leaks bytebuffers netty threads coding bugs)

  • The Time Our Provider Screwed Us

    Good talk (with transcript) from Paul Biggar about what happened when CircleCI had a massive security incident, and how Jesse Robbins helped them do incident response correctly:

    ‘On the left, Jesse pointed out that we needed an incident commander. That’s me, Paul. And this is very good, because I was a big proponent, I think lots of us were around the 2013 mark, of flat organizational structures, and so I hadn’t really got a handle on this whole being-in-charge thing. The fact that someone else came in and said, “No, no, no, you are in charge”: extremely useful.

    And he also laid out the order of our priorities. Number one priority: safety of customers. Number two priority: communicate with customers. Number three priority: recovery of service. I think a reasonable person could have put those in a different order, especially under the pressure and time constraints of a potential company-ending situation. So I was very happy to have those in order. If this is ever going to happen to you, I’d memorize them, maybe put it on an index card in your pocket.

    The last thing he said is to make sure that we log everything, that we go slow, and that we code review and communicate. His point there is that if we’re going to bring our site back up, if we’re going to do all the things that we need to do in order to save our business and do the right thing for our customers and all that, we can’t be making quick, bad decisions. You can’t just upload whatever code is on your computer now, because “I have to do this now, I have to fix it”. So we set up a Slack channel … this was pre-Slack; it was a HipChat channel, where all of our communications went. Every single communication that we had about this went in that chatroom. Which came in extremely useful the next day, when I had to write a blog post that detailed exactly what had happened and all the steps that we took to fix it and remediate this, and I had exact time stamps of all the things that had happened.’

    (tags: incidents incident-response paul-biggar circleci security communication outages)

  • Deep learning can “discover” new knowledge from scans/images

    Amazing paper:

    Here, we show that deep learning can extract new knowledge from retinal fundus images. Using deep-learning models trained on data from 284,335 patients and validated on two independent datasets of 12,026 and 999 patients, we predicted cardiovascular risk factors not previously thought to be present or quantifiable in retinal images, such as age (mean absolute error within 3.26 years), gender (area under the receiver operating characteristic curve (AUC) = 0.97), smoking status (AUC = 0.71), systolic blood pressure (mean absolute error within 11.23 mmHg) and major adverse cardiac events (AUC = 0.70). We also show that the trained deep-learning models used anatomical features, such as the optic disc or blood vessels, to generate each prediction.

    (tags: deep-learning data analysis ml machine-learning health medicine papers)

  • OpsMop

    ‘a next-generation, no-compromise automation system’.

    Uses: Web-scale configuration management of all Linux/Unix systems; Application deployment; Immutable systems build definition; Maintaining stateful services such as database and messaging platforms; Automating one-off tasks & processes; Deployment and management of the undercloud. Features: Python 3 DSL; Declarative resource model with imperative capabilities; Type / Provider plugin separation; Implicit ordering (with handler notification); Formalized “Plan” vs “Apply” evaluation stages; Early validation prior to runtime; Programmatically scoped variables; Strong object-orientation

    (tags: opsmop ops configuration-management deployment build)

  • The JVM in Docker 2018

    Later JDK versions have made it far easier to run a JVM application in a Linux container. The memory support means that if you relied on JVM ergonomics before, you can do the same inside a container, whereas previously you had to override all memory-related settings. The CPU support for containers needs to be carefully evaluated for your application and environment. If you’ve previously set low cpu_shares in environments like Kubernetes to increase utilisation while relying on using up unused cycles, then you might get a shock.
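    A quick way to check what the ergonomics actually picked up inside a container is to print what Runtime reports; with container support enabled (the default since JDK 10, backported to 8u191), these reflect the cgroup limits rather than the host's resources. A small sketch (class name is my own):

```java
public class ContainerErgonomics {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Inside a container with cgroup limits set, maxMemory() and
        // availableProcessors() are derived from those limits, not the host.
        System.out.println("max heap bytes: " + rt.maxMemory());
        System.out.println("available CPUs: " + rt.availableProcessors());
    }
}
```

    If the defaults are wrong for your workload, flags such as `-XX:MaxRAMPercentage` (JDK 10+) and `-XX:ActiveProcessorCount` let you override the memory and CPU ergonomics explicitly.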

    (tags: jvm docker kubernetes linux containers ops)

This entry was posted in Uncategorized. Bookmark the permalink. Both comments and trackbacks are currently closed.

2 Comments

  1. Nix
    Posted November 20, 2018 at 15:53 | Permalink

    “The Time Our Provider Screwed Us” is basically a condemnation of the cloud model from someone who’s so invested in it that he doesn’t realise that it is. So you’re a continuous integration company, so you hold keys to a huge number of customer things, including passwords to build stuff run by major payment vendors… but you decided to host that database on someone else’s machine for which you have no real control over its security… and then it turns out those other people are security clowns and that machine was compromised and you didn’t find out for a day or two.

    Perhaps security-critical data like access keys should not be held unencrypted in databases run by other people? Just perhaps? At the very least encrypt them with a key held in a local hardware token (cost: a few hundred dollars, max, and probably much less), and only decrypt them locally, transiently. Elementary security awareness, you’d think, but noooo it’s not swish and cool and cloudy enough.
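    As a sketch of that suggestion: encrypt each secret under a key that never leaves your own hardware, store only the ciphertext in the remotely hosted database, and decrypt locally and transiently when needed. This example uses software AES-GCM purely as a stand-in for the hardware token, and the key/secret names are illustrative:

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class LocalKeyWrap {
    public static void main(String[] args) throws Exception {
        // Stand-in for a key held in a local hardware token: in production
        // this key material would never leave the device.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey localKey = kg.generateKey();

        byte[] customerApiKey = "example-customer-secret".getBytes("UTF-8");

        // Encrypt BEFORE the secret ever touches the cloud-hosted database.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, localKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = enc.doFinal(customerApiKey);
        // Only `ciphertext` (plus the IV) is stored remotely.

        // Decrypt locally and transiently, only at the moment of use.
        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, localKey, new GCMParameterSpec(128, iv));
        byte[] plaintext = dec.doFinal(ciphertext);
        System.out.println(Arrays.equals(customerApiKey, plaintext)); // true
    }
}
```

    A compromise of the database host then yields only ciphertext; the attacker also needs the locally held key.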

    Bah!

  2. Posted November 21, 2018 at 11:52 | Permalink

    Absolutely agreed! Although when you think about it, the same applies to a degree with colo servers, or even to the developers of the network hardware you use if you operate your own DCs. The only safe thing to do is rely on crypto with safe key-storage hardware. At least nowadays we have AWS KMS to rely on…