Some good reasons not to adopt microservices blindly. Testability and distributed-systems complexity are my biggest fears
Solid warts-and-all confessional blogpost about a team failing to implement a microservices architecture. I’d put most of the blame on insufficient infrastructure to support them (at a code level), inter-personal team problems, and inexperience with large-scale complex multi-service production deployment and the work it was going to require
How Box introduced COE-style dev/ops outage postmortems, and got them working. This PIE metric sounds really useful to head off the dreaded “it’ll all have to come out missus” action item:
The picture was getting clearer, and we decided to look into individual postmortems and action items and see what was missing. As it was, action items were wasting away with no owners. Digging deeper, we noticed that many action items entailed massive refactorings or vague requirements like “make system X better” (i.e. tasks that realistically were unlikely to be addressed). At a higher level, postmortem discussions often devolved into theoretical debates without a clear outcome. We needed a way to lower and focus the postmortem bar and a better way to categorize our action items and our technical debt. Out of this need, PIE (“Probability of recurrence * Impact of recurrence * Ease of addressing”) was born. By ranking each factor from 1 (“low”) to 5 (“high”), PIE provided us with two critical improvements: 1. A way to police our postmortems discussions. I.e. a low probability, low impact, hard to implement solution was unlikely to get prioritized and was better suited to a discussion outside the context of the postmortem. Using this ranking helped deflect almost all theoretical discussions. 2. A straightforward way to prioritize our action items. What’s better is that once we embraced PIE, we also applied it to existing tech debt work. This was critical because we could now prioritize postmortem action items alongside existing work. Postmortem action items became part of normal operations just like any other high-priority work.
An accurate clock is required to negotiate SSL/TLS, so clock sync is important for internet-of-things usage. but:
Unfortunately for us, the traditional and most widespread method for clock synchronisation (NTP) has been caught up in a DDoS issue which has recently caused some ISPs to start blocking all NTP communication. [….] Because the DDoS attacks are so widespread, and the lack of obvious commercial pressure to fix the issue, it’s possible that the days of using NTP as a mechanism for setting clocks may well be numbered. Luckily for us there is a small but growing project that replaces it. tlsdate was started by Jacob Appelbaum of the Tor project in 2012, making use of the SSL handshake in order to extract time from a remote server, and its usage is on the rise. [….] Since we started encountering these problems, we’ve incorporated tlsdate into an over-the-air update, and have successfully started using this in situations where NTP is blocked.
This is a lovely demo of integrating modern IoT connectivity functionality (remote app control, etc.) with a washing machine using Bergcloud’s hardware and backend, and a little logic-analyzer reverse engineering.
While there are many defensible aspects of Systemd, other aspects boggle the mind. Not the least of these was that, as of a few months ago, trying to debug the kernel from the boot line would cause the system to crash. This was because of Systemd’s voracious logging and the fact that Systemd responds to the “debug” flag on the kernel boot line — a flag meant for the kernel, not anything else. That, straight up, is a bug. However, the Systemd developers didn’t see it that way and actively fought with those experiencing the problem. Add the fact that one of the Systemd developers was banned by Linus Torvalds for poor attitude and bad design and another was responsible for causing significant issues with Linux audio support, but blamed the problem on everything else but his software, and you have a bad situation on your hands. There’s no shortage of egos in the open source development world. There’s no shortage of new ideas and veteran developers and administrators pooh-poohing something new simply because it’s new. But there are also 45 years of history behind Unix and extremely good reasons it’s still flourishing. Tools designed like Systemd do not fit the Linux mold, to their own detriment. Systemd’s design has more in common with Windows than with Unix — down to the binary logging.The link re systemd consuming the “debug” kernel boot arg is a canonical example of inflexible coders refusing to fix their own bugs. (via Jason Dixon)
The mining operation resides on an old, repurposed factory floor, and contains 2500 machines hashing away at 230 Gh/s, each. (That’s 230 billion calculations per second, per unit). […] The operators told me that the power bill of this specific operation is in excess of ¥400,000 per month [..] about $60,000 USD.
Pretty serious speedup. 81 MB/sec with Tsunami UDP, compared to 9 MB/sec with plain old scp. Probably kills internet performance for everyone else though!
Ha, great name. We use this (in the form of Smartstack).
For what it is worth, we faced a similar challenge in earlier services (mostly due to existing C/C++ applications) and we created what was called a “sidecar”. By sidecar, what I mean is a second process on each node/instance that did Cloud Service Fabric operations on behalf of the main process (the side-managed process). Unfortunately those sidecars all went off and created one-offs for their particular service. In this post, I’ll describe a more general sidecar that doesn’t force users to have these one-offs. Sidenote: For those not familiar with sidecars, think of the motorcycle sidecar below. Snoopy would be the main process with Woodstock being the sidecar process. The main work on the instance would be the motorcycle (say serving your users’ REST requests). The operational control is the sidecar (say serving health checks and management plane requests of the operational platform).