Another sorry tale of Storm issues:
Storm has been successful at Librato, but we experienced many of the limitations cited in the Twitter Heron: Stream Processing at Scale paper and outlined here by Adrian Colyer, including: Inability to isolate, reason about, or debug performance issues due to the worker/executor/task paradigm. This led to building and configuring clusters specifically designed to attempt to mitigate these problems (i.e., separate clusters per topology, only running a worker per server.), which added additional complexity to development and operations and also led to over-provisioning. Ability of tasks to move around led to difficult to trace performance problems. Storm’s work provisioning logic led to some tasks serving more Kafka partitions than others. This in turn created latency and performance issues that were difficult to reason about. The initial solution was to over-provision in an attempt to get a better hashing/balancing of work, but eventually we just replaced the work allocation logic. Due to Storm’s architecture, it was very difficult to get a stack trace or heap dump because the processes that managed workers (Storm supervisor) would often forcefully kill a Java process while it was being investigated in this way. The propensity for unexpected and subsequently unhandled exceptions to take down an entire worker led to additional defensive verbose error handling everywhere. This nasty bug STORM-404 coupled with the aforementioned fact that a single exception can take down a worker led to several cascading failures in production, taking down entire topologies until we upgraded to 0.9.4. Additionally, we found the performance we were getting from Storm for the amount of money we were spending on infrastructure was not in line with our expectations. Much of this is due to the fact that, depending upon how your topology is designed, a single tuple may make multiple hops across JVMs, and this is very expensive. For example, in our time series aggregation topologies a single tuple may be serialized/deserialized and shipped across the wire 3-4 times as it progresses through the processing pipeline.
Librato’s service discovery library using Zookeeper (so strongly consistent, but with the ZK downside that an AZ outage can stall service discovery updates region-wide)
“Big companies didn’t only rely on safe harbour: they also rely on binding corporate rules and standard contractual clauses. But it’s interesting that the court decided the case on fundamental rights grounds: so it doesn’t matter remotely what ground you transfer on, if that process is still illegal under 7 and 8 of charter, it can’t be done.”Also:
“Ireland has no interest in doing its job, and will continue not to, forever. Clearly it’s an investment issue – but overall the policy is: we don’t regulate companies here. The cost of challenging any of this in the courts is prohibitive. And the people don’t seem to care.”:(
Sounds like the CJEU’s Bara decision may cause problems for the Irish government’s wilful data-sharing:
Articles 10, 11 and 13 of Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995, on the protection of individuals with regard to the processing of personal data and on the free movement of such data, must be interpreted as precluding national measures, such as those at issue in the main proceedings, which allow a public administrative body of a Member State to transfer personal data to another public administrative body and their subsequent processing, without the data subjects having been informed of that transfer or processing.