Links for 2018-11-15

  • Tuning Spark Back Pressure by Simulation

    Interesting, Spark uses a PID controller algorithm to manage backpressure:

    Spark back pressure, which can be enabled by setting spark.streaming.backpressure.enabled=true, will dynamically resize batches so as to avoid queue build up. It is implemented using a Proportional Integral Derivative (PID) algorithm. This algorithm has some interesting properties, including the lack of guarantee of a stable fixed point. This can manifest itself not just in transient overshoot, but in a batch size oscillating around a (potentially optimal) constant throughput. The overshoot incurs latency; the undershoot costs throughput. Catastrophic overshoot leading to OOM is possible in degenerate circumstances (you need to choose the parameters quite deviously to cause this to happen). Having witnessed undershoot and slow recovery in production streaming jobs, I decided to investigate further by testing the algorithm with a simulator.

    (tags: backpressure streaming queueing pid-controllers algorithms congestion-control)
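
    To get a feel for the undershoot-and-slow-recovery behaviour described in that quote, here is a minimal simulation sketch. It is not Spark's code: the class name `PIDRateEstimator`, the gains (`kp`, `ki`, `kd`), the `min_rate` floor and the toy workload model (a job that always processes at a fixed `capacity`) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIDRateEstimator:
    """Hypothetical PID rate estimator; gains are illustrative, not Spark defaults."""
    latest_rate: float          # ingestion rate currently allowed (records/s)
    kp: float = 1.0             # proportional gain
    ki: float = 0.2             # integral gain, applied to the queued backlog
    kd: float = 0.0             # derivative gain
    min_rate: float = 100.0     # floor so the rate never collapses to zero
    latest_error: float = 0.0

    def update(self, processing_rate, scheduling_delay_s, batch_interval_s):
        # Proportional term: how far ingestion ran ahead of processing.
        error = self.latest_rate - processing_rate
        # Integral term: backlog expressed as records/s that must be shed.
        historical_error = scheduling_delay_s * processing_rate / batch_interval_s
        # Derivative term: change in error since the previous batch.
        d_error = (error - self.latest_error) / batch_interval_s
        new_rate = max(
            self.latest_rate
            - self.kp * error
            - self.ki * historical_error
            - self.kd * d_error,
            self.min_rate,
        )
        self.latest_rate = new_rate
        self.latest_error = error
        return new_rate


def simulate(capacity, initial_rate, n_batches, batch_interval_s=1.0):
    """Toy model: the job always processes at `capacity` records/s."""
    estimator = PIDRateEstimator(latest_rate=initial_rate)
    rate = initial_rate
    backlog_s = 0.0  # seconds of unfinished work queued ahead of the next batch
    rates = []
    for _ in range(n_batches):
        records = rate * batch_interval_s
        processing_time_s = records / capacity
        scheduling_delay_s = backlog_s  # delay this batch waited before starting
        backlog_s = max(0.0, backlog_s + processing_time_s - batch_interval_s)
        rate = estimator.update(capacity, scheduling_delay_s, batch_interval_s)
        rates.append(rate)
    return rates


if __name__ == "__main__":
    for i, r in enumerate(simulate(capacity=1000.0, initial_rate=2000.0, n_batches=12)):
        print(f"batch {i:2d}: rate = {r:7.1f} records/s")
```

    In this toy model, starting at twice the sustainable rate makes the controller cut the rate to capacity after one batch, then dip below capacity while the integral term drains the queued backlog, and only gradually climb back. Changing `ki` or `kd` is a quick way to explore how the recovery time and any oscillation respond.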

  • New – EC2 Auto Scaling Groups With Multiple Instance Types & Purchase Options | AWS News Blog

    Basically brings EC2 Fleet’s feature set to ASGs. Good news.

    (tags: ec2 fleet asg ops architecture cost-control)

  • SpamAssassin is back

    The SpamAssassin 3.4.2 release was the first from that project in well over three years. At the 2018 Open Source Summit Europe, Giovanni Bechis talked about that release and those that will be coming in the near future. It would seem that, after an extended period of quiet, the SpamAssassin project is back and has rededicated itself to the task of keeping junk out of our inboxes.
    This is good to see! Also, newsy thread:

    (tags: spamassassin open-source oss anti-spam)

  • Google ‘betrays patient trust’ with DeepMind Health move | Technology | The Guardian

    Now that Streams is a Google product itself, that promise appears to have been broken, says privacy researcher Julia Powles: “Making this about semantics is a sleight of hand. DeepMind said it would never connect Streams with Google. The whole Streams app is now a Google product. That is an atrocious breach of trust, for an already beleaguered product.” A DeepMind spokesperson emphasised that the core of the promise remains intact: “All patient data remains under our partners’ strict control, and all decisions about its use lie with them. This data remains subject to strict audit and access controls and its processing remains subject to both our contracts and data protection legislation. The move to Google does not affect this.”

    (tags: google deepmind health nhs data-protection privacy healthcare)