This is a really nice illustration of using control theory to set tunable thresholds automatically in a complex storage system. Nice work, Scylla:
At any given moment, a database like ScyllaDB has to juggle the admission of foreground requests with background processes like compactions, making sure that the incoming workload is not severely disrupted by compactions, nor that the compaction backlog is so big that reads are later penalized. In this article, we showed that isolation among incoming writes and compactions can be achieved by the Schedulers, yet the database is still left with the task of determining the amount of shares of the resources incoming writes and compactions will use. Scylla steers away from user-defined tunables in this task, as they shift the burden of operation to the user, complicating operations and being fragile against changing workloads. By borrowing from the strong theoretical background of industrial controllers, we can provide an Autonomous Database that adapts to changing workloads without operator intervention.
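The core idea can be sketched with a toy controller. This is not Scylla's actual code (their controllers run inside the C++ I/O schedulers); it is a minimal, hypothetical proportional controller in the spirit the article describes: measure the compaction backlog, and map it to the number of scheduler shares compaction receives, so shares rise automatically under load instead of being a user-set tunable.

```python
# Toy sketch (NOT Scylla's implementation): a proportional controller that
# converts compaction backlog into scheduler shares. All names and the
# share range are illustrative assumptions.

def compaction_shares(backlog_bytes: float,
                      max_backlog_bytes: float,
                      min_shares: int = 1,
                      max_shares: int = 1000) -> int:
    """More backlog -> more shares for compaction, clamped to a cap."""
    error = min(backlog_bytes / max_backlog_bytes, 1.0)  # normalized backlog
    return round(min_shares + error * (max_shares - min_shares))

# As the backlog grows, compaction is granted a larger slice of the disk,
# and foreground writes keep the rest -- no operator-tuned threshold needed.
print(compaction_shares(0, 100e9))       # idle: minimum shares
print(compaction_shares(40e9, 100e9))    # partial backlog: proportional shares
print(compaction_shares(200e9, 100e9))   # saturated: maximum shares
```

The point of the proportional term is the feedback loop: a growing backlog raises compaction's shares, which drains the backlog, which lowers the shares again, converging without human intervention.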
Proposed solution: complementing Datomic with an erasure-aware key/value store. In cases where Excision is not a viable solution, the approach I’ve come up with is to store privacy-sensitive values in a complementary, mutable KV store, and to reference the corresponding keys from Datomic. This seems to be turning into a common pattern for GDPR-compliant storage.
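The pattern is simple enough to sketch with plain dictionaries. This is an illustration of the idea only, not Datomic's API: sensitive values live in a mutable, erasure-capable store, while the append-only log records nothing but an opaque key.

```python
# Minimal sketch of the pattern (names are made up, not Datomic's API):
# privacy-sensitive values go in a mutable KV store; the immutable log
# only ever sees an opaque key referencing them.

import uuid

kv_store = {}       # stands in for a mutable, erasable store (e.g. Redis)
immutable_log = []  # stands in for Datomic's append-only facts

def record_user(email: str) -> str:
    key = str(uuid.uuid4())
    kv_store[key] = email                          # sensitive value: mutable side
    immutable_log.append({"user/email-ref": key})  # only the key is immortal
    return key

def erase_user(key: str) -> None:
    kv_store.pop(key, None)  # the log keeps the key, now a dangling reference

key = record_user("alice@example.com")
erase_user(key)
print(kv_store.get(key))  # None: the value is gone, the log is untouched
```

Erasure never rewrites history: the log's key remains, but it no longer resolves to anything, which is exactly what right-to-deletion requires of the sensitive value.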
Martin Kleppmann: “What’s current best practice for GDPR compliance (in particular, right to deletion) in systems with append-only logs/event sourcing/blockchains, which are supposed to keep history forever?” Ben Kehoe: “Crypto delete. The immutable store keeps an encrypted copy, and the key is stored elsewhere. Forget me = throw away the key”. That seems to be the most practical suggestion in general in this thread.