Knowing how to communicate your work is an important skill for an engineer to have. Unfortunately, communication skills, both written and oral, are very often neglected in technical jobs. The attitude that strong technical skills are sufficient, and, for software engineers in particular, that the code says it all, is rather prevalent. Not everyone is a … Continue reading I have to give a talk, now what?
This is the talk I gave at ApacheCon @Home - 2020 in the Streaming track. I start by motivating stream data and the need for storage for streams. Traditional storage systems on one side of the abstraction spectrum and messaging systems on the other have not really solved the problem of storage for streaming … Continue reading Pravega: Storage for streams (ApacheCon @Home 2020)
Exactly-once semantics is an intriguing, controversial, and, for me, exciting topic. I have been dealing with it in the recent past across different systems, and one particular issue that got me interested is the connection I have seen a few people make to FLP, the distributed consensus impossibility result. I argue here informally … Continue reading No consensus in exactly-once
Apparently Yahoo! Labs removed access to tech reports, including the proof of correctness we had written for Zab (ZooKeeper Atomic Broadcast). I'm consequently making it available here. YL-2010-007
This blog post from Martin Kleppmann triggered this note. That blog post discusses an issue with locks in Redis and argues that a solution to avoid the issue of depending on timing is to use a combination of distributed locks with ZooKeeper and fencing. This argument caused some confusion and I wanted to address it … Continue reading Note on fencing and distributed locks
In distributed computing jargon, properties are classified as either safety or liveness properties [1, 2]. Consistency is a typical safety property: the state of the system is never inconsistent for some definition of consistent. Of course, "never inconsistent" assumes an ideal world in which we developers do everything right in the code, which history has … Continue reading Keep moving forward: Liveness in distributed systems
This post is about silent data loss in replicated systems (state-machine replication à la ZooKeeper) due to the disk state being wiped out. The disk state is crucial in such systems to guarantee that replicated data isn't lost if a server crashes and recovers. The replication protocol in ZooKeeper assumes that servers can … Continue reading Dude, where’s my metadata?
This post is about the replication scheme we use in Apache BookKeeper. It is mostly about why we did it this way rather than how to use it. There are other sources of information about how to use BookKeeper. When we started thinking about the design of BookKeeper and the replication scheme, we had some … Continue reading So many ways of replicating…