Context: In highly concurrent distributed systems, handling tens of millions of operations is a major challenge. Delta Lake relies on Optimistic Concurrency Control and avoids locking entirely: transactions proceed in parallel without knowing about each other, and conflicts are resolved at commit time.
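To make "resolve conflicts at commit time" concrete, here is a toy model of optimistic concurrency control. It is a simplified sketch, not Delta Lake's implementation: `ToyLog`, `Txn`, `Commit`, and `ConflictException` are hypothetical names, and "conflict" is reduced to "a commit that landed after my snapshot wrote a partition I read". Delta Lake's real conflict detection is considerably more detailed. No lock is ever taken; a transaction validates against the winning commits and then attempts an atomic append, re-validating if someone else got there first.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical toy model of optimistic concurrency control; not Delta Lake's code.
final case class Commit(version: Long, writtenPartitions: Set[String])

final class ConflictException(msg: String) extends RuntimeException(msg)

// Append-only log of commits; versions are contiguous starting at 0.
final class ToyLog {
  private val commits = new AtomicReference[Vector[Commit]](Vector.empty)

  def latestVersion: Long = commits.get().lastOption.map(_.version).getOrElse(-1L)

  def commitsSince(version: Long): Vector[Commit] =
    commits.get().filter(_.version > version)

  // Succeeds only if `expected` is still the latest version (compare-and-set, no lock).
  def tryAppend(expected: Long, written: Set[String]): Option[Long] = {
    val current = commits.get()
    val latest = current.lastOption.map(_.version).getOrElse(-1L)
    if (latest != expected) None
    else if (commits.compareAndSet(current, current :+ Commit(latest + 1, written))) Some(latest + 1)
    else None
  }
}

// A transaction remembers the snapshot version it read and which partitions it read.
final class Txn(log: ToyLog, readPartitions: Set[String]) {
  private val readVersion: Long = log.latestVersion

  def commit(written: Set[String]): Long = {
    @annotation.tailrec
    def attempt(checkedUpTo: Long): Long = {
      // Validate against every commit that won the race since we last checked.
      val winners = log.commitsSince(checkedUpTo)
      winners.find(c => (c.writtenPartitions & readPartitions).nonEmpty) match {
        case Some(c) =>
          throw new ConflictException(s"conflicts with version ${c.version}")
        case None =>
          val latest = winners.lastOption.map(_.version).getOrElse(checkedUpTo)
          log.tryAppend(latest, written) match {
            case Some(v) => v
            case None    => attempt(latest) // someone committed meanwhile: re-validate
          }
      }
    }
    attempt(readVersion)
  }
}
```

Three transactions start from the same empty snapshot: the first commits, the second touches a disjoint partition and also commits, and the third read the partition the first one wrote, so it fails at commit time rather than blocking up front.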
Handling concurrency in Delta Lake means navigating many scenarios that are hard to catch, because transactions execute independently:
1. Different operations can conflict in unpredictable ways, and we need to track them with fine-grained accuracy.
2. We need a way to easily interleave operations, pause them at a precise phase, and check for conflicts.
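One way to "pause an operation at a precise phase" is a pair of latches: the operation signals that it has reached a named point and then blocks until the test releases it. The sketch below assumes plain JDK `CountDownLatch` and Scala `Future`s; `PausePoint` and `InterleavingDemo` are hypothetical names for illustration, not the framework's actual API. While operation A is frozen before its commit, the test interleaves operation B, then lets A finish, so the resulting order is deterministic.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical pause point: the operation blocks here until the test releases it.
final class PausePoint(val name: String) {
  private val reached = new CountDownLatch(1)
  private val release = new CountDownLatch(1)

  // Called from inside the operation under test.
  def arriveAndWait(): Unit = {
    reached.countDown()
    if (!release.await(10, TimeUnit.SECONDS))
      throw new IllegalStateException(s"phase '$name' was never released")
  }

  // Called from the test: block until the operation has paused here.
  def awaitArrival(): Unit =
    if (!reached.await(10, TimeUnit.SECONDS))
      throw new IllegalStateException(s"operation never reached '$name'")

  // Called from the test: let the paused operation continue.
  def unblock(): Unit = release.countDown()
}

object InterleavingDemo {
  // Runs operation A on another thread, pauses it before commit, and runs B in between.
  def run(): Vector[String] = {
    val events = new ConcurrentLinkedQueue[String]()
    val beforeCommit = new PausePoint("before-commit")
    implicit val ec: ExecutionContext = ExecutionContext.global

    val opA = Future {
      events.add("A:prepare")
      beforeCommit.arriveAndWait() // A is frozen here
      events.add("A:commit")
    }

    beforeCommit.awaitArrival() // A is now paused at a known phase
    events.add("B:commit")      // interleave B while A is paused
    beforeCommit.unblock()      // let A finish
    Await.result(opA, 10.seconds)

    events.toArray(Array.empty[String]).toVector
  }
}
```

The timeouts keep a broken test from hanging forever; in a real suite the conflict checks would run at the moment A is paused.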
Solution: Thanks to Scala, we were able to build a powerful, idiomatic, and easy-to-use testing framework with the following benefits:
1. Fine-grained error catching: We can pause an operation at a precise point mid-execution and run our validations before letting it continue.
2. Reusable building blocks: Generic mixin helpers are easy to reuse across different operations and scale to any number of transactions.
3. Focus on conflict scenarios: Instead of worrying about how to implement tests, we spend more time on the real problem, which is reasoning about the complex interactions between transactions and how to resolve them.
4. Abstracted setup and cleanup: Test code focuses on testing, while setup and cleanup are abstracted away, making tests easier to write.
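The "reusable building blocks" and "abstracted setup and cleanup" ideas map naturally onto Scala trait mixins plus the loan pattern. This is a minimal sketch under stated assumptions: a temporary directory stands in for a Delta table, and `TempTableFixture`, `ConcurrentRunner`, and `ParallelWriteSuite` are hypothetical names, not the framework's real helpers. A suite mixes in only what it needs, the fixture owns creation and deletion, and the runner accepts any number of operations.

```scala
import java.io.File
import java.nio.file.{Files, Path}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical mixin: owns setup and cleanup of a scratch "table" directory (loan pattern).
trait TempTableFixture {
  def withTempTable[T](body: Path => T): T = {
    val dir = Files.createTempDirectory("toy-table")
    try body(dir)
    finally deleteRecursively(dir.toFile)
  }

  private def deleteRecursively(f: File): Unit = {
    // listFiles() returns null for plain files, hence the Option.
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }
}

// Hypothetical mixin: runs any number of operations in parallel and collects results in order.
trait ConcurrentRunner {
  def runConcurrently[A](ops: Seq[() => A]): Seq[A] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    Await.result(Future.sequence(ops.map(op => Future(op()))), 30.seconds)
  }
}

// A test suite only describes the scenario; setup and cleanup come from the mixins.
object ParallelWriteSuite extends TempTableFixture with ConcurrentRunner {
  def run(): (Seq[String], Path) = withTempTable { dir =>
    val ops: Seq[() => String] = (1 to 3).map { i =>
      () => {
        Files.write(dir.resolve(s"part-$i.txt"), s"row $i".getBytes("UTF-8"))
        s"part-$i.txt"
      }
    }
    (runConcurrently(ops), dir)
  }
}
```

Because cleanup runs in `finally`, the scratch directory is gone even if an operation throws, which keeps each scenario isolated from the next.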
Goal: From this talk, I hope to:
1. Share the main ideas behind the Concurrency Testing Framework; I have not found another framework like it.
2. Show how to use Scala's functional programming features to build a clean, idiomatic, and reusable testing framework.
3. Express my passion for Scala and databases.
4. Give my first live on-stage conference talk, if accepted; it will certainly be a thrilling and rewarding experience.
5. Hear from other Scala enthusiasts, share knowledge, and cross-pollinate ideas.
References:
Some notable commits and resources:
https://github.com/delta-io/delta/commit/01c0ef91565d9a3e06fb9f62295f7c80fba09351
https://github.com/delta-io/delta/commit/615c184a3677afb26b7ca82c630ebe595421555b
https://github.com/delta-io/delta/commit/dbdcd0b143d1b13b4f06f1928c0e19607145edf3
https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html
In this talk, I will discuss why it's hard to use the power of referential transparency (RT) to test side-effect-heavy apps.
This talk will introduce Mill: a newer build tool that does everything sbt does, but better. Faster, simpler, and easier, Mill democratizes the build so you don't need to be a build tool expert to work on it.
I will demonstrate how Pillars can take you from zero to production in record time. By leveraging Pillars’ integration of well-known libraries, you can bypass the usual complexities of setting up observability (traces, metrics, and logs), database access, API calls, and feature flag management.
In this talk, we'll cover the essentials of macros: why they are useful, why you should care about them, and how to become as proficient with them as practical work requires.