- Day
- Lecture
- Readings
- Fri, Aug 30
- Tue, Sep 3
- Class 2: Data Systems Architectures Essentials - Part 1
- [P] “Architecture of a Database System”, Foundations and Trends in Databases, 2007
[B] “The Design and Implementation of Modern Column-Oriented Database Systems”, Foundations and Trends in Databases, 2012
- Class 2: Data Systems Architectures Essentials - Part 1
- Fri, Sep 6
- Class 3: Data Systems Architectures Essentials - Part 2
- [P] “The Seattle Report on Database Research”, SIGMOD Record, 2022
[B] “The Seattle Report on Database Research”, SIGMOD Record, 2018
- Class 3: Data Systems Architectures Essentials - Part 2
- Tue, Sep 10
- Class 4: Row-Stores vs. Column-Stores
- [P] “Column-Stores vs. Row-Stores: How Different Are They Really?”, SIGMOD, 2008
Technical Question 1 What are the paper’s main contributions? What is “late materialization”?
[B] “C-Store: A Column-oriented DBMS”, VLDB, 2005
- Class 4: Row-Stores vs. Column-Stores
- Fri, Sep 13
- Class 5: LSM Basics - Part 1
- [P] “LSM-based Storage Techniques: A Survey”, VLDB Journal, 2019
[B] “Dissecting, Designing, and Optimizing LSM-based Data Stores”, SIGMOD, 2022
- Class 5: LSM Basics - Part 1
- Tue, Sep 17
- Class 6: LSM Basics - Part 2
- [P] “Monkey: Optimal Navigable Key-Value Store”, SIGMOD, 2017
Technical Question 2 When does a tiered LSM-tree behave similarly to a leveled LSM-tree? What is the key contribution of the paper?
[B] “The LSM Design Space and its Read Optimizations”, ICDE, 2023
- Class 6: LSM Basics - Part 2
- Fri, Sep 20
- Class 7: Class Project Overview
- [P] “Data Structures for Data-Intensive Applications”, Foundations and Trends in Databases, 2023
[B] “The Log-Structured Merge-Tree (LSM-Tree)”, Acta Informatica, 1996
- Class 7: Class Project Overview
- Tue, Sep 24
- Class 8: LSM Memory Buffer
Guest Lecture| Shubham Kaushik - [P] “Anatomy of the LSM Memory Buffer: Insights & Implications”, DBTest, 2024
Technical Question 3 How does the memory allocation strategy affect the query latency for a pre-allocated vector vs. a dynamically allocated one? Given a workload, how would do decide the prefix length for a hash-hybrid buffer implementation?
[B] “Breaking Down Memory Walls: Adaptive Memory Management in LSM-based Storage Systems”, VLDB, 2020
- Class 8: LSM Memory Buffer
- Fri, Sep 27
- Class 9: LSM Compactions
- [P] “Constructing and Analyzing the LSM Compaction Design Space”, VLDB, 2021
[B] “Compactionary: A Dictionary for LSM Compactions”, SIGMOD, 2022
- Class 9: LSM Compactions
- Tue, Oct 1
- Class 10: LSM Tuning & Robustness
Guest Lecture| Andy Huynh - [P] “Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty”, VLDB, 2022
Review Paper 1
[B] “CliffGuard: A Principled Framework for Finding Robust Database Designs”, SIGMOD, 2015
- Class 10: LSM Tuning & Robustness
- Fri, Oct 4
- Rosh Hashanah – No Class
- Tue, Oct 8
- Class 11: LSM Filters
Guest Lecture| Zichen Zhu - [P] “LSM-Tree Under (Memory) Pressure”, ADMS, 2022
Technical Question 4 SHaMBa achieves better lookup performance when large Bloom filters do not fit in memory. Can we do something similar if we have large index blocks? Give a concrete example to explain why it could work or not.
[B] “Reducing Bloom Filter CPU Overhead in LSM-trees on Modern Storage Devices”, DaMoN, 2021
- Class 11: LSM Filters
- Fri, Oct 11
- Yom Kippur – No Class
- Tue, Oct 15
- Class 12: LSM Deletes
- [P] “Lethe: A Tunable Delete-Aware LSM Engine”, SIGMOD, 2020
Technical Question 5 To ensure bounded delete persistent latency, Lethe aggressively performs compactions. How does these eager compactions affect write amplification of the overall system?
[B] “Acheron: Persisting Tombstones in LSM Engines”, SIGMOD, 2023
- Class 12: LSM Deletes
- Fri, Oct 18
- Class 13: LSM Tuning Optimization
Student Presentation - 1 - [P] “Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads”, SIGMOD, 2023
[B] “Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging”, SIGMOD, 2018
- Class 13: LSM Tuning Optimization
- Tue, Oct 22
- Brandeis Thursday – No Class
- Fri, Oct 25
- Class 14: Class canceled!
- Tue, Oct 29
- Class 15: Adaptive Radix Trees
Student Presentation - 2 - [P] “The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases”, ICDE, 2013
Review Paper 2
- Class 15: Adaptive Radix Trees
- Fri, Nov 1
- Class 16: Adaptive Indexing & Cracking
Student Presentation - 3 - [P] “Adaptive Adaptive Indexing”, ICDE, 2018
[B] “Self-organizing Tuple Reconstruction in Column-stores”, SIGMOD, 2009
- Class 16: Adaptive Indexing & Cracking
- Tue, Nov 5
- Class 17: Sortedness-Aware Indexing
Guest Lecture| Aneesh Raman - [P] “Indexing for Near-Sorted Data”, ICDE, 2023
Review Paper 3
[B] “BoDS: A Benchmark on Data Sortedness”, TPCTC, 2022
- Class 17: Sortedness-Aware Indexing
- Fri, Nov 8
- Tue, Nov 12
- Class 19: Parametric I/O Model
Guest Lecture| Tarikul Islam Papon - [P] “ACEing the Bufferpool Management Paradigm for Modern Storage Devices”, ICDE, 2023
Technical Question 6 Consider you have three SSDs with the following asymmetry values: (i) α = 2.0, (ii) α = 4.0, (iii) α = 6.0. Your workload contains 50% writes with a predictive pattern. Which device will have the highest speedup after implementing ACE bufferpool and why? Also, will you use any prefetcher? Justify your answer.
[B] “A Parametric I/O Model for Modern Storage Devices”, DaMoN, 2021
- Class 19: Parametric I/O Model
- Fri, Nov 15
- Class 20: Self-Designing Storage Engine
Guest Lecture| Dr. Subarna Chatterjee - [P] “Cosine: A Cloud-Cost Optimized Self-Designing Key-Value Storage Engine”, VLDB, 2022
[B] “Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn”, CIDR, 2019
- Class 20: Self-Designing Storage Engine
- Tue, Nov 19
- Class 21: : HTAP Systems
Student Presentation - 4 - [P] “FASTER: A Concurrent Key-Value Store with In-Place Updates”, SIGMOD, 2018
Technical Question 7 How does FASTER manage data between the “hot” in-memory section and the “cold” disk-resident section? What criteria determine when data is moved between these sections?
- Class 21: : HTAP Systems
- Fri, Nov 22
- Class 22: Relational Memory
Guest Lecture| Prof. Ju Hyoung Mun - [P] “Relational Memory: Native In-Memory Accesses on Rows and Columns”, EDBT, 2023
[B] “Relational Fabric: Transparent Data Transformation”, ICDE, 2023
- Class 22: Relational Memory
- Tue, Nov 26
- Class 23: Data Structures in Databases
Student Presentation - 5 - [P] “The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models”, SIGMOD, 2018
Technical Question 8 What factors should the Data Calculator consider when synthesizing an optimal data structure for this workload? How would the framework decide between a B-tree and a log-structured merge (LSM) tree?
[B] “The Periodic Table of Data Structures”, DEBull, 2018
- Class 23: Data Structures in Databases
- Fri, Nov 29
- Thanksgiving Holiday – No Class
- Tue, Dec 3
- Class Project – Review
- Extended office hours to discuss project progress in Volen 259.
- Fri, Dec 6
- Class Project – Presentations
- Sign up here!
Presentation duration: 8 minutes
Q&A: 2 minutes
- Tue, Dec 10
- Class Project – Code Review
- Sign up here!
Code Review duration: 10 minutes