From 59c1fa92e1960bdf6eded3dffb8b9881f39b3a17 Mon Sep 17 00:00:00 2001
From: Yuriy Dupyn <2153100+omedusyo@users.noreply.github.com>
Date: Mon, 16 Oct 2023 01:06:37 +0200
Subject: [PATCH] Add a design document.

---
 DESIGN.md | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md |  28 ++---------
 2 files changed, 134 insertions(+), 25 deletions(-)
 create mode 100644 DESIGN.md

diff --git a/DESIGN.md b/DESIGN.md
new file mode 100644
index 0000000..fe86d24
--- /dev/null
+++ b/DESIGN.md
@@ -0,0 +1,131 @@
+# MiniSQL
+
+## Official Description
+MiniSQL server
+
+Create a simple SQL server that supports SELECT (including column selection), INSERT and DELETE.
+The SELECT and DELETE statements support a WHERE clause with a single column.
+It is also possible to create indexes (hashes).
+
+The database works with persistent storage - it can be turned off and on without data loss.
+Create an interface to the database to work - either CLI or desktop application.
+
+## Resources
+
+### DB internals
+* [CMU intro to Database Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjaKScG3l0nuOiDTTqpfnWFf)
+* Tutorial on sqlite clone in C: [Let's build a Simple Database. How Does a Database work?](https://cstack.github.io/db_tutorial/).
+* [sqlite opcodes](https://www.sqlite.org/opcode.html)
+
+### Parsing
+Parser Combinator libraries
+* [nom](https://github.com/rust-bakery/nom)
+* [parser_combinators](https://docs.rs/parser-combinators/latest/parser_combinators/)
+### TCP socket programming
+* See [simple http server](https://youtu.be/hzSsOV2F7-s) implementation with tcp sockets
+
+
+## Scope
+* Primarily in-memory db.
+* Occasionally saves its state to disk.
+* Almost non-existent concurrency control?
+* Inspiration from SQLite, but not the server part. For the server part take a look at Postgres.
+* Can create custom column indexes.
+* What about Joins?
+* What about composite queries (i.e.
instead of a table name in a select expression we include another select expression)?
+
+# Building a Minimum Viable Product (MVP)
+Possible usage:
+* You start the db server with
+```./minisql server start --db path/to/db/my-db.db --port 1433```
+which will store the database as a file `path/to/db/my-db.db` and open a TCP server on port `1433`
+* Then, possibly on a different machine, you run `./minisql client connect server_ip_address:1433` to start a client. This opens a REPL through which you can send queries and db management commands.
+* TODO: We should also consider writing a Rust library that allows you to spin up a client that connects to the server.
+  What would the interface look like?
+```
+  use minisql::{DB, DBConnection}
+
+  let maybe_conn: Result = DB.connect({
+    port: 1433,
+    db_name: "db-name",
+    username: "bojack",
+    password: "12345"}
+  ): Result ?;
+  let conn: DBConnection = maybe_conn?;
+
+  // then we can execute queries
+  conn.perform("SELECT id, name, title, salary FROM employees"): ???
+```
+* Then with the client CLI we can request creation of tables, insertion of data, etc.
+```
+> CREATE TABLE persons(id u32 PRIMARY KEY, name String, salary Float)
+
+> INSERT 1, "Alice", 20.0 INTO persons;
+> INSERT 2, "Bob", 30 INTO persons;
+> INSERT 3, "Claire", 15 INTO persons;
+
+```
+* What should the SQL AST look like? For example, when the server parses `SELECT id, name FROM persons;`,
+  what will the parsing output look like?
+  Consider something like
+```
+// Could also be called `SQLAbstractSyntaxTree`
+enum Operations {
+  Select(Vector<ColumnName>, TableName),
+  Update(...)
+}
+```
+* We also have to write an interpreter for these operations. How will the db-state be represented in memory?
+  For example, how can we implement a table?
+```
+// Not exactly efficient, but how could we do better?
+enum DbValue {
+  DbString(String),
+  DbNumber(Float),
+  DbByte(u8),
+  DbUUID(u32)
+}
+
+type ColumnName = String
+
+// The type of a table row
+type Row = HashMap<ColumnName, DbValue>
+
+// Or you know... some appropriate Dictionary Type
+HashMap::make![("id", 1), ("name", "Alice"), ("salary", 20.0)] : Row
+
+// possible optimization: have a mapping
+//   column names ~> indexes
+// so that we could represent rows as
+type Row = Vec<DbValue>
+
+
+// How to represent a table?
+table : HashMap<Id, Row>
+
+Vec<(Id, Row)>.
+
+// suppose the row corresponds to 'INSERT 1, "Yura", 20.0 INTO persons;'
+Row ~> Vec<DbValue>
+e.g. Row ~> vec![DbUUID(1), DbString("Yura")]
+
+Vec<Vec<DbValue>>
+
+```
+* TODO: Consider streaming the response to the client and not just dumping 10K rows at once.
+
+
+
+## Server
+1. **Client input parsing/validation** Parse string input from the client into an Abstract Syntax Tree (AST) that represents the SQL query.
+2. **Code gen** (Not necessary for MVP) Generate bytecode for a lower-level VM from the SQL AST.
+3. **VM** (Not necessary for MVP) Implement a low-level VM that governs the in-memory db.
+4. **Persistence** Serialize the in-memory db state to a file. What format should it have? At first perhaps just a dumb JSON serialization? You also have to implement the deserialization.
+5. **Client response** Stream selected rows/status/error messages back to the client. What should the protocol look like? Take a look at [Tabular Data Stream](https://en.wikipedia.org/wiki/Tabular_Data_Stream) or just respond with JSON.
+6. **Concurrency control** Consider what happens when two clients simultaneously wish to update the same row. There has to be some minimal mutual exclusion/locking.
+7. **Security** Should we worry about secure communication over TCP?
+
+## Client
+1. **Opens TCP connection to server**
+2. **REPL** Has to provide a basic REPL interface.
+3. **Server Response decoder** Has to properly format the data/status/error messages from the server. Does the client have to parse the queries?
Or is that only the server's responsibility?
diff --git a/README.md b/README.md
index 09faa32..42a9ca6 100644
--- a/README.md
+++ b/README.md
@@ -1,29 +1,7 @@
-# minisql
+# MiniSQL
 
-
-
-## Getting started
-
-To make it easy for you to get started with GitLab, here's a list of recommended next steps.
-
-Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!
-
-## Add your files
-
-- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
-- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:
+## Installation
 
 ```
-cd existing_repo
-git remote add origin https://gitlab.fi.muni.cz/x433485/minisql.git
-git branch -M main
-git push -uf origin main
+git clone git@gitlab.fi.muni.cz:x433485/minisql.git
 ```
-
-## Collaborate with your team
-
-- [ ] [Invite team members and collaborators](https://docs.gitlab.com/ee/user/project/members/)
-- [ ] [Create a new merge request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
-- [ ] [Automatically close issues from merge requests](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically)
-- [ ] [Enable merge request approvals](https://docs.gitlab.com/ee/user/project/merge_requests/approvals/)
-- [ ] [Set auto-merge](https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html)