minisql/DESIGN.md

# MiniSQL

## Official Description
MiniSQL server

Create a simple SQL server that supports SELECT (including column selection), INSERT and DELETE.
The SELECT and DELETE statements support a WHERE clause with a single column.
It is also possible to create indexes (hashes).

The database works with persistent storage - it can be turned off and on without data loss.
Create an interface for working with the database: either a CLI or a desktop application.

## Resources

### DB internals
* [CMU intro to Database Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjaKScG3l0nuOiDTTqpfnWFf)
* Tutorial on sqlite clone in C: [Let's build a Simple Database. How Does a Database work?](https://cstack.github.io/db_tutorial/).
* [sqlite opcodes](https://www.sqlite.org/opcode.html)

### Parsing
Parser Combinator libraries
* [nom](https://github.com/rust-bakery/nom)
* [parser_combinators](https://docs.rs/parser-combinators/latest/parser_combinators/)

### TCP socket programming
* See this [simple HTTP server](https://youtu.be/hzSsOV2F7-s) implementation built on TCP sockets


## Scope
* Primarily in-memory db.
* Occasionally saves its state to disk.
* Almost non-existent concurrency control?
* Inspiration from SQLite, but not the server part. For the server part take a look at Postgres.
* Can create custom column indexes.
* What about Joins?
* What about composite queries (i.e. instead of a table name in a select expression we include another select expression)?
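
The custom column indexes mentioned above are meant to be hash indexes. A minimal sketch of one, assuming rows are stored in a `Vec` and addressed by position, and that the indexed column holds strings. `HashIndex`, `RowId`, `build`, and `lookup` are hypothetical names, not part of any existing code.

```rust
use std::collections::HashMap;

// Assumption: rows live in a Vec and are addressed by their position.
type RowId = usize;

/// A hash index over one string-valued column: value -> rows holding it.
struct HashIndex {
    entries: HashMap<String, Vec<RowId>>,
}

impl HashIndex {
    /// Builds the index from the column's values, given in row order.
    fn build(column_values: &[&str]) -> Self {
        let mut entries: HashMap<String, Vec<RowId>> = HashMap::new();
        for (row_id, value) in column_values.iter().enumerate() {
            entries.entry(value.to_string()).or_default().push(row_id);
        }
        HashIndex { entries }
    }

    /// Answers `WHERE column = value` without scanning the whole table.
    fn lookup(&self, value: &str) -> &[RowId] {
        self.entries.get(value).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

With such an index, a single-column `WHERE` clause becomes one hash lookup. The index would have to be updated on every INSERT and DELETE, which this sketch leaves out.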

# Building a Minimum Viable Product (MVP)
Possible usage:
* You start the db server with
```./minisql server start --db path/to/db/my-db.db --port 1433```
which will store the database as a file `path/to/db/my-db.db` and open a TCP server on port `1433`
* Then, possibly on a different machine, you run `./minisql client connect server_ip_address:1433` to start a client. This opens a REPL from which you can send queries and db management commands.
* TODO: We should also consider writing a Rust library that allows you to spin up a client that connects to the server.
  What would the interface look like?
```
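  // Hypothetical client API: none of these items exist yet,
  // and `ConnectionConfig` is only a placeholder name.
  use minisql::{ConnectionConfig, DB, DBConnection};

  let maybe_conn: Result<DBConnection> = DB::connect(ConnectionConfig {
    port: 1433,
    db_name: "db-name",
    username: "bojack",
    password: "12345",
  });
  let conn: DBConnection = maybe_conn?;

  // then we can execute queries
  let result = conn.perform("SELECT id, name, title, salary FROM employees"); // return type: ???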
```
* Then with the client cli we can request creation of tables/insertion of data etc
```
> CREATE TABLE persons(id u32 PRIMARY KEY, name String, salary Float);

> INSERT 1, "Alice", 20.0 INTO persons;
> INSERT 2, "Bob", 30 INTO persons;
> INSERT 3, "Claire", 15 INTO persons;

```
* What should the SQL AST look like? For example, when the server parses `SELECT id, name FROM persons;`,
  what will the parser output look like?
  Consider something like
```
// Could also be called `SQLAbstractSyntaxTree`
enum Operations {
  Select(Vec<FieldName>, TableName),
  Update(...)
}
```
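
To make the question concrete, here is a minimal sketch of parsing the `SELECT` case into such an AST. It assumes a hand-written parser rather than one of the parser combinator libraries listed under Resources, and it handles neither `WHERE` nor `*`. The names `Operation` and `parse_select` are hypothetical.

```rust
// Hypothetical names throughout; this is a sketch, not the final AST.
type FieldName = String;
type TableName = String;

#[derive(Debug, PartialEq)]
enum Operation {
    Select(Vec<FieldName>, TableName),
}

/// Parses `SELECT col1, col2 FROM table;` (no WHERE clause, no `*`).
fn parse_select(input: &str) -> Result<Operation, String> {
    let input = input.trim().trim_end_matches(';').trim_end();
    // Keywords are matched case-insensitively. ASCII uppercasing keeps
    // byte offsets identical, so positions found in `upper` index `input`.
    let upper = input.to_ascii_uppercase();
    if !upper.starts_with("SELECT ") {
        return Err("expected SELECT".to_string());
    }
    // Skip the keyword but keep the following space so " FROM " can match.
    let (rest, upper_rest) = (&input[6..], &upper[6..]);
    let from_pos = upper_rest
        .find(" FROM ")
        .ok_or_else(|| "expected FROM".to_string())?;
    let fields: Vec<FieldName> = rest[..from_pos]
        .split(',')
        .map(|f| f.trim().to_string())
        .collect();
    if fields.iter().any(|f| f.is_empty()) {
        return Err("expected a column name".to_string());
    }
    let table = rest[from_pos + 6..].trim();
    if table.is_empty() || table.contains(char::is_whitespace) {
        return Err("expected a single table name".to_string());
    }
    Ok(Operation::Select(fields, table.to_string()))
}
```

For example, `parse_select("SELECT id, name FROM persons;")` yields `Operation::Select` with the fields `id` and `name` and the table `persons`, while a missing column list or a missing `FROM` produces an `Err`.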
* We also have to write an interpreter for these operations. How will the db-state be represented in memory?
  For example how can we implement a table?
```
use std::collections::HashMap;

// Not exactly efficient, but how could we do better?
enum DbValue {
  DbString(String),
  DbNumber(f64),
  DbByte(u8),
  DbUUID(u32),
}

type ColumnName = String;
type Id = u32; // assumption: `Id` is the primary key type

// Option 1: a table row is a dictionary from column names to values
// (or you know... some appropriate dictionary type), e.g.
// {"id": DbUUID(1), "name": DbString("Alice"), "salary": DbNumber(20.0)}
type Row = HashMap<ColumnName, DbValue>;

// Option 2 (possible optimization): have a mapping
// column names ~> indexes
// so that we could represent rows as
// type Row = Vec<DbValue>;
// e.g. 'INSERT 1, "Yura", 20.0 INTO persons;' corresponds to
// vec![DbUUID(1), DbString("Yura".to_string()), DbNumber(20.0)]

// How to represent a table? Candidates:
// HashMap<Id, Row>
// Vec<(Id, Row)>
// Vec<Vec<DbValue>>

```
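
The "column names ~> indexes" optimization above can be sketched as follows. This is one possible layout, assuming the table is a `Vec<Vec<DbValue>>` plus a per-table name-to-position map. `Table`, `insert`, and `select` are hypothetical names, and the sketch does not check for duplicate column names or value types.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum DbValue {
    DbString(String),
    DbNumber(f64),
    DbByte(u8),
    DbUUID(u32),
}

// Hypothetical table layout: rows are plain vectors, and a per-table
// map translates column names to positions within each row.
struct Table {
    column_index: HashMap<String, usize>,
    rows: Vec<Vec<DbValue>>,
}

impl Table {
    fn new(columns: &[&str]) -> Self {
        let column_index = columns
            .iter()
            .enumerate()
            .map(|(i, name)| (name.to_string(), i))
            .collect();
        Table { column_index, rows: Vec::new() }
    }

    /// Appends a row; rejects rows whose length does not match the schema.
    fn insert(&mut self, row: Vec<DbValue>) -> Result<(), String> {
        if row.len() != self.column_index.len() {
            return Err(format!(
                "expected {} values, got {}",
                self.column_index.len(),
                row.len()
            ));
        }
        self.rows.push(row);
        Ok(())
    }

    /// Projects the given columns out of every row (SELECT without WHERE).
    fn select(&self, columns: &[&str]) -> Result<Vec<Vec<DbValue>>, String> {
        // Resolve names to positions once per query, not once per row.
        let positions = columns
            .iter()
            .map(|c| {
                self.column_index
                    .get(*c)
                    .copied()
                    .ok_or_else(|| format!("unknown column {}", c))
            })
            .collect::<Result<Vec<usize>, String>>()?;
        Ok(self
            .rows
            .iter()
            .map(|row| positions.iter().map(|&p| row[p].clone()).collect::<Vec<DbValue>>())
            .collect())
    }
}
```

Compared with `HashMap<ColumnName, DbValue>` rows, this stores each column name once per table instead of once per row, at the cost of one extra lookup step per query.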
* TODO: Consider streaming the response to the client and not just dumping 10K rows at once.


## Server
1. **Client input parsing/validation** Parse the string input from the client into an Abstract Syntax Tree (AST) that represents the SQL query.
2. **Code gen** (Not necessary for MVP) Generate bytecode for a lower-level VM from the SQL AST.
3. **VM** (Not necessary for MVP) Implement a low-level VM that governs the in-memory db.
4. **Persistence** Serialize the in-memory db state to a file. What format should it have? At first perhaps just a dumb JSON serialization? Deserialization has to be implemented as well.
5. **Client response** Stream selected rows/status/error messages back to the client. What should the protocol look like? Take a look at [Tabular Data Stream](https://en.wikipedia.org/wiki/Tabular_Data_Stream) or just respond with JSON.
6. **Concurrency control** Consider what happens when two clients simultaneously wish to update the same row. There has to be some minimal mutual exclusion/locking.
7. **Security** Should we worry about secure communication over TCP?
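
For the concurrency control item, a minimal sketch of the simplest form of mutual exclusion: one `RwLock` around the whole in-memory state, shared between client threads through an `Arc`. It assumes one thread per client. The `Db` type here is a stand-in (row id to salary), and `SharedDb`, `raise_salary`, and `simulate_clients` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical in-memory state: row id -> salary, standing in for a real table.
type Db = HashMap<u32, f64>;
type SharedDb = Arc<RwLock<Db>>;

/// Applies `delta` to one row while holding the write lock, so concurrent
/// updates of the same row are serialized rather than lost.
fn raise_salary(db: &SharedDb, id: u32, delta: f64) -> Result<(), String> {
    let mut guard = db.write().map_err(|_| "lock poisoned".to_string())?;
    let salary = guard
        .get_mut(&id)
        .ok_or_else(|| format!("no row with id {}", id))?;
    *salary += delta;
    Ok(())
}

/// Spawns `clients` threads that each update the same row once,
/// then waits for all of them to finish.
fn simulate_clients(db: &SharedDb, id: u32, clients: usize) {
    let handles: Vec<_> = (0..clients)
        .map(|_| {
            let db = Arc::clone(db);
            thread::spawn(move || raise_salary(&db, id, 1.0))
        })
        .collect();
    for handle in handles {
        handle
            .join()
            .expect("client thread panicked")
            .expect("update failed");
    }
}
```

A single lock over the whole database blocks every reader while any write is in progress. That may be acceptable for an MVP; per-table or per-row locks would be a later refinement.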

## Client
1. **Opens TCP connection to server**
2. **REPL** Has to provide a basic REPL interface.
3. **Server Response decoder** Has to properly format the data/status/error messages from the server. Does the client have to parse the queries? Or is that only the server's responsibility?
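
For the response decoder, a minimal sketch of the formatting step, assuming the client has already decoded the server's bytes into one of three shapes: rows, a status message, or an error. The `Response` enum and `render` function are hypothetical, since the wire protocol is still undecided. Values are assumed to arrive as strings.

```rust
// Hypothetical response shape; the actual wire protocol is still undecided.
enum Response {
    Rows { columns: Vec<String>, rows: Vec<Vec<String>> },
    Status(String),
    Error(String),
}

/// Renders a decoded response as text for the REPL.
fn render(response: &Response) -> String {
    match response {
        Response::Rows { columns, rows } => {
            // Header line, then one line per row, then a row count.
            let mut lines = vec![columns.join(" | ")];
            lines.extend(rows.iter().map(|row| row.join(" | ")));
            lines.push(format!("({} rows)", rows.len()));
            lines.join("\n")
        }
        Response::Status(msg) => format!("OK: {}", msg),
        Response::Error(msg) => format!("ERROR: {}", msg),
    }
}
```

With this split, the client only needs to understand the response format. Query parsing and validation can stay entirely on the server, and malformed queries come back as `Response::Error`.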