Explain the role of the DESIGN_OLD.md file
parent e22b31dc41
commit a3e3390c85
1 changed files with 10 additions and 9 deletions
240 DESIGN.md
@@ -1,240 +0,0 @@
# MiniSQL

## Official Description
MiniSQL server

Create a simple SQL server that supports SELECT (including column selection), INSERT and DELETE.
The SELECT and DELETE statements support a WHERE clause with a single column.
It is also possible to create indexes (hashes).

The database works with persistent storage: it can be turned off and on without data loss.
Create an interface for working with the database: either a CLI or a desktop application.

## Resources

### DB internals
* [CMU Intro to Database Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjaKScG3l0nuOiDTTqpfnWFf)
* Tutorial on building an SQLite clone in C: [Let's build a Simple Database. How Does a Database work?](https://cstack.github.io/db_tutorial/)
* [SQLite opcodes](https://www.sqlite.org/opcode.html)

### Parsing
Parser combinator libraries
* [nom](https://github.com/rust-bakery/nom)
* [parser_combinators](https://docs.rs/parser-combinators/latest/parser_combinators/)
### TCP socket programming
* See the [simple HTTP server](https://youtu.be/hzSsOV2F7-s) implementation with TCP sockets

## Scope
* Primarily an in-memory db.
* Occasionally saves its state to disk.
* Almost non-existent concurrency control?
* Inspiration from SQLite, but not for the server part. For the server part, take a look at Postgres.
* Can create custom column indexes.
* What about joins?
* What about composite queries (i.e. a select expression in place of a table name inside another select expression)?

# Building a Minimum Viable Product (MVP)
Possible usage:
* You start the db server with
```./minisql server start --db path/to/db/my-db.db --port 1433```
which will store the database as a file `path/to/db/my-db.db` and open a TCP server on port `1433`
* Then, possibly on a different machine, you run `./minisql client connect server_ip_address:1433` to start a client. This opens a REPL from which you can send queries and db management commands.
* TODO: We should also consider writing a Rust library that allows you to spin up a client that connects to the server.
What would the interface look like?
```
use minisql::{DB, DBConnection};

let maybe_conn: Result<DBConnection> = DB.connect({
    port: 1433,
    db_name: "db-name",
    username: "bojack",
    password: "12345",
});
let conn: DBConnection = maybe_conn?;

// then we can execute queries
conn.perform("SELECT id, name, title, salary FROM employees"): ???
```
* Then with the client CLI we can request creation of tables, insertion of data, etc.
```
> CREATE TABLE persons(id u32 PRIMARY KEY, name String, salary Float);

> INSERT 1, "Alice", 20.0 INTO persons;
> INSERT 2, "Bob", 30 INTO persons;
> INSERT 3, "Claire", 15 INTO persons;

```
* What should the SQL AST look like? For example, when the server parses `SELECT id, name FROM persons;`,
what will the parser output look like?
Consider something like
```
// TODO: Parser has access to all table metadata

// Could also be called `SQLAbstractSyntaxTree`
enum Operation {
    Select(TableName, ColumnSelection, Option<Condition>),
    Insert(TableName, Vec<(ColumnName, DbValue)>), // String because we don't yet know which type of value this is for sure
    Delete(TableName, Option<Condition>),
    // Update(...),
}

enum ColumnSelection {
    All,
    Columns(Vec<ColumnName>),
}

enum Condition {
    // And(Condition, Condition),
    // Or(Condition, Condition),
    // Not(Condition),

    Eq(ColumnName, DbValue),
    // LessOrEqual(ColumnName, DbValue)
    // Less(ColumnName, DbValue)

    // StringCondition(StringCondition)
}

enum StringCondition {
    Prefix(ColumnName, String),
    Substring(ColumnName, String),
}

// INSERT 123
```
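To make the question above concrete, here is a minimal sketch of what the parser output could be for two queries. It assumes simplified copies of the types above (only `Eq` conditions), and the helper names `example_select` and `example_delete` are made up for illustration; the `DELETE ... WHERE` syntax in the comment is also an assumption, since the notes do not fix it.

```rust
// Hypothetical, simplified copies of the AST types from the design notes.
type TableName = String;
type ColumnName = String;

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DbValue {
    DbString(String),
    DbNumber(f64),
    DbUUID(u32),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ColumnSelection {
    All,
    Columns(Vec<ColumnName>),
}

#[derive(Debug, Clone, PartialEq)]
enum Condition {
    Eq(ColumnName, DbValue),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Operation {
    Select(TableName, ColumnSelection, Option<Condition>),
    Insert(TableName, Vec<(ColumnName, DbValue)>),
    Delete(TableName, Option<Condition>),
}

/// AST the parser could produce for `SELECT id, name FROM persons;`
fn example_select() -> Operation {
    Operation::Select(
        "persons".to_string(),
        ColumnSelection::Columns(vec!["id".to_string(), "name".to_string()]),
        None, // no WHERE clause
    )
}

/// AST for a delete restricted to a single column,
/// e.g. something like `DELETE FROM persons WHERE id = 2;` (syntax assumed).
fn example_delete() -> Operation {
    Operation::Delete(
        "persons".to_string(),
        Some(Condition::Eq("id".to_string(), DbValue::DbUUID(2))),
    )
}
```

Keeping the AST as plain enums means the interpreter can pattern-match on it directly, and the commented-out `And`/`Or`/`Not` variants can be added later without touching the parser's callers.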
* We also have to write an interpreter for these operations. How will the db-state be represented in memory?
For example, how can we implement a table?
```
// Not exactly efficient, but how could we do better?
enum DbValue {
    DbString(String),
    DbNumber(f64),
    DbUUID(u32),
}

// We also need a type of db-types
enum DbType {
    TString,
    TNumber,
    TId,
}

fn value_to_type(db_val: DbValue) -> DbType


// table-metadata and data

type TableName = String

// Note that it is nice to split metadata from the data because
// then you can give the metadata to the parser without giving it the data.
struct TableMetaData {
    name: TableName, // TODO: Is this really necessary? probably not
    columns: Vec<(ColumnName, DbType, ColumnPosition)>,
}

fn column(TableMetaData, ColumnName) -> ColumnPosition

struct Table {
    meta: TableMetaData,
    rows: Rows, // defined below
    indexes:
        BTree<ColumnName, Index>, // TODO: Consider generalizing ColumnName to something that would also apply to a pair of ColumnNames etc
}

type Tables = HashMap<TableName, Table>

// We also need a function that for a given value computes its type (for validation)


type ColumnName = String
type ColumnPosition = u32

// The below type is a type of a table row
type Row = HashMap<ColumnName, DbValue>

// Or you know... some appropriate Dictionary Type
HashMap::make![("id", 1), ("name", "Alice"), ("salary", 20.0)] : Row

type Rows =
    BTree<Id, Row>

// possible optimization: have a mapping
//   column names ~> indexes
// so that we could represent rows as
type Row = Vec<DbValue>


// How to represent a table?
table : HashMap<Id, Row>

Vec<(Id, Row)>

// suppose the row corresponds to 'INSERT 1, "Alice", 20.0 INTO persons;'
Row ~> Vec<DbValue>
e.g. Row ~> vec![DbUUID(1), DbString("Alice"), DbNumber(20.0)]

Vec<Vec<DbValue>>

```
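As one possible answer to "how could we do better", the following sketch tries the `Row = Vec<DbValue>` variant: the metadata maps column names to positions, and rows are validated against the column types on insert. This is only an illustration under assumptions, not the chosen design: `std::collections::BTreeMap` stands in for `BTree`, the column position is simply the index in `columns`, and `validate`, `column_position`, `get` and `persons_table` are names invented here.

```rust
use std::collections::BTreeMap;

type ColumnName = String;
type Id = u32;
type Row = Vec<DbValue>; // values stored by column position

#[derive(Debug, Clone, PartialEq)]
enum DbValue {
    DbString(String),
    DbNumber(f64),
    DbUUID(u32),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DbType {
    TString,
    TNumber,
    TId,
}

fn value_to_type(value: &DbValue) -> DbType {
    match value {
        DbValue::DbString(_) => DbType::TString,
        DbValue::DbNumber(_) => DbType::TNumber,
        DbValue::DbUUID(_) => DbType::TId,
    }
}

struct TableMetaData {
    // Assumption: the index in this vector doubles as the column position.
    columns: Vec<(ColumnName, DbType)>,
}

impl TableMetaData {
    fn column_position(&self, name: &str) -> Option<usize> {
        self.columns.iter().position(|(n, _)| n.as_str() == name)
    }

    // A row is valid if it has one value per column and every value
    // has the type declared for its column.
    fn validate(&self, row: &[DbValue]) -> bool {
        row.len() == self.columns.len()
            && row
                .iter()
                .zip(&self.columns)
                .all(|(value, (_, ty))| value_to_type(value) == *ty)
    }
}

struct Table {
    meta: TableMetaData,
    rows: BTreeMap<Id, Row>,
}

impl Table {
    // Returns false for rows that fail validation or reuse an existing id.
    fn insert(&mut self, id: Id, row: Row) -> bool {
        if !self.meta.validate(&row) || self.rows.contains_key(&id) {
            return false;
        }
        self.rows.insert(id, row);
        true
    }

    fn get(&self, id: Id, column: &str) -> Option<&DbValue> {
        let position = self.meta.column_position(column)?;
        self.rows.get(&id)?.get(position)
    }
}

// The `persons` table from the CLI example, initially empty.
fn persons_table() -> Table {
    Table {
        meta: TableMetaData {
            columns: vec![
                ("id".to_string(), DbType::TId),
                ("name".to_string(), DbType::TString),
                ("salary".to_string(), DbType::TNumber),
            ],
        },
        rows: BTreeMap::new(),
    }
}
```

Compared with `HashMap<ColumnName, DbValue>` rows, this stores each column name once per table instead of once per row, at the cost of a name lookup in the metadata before every access.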
* Interpreter
```
trait SqlConsumer {
    // TODO:
    ???
}

fn interpret<T: SqlConsumer>(operation: Operation, tables: &mut Tables, consumer: T) -> () {
    // TODO: lock stuff
    match operation {
        Select(table_name, column_selection, maybe_condition) => {
            let table: Table = ...
            // TODO: Wrap this into a response
            select(table, column_selection, maybe_condition, consumer)
        },
        Insert(table_name, values) => {
            insert(table, ???)
        },
        Delete(table_name, maybe_condition) => {

        },
    }
}

response = interpret(...)
knows_how_to_respond(response, client)


enum Response {
    Selected(impl Iterator<???>), // TODO: How to do this? Some reference to an iterator somehow... slice..?
    Inserted(???),
    Deleted(usize), // how many were deleted
}

fn select(table: Table, ColumnName


```
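A runnable approximation of the interpreter above might look as follows. It is a sketch under several assumptions that the notes leave open: rows are `Vec<DbValue>`, `Selected` carries a fully materialized `Vec<Row>` instead of an iterator or a `SqlConsumer`, an `Error` variant is added to `Response`, `Insert` is left out, and there is no locking. The helpers `position`, `matches` and `sample_tables` are invented for the example.

```rust
use std::collections::{BTreeMap, HashMap};

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DbValue {
    DbString(String),
    DbNumber(f64),
    DbUUID(u32),
}

type Row = Vec<DbValue>;

struct Table {
    columns: Vec<String>,     // columns[i] names the value stored in row[i]
    rows: BTreeMap<u32, Row>, // keyed by primary key
}

type Tables = HashMap<String, Table>;

enum ColumnSelection {
    All,
    Columns(Vec<String>),
}

enum Condition {
    Eq(String, DbValue),
}

enum Operation {
    Select(String, ColumnSelection, Option<Condition>),
    Delete(String, Option<Condition>),
}

#[derive(Debug, PartialEq)]
enum Response {
    Selected(Vec<Row>),
    Deleted(usize), // how many rows were deleted
    Error(String),
}

fn position(table: &Table, column: &str) -> Option<usize> {
    table.columns.iter().position(|c| c.as_str() == column)
}

// Returns None when the condition refers to an unknown column.
fn matches(table: &Table, row: &Row, cond: &Option<Condition>) -> Option<bool> {
    match cond {
        None => Some(true),
        Some(Condition::Eq(col, val)) => Some(&row[position(table, col)?] == val),
    }
}

fn interpret(operation: Operation, tables: &mut Tables) -> Response {
    match operation {
        Operation::Select(name, selection, cond) => {
            let table = match tables.get(&name) {
                Some(t) => t,
                None => return Response::Error(format!("unknown table {}", name)),
            };
            // Resolve the selected column names to positions once, up front.
            let positions: Option<Vec<usize>> = match &selection {
                ColumnSelection::All => Some((0..table.columns.len()).collect()),
                ColumnSelection::Columns(cols) => {
                    cols.iter().map(|c| position(table, c)).collect()
                }
            };
            let positions = match positions {
                Some(p) => p,
                None => return Response::Error("unknown column".to_string()),
            };
            let mut selected = Vec::new();
            for row in table.rows.values() {
                match matches(table, row, &cond) {
                    Some(true) => {
                        selected.push(positions.iter().map(|&i| row[i].clone()).collect())
                    }
                    Some(false) => {}
                    None => return Response::Error("unknown column".to_string()),
                }
            }
            Response::Selected(selected)
        }
        Operation::Delete(name, cond) => {
            let table = match tables.get_mut(&name) {
                Some(t) => t,
                None => return Response::Error(format!("unknown table {}", name)),
            };
            // Collect matching ids first, then remove them.
            let mut doomed = Vec::new();
            for (id, row) in table.rows.iter() {
                match matches(table, row, &cond) {
                    Some(true) => doomed.push(*id),
                    Some(false) => {}
                    None => return Response::Error("unknown column".to_string()),
                }
            }
            for id in &doomed {
                table.rows.remove(id);
            }
            Response::Deleted(doomed.len())
        }
    }
}

// The three `persons` rows from the CLI example (salary omitted for brevity).
fn sample_tables() -> Tables {
    let mut rows = BTreeMap::new();
    for (id, name) in [(1u32, "Alice"), (2, "Bob"), (3, "Claire")] {
        rows.insert(id, vec![DbValue::DbUUID(id), DbValue::DbString(name.to_string())]);
    }
    let columns = vec!["id".to_string(), "name".to_string()];
    HashMap::from([("persons".to_string(), Table { columns, rows })])
}
```

Returning a `Vec<Row>` sidesteps the open question about `impl Iterator` in `Response`, but it copies every selected row, so it is only reasonable for small results.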

* TODO: Consider streaming the response to the client and not just dumping 10K rows at once.

## Server
1. **Client input parsing/validation** Turn the string input from the client into an Abstract Syntax Tree (AST) that represents the SQL query.
2. **Code gen** (Not necessary for MVP) Generate bytecode for a more low-level VM from the SQL AST.
3. **VM** (Not necessary for MVP) Implement a low-level VM that governs the in-memory db.
4. **Persistence** Serialize the in-memory db state to a file. What format should it have? At first perhaps just a dumb JSON serialization? You also have to implement the deserialization.
5. **Client response** Stream selected rows/status/error messages back to the client. What should the protocol look like? Take a look at [Tabular Data Stream](https://en.wikipedia.org/wiki/Tabular_Data_Stream) or just respond with JSON.
6. **Concurrency control** Consider what happens when two clients simultaneously wish to update the same row. There has to be some minimal mutual exclusion/locking.
7. **Security** Should we worry about secure communication over TCP?
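
For the concurrency control item, the simplest form of mutual exclusion is probably a single lock around the whole in-memory db. The sketch below assumes exactly that: a `HashMap<u32, f64>` of salaries stands in for the db, threads stand in for client connections, and `raise_salary` and `concurrent_raises` are made-up names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Stand-in for the whole in-memory db: id ~> salary.
type Salaries = HashMap<u32, f64>;

// Updates one row while holding the write lock, so two clients
// can never interleave their read-modify-write of the same row.
fn raise_salary(db: &RwLock<Salaries>, id: u32, amount: f64) -> bool {
    let mut guard = db.write().unwrap();
    match guard.get_mut(&id) {
        Some(salary) => {
            *salary += amount;
            true
        }
        None => false,
    }
}

// Simulates `clients` connections that all update row 1 at the same time
// and returns the final salary.
fn concurrent_raises(clients: usize) -> f64 {
    let db = Arc::new(RwLock::new(HashMap::from([(1u32, 20.0f64)])));
    let handles: Vec<_> = (0..clients)
        .map(|_| {
            let db = Arc::clone(&db);
            thread::spawn(move || {
                raise_salary(&db, 1, 1.0);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let guard = db.read().unwrap();
    let salary = guard[&1];
    drop(guard);
    salary
}
```

A single `RwLock` lets any number of SELECTs run in parallel but serializes every write against the whole db; per-table locks would be a natural next step if that turns out to be too coarse.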

## Client
1. **TCP connection** Opens a TCP connection to the server.
2. **REPL** Has to provide a basic REPL interface.
3. **Server response decoder** Has to properly format the data/status/error messages from the server. Does the client have to parse the queries? Or is that only the server's responsibility?
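
As a rough idea of the formatting part of the response decoder, the sketch below renders already decoded rows as an aligned text table for the REPL. It assumes the wire format has been decoded into strings beforehand, and `format_table` is a hypothetical helper, not part of any protocol decision.

```rust
// Renders a header line, a separator and one line per row.
// Column widths use byte lengths, so alignment is only exact for ASCII.
fn format_table(headers: &[&str], rows: &[Vec<String>]) -> String {
    // Each column is as wide as its longest cell (or its header).
    let mut widths: Vec<usize> = headers.iter().map(|h| h.len()).collect();
    for row in rows {
        for (width, cell) in widths.iter_mut().zip(row) {
            *width = (*width).max(cell.len());
        }
    }

    // Pads every cell to its column width and joins the cells with " | ".
    let render = |cells: Vec<&str>| -> String {
        let padded: Vec<String> = cells
            .iter()
            .zip(&widths)
            .map(|(cell, width)| format!("{:<w$}", cell, w = *width))
            .collect();
        padded.join(" | ").trim_end().to_string()
    };

    let mut lines = vec![render(headers.to_vec())];
    let dashes: Vec<String> = widths.iter().map(|w| "-".repeat(*w)).collect();
    lines.push(dashes.join("-+-"));
    for row in rows {
        lines.push(render(row.iter().map(|c| c.as_str()).collect()));
    }
    lines.join("\n")
}
```

For the `persons` example, calling it with headers `["id", "name"]` and the rows for Alice and Bob yields a header line, a dashed separator and one line per row.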