MiniSQL
Official Description
MiniSQL server
Create a simple SQL server that supports SELECT (including column selection), INSERT and DELETE. The SELECT and DELETE statements support a WHERE clause with a single column. It is also possible to create indexes (hashes).
The database works with persistent storage - it can be turned off and on without data loss. Create an interface for working with the database - either a CLI or a desktop application.
Resources
DB internals
- CMU intro to Database Systems
- Tutorial on sqlite clone in C: Let's build a Simple Database. How Does a Database work?
- sqlite opcodes
Parsing
- Parser Combinator libraries
TCP socket programming
- See simple http server implementation with tcp sockets
Scope
- Primarily in-memory db.
- Occasionally saves its state to disk.
- Almost non-existent concurrency control?
- Inspiration from SQLite, but not the server part. For the server part take a look at Postgres.
- Can create custom column indexes.
- What about Joins?
- What about composite queries (i.e. instead of a table name in a select expression we include another select expression)?
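The "custom column indexes" item, combined with the single-column WHERE clause from the description, could be sketched as a hash index that maps a column value to the positions of the rows holding it. The names `HashIndex`, `build` and `lookup` are hypothetical, and the column is assumed to hold strings only.

```rust
use std::collections::HashMap;

/// Hypothetical hash index over one string-valued column:
/// maps a column value to the positions of the rows that contain it.
struct HashIndex {
    buckets: HashMap<String, Vec<usize>>,
}

impl HashIndex {
    /// Builds the index by scanning the indexed column once.
    fn build(column: &[String]) -> Self {
        let mut buckets: HashMap<String, Vec<usize>> = HashMap::new();
        for (row_pos, value) in column.iter().enumerate() {
            buckets.entry(value.clone()).or_default().push(row_pos);
        }
        HashIndex { buckets }
    }

    /// Answers `WHERE column = value` without scanning the whole table.
    fn lookup(&self, value: &str) -> &[usize] {
        self.buckets.get(value).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

An index like this only helps equality predicates. It also has to be updated on every INSERT and DELETE, which the sketch leaves out.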
Building a Minimal Viable Product (MVP)
Possible usage:
- You start the db server with
./minisql server start --db path/to/db/my-db.db --port 1433
which will store the database as the file path/to/db/my-db.db and open a TCP server on port 1433.
- Then, possibly on a different machine, you run
./minisql client connect server_ip_address:1433
to start a client. This will open a REPL with which you can send queries/db management commands.
- TODO: We should also consider writing a Rust library that allows you to spin up a client that connects to the server. What would the interface look like?
// `minisql` is the hypothetical client crate; `ConnectOptions` is a placeholder name
use minisql::{ConnectOptions, DBConnection, DB};
let maybe_conn: Result<DBConnection> = DB::connect(ConnectOptions {
    port: 1433,
    db_name: "db-name",
    username: "bojack",
    password: "12345",
});
let conn: DBConnection = maybe_conn?;
// then we can execute queries
conn.perform("SELECT id, name, title, salary FROM employees"); // return type: ???
- Then, with the client CLI, we can request the creation of tables, insertion of data, etc.
> CREATE TABLE persons(id u32 PRIMARY KEY, name String, salary Float);
> INSERT 1, "Alice", 20.0 INTO persons;
> INSERT 2, "Bob", 30 INTO persons;
> INSERT 3, "Claire", 15 INTO persons;
- What should the SQL AST look like? For example, when the server parses
SELECT id, name FROM persons;
what will the parsing output look like? Consider something like
// Could also be called `SQLAbstractSyntaxTree`
enum Operations {
    Select(Vec<FieldName>, TableName),
    Update(...)
}
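To make the question concrete, here is a minimal sketch of what parsing `SELECT id, name FROM persons;` into such an enum could look like. It is hand-rolled with string splitting rather than a parser combinator library, covers only SELECT without WHERE, and expects upper-case keywords. `parse_select` and the string error type are assumptions made for illustration.

```rust
type FieldName = String;
type TableName = String;

/// Only the SELECT variant is sketched here.
#[derive(Debug, PartialEq)]
enum Operations {
    Select(Vec<FieldName>, TableName),
}

/// Parses `SELECT <col>, <col>, ... FROM <table>;` into `Operations::Select`.
/// Keywords must be upper case; WHERE clauses are not handled.
fn parse_select(input: &str) -> Result<Operations, String> {
    let query = input.trim().trim_end_matches(';').trim();
    let rest = query
        .strip_prefix("SELECT ")
        .ok_or_else(|| "expected SELECT".to_string())?;
    let (columns, table) = rest
        .split_once(" FROM ")
        .ok_or_else(|| "expected FROM".to_string())?;
    let fields: Vec<FieldName> = columns
        .split(',')
        .map(|c| c.trim().to_string())
        .collect();
    let table = table.trim();
    if table.is_empty() || fields.iter().any(|f| f.is_empty()) {
        return Err("empty column or table name".to_string());
    }
    Ok(Operations::Select(fields, table.to_string()))
}
```

For the query above, this yields a `Select` holding the fields `id` and `name` and the table `persons`. A real implementation would tokenize first, so that keywords are case-insensitive and string literals may contain commas.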
- We also have to write an interpreter for these operations. How will the db-state be represented in memory? For example how can we implement a table?
use std::collections::HashMap;

// Not exactly efficient, but how could we do better?
enum DbValue {
    DbString(String),
    DbNumber(f64),
    DbByte(u8),
    DbUUID(u32),
}
type ColumnName = String;
// The type below is the type of a table row
type Row = HashMap<ColumnName, DbValue>;
// Or you know... some appropriate dictionary type, e.g.
//   HashMap::from([("id", DbUUID(1)), ("name", DbString("Alice")), ("salary", DbNumber(20.0))]) : Row
// Possible optimization: have a mapping
//   column names ~> indexes
// so that we could represent rows as
//   type Row = Vec<DbValue>
// How to represent a table? Either
//   table : HashMap<Id, Row>
// or
//   table : Vec<(Id, Row)>
// Suppose the row corresponds to 'INSERT 1, "Alice", 20.0 INTO persons;'
//   Row ~> Vec<DbValue>
//   e.g. Row ~> vec![DbUUID(1), DbString("Alice"), DbNumber(20.0)]
// so the whole table becomes
//   Vec<Vec<DbValue>>
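The "column names ~> indexes" optimization can be sketched as follows: rows are plain vectors, and a side map translates column names to positions. The `Table` struct and its methods are hypothetical, and `DbValue` is repeated here (minus `DbByte`) only so that the sketch compiles on its own.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum DbValue {
    DbString(String),
    DbNumber(f64),
    DbUUID(u32),
}

/// Hypothetical table layout: each row is a `Vec<DbValue>`, and
/// `column_index` maps a column name to its position inside a row.
struct Table {
    column_index: HashMap<String, usize>,
    rows: Vec<Vec<DbValue>>,
}

impl Table {
    fn new(columns: &[&str]) -> Self {
        let column_index = columns
            .iter()
            .enumerate()
            .map(|(pos, name)| (name.to_string(), pos))
            .collect();
        Table { column_index, rows: Vec::new() }
    }

    /// Appends a row; rejects rows whose width does not match the schema.
    fn insert(&mut self, row: Vec<DbValue>) -> Result<(), String> {
        if row.len() != self.column_index.len() {
            return Err("wrong number of values".to_string());
        }
        self.rows.push(row);
        Ok(())
    }

    /// Column projection, i.e. `SELECT <columns> FROM table`.
    fn select(&self, columns: &[&str]) -> Result<Vec<Vec<DbValue>>, String> {
        let positions: Vec<usize> = columns
            .iter()
            .map(|c| {
                self.column_index
                    .get(*c)
                    .copied()
                    .ok_or_else(|| format!("unknown column {c}"))
            })
            .collect::<Result<_, _>>()?;
        Ok(self
            .rows
            .iter()
            .map(|row| positions.iter().map(|&p| row[p].clone()).collect())
            .collect())
    }
}
```

Compared with `HashMap<ColumnName, DbValue>` per row, this stores each column name once per table instead of once per row. The cost is that column types are not checked on insert in this sketch.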
- TODO: Consider streaming the response to the client and not just dumping 10K rows at once.
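One way to approach the streaming TODO is to write rows to the connection one at a time and flush in batches, instead of building the whole response in memory. The sketch below assumes a tab-separated, one-row-per-line format, which is a placeholder rather than a decided protocol. `stream_rows` and `batch_size` are hypothetical names.

```rust
use std::io::{self, Write};

/// Writes rows to `out` one line at a time, flushing after every
/// `batch_size` rows so the client can start reading early.
/// Returns the number of rows sent.
fn stream_rows<W: Write>(
    out: &mut W,
    rows: impl Iterator<Item = Vec<String>>,
    batch_size: usize,
) -> io::Result<usize> {
    let mut sent = 0;
    for row in rows {
        // Assumed wire format: tab-separated fields, newline-terminated row.
        writeln!(out, "{}", row.join("\t"))?;
        sent += 1;
        if batch_size > 0 && sent % batch_size == 0 {
            out.flush()?;
        }
    }
    // Flush whatever is left from the last partial batch.
    out.flush()?;
    Ok(sent)
}
```

On the server, `out` would typically be a `TcpStream` wrapped in a `std::io::BufWriter`. Because `rows` is an iterator, the query result never has to be fully materialized.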
Server
- Client input parsing/validation: turn string input from the client into an Abstract Syntax Tree (AST) that represents the SQL query.
- Code gen (not necessary for MVP): from the SQL AST, generate bytecode for a more low-level VM.
- VM (not necessary for MVP): implement a low-level VM that governs the in-memory db.
- Persistence: serialize the in-memory db state to a file. What format should it have? At first, perhaps just a dumb JSON serialization? You also have to implement the deserialization.
- Client response: stream selected rows/status/error messages back to the client. What should the protocol look like? Take a look at Tabular Data Stream or just respond with JSON.
- Concurrency control: consider what happens when two clients simultaneously wish to update the same row. There has to be some minimal mutual exclusion/locking.
- Security: should we worry about secure communication over TCP?
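For the concurrency control item, the simplest form of "minimal mutual exclusion" is probably one coarse lock around the whole table. The sketch below assumes a toy table of row id to salary and simulates clients with threads. `SharedTable`, `raise_salary` and `run_clients` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared table: row id -> salary, guarded by one coarse lock.
type SharedTable = Arc<Mutex<HashMap<u32, f64>>>;

/// Adds `delta` to the salary of row `id` while holding the lock, so
/// concurrent read-modify-write cycles on the same row cannot interleave.
/// Returns false if the row does not exist.
fn raise_salary(table: &SharedTable, id: u32, delta: f64) -> bool {
    let mut rows = table.lock().unwrap();
    match rows.get_mut(&id) {
        Some(salary) => {
            *salary += delta;
            true
        }
        None => false,
    }
}

/// Simulates `clients` connections, each applying `updates` raises to row 1.
fn run_clients(table: &SharedTable, clients: usize, updates: usize) {
    let handles: Vec<_> = (0..clients)
        .map(|_| {
            let table = Arc::clone(table);
            thread::spawn(move || {
                for _ in 0..updates {
                    raise_salary(&table, 1, 1.0);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

A single `Mutex` serializes all access, readers included. Swapping it for `std::sync::RwLock` would let SELECTs run in parallel, and per-table or per-row locks would be the next refinement.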
Client
- Opens a TCP connection to the server.
- REPL: has to provide a basic REPL interface.
- Server response decoder: has to properly format the data/status/error messages from the server. Does the client have to parse the queries? Or is that only the server's responsibility?
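The three client responsibilities above could be sketched like this, assuming a line-based exchange in which the client sends one query per line and the server answers with exactly one line. That framing is an assumption, since the notes leave the protocol open. `Client`, `send_query` and `repl` are hypothetical names, and the client forwards queries unparsed.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::TcpStream;

/// Hypothetical client handle: one socket, split into a buffered
/// reading half and a writing half.
struct Client {
    reader: BufReader<TcpStream>,
    writer: TcpStream,
}

impl Client {
    /// Opens the TCP connection, e.g. `Client::connect("127.0.0.1:1433")`.
    fn connect(addr: &str) -> io::Result<Self> {
        let writer = TcpStream::connect(addr)?;
        let reader = BufReader::new(writer.try_clone()?);
        Ok(Client { reader, writer })
    }

    /// Sends one newline-terminated query and reads one response line.
    fn send_query(&mut self, query: &str) -> io::Result<String> {
        self.writer.write_all(query.as_bytes())?;
        self.writer.write_all(b"\n")?;
        let mut response = String::new();
        self.reader.read_line(&mut response)?;
        Ok(response.trim_end().to_string())
    }
}

/// Basic REPL: reads queries from `input`, skips blank lines, and
/// prints each server response to `output`.
fn repl(client: &mut Client, input: impl BufRead, mut output: impl Write) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        let response = client.send_query(&line)?;
        writeln!(output, "{response}")?;
    }
    Ok(())
}
```

In the real CLI, `input` would be `std::io::stdin().lock()` and `output` would be `std::io::stdout()`. Taking them as parameters keeps the loop testable against in-memory buffers.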