MiniSQL

Official Description

MiniSQL server

Create a simple SQL server that supports SELECT (including column selection), INSERT and DELETE. The SELECT and DELETE statements support a WHERE clause with a single column. It is also possible to create indexes (hashes).

The database uses persistent storage: it can be shut down and restarted without data loss. Create an interface for working with the database, either a CLI or a desktop application.
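The description asks for hash indexes. Here is a minimal sketch of one. It assumes rows are stored as `Vec<String>` in insertion order, so a row's position in the table identifies it. The index maps each value of one column to the positions of the rows that hold it, which turns `WHERE column = value` into a single hash lookup instead of a full scan. `HashIndex`, `build`, `insert` and `lookup` are hypothetical names, not part of any existing code.

```rust
use std::collections::HashMap;

/// Hypothetical cell type; a real engine would use a value enum,
/// which would need `Eq + Hash` to be usable as a map key.
type Value = String;

/// Hash index on a single column: value -> positions of the rows holding it.
pub struct HashIndex {
    column: usize,
    map: HashMap<Value, Vec<usize>>,
}

impl HashIndex {
    /// Builds the index with one pass over all rows.
    pub fn build(rows: &[Vec<Value>], column: usize) -> Self {
        let mut map: HashMap<Value, Vec<usize>> = HashMap::new();
        for (pos, row) in rows.iter().enumerate() {
            map.entry(row[column].clone()).or_default().push(pos);
        }
        HashIndex { column, map }
    }

    /// Must be called on every INSERT so the index stays in sync with the table.
    pub fn insert(&mut self, row: &[Value], pos: usize) {
        self.map.entry(row[self.column].clone()).or_default().push(pos);
    }

    /// Positions of the rows whose indexed column equals `value` (empty if none).
    pub fn lookup(&self, value: &str) -> &[usize] {
        self.map.get(value).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

DELETE shifts row positions in a `Vec`, so the simplest correct approach is to rebuild the index after a delete, or to address rows by a stable id instead of a position.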

Resources

DB internals

CMU Intro to Database Systems
Tutorial on writing an SQLite clone in C: Let's Build a Simple Database (How Does a Database Work?)
SQLite opcodes

Parsing

Parser Combinator libraries

TCP socket programming

See a simple HTTP server implementation using TCP sockets

Scope

Primarily in-memory db.
Occasionally saves its state to disk.
Almost non-existent concurrency control?
Inspired by SQLite, except for the server part: SQLite is an embedded library with no server process, so for the server part take a look at Postgres.
Can create custom column indexes.
What about joins?
What about composite queries (i.e. subqueries, where another SELECT expression appears in place of a table name)?
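On the concurrency question, one minimal option is a single readers-writer lock (`std::sync::RwLock`) around the whole in-memory database. This assumes the server serves each client on its own thread. Many readers can hold the lock at once, while a writer gets exclusive access. `Db`, `SharedDb`, `insert_row`, `count_rows` and `run_clients` are hypothetical names, and cells are kept as strings for brevity.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical database state: table name -> rows (cells as strings for brevity).
pub type Db = HashMap<String, Vec<Vec<String>>>;

/// One coarse lock for the whole database, shared between client threads.
pub type SharedDb = Arc<RwLock<Db>>;

/// INSERT: takes the exclusive write lock, so no other client reads or writes meanwhile.
pub fn insert_row(db: &SharedDb, table: &str, row: Vec<String>) {
    // `unwrap` panics if another thread panicked while holding the lock (poisoning).
    let mut guard = db.write().unwrap();
    guard.entry(table.to_string()).or_default().push(row);
} // `guard` is dropped here, which releases the lock

/// A read-only query: any number of threads may hold the read lock together.
pub fn count_rows(db: &SharedDb, table: &str) -> usize {
    let guard = db.read().unwrap();
    guard.get(table).map_or(0, |rows| rows.len())
}

/// Simulates several client connections inserting at the same time.
pub fn run_clients(db: &SharedDb, clients: usize, rows_each: usize) {
    let handles: Vec<_> = (0..clients)
        .map(|c| {
            let db = Arc::clone(db);
            thread::spawn(move || {
                for i in 0..rows_each {
                    insert_row(&db, "persons", vec![format!("{}-{}", c, i)]);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```

The lock only protects each call as a whole. An UPDATE that reads a value under one lock acquisition and writes it back under another can still lose updates, so a read-modify-write has to happen inside a single `write()` guard.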

Building a Minimal Viable Product (MVP)

Possible usage:

You start the db server with ./minisql server start --db path/to/db/my-db.db --port 1433, which stores the database in the file path/to/db/my-db.db and opens a TCP server on port 1433.
Then, possibly on a different machine, you run ./minisql client connect server_ip_address:1433 to start a client. This opens a REPL through which you can send queries and db management commands.
TODO: We should also consider writing a Rust library that lets you spin up a client that connects to the server. What would its interface look like?

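  // Hypothetical client library: the crate and every name in it are assumptions, nothing exists yet
  use minisql::{ConnectOptions, DB, DBConnection};

  // Rust has no anonymous object literals, so the parameters go in a struct;
  // `?` propagates a connection error and leaves a plain DBConnection
  let conn: DBConnection = DB::connect(ConnectOptions {
      port: 1433,
      db_name: "db-name",
      username: "bojack",
      password: "12345",
  })?;

  // then we can execute queries; the result type is still an open question (???)
  let result = conn.perform("SELECT id, name, title, salary FROM employees")?;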

Then, using the client CLI, we can create tables, insert data, etc.:

> CREATE TABLE persons(id u32 PRIMARY KEY, name String, salary Float);

> INSERT 1, "Alice", 20.0 INTO persons;
> INSERT 2, "Bob", 30 INTO persons;
> INSERT 3, "Claire", 15 INTO persons;
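The INSERT statements above use an ad-hoc syntax: the values come first, then INTO <table>. Here is a minimal hand-written sketch of parsing that form. It assumes uppercase keywords, double-quoted strings with no embedded quotes or commas, and plain numeric literals. `Literal`, `parse_insert` and `parse_literal` are hypothetical names.

```rust
/// Hypothetical literal values accepted by the ad-hoc INSERT syntax.
#[derive(Debug, PartialEq)]
pub enum Literal {
    Int(i64),
    Float(f64),
    Str(String),
}

/// Parses `INSERT v1, v2, ... INTO table;` into the values and the table name.
pub fn parse_insert(input: &str) -> Result<(Vec<Literal>, String), String> {
    let trimmed = input.trim();
    let stmt = trimmed.strip_suffix(';').unwrap_or(trimmed);
    let rest = stmt.strip_prefix("INSERT ").ok_or("expected INSERT")?;
    // Split at the last " INTO " so the table name is everything after it.
    let (values, table) = rest.rsplit_once(" INTO ").ok_or("expected INTO <table>")?;
    let table = table.trim();
    if table.is_empty() || !table.chars().all(|c| c.is_alphanumeric() || c == '_') {
        return Err(format!("invalid table name: {:?}", table));
    }
    // Assumes commas only ever separate values (no commas inside strings).
    let literals = values
        .split(',')
        .map(|v| parse_literal(v.trim()))
        .collect::<Result<Vec<_>, _>>()?;
    Ok((literals, table.to_string()))
}

/// Tries a quoted string first, then an integer, then a float.
fn parse_literal(v: &str) -> Result<Literal, String> {
    if let Some(s) = v.strip_prefix('"').and_then(|s| s.strip_suffix('"')) {
        return Ok(Literal::Str(s.to_string()));
    }
    if let Ok(i) = v.parse::<i64>() {
        return Ok(Literal::Int(i));
    }
    v.parse::<f64>()
        .map(Literal::Float)
        .map_err(|_| format!("cannot parse literal: {:?}", v))
}
```

Bob's 30 comes back as `Int(30)`. The interpreter would still have to check each value against the column types declared in CREATE TABLE and coerce it to the Float type of salary.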

What should the SQL AST look like? For example, when the server parses SELECT id, name FROM persons;, what should the parser output be? Consider something like:

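type FieldName = String;
type TableName = String;

// Could also be called `SQLAbstractSyntaxTree`
enum Operations {
  // SELECT id, name FROM persons;  ~>  Select(["id", "name"], "persons")
  Select(Vec<FieldName>, TableName),
  Update(...),
}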
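To make the AST question concrete, here is a minimal sketch for the two statements in scope that take a WHERE clause: SELECT (with a column list or `*`) and DELETE, each with an optional single-column `WHERE column = value`. It splits on keywords rather than tokenizing, so it would misparse a string literal containing " WHERE " or " FROM ". A real parser would tokenize first or use one of the parser combinator libraries listed under Resources. `Statement`, `Columns`, `Condition` and `parse` are hypothetical names.

```rust
/// Column list of a SELECT: `*` or explicit names.
#[derive(Debug, PartialEq)]
pub enum Columns {
    All,
    Named(Vec<String>),
}

/// `WHERE column = value`; the spec only allows a single column.
#[derive(Debug, PartialEq)]
pub struct Condition {
    pub column: String,
    pub value: String, // kept as raw text, e.g. `2` or `"Alice"`
}

/// Hypothetical AST for SELECT and DELETE (INSERT/CREATE omitted for brevity).
#[derive(Debug, PartialEq)]
pub enum Statement {
    Select { columns: Columns, table: String, filter: Option<Condition> },
    Delete { table: String, filter: Option<Condition> },
}

/// Parses `SELECT a, b FROM t [WHERE c = v];` or `DELETE FROM t [WHERE c = v];`.
pub fn parse(input: &str) -> Result<Statement, String> {
    let stmt = input.trim().trim_end_matches(';').trim();
    // Split off the optional WHERE clause first; both statements share it.
    let (head, filter) = match stmt.split_once(" WHERE ") {
        Some((head, cond)) => {
            let (column, value) = cond.split_once('=').ok_or("expected `column = value`")?;
            let filter = Condition {
                column: column.trim().to_string(),
                value: value.trim().to_string(),
            };
            (head, Some(filter))
        }
        None => (stmt, None),
    };
    if let Some(rest) = head.strip_prefix("SELECT ") {
        let (cols, table) = rest.split_once(" FROM ").ok_or("expected FROM")?;
        let columns = match cols.trim() {
            "*" => Columns::All,
            list => Columns::Named(list.split(',').map(|c| c.trim().to_string()).collect()),
        };
        Ok(Statement::Select { columns, table: table.trim().to_string(), filter })
    } else if let Some(table) = head.strip_prefix("DELETE FROM ") {
        Ok(Statement::Delete { table: table.trim().to_string(), filter })
    } else {
        Err(format!("unsupported statement: {}", stmt))
    }
}
```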

We also have to write an interpreter for these operations. How will the db state be represented in memory? For example, how can we implement a table?

// Not exactly efficient, but how could we do better?
enum DbValue {
  DbString(String),
  DbNumber(f64),
  DbByte(u8),
  DbUUID(u128), // a UUID is 128 bits wide; u32 is too small to hold one
}

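use std::collections::HashMap;

type ColumnName = String;

// The type of a single table row: column name -> value
type Row = HashMap<ColumnName, DbValue>;

// Or, you know... some other appropriate dictionary type
let alice: Row = HashMap::from([
    ("id".to_string(), DbValue::DbUUID(1)),
    ("name".to_string(), DbValue::DbString("Alice".to_string())),
    ("salary".to_string(), DbValue::DbNumber(20.0)),
]);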

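// possible optimization: keep a mapping
// column names ~> indexes (positions within a row)
// so that we could represent rows as
type Row = Vec<DbValue>;

// How to represent a table?
// Option 1: rows keyed by their primary key
type Table = HashMap<Id, Row>;
// Option 2: a plain list of (key, row) pairs
type Table = Vec<(Id, Row)>;

// suppose the row corresponds to 'INSERT 1, "Alice", 20.0 INTO persons;'
// with Row = Vec<DbValue> it becomes
let alice: Row = vec![DbValue::DbUUID(1), DbValue::DbString("Alice".to_string()), DbValue::DbNumber(20.0)];

// Option 3: a table is just a list of rows
type Table = Vec<Vec<DbValue>>;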
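The column-name-to-index optimization can be sketched as follows. Column names are stored once per table, and each row is a plain `Vec<DbValue>`. The sketch assumes a simplified value enum (with `Id(u32)` standing in for the UUID variant) and answers WHERE with a full table scan. `Table`, `insert`, `select` and `delete` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the DbValue enum in the notes.
#[derive(Debug, Clone, PartialEq)]
pub enum DbValue {
    Id(u32),
    Str(String),
    Number(f64),
}

/// Row-oriented table: rows are bare value vectors, names live in `position`.
pub struct Table {
    columns: Vec<String>,
    position: HashMap<String, usize>,
    rows: Vec<Vec<DbValue>>,
}

impl Table {
    pub fn new(columns: &[&str]) -> Self {
        let columns: Vec<String> = columns.iter().map(|c| c.to_string()).collect();
        let position: HashMap<String, usize> =
            columns.iter().enumerate().map(|(i, c)| (c.clone(), i)).collect();
        Table { columns, position, rows: Vec::new() }
    }

    /// Resolves a column name to its position within a row.
    fn col(&self, name: &str) -> Result<usize, String> {
        self.position
            .get(name)
            .copied()
            .ok_or_else(|| format!("no such column: {}", name))
    }

    pub fn insert(&mut self, row: Vec<DbValue>) -> Result<(), String> {
        if row.len() != self.columns.len() {
            return Err(format!("expected {} values, got {}", self.columns.len(), row.len()));
        }
        self.rows.push(row);
        Ok(())
    }

    /// SELECT <names> FROM t [WHERE column = value], answered by a full scan.
    pub fn select(
        &self,
        names: &[&str],
        filter: Option<(&str, &DbValue)>,
    ) -> Result<Vec<Vec<DbValue>>, String> {
        let picks = names.iter().map(|n| self.col(n)).collect::<Result<Vec<_>, _>>()?;
        let cond = match filter {
            Some((name, value)) => Some((self.col(name)?, value)),
            None => None,
        };
        Ok(self
            .rows
            .iter()
            .filter(|row| cond.map_or(true, |(i, v)| &row[i] == v))
            .map(|row| picks.iter().map(|&i| row[i].clone()).collect::<Vec<_>>())
            .collect())
    }

    /// DELETE FROM t WHERE column = value; returns the number of removed rows.
    pub fn delete(&mut self, name: &str, value: &DbValue) -> Result<usize, String> {
        let i = self.col(name)?;
        let before = self.rows.len();
        self.rows.retain(|row| &row[i] != value);
        Ok(before - self.rows.len())
    }
}
```

Because this enum contains an `f64`, it can derive `PartialEq` but not `Eq` or `Hash`. A hash index over it would therefore need a separate key type, for example one that stores floats via `f64::to_bits`.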

TODO: Consider streaming the response to the client instead of dumping 10K rows at once.
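One way to address this TODO is to write each row to the socket as soon as it is produced and flush in batches, so the whole result never sits in memory. The sketch assumes a hypothetical line-based wire format: one tab-separated row per line, then a final `END <count>` line. No protocol has been chosen yet. `stream_rows` is a hypothetical helper that accepts any `std::io::Write`. Because `std::net::TcpStream` implements `Write`, the same code should work against a real connection.

```rust
use std::io::{self, BufWriter, Write};

/// Streams rows as they are produced instead of building the full result first.
/// Assumes cell values contain no tabs or newlines.
pub fn stream_rows<W, I>(out: W, rows: I, batch: usize) -> io::Result<usize>
where
    W: Write,
    I: IntoIterator<Item = Vec<String>>,
{
    // Guard against a zero batch size, which would make `%` panic.
    let batch = batch.max(1);
    let mut out = BufWriter::new(out);
    let mut sent = 0;
    for row in rows {
        writeln!(out, "{}", row.join("\t"))?;
        sent += 1;
        // Force a batch out every `batch` rows so the client can start
        // rendering while the server is still scanning the table.
        if sent % batch == 0 {
            out.flush()?;
        }
    }
    // Terminator line tells the client the result set is complete.
    writeln!(out, "END {}", sent)?;
    out.flush()?;
    Ok(sent)
}
```

Passing a lazy iterator (for example, a scan over the table that maps each row to strings) keeps server memory flat regardless of the result size.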

Server

Client input parsing/validation: parse the string input from the client into an Abstract Syntax Tree (AST) that represents the SQL query.
Code gen (not necessary for MVP): generate bytecode for a lower-level VM from the SQL AST.
VM (not necessary for MVP): implement a low-level VM that governs the in-memory db.
Persistence: serialize the in-memory db state to a file. What format should it have? At first, perhaps just a dumb JSON serialization? The deserialization has to be implemented as well.
Client response: stream selected rows/status/error messages back to the client. What should the protocol look like? Take a look at Tabular Data Stream (TDS), or just respond with JSON.
Concurrency control: consider what happens when two clients simultaneously want to update the same row. There has to be some minimal mutual exclusion/locking.
Security: should we worry about secure communication over TCP?
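For the Persistence item, the notes suggest starting with a dumb JSON dump. In Rust, JSON usually means an external crate such as serde_json, so this sketch swaps in an equally simple tab-separated text format that needs only the standard library. It assumes cells contain no tabs or newlines. `Table`, `save_table` and `load_table` are hypothetical names. The snapshot is written to a temporary file and then renamed over the old one. On POSIX systems that rename replaces the file atomically, so a crash mid-save leaves the previous snapshot intact.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::Path;

/// Hypothetical table snapshot: column names plus rows of text cells.
#[derive(Debug, PartialEq)]
pub struct Table {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Serializes the table as tab-separated lines, header line first.
pub fn save_table(path: &Path, table: &Table) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut out = BufWriter::new(File::create(&tmp)?);
        writeln!(out, "{}", table.columns.join("\t"))?;
        for row in &table.rows {
            writeln!(out, "{}", row.join("\t"))?;
        }
        out.flush()?;
        // Ask the OS to push the bytes to disk before the rename.
        out.get_ref().sync_all()?;
    }
    // Replace the old snapshot only once the new one is complete.
    fs::rename(&tmp, path)
}

/// Deserializes a table written by `save_table`.
pub fn load_table(path: &Path) -> io::Result<Table> {
    let mut lines = BufReader::new(File::open(path)?).lines();
    let header = match lines.next() {
        Some(line) => line?,
        None => return Err(io::Error::new(io::ErrorKind::InvalidData, "empty file")),
    };
    let columns: Vec<String> = header.split('\t').map(String::from).collect();
    let mut rows: Vec<Vec<String>> = Vec::new();
    for line in lines {
        rows.push(line?.split('\t').map(String::from).collect());
    }
    Ok(Table { columns, rows })
}
```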

Client

Opens a TCP connection to the server.
REPL: has to provide a basic REPL interface.
Server response decoder: has to properly format the data/status/error messages from the server. Does the client have to parse the queries, or is that only the server's responsibility?
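Here is a minimal sketch of the response decoder. It assumes a hypothetical line-based format, since no protocol has been chosen yet: zero or more tab-separated rows followed by `END <count>`, or a single `ERR <message>` line. `Response`, `read_response` and `render` are hypothetical names. The decoder is generic over `std::io::BufRead`. With a real connection it could read from `BufReader::new(TcpStream::connect("server_ip_address:1433")?)`.

```rust
use std::io::{self, BufRead};

/// What the client receives for one query.
#[derive(Debug, PartialEq)]
pub enum Response {
    Rows(Vec<Vec<String>>),
    Error(String),
}

/// Reads exactly one response from the server.
pub fn read_response<R: BufRead>(reader: &mut R) -> io::Result<Response> {
    let mut rows: Vec<Vec<String>> = Vec::new();
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "server closed the connection"));
        }
        let text = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
        if let Some(msg) = text.strip_prefix("ERR ") {
            return Ok(Response::Error(msg.to_string()));
        }
        // Limitation: a data row that itself starts with "END " would be misread.
        if text.starts_with("END ") {
            return Ok(Response::Rows(rows));
        }
        rows.push(text.split('\t').map(String::from).collect());
    }
}

/// Formats rows as a left-aligned text table for the REPL.
pub fn render(rows: &[Vec<String>]) -> String {
    let cols = rows.iter().map(Vec::len).max().unwrap_or(0);
    // Width of each column = its longest cell, counted in chars.
    let widths: Vec<usize> = (0..cols)
        .map(|c| rows.iter().filter_map(|r| r.get(c)).map(|s| s.chars().count()).max().unwrap_or(0))
        .collect();
    let mut out = String::new();
    for row in rows {
        let cells: Vec<String> = row
            .iter()
            .zip(&widths)
            .map(|(s, &w)| format!("{:<w$}", s, w = w))
            .collect();
        out.push_str(cells.join(" | ").trim_end());
        out.push('\n');
    }
    out
}
```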
