Partial Syntax Notes
This document records the current candidate design for partial concrete syntax in the Lisp parser experiment.
The goal is not a generic parser framework. The goal is to make the current toy Lisp syntax rich enough to represent recovered malformed syntax while preserving the ability to distinguish valid trees from partial trees.
Core Constraints
- ValidConcreteSyntax should be a subtype of PartialConcreteSyntax.
- If a PartialConcreteSyntax contains no errors, it should be safe to coerce it to ValidConcreteSyntax without rebuilding the tree.
- Concrete syntax may preserve syntactic choices that are semantically irrelevant.
- Error payloads should remain structured and span-aware.
The never parameter is the main trick: when Error = never, the error-* branches become unconstructable and error?: never fields can only be left unset.
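A minimal sketch of the never trick on a cut-down leaf type. Span, Leaf, ValidLeaf, and PartialLeaf are hypothetical stand-ins invented for this example, not part of the candidate design; the point is only to show how the type checker treats the two instantiations.

```typescript
type Span = { start: number; end: number }; // hypothetical stand-in for CodePointSpan

type Leaf<Error> =
  | { tag: "number"; value: number; span: Span }
  | { tag: "error-number"; error: Error; span: Span };

type ValidLeaf = Leaf<never>;
type PartialLeaf = Leaf<string[]>; // string[] stands in for a real error payload

const span: Span = { start: 0, end: 3 };
const ok: ValidLeaf = { tag: "number", value: 123, span };

// Subtyping: a valid leaf is accepted wherever a partial leaf is expected.
const widened: PartialLeaf = ok;

// The error branch can only be built on the partial side.
const broken: PartialLeaf = { tag: "error-number", error: ["trailing letters"], span };

// @ts-expect-error: `error` would need a value of type never, and none exists.
const rejected: ValidLeaf = { tag: "error-number", error: ["trailing letters"], span };
```

The last declaration is there only to show the compile-time rejection; removing the @ts-expect-error comment should make the type checker report it.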
Current Candidate Types
export type ConcreteSyntaxResult =
  | { tag: "valid"; value: ValidConcreteSyntax }
  | { tag: "invalid"; value: PartialConcreteSyntax };

export type ValidConcreteSyntax =
  Program<{ span: CodePointSpan }, never>;

export type PartialConcreteSyntax =
  Program<{ span: CodePointSpan }, ConcreteError>;

export type ConcreteError = ConcreteErrorNode[]; // Convention: non-empty.

export namespace ConcreteError {
  export function single(node: ConcreteErrorNode): ConcreteError {
    return [node];
  }
}

export type ConcreteErrorNode = {
  span: CodePointSpan;
  error: ParseError;
  panickedOver?: CodePointSpan;
};

export type DelimiterToken =
  | { tag: "open-paren"; span: CodePointSpan }
  | { tag: "close-paren"; span: CodePointSpan }
  | { tag: "open-bracket"; span: CodePointSpan }
  | { tag: "close-bracket"; span: CodePointSpan };

export type Program<Info, Error> = ({
  expressions: Expr<Info, Error>[];
  error?: Error;
} & Info);

export type Expr<Info, Error> =
  | Literal<Info, Error>
  | List<Info, Error>
  | ({ tag: "error-expression"; error: Error } & Info);

export type List<Info, Error> = ({
  tag: "list";
  open: DelimiterToken;
  items: ListItem<Info, Error>[];
  close?: DelimiterToken;
  error?: Error;
} & Info);

export type ListItem<Info, Error> =
  | Expr<Info, Error>
  | ({ tag: "error-list-separator"; error: Error } & Info);

export type Literal<Info, Error> =
  | ({ tag: "number"; value: number } & Info)
  | ({ tag: "error-number"; error: Error } & Info)
  | ({ tag: "identifier"; value: Identifier } & Info)
  | ({ tag: "error-identifier"; error: Error } & Info);

export type Identifier = string;
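One way the "coerce without rebuilding" constraint could look in practice is sketched below. The types are condensed copies of the candidate types so the block stands alone; the shapes of CodePointSpan and ParseError are assumptions, and itemHasError and toResult are hypothetical helper names, not existing code.

```typescript
// Stand-ins for types defined elsewhere in the experiment (shapes assumed here).
type CodePointSpan = { start: number; end: number };
type ParseError = { message: string };

type ConcreteErrorNode = { span: CodePointSpan; error: ParseError; panickedOver?: CodePointSpan };
type ConcreteError = ConcreteErrorNode[];
type DelimiterToken = {
  tag: "open-paren" | "close-paren" | "open-bracket" | "close-bracket";
  span: CodePointSpan;
};

// Same shapes as the candidate types, condensed.
type Info = { span: CodePointSpan };
type Program<I, E> = { expressions: Expr<I, E>[]; error?: E } & I;
type Expr<I, E> = Literal<I, E> | List<I, E> | ({ tag: "error-expression"; error: E } & I);
type List<I, E> = {
  tag: "list";
  open: DelimiterToken;
  items: ListItem<I, E>[];
  close?: DelimiterToken;
  error?: E;
} & I;
type ListItem<I, E> = Expr<I, E> | ({ tag: "error-list-separator"; error: E } & I);
type Literal<I, E> =
  | ({ tag: "number"; value: number } & I)
  | ({ tag: "error-number"; error: E } & I)
  | ({ tag: "identifier"; value: string } & I)
  | ({ tag: "error-identifier"; error: E } & I);

type ValidConcreteSyntax = Program<Info, never>;
type PartialConcreteSyntax = Program<Info, ConcreteError>;
type ConcreteSyntaxResult =
  | { tag: "valid"; value: ValidConcreteSyntax }
  | { tag: "invalid"; value: PartialConcreteSyntax };

// Hypothetical helper: true if this item or anything below it carries an error.
function itemHasError(item: ListItem<Info, ConcreteError>): boolean {
  switch (item.tag) {
    case "number":
    case "identifier":
      return false;
    case "list":
      return item.error !== undefined || item.items.some(itemHasError);
    default:
      return true; // every remaining tag is an error-* variant
  }
}

// Hypothetical helper: walk the tree once, then reuse the same object graph.
function toResult(program: PartialConcreteSyntax): ConcreteSyntaxResult {
  const invalid = program.error !== undefined || program.expressions.some(itemHasError);
  if (invalid) return { tag: "invalid", value: program };
  // The cast is a downcast to the subtype; nothing is copied or rebuilt.
  return { tag: "valid", value: program as ValidConcreteSyntax };
}
```

The cast is only as trustworthy as the walk that precedes it, so any new error-carrying field added to the types would also need to be checked there.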
Error Ownership
Errors are owned by the smallest useful syntax node.
- error-expression: syntax that cannot reasonably be interpreted as any expression node.
- error-number: malformed numeric literal, such as 123fasd.
- error-identifier: malformed identifier, if the language later has such cases.
- error-list-separator: malformed relationship between neighboring list items.
- list.error: structural error about the whole list, such as missing or mismatched close delimiter.
- program.error: top-level recovery errors that do not belong to one expression.
ConcreteErrorNode.span is the primary diagnostic focus. panickedOver is recovery/debug metadata showing what source region was skipped while recovering.
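To make the difference between span and panickedOver concrete, here is a small rendering sketch. It assumes spans are half-open code-point ranges and that ParseError has a message field; both are assumptions, and underline is a hypothetical helper, not part of the design.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed half-open [start, end)
type ParseError = { message: string };               // assumed shape
type ConcreteErrorNode = {
  span: CodePointSpan;
  error: ParseError;
  panickedOver?: CodePointSpan;
};

// Draws one marker line under `source`: "^" for the diagnostic focus,
// "~" for the region the parser skipped while recovering.
function underline(source: string, node: ConcreteErrorNode): string {
  const width = Array.from(source).length; // count code points, not UTF-16 units
  const marks: string[] = new Array<string>(width).fill(" ");
  const skipped = node.panickedOver;
  if (skipped) {
    for (let i = skipped.start; i < skipped.end; i++) marks[i] = "~";
  }
  // The focus is drawn last so it wins where the two ranges overlap.
  for (let i = node.span.start; i < node.span.end; i++) marks[i] = "^";
  return marks.join("").trimEnd();
}

const source = "(foo @@@ 1)";
const node: ConcreteErrorNode = {
  span: { start: 5, end: 6 },         // where the diagnostic points
  error: { message: "unexpected character" },
  panickedOver: { start: 5, end: 8 }, // what recovery skipped
};
console.log(source);
console.log(underline(source, node));
```

Which part of the garbage the span should cover (first character or whole run) is itself a design question; the values above are only one possibility.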
Delimiters
Delimiter tokens are stored explicitly because this is concrete syntax.
Even if round and square lists are semantically equivalent later, the concrete tree should preserve whether the source used:
(a b c)
[a, b, c]
This is useful for UI, formatting, recovery diagnostics, and syntax experiments. A later semantic AST can erase this distinction.
Lisp Syntax Under Test
The experiment now has two list syntaxes.
Round lists have no separators:
(a b c d)
Square lists require commas between neighboring elements:
[a, b, c, d]
Square lists allow optional leading and trailing commas:
[,a, b, c, d]
[a, b, c, d,]
[,a, b, c, d,]
Adjacent top-level expressions are allowed:
foo(bar)
This is equivalent to:
foo (bar)
But malformed token fragments should not silently split into valid expressions:
123fasd
This should probably become an error-number, not number 123 followed by identifier fasd.
Examples To Drive Implementation
Valid Program
foo 123 (a b) [c, d, e]
Expected: a ConcreteSyntaxResult with tag "valid".
Unexpected Top-Level Close
foo )
Likely: valid foo plus program.error, or invalid program containing a top-level recovery error.
Unknown Expression In Round List
(foo @@@ 1)
Likely: error-expression item inside the list, with recovery continuing at 1.
Missing Close Delimiter
(foo 1
Likely: list node with open, no close, and list.error.
Mismatched Close Delimiter
[foo)
Likely: list node preserving open-bracket and close-paren, plus list.error.
Missing Square List Separator
[a, b c, d]
Likely: error-list-separator between b and c.
Extra Square List Separator
[a,, b]
Possible interpretations:
- allow repeated commas as empty separators
- produce error-list-separator
- produce error-expression for a missing element
This needs a deliberate choice.
Malformed Number
123fasd
Likely: error-number covering the full malformed fragment.
Recovery Strategies To Compare
Panic Until Expression Start
Skip until a plausible expression start appears.
Good for simple garbage recovery, but may split malformed token fragments too aggressively.
Panic Until Delimiter Or Expression Start
Inside a list, skip until:
- close delimiter
- expression start
- EOF
Good for preserving list structure.
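A rough sketch of this strategy over an array of code points. The character classes are assumptions about the toy Lisp (letters, digits, and opening delimiters start an expression), and panicInList is a hypothetical name.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed half-open [start, end)

// Assumed character classes; the real lexer may draw these lines differently.
const isClose = (ch: string): boolean => ch === ")" || ch === "]";
const isExprStart = (ch: string): boolean => /[A-Za-z0-9(\[]/.test(ch);

// Skips from `from` until a close delimiter, an expression start, or EOF.
function panicInList(
  chars: string[],
  from: number,
): { resumeAt: number; panickedOver: CodePointSpan } {
  let i = from;
  while (i < chars.length && !isClose(chars[i]) && !isExprStart(chars[i])) i++;
  return { resumeAt: i, panickedOver: { start: from, end: i } };
}

// (foo @@@ 1): panic starts at the first "@" (index 5) and stops at "1" (index 9).
const recovered = panicInList(Array.from("(foo @@@ 1)"), 5);
console.log(recovered.resumeAt, recovered.panickedOver);
```

Because the close delimiter is a stop condition, the enclosing list still gets to consume its own ")" or "]" after recovery.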
Panic Until Whitespace Boundary
For token-like errors, skip the rest of the non-whitespace fragment.
Useful for:
123fasd
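A sketch of how this could apply to numbers, assuming a fragment ends at whitespace, a delimiter, or a comma. Scanned is a simplified stand-in that omits the error payload, and scanNumber is a hypothetical name; the caller is assumed to invoke it only when the character at start is a digit.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed half-open [start, end)

// Simplified result; the real error-number node would also carry an error payload.
type Scanned =
  | { tag: "number"; value: number; span: CodePointSpan }
  | { tag: "error-number"; span: CodePointSpan };

// Assumed fragment boundary: whitespace, delimiters, and commas.
const isBoundary = (ch: string): boolean => /[\s()\[\],]/.test(ch);
const isDigit = (ch: string): boolean => ch >= "0" && ch <= "9";

function scanNumber(chars: string[], start: number): Scanned {
  let end = start;
  while (end < chars.length && isDigit(chars[end])) end++;
  const digitsEnd = end;
  // Panic: swallow whatever is glued to the digits, up to the next boundary.
  while (end < chars.length && !isBoundary(chars[end])) end++;
  const span = { start, end };
  if (end > digitsEnd) return { tag: "error-number", span };
  return { tag: "number", value: Number(chars.slice(start, end).join("")), span };
}

console.log(scanNumber(Array.from("123fasd"), 0)); // one error-number over the whole fragment
console.log(scanNumber(Array.from("123 fasd"), 0)); // a plain number; fasd is scanned separately
```

The key property is that the malformed fragment yields a single node, so 123fasd never splits into a number followed by an identifier.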
Separator-Aware Recovery
Inside square lists, use commas and close brackets as synchronization points.
Useful for:
[a, b c, d]
[a,, b]
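One possible shape for the missing-separator case, sketched over a flat token list for the inside of a square list. Tok, Item, and squareItems are hypothetical names, nesting is ignored, and the error payload is omitted. The treatment of repeated commas is just one of the open options, picked here only so the function has defined behavior.

```typescript
type Span = { start: number; end: number }; // stand-in for CodePointSpan

type Tok =
  | { kind: "atom"; text: string; span: Span }
  | { kind: "comma"; span: Span };

// Simplified items; real nodes would also carry an error payload.
type Item =
  | { tag: "identifier"; value: string; span: Span }
  | { tag: "error-list-separator"; span: Span };

function squareItems(toks: Tok[]): Item[] {
  const items: Item[] = [];
  let prev: Tok | undefined;
  for (const tok of toks) {
    if (tok.kind === "atom") {
      if (prev?.kind === "atom") {
        // Two neighbors with no comma: the error owns the gap between them,
        // and both neighbors are kept as ordinary items.
        items.push({
          tag: "error-list-separator",
          span: { start: prev.span.end, end: tok.span.start },
        });
      }
      items.push({ tag: "identifier", value: tok.text, span: tok.span });
    } else if (prev?.kind === "comma") {
      // ASSUMPTION: a repeated comma is reported as a separator error.
      items.push({ tag: "error-list-separator", span: tok.span });
    }
    prev = tok;
  }
  return items;
}
```

Leading and trailing commas produce no items here, which matches the optional leading and trailing commas allowed in square lists.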
Delimiter-Aware Recovery
Preserve exact open and close delimiter tokens, even if they mismatch.
Useful for:
[foo)
(foo]
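Since both tokens are kept verbatim, deciding what list.error should say can be a pure function of the two delimiters. A minimal sketch, assuming open always holds an opening token; closeStatus is a hypothetical helper.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed shape

type DelimiterToken =
  | { tag: "open-paren"; span: CodePointSpan }
  | { tag: "close-paren"; span: CodePointSpan }
  | { tag: "open-bracket"; span: CodePointSpan }
  | { tag: "close-bracket"; span: CodePointSpan };

// Classifies how a list was closed without altering either token.
function closeStatus(
  open: DelimiterToken,
  close?: DelimiterToken,
): "ok" | "missing" | "mismatched" {
  if (close === undefined) return "missing";
  // Assumes `open` is an opening token; anything else is treated like "[".
  const expected = open.tag === "open-paren" ? "close-paren" : "close-bracket";
  return close.tag === expected ? "ok" : "mismatched";
}

const at = (start: number): CodePointSpan => ({ start, end: start + 1 });

// [foo) keeps open-bracket and close-paren exactly as written.
console.log(closeStatus({ tag: "open-bracket", span: at(0) }, { tag: "close-paren", span: at(4) }));
// (foo 1 has no close token at all.
console.log(closeStatus({ tag: "open-paren", span: at(0) }));
```

Keeping the mismatched token in the tree means a UI can highlight both ends of the list instead of only reporting that something is wrong.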
Current Recommendation
The current type design is good enough to try.
Implementation should focus on concrete examples rather than further type abstraction:
123fasd
(foo @@@ 1)
(foo 1
[a, b c, d]
[foo)
After implementing those, the UI should reveal whether node-owned errors, error-list-separator, and explicit delimiter tokens feel useful or too heavy.