syntax-lab/PARTIAL_SYNTAX.md
2026-04-25 16:42:57 +02:00

Partial Syntax Notes

This document records the current candidate design for partial concrete syntax in the Lisp parser experiment.

The goal is not a generic parser framework. The goal is to make the current toy Lisp syntax rich enough to represent recovered malformed syntax while preserving the ability to distinguish valid trees from partial trees.

Core Constraints

  • ValidConcreteSyntax should be a subtype of PartialConcreteSyntax.
  • If a PartialConcreteSyntax contains no errors, it should be safe to coerce it to ValidConcreteSyntax without rebuilding the tree.
  • Concrete syntax may preserve syntactic choices that are semantically irrelevant.
  • Error payloads should remain structured and span-aware.

The never parameter is the main trick: when Error = never, the error-* branches become unconstructable and error?: never fields can only be omitted or left undefined, so a valid tree cannot carry an error payload.
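
A minimal sketch of the trick on a hypothetical two-variant type rather than the full tree. Leaf, ValidLeaf, PartialLeaf, and widen are illustrative names, not part of the candidate design.

```typescript
// Hypothetical miniature of the real tree: one valid variant, one error variant.
type Leaf<Error> =
  | { tag: "number"; value: number; error?: Error }
  | { tag: "error-number"; error: Error };

type ValidLeaf = Leaf<never>;
type PartialLeaf = Leaf<string[]>;

// No cast needed: never is assignable to every type, so ValidLeaf is
// assignable to PartialLeaf and the same object is handed back.
function widen(leaf: ValidLeaf): PartialLeaf {
  return leaf;
}

const five: ValidLeaf = { tag: "number", value: 5 };

// Neither line below type-checks, because no value has type never:
// const bad1: ValidLeaf = { tag: "error-number", error: ["oops"] };
// const bad2: ValidLeaf = { tag: "number", value: 5, error: ["oops"] };
```

The widening direction is free. The narrowing direction (partial to valid) still needs a runtime check, because the type system cannot see that a particular partial tree happens to be error-free.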

Current Candidate Types

export type ConcreteSyntaxResult =
  | { tag: "valid"; value: ValidConcreteSyntax }
  | { tag: "invalid"; value: PartialConcreteSyntax };

export type ValidConcreteSyntax =
  Program<{ span: CodePointSpan }, never>;

export type PartialConcreteSyntax =
  Program<{ span: CodePointSpan }, ConcreteError>;

export type ConcreteError = ConcreteErrorNode[]; // Convention: non-empty; not enforced by the type.

export namespace ConcreteError {
  export function single(node: ConcreteErrorNode): ConcreteError {
    return [node];
  }
}

export type ConcreteErrorNode = {
  span: CodePointSpan;
  error: ParseError;
  panickedOver?: CodePointSpan;
};

export type DelimiterToken =
  | { tag: "open-paren"; span: CodePointSpan }
  | { tag: "close-paren"; span: CodePointSpan }
  | { tag: "open-bracket"; span: CodePointSpan }
  | { tag: "close-bracket"; span: CodePointSpan };

export type Program<Info, Error> = ({
  expressions: Expr<Info, Error>[];
  error?: Error;
} & Info);

export type Expr<Info, Error> =
  | Literal<Info, Error>
  | List<Info, Error>
  | ({ tag: "error-expression"; error: Error } & Info);

export type List<Info, Error> = ({
  tag: "list";
  open: DelimiterToken;
  items: ListItem<Info, Error>[];
  close?: DelimiterToken;
  error?: Error;
} & Info);

export type ListItem<Info, Error> =
  | Expr<Info, Error>
  | ({ tag: "error-list-separator"; error: Error } & Info);

export type Literal<Info, Error> =
  | ({ tag: "number"; value: number } & Info)
  | ({ tag: "error-number"; error: Error } & Info)
  | ({ tag: "identifier"; value: Identifier } & Info)
  | ({ tag: "error-identifier"; error: Error } & Info);

export type Identifier = string;
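
The second core constraint (coercing an error-free partial tree without rebuilding it) could look roughly like the sketch below. It restates the candidate types in condensed form with Info fixed to { span }. The shapes of CodePointSpan and ParseError, and the names hasError, classify, ValidProgram, and PartialProgram, are assumptions made for illustration.

```typescript
// Assumed placeholder shapes; the real CodePointSpan and ParseError are defined elsewhere.
type CodePointSpan = { start: number; end: number };
type ParseError = { message: string };
type ConcreteError = { span: CodePointSpan; error: ParseError }[];

// Condensed restatement of the candidate types.
type Info = { span: CodePointSpan };
type Delim = { tag: string; span: CodePointSpan };
type Expr<Error> =
  | ({ tag: "number"; value: number } & Info)
  | ({ tag: "identifier"; value: string } & Info)
  | ({ tag: "error-number" | "error-identifier" | "error-expression"; error: Error } & Info)
  | ({ tag: "list"; open: Delim; items: ListItem<Error>[]; close?: Delim; error?: Error } & Info);
type ListItem<Error> = Expr<Error> | ({ tag: "error-list-separator"; error: Error } & Info);
type Program<Error> = { expressions: Expr<Error>[]; error?: Error } & Info;

type ValidProgram = Program<never>;
type PartialProgram = Program<ConcreteError>;
type Result =
  | { tag: "valid"; value: ValidProgram }
  | { tag: "invalid"; value: PartialProgram };

function itemHasError(item: ListItem<ConcreteError>): boolean {
  switch (item.tag) {
    case "number":
    case "identifier":
      return false;
    case "list":
      return item.error !== undefined || item.items.some(itemHasError);
    default:
      return true; // every error-* variant
  }
}

function hasError(program: PartialProgram): boolean {
  return program.error !== undefined || program.expressions.some(itemHasError);
}

function classify(program: PartialProgram): Result {
  if (hasError(program)) return { tag: "invalid", value: program };
  // Unchecked step: the compiler cannot see that the walk found no errors,
  // so this cast is only as sound as hasError is complete.
  return { tag: "valid", value: program as ValidProgram };
}
```

The walk inspects error fields only. It assumes a list with no close always carries list.error, since close stays optional even when Error = never.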

Error Ownership

Errors are owned by the smallest useful syntax node.

  • error-expression: syntax that cannot reasonably be interpreted as any expression node.
  • error-number: malformed numeric literal, such as 123fasd.
  • error-identifier: malformed identifier, if the language later has such cases.
  • error-list-separator: malformed relationship between neighboring list items.
  • list.error: structural error about the whole list, such as missing or mismatched close delimiter.
  • program.error: top-level recovery errors that do not belong to one expression.

ConcreteErrorNode.span is the primary diagnostic focus. panickedOver is recovery/debug metadata showing what source region was skipped while recovering.
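
To make the two spans concrete, here is a hedged sketch of how a UI might render one error node. It assumes spans are half-open code point offsets and that ParseError has a message field; neither is fixed by these notes, and renderDiagnostic is a hypothetical helper.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed: half-open code point offsets
type ParseError = { message: string };               // assumed shape
type ConcreteErrorNode = {
  span: CodePointSpan;
  error: ParseError;
  panickedOver?: CodePointSpan;
};

function renderDiagnostic(source: string, node: ConcreteErrorNode): string {
  // Index by code point, not by UTF-16 unit, to match the span convention.
  const cps = Array.from(source);
  const focus = cps.slice(node.span.start, node.span.end).join("");
  const skipped = node.panickedOver
    ? ` (skipped ${node.panickedOver.end - node.panickedOver.start} code points while recovering)`
    : "";
  return `${node.span.start}-${node.span.end}: ${node.error.message}: "${focus}"${skipped}`;
}
```

Here span drives what is highlighted, while panickedOver only adds secondary detail and can be dropped from user-facing output.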

Delimiters

Delimiter tokens are stored explicitly because this is concrete syntax.

Even if round and square lists later turn out to be semantically equivalent, the concrete tree should preserve which form the source used:

(a b c)
[a, b, c]

This is useful for UI, formatting, recovery diagnostics, and syntax experiments. A later semantic AST can erase this distinction.
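
As an illustration of that erasure, the sketch below lowers an error-free concrete expression to a hypothetical semantic AST that drops spans and delimiter tokens. SemanticExpr and erase are assumed names, and error variants are left out for brevity.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed shape
type DelimiterToken = {
  tag: "open-paren" | "close-paren" | "open-bracket" | "close-bracket";
  span: CodePointSpan;
};
// Error-free subset of the concrete tree.
type ConcreteExpr =
  | { tag: "number"; value: number; span: CodePointSpan }
  | { tag: "identifier"; value: string; span: CodePointSpan }
  | { tag: "list"; open: DelimiterToken; items: ConcreteExpr[]; close?: DelimiterToken; span: CodePointSpan };

// Hypothetical semantic AST: no spans, no delimiter tokens.
type SemanticExpr =
  | { tag: "number"; value: number }
  | { tag: "identifier"; value: string }
  | { tag: "list"; items: SemanticExpr[] };

function erase(expr: ConcreteExpr): SemanticExpr {
  switch (expr.tag) {
    case "number":
      return { tag: "number", value: expr.value };
    case "identifier":
      return { tag: "identifier", value: expr.value };
    case "list":
      // Round and square lists collapse to the same semantic node.
      return { tag: "list", items: expr.items.map(erase) };
  }
}
```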

Lisp Syntax Under Test

The experiment now has two list syntaxes.

Round lists have no separators:

(a b c d)

Square lists require commas between neighboring elements:

[a, b, c, d]

Square lists allow optional leading and trailing commas:

[,a, b, c, d]
[a, b, c, d,]
[,a, b, c, d,]

Top-level expressions may be adjacent, with no whitespace between them:

foo(bar)

This is equivalent to:

foo (bar)

But malformed token fragments should not silently split into valid expressions:

123fasd

This should probably become an error-number, not number 123 followed by identifier fasd.

Examples To Drive Implementation

Valid Program

foo 123 (a b) [c, d, e]

Expected: a ConcreteSyntaxResult with tag "valid".

Unexpected Top-Level Close

foo )

Likely: valid foo plus program.error, or invalid program containing a top-level recovery error.

Unknown Expression In Round List

(foo @@@ 1)

Likely: error-expression item inside the list, with recovery continuing at 1.

Missing Close Delimiter

(foo 1

Likely: list node with open, no close, and list.error.

Mismatched Close Delimiter

[foo)

Likely: list node preserving open-bracket and close-paren, plus list.error.

Missing Square List Separator

[a, b c, d]

Likely: error-list-separator between b and c.

Extra Square List Separator

[a,, b]

Possible interpretations:

  • allow repeated commas as empty separators
  • produce error-list-separator
  • produce error-expression for a missing element

This needs a deliberate choice.

Malformed Number

123fasd

Likely: error-number covering the full malformed fragment.

Recovery Strategies To Compare

Panic Until Expression Start

Skip until a plausible expression start appears.

Good for simple garbage recovery, but may split malformed token fragments too aggressively.

Panic Until Delimiter Or Expression Start

Inside a list, skip until:

  • close delimiter
  • expression start
  • EOF

Good for preserving list structure.
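
A rough sketch of this skip loop over an array of code points. What counts as an expression start is an assumption here (an opener, a digit, or an ASCII identifier character), and panicInList is a hypothetical name.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed: half-open offsets

// Assumption: an expression starts at an opener, a digit, or an identifier character.
function isExpressionStart(ch: string): boolean {
  return ch === "(" || ch === "[" || /^[A-Za-z0-9_]$/.test(ch);
}

function isCloseDelimiter(ch: string): boolean {
  return ch === ")" || ch === "]";
}

// Skips from `from` until a close delimiter, an expression start, or EOF.
// The skipped region is returned so it can be stored as panickedOver.
function panicInList(
  cps: string[],
  from: number,
): { next: number; panickedOver: CodePointSpan } {
  let next = from;
  while (next < cps.length && !isCloseDelimiter(cps[next]) && !isExpressionStart(cps[next])) {
    next++;
  }
  return { next, panickedOver: { start: from, end: next } };
}
```

For (foo @@@ 1) with recovery starting at offset 5, this stops at offset 9, the 1, so the list keeps its final item and its close delimiter.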

Panic Until Whitespace Boundary

For token-like errors, skip the rest of the non-whitespace fragment.

Useful for:

123fasd
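
One way this could look, as a sketch: scan the whole fragment first, then classify it. The assumptions are that numbers are plain decimal digits, that delimiters and commas also end a fragment (so foo(bar) still splits), and that everything else is an identifier. scanFragment and Fragment are hypothetical names.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed: half-open code point offsets
type Fragment = {
  tag: "number" | "identifier" | "error-number";
  span: CodePointSpan;
  text: string;
};

// Assumption: delimiters and commas end a fragment as well as whitespace.
function isBoundary(ch: string): boolean {
  return /^\s$/.test(ch) || "()[],".indexOf(ch) >= 0;
}

// Precondition: cps[from] is not itself a boundary character.
function scanFragment(cps: string[], from: number): Fragment {
  let end = from;
  while (end < cps.length && !isBoundary(cps[end])) end++;
  const text = cps.slice(from, end).join("");
  const span = { start: from, end };
  if (/^[0-9]+$/.test(text)) return { tag: "number", span, text };
  // Starts like a number but has trailing junk: one error node for the whole fragment.
  if (/^[0-9]/.test(text)) return { tag: "error-number", span, text };
  return { tag: "identifier", span, text };
}
```

Because classification happens after the boundary scan, 123fasd yields a single error-number spanning all seven code points instead of a number followed by an identifier.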

Separator-Aware Recovery

Inside square lists, use commas and close brackets as synchronization points.

Useful for:

[a, b c, d]
[a,, b]
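
A sketch of the separator check over an already-tokenized square list interior. The token and item shapes are simplified stand-ins (a message string instead of the structured error payload), checkSeparators is a hypothetical name, and reporting repeated commas as error-list-separator is only one of the options still open for [a,, b].

```typescript
type CodePointSpan = { start: number; end: number }; // assumed: half-open offsets
type Tok = { kind: "expr" | "comma"; span: CodePointSpan };
type Item =
  | { tag: "expr"; span: CodePointSpan }
  | { tag: "error-list-separator"; span: CodePointSpan; message: string };

// Walks the tokens between [ and ]. Leading and trailing commas are accepted.
function checkSeparators(toks: Tok[]): Item[] {
  const items: Item[] = [];
  let prev: Tok | undefined;
  for (const tok of toks) {
    if (tok.kind === "expr") {
      if (prev !== undefined && prev.kind === "expr") {
        // The error covers the gap between the two neighboring expressions.
        items.push({
          tag: "error-list-separator",
          span: { start: prev.span.end, end: tok.span.start },
          message: "missing comma",
        });
      }
      items.push({ tag: "expr", span: tok.span });
    } else if (prev !== undefined && prev.kind === "comma") {
      // Policy choice for this sketch only: a repeated comma is an error.
      items.push({ tag: "error-list-separator", span: tok.span, message: "repeated comma" });
    }
    prev = tok;
  }
  return items;
}
```

Every expression token survives as an item, so later recovery or UI code still sees a, b, c, and d even when a separator between them is wrong.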

Delimiter-Aware Recovery

Preserve exact open and close delimiter tokens, even if they mismatch.

Useful for:

[foo)
(foo]
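
Since both tokens are kept on the list node, the mismatch check can be a small pure function. A sketch, assuming the first argument is always an opener and using a plain string where the real tree would store a structured list.error; delimiterProblem is a hypothetical name.

```typescript
type CodePointSpan = { start: number; end: number }; // assumed shape
type DelimiterToken = {
  tag: "open-paren" | "close-paren" | "open-bracket" | "close-bracket";
  span: CodePointSpan;
};

// Returns a message suitable for list.error, or undefined when the delimiters agree.
// Assumes `open` is an opener; the candidate DelimiterToken type does not enforce that.
function delimiterProblem(open: DelimiterToken, close?: DelimiterToken): string | undefined {
  const expected = open.tag === "open-paren" ? "close-paren" : "close-bracket";
  if (close === undefined) return `missing ${expected}`;
  if (close.tag !== expected) return `expected ${expected}, found ${close.tag}`;
  return undefined;
}
```

The list node itself stays intact in both failure cases: the mismatched close token is stored as-is, and only list.error records the disagreement.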

Current Recommendation

The current type design is good enough to try.

Implementation should focus on concrete examples rather than further type abstraction:

123fasd
(foo @@@ 1)
(foo 1
[a, b c, d]
[foo)

After implementing those, the UI should reveal whether node-owned errors, error-list-separator, and explicit delimiter tokens feel useful or too heavy.