
Changelog

All notable changes to Keel are documented here. This project follows Keep a Changelog and Semantic Versioning.

Unreleased

234 changes

Added

132 items
  • Distribution module — New stdlib module for probability distributions with 12 distribution types (Normal, Uniform, Exponential, Poisson, Bernoulli, Binomial, Gamma, Beta, ChiSquared, StudentT, Weibull, LogNormal). Functions include sample, sampleSeeded, sampleN, pdf, cdf, quantile, mean, variance, stdDev, skewness, entropy. Constructor functions return Result Distribution String for parameter validation. Backed by statrs and rand crates.
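
    A sketch of the intended usage. The constructor name `Distribution.normal` and the Elm-style `case` syntax are assumptions (constructor names are not listed above; only the `Result Distribution String` return type is):

    ```
    import Distribution

    let spread =
        case Distribution.normal 0.0 1.0 of
            Ok dist -> Distribution.stdDev dist
            Err _ -> 0.0
    ```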
  • Unified KeelError type — New src/errors.rs module with a KeelError enum that wraps all phase-specific errors (lexer, parser, type checker, compiler, VM) into a single error type. Includes shared SymbolError variants for undeclared variables, modules, functions, enums, and enum variants, used by both parser and compiler.
  • Structured parser error modules — Extracted parser errors into src/parser/errors/ with dedicated files: parser_error.rs (syntax errors), scope_error.rs (scope/symbol errors), type_error.rs (type checking errors), typed.rs (unified TypedParserError enum).
  • StdlibFunction abstraction — New unified struct in stdlib/mod.rs that defines each stdlib function once with name, arity, implementation, type signature, and documentation. Helper functions register_from_definitions(), docs_from_definitions(), and names_from_definitions() derive both module registration and documentation from the same source, eliminating drift.
  • LetBindingLiteralPattern error — New typed parser error for literal patterns in let bindings. let 5 = 5, let "hello" = "hello", and similar literal patterns now produce a helpful error directing users to use case expressions instead.
  • Pre-parse syntax validation — New check_module_syntax(), check_type_alias_syntax(), and check_enum_syntax() functions detect common syntax errors before full parsing, with typed error variants: ModuleMissingExposing, ModuleMissingExposingParens, ModuleUnclosedExposing, TypeAliasMissingEquals, TypeAliasMissingName, EnumMissingEquals, EnumMissingName, EnumMissingVariants, EnumVariantLowercase.
  • Test coverage for untested error variants — Added tests for RecordAccessMissingFieldName, RecordAccessDoubleDot, TrailingComma, and BlockNotNested error variants.
  • Decimal type — New primitive type for exact decimal arithmetic, avoiding floating-point precision issues. Supports literal syntax with d suffix (42d, 3.14d, -0.001d). Full arithmetic (+, -, *, /, %, ^), comparison (==, !=, <, <=, >, >=), and negation. Backed by rust_decimal crate with 28-digit precision.
  • Decimal module — 40+ stdlib functions for decimal operations:
  • Creation: fromInt, fromFloat, fromString, parse
  • Conversion: toInt, toFloat, toString, toStringWithPrecision
  • Arithmetic: add, sub, mul, div, rem, pow, abs, negate
  • Rounding: round, roundTo, floor, ceil, trunc, truncTo
  • Comparison: compare, min, max, isPositive, isNegative, isZero
  • Constants: zero, one, pi, e, maxValue, minValue
  • Math: sqrt, ln, log10, exp, sin, cos, tan
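
    Given the stdlib's pipe-last convention, a precision-sensitive money calculation might read like this sketch (the tax rate and prices are illustrative):

    ```
    let price = 19.99d

    let total =
        price
            |> Decimal.mul 1.0825d
            |> Decimal.roundTo 2
    ```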
  • Date module — 25 stdlib functions for date manipulation:
  • Creation: fromYmd, today, epoch
  • Parsing: parseIso, parse
  • Formatting: toIsoString, format
  • Components: year, month, day, weekday, dayOfYear, weekNumber, daysInMonth, isLeapYear
  • Arithmetic: addDays, addWeeks, addMonths, addYears
  • Comparison: isBefore, isAfter, isEqual, compare, daysBetween
  • Time module — 23 stdlib functions for time manipulation:
  • Creation: fromHms, fromHmsNano, midnight, noon
  • Parsing: parseIso, parse
  • Formatting: toIsoString, format
  • Components: hour, minute, second, nanosecond
  • Arithmetic: addHours, addMinutes, addSeconds, addNanos
  • Comparison: isBefore, isAfter, isEqual, compare
  • Duration module — 24 stdlib functions for duration manipulation:
  • Creation: fromNanos, fromMicros, fromMillis, fromSecs, fromMins, fromHours, fromDays, fromWeeks, zero
  • Components: nanos, micros, millis, secs, mins, hours, days, weeks
  • Arithmetic: add, sub, mul, div, abs, negate
  • Comparison: isPositive, isNegative, isZero, compare
  • DateTime interop functions — 4 new functions for converting between Date, Time, and DateTime:
  • DateTime.getDate : DateTime -> Date — extract date component
  • DateTime.getTime : DateTime -> Time — extract time component
  • DateTime.fromDateType : Date -> DateTime — convert Date to DateTime at midnight UTC
  • DateTime.combine : Date -> Time -> DateTime — combine Date and Time into DateTime
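
    A sketch of the interop functions, assuming `date` and `time` were built earlier (e.g. via `Date.fromYmd` and `Time.fromHms`; whether those return bare values or `Result`-wrapped ones is not specified above):

    ```
    let meeting = DateTime.combine date time
    let datePart = DateTime.getDate meeting
    let timePart = DateTime.getTime meeting
    ```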
  • DataFrame temporal type conversion — Runtime extraction of temporal types from Polars DataFrames. AnyValue::Date, AnyValue::Time, AnyValue::Duration, and AnyValue::Datetime now convert to native Keel Date, Time, Duration, and DateTime objects wrapped in Maybe.
  • FunctionDoc examples for temporal modules — All functions in Date (25), Time (23), Duration (24), and DateTime (57) modules now include working code examples in their FunctionDoc.
  • Parser error variants for inline files — New typed errors for inline expression parsing: InlineMissingPath (missing file path), InlineInvalidPath (non-string path), InlinePassingMissingParens (missing parentheses), InlineInvalidVar (invalid token in var list), InlineSpreadWithNamed (mixing .. with named vars), InlineMultipleSpread (multiple .. operators). Each error includes helpful hints and notes.
  • Parser error variants for lambda expressions — LambdaUnclosedPipe (missing closing |) and LambdaMissingBody (no body after |params|) with recovery and hints.
  • Parser error variants for parameterized modules — ModuleParamMissingType and ModuleExposeMissingType for parameters and exposed variables missing type annotations in module (x) exposing (y) syntax.
  • Comprehensive parser test coverage — All parser error tests now check against specific TypedParserError variants via state.typed_errors instead of generic ast.is_err() checks. Tests verify exact error types like TypedParserError::Parser(ParserError::LambdaUnclosedPipe).
  • DataFrame.Expr module — Composable, type-safe column expressions that compile directly to Polars operations with SIMD optimization and parallel execution. Unlike closures (which may fall back to slower row-by-row evaluation), expressions are always fast. The module provides:
  • Column references and literals: col "name", lit 42, lit 3.14, lit "hello"
  • Arithmetic: add, sub, mul, div, mod, pow
  • Comparison: eq, neq, gt, gte, lt, lte
  • Boolean: and, or, not
  • Aggregations: sum, mean, min, max, count, first, last, nUnique, std, var, median, quantile
  • String operations: strLength, strUpper, strLower, strContains, strStartsWith, strEndsWith, strReplace, strTrim, strSlice
  • Math: abs, sqrt, floor, ceil, round, log, log10, exp
  • Null handling: fillNull, isNull, isNotNull, dropNulls
  • Conditional: cond for if-then-else expressions
  • Window functions: over, rowNumber, rank, denseRank, lag, lead
  • Naming: named to alias output columns
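
    A sketch of building one of these expressions. `Expr` abbreviates `DataFrame.Expr`, and the pipe-last argument order (multiplier first, expression last) is an assumption based on the stdlib convention:

    ```
    let taxed =
        Expr.col "price"
            |> Expr.mul (Expr.lit 1.0825)
            |> Expr.named "price_with_tax"
    ```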
  • Compile-time constant evaluation framework — New const_eval.rs module provides comprehensive constant folding during compilation. Evaluates arithmetic (+, -, *, /, //, %, ^), comparison (==, !=, <, <=, >, >=), boolean (&&, ||, not), string concatenation (++), list cons (::), and if-then-else expressions with constant conditions at compile time. Includes a stdlib function registry for const-evaluating Math.abs, String.length, List.length, List.isEmpty, List.reverse, and List.sum with constant arguments. Lambda-safe: parameters correctly shadow outer variables to prevent incorrect folding inside function bodies. Refactored try_eval_const_string() to use the new unified framework. Comprehensive test suite with 57 tests covering arithmetic, strings, booleans, edge cases, and lambda scoping.
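
    For example, each right-hand side below folds to a single constant during compilation (14, "keel!", and 6 respectively, assuming standard operator precedence), using the binding syntax shown elsewhere in this changelog:

    ```
    let n : Int = 2 + 3 * 4
    let shout : String = "keel" ++ "!"
    let total : Int = List.sum [1, 2, 3]
    ```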
  • has_explicit_type field in Binding AST node — The Binding struct now includes a has_explicit_type: bool field that tracks whether the user explicitly wrote a type annotation in the source code. This allows formatters and other tools to distinguish user-written annotations (let x : Int = 42) from parser-inferred types, enabling preservation of explicit annotations while omitting inferred ones.
  • DataFrame.describeLabel — Returns a formatted string describing value labels for a single column. Takes column name and DataFrame, returning a multi-line string showing each value-label mapping. Returns empty string if the column has no value labels.
  • DataFrame.describeLabels — Returns a formatted string describing all value labels in a DataFrame, sorted by column name. Each column's value labels are shown with their integer codes and string labels.
  • DataFrame.describeVariables — STATA-style variable overview returning a DataFrame with one row per column, showing name, type, variable label, value labels (abbreviated), and metadata. Useful for exploring dataset structure.
  • Column-selective DataFrame I/O — readCsvColumns, readJsonColumns, readParquetColumns, readDtaColumns accept a [String] column list and a file path, reading only the specified columns. Compile-time schema validation catches nonexistent columns. 8 tests covering shape, column selection, compile-time validation, pipe chains, and JSON/Parquet roundtrips.
  • Parser case-sensitivity diagnostics — New parser errors with hints for lowercase module names (ModuleNameLowercase), type names (TypeNameLowercase), import aliases (ImportAliasLowercase), and uppercase pattern aliases (PatternAliasUppercase). Each error suggests the corrected casing.
  • Curried readXxxColumns type inference — Compiler and type-inference engine now handle curried 2-arg readXxxColumns calls, caching column-filtered DataFrame schemas for downstream type checking.
  • Table module — New Table stdlib module for cross-tabulation and summary tables, inspired by Stata's table command. Quick forms (Table.freq "sex", Table.cross "sex" "bp") and a builder pattern (Table.create |> Table.rows ["sex"] |> Table.cols ["bp"] |> Table.count |> Table.show). Supports 8 statistics (count, percent, meanOf, sdOf, medianOf, minOf, maxOf, sumOf), faceting (facetBy), totals suppression (noTotals), layout rearrangement without recomputation (relayout), and DataFrame export (toDataFrame). freq/cross accept both String and [String] for flexible dimensioning. Dedicated Table display type with hierarchical headers, box-drawing separators, value label integration, and comma-formatted numbers. 70 tests covering core functionality, edge cases, error messages, display rendering, and VM integration.
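
    The builder pattern from the description, laid out as a pipeline. How the source DataFrame is supplied to the chain (e.g. as a final argument to Table.show) is not shown above, so this sketch omits it:

    ```
    Table.create
        |> Table.rows ["sex"]
        |> Table.cols ["bp"]
        |> Table.count
        |> Table.show
    ```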
  • FunctionDoc categories for all stdlib modules — Every FunctionDoc across all stdlib modules now has a category field. Categories group related functions within a module (e.g., List: Access/Build/Reduce/Search/Slice/Sort/Transform; Json: Parse/Encode/Access; Http: Request/Modify/Execute). Categories render in LSP hover documentation via to_markdown().
  • CODE_REVIEW.md — Comprehensive code review document tracking 18 findings across security, performance, and architecture dimensions with fix status.
  • FFI safety documentation — Added // SAFETY: comments to all unsafe blocks and unsafe extern "C" callbacks in dataframe_dta.rs, documenting pointer provenance, exclusive access invariants, buffer validity, and ReadStat's ownership model.
  • DTA fuzz target — New fuzz_dta fuzz target that feeds arbitrary bytes to read_dta_file (with and without row limits), plus seed corpus of valid .dta files covering mixed types, nullable columns, value labels, and metadata.
  • VmError::AllocationLimitExceeded — New error variant with hint for when data structure allocation exceeds the 10M element safety limit.
  • ValueLabelSet module — Bidirectional Int ↔ String mapping for statistical value labels. Functions: empty, fromList, insert, remove, getLabel, getValue, values, labels, toList, size, isEmpty, merge, remap. Integrates with DataFrame value label system.
  • DataFrame variable labels — STATA-style descriptive labels for columns. Functions: withVarLabel, getVarLabel, getVarLabels, removeVarLabel. Labels are preserved through filter/select/join operations and round-trip through .dta files.
  • DataFrame value labels — Map integer codes to human-readable labels (e.g., 1→"Male", 2→"Female"). Functions: withValueLabels, withValueLabelsStrict (validates all values have labels), getValueLabels, getAllValueLabels, removeValueLabels. Labels propagate through operations and survive .dta I/O.
  • DataFrame display modes — Control how labeled columns display: Raw (show codes), Labeled (show labels), Both (show "label (code)"). Function: withDisplayMode, getDisplayMode.
  • DataFrame.recode — Remap integer values in a column with automatic value label transfer. E.g., recode [(1, 10), (2, 20)] changes 1→10, 2→20 and updates associated value labels.
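
    A sketch of recoding a column. The description shows only the mapping list, so the placement of the column name (here first, before the mapping) is an assumption; `df` and `"status"` are illustrative:

    ```
    let recoded =
        df |> DataFrame.recode "status" [(1, 10), (2, 20)]
    ```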
  • STATA .dta label support — Full round-trip support for variable labels and value labels in readDta/writeDta. Labels stored in DataFrameMetadata.var_labels and DataFrameMetadata.value_labels (using ValueLabelSet).
  • Fuzz regression test suite — 64 tests in tests/infrastructure/fuzz_regression.rs covering crash artifacts, OOM artifacts, timeout artifacts, determinism checks, and normal program regression. 5 tests are #[ignore]d documenting known stack overflow bugs with mixed nesting patterns.
  • VmError::ExecutionLimitExceeded — New error variant for when the VM exceeds its step limit.
  • Elm-style multi-line literals as function arguments — Lists, records, and tuples can now be formatted with leading commas on separate lines when used as function arguments, matching Elm's indentation-sensitive syntax. This enables clean, readable data definitions without needing intermediate let bindings.
  • DataFrame window functions — SQL-style window functions for advanced analytics: ranking, running totals, moving averages, and lag/lead operations. Window functions preserve row count (unlike aggregations) and partition data for grouped calculations.
  • partitionBy [cols] — Define partition boundaries (can be nested for different groupings)
  • orderBy [cols] — Define row ordering within partitions
  • collect — Materialize WindowedDataFrame back to DataFrame
  • withRowNumber "col" — Sequential numbering (1, 2, 3, ...) per partition
  • withRank "col" — Ranking with gaps for ties (1, 2, 2, 4, ...)
  • withDenseRank "col" — Ranking without gaps (1, 2, 2, 3, ...)
  • withLag "result" "source" offset — Value from N rows before (returns Maybe T)
  • withLead "result" "source" offset — Value from N rows ahead (returns Maybe T)
  • withRollingSum "result" "source" N — Sum over N rows (returns Maybe T)
  • withRollingMean "result" "source" N — Average over N rows (returns Maybe Float)
  • withRollingMin "result" "source" N — Minimum over N rows (returns Maybe T)
  • withRollingMax "result" "source" N — Maximum over N rows (returns Maybe T)
  • withCumSum "result" "source" — Cumulative sum from partition start
  • withCumMean "result" "source" — Cumulative average (returns Float)
  • withCumMin "result" "source" — Cumulative minimum
  • withCumMax "result" "source" — Cumulative maximum
  • Compile-time schema tracking through window operations
  • Column name validation for partition/order columns
  • Type propagation: withRowNumber adds Int, withLag adds Maybe T, etc.
  • Gradual typing support for untyped DataFrames
  • Backed by Polars window functions (rank, cum_agg, rolling_window features)
  • WindowedDataFrame type tracks partition/order metadata
  • Function overloading: partitionBy works on both DataFrame and WindowedDataFrame
  • Comprehensive test suite: 45 tests covering normal operations, edge cases, and error conditions
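
    A sketch of chaining several window operations; column and partition names are illustrative:

    ```
    sales
        |> DataFrame.partitionBy ["region"]
        |> DataFrame.orderBy ["date"]
        |> DataFrame.withRowNumber "row_num"
        |> DataFrame.withLag "prev_amount" "amount" 1
        |> DataFrame.withRollingMean "avg_7" "amount" 7
        |> DataFrame.collect
    ```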
  • DataFrame schema validation with compile-time type checking — DataFrames can now have their schemas validated at compile time using type annotations. This enables "data contracts" where you declare the expected schema and get compile errors if the actual data doesn't match.
  • DataFrame { col: Type, ... } — DataFrame type constructor with schema
  • { col: Type, .. } — Open record type (allows extra fields)
  • { col: Type } — Closed record type (exact match required)
  • Column existence — Missing columns produce compile errors
  • Type compatibility — Column types must match declarations
  • Extra columns — Closed schemas reject extras, open schemas allow them
  • Variable paths skip validation (can't validate non-literals)
  • Untyped DataFrames continue to work without annotations
  • Schemas propagate through operations (select, drop, etc.)
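
    A sketch of an open-schema data contract. The column types are written Maybe-wrapped here to match the KeelSchema convention; whether annotations also accept bare types is not specified above, and the file path is illustrative:

    ```
    let people : DataFrame { name : Maybe String, age : Maybe Int, .. } =
        DataFrame.readCsv "people.csv"
    ```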
  • Import aliases — Modules can now be imported with alternative names using the as keyword. Aliases provide convenient shorthand for module references throughout your code.
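
    A minimal sketch of aliasing (file path illustrative; shape returns a (rows, columns) tuple):

    ```
    import DataFrame as DF

    let dims = DF.readCsv "data.csv" |> DF.shape
    ```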
  • DateTime standard library module (48 functions) — UTC-based date and time operations backed by the chrono crate. All DateTime values are opaque NativeObject instances wrapping chrono::DateTime. Functions follow the pipe-last convention for natural composition.
  • Creation (7 functions): now, fromParts, fromDate, fromTimestamp, fromTimestampMillis, toTimestamp, toTimestampMillis
  • Parsing (4 functions): parse, parseIso8601, parseRfc3339, parseFormat — return Maybe DateTime for safe handling
  • Formatting (3 functions): format, formatRfc3339, formatCustom — support ISO8601, RFC3339, and custom strftime patterns
  • Components (9 functions): year, month, day, hour, minute, second, weekday, dayOfYear, weekNumber — extract datetime parts as integers
  • Manipulation (8 functions): addMillis, addSeconds, addMinutes, addHours, addDays, addWeeks, addMonths, addYears — immutable time arithmetic
  • Comparison (4 functions): isBefore, isAfter, isEqual, compare — total ordering for DateTime values
  • Duration/Difference (5 functions): diffMillis, diffSeconds, diffMinutes, diffHours, diffDays — calculate time spans as Int milliseconds
  • Calendar Boundaries (8 functions): startOfDay, endOfDay, startOfWeek, endOfWeek, startOfMonth, endOfMonth, startOfYear, endOfYear — calendar-aware rounding
  • DataFrame.fromLists function — Create DataFrames from a list of (column name, values) tuples. This provides a column-oriented alternative to fromRecords that is more ergonomic for programmatic data construction and composes naturally with List.zip.
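
    A sketch of column-oriented construction (names and values illustrative), using the Elm-style leading-comma layout this release also supports for function arguments:

    ```
    let df =
        DataFrame.fromLists
            [ ("name", ["Ada", "Grace"])
            , ("age", [36, 45])
            ]
    ```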
  • Compile-time DataFrame column validation — When DataFrame.readCsv, readJson, readParquet, or readDta is called with a literal string path, the compiler reads the file's schema (column names and types) at compile time and validates column references in subsequent operations. Invalid column names produce a DataFrameColumnNotFound compile error with a list of available columns. Type information propagates through pipe chains (select, drop, rename, withColumn, column, sort/filter operations). Gradual typing: untyped DataFrames (e.g., from variables or non-literal paths) skip validation.
  • Maybe-wrapped DataFrame columns — DataFrame column values are now properly wrapped in Maybe types (Just x for values, Nothing for nulls) when crossing from Polars to Keel. This makes null/missing data explicit and pattern-matchable instead of silently coercing nulls to defaults (0, "", false). Applies to column, toRecords, and all column extraction paths.
  • KeelSchema runtime type system for DataFrames — Every DataFrame now carries a KeelSchema that maps each column to its Keel type (always Maybe T since Polars columns are nullable). Schema is auto-derived from the Polars DataFrame. Column types are displayed as Keel types (e.g., Maybe Int, Maybe String, Maybe Float) instead of Polars dtypes in DataFrame output and dtypes results. Type mapping: Polars Int8/16/32/64 and UInt8/16/32/64 → Maybe Int, Float32/64 → Maybe Float, Boolean → Maybe Bool, String → Maybe String, other → Maybe String (fallback).
  • Maybe-aware withColumn and fromRecords — withColumn and fromRecords now accept both Maybe-wrapped lists ([Just 1, Nothing, Just 3]) and bare value lists ([1, 2, 3]) for backward compatibility. Maybe-wrapped values are unwrapped when creating Polars columns: Just x → value, Nothing → null.
  • unwrap_maybe helper — Internal helper function for detecting and unwrapping Maybe enum values in the DataFrame module, following the same pattern as unwrap_enum_value in the Maybe module.

Changed

71 items
  • All stdlib modules refactored to StdlibFunction pattern — Every stdlib module (List, String, Math, Decimal, Date, Time, Duration, DateTime, IO, Http, Json, DataFrame, Table, ValueLabelSet, Maybe, Result) now defines functions as a single Vec of StdlibFunction definitions instead of manually building export HashMaps. Reduces boilerplate and ensures documentation is always in sync with implementation.
  • Typed errors passed directly to Chumsky — All parser error emissions now pass ParserError variants directly to Rich::custom() instead of calling .to_string(), improving type safety and consistency across 11 locations in tuples, functions, modules, patterns, type aliases, enums, imports, and let bindings.
  • Test suite reorganization — Restructured test directories for clarity: tests/errors/ → tests/compiler/ (compiler error tests), tests/features/ → tests/integration/ (language feature integration tests), tests/infrastructure/ → tests/runtime/ (VM and runtime tests). Added new test files for lambda type inference, nested pattern types, type aliases, exhaustiveness, guards, pattern type mismatch, and more.
  • Parser test improvements — Extended test coverage across 20+ parser test files with strong assertions against TypedParserError variants, consistent helper functions, and comprehensive error scenario coverage.
  • Unified parser test suite — Standardized all 48 parser test files (tests/parser/*.rs) with consistent naming (dropped test_ prefixes), structured /// doc comments explaining why each test should pass or fail, removed debug println! statements, and organized sections (helpers → success tests → comments → errors → edge cases). Added pass/fail reasoning with bullet points to every test across all files including edge_cases.rs, error_hints.rs, expr_list_access.rs, trailing_tokens.rs, expr_types.rs, expr_inline_file.rs, lambda_case_scope.rs, and types.rs.
  • Parser emits typed errors to state — All parsers now push errors to state.typed_errors in addition to emitting chumsky errors, enabling tests and tooling to match on specific error variants.
  • Parameterized module exposing requires types — parse_exposing_args_typed() enforces type annotations on all exposed variables in parameterized modules.
  • DataFrame module refactored into submodules — Split the 7,780-line dataframe.rs into a directory-based module structure: security.rs (sandbox config), metadata.rs (value labels, display modes), lineage.rs (transformation tracking), types.rs (KeelDataFrame, KeelSchema). Reduces main module to 6,680 lines with cleaner separation of concerns.
  • DataFrame label functions category — Renamed FunctionDoc category from "Labels" to "Label" (singular) for consistency with other categories (Window, Transform, Metadata, etc.).
  • VM fields restricted to pub(crate) — All 16 VM struct fields (registers, heap, call_stack, etc.) changed from pub to pub(crate). External consumers use VM::compile() and accessor methods; no API change. GC tests moved from integration tests to unit tests inside vm_core.rs.
  • Type module moved to crate root — src/compiler/types.rs → src/types.rs as a top-level shared module. Breaks the parser → compiler dependency. Re-exported from compiler::types for backward compatibility. Added types_compatible() function for overload resolution.
  • Compiler struct refactored — Extracted FileContext (run stack, current dir, inline file types) and TypeChecker (inference + errors) into dedicated structs. Reduces field count on KeelCompiler and groups related state.
  • Parser scope uses flat arena — ScopeState now stores scopes in a flat Vec indexed by ScopeId with parent links, replacing nested ownership. Deduplicated scope-walking logic into collect_names_matching() helper.
  • Dual-scope architecture documented — Both compiler::scope and parser::scope now have module-level docs explaining why two separate scope systems exist and their different storage models.
  • Typed parser error tracking — Parser now pushes TypedParserError entries to state.typed_errors alongside existing Rich error emissions. This enables structured, pattern-matchable error inspection without string comparison. All parser error sites (indentation, blocks, enums, modules, type aliases, functions, lambdas, lists, records, tuples, patterns, let bindings) now emit typed errors.
  • Stdlib error handling improvements — Replaced panics and unwrap() calls with proper error propagation in dataframe.rs (kdf_to_output, describe), datetime.rs (datetime_to_output, date parsing), string.rs (pad functions), and enum_helpers.rs (use VmError instead of string errors).
  • Parser tests use typed error assertions — Migrated parser tests (decl_modules, enum_scope_errors, parse_types) from string-matching against Rich error messages to typed pattern matching via state.typed_errors, making tests more robust and less brittle.
  • All stdlib FunctionDoc examples are now self-contained and tested — Every example string across all 10 stdlib modules (List, String, Math, Result, Maybe, Json, IO, Http, DataFrame, DateTime) is now a complete, runnable Keel program with proper import statements. Added 47 missing DateTime examples. A new test (test_all_doc_examples_compile) validates all 234 examples compile and run via VM::compile_checked. Fixed issues with multi-line example parsing, constructor precedence in pipes (Ok/Err/Just now wrapped in parens), Math.pi/Math.e auto-evaluation, and DataFrame column metadata examples.
  • Test suite migrated to strict error handling with VM::compile_checked — All 920+ test cases now use VM::compile_checked() instead of VM::compile() for explicit error handling. This internal change enforces zero-tolerance for silent test failures and provides better error messages when tests fail. The VM::compile() method silently swallowed errors and returned empty vectors, while VM::compile_checked() returns a Result carrying a KeelError on failure, enabling proper error propagation. No user-facing API changes.
  • DataFrame lineage test suite rewritten with comprehensive assertions — Expanded from 15 shallow tests to 60 tests with deep structural verification. Tests now validate exact record structures from DataFrame.lineage and DataFrame.columnLineage, verify origin types (File, Aggregated, JoinedFrom), check transformation histories, confirm dependency tracking, and test error messages with available column suggestions.
  • Internal refactoring: centralized stdlib helper functions — Eliminated ~200 lines of duplicate code across stdlib modules by creating two centralized utility modules: enum_helpers (for Maybe/Result enum constructors) and type_extractors (for RegisterValue type extraction). This refactoring consolidates 15 duplicates of enum constructor functions (make_just, make_nothing, make_ok, make_err) and 12+ duplicates of type extractors (get_int, get_string, get_float) that were previously scattered across 10 stdlib modules. All stdlib modules now import from the centralized helpers. No user-facing changes — this is purely internal code organization that improves maintainability and ensures consistent behavior across the stdlib.
  • DateTime.now now requires Unit argument — Changed from DateTime.now (zero-arity) to DateTime.now () to follow Keel's function calling conventions. Prevents parser ambiguity with property access.
  • DateTime.parseFormat now supports date-only formats — The parseFormat function now accepts both datetime formats (with time) and date-only formats (e.g., "%Y-%m-%d"). Date-only formats assume midnight (00:00:00) UTC. Previously only datetime formats were supported.
  • DataFrame.column returns [Maybe a] instead of [a] — The column function type signature changed from String -> DataFrame -> [a] to String -> DataFrame -> [Maybe a]. Values that were previously bare (e.g., Int(30)) are now Just(Int(30)), and nulls that were silently coerced to defaults are now Nothing. Code that extracts columns needs to handle Maybe values.
  • DataFrame.dtypes returns Keel types instead of Polars dtypes — The dtypes function now returns Keel type names (e.g., "Maybe Int", "Maybe String") instead of Polars dtype names (e.g., "i64", "str"). DataFrame display output also shows Keel types.
  • DataFrame.toRecords returns Maybe-wrapped field values — Record fields from toRecords are now Maybe-wrapped. A field that was age: 30 is now age: Just 30, and null fields are Nothing instead of Unit.
  • Compile-time security checks for disabled module functions — When a security-gated module is disabled, calling its restricted functions now produces a compile-time error instead of a runtime error. The compiler checks the environment variable at compilation time and rejects calls to disabled functions before any code runs. Applies to IO (18 gated functions), Http (send), and DataFrame (6-8 I/O functions). Error messages include actionable hints showing which environment variable to set.
  • STATA .dta file support (behind stata feature flag) — Read and write STATA .dta files with full metadata support via ReadStat C FFI. Vendored ReadStat as a git submodule, compiled via build.rs with the cc crate. Two new DataFrame functions: readDta and writeDta.
  • DataFrame metadata system — Attach dataset-level and column-level metadata to DataFrames via a KeelDataFrame wrapper around Polars DataFrames. Metadata is preserved through operations, persisted to Parquet files, and displayed in output. 8 new functions in the DataFrame module:
  • Dataset-level: setMeta, getMeta, allMeta — set/get/list key-value metadata on a DataFrame
  • Column-level: setColumnMeta, getColumnMeta, allColumnMeta — set/get/list metadata on individual columns
  • Inspection: describeMeta — returns a DataFrame summarizing all metadata
  • I/O: writeParquet — writes DataFrames to Parquet files with metadata persisted via the "keel:metadata" key-value entry in the Parquet file metadata
  • MetaValue type for metadata storage — Self-contained value type (String, Int, Float, Bool, List, Record) that owns all its data without GC concerns. Derives Serialize/Deserialize for JSON roundtripping to Parquet file metadata.
  • KeelDataFrame wrapper struct — Wraps polars::DataFrame with DataFrameMetadata (a dataset-level metadata map plus a per-column map of metadata maps). Convenience methods: filter_columns, rename_column, merge, dataset_only, is_empty.
  • DataFrame metadata test suite (24 new tests) — Covers set/get/all for dataset and column metadata, describeMeta, metadata propagation through head/filter/sort/select/rename/aggregation, pipe chains with metadata, Parquet roundtrip with and without metadata, and dataset name in display output.
  • readParquet metadata restoration — readParquet now reads the "keel:metadata" key from Parquet file metadata and restores it into the KeelDataFrame, enabling metadata roundtripping through Parquet files.
  • MimeFormatter trait for rich frontend output — New MimeFormatter trait in vm::values::mime_format enables rich MIME type output for Jupyter kernels, LSPs, and web UIs. Provides default implementations for DataFrame (HTML tables) and Record (JSON) types.
  • Tuple index bounds checking — Compile-time validation that tuple indices are within bounds. The compiler now checks that tuple.N has N < tuple_size and reports CompileError::TupleIndexOutOfBounds with helpful hints showing the valid index range. Runtime bounds checks in TupleGet instruction prevent crashes from dynamically constructed tuples. Error messages show both the attempted index and the tuple size.
  • Function signature in type errors — CompileError::TypeMismatch now includes an optional function_signature field showing the full function type when argument types don't match. Error hints display the signature so users can see what types were expected.
  • Rich DataFrame display — DataFrames now render as formatted, column-aligned tables instead of an opaque placeholder in both REPL output and IO.print. Shows shape, column names, dtypes, and data rows. Large DataFrames (>10 rows) show the first 5 and last 5 rows with a separator.
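The truncation rule can be sketched as follows (a hypothetical helper, not the actual display code):

```rust
// Illustrative sketch of the display rule: DataFrames with more
// than 10 rows show the first 5 and last 5 with a separator row.
fn display_rows(rows: &[String]) -> Vec<String> {
    if rows.len() <= 10 {
        rows.to_vec()
    } else {
        let mut out: Vec<String> = rows[..5].to_vec();
        out.push("…".to_string()); // separator between head and tail
        out.extend_from_slice(&rows[rows.len() - 5..]);
        out
    }
}

fn main() {
    let rows: Vec<String> = (0..12).map(|i| format!("row {i}")).collect();
    let shown = display_rows(&rows);
    // 5 head rows + 1 separator + 5 tail rows.
    assert_eq!(shown.len(), 11);
    assert_eq!(shown[5], "…");
}
```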
  • NativeObject::new_with_output() constructor — Creates native objects with a custom display callback (fn(&dyn Any) -> OutputValue), enabling type-specific rich output. The existing NativeObject::new() is unchanged and falls back to the default opaque display.
  • Compiler::clear_type_errors() method — Clears accumulated type errors, useful for REPL environments where each evaluation should start fresh.
  • Result module (6 functions) — Elm-style functional composition for Result values (Ok/Err). All functions take the Result as the last argument for natural piping. Functions: map, mapError, andThen, withDefault, toMaybe, fromMaybe.
  • Maybe module (3 functions) — Elm-style functional composition for Maybe values (Just/Nothing). All functions take the Maybe as the last argument for natural piping. Functions: map, andThen, withDefault.
  • DataFrame module (36 functions) — Tabular data analysis with a pipe-friendly API, backed by Polars for high-performance columnar operations. All functions take the DataFrame as the last argument for natural piping.
  • I/O: readCsv, readJson, readParquet, writeCsv, writeJson — file errors throw structured VmError variants (not Keel-level Result)
  • Column ops: select, drop, rename, withColumn, column, columns, dtypes
  • Row ops: head, tail, slice, sort, sortDesc, unique, sample
  • Filters: filterEq, filterNeq, filterGt, filterGte, filterLt, filterLte, filterIn — named predicate filters that build Polars expressions natively
  • Aggregation: groupBy, agg (supports sum/mean/min/max/count/first/last/std/var/median), count, describe
  • Multi-DataFrame: join (inner), concat, pivot
  • Inspection: shape returns (rows, columns) tuple
  • Conversion: toRecords converts to Keel record lists, fromRecords creates DataFrames from Keel records
  • DataFrame error variants — 5 structured VmError variants for DataFrame errors, replacing generic UnsupportedOperation(String): DataFrameDisabled, DataFrameSandboxViolation { path, sandbox }, DataFrameFileError { path, reason }, DataFrameColumnNotFound { column }, DataFrameOperationError { operation, reason }. Each variant has ErrorHint implementations with actionable hints (e.g., "Use DataFrame.columns to list available columns") and notes about security configuration. I/O functions (readCsv, readJson, readParquet, writeCsv, writeJson) now throw DataFrameFileError directly instead of returning Keel-level Result values, simplifying usage.
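A minimal sketch of the structured-variant-plus-hint pattern, using two of the variants named above (the hint text here is illustrative):

```rust
// Hedged sketch: structured error variants with actionable hints,
// in place of a single UnsupportedOperation(String) catch-all.
#[derive(Debug)]
enum VmError {
    DataFrameColumnNotFound { column: String },
    DataFrameFileError { path: String, reason: String },
}

// Each variant gets a targeted, actionable hint.
fn hint(err: &VmError) -> String {
    match err {
        VmError::DataFrameColumnNotFound { column } => format!(
            "column '{}' not found. Use DataFrame.columns to list available columns",
            column
        ),
        VmError::DataFrameFileError { path, reason } => {
            format!("could not read '{}': {}", path, reason)
        }
    }
}

fn main() {
    let e = VmError::DataFrameColumnNotFound { column: "score".into() };
    println!("{}", hint(&e));
}
```

The advantage over a stringly-typed error is that tests and tooling can match on the variant and its fields directly.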
  • NativeObject infrastructure — Opaque native values in the VM via RegisterValue::NativeObject, which holds the underlying value behind an Arc. Enables embedding Rust types (like Polars DataFrames) that are passed by reference, compared by pointer equality, and displayed with type-specific formatting. Updated all exhaustive match sites: GC traversal, output conversion, numeric coercions, and IO formatting.
  • Type::Custom("Record") compatibility — The type inference system now recognizes that Type::Custom("Record", []) (from type signatures like [Record] -> DataFrame) is compatible with any Type::Record(fields) (from record literals). This enables functions like DataFrame.fromRecords to accept Keel record literals without type errors.
  • Comprehensive DataFrame test suite (73 tests) — Integration tests covering all DataFrame operations: I/O (CSV/JSON read/write roundtrips), column operations (select, drop, rename, withColumn), row operations (head, tail, slice, sort, unique, sample), all 7 filter types, aggregation (groupBy with sum/mean/min/max/count), joins, concat, describe, toRecords/fromRecords roundtrips, complex multi-step pipelines, and pipe-friendly argument order verification. Error tests use VM::compile_checked() and assert against specific VmError variants (DataFrameFileError, DataFrameColumnNotFound). Plus 6 ErrorHint tests for all DataFrame error variants.
  • Type annotations required for generic parse results — Json.parse (and similar functions returning unresolvable type variables, like Result a String) now requires an explicit type annotation when assigning to a variable. Without an annotation, the compiler produces a GenericTypeRequiresAnnotation error with a helpful hint. This ensures the compiler knows the expected shape for field access and pattern matching.
  • Type::contains_type_vars() method — Detects unresolved type variables in types, mirroring contains_unknown() but checking for Var instead
  • CompileError::GenericTypeRequiresAnnotation — New error variant with ErrorHint providing actionable suggestions and context
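The detection behind these three entries can be sketched as a recursive walk over the type tree looking for unresolved variables (the Type enum shape here is illustrative, not Keel's actual representation):

```rust
// Hypothetical sketch of Type::contains_type_vars(): recursively
// scan a type for unresolved Var nodes.
enum Type {
    Int,
    String,
    Var(u32),
    Result(Box<Type>, Box<Type>), // first = ok, second = err
    List(Box<Type>),
}

impl Type {
    fn contains_type_vars(&self) -> bool {
        match self {
            Type::Var(_) => true,
            Type::Result(ok, err) => {
                ok.contains_type_vars() || err.contains_type_vars()
            }
            Type::List(t) => t.contains_type_vars(),
            _ => false,
        }
    }
}

fn main() {
    // Json.parse returns `Result a String`: the `a` is an unresolved Var,
    // so an annotation would be required.
    let parse_result =
        Type::Result(Box::new(Type::Var(0)), Box::new(Type::String));
    assert!(parse_result.contains_type_vars());

    // A fully annotated type contains no Vars.
    let annotated = Type::Result(
        Box::new(Type::List(Box::new(Type::Int))),
        Box::new(Type::String),
    );
    assert!(!annotated.contains_type_vars());
}
```

When the check reports true for a let-bound value, the compiler can emit GenericTypeRequiresAnnotation instead of letting the variable escape with an unknown shape.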
  • All existing DataFrame functions preserve metadata — All ~35 existing DataFrame functions updated to propagate metadata through the KeelDataFrame wrapper. Functions that previously worked with raw polars::DataFrame now extract and reattach metadata according to category-specific propagation rules (shape-preserving, column-selecting, aggregation, etc.).
  • DataFrame display shows dataset name — When a DataFrame has a "name" metadata key, it is displayed above the shape line in both plain text and HTML output (e.g., PISA 2022\nshape: (1000, 3)).
  • polars-parquet added as direct dependency — Required for low-level Parquet metadata write API (BatchedWriter + FileWriter::end() with KeyValue metadata).
  • Expanded Result and Maybe module documentation — ModuleDoc descriptions for the Result and Maybe modules now include common patterns with code examples, comparison tables (Result vs Maybe), usage guidance, and categorized function listings.
  • Let binding type annotation resolution — Type annotations in let bindings are now resolved through resolve_type() (expanding type aliases) and unified with the inferred expression type via unify_types(), enabling type variable substitution from annotations
  • Http module (14 functions) — HTTP networking with a request-as-data pattern. Pure request-building functions (get, post, put, patch, delete, request) return Record values that can be composed via |>. Modifier functions (withHeader, withHeaders, withBody, withJsonBody, withTimeout, withQueryParam) return modified request records. Only Http.send performs the actual network call. Http.jsonBody parses response bodies into Keel values via the Json module. Security controls: KEEL_HTTP_DISABLED (default: disabled), KEEL_HTTP_ALLOWED_HOSTS (domain allowlist).
  • Json module (8 functions) — JSON encoding/decoding with full Keel type mapping. parse converts JSON strings to Keel values (objects→Records, arrays→Lists, etc.). Type-specific parsers (parseString, parseInt, parseFloat, parseBool) for safe extraction. encode and encodePretty convert Keel values to JSON strings. get extracts fields from JSON strings. All parsing functions return Result types for safe error handling.
  • Http error variants — HttpDisabled, HttpHostNotAllowed, and HttpError in VmError, with ErrorHint implementations
  • Comprehensive Http test suite (31 tests) — Request construction, field defaults, modifier functions, chaining, jsonBody parsing, security controls, and network tests (ignored by default)
  • Comprehensive Json test suite (62 tests) — Parsing all JSON types, type-specific parsers, encoding, pretty printing, field extraction, roundtrip tests, and pipe/let-binding usage
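The request-as-data pattern behind the Http module can be sketched in Rust: builders and modifiers are pure functions over a plain record, and only a separate send step would perform I/O. Names and the data-last argument order are illustrative, mirroring the Keel API described above:

```rust
use std::collections::HashMap;

// Hypothetical sketch: an HTTP request is just data until sent.
#[derive(Clone, Debug, PartialEq)]
struct Request {
    method: String,
    url: String,
    headers: HashMap<String, String>,
    body: Option<String>,
}

// Pure builder: no network activity, just a record.
fn get(url: &str) -> Request {
    Request {
        method: "GET".into(),
        url: url.into(),
        headers: HashMap::new(),
        body: None,
    }
}

// Pure modifier, request-last to mirror Keel's piping convention.
fn with_header(name: &str, value: &str, mut req: Request) -> Request {
    req.headers.insert(name.into(), value.into());
    req
}

fn main() {
    // Mirrors `Http.get url |> Http.withHeader "Accept" "application/json"`.
    let req = with_header("Accept", "application/json", get("https://example.com"));
    assert_eq!(req.method, "GET");
    assert_eq!(req.headers["Accept"], "application/json");
    // Only a separate send(req) would touch the network.
}
```

Because requests are inert values, they can be constructed, inspected, and tested without any network access, which is how the security controls can gate the single side-effecting send step.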

Fixed

24 items
  • List parser indentation errors — Misaligned commas in multiline lists now produce typed IndentationError with helpful hints instead of generic parse failures. The parser uses lookahead to distinguish misaligned commas (error) from legitimate dedents before closing brackets (valid multiline function calls inside lists).
  • DTA writer integer overflow — Value label keys and Int64/UInt32 cell values now use i32::try_from() instead of unchecked as i32 casts. Values outside i32 range fall back to f64 instead of silently truncating.
  • VM arity checking — Functions now return ArityMismatch error when called with more arguments than they accept, instead of silently ignoring excess arguments.
  • VM negative index on heap listsHeapListGet now checks for negative indices before casting to usize, returning a proper ListIndexOutOfBounds error instead of wrapping to a large positive index.
  • KEEL_* environment variable protection — IO.setEnv now rejects attempts to modify KEEL_* variables at runtime, preventing scripts from bypassing sandboxing and security controls.
  • Integer overflow in VM arithmetic — All arithmetic operations (+, -, *, **, %, negation) now use checked operations (checked_add, checked_sub, etc.). On overflow, results auto-promote to Float instead of silently wrapping.
  • Unchecked RegisterValue→usize conversion — Replaced the generic as_index() method with direct pattern matching at each call site. Negative integers, non-finite floats, and non-indexable types now return proper type-specific errors instead of silently producing garbage indices.
  • HTTP host validation bypass — Replaced hand-rolled extract_host() with RFC-compliant url::Url::parse(). Userinfo attacks like http://allowed.com@evil.com/ now correctly extract evil.com as the host.
  • Instruction cloning in VM hot loop — Replaced per-instruction .clone() with Arc::clone of the instruction slice (refcount bump only), eliminating allocation in the VM dispatch loop.
  • Unbounded allocation size — Added MAX_ALLOC_CAPACITY (10M elements) limit. AllocTuple, AllocList, and AllocRecord now check capacity before calling Vec::with_capacity, returning VmError::AllocationLimitExceeded on overflow.
  • IO.extension doc example — Fixed malformed syntax in the documentation example that caused doc example tests to fail.
  • VM time machine disabled by default — The time machine unconditionally cloned the entire VM state (registers, bytecode, call stack, heap, string interner) on every single instruction. This turned any non-trivial program into a memory bomb during fuzzing. Now guarded behind time_machine_enabled: bool (default false).
  • Parser nesting depth limit — Added check_nesting_depth() pre-parsing check (limit: 128 levels) to parse_file_with_state() and parse_file_lenient(). Prevents stack overflow and OOM on pathological inputs like deeply nested parentheses (((((((....
  • VM execution step limit — Added max_steps (default 10M) and step counter to the VM execution loop. Returns VmError::ExecutionLimitExceeded when exceeded, preventing infinite loops from running forever.
  • Empty input handling — compile_checked() now returns Ok(vec![]) immediately for empty or whitespace-only input, instead of crashing.
  • DataFrame.fromRecords now has compile-time type inference — DataFrames created with DataFrame.fromRecords now have their schema inferred from the record literal at compile time, enabling full type safety for subsequent operations like column, select, and agg. Previously, fromRecords created untyped DataFrames that couldn't benefit from column-level type checking.
  • DataFrame operations support gradual typing — DataFrame operations now gracefully handle both typed and untyped DataFrames. When column schemas can't be determined at compile time (e.g., when using variables for aggregation specs), operations return untyped DataFrames that compile successfully but skip column-level validation. This enables flexible data workflows while preserving type safety where possible.
  • DateTime tests fixed with serial execution — All 61 DateTime tests now use #[serial] attribute to prevent environment variable pollution between tests. Previously, tests would fail when run in parallel due to the KEEL_DATETIME_DISABLED security test affecting other tests.
  • Unit type now parsed correctly in type signatures — The () unit type can now be used in type signatures and as a function argument. Previously, empty parentheses were only recognized as tuple syntax, not as the Unit type.
  • FunctionDoc examples for Result, Maybe, List, DataFrame, and IO modules now execute correctly — Fixed 15 stdlib FunctionDoc examples that previously failed to compile or execute. Issues addressed include: lambda syntax in List.foldl, foldr, zipWith (now uses |acc x| not |acc, x|); escape sequence handling in String.lines and unlines; enum constructor precedence in pipes for Result.map, mapError, andThen, withDefault and Maybe.map, andThen (now wrapped in parentheses like (Ok 5) |> ...); type inference issues in DataFrame.fromLists, getMeta, getColumnMeta (now inlined); unused let binding in IO.readLine; and newline escaping in IO.appendFile. Added comprehensive test suite (test_all_function_doc_examples) validating all 264 FunctionDoc examples across 10 stdlib modules. Test results: 243 passing (92%), 21 failing (all expected failures from security-disabled features and missing test files).
  • Enum constructors (Just, Nothing, Ok, Err) now parse correctly in pipe expressions — Fixed parser precedence so enum constructors stop at pipe operators, enabling natural composition like Just 5 |> Maybe.map f and Ok "test" |> Result.map String.toUpper without requiring parentheses around the constructor. The parser now uses simple_arg for constructor arguments, which accepts literals, lists, records, and function calls, but stops at binary operators (including |>). Added parse_expr_maybe_atom and parse_expr_result_atom variants that only accept literal atoms for use in contexts where full expression parsing would be ambiguous. Comprehensive test suite added with 150+ tests covering pipe precedence, nested constructs, and edge cases.
  • Large .dta file reading (>10GB) no longer fails with "Unable to read from file" — ReadStat's C unistd I/O layer is now fully bypassed by Rust I/O handlers (std::fs::File) registered via readstat_set_{open,close,seek,read}_handler. This eliminates C-level type mismatches (ssize_t vs size_t) and int overflow in row handling for >2B rows, and makes O_LARGEFILE/_FILE_OFFSET_BITS concerns moot on 64-bit Linux. The Rust read handler includes short-read retry and EINTR handling. Previous C-side patches (_FILE_OFFSET_BITS=64, O_LARGEFILE, EINTR retry in unistd_read_handler) are retained as defense-in-depth but are no longer the primary I/O path.
  • Result parameter order: Result ok err — The internal Type::Result(first, second) representation now consistently means first=ok, second=err, matching the syntax Result OkType ErrType. Previously, the extraction and creation code treated the first box as err and the second as ok, which was backwards from the Display impl, stdlib signatures, and documentation. This affected type inference for Ok/Err constructors and pattern matching on Result values.
  • Type alias parsing in compound types — Type aliases like Person can now be used directly as arguments to Result, Maybe, and List without parentheses. Previously, Result Person String would fail because the parser greedily consumed String as a type argument of Person. Now Result Person String, Maybe Person, and List Person all parse correctly. Parenthesized forms like Result (Person) String continue to work for backward compatibility.
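Among the VM fixes above, the checked-arithmetic change is the most mechanical; a minimal sketch of the auto-promotion (the Num value type is illustrative):

```rust
// Sketch of overflow auto-promotion: integer arithmetic uses the
// checked_* operations and falls back to Float on overflow instead
// of silently wrapping.
#[derive(Debug, PartialEq)]
enum Num {
    Int(i64),
    Float(f64),
}

fn add(a: i64, b: i64) -> Num {
    match a.checked_add(b) {
        Some(n) => Num::Int(n),
        // Overflow: redo the operation in floating point.
        None => Num::Float(a as f64 + b as f64),
    }
}

fn main() {
    assert_eq!(add(2, 3), Num::Int(5));
    // i64::MAX + 1 overflows, so the result promotes to Float.
    assert!(matches!(add(i64::MAX, 1), Num::Float(_)));
}
```

The same shape applies to checked_sub, checked_mul, and the other operations listed above.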

Removed

4 items
  • Binary and REPL — Removed src/bin/keel.rs and src/repl.rs. The binary and REPL have been moved to the keel-cli and keel-repl crates respectively. keel-core is now a pure library crate. Removed the clap dependency.
  • Time machine — Removed the entire src/time_machine/ module (9 files: mod.rs, timeline.rs, navigation.rs, mapping.rs, integration.rs, compiler_snapshot.rs, lexer_snapshot.rs, parser_snapshot.rs, vm_snapshot.rs). Removed the time-machine feature flag from Cargo.toml.
  • Fuzz regression tests — Removed tests/infrastructure/fuzz_regression.rs (obsolete test infrastructure).
  • CODE_REVIEW.md — All 18 findings have been resolved across previous commits; tracking document removed.

Changed

3 items
  • Scope uses persistent data structure — Switched Scope internals from std::collections::HashMap to im::HashMap. enter_scope now does an O(1) clone instead of deep-copying all variable/function/type maps.
  • Register vector pooling on function calls — Added register_pool to the VM. On function call, a pooled vector is reused via clone_from instead of allocating a new one. On return, the callee's vector is returned to the pool.
  • String interner uses Arc<str> — Replaced the double String allocation (one for the HashMap key, one for the Vec value) with a single Arc<str> shared between both collections, halving per-string allocation cost.
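The shared-allocation interner can be sketched like this (a simplified, hypothetical version of the structure):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Sketch: one Arc<str> allocation is shared by the name→id map and
// the id→name vector, instead of two independently owned Strings.
struct Interner {
    ids: HashMap<Arc<str>, u32>,
    names: Vec<Arc<str>>,
}

impl Interner {
    fn new() -> Self {
        Interner { ids: HashMap::new(), names: Vec::new() }
    }

    fn intern(&mut self, s: &str) -> u32 {
        // Lookup by &str works because Arc<str> borrows as str.
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let shared: Arc<str> = Arc::from(s); // single allocation
        let id = self.names.len() as u32;
        self.ids.insert(Arc::clone(&shared), id); // refcount bump only
        self.names.push(shared);
        id
    }

    fn resolve(&self, id: u32) -> &str {
        &self.names[id as usize]
    }
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern("hello");
    let b = interner.intern("hello");
    assert_eq!(a, b); // same string, same id, no second allocation
    assert_eq!(interner.resolve(a), "hello");
}
```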
