# Changelog

All notable changes to Keel are documented here. This project follows Keep a Changelog and Semantic Versioning.

## Unreleased

### Added
- **Distribution module** — New stdlib module for probability distributions with 12 distribution types (Normal, Uniform, Exponential, Poisson, Bernoulli, Binomial, Gamma, Beta, ChiSquared, StudentT, Weibull, LogNormal). Functions include `sample`, `sampleSeeded`, `sampleN`, `pdf`, `cdf`, `quantile`, `mean`, `variance`, `stdDev`, `skewness`, `entropy`. Constructor functions return `Result Distribution String` for parameter validation. Backed by the `statrs` and `rand` crates.
- **Unified `KeelError` type** — New `src/errors.rs` module with a `KeelError` enum that wraps all phase-specific errors (lexer, parser, type checker, compiler, VM) into a single error type. Includes shared `SymbolError` variants for undeclared variables, modules, functions, enums, and enum variants, used by both the parser and the compiler.
- **Structured parser error modules** — Extracted parser errors into `src/parser/errors/` with dedicated files: `parser_error.rs` (syntax errors), `scope_error.rs` (scope/symbol errors), `type_error.rs` (type checking errors), `typed.rs` (unified `TypedParserError` enum).
- **`StdlibFunction` abstraction** — New unified struct in `stdlib/mod.rs` that defines each stdlib function once, with name, arity, implementation, type signature, and documentation. Helper functions `register_from_definitions()`, `docs_from_definitions()`, and `names_from_definitions()` derive both module registration and documentation from the same source, eliminating drift.
- **`LetBindingLiteralPattern` error** — New typed parser error for literal patterns in let bindings. `let 5 = 5`, `let "hello" = "hello"`, and similar literal patterns now produce a helpful error directing users to use `case` expressions instead.
- **Pre-parse syntax validation** — New `check_module_syntax()`, `check_type_alias_syntax()`, and `check_enum_syntax()` functions detect common syntax errors before full parsing, with typed error variants: `ModuleMissingExposing`, `ModuleMissingExposingParens`, `ModuleUnclosedExposing`, `TypeAliasMissingEquals`, `TypeAliasMissingName`, `EnumMissingEquals`, `EnumMissingName`, `EnumMissingVariants`, `EnumVariantLowercase`.
- **Test coverage for untested error variants** — Added tests for the `RecordAccessMissingFieldName`, `RecordAccessDoubleDot`, `TrailingComma`, and `BlockNotNested` error variants.
- **Decimal type** — New primitive type for exact decimal arithmetic, avoiding floating-point precision issues. Supports literal syntax with a `d` suffix (`42d`, `3.14d`, `-0.001d`). Full arithmetic (`+`, `-`, `*`, `/`, `%`, `^`), comparison (`==`, `!=`, `<`, `<=`, `>`, `>=`), and negation. Backed by the `rust_decimal` crate with 28-digit precision.
- **Decimal module** — 40+ stdlib functions for decimal operations:
  - Creation: `fromInt`, `fromFloat`, `fromString`, `parse`
  - Conversion: `toInt`, `toFloat`, `toString`, `toStringWithPrecision`
  - Arithmetic: `add`, `sub`, `mul`, `div`, `rem`, `pow`, `abs`, `negate`
  - Rounding: `round`, `roundTo`, `floor`, `ceil`, `trunc`, `truncTo`
  - Comparison: `compare`, `min`, `max`, `isPositive`, `isNegative`, `isZero`
  - Constants: `zero`, `one`, `pi`, `e`, `maxValue`, `minValue`
  - Math: `sqrt`, `ln`, `log10`, `exp`, `sin`, `cos`, `tan`
- **Date module** — 25 stdlib functions for date manipulation:
  - Creation: `fromYmd`, `today`, `epoch`
  - Parsing: `parseIso`, `parse`
  - Formatting: `toIsoString`, `format`
  - Components: `year`, `month`, `day`, `weekday`, `dayOfYear`, `weekNumber`, `daysInMonth`, `isLeapYear`
  - Arithmetic: `addDays`, `addWeeks`, `addMonths`, `addYears`
  - Comparison: `isBefore`, `isAfter`, `isEqual`, `compare`, `daysBetween`
- **Time module** — 23 stdlib functions for time manipulation:
  - Creation: `fromHms`, `fromHmsNano`, `midnight`, `noon`
  - Parsing: `parseIso`, `parse`
  - Formatting: `toIsoString`, `format`
  - Components: `hour`, `minute`, `second`, `nanosecond`
  - Arithmetic: `addHours`, `addMinutes`, `addSeconds`, `addNanos`
  - Comparison: `isBefore`, `isAfter`, `isEqual`, `compare`
- **Duration module** — 24 stdlib functions for duration manipulation:
  - Creation: `fromNanos`, `fromMicros`, `fromMillis`, `fromSecs`, `fromMins`, `fromHours`, `fromDays`, `fromWeeks`, `zero`
  - Components: `nanos`, `micros`, `millis`, `secs`, `mins`, `hours`, `days`, `weeks`
  - Arithmetic: `add`, `sub`, `mul`, `div`, `abs`, `negate`
  - Comparison: `isPositive`, `isNegative`, `isZero`, `compare`
- **DateTime interop functions** — 4 new functions for converting between Date, Time, and DateTime:
  - `DateTime.getDate : DateTime -> Date` — extract the date component
  - `DateTime.getTime : DateTime -> Time` — extract the time component
  - `DateTime.fromDateType : Date -> DateTime` — convert a Date to a DateTime at midnight UTC
  - `DateTime.combine : Date -> Time -> DateTime` — combine a Date and a Time into a DateTime
- **DataFrame temporal type conversion** — Runtime extraction of temporal types from Polars DataFrames. `AnyValue::Date`, `AnyValue::Time`, `AnyValue::Duration`, and `AnyValue::Datetime` now convert to native Keel Date, Time, Duration, and DateTime objects wrapped in `Maybe`.
- **FunctionDoc examples for temporal modules** — All functions in the Date (25), Time (23), Duration (24), and DateTime (57) modules now include working code examples in their FunctionDoc.
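As an illustrative sketch of the interop functions, the following hypothetical program builds a DateTime from separate Date and Time values. The variable names, the `IO.print` rendering, and the assumption that `Date.fromYmd`/`Time.fromHms` return plain values (rather than validated wrappers) are mine, not from the entries above; Elm-style `import` lines and `--` comments are also assumed.

```keel
import Date
import Time
import DateTime
import IO

-- Build the date and time parts separately (hypothetical values)
let date = Date.fromYmd 2024 3 15
let time = Time.fromHms 9 30 0

-- Combine them into a single UTC DateTime and render it
let meeting = DateTime.combine date time
IO.print (DateTime.formatRfc3339 meeting)
```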
- **Parser error variants for inline files** — New typed errors for `inline` expression parsing: `InlineMissingPath` (missing file path), `InlineInvalidPath` (non-string path), `InlinePassingMissingParens` (missing parentheses), `InlineInvalidVar` (invalid token in the var list), `InlineSpreadWithNamed` (mixing `..` with named vars), `InlineMultipleSpread` (multiple `..` operators). Each error includes helpful hints and notes.
- **Parser error variants for lambda expressions** — `LambdaUnclosedPipe` (missing closing `|`) and `LambdaMissingBody` (no body after `|params|`), with recovery and hints.
- **Parser error variants for parameterized modules** — `ModuleParamMissingType` and `ModuleExposeMissingType` for parameters and exposed variables missing type annotations in `module (x) exposing (y)` syntax.
- **Comprehensive parser test coverage** — All parser error tests now check against specific `TypedParserError` variants via `state.typed_errors` instead of generic `ast.is_err()` checks. Tests verify exact error types like `TypedParserError::Parser(ParserError::LambdaUnclosedPipe)`.
- **DataFrame.Expr module** — Composable, type-safe column expressions that compile directly to Polars operations with SIMD optimization and parallel execution. Unlike closures (which may fall back to slower row-by-row evaluation), expressions are always fast. The module provides:
  - Column references and literals: `col "name"`, `lit 42`, `lit 3.14`, `lit "hello"`
  - Arithmetic: `add`, `sub`, `mul`, `div`, `mod`, `pow`
  - Comparison: `eq`, `neq`, `gt`, `gte`, `lt`, `lte`
  - Boolean: `and`, `or`, `not`
  - Aggregations: `sum`, `mean`, `min`, `max`, `count`, `first`, `last`, `nUnique`, `std`, `var`, `median`, `quantile`
  - String operations: `strLength`, `strUpper`, `strLower`, `strContains`, `strStartsWith`, `strEndsWith`, `strReplace`, `strTrim`, `strSlice`
  - Math: `abs`, `sqrt`, `floor`, `ceil`, `round`, `log`, `log10`, `exp`
  - Null handling: `fillNull`, `isNull`, `isNotNull`, `dropNulls`
  - Conditional: `cond` for if-then-else expressions
  - Window functions: `over`, `rowNumber`, `rank`, `denseRank`, `lag`, `lead`
  - Naming: `named` to alias output columns
- **Compile-time constant evaluation framework** — New
`const_eval.rs` module provides comprehensive constant folding during compilation. Evaluates arithmetic (`+`, `-`, `*`, `/`, `//`, `%`, `^`), comparison (`==`, `!=`, `<`, `<=`, `>`, `>=`), boolean (`&&`, `||`, `not`), string concatenation (`++`), list cons (`::`), and if-then-else expressions with constant conditions at compile time. Includes a stdlib function registry for const-evaluating `Math.abs`, `String.length`, `List.length`, `List.isEmpty`, `List.reverse`, and `List.sum` with constant arguments. Lambda-safe: parameters correctly shadow outer variables to prevent incorrect folding inside function bodies. Refactored `try_eval_const_string()` to use the new unified framework. Comprehensive test suite with 57 tests covering arithmetic, strings, booleans, edge cases, and lambda scoping.
- **`has_explicit_type` field in the `Binding` AST node** — The `Binding` struct now includes a `has_explicit_type: bool` field that tracks whether the user explicitly wrote a type annotation in the source code. This lets formatters and other tools distinguish user-written annotations (`let x : Int = 42`) from parser-inferred types, enabling preservation of explicit annotations while omitting inferred ones.
- **DataFrame.describeLabel** — Returns a formatted string describing the value labels for a single column. Takes a column name and a DataFrame, returning a multi-line string showing each value-label mapping. Returns an empty string if the column has no value labels.
- **DataFrame.describeLabels** — Returns a formatted string describing all value labels in a DataFrame, sorted by column name. Each column's value labels are shown with their integer codes and string labels.
- **DataFrame.describeVariables** — STATA-style variable overview returning a DataFrame with one row per column, showing name, type, variable label, value labels (abbreviated), and metadata. Useful for exploring dataset structure.
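A hedged sketch of how these label helpers might be used together, assuming `df` is an existing DataFrame with an integer-coded `sex` column and that `withValueLabels` (introduced further down in this section) takes a list of `(code, label)` tuples:

```keel
import DataFrame
import IO

-- Attach value labels to the coded column, then print a
-- human-readable summary of every labeled column.
let labeled =
    df |> DataFrame.withValueLabels "sex" [(1, "Male"), (2, "Female")]

IO.print (DataFrame.describeLabels labeled)
```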
- **Column-selective DataFrame I/O** — `readCsvColumns`, `readJsonColumns`, `readParquetColumns`, and `readDtaColumns` accept a `[String]` column list and a file path, reading only the specified columns. Compile-time schema validation catches nonexistent columns. 8 tests cover shape, column selection, compile-time validation, pipe chains, and JSON/Parquet roundtrips.
- **Parser case-sensitivity diagnostics** — New parser errors with hints for lowercase module names (`ModuleNameLowercase`), lowercase type names (`TypeNameLowercase`), lowercase import aliases (`ImportAliasLowercase`), and uppercase pattern aliases (`PatternAliasUppercase`). Each error suggests the corrected casing.
- **Curried readXxxColumns type inference** — The compiler and type-inference engine now handle curried 2-arg `readXxxColumns` calls, caching column-filtered DataFrame schemas for downstream type checking.
- **Table module** — New `Table` stdlib module for cross-tabulation and summary tables, inspired by Stata's `table` command. Provides quick forms (`Table.freq "sex"`, `Table.cross "sex" "bp"`) and a builder pattern (`Table.create |> Table.rows ["sex"] |> Table.cols ["bp"] |> Table.count |> Table.show`). Supports 8 statistics (`count`, `percent`, `meanOf`, `sdOf`, `medianOf`, `minOf`, `maxOf`, `sumOf`), faceting (`facetBy`), totals suppression (`noTotals`), layout rearrangement without recomputation (`relayout`), and DataFrame export (`toDataFrame`). `freq`/`cross` accept both `String` and `[String]` for flexible dimensioning. Dedicated `Table` display type with hierarchical headers, box-drawing separators, value label integration, and comma-formatted numbers. 70 tests cover core functionality, edge cases, error messages, display rendering, and VM integration.
- **FunctionDoc categories for all stdlib modules** — Every `FunctionDoc` across all stdlib modules now has a `category` field. Categories group related functions within a module (e.g., List: Access/Build/Reduce/Search/Slice/Sort/Transform; Json: Parse/Encode/Access; Http: Request/Modify/Execute). Categories render in LSP hover documentation via `to_markdown()`.
- **CODE_REVIEW.md** — Comprehensive code review document tracking 18 findings across security, performance, and architecture dimensions, with fix status.
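For reference, the Table module's two documented usage styles, laid out as a block. The quick-form and builder chains are taken verbatim from the Table entry above; how the source DataFrame is supplied to these chains is not shown there, so it is omitted here as well.

```keel
import Table

-- Quick forms: one-way frequency and two-way cross-tabulation
Table.freq "sex"
Table.cross "sex" "bp"

-- Builder pattern: rows, columns, statistic, then render
Table.create
    |> Table.rows ["sex"]
    |> Table.cols ["bp"]
    |> Table.count
    |> Table.show
```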
- **FFI safety documentation** — Added `// SAFETY:` comments to all unsafe blocks and `unsafe extern "C"` callbacks in `dataframe_dta.rs`, documenting pointer provenance, exclusive-access invariants, buffer validity, and ReadStat's ownership model.
- **DTA fuzz target** — New `fuzz_dta` fuzz target that feeds arbitrary bytes to `read_dta_file` (with and without row limits), plus a seed corpus of valid .dta files covering mixed types, nullable columns, value labels, and metadata.
- **`VmError::AllocationLimitExceeded`** — New error variant, with a hint, for when data structure allocation exceeds the 10M-element safety limit.
- **ValueLabelSet module** — Bidirectional `Int ↔ String` mapping for statistical value labels. Functions: `empty`, `fromList`, `insert`, `remove`, `getLabel`, `getValue`, `values`, `labels`, `toList`, `size`, `isEmpty`, `merge`, `remap`. Integrates with the DataFrame value label system.
- **DataFrame variable labels** — STATA-style descriptive labels for columns. Functions: `withVarLabel`, `getVarLabel`, `getVarLabels`, `removeVarLabel`. Labels are preserved through filter/select/join operations and round-trip through .dta files.
- **DataFrame value labels** — Map integer codes to human-readable labels (e.g., 1 → "Male", 2 → "Female"). Functions: `withValueLabels`, `withValueLabelsStrict` (validates that all values have labels), `getValueLabels`, `getAllValueLabels`, `removeValueLabels`. Labels propagate through operations and survive .dta I/O.
- **DataFrame display modes** — Control how labeled columns display: `Raw` (show codes), `Labeled` (show labels), `Both` (show "label (code)"). Functions: `withDisplayMode`, `getDisplayMode`.
- **DataFrame.recode** — Remap integer values in a column with automatic value label transfer. E.g., recoding with `[(1, 10), (2, 20)]` changes 1 → 10 and 2 → 20 and updates the associated value labels.
- **STATA .dta label support** — Full round-trip support for variable labels and value labels in `readDta`/`writeDta`. Labels are stored in `DataFrameMetadata.var_labels` and `DataFrameMetadata.value_labels` (using `ValueLabelSet`).
- **Fuzz regression test suite** — 64 tests in `tests/infrastructure/fuzz_regression.rs` covering crash artifacts, OOM artifacts, timeout artifacts, determinism checks, and normal-program regression. 5 tests are `#[ignore]`d, documenting known stack overflow bugs with mixed nesting patterns.
- **`VmError::ExecutionLimitExceeded`** — New error variant for when the VM exceeds its step limit.
- **Elm-style multi-line literals as function arguments** — Lists, records, and tuples can now be formatted with leading commas on separate lines when used as function arguments, matching Elm's indentation-sensitive syntax. This enables clean, readable data definitions without needing intermediate
`let` bindings.
- **DataFrame window functions** — SQL-style window functions for advanced analytics: ranking, running totals, moving averages, and lag/lead operations. Window functions preserve row count (unlike aggregations) and partition data for grouped calculations.
  - `partitionBy [cols]` — define partition boundaries (can be nested for different groupings)
  - `orderBy [cols]` — define row ordering within partitions
  - `collect` — materialize a WindowedDataFrame back into a DataFrame
  - `withRowNumber "col"` — sequential numbering (1, 2, 3, ...) per partition
  - `withRank "col"` — ranking with gaps for ties (1, 2, 2, 4, ...)
  - `withDenseRank "col"` — ranking without gaps (1, 2, 2, 3, ...)
  - `withLag "result" "source" offset` — value from N rows before (returns `Maybe T`)
  - `withLead "result" "source" offset` — value from N rows ahead (returns `Maybe T`)
  - `withRollingSum "result" "source" N` — sum over N rows (returns `Maybe T`)
  - `withRollingMean "result" "source" N` — average over N rows (returns `Maybe Float`)
  - `withRollingMin "result" "source" N` — minimum over N rows (returns `Maybe T`)
  - `withRollingMax "result" "source" N` — maximum over N rows (returns `Maybe T`)
  - `withCumSum "result" "source"` — cumulative sum from partition start
  - `withCumMean "result" "source"` — cumulative average (returns `Float`)
  - `withCumMin "result" "source"` — cumulative minimum
  - `withCumMax "result" "source"` — cumulative maximum
  - Compile-time schema tracking through window operations
  - Column name validation for partition/order columns
  - Type propagation: `withRowNumber` adds `Int`, `withLag` adds `Maybe T`, etc.
  - Gradual typing support for untyped DataFrames
  - Backed by Polars window functions (the `rank`, `cum_agg`, and `rolling_window` features)
  - `WindowedDataFrame` type tracks partition/order metadata
  - Function overloading: `partitionBy` works on both DataFrame and WindowedDataFrame
  - Comprehensive test suite: 45 tests covering normal operations, edge cases, and error conditions
- **DataFrame schema validation with compile-time type checking** — DataFrames can now have their schemas validated at compile time using type annotations. This enables "data contracts": declare the expected schema and get compile errors if the actual data doesn't match.
  - `DataFrame { col: Type, ... }` — DataFrame type constructor with schema
  - `{ col: Type, .. }` — open record type (allows extra fields)
  - `{ col: Type }` — closed record type (exact match required)
  - Column existence — missing columns produce compile errors
  - Type compatibility — column types must match their declarations
  - Extra columns — closed schemas reject extras, open schemas allow them
  - Variable paths skip validation (non-literals can't be validated)
  - Untyped DataFrames continue to work without annotations
  - Schemas propagate through operations (`select`, `drop`, etc.)
- **Import aliases** — Modules can now be imported with alternative names using the
`as` keyword. Aliases provide a convenient shorthand for module references throughout your code.
- **DateTime standard library module (48 functions)** — UTC-based date and time operations backed by the `chrono` crate. All DateTime values are opaque `NativeObject` instances wrapping a `chrono::DateTime`. Functions follow the pipe-last convention for natural composition.
  - Creation (7 functions): `now`, `fromParts`, `fromDate`, `fromTimestamp`, `fromTimestampMillis`, `toTimestamp`, `toTimestampMillis`
  - Parsing (4 functions): `parse`, `parseIso8601`, `parseRfc3339`, `parseFormat` — return `Maybe DateTime` for safe handling
  - Formatting (3 functions): `format`, `formatRfc3339`, `formatCustom` — support ISO 8601, RFC 3339, and custom strftime patterns
  - Components (9 functions): `year`, `month`, `day`, `hour`, `minute`, `second`, `weekday`, `dayOfYear`, `weekNumber` — extract datetime parts as integers
  - Manipulation (8 functions): `addMillis`, `addSeconds`, `addMinutes`, `addHours`, `addDays`, `addWeeks`, `addMonths`, `addYears` — immutable time arithmetic
  - Comparison (4 functions): `isBefore`, `isAfter`, `isEqual`, `compare` — total ordering for DateTime values
  - Duration/Difference (5 functions): `diffMillis`, `diffSeconds`, `diffMinutes`, `diffHours`, `diffDays` — calculate time spans as `Int` values
  - Calendar boundaries (8 functions): `startOfDay`, `endOfDay`, `startOfWeek`, `endOfWeek`, `startOfMonth`, `endOfMonth`, `startOfYear`, `endOfYear` — calendar-aware rounding
- **`DataFrame.fromLists` function** — Create DataFrames from a list of `(column name, values)` tuples. This provides a column-oriented alternative to `fromRecords` that is more ergonomic for programmatic data construction and composes naturally with `List.zip`.
- **Compile-time DataFrame column validation** — When
`DataFrame.readCsv`, `readJson`, `readParquet`, or `readDta` is called with a literal string path, the compiler reads the file's schema (column names and types) at compile time and validates column references in subsequent operations. Invalid column names produce a `DataFrameColumnNotFound` compile error with a list of available columns. Type information propagates through pipe chains (`select`, `drop`, `rename`, `withColumn`, `column`, and sort/filter operations). Gradual typing: untyped DataFrames (e.g., from variables or non-literal paths) skip validation.
- **Maybe-wrapped DataFrame columns** — DataFrame column values are now properly wrapped in `Maybe` types (`Just x` for values, `Nothing` for nulls) when crossing from Polars to Keel. This makes null/missing data explicit and pattern-matchable instead of silently coercing nulls to defaults (0, "", false). Applies to `column`, `toRecords`, and all column extraction paths.
- **`KeelSchema` runtime type system for DataFrames** — Every DataFrame now carries a `KeelSchema` that maps each column to its Keel type (always `Maybe T`, since Polars columns are nullable). The schema is auto-derived from the Polars DataFrame. Column types are displayed as Keel types (e.g., `Maybe Int`, `Maybe String`, `Maybe Float`) instead of Polars dtypes in DataFrame output and in `dtypes` results. Type mapping: Polars Int8/16/32/64 and UInt8/16/32/64 → `Maybe Int`, Float32/64 → `Maybe Float`, Boolean → `Maybe Bool`, String → `Maybe String`, other → `Maybe String` (fallback).
- **Maybe-aware `withColumn` and `fromRecords`** — `withColumn` and `fromRecords` now accept both Maybe-wrapped lists (`[Just 1, Nothing, Just 3]`) and bare value lists (`[1, 2, 3]`) for backward compatibility. Maybe-wrapped values are unwrapped when creating Polars columns: `Just x` → value, `Nothing` → null.
- **`unwrap_maybe` helper** — Internal helper function for detecting and unwrapping Maybe enum values in the DataFrame module, following the same pattern as `unwrap_enum_value` in the Maybe module.
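A sketch of handling Maybe-wrapped column values, assuming a DataFrame `df` with an `age` column and the `|params|` lambda form described by the parser errors earlier in this section:

```keel
import DataFrame
import List
import Maybe

-- column now yields [Maybe Int]; substitute 0 for nulls, then sum
let ages = DataFrame.column "age" df
let total =
    ages
        |> List.map (|a| Maybe.withDefault 0 a)
        |> List.sum
```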
### Changed

- **All stdlib modules refactored to the `StdlibFunction` pattern** — Every stdlib module (List, String, Math, Decimal, Date, Time, Duration, DateTime, IO, Http, Json, DataFrame, Table, ValueLabelSet, Maybe, Result) now defines its functions as a `Vec` of `StdlibFunction` definitions instead of manually building export HashMaps. Reduces boilerplate and ensures documentation is always in sync with the implementation.
- **Typed errors passed directly to Chumsky** — All parser error emissions now pass `ParserError` variants directly to `Rich::custom()` instead of calling `.to_string()`, improving type safety and consistency across 11 locations in tuples, functions, modules, patterns, type aliases, enums, imports, and let bindings.
- **Test suite reorganization** — Restructured test directories for clarity: `tests/errors/` → `tests/compiler/` (compiler error tests), `tests/features/` → `tests/integration/` (language feature integration tests), `tests/infrastructure/` → `tests/runtime/` (VM and runtime tests). Added new test files for lambda type inference, nested pattern types, type aliases, exhaustiveness, guards, pattern type mismatch, and more.
- **Parser test improvements** — Extended test coverage across 20+ parser test files, with strong assertions against `TypedParserError` variants, consistent helper functions, and comprehensive error-scenario coverage.
- **Unified parser test suite** — Standardized all 48 parser test files (`tests/parser/*.rs`) with consistent naming (dropped `test_` prefixes), structured `///` doc comments explaining why each test should pass or fail, removed debug `println!` statements, and organized sections (helpers → success tests → comments → errors → edge cases). Added pass/fail reasoning with bullet points to every test across all files, including `edge_cases.rs`, `error_hints.rs`, `expr_list_access.rs`, `trailing_tokens.rs`, `expr_types.rs`, `expr_inline_file.rs`, `lambda_case_scope.rs`, and `types.rs`.
- **Parser emits typed errors to state** — All parsers now push errors to `state.typed_errors` in addition to emitting chumsky errors, enabling tests and tooling to match on specific error variants.
- **Parameterized module exposing requires types** — `parse_exposing_args_typed()` enforces type annotations on all exposed variables in parameterized modules.
- **DataFrame module refactored into submodules** — Split the 7,780-line `dataframe.rs` into a directory-based module structure: `security.rs` (sandbox config), `metadata.rs` (value labels, display modes), `lineage.rs` (transformation tracking), `types.rs` (KeelDataFrame, KeelSchema). Reduces the main module to 6,680 lines with a cleaner separation of concerns.
- **DataFrame label functions category** — Renamed the FunctionDoc category from "Labels" to "Label" (singular) for consistency with other categories (Window, Transform, Metadata, etc.).
- **VM fields restricted to `pub(crate)`** — All 16 VM struct fields (`registers`, `heap`, `call_stack`, etc.) changed from `pub` to `pub(crate)`. External consumers use `VM::compile()` and accessor methods; no API change. GC tests moved from integration tests to unit tests inside `vm_core.rs`.
- **`Type` module moved to the crate root** — `src/compiler/types.rs` → `src/types.rs` as a top-level shared module. Breaks the `parser → compiler` dependency. Re-exported from `compiler::types` for backward compatibility. Added a `types_compatible()` function for overload resolution.
- **Compiler struct refactored** — Extracted `FileContext` (run stack, current dir, inline file types) and `TypeChecker` (inference + errors) into dedicated structs. Reduces the field count on `KeelCompiler` and groups related state.
- **Parser scope uses a flat arena** — `ScopeState` now stores scopes in a flat `Vec` indexed by `ScopeId` with parent links, replacing nested ownership. Deduplicated scope-walking logic into a `collect_names_matching()` helper.
- **Dual-scope architecture documented** — Both `compiler::scope` and `parser::scope` now have module-level docs explaining why two separate scope systems exist and how their storage models differ.
- **Typed parser error tracking** — The parser now pushes `TypedParserError` entries to `state.typed_errors` alongside the existing `Rich` error emissions. This enables structured, pattern-matchable error inspection without string comparison. All parser error sites (indentation, blocks, enums, modules, type aliases, functions, lambdas, lists, records, tuples, patterns, let bindings) now emit typed errors.
- **Stdlib error handling improvements** — Replaced panics and `unwrap()` calls with proper error propagation in `dataframe.rs` (`kdf_to_output`, `describe`), `datetime.rs` (`datetime_to_output`, date parsing), `string.rs` (pad functions), and `enum_helpers.rs` (use `VmError` instead of string errors).
- **Parser tests use typed error assertions** — Migrated parser tests (`decl_modules`, `enum_scope_errors`, `parse_types`) from string-matching against `Rich` error messages to typed pattern matching via `state.typed_errors`, making tests more robust and less brittle.
- **All stdlib FunctionDoc examples are now self-contained and tested** — Every example string across all 10 stdlib modules (List, String, Math, Result, Maybe, Json, IO, Http, DataFrame, DateTime) is now a complete, runnable Keel program with proper `import` statements. Added 47 missing DateTime examples. A new test (`test_all_doc_examples_compile`) validates that all 234 examples compile and run via `VM::compile_checked`. Fixed issues with multi-line example parsing, constructor precedence in pipes (`Ok`/`Err`/`Just` now wrapped in parens), `Math.pi`/`Math.e` auto-evaluation, and DataFrame column metadata examples.
- **Test suite migrated to strict error handling with `VM::compile_checked`** — All 920+ test cases now use `VM::compile_checked()` instead of `VM::compile()` for explicit error handling. This internal change enforces zero tolerance for silent test failures and provides better error messages when tests fail. The `VM::compile()` method silently swallowed errors and returned empty vectors, while `VM::compile_checked()` returns a `Result` for proper error propagation. No user-facing API changes.
- **DataFrame lineage test suite rewritten with comprehensive assertions** — Expanded from 15 shallow tests to 60 tests with deep structural verification. Tests now validate the exact record structures from `DataFrame.lineage` and `DataFrame.columnLineage`, verify origin types (File, Aggregated, JoinedFrom), check transformation histories, confirm dependency tracking, and test error messages with available-column suggestions.
- **Internal refactoring: centralized stdlib helper functions** — Eliminated ~200 lines of duplicate code across stdlib modules by creating two centralized utility modules: `enum_helpers` (for Maybe/Result enum constructors) and `type_extractors` (for RegisterValue type extraction). This consolidates 15 duplicates of the enum constructor functions (`make_just`, `make_nothing`, `make_ok`, `make_err`) and 12+ duplicates of the type extractors (`get_int`, `get_string`, `get_float`) that were previously scattered across 10 stdlib modules. All stdlib modules now import from the centralized helpers. No user-facing changes: this is purely internal code organization that improves maintainability and ensures consistent behavior across the stdlib.
- **`DateTime.now` now requires a Unit argument** — Changed from `DateTime.now` (zero-arity) to `DateTime.now ()` to follow Keel's function calling conventions. Prevents parser ambiguity with property access.
- **`DateTime.parseFormat` now supports date-only formats** — `parseFormat` now accepts both datetime formats (with time) and date-only formats (e.g., `"%Y-%m-%d"`). Date-only formats assume midnight (00:00:00) UTC. Previously only datetime formats were supported.
- **`DataFrame.column` returns `[Maybe a]` instead of `[a]`** — The `column` function's type signature changed from `String -> DataFrame -> [a]` to `String -> DataFrame -> [Maybe a]`. Values that were previously bare (e.g., `Int(30)`) are now `Just(Int(30))`, and nulls that were silently coerced to defaults are now `Nothing`. Code that extracts columns needs to handle Maybe values.
- **`DataFrame.dtypes` returns Keel types instead of Polars dtypes** — The `dtypes` function now returns Keel type names (e.g., `"Maybe Int"`, `"Maybe String"`) instead of Polars dtype names (e.g., `"i64"`, `"str"`). DataFrame display output also shows Keel types.
- **`DataFrame.toRecords` returns Maybe-wrapped field values** — Record fields from `toRecords` are now Maybe-wrapped. A field that was `age: 30` is now `age: Just 30`, and null fields are `Nothing` instead of `Unit`.
- **Compile-time security checks for disabled module functions** — When a security-gated module is disabled, calling its restricted functions now produces a compile-time error instead of a runtime error.
The compiler checks the environment variable at compilation time and rejects calls to disabled functions before any code runs. Applies to IO (18 gated functions), Http (`send`), and DataFrame (6-8 I/O functions). Error messages include actionable hints showing which environment variable to set.
- **STATA .dta file support (behind the `stata` feature flag)** — Read and write STATA `.dta` files with full metadata support via the ReadStat C FFI. Vendored ReadStat as a git submodule, compiled via `build.rs` with the `cc` crate. Two new DataFrame functions: `readDta` and `writeDta`.
- **DataFrame metadata system** — Attach dataset-level and column-level metadata to DataFrames via a `KeelDataFrame` wrapper around Polars DataFrames. Metadata is preserved through operations, persisted to Parquet files, and displayed in output. 8 new functions in the DataFrame module:
  - Dataset-level: `setMeta`, `getMeta`, `allMeta` — set/get/list key-value metadata on a DataFrame
  - Column-level: `setColumnMeta`, `getColumnMeta`, `allColumnMeta` — set/get/list metadata on individual columns
  - Inspection: `describeMeta` — returns a DataFrame summarizing all metadata
  - I/O: `writeParquet` — writes DataFrames to Parquet files, with metadata persisted via a `"keel:metadata"` key-value entry in the Parquet file metadata
- **`MetaValue` type for metadata storage** — Self-contained value type (String, Int, Float, Bool, List, Record) that owns all its data without GC concerns. Derives `Serialize`/`Deserialize` for JSON roundtripping to Parquet file metadata.
- **`KeelDataFrame` wrapper struct** — Wraps `polars::DataFrame` with `DataFrameMetadata` (a dataset-level `HashMap` plus per-column `HashMap`s). Convenience methods: `filter_columns`, `rename_column`, `merge`, `dataset_only`, `is_empty`.
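A hypothetical use of the metadata functions, with invented keys and paths; the argument order (key, then value, then the DataFrame last) is assumed from the module's pipe-last convention:

```keel
import DataFrame

-- Attach dataset-level and column-level metadata, then persist
-- both data and metadata to a Parquet file.
let annotated =
    df
        |> DataFrame.setMeta "source" "census-2024"
        |> DataFrame.setColumnMeta "income" "unit" "USD"

annotated |> DataFrame.writeParquet "out/annotated.parquet"
```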
describeMeta, metadata propagation through head/filter/sort/select/rename/aggregation, pipe chains with metadata, Parquet roundtrip with and without metadata, and dataset name in display output. readParquetmetadata restoration —readParquetnow reads the"keel:metadata"key from Parquet file metadata and restores it into theKeelDataFrame, enabling metadata roundtripping through Parquet files.- MimeFormatter trait for rich frontend output — New
MimeFormattertrait invm::values::mime_formatenables rich MIME type output for Jupyter kernels, LSPs, and web UIs. Provides default implementations for DataFrame (HTML tables) and Record (JSON) types: - Tuple index bounds checking — Compile-time validation that tuple indices are within bounds. The compiler now checks that
tuple.NhasN < tuple_sizeand reportsCompileError::TupleIndexOutOfBoundswith helpful hints showing the valid index range. Runtime bounds checks inTupleGetinstruction prevent crashes from dynamically constructed tuples. Error messages show both the attempted index and tuple size: - Function signature in type errors —
CompileError::TypeMismatchnow includes an optionalfunction_signaturefield showing the full function type when argument types don't match. Error hints display the signature so users can see what types were expected: - Rich DataFrame display — DataFrames now render as formatted, column-aligned tables instead of opaque
in both REPL output andIO.print. Shows shape, column names, dtypes, and data rows. Large DataFrames (>10 rows) show first 5 + last 5 with…separator: NativeObject::new_with_output()constructor — Creates native objects with a custom display callback (fn(&dyn Any) -> OutputValue), enabling type-specific rich output. The existingNativeObject::new()is unchanged (falls back to).Compiler::clear_type_errors()method — Clears accumulated type errors, useful for REPL environments where each evaluation should start fresh.- Result module (6 functions) — Elm-style functional composition for
Resultvalues (Ok/Err). All functions take the Result as the last argument for natural piping. Functions:map,mapError,andThen,withDefault,toMaybe,fromMaybe. - Maybe module (3 functions) — Elm-style functional composition for
Maybevalues (Just/Nothing). All functions take the Maybe as the last argument for natural piping. Functions:map,andThen,withDefault. - DataFrame module (36 functions) — Polars-backed tabular data analysis with a pipe-friendly API. All functions take the DataFrame as the last argument for natural piping. Backed by Polars for high-performance columnar operations.
  - I/O: `readCsv`, `readJson`, `readParquet`, `writeCsv`, `writeJson` — file errors throw structured `VmError` variants (not Keel-level `Result`)
  - Column ops: `select`, `drop`, `rename`, `withColumn`, `column`, `columns`, `dtypes`
  - Row ops: `head`, `tail`, `slice`, `sort`, `sortDesc`, `unique`, `sample`
  - Filters: `filterEq`, `filterNeq`, `filterGt`, `filterGte`, `filterLt`, `filterLte`, `filterIn` — named predicate filters that build Polars expressions natively
  - Aggregation: `groupBy`, `agg` (supports sum/mean/min/max/count/first/last/std/var/median), `count`, `describe`
  - Multi-DataFrame: `join` (inner), `concat`, `pivot`
  - Inspection: `shape` returns a `(rows, columns)` tuple
  - Conversion: `toRecords` converts to Keel record lists, `fromRecords` creates DataFrames from Keel records
- DataFrame error variants — 5 structured `VmError` variants for DataFrame errors, replacing the generic `UnsupportedOperation(String)`: `DataFrameDisabled`, `DataFrameSandboxViolation { path, sandbox }`, `DataFrameFileError { path, reason }`, `DataFrameColumnNotFound { column }`, `DataFrameOperationError { operation, reason }`. Each variant has an `ErrorHint` implementation with actionable hints (e.g., "Use DataFrame.columns to list available columns") and notes about security configuration. I/O functions (`readCsv`, `readJson`, `readParquet`, `writeCsv`, `writeJson`) now throw `DataFrameFileError` directly instead of returning Keel-level `Result` values, simplifying usage.
- NativeObject infrastructure — Opaque native values in the VM via `RegisterValue::NativeObject` wrapping `Arc`. Enables embedding Rust types (like Polars DataFrames) that are passed by reference, compared by pointer equality, and displayed with type-specific formatting. Updated all exhaustive match sites: GC traversal, output conversion, numeric coercions, and IO formatting.
- `Type::Custom("Record")` compatibility — The type inference system now recognizes that `Type::Custom("Record", [])` (from type signatures like `[Record] -> DataFrame`) is compatible with any `Type::Record(fields)` (from record literals). This enables functions like `DataFrame.fromRecords` to accept Keel record literals without type errors.
- Comprehensive DataFrame test suite (73 tests) — Integration tests covering all DataFrame operations: I/O (CSV/JSON read/write roundtrips), column operations (select, drop, rename, withColumn), row operations (head, tail, slice, sort, unique, sample), all 7 filter types, aggregation (groupBy with sum/mean/min/max/count), joins, concat, describe, toRecords/fromRecords roundtrips, complex multi-step pipelines, and pipe-friendly argument order verification. Error tests use `VM::compile_checked()` and assert against specific `VmError` variants (`DataFrameFileError`, `DataFrameColumnNotFound`). Plus 6 `ErrorHint` tests for all DataFrame error variants.
- Type annotations required for generic parse results — `Json.parse` (and similar functions returning unresolvable type variables like `Result a String`) now requires an explicit type annotation when assigning to a variable. Without an annotation, the compiler produces a `GenericTypeRequiresAnnotation` error with a helpful hint. This ensures the compiler knows the expected shape for field access and pattern matching.
- `Type::contains_type_vars()` method — Detects unresolved type variables in types, mirroring `contains_unknown()` but checking for `Var` instead.
- `CompileError::GenericTypeRequiresAnnotation` — New error variant with an `ErrorHint` providing actionable suggestions and context.
- All existing DataFrame functions preserve metadata — All ~35 existing DataFrame functions updated to propagate metadata through the `KeelDataFrame` wrapper. Functions that previously worked with raw `polars::DataFrame` now extract and reattach metadata according to category-specific propagation rules (shape-preserving, column-selecting, aggregation, etc.).
- DataFrame display shows dataset name — When a DataFrame has a `"name"` metadata key, it is displayed above the shape line in both plain text and HTML output (e.g., `PISA 2022\nshape: (1000, 3)`).
- `polars-parquet` added as a direct dependency — Required for the low-level Parquet metadata write API (`BatchedWriter` + `FileWriter::end()` with `KeyValue` metadata).
- Expanded Result and Maybe module documentation — `ModuleDoc` descriptions for the Result and Maybe modules now include common patterns with code examples, comparison tables (Result vs Maybe), usage guidance, and categorized function listings.
- Let binding type annotation resolution — Type annotations in let bindings are now resolved through `resolve_type()` (expanding type aliases) and unified with the inferred expression type via `unify_types()`, enabling type variable substitution from annotations.
- Http module (14 functions) — HTTP networking with a request-as-data pattern. Pure request-building functions (`get`, `post`, `put`, `patch`, `delete`, `request`) return Record values that can be composed via `|>`. Modifier functions (`withHeader`, `withHeaders`, `withBody`, `withJsonBody`, `withTimeout`, `withQueryParam`) return modified request records. Only `Http.send` performs the actual network call. `Http.jsonBody` parses response bodies into Keel values via the Json module. Security controls: `KEEL_HTTP_DISABLED` (default: disabled), `KEEL_HTTP_ALLOWED_HOSTS` (domain allowlist).
- Json module (8 functions) — JSON encoding/decoding with full Keel type mapping. `parse` converts JSON strings to Keel values (objects→Records, arrays→Lists, etc.). Type-specific parsers (`parseString`, `parseInt`, `parseFloat`, `parseBool`) for safe extraction. `encode` and `encodePretty` convert Keel values to JSON strings. `get` extracts fields from JSON strings. All parsing functions return `Result` types for safe error handling.
- Http error variants — `HttpDisabled`, `HttpHostNotAllowed`, and `HttpError` in `VmError` with `ErrorHint` implementations.
- Comprehensive Http test suite (31 tests) — Request construction, field defaults, modifier functions, chaining, `jsonBody` parsing, security controls, and network tests (ignored by default).
- Comprehensive Json test suite (62 tests) — Parsing all JSON types, type-specific parsers, encoding, pretty printing, field extraction, roundtrip tests, and pipe/let-binding usage.
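The compile-time tuple index check described at the top of this section boils down to a simple range validation. The sketch below is illustrative only: the `TupleIndexOutOfBounds` name follows the changelog, but the function shape and hint wording are hypothetical, not Keel's actual compiler API.

```rust
// Illustrative sketch of the tuple.N bounds check: the variant name follows
// the changelog, but the function shape and hint text are hypothetical.
#[derive(Debug, PartialEq)]
enum CompileError {
    TupleIndexOutOfBounds { index: usize, size: usize, hint: String },
}

fn check_tuple_index(index: usize, size: usize) -> Result<(), CompileError> {
    if index < size {
        Ok(())
    } else {
        Err(CompileError::TupleIndexOutOfBounds {
            index,
            size,
            // The hint shows the valid index range, as the error messages do.
            hint: format!("valid indices are 0..{}", size),
        })
    }
}

fn main() {
    assert!(check_tuple_index(2, 3).is_ok());
    assert!(check_tuple_index(3, 3).is_err()); // index == size is out of bounds
}
```

The same predicate backs both the compile-time check and the runtime guard in `TupleGet`, so statically known and dynamically constructed tuples fail the same way.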
Fixed
- List parser indentation errors — Misaligned commas in multiline lists now produce a typed `IndentationError` with helpful hints instead of generic parse failures. The parser uses lookahead to distinguish misaligned commas (an error) from legitimate dedents before closing brackets (valid multiline function calls inside lists).
- DTA writer integer overflow — Value label keys and Int64/UInt32 cell values now use `i32::try_from()` instead of unchecked `as i32` casts. Values outside the i32 range fall back to f64 instead of silently truncating.
- VM arity checking — Functions now return an `ArityMismatch` error when called with more arguments than they accept, instead of silently ignoring the excess arguments.
- VM negative index on heap lists — `HeapListGet` now checks for negative indices before casting to `usize`, returning a proper `ListIndexOutOfBounds` error instead of wrapping to a large positive index.
- `KEEL_*` environment variable protection — `IO.setEnv` now rejects attempts to modify `KEEL_*` variables at runtime, preventing scripts from bypassing sandboxing and security controls.
- Integer overflow in VM arithmetic — All arithmetic operations (`+`, `-`, `*`, `**`, `%`, negation) now use checked operations (`checked_add`, `checked_sub`, etc.). On overflow, results auto-promote to `Float` instead of silently wrapping.
- Unchecked RegisterValue→usize conversion — Replaced the generic `as_index()` method with direct pattern matching at each call site. Negative integers, non-finite floats, and non-indexable types now return proper type-specific errors instead of silently producing garbage indices.
- HTTP host validation bypass — Replaced the hand-rolled `extract_host()` with RFC-compliant `url::Url::parse()`. Userinfo attacks like `http://allowed.com@evil.com/` now correctly extract `evil.com` as the host.
- Instruction cloning in VM hot loop — Replaced per-instruction `.clone()` with `Arc::clone` of the instruction slice (a refcount bump only), eliminating allocation in the VM dispatch loop.
- Unbounded allocation size — Added a `MAX_ALLOC_CAPACITY` (10M elements) limit. `AllocTuple`, `AllocList`, and `AllocRecord` now check capacity before calling `Vec::with_capacity`, returning `VmError::AllocationLimitExceeded` on overflow.
- IO.extension doc example — Fixed malformed syntax in the documentation example that caused doc example tests to fail.
- VM time machine disabled by default — The time machine unconditionally cloned the entire VM state (registers, bytecode, call stack, heap, string interner) on every single instruction, turning any non-trivial program into a memory bomb during fuzzing. Now guarded behind `time_machine_enabled: bool` (default `false`).
- Parser nesting depth limit — Added a `check_nesting_depth()` pre-parsing check (limit: 128 levels) to `parse_file_with_state()` and `parse_file_lenient()`. Prevents stack overflow and OOM on pathological inputs like deeply nested parentheses `(((((((...`.
- VM execution step limit — Added `max_steps` (default 10M) and a step counter to the VM execution loop. Returns `VmError::ExecutionLimitExceeded` when the limit is exceeded, preventing infinite loops from running forever.
- Empty input handling — `compile_checked()` now returns `Ok(vec![])` immediately for empty or whitespace-only input, instead of crashing.
- DataFrame.fromRecords now has compile-time type inference — DataFrames created with `DataFrame.fromRecords` now have their schema inferred from the record literal at compile time, enabling full type safety for subsequent operations like `column`, `select`, and `agg`. Previously, `fromRecords` created untyped DataFrames that couldn't benefit from column-level type checking.
- DataFrame operations support gradual typing — DataFrame operations now gracefully handle both typed and untyped DataFrames. When column schemas can't be determined at compile time (e.g., when using variables for aggregation specs), operations return untyped DataFrames that compile successfully but skip column-level validation. This enables flexible data workflows while preserving type safety where possible.
- DateTime tests fixed with serial execution — All 61 DateTime tests now use the `#[serial]` attribute to prevent environment variable pollution between tests. Previously, tests would fail when run in parallel because the `KEEL_DATETIME_DISABLED` security test affected other tests.
- Unit type now parsed correctly in type signatures — The `()` unit type can now be used in type signatures and as a function argument. Previously, empty parentheses were only recognized as tuple syntax, not as the Unit type.
- FunctionDoc examples for Result, Maybe, List, DataFrame, and IO modules now execute correctly — Fixed 15 stdlib FunctionDoc examples that previously failed to compile or execute. Issues addressed include: lambda syntax in `List.foldl`, `foldr`, `zipWith` (now uses `|acc x|`, not `|acc, x|`); escape sequence handling in `String.lines` and `unlines`; enum constructor precedence in pipes for `Result.map`, `mapError`, `andThen`, `withDefault` and `Maybe.map`, `andThen` (now wrapped in parentheses like `(Ok 5) |> ...`); type inference issues in `DataFrame.fromLists`, `getMeta`, `getColumnMeta` (now inlined); an unused let binding in `IO.readLine`; and newline escaping in `IO.appendFile`. Added a comprehensive test suite (`test_all_function_doc_examples`) validating all 264 FunctionDoc examples across 10 stdlib modules. Test results: 243 passing (92%), 21 failing (all expected failures from security-disabled features and missing test files).
- Enum constructors (`Just`, `Nothing`, `Ok`, `Err`) now parse correctly in pipe expressions — Fixed parser precedence so enum constructors stop at pipe operators, enabling natural composition like `Just 5 |> Maybe.map f` and `Ok "test" |> Result.map String.toUpper` without requiring parentheses around the constructor. The parser now uses `simple_arg` for constructor arguments, which accepts literals, lists, records, and function calls, but stops at binary operators (including `|>`). Added `parse_expr_maybe_atom` and `parse_expr_result_atom` variants that only accept literal atoms, for use in contexts where full expression parsing would be ambiguous. Comprehensive test suite added with 150+ tests covering pipe precedence, nested constructs, and edge cases.
- Large .dta file reading (>10GB) no longer fails with "Unable to read from file" — ReadStat's C `unistd` I/O layer is now fully bypassed by Rust I/O handlers (`std::fs::File`) registered via `readstat_set_{open,close,seek,read}_handler`. This eliminates C-level type mismatches (`ssize_t` vs `size_t`), `int` overflow in row handling for >2B rows, and `O_LARGEFILE`/`_FILE_OFFSET_BITS` non-issues on 64-bit Linux. The Rust read handler includes short-read retry and EINTR handling. Previous C-side patches (`_FILE_OFFSET_BITS=64`, `O_LARGEFILE`, EINTR retry in `unistd_read_handler`) are retained as defense in depth but are no longer the primary I/O path.
- Result parameter order: `Result ok err` — The internal `Type::Result(first, second)` representation now consistently means `first = ok, second = err`, matching the syntax `Result OkType ErrType`. Previously, the extraction and creation code treated the first box as err and the second as ok, which was backwards from the Display impl, stdlib signatures, and documentation. This affected type inference for Ok/Err constructors and pattern matching on Result values.
- Type alias parsing in compound types — Type aliases like `Person` can now be used directly as arguments to `Result`, `Maybe`, and `List` without parentheses. Previously, `Result Person String` would fail because the parser greedily consumed `String` as a type argument of `Person`. Now `Result Person String`, `Maybe Person`, and `List Person` all parse correctly. Parenthesized forms like `Result (Person) String` continue to work for backward compatibility.
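The userinfo attack mentioned under "HTTP host validation bypass" above is easy to reproduce. This sketch contrasts a naive hand-rolled extractor with an RFC 3986-style one: everything before `@` in the authority is userinfo, so the host is what follows the last `@`. Both functions here are hypothetical demonstrations; Keel's actual fix delegates to `url::Url::parse()`.

```rust
// Demonstration of the userinfo-bypass class of bug. These extractors are
// illustrative; the real fix uses the `url` crate's RFC-compliant parser.
fn authority(url: &str) -> &str {
    let rest = url.split("://").nth(1).unwrap_or(url);
    rest.split('/').next().unwrap_or(rest)
}

// Buggy: stops at the first '@', reporting the userinfo as the host.
fn naive_host(url: &str) -> &str {
    authority(url).split('@').next().unwrap_or("")
}

// RFC 3986 style: the host is whatever follows the last '@', if any.
fn rfc_host(url: &str) -> &str {
    authority(url).rsplit('@').next().unwrap_or("")
}

fn main() {
    let attack = "http://allowed.com@evil.com/steal";
    assert_eq!(naive_host(attack), "allowed.com"); // allowlist check passes!
    assert_eq!(rfc_host(attack), "evil.com"); // the host actually contacted
}
```

A naive extractor lets the allowlist check pass on `allowed.com` while the request actually goes to `evil.com`, which is exactly why the hand-rolled version was replaced.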
Removed
- Binary and REPL — Removed `src/bin/keel.rs` and `src/repl.rs`. The binary and REPL have been moved to the `keel-cli` and `keel-repl` crates respectively. `keel-core` is now a pure library crate. Removed the `clap` dependency.
- Time machine — Removed the entire `src/time_machine/` module (8 files: `mod.rs`, `timeline.rs`, `navigation.rs`, `mapping.rs`, `integration.rs`, `compiler_snapshot.rs`, `lexer_snapshot.rs`, `parser_snapshot.rs`, `vm_snapshot.rs`). Removed the `time-machine` feature flag from `Cargo.toml`.
- Fuzz regression tests — Removed `tests/infrastructure/fuzz_regression.rs` (obsolete test infrastructure).
- CODE_REVIEW.md — All 18 findings have been resolved across previous commits; the tracking document has been removed.
- Scope uses persistent data structure — Switched `Scope` internals from `std::collections::HashMap` to `im::HashMap`. `enter_scope` now does an O(1) clone instead of deep-copying all variable/function/type maps.
- Register vector pooling on function calls — Added a `register_pool` to the VM. On function call, a pooled vector is reused via `clone_from` instead of allocating a new one. On return, the callee's vector is returned to the pool.
- String interner uses `Arc` — Replaced the double `String` allocation (one for the HashMap key, one for the Vec value) with a single `Arc` shared between both collections, halving the per-string allocation cost.
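The shared-allocation interner described in the last item can be sketched with one `Arc<str>` per string, shared between the id-lookup map and the reverse table. The struct and method names below are illustrative, not Keel's actual interner API; the point is that `Arc<str>: Borrow<str>` lets the map be queried by `&str` with no extra allocation.

```rust
// Sketch of an Arc<str>-sharing interner: one allocation per distinct
// string, shared by both collections. Names are hypothetical.
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Default)]
struct Interner {
    ids: HashMap<Arc<str>, u32>, // name -> id (shares the Arc below)
    names: Vec<Arc<str>>,        // id -> name
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        // Arc<str>: Borrow<str>, so lookup by &str needs no allocation.
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let shared: Arc<str> = Arc::from(s); // the single allocation
        let id = self.names.len() as u32;
        self.names.push(Arc::clone(&shared)); // refcount bump only
        self.ids.insert(shared, id);
        id
    }

    fn resolve(&self, id: u32) -> &str {
        &self.names[id as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("head");
    let b = interner.intern("head");
    assert_eq!(a, b); // interning is idempotent
    assert_eq!(interner.resolve(a), "head");
}
```

Compared to the previous two-`String` layout, each interned string costs one heap allocation plus a refcount bump, which is where the halved per-string cost comes from.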