DataFrame Module
Tabular data manipulation backed by Polars DataFrames.
The DataFrame module provides a pipe-friendly API for loading, transforming, filtering, and aggregating tabular data. DataFrames are opaque native objects that can only be manipulated through DataFrame module functions.
Common patterns
import DataFrame
let data = DataFrame.readCsv "employees.csv"
let summary = data
|> DataFrame.select ["name", "department", "salary"]
|> DataFrame.filterGt "salary" 50000
|> DataFrame.sort "salary"
|> DataFrame.head 20
IO.println (DataFrame.shape summary)
Display
DataFrames render as formatted, column-aligned tables when printed or displayed in the REPL. Output includes shape, column names, dtypes, and data rows. Large DataFrames (>10 rows) show the first 5 and last 5 rows with a … separator:
shape: (1000, 3)
name | age | city
str | i64 | str
-------+-----+---------
Alice | 30 | New York
Bob | 25 | London
… | … | …
Yara | 31 | Berlin
Zach | 22 | Tokyo
Security
| Variable | Effect |
|---|---|
KEEL_DATAFRAME_DISABLED=1 | Disable DataFrame operations |
KEEL_DATAFRAME_SANDBOX=/path | Restrict file I/O to directory |
KEEL_DATAFRAME_MAX_ROWS=10000 | Limit rows loaded from files |
Functions
I/O
DataFrame.readCsv
String -> DataFrame
Read a CSV file into a DataFrame.
import DataFrame
DataFrame.readCsv "data.csv"Try itNotes: With a string literal path, column names and types are validated at compile time.
See also: DataFrame.readCsvColumns, DataFrame.readJson, DataFrame.readParquet
DataFrame.readJson
String -> DataFrame
Read a JSON file into a DataFrame.
import DataFrame
DataFrame.readJson "data.json"Try itNotes: With a string literal path, column names and types are validated at compile time.
See also: DataFrame.readJsonColumns, DataFrame.readCsv
DataFrame.readParquet
String -> DataFrame
Read a Parquet file into a DataFrame.
import DataFrame
DataFrame.readParquet "data.parquet"Try itNotes: With a string literal path, column names and types are validated at compile time.
See also: DataFrame.readParquetColumns, DataFrame.readCsv
DataFrame.writeCsv
String -> DataFrame -> Unit
Write a DataFrame to a CSV file.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }]
df
|> DataFrame.writeCsv "out.csv"Try itSee also: DataFrame.writeJson
DataFrame.writeJson
String -> DataFrame -> Unit
Write a DataFrame to a JSON file.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }]
df
|> DataFrame.writeJson "out.json"Try itSee also: DataFrame.writeCsv
DataFrame.writeParquet
String -> DataFrame -> Unit
Write a DataFrame to a Parquet file, including metadata.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }]
df
|> DataFrame.writeParquet "data.parquet"Try itNotes: Metadata is persisted in the Parquet file and restored by readParquet.
See also: DataFrame.readParquet, DataFrame.writeCsv
Column ops
DataFrame.select
[String] -> DataFrame -> DataFrame
Select columns by name.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]
df
|> DataFrame.select ["name", "age"]Try itNotes: Column names are validated at compile time on typed DataFrames. The resulting DataFrame's type is narrowed to only the selected columns.
See also: DataFrame.drop, DataFrame.columns
DataFrame.drop
[String] -> DataFrame -> DataFrame
Drop columns by name.
import DataFrame
let df =
DataFrame.fromRecords [{ id = 1, name = "Alice" }, { id = 2, name = "Bob" }]
df
|> DataFrame.drop ["id"]Try itNotes: Column names are validated at compile time on typed DataFrames. The resulting DataFrame's type excludes the dropped columns.
See also: DataFrame.select
DataFrame.rename
String -> String -> DataFrame -> DataFrame
Rename a column.
import DataFrame
let df =
DataFrame.fromRecords [{ old = 1 }, { old = 2 }]
df
|> DataFrame.rename "old" "new"Try itNotes: The old column name is validated at compile time on typed DataFrames. The resulting DataFrame's type reflects the rename.
DataFrame.withColumn
String -> [a] -> DataFrame -> DataFrame
Add or replace a column with the given values.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }, { name = "Carol" }]
df
|> DataFrame.withColumn "doubled"
[2, 4, 6]Try itNotes: On typed DataFrames, the resulting type includes the new column.
See also: DataFrame.column
DataFrame.column
String -> DataFrame -> [Maybe a]
Extract a column as a list of Maybe values (Just x for values, Nothing for nulls).
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
DataFrame.column "name" dfTry itNotes: Every value is wrapped in Maybe since DataFrame columns are nullable. The column name is validated at compile time on typed DataFrames.
See also: DataFrame.columns
DataFrame.columns
DataFrame -> [String]
Get the column names of a DataFrame.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.columns dfTry itSee also: DataFrame.dtypes
DataFrame.dtypes
DataFrame -> [(String, String)]
Get column names and their data types.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.dtypes dfTry itSee also: DataFrame.columns
Row ops
DataFrame.head
Int -> DataFrame -> DataFrame
Take the first n rows.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
df
|> DataFrame.head 10Try itSee also: DataFrame.tail, DataFrame.slice
DataFrame.tail
Int -> DataFrame -> DataFrame
Take the last n rows.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
df
|> DataFrame.tail 5Try itSee also: DataFrame.head
DataFrame.slice
Int -> Int -> DataFrame -> DataFrame
Take a slice of rows from offset with length.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
df
|> DataFrame.slice 0 1Try itSee also: DataFrame.head, DataFrame.tail
DataFrame.sort
String -> DataFrame -> DataFrame
Sort by a column in ascending order.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Bob" }, { name = "Alice" }]
df
|> DataFrame.sort "name"Try itNotes: Column name is validated at compile time on typed DataFrames.
See also: DataFrame.sortDesc
DataFrame.sortDesc
String -> DataFrame -> DataFrame
Sort by a column in descending order.
import DataFrame
let df =
DataFrame.fromRecords [{ salary = 50000 }, { salary = 70000 }]
df
|> DataFrame.sortDesc "salary"Try itNotes: Column name is validated at compile time on typed DataFrames.
See also: DataFrame.sort
DataFrame.unique
[String] -> DataFrame -> DataFrame
Keep unique rows based on specified columns.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }, { name = "Alice" }, { name = "Bob" }]
df
|> DataFrame.unique ["name"]Try itDataFrame.sample
Int -> DataFrame -> DataFrame
Randomly sample n rows.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
df
|> DataFrame.sample 1Try itSee also: DataFrame.head
Filters
DataFrame.filterEq
String -> a -> DataFrame -> DataFrame
Filter rows where column equals value.
import DataFrame
let df =
DataFrame.fromRecords [{ city = "Berlin" }, { city = "Munich" }]
df
|> DataFrame.filterEq "city" "Berlin"Try itNotes: Column name is validated at compile time on typed DataFrames.
See also: DataFrame.filterNeq, DataFrame.filterGt
DataFrame.filterNeq
String -> a -> DataFrame -> DataFrame
Filter rows where column does not equal value.
import DataFrame
let df =
DataFrame.fromRecords [{ status = "active" }, { status = "inactive" }]
df
|> DataFrame.filterNeq "status" "inactive"Try itSee also: DataFrame.filterEq
DataFrame.filterGt
String -> a -> DataFrame -> DataFrame
Filter rows where column is greater than value.
import DataFrame
let df =
DataFrame.fromRecords [{ salary = 40000 }, { salary = 60000 }]
df
|> DataFrame.filterGt "salary" 50000Try itSee also: DataFrame.filterGte, DataFrame.filterLt
DataFrame.filterGte
String -> a -> DataFrame -> DataFrame
Filter rows where column is greater than or equal to value.
import DataFrame
let df =
DataFrame.fromRecords [{ age = 17 }, { age = 18 }, { age = 25 }]
df
|> DataFrame.filterGte "age" 18Try itSee also: DataFrame.filterGt
DataFrame.filterLt
String -> a -> DataFrame -> DataFrame
Filter rows where column is less than value.
import DataFrame
let df =
DataFrame.fromRecords [{ price = 50 }, { price = 150 }]
df
|> DataFrame.filterLt "price" 100Try itSee also: DataFrame.filterLte, DataFrame.filterGt
DataFrame.filterLte
String -> a -> DataFrame -> DataFrame
Filter rows where column is less than or equal to value.
import DataFrame
let df =
DataFrame.fromRecords [{ score = 30 }, { score = 50 }, { score = 80 }]
df
|> DataFrame.filterLte "score" 50Try itSee also: DataFrame.filterLt
DataFrame.filterIn
String -> [a] -> DataFrame -> DataFrame
Filter rows where column value is in the given list.
import DataFrame
let df =
DataFrame.fromRecords [{ city = "Berlin" }, { city = "Munich" }, { city = "Hamburg" }]
df
|> DataFrame.filterIn "city"
["Berlin", "Munich"]Try itSee also: DataFrame.filterEq
Aggregation
DataFrame.groupBy
[String] -> DataFrame -> GroupedDataFrame
Group a DataFrame by the given columns.
import DataFrame
let df =
DataFrame.fromRecords [{ department = "Sales", salary = 50000 }, { department = "Sales", salary = 60000 }]
df
|> DataFrame.groupBy ["department"]Try itNotes: Returns a GroupedDataFrame. Use DataFrame.agg to aggregate.
See also: DataFrame.agg
DataFrame.agg
[(String, String)] -> GroupedDataFrame -> DataFrame
Aggregate a grouped DataFrame. Supported: sum, mean, min, max, count, first, last, std, var, median.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", salary = 50000 }, { dept = "Sales", salary = 60000 }]
let specs =
[("salary", "mean")]
df
|> DataFrame.groupBy ["dept"]
|> DataFrame.agg specsTry itSee also: DataFrame.groupBy
DataFrame.count
DataFrame -> Int
Get the number of rows in a DataFrame.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]
DataFrame.count dfTry itSee also: DataFrame.shape
DataFrame.describe
DataFrame -> DataFrame
Compute summary statistics for all columns.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]
DataFrame.describe dfTry itMulti-DataFrame
DataFrame.join
String -> String -> DataFrame -> DataFrame -> DataFrame
Inner join two DataFrames on the given column names.
import DataFrame
let left =
DataFrame.fromRecords [{ id = 1, name = "Alice" }]
let right =
DataFrame.fromRecords [{ user_id = 1, role = "admin" }]
left
|> DataFrame.join "id" "user_id" rightTry itNotes: First two args are left and right join column names.
See also: DataFrame.concat
DataFrame.concat
[DataFrame] -> DataFrame
Concatenate a list of DataFrames vertically.
import DataFrame
let df1 =
DataFrame.fromRecords [{ name = "Alice" }]
let df2 =
DataFrame.fromRecords [{ name = "Bob" }]
DataFrame.concat [df1, df2]Try itSee also: DataFrame.join
DataFrame.pivot
String -> String -> String -> DataFrame -> DataFrame
Pivot a DataFrame: spread values from one column into new columns.
import DataFrame
let df =
DataFrame.fromRecords [{ category = "A", date = "Jan", amount = 100 }, { category = "B", date = "Jan", amount = 200 }]
df
|> DataFrame.pivot "category" "date" "amount"Try itNotes: Args: on, index, values.
Metadata
DataFrame.setMeta
String -> a -> DataFrame -> DataFrame
Set a dataset-level metadata key.
import DataFrame
let df =
DataFrame.fromRecords [{ x = 1 }]
df
|> DataFrame.setMeta "name" "PISA 2022"Try itNotes: Metadata values can be String, Int, Float, Bool, List, or Record.
See also: DataFrame.getMeta, DataFrame.allMeta
DataFrame.getMeta
String -> DataFrame -> Maybe a
Get a dataset-level metadata value by key.
import DataFrame
DataFrame.fromRecords [{ x = 1 }]
|> DataFrame.setMeta "name" "test"
|> DataFrame.getMeta "name"Try itSee also: DataFrame.setMeta, DataFrame.allMeta
DataFrame.allMeta
DataFrame -> Record
Get all dataset-level metadata as a record.
import DataFrame
let df =
DataFrame.fromRecords [{ x = 1 }]
DataFrame.allMeta dfTry itSee also: DataFrame.getMeta, DataFrame.setMeta
DataFrame.setColumnMeta
String -> String -> a -> DataFrame -> DataFrame
Set a column-level metadata key.
import DataFrame
let df =
DataFrame.fromRecords [{ score = 500 }]
df
|> DataFrame.setColumnMeta "score" "label" "Math score"Try itNotes: First arg is column name, second is metadata key.
See also: DataFrame.getColumnMeta, DataFrame.allColumnMeta
DataFrame.getColumnMeta
String -> String -> DataFrame -> Maybe a
Get a column-level metadata value by column and key.
import DataFrame
DataFrame.fromRecords [{ score = 500 }]
|> DataFrame.setColumnMeta "score" "label" "Math"
|> DataFrame.getColumnMeta "score" "label"Try itSee also: DataFrame.setColumnMeta, DataFrame.allColumnMeta
DataFrame.allColumnMeta
String -> DataFrame -> Record
Get all metadata for a specific column as a record.
import DataFrame
let df =
DataFrame.fromRecords [{ score = 500 }]
DataFrame.allColumnMeta "score" dfTry itSee also: DataFrame.getColumnMeta, DataFrame.setColumnMeta
DataFrame.describeMeta
DataFrame -> DataFrame
Get a summary DataFrame of all metadata (dataset and column level).
import DataFrame
let df =
DataFrame.fromRecords [{ x = 1 }]
DataFrame.describeMeta dfTry itNotes: Returns a DataFrame with columns: level, key, value.
See also: DataFrame.allMeta, DataFrame.allColumnMeta
Inspection
DataFrame.shape
DataFrame -> (Int, Int)
Get the shape of a DataFrame as (rows, columns).
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.shape dfTry itSee also: DataFrame.count, DataFrame.columns
Conversion
DataFrame.toRecords
DataFrame -> [Record]
Convert a DataFrame to a list of Keel records.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice", age = 30 }]
DataFrame.toRecords dfTry itNotes: Each row becomes a record with column names as field names.
See also: DataFrame.fromRecords
DataFrame.fromRecords
[Record] -> DataFrame
Create a DataFrame from a list of Keel records.
import DataFrame
DataFrame.fromRecords [{ name = "Alice", age = 30 }]Try itNotes: All records should have the same fields.
See also: DataFrame.toRecords
Other
DataFrame.aggExprs
[Expr] -> GroupedDataFrame -> DataFrame
Aggregate a grouped DataFrame using a list of DataFrame.Expr expressions.
import DataFrame
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ group = "A", value = 10 }, { group = "A", value = 20 }, { group = "B", value = 30 }]
let totalExpr = col "value" |> Expr.sum |> Expr.named "total"
let avgExpr = col "value" |> Expr.mean |> Expr.named "average"
df |> DataFrame.groupBy ["group"] |> DataFrame.aggExprs [totalExpr, avgExpr]Try itNotes: Each expression should have an alias set using Expr.named. This defines the output column name.
See also: DataFrame.agg, DataFrame.groupBy, DataFrame.Expr.sum
DataFrame.collect
WindowedDataFrame -> DataFrame
Collect a windowed DataFrame back into a regular DataFrame, materializing all window computations.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", val = 1 }, { dept = "Sales", val = 2 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.withRowNumber "row_num"
|> DataFrame.collectTry itNotes: Must be called at the end of a window function chain to produce a usable DataFrame.
See also: DataFrame.partitionBy
DataFrame.columnLineage
String -> DataFrame -> Maybe Record
Get lineage metadata for a specific column.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }]
DataFrame.columnLineage "name" dfTry itNotes: Returns Just Record with origin, transformations, and dependencies, or Nothing if column not found.
See also: DataFrame.lineage
DataFrame.describeLabel
String -> DataFrame -> String
Describe value labels for a single column as a formatted table.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
|> DataFrame.withValueLabels "gender" gender
DataFrame.describeLabel "gender" dfTry itNotes: Returns a formatted table with Value and Label columns. Returns a message if the column has no value labels.
See also: DataFrame.describeLabels, DataFrame.getValueLabels
DataFrame.describeLabels
DataFrame -> String
Describe all value labels in a DataFrame as a formatted string.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
|> DataFrame.withValueLabels "gender" gender
DataFrame.describeLabels dfTry itNotes: Lists all columns with value labels, sorted by column name. Returns "(no value labels)" if none are set.
See also: DataFrame.describeLabel, DataFrame.getValueLabels, DataFrame.getAllValueLabels, DataFrame.describeMeta
DataFrame.describeVariables
DataFrame -> DataFrame
STATA-style variable overview: returns a DataFrame with one row per column showing name, type, label, value labels, and metadata.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ name = "Alice", gender = 1 }]
|> DataFrame.withVarLabel "name" "Person's name"
|> DataFrame.withValueLabels "gender" gender
DataFrame.describeVariables dfTry itNotes: Returns a DataFrame with columns: name, type, label, values, metadata. Rows are in column order. Value labels show abbreviated form: {1=Male, 2=Female} for ≤5 labels, or "N labels" for more.
See also: DataFrame.describeMeta, DataFrame.describeLabels, DataFrame.getVarLabels, DataFrame.searchVariables
DataFrame.filter
(row -> Bool) -> DataFrame -> DataFrame
Filter rows using a predicate closure. Rows for which the predicate returns True are kept.
import DataFrame
let df =
DataFrame.fromRecords [{ x = 1, y = 10 }, { x = 5, y = 50 }, { x = 10, y = 100 }]
df
|> DataFrame.filter (|r| r.x > 2 && r.y < 100)Try itNotes: Performance: Lambdas are compiled to native Polars expressions when possible (SIMD, parallel execution). Falls back to row-by-row VM execution for unsupported operations.
Supported (fast path):
- Field access:
r.field - Arithmetic:
+,-,*,/,% - Comparisons:
<,<=,>,>=,==,!= - Boolean:
&&,||,not - Literals:
42,3.14,"str",True,False - Case expressions with pattern matching
- Stdlib:
Maybe.withDefault,Math.abs/floor/ceil/round/sqrt,String.length/toUpper/toLower/trim/contains/startsWith/endsWith
Unsupported (slow path fallback):
- User-defined function calls
- List operations
- Nested closures
- Variable capture from outer scope
- Tuple operations
For >1000 rows, a warning is emitted when fallback occurs.
See also: DataFrame.filterEq, DataFrame.filterGt, DataFrame.filterExpr
DataFrame.filterExpr
Expr -> DataFrame -> DataFrame
Filter rows using a DataFrame.Expr boolean expression. Always uses the fast Polars path.
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ x = 1 }, { x = 5 }, { x = 10 }]
df
|> DataFrame.filterExpr (col "x" |> Expr.gt (lit 2))Try itNotes: This is the recommended way to filter when you need full control over the expression. Unlike the closure-based filter, this always compiles to Polars for optimal performance.
See also: DataFrame.filter, DataFrame.Expr.col, DataFrame.Expr.gt
DataFrame.fromLists
[(String, [a])] -> DataFrame
Create a multi-column DataFrame from a list of (column name, values) tuples.
import DataFrame
DataFrame.fromLists [("age", [30, 40]), ("name", ["Alice", "Bob"])]Try itNotes: Column-oriented data construction. All value lists must have the same length. Supports Maybe-wrapped values. Composes well with List.zip for programmatic column creation. For single-column DataFrames, pass a single-element list: [("col", [values])].
See also: DataFrame.fromRecords
DataFrame.getAllValueLabels
DataFrame -> { String : ValueLabelSet }
Get all value labels as a record mapping column names to ValueLabelSets.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
|> DataFrame.withValueLabels "gender" gender
DataFrame.getAllValueLabels dfTry itSee also: DataFrame.getValueLabels, DataFrame.withValueLabels
DataFrame.getDisplayMode
String -> DataFrame -> String
Get the display mode for a column.
import DataFrame
let df = DataFrame.fromRecords [{ gender = 1 }]
DataFrame.getDisplayMode "gender" df -- "Both"Try itNotes: Returns "Both" (the default) if no display mode has been set.
See also: DataFrame.withDisplayMode, DataFrame.withValueLabels
DataFrame.getValueLabels
String -> DataFrame -> Maybe ValueLabelSet
Get the value labels for a column, if any.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
|> DataFrame.withValueLabels "gender" gender
DataFrame.getValueLabels "gender" df -- Just (ValueLabelSet)Try itSee also: DataFrame.withValueLabels, DataFrame.getAllValueLabels
DataFrame.getVarLabel
String -> DataFrame -> Maybe String
Get the variable label for a column, if any.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }]
|> DataFrame.withVarLabel "name" "Person's name"
DataFrame.getVarLabel "name" df -- Just "Person's name"Try itSee also: DataFrame.withVarLabel, DataFrame.getVarLabels
DataFrame.getVarLabels
DataFrame -> { String : String }
Get all variable labels as a record.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice", age = 30 }]
|> DataFrame.withVarLabel "name" "Person's name"
|> DataFrame.withVarLabel "age" "Age in years"
DataFrame.getVarLabels df -- { name = "Person's name", age = "Age in years" }Try itSee also: DataFrame.getVarLabel, DataFrame.withVarLabel
DataFrame.lazy
DataFrame -> LazyFrame
Convert a DataFrame to a LazyFrame for deferred, optimized execution.
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ x = 1 }, { x = 5 }, { x = 10 }]
df
|> DataFrame.lazy
|> DataFrame.lazyFilter (col "x" |> Expr.gt (lit 2))
|> DataFrame.lazyCollectTry itNotes: LazyFrame enables Polars query optimization: predicate pushdown, projection pushdown, and parallel execution.
See also: DataFrame.lazyCollect, DataFrame.lazyFilter
DataFrame.lazyCollect
LazyFrame -> DataFrame
Materialize a LazyFrame back to a DataFrame, executing the optimized query plan.
import DataFrame
DataFrame.fromRecords [{ x = 1 }, { x = 2 }] |> DataFrame.lazy |> DataFrame.lazyCollectTry itNotes: This triggers the actual computation. Until collect is called, all operations are deferred.
See also: DataFrame.lazy
DataFrame.lazyFilter
Expr -> LazyFrame -> LazyFrame
Filter a LazyFrame using a DataFrame.Expr boolean expression.
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
let filterExpr = col "x" |> Expr.gt (lit 5)
DataFrame.fromRecords [{ x = 1 }, { x = 10 }] |> DataFrame.lazy |> DataFrame.lazyFilter filterExpr |> DataFrame.lazyCollectTry itNotes: The filter is added to the query plan and optimized with other operations.
See also: DataFrame.lazy, DataFrame.filterExpr
DataFrame.lazySelect
[Expr] -> LazyFrame -> LazyFrame
Select columns from a LazyFrame using a list of Expr expressions.
import DataFrame
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
let yRenamed = col "y" |> Expr.named "y_renamed"
DataFrame.fromRecords [{ x = 1, y = 2 }] |> DataFrame.lazy |> DataFrame.lazySelect [yRenamed] |> DataFrame.lazyCollectTry itNotes: Enables projection pushdown - only selected columns are read from files.
See also: DataFrame.lazy, DataFrame.select
DataFrame.lazyWithColumns
[Expr] -> LazyFrame -> LazyFrame
Add or replace columns in a LazyFrame using a list of Expr expressions.
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
let doubledExpr = col "x" |> Expr.mul (lit 2) |> Expr.named "x_doubled"
DataFrame.fromRecords [{ x = 5 }] |> DataFrame.lazy |> DataFrame.lazyWithColumns [doubledExpr] |> DataFrame.lazyCollectTry itNotes: Each expression should have an alias set using Expr.named.
See also: DataFrame.lazy, DataFrame.withColumns
DataFrame.lineage
DataFrame -> Record
Get complete lineage metadata for all columns, including origins and transformations.
import DataFrame
let df =
DataFrame.fromRecords [{ name = "Alice" }]
df
|> DataFrame.lineageTry itNotes: Returns a Record with 'columns' (per-column lineage) and 'globalOperations' (operations affecting all columns).
See also: DataFrame.columnLineage
DataFrame.mutate
(Record -> Record) -> DataFrame -> DataFrame
Apply a row-wise transformation function to a DataFrame. Automatically preserves all existing columns.
import DataFrame
let df =
DataFrame.fromRecords [{ val = 100 }, { val = 200 }]
df
|> DataFrame.mutate (|r| { doubled = 200 })Try itNotes: Return only new/modified columns - existing columns are auto-preserved. Returning a field with the same name as an existing column overwrites it. Best practice: use descriptive names (e.g., price_adjusted) to avoid accidental overwrites. Column values are Maybe-wrapped to handle nulls. Note: Wrap the record literal in parentheses to disambiguate from a block.
Performance: Lambdas are compiled to native Polars expressions when possible (SIMD, parallel execution). Falls back to row-by-row VM execution for unsupported operations.
Supported (fast path):
- Field access:
r.field - Arithmetic:
+,-,*,/,% - Comparisons:
<,<=,>,>=,==,!= - Boolean:
&&,||,not - Literals:
42,3.14,"str",True,False - Record construction:
{ field = expr } - Case expressions with pattern matching
- Stdlib:
Maybe.withDefault,Math.abs/floor/ceil/round/sqrt,String.length/toUpper/toLower/trim/contains/startsWith/endsWith
Unsupported (slow path fallback):
- User-defined function calls
- List operations
- Nested closures
- Variable capture from outer scope
- Tuple operations
For >1000 rows, a warning is emitted when fallback occurs.
See also: DataFrame.toRecords, DataFrame.withColumn
DataFrame.orderBy
[String] -> WindowedDataFrame -> WindowedDataFrame
Set the ordering columns for a windowed DataFrame. Required before rank, lag, lead, and rolling functions.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1 }, { dept = "Sales", date = 2 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]Try itNotes: Ordering determines how rows are sequenced within each partition. Must be called before withRank, withDenseRank, withLag, withLead, or rolling functions.
See also: DataFrame.partitionBy, DataFrame.withRank
DataFrame.partitionBy
[String] -> DataFrame -> WindowedDataFrame
Create a windowed DataFrame partitioned by the given columns. This is the entry point for all window function operations.
import DataFrame
let df =
DataFrame.fromRecords [{ department = "Sales", salary = 50000 }, { department = "HR", salary = 60000 }]
df
|> DataFrame.partitionBy ["department"]Try itNotes: Partition columns define independent groups for window calculations. Chain with orderBy, ranking, lag/lead, rolling, or cumulative functions.
See also: DataFrame.orderBy, DataFrame.collect
DataFrame.readCsvColumns
[String] -> String -> DataFrame
Read only the specified columns from a CSV file.
import DataFrame
DataFrame.readCsvColumns ["name", "age"] "data.csv"Try itNotes: Column projection is applied at the I/O level for efficient reading. With literal path and columns, validated at compile time.
See also: DataFrame.readCsv
DataFrame.readDta
String -> DataFrame
Read a STATA .dta file into a DataFrame with metadata. With a string literal path, column names and types are validated at compile time.
import DataFrame
DataFrame.readDta "data.dta"Try itNotes: Preserves variable labels, value labels, and dataset label as metadata.
See also: DataFrame.readDtaColumns, DataFrame.writeDta, DataFrame.readCsv
DataFrame.readDtaColumns
[String] -> String -> DataFrame
Read only the specified columns from a STATA .dta file.
import DataFrame
DataFrame.readDtaColumns ["var1", "var2"] "data.dta"Try itNotes: Reads full file then selects columns. With literal path and columns, validated at compile time.
See also: DataFrame.readDta
DataFrame.readJsonColumns
[String] -> String -> DataFrame
Read only the specified columns from a JSON file.
import DataFrame
DataFrame.readJsonColumns ["x", "y"] "data.json"Try itNotes: Reads full file then selects columns. With literal path and columns, validated at compile time.
See also: DataFrame.readJson
DataFrame.readParquetColumns
[String] -> String -> DataFrame
Read only the specified columns from a Parquet file.
import DataFrame
DataFrame.readParquetColumns ["id", "score"] "data.parquet"Try itNotes: True columnar projection — unneeded columns are never read from disk. With literal path and columns, validated at compile time.
See also: DataFrame.readParquet
DataFrame.recode
String -> [(Int, Int)] -> DataFrame -> DataFrame
Recode values in a column according to a mapping. Automatically updates value labels.
import DataFrame
import ValueLabelSet
let labels = ValueLabelSet.fromList [(1, "Low"), (2, "Medium"), (3, "High")]
let df = DataFrame.fromRecords [{ score = 1 }, { score = 2 }, { score = 3 }]
|> DataFrame.withValueLabels "score" labels
-- Collapse categories: 1 stays 1, 2->1, 3->2
df |> DataFrame.recode "score" [(2, 1), (3, 2)]
-- Labels are automatically remapped: 1->"Low", 2->"High"Try itNotes: Value labels are automatically updated based on the recode mapping. Values not in the mapping remain unchanged.
See also: DataFrame.withValueLabels, DataFrame.mutate
DataFrame.removeValueLabels
String -> DataFrame -> DataFrame
Remove value labels from a column.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
|> DataFrame.withValueLabels "gender" gender
df |> DataFrame.removeValueLabels "gender"Try itSee also: DataFrame.withValueLabels, DataFrame.getValueLabels
DataFrame.removeVarLabel
String -> DataFrame -> DataFrame
Remove the variable label from a column.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }]
|> DataFrame.withVarLabel "name" "Person's name"
df |> DataFrame.removeVarLabel "name"Try itSee also: DataFrame.withVarLabel, DataFrame.getVarLabel
DataFrame.searchVariables
String -> DataFrame -> DataFrame
Search for variables by name, label, value labels, or metadata. Returns matching variables as a DataFrame.
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ name = "Alice", gender = 1, income = 50000 }]
|> DataFrame.withVarLabel "name" "Person's name"
|> DataFrame.withVarLabel "income" "Annual income in USD"
|> DataFrame.withValueLabels "gender" gender
-- Search by variable name
DataFrame.searchVariables "name" df
-- Search by label text
DataFrame.searchVariables "income" df
-- Search by value label text
DataFrame.searchVariables "Male" dfTry itNotes: Case-insensitive substring search across all variable metadata: name, label (description), value labels, and column metadata. Returns a DataFrame with columns: name, type, label, values, metadata.
See also: DataFrame.describeVariables, DataFrame.describeMeta, DataFrame.getVarLabels
DataFrame.withColumns
[Expr] -> DataFrame -> DataFrame
Add or replace multiple columns using a list of DataFrame.Expr expressions.
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
let df = DataFrame.fromRecords [{ x = 1, y = 10 }, { x = 2, y = 20 }]
let xDoubled = col "x" |> Expr.mul (lit 2) |> Expr.named "x_doubled"
let sumExpr = col "x" |> Expr.add (col "y") |> Expr.named "sum"
df |> DataFrame.withColumns [xDoubled, sumExpr]Try itNotes: Each expression should have an alias set using Expr.named. This defines the output column name.
See also: DataFrame.withColumn, DataFrame.Expr.named
DataFrame.withCumMax
String -> String -> WindowedDataFrame -> WindowedDataFrame
Add a cumulative maximum column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withCumMax "running_max" "sales"
|> DataFrame.collectTry itNotes: Args: new column name, source column. Tracks the maximum value seen so far in the partition.
See also: DataFrame.withCumMin, DataFrame.withRollingMax
DataFrame.withCumMean
String -> String -> WindowedDataFrame -> WindowedDataFrame
Add a cumulative mean column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withCumMean "running_avg" "sales"
|> DataFrame.collectTry itNotes: Args: new column name, source column. Computes running average over all preceding rows in the partition.
See also: DataFrame.withCumSum, DataFrame.withRollingMean
DataFrame.withCumMin
String -> String -> WindowedDataFrame -> WindowedDataFrame
Add a cumulative minimum column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withCumMin "running_min" "sales"
|> DataFrame.collectTry itNotes: Args: new column name, source column. Tracks the minimum value seen so far in the partition.
See also: DataFrame.withCumMax, DataFrame.withRollingMin
DataFrame.withCumSum
String -> String -> WindowedDataFrame -> WindowedDataFrame
Add a cumulative sum column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withCumSum "running_total" "sales"
|> DataFrame.collectTry itNotes: Args: new column name, source column. Computes running total over all preceding rows in the partition.
See also: DataFrame.withCumMean, DataFrame.withRollingSum
DataFrame.withDenseRank
String -> WindowedDataFrame -> WindowedDataFrame
Add a dense rank column within each partition. Ties receive the same rank, with no gaps (e.g., 1, 2, 2, 3).
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", score = 90 }, { dept = "Sales", score = 85 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["score"]
|> DataFrame.withDenseRank "dense_rank"
|> DataFrame.collectTry itNotes: Argument is the name of the new column. Requires orderBy to be set first.
See also: DataFrame.withRank, DataFrame.withRowNumber, DataFrame.orderBy
DataFrame.withDisplayMode
String -> String -> DataFrame -> DataFrame
Set how a column's values should be displayed (Raw, Labeled, or Both).
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
|> DataFrame.withValueLabels "gender" gender
|> DataFrame.withDisplayMode "gender" "Labeled"Try itNotes: Display modes: "Raw" shows only the value, "Labeled" shows only the label, "Both" (default) shows "value (label)".
See also: DataFrame.getDisplayMode, DataFrame.withValueLabels
DataFrame.withLag
String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a column with the value from a previous row within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withLag "prev_sales" "sales" 1
|> DataFrame.collectTry itNotes: Args: new column name, source column, offset (number of rows back). Produces Nothing for rows without a previous value. Requires orderBy.
See also: DataFrame.withLead, DataFrame.orderBy
DataFrame.withLead
String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a column with the value from a subsequent row within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withLead "next_sales" "sales" 1
|> DataFrame.collectTry itNotes: Args: new column name, source column, offset (number of rows forward). Produces Nothing for rows without a subsequent value. Requires orderBy.
See also: DataFrame.withLag, DataFrame.orderBy
DataFrame.withRank
String -> WindowedDataFrame -> WindowedDataFrame
Add a rank column within each partition. Ties receive the same rank, with gaps after ties (e.g., 1, 2, 2, 4).
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", score = 90 }, { dept = "Sales", score = 85 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["score"]
|> DataFrame.withRank "rank"
|> DataFrame.collectTry itNotes: Argument is the name of the new column. Requires orderBy to be set first.
See also: DataFrame.withDenseRank, DataFrame.withRowNumber, DataFrame.orderBy
DataFrame.withRollingMax
String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a rolling maximum column computed over a fixed-size window within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withRollingMax "max_3d" "sales" 3
|> DataFrame.collectTry itNotes: Args: new column name, source column, window size. Window includes the current row and preceding rows. Requires orderBy.
See also: DataFrame.withRollingMin, DataFrame.withCumMax
DataFrame.withRollingMean
String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a rolling mean column computed over a fixed-size window within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withRollingMean "avg_3d" "sales" 3
|> DataFrame.collectTry itNotes: Args: new column name, source column, window size. Window includes the current row and preceding rows. Requires orderBy.
See also: DataFrame.withRollingSum, DataFrame.withCumMean
DataFrame.withRollingMin
String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a rolling minimum column computed over a fixed-size window within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withRollingMin "min_3d" "sales" 3
|> DataFrame.collectTry itNotes: Args: new column name, source column, window size. Window includes the current row and preceding rows. Requires orderBy.
See also: DataFrame.withRollingMax, DataFrame.withCumMin
DataFrame.withRollingSum
String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame
Add a rolling sum column computed over a fixed-size window within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.orderBy ["date"]
|> DataFrame.withRollingSum "sum_3d" "sales" 3
|> DataFrame.collectTry itNotes: Args: new column name, source column, window size. Window includes the current row and preceding rows. Requires orderBy.
See also: DataFrame.withRollingMean, DataFrame.withCumSum
DataFrame.withRowNumber
String -> WindowedDataFrame -> WindowedDataFrame
Add a sequential row number column within each partition.
import DataFrame
let df =
DataFrame.fromRecords [{ dept = "Sales", val = 1 }, { dept = "Sales", val = 2 }]
df
|> DataFrame.partitionBy ["dept"]
|> DataFrame.withRowNumber "row_num"
|> DataFrame.collectTry itNotes: Argument is the name of the new column. Row numbers start at 1. Does not require orderBy.
See also: DataFrame.withRank, DataFrame.withDenseRank
DataFrame.withValueLabels
String -> ValueLabelSet -> DataFrame -> DataFrame
Attach value labels to a column (lenient - allows unlabeled values).
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ id = 1, gender = 1 }]
df |> DataFrame.withValueLabels "gender" genderTry itNotes: Value labels map numeric codes to human-readable labels. Use withValueLabelsStrict for exhaustive validation.
See also: DataFrame.withValueLabelsStrict, DataFrame.getValueLabels
DataFrame.withValueLabelsStrict
String -> ValueLabelSet -> DataFrame -> DataFrame
Attach value labels to a column (strict - all values must have labels).
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ id = 1, gender = 1 }]
df |> DataFrame.withValueLabelsStrict "gender" genderTry itNotes: Returns an error if any value in the column doesn't have a corresponding label.
See also: DataFrame.withValueLabels, DataFrame.getValueLabels
DataFrame.withVarLabel
String -> String -> DataFrame -> DataFrame
Set a variable label (description) for a column.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice", age = 30 }]
df |> DataFrame.withVarLabel "name" "Person's full name"Try itNotes: Variable labels describe what a column represents. They are preserved in STATA files.
See also: DataFrame.getVarLabel, DataFrame.removeVarLabel
DataFrame.writeDta
String -> DataFrame -> Unit
Write a DataFrame to a STATA .dta file with metadata.
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }]
df |> DataFrame.writeDta "data.dta"Try itNotes: Preserves variable labels, value labels, and dataset label from metadata.
See also: DataFrame.readDta, DataFrame.writeCsv