Esc
Start typing to search...

DataFrame Module

Tabular data manipulation backed by Polars DataFrames.

The DataFrame module provides a pipe-friendly API for loading, transforming, filtering, and aggregating tabular data. DataFrames are opaque native objects that can only be manipulated through DataFrame module functions.

Common patterns

import DataFrame

let data = DataFrame.readCsv "employees.csv"
let summary = data
    |> DataFrame.select ["name", "department", "salary"]
    |> DataFrame.filterGt "salary" 50000
    |> DataFrame.sort "salary"
    |> DataFrame.head 20
IO.println (DataFrame.shape summary)

Display

DataFrames render as formatted, column-aligned tables when printed or displayed in the REPL. Output includes shape, column names, dtypes, and data rows. Large DataFrames (>10 rows) show the first 5 and last 5 rows with a separator:

shape: (1000, 3)
name | age |     city
str | i64 |      str
-------+-----+---------
Alice |  30 | New York
Bob |  25 |   London
… |   … |        …
Yara |  31 |   Berlin
Zach |  22 |    Tokyo

Security

VariableEffect
KEEL_DATAFRAME_DISABLED=1Disable DataFrame operations
KEEL_DATAFRAME_SANDBOX=/pathRestrict file I/O to directory
KEEL_DATAFRAME_MAX_ROWS=10000Limit rows loaded from files

Functions

I/O

DataFrame.readCsv

String -> DataFrame

Read a CSV file into a DataFrame.

Example:
import DataFrame

DataFrame.readCsv "data.csv"
Try it

Notes: With a string literal path, column names and types are validated at compile time.

See also: DataFrame.readCsvColumns, DataFrame.readJson, DataFrame.readParquet

DataFrame.readJson

String -> DataFrame

Read a JSON file into a DataFrame.

Example:
import DataFrame

DataFrame.readJson "data.json"
Try it

Notes: With a string literal path, column names and types are validated at compile time.

See also: DataFrame.readJsonColumns, DataFrame.readCsv

DataFrame.readParquet

String -> DataFrame

Read a Parquet file into a DataFrame.

Example:
import DataFrame

DataFrame.readParquet "data.parquet"
Try it

Notes: With a string literal path, column names and types are validated at compile time.

See also: DataFrame.readParquetColumns, DataFrame.readCsv

DataFrame.writeCsv

String -> DataFrame -> Unit

Write a DataFrame to a CSV file.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

df
    |> DataFrame.writeCsv "out.csv"
Try it

See also: DataFrame.writeJson

DataFrame.writeJson

String -> DataFrame -> Unit

Write a DataFrame to a JSON file.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

df
    |> DataFrame.writeJson "out.json"
Try it

See also: DataFrame.writeCsv

DataFrame.writeParquet

String -> DataFrame -> Unit

Write a DataFrame to a Parquet file, including metadata.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

df
    |> DataFrame.writeParquet "data.parquet"
Try it

Notes: Metadata is persisted in the Parquet file and restored by readParquet.

See also: DataFrame.readParquet, DataFrame.writeCsv

Column ops

DataFrame.select

[String] -> DataFrame -> DataFrame

Select columns by name.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]

df
    |> DataFrame.select ["name", "age"]
Try it

Notes: Column names are validated at compile time on typed DataFrames. The resulting DataFrame's type is narrowed to only the selected columns.

See also: DataFrame.drop, DataFrame.columns

DataFrame.drop

[String] -> DataFrame -> DataFrame

Drop columns by name.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ id = 1, name = "Alice" }, { id = 2, name = "Bob" }]

df
    |> DataFrame.drop ["id"]
Try it

Notes: Column names are validated at compile time on typed DataFrames. The resulting DataFrame's type excludes the dropped columns.

See also: DataFrame.select

DataFrame.rename

String -> String -> DataFrame -> DataFrame

Rename a column.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ old = 1 }, { old = 2 }]

df
    |> DataFrame.rename "old" "new"
Try it

Notes: The old column name is validated at compile time on typed DataFrames. The resulting DataFrame's type reflects the rename.

DataFrame.withColumn

String -> [a] -> DataFrame -> DataFrame

Add or replace a column with the given values.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }, { name = "Carol" }]

df
    |> DataFrame.withColumn "doubled"
    [2, 4, 6]
Try it

Notes: On typed DataFrames, the resulting type includes the new column.

See also: DataFrame.column

DataFrame.column

String -> DataFrame -> [Maybe a]

Extract a column as a list of Maybe values (Just x for values, Nothing for nulls).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

DataFrame.column "name" df
Try it

Notes: Every value is wrapped in Maybe since DataFrame columns are nullable. The column name is validated at compile time on typed DataFrames.

See also: DataFrame.columns

DataFrame.columns

DataFrame -> [String]

Get the column names of a DataFrame.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]

DataFrame.columns df
Try it

See also: DataFrame.dtypes

DataFrame.dtypes

DataFrame -> [(String, String)]

Get column names and their data types.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]

DataFrame.dtypes df
Try it

See also: DataFrame.columns

Row ops

DataFrame.tail

Int -> DataFrame -> DataFrame

Take the last n rows.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

df
    |> DataFrame.tail 5
Try it

See also: DataFrame.head

DataFrame.slice

Int -> Int -> DataFrame -> DataFrame

Take a slice of rows from offset with length.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

df
    |> DataFrame.slice 0 1
Try it

See also: DataFrame.head, DataFrame.tail

DataFrame.sort

String -> DataFrame -> DataFrame

Sort by a column in ascending order.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Bob" }, { name = "Alice" }]

df
    |> DataFrame.sort "name"
Try it

Notes: Column name is validated at compile time on typed DataFrames.

See also: DataFrame.sortDesc

DataFrame.sortDesc

String -> DataFrame -> DataFrame

Sort by a column in descending order.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ salary = 50000 }, { salary = 70000 }]

df
    |> DataFrame.sortDesc "salary"
Try it

Notes: Column name is validated at compile time on typed DataFrames.

See also: DataFrame.sort

DataFrame.unique

[String] -> DataFrame -> DataFrame

Keep unique rows based on specified columns.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Alice" }, { name = "Bob" }]

df
    |> DataFrame.unique ["name"]
Try it

DataFrame.sample

Int -> DataFrame -> DataFrame

Randomly sample n rows.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

df
    |> DataFrame.sample 1
Try it

See also: DataFrame.head

Filters

DataFrame.filterEq

String -> a -> DataFrame -> DataFrame

Filter rows where column equals value.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ city = "Berlin" }, { city = "Munich" }]

df
    |> DataFrame.filterEq "city" "Berlin"
Try it

Notes: Column name is validated at compile time on typed DataFrames.

See also: DataFrame.filterNeq, DataFrame.filterGt

DataFrame.filterNeq

String -> a -> DataFrame -> DataFrame

Filter rows where column does not equal value.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ status = "active" }, { status = "inactive" }]

df
    |> DataFrame.filterNeq "status" "inactive"
Try it

See also: DataFrame.filterEq

DataFrame.filterGt

String -> a -> DataFrame -> DataFrame

Filter rows where column is greater than value.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ salary = 40000 }, { salary = 60000 }]

df
    |> DataFrame.filterGt "salary" 50000
Try it

See also: DataFrame.filterGte, DataFrame.filterLt

DataFrame.filterGte

String -> a -> DataFrame -> DataFrame

Filter rows where column is greater than or equal to value.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ age = 17 }, { age = 18 }, { age = 25 }]

df
    |> DataFrame.filterGte "age" 18
Try it

See also: DataFrame.filterGt

DataFrame.filterLt

String -> a -> DataFrame -> DataFrame

Filter rows where column is less than value.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ price = 50 }, { price = 150 }]

df
    |> DataFrame.filterLt "price" 100
Try it

See also: DataFrame.filterLte, DataFrame.filterGt

DataFrame.filterLte

String -> a -> DataFrame -> DataFrame

Filter rows where column is less than or equal to value.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ score = 30 }, { score = 50 }, { score = 80 }]

df
    |> DataFrame.filterLte "score" 50
Try it

See also: DataFrame.filterLt

DataFrame.filterIn

String -> [a] -> DataFrame -> DataFrame

Filter rows where column value is in the given list.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ city = "Berlin" }, { city = "Munich" }, { city = "Hamburg" }]

df
    |> DataFrame.filterIn "city"
    ["Berlin", "Munich"]
Try it

See also: DataFrame.filterEq

Aggregation

DataFrame.groupBy

[String] -> DataFrame -> GroupedDataFrame

Group a DataFrame by the given columns.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ department = "Sales", salary = 50000 }, { department = "Sales", salary = 60000 }]

df
    |> DataFrame.groupBy ["department"]
Try it

Notes: Returns a GroupedDataFrame. Use DataFrame.agg to aggregate.

See also: DataFrame.agg

DataFrame.agg

[(String, String)] -> GroupedDataFrame -> DataFrame

Aggregate a grouped DataFrame. Supported: sum, mean, min, max, count, first, last, std, var, median.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", salary = 50000 }, { dept = "Sales", salary = 60000 }]

let specs =
    [("salary", "mean")]

df
    |> DataFrame.groupBy ["dept"]
    |> DataFrame.agg specs
Try it

See also: DataFrame.groupBy

DataFrame.count

DataFrame -> Int

Get the number of rows in a DataFrame.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }, { name = "Bob" }]

DataFrame.count df
Try it

See also: DataFrame.shape

DataFrame.describe

DataFrame -> DataFrame

Compute summary statistics for all columns.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }, { name = "Bob", age = 25 }]

DataFrame.describe df
Try it

Multi-DataFrame

DataFrame.join

String -> String -> DataFrame -> DataFrame -> DataFrame

Inner join two DataFrames on the given column names.

Example:
import DataFrame

let left =
    DataFrame.fromRecords [{ id = 1, name = "Alice" }]

let right =
    DataFrame.fromRecords [{ user_id = 1, role = "admin" }]

left
    |> DataFrame.join "id" "user_id" right
Try it

Notes: First two args are left and right join column names.

See also: DataFrame.concat

DataFrame.concat

[DataFrame] -> DataFrame

Concatenate a list of DataFrames vertically.

Example:
import DataFrame

let df1 =
    DataFrame.fromRecords [{ name = "Alice" }]

let df2 =
    DataFrame.fromRecords [{ name = "Bob" }]

DataFrame.concat [df1, df2]
Try it

See also: DataFrame.join

DataFrame.pivot

String -> String -> String -> DataFrame -> DataFrame

Pivot a DataFrame: spread values from one column into new columns.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ category = "A", date = "Jan", amount = 100 }, { category = "B", date = "Jan", amount = 200 }]

df
    |> DataFrame.pivot "category" "date" "amount"
Try it

Notes: Args: on, index, values.

Metadata

DataFrame.setMeta

String -> a -> DataFrame -> DataFrame

Set a dataset-level metadata key.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ x = 1 }]

df
    |> DataFrame.setMeta "name" "PISA 2022"
Try it

Notes: Metadata values can be String, Int, Float, Bool, List, or Record.

See also: DataFrame.getMeta, DataFrame.allMeta

DataFrame.getMeta

String -> DataFrame -> Maybe a

Get a dataset-level metadata value by key.

Example:
import DataFrame

DataFrame.fromRecords [{ x = 1 }]
    |> DataFrame.setMeta "name" "test"
    |> DataFrame.getMeta "name"
Try it

See also: DataFrame.setMeta, DataFrame.allMeta

DataFrame.allMeta

DataFrame -> Record

Get all dataset-level metadata as a record.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ x = 1 }]

DataFrame.allMeta df
Try it

See also: DataFrame.getMeta, DataFrame.setMeta

DataFrame.setColumnMeta

String -> String -> a -> DataFrame -> DataFrame

Set a column-level metadata key.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ score = 500 }]

df
    |> DataFrame.setColumnMeta "score" "label" "Math score"
Try it

Notes: First arg is column name, second is metadata key.

See also: DataFrame.getColumnMeta, DataFrame.allColumnMeta

DataFrame.getColumnMeta

String -> String -> DataFrame -> Maybe a

Get a column-level metadata value by column and key.

Example:
import DataFrame

DataFrame.fromRecords [{ score = 500 }]
    |> DataFrame.setColumnMeta "score" "label" "Math"
    |> DataFrame.getColumnMeta "score" "label"
Try it

See also: DataFrame.setColumnMeta, DataFrame.allColumnMeta

DataFrame.allColumnMeta

String -> DataFrame -> Record

Get all metadata for a specific column as a record.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ score = 500 }]

DataFrame.allColumnMeta "score" df
Try it

See also: DataFrame.getColumnMeta, DataFrame.setColumnMeta

DataFrame.describeMeta

DataFrame -> DataFrame

Get a summary DataFrame of all metadata (dataset and column level).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ x = 1 }]

DataFrame.describeMeta df
Try it

Notes: Returns a DataFrame with columns: level, key, value.

See also: DataFrame.allMeta, DataFrame.allColumnMeta

Inspection

DataFrame.shape

DataFrame -> (Int, Int)

Get the shape of a DataFrame as (rows, columns).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]

DataFrame.shape df
Try it

See also: DataFrame.count, DataFrame.columns

Conversion

DataFrame.toRecords

DataFrame -> [Record]

Convert a DataFrame to a list of Keel records.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice", age = 30 }]

DataFrame.toRecords df
Try it

Notes: Each row becomes a record with column names as field names.

See also: DataFrame.fromRecords

DataFrame.fromRecords

[Record] -> DataFrame

Create a DataFrame from a list of Keel records.

Example:
import DataFrame

DataFrame.fromRecords [{ name = "Alice", age = 30 }]
Try it

Notes: All records should have the same fields.

See also: DataFrame.toRecords

Other

DataFrame.aggExprs

[Expr] -> GroupedDataFrame -> DataFrame

Aggregate a grouped DataFrame using a list of DataFrame.Expr expressions.

Example:
import DataFrame
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ group = "A", value = 10 }, { group = "A", value = 20 }, { group = "B", value = 30 }]
let totalExpr = col "value" |> Expr.sum |> Expr.named "total"
let avgExpr = col "value" |> Expr.mean |> Expr.named "average"

df |> DataFrame.groupBy ["group"] |> DataFrame.aggExprs [totalExpr, avgExpr]
Try it

Notes: Each expression should have an alias set using Expr.named. This defines the output column name.

See also: DataFrame.agg, DataFrame.groupBy, DataFrame.Expr.sum

DataFrame.collect

WindowedDataFrame -> DataFrame

Collect a windowed DataFrame back into a regular DataFrame, materializing all window computations.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", val = 1 }, { dept = "Sales", val = 2 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.withRowNumber "row_num"
    |> DataFrame.collect
Try it

Notes: Must be called at the end of a window function chain to produce a usable DataFrame.

See also: DataFrame.partitionBy

DataFrame.columnLineage

String -> DataFrame -> Maybe Record

Get lineage metadata for a specific column.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

DataFrame.columnLineage "name" df
Try it

Notes: Returns Just Record with origin, transformations, and dependencies, or Nothing if column not found.

See also: DataFrame.lineage

DataFrame.describeLabel

String -> DataFrame -> String

Describe value labels for a single column as a formatted table.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
    |> DataFrame.withValueLabels "gender" gender
DataFrame.describeLabel "gender" df
Try it

Notes: Returns a formatted table with Value and Label columns. Returns a message if the column has no value labels.

See also: DataFrame.describeLabels, DataFrame.getValueLabels

DataFrame.describeLabels

DataFrame -> String

Describe all value labels in a DataFrame as a formatted string.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
    |> DataFrame.withValueLabels "gender" gender
DataFrame.describeLabels df
Try it

Notes: Lists all columns with value labels, sorted by column name. Returns "(no value labels)" if none are set.

See also: DataFrame.describeLabel, DataFrame.getValueLabels, DataFrame.getAllValueLabels, DataFrame.describeMeta

DataFrame.describeVariables

DataFrame -> DataFrame

STATA-style variable overview: returns a DataFrame with one row per column showing name, type, label, value labels, and metadata.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ name = "Alice", gender = 1 }]
    |> DataFrame.withVarLabel "name" "Person's name"
    |> DataFrame.withValueLabels "gender" gender
DataFrame.describeVariables df
Try it

Notes: Returns a DataFrame with columns: name, type, label, values, metadata. Rows are in column order. Value labels show abbreviated form: {1=Male, 2=Female} for ≤5 labels, or "N labels" for more.

See also: DataFrame.describeMeta, DataFrame.describeLabels, DataFrame.getVarLabels, DataFrame.searchVariables

DataFrame.filter

(row -> Bool) -> DataFrame -> DataFrame

Filter rows using a predicate closure. Rows for which the predicate returns True are kept.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ x = 1, y = 10 }, { x = 5, y = 50 }, { x = 10, y = 100 }]

df
    |> DataFrame.filter (|r| r.x > 2 && r.y < 100)
Try it

Notes: Performance: Lambdas are compiled to native Polars expressions when possible (SIMD, parallel execution). Falls back to row-by-row VM execution for unsupported operations.

Supported (fast path):

  • Field access: r.field
  • Arithmetic: +, -, *, /, %
  • Comparisons: <, <=, >, >=, ==, !=
  • Boolean: &&, ||, not
  • Literals: 42, 3.14, "str", True, False
  • Case expressions with pattern matching
  • Stdlib: Maybe.withDefault, Math.abs/floor/ceil/round/sqrt, String.length/toUpper/toLower/trim/contains/startsWith/endsWith

Unsupported (slow path fallback):

  • User-defined function calls
  • List operations
  • Nested closures
  • Variable capture from outer scope
  • Tuple operations

For >1000 rows, a warning is emitted when fallback occurs.

See also: DataFrame.filterEq, DataFrame.filterGt, DataFrame.filterExpr

DataFrame.filterExpr

Expr -> DataFrame -> DataFrame

Filter rows using a DataFrame.Expr boolean expression. Always uses the fast Polars path.

Example:
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ x = 1 }, { x = 5 }, { x = 10 }]

df
    |> DataFrame.filterExpr (col "x" |> Expr.gt (lit 2))
Try it

Notes: This is the recommended way to filter when you need full control over the expression. Unlike the closure-based filter, this always compiles to Polars for optimal performance.

See also: DataFrame.filter, DataFrame.Expr.col, DataFrame.Expr.gt

DataFrame.fromLists

[(String, [a])] -> DataFrame

Create a multi-column DataFrame from a list of (column name, values) tuples.

Example:
import DataFrame

DataFrame.fromLists [("age", [30, 40]), ("name", ["Alice", "Bob"])]
Try it

Notes: Column-oriented data construction. All value lists must have the same length. Supports Maybe-wrapped values. Composes well with List.zip for programmatic column creation. For single-column DataFrames, pass a single-element list: [("col", [values])].

See also: DataFrame.fromRecords

DataFrame.getAllValueLabels

DataFrame -> { String : ValueLabelSet }

Get all value labels as a record mapping column names to ValueLabelSets.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
    |> DataFrame.withValueLabels "gender" gender
DataFrame.getAllValueLabels df
Try it

See also: DataFrame.getValueLabels, DataFrame.withValueLabels

DataFrame.getDisplayMode

String -> DataFrame -> String

Get the display mode for a column.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ gender = 1 }]
DataFrame.getDisplayMode "gender" df  -- "Both"
Try it

Notes: Returns "Both" (the default) if no display mode has been set.

See also: DataFrame.withDisplayMode, DataFrame.withValueLabels

DataFrame.getValueLabels

String -> DataFrame -> Maybe ValueLabelSet

Get the value labels for a column, if any.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
    |> DataFrame.withValueLabels "gender" gender
DataFrame.getValueLabels "gender" df  -- Just (ValueLabelSet)
Try it

See also: DataFrame.withValueLabels, DataFrame.getAllValueLabels

DataFrame.getVarLabel

String -> DataFrame -> Maybe String

Get the variable label for a column, if any.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }]
    |> DataFrame.withVarLabel "name" "Person's name"
DataFrame.getVarLabel "name" df  -- Just "Person's name"
Try it

See also: DataFrame.withVarLabel, DataFrame.getVarLabels

DataFrame.getVarLabels

DataFrame -> { String : String }

Get all variable labels as a record.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice", age = 30 }]
    |> DataFrame.withVarLabel "name" "Person's name"
    |> DataFrame.withVarLabel "age" "Age in years"
DataFrame.getVarLabels df  -- { name = "Person's name", age = "Age in years" }
Try it

See also: DataFrame.getVarLabel, DataFrame.withVarLabel

DataFrame.lazy

DataFrame -> LazyFrame

Convert a DataFrame to a LazyFrame for deferred, optimized execution.

Example:
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ x = 1 }, { x = 5 }, { x = 10 }]

df
    |> DataFrame.lazy
    |> DataFrame.lazyFilter (col "x" |> Expr.gt (lit 2))
    |> DataFrame.lazyCollect
Try it

Notes: LazyFrame enables Polars query optimization: predicate pushdown, projection pushdown, and parallel execution.

See also: DataFrame.lazyCollect, DataFrame.lazyFilter

DataFrame.lazyCollect

LazyFrame -> DataFrame

Materialize a LazyFrame back to a DataFrame, executing the optimized query plan.

Example:
import DataFrame
DataFrame.fromRecords [{ x = 1 }, { x = 2 }] |> DataFrame.lazy |> DataFrame.lazyCollect
Try it

Notes: This triggers the actual computation. Until collect is called, all operations are deferred.

See also: DataFrame.lazy

DataFrame.lazyFilter

Expr -> LazyFrame -> LazyFrame

Filter a LazyFrame using a DataFrame.Expr boolean expression.

Example:
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
let filterExpr = col "x" |> Expr.gt (lit 5)
DataFrame.fromRecords [{ x = 1 }, { x = 10 }] |> DataFrame.lazy |> DataFrame.lazyFilter filterExpr |> DataFrame.lazyCollect
Try it

Notes: The filter is added to the query plan and optimized with other operations.

See also: DataFrame.lazy, DataFrame.filterExpr

DataFrame.lazySelect

[Expr] -> LazyFrame -> LazyFrame

Select columns from a LazyFrame using a list of Expr expressions.

Example:
import DataFrame
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
let yRenamed = col "y" |> Expr.named "y_renamed"
DataFrame.fromRecords [{ x = 1, y = 2 }] |> DataFrame.lazy |> DataFrame.lazySelect [yRenamed] |> DataFrame.lazyCollect
Try it

Notes: Enables projection pushdown - only selected columns are read from files.

See also: DataFrame.lazy, DataFrame.select

DataFrame.lazyWithColumns

[Expr] -> LazyFrame -> LazyFrame

Add or replace columns in a LazyFrame using a list of Expr expressions.

Example:
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
let doubledExpr = col "x" |> Expr.mul (lit 2) |> Expr.named "x_doubled"
DataFrame.fromRecords [{ x = 5 }] |> DataFrame.lazy |> DataFrame.lazyWithColumns [doubledExpr] |> DataFrame.lazyCollect
Try it

Notes: Each expression should have an alias set using Expr.named.

See also: DataFrame.lazy, DataFrame.withColumns

DataFrame.lineage

DataFrame -> Record

Get complete lineage metadata for all columns, including origins and transformations.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ name = "Alice" }]

df
    |> DataFrame.lineage
Try it

Notes: Returns a Record with 'columns' (per-column lineage) and 'globalOperations' (operations affecting all columns).

See also: DataFrame.columnLineage

DataFrame.mutate

(Record -> Record) -> DataFrame -> DataFrame

Apply a row-wise transformation function to a DataFrame. Automatically preserves all existing columns.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ val = 100 }, { val = 200 }]

df
    |> DataFrame.mutate (|r| { doubled = 200 })
Try it

Notes: Return only new/modified columns - existing columns are auto-preserved. Returning a field with the same name as an existing column overwrites it. Best practice: use descriptive names (e.g., price_adjusted) to avoid accidental overwrites. Column values are Maybe-wrapped to handle nulls. Note: Wrap the record literal in parentheses to disambiguate from a block.

Performance: Lambdas are compiled to native Polars expressions when possible (SIMD, parallel execution). Falls back to row-by-row VM execution for unsupported operations.

Supported (fast path):

  • Field access: r.field
  • Arithmetic: +, -, *, /, %
  • Comparisons: <, <=, >, >=, ==, !=
  • Boolean: &&, ||, not
  • Literals: 42, 3.14, "str", True, False
  • Record construction: { field = expr }
  • Case expressions with pattern matching
  • Stdlib: Maybe.withDefault, Math.abs/floor/ceil/round/sqrt, String.length/toUpper/toLower/trim/contains/startsWith/endsWith

Unsupported (slow path fallback):

  • User-defined function calls
  • List operations
  • Nested closures
  • Variable capture from outer scope
  • Tuple operations

For >1000 rows, a warning is emitted when fallback occurs.

See also: DataFrame.toRecords, DataFrame.withColumn

DataFrame.orderBy

[String] -> WindowedDataFrame -> WindowedDataFrame

Set the ordering columns for a windowed DataFrame. Required before rank, lag, lead, and rolling functions.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1 }, { dept = "Sales", date = 2 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
Try it

Notes: Ordering determines how rows are sequenced within each partition. Must be called before withRank, withDenseRank, withLag, withLead, or rolling functions.

See also: DataFrame.partitionBy, DataFrame.withRank

DataFrame.partitionBy

[String] -> DataFrame -> WindowedDataFrame

Create a windowed DataFrame partitioned by the given columns. This is the entry point for all window function operations.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ department = "Sales", salary = 50000 }, { department = "HR", salary = 60000 }]

df
    |> DataFrame.partitionBy ["department"]
Try it

Notes: Partition columns define independent groups for window calculations. Chain with orderBy, ranking, lag/lead, rolling, or cumulative functions.

See also: DataFrame.orderBy, DataFrame.collect

DataFrame.readCsvColumns

[String] -> String -> DataFrame

Read only the specified columns from a CSV file.

Example:
import DataFrame

DataFrame.readCsvColumns ["name", "age"] "data.csv"
Try it

Notes: Column projection is applied at the I/O level for efficient reading. With literal path and columns, validated at compile time.

See also: DataFrame.readCsv

DataFrame.readDta

String -> DataFrame

Read a STATA .dta file into a DataFrame with metadata. With a string literal path, column names and types are validated at compile time.

Example:
import DataFrame
DataFrame.readDta "data.dta"
Try it

Notes: Preserves variable labels, value labels, and dataset label as metadata.

See also: DataFrame.readDtaColumns, DataFrame.writeDta, DataFrame.readCsv

DataFrame.readDtaColumns

[String] -> String -> DataFrame

Read only the specified columns from a STATA .dta file.

Example:
import DataFrame
DataFrame.readDtaColumns ["var1", "var2"] "data.dta"
Try it

Notes: Reads full file then selects columns. With literal path and columns, validated at compile time.

See also: DataFrame.readDta

DataFrame.readJsonColumns

[String] -> String -> DataFrame

Read only the specified columns from a JSON file.

Example:
import DataFrame

DataFrame.readJsonColumns ["x", "y"] "data.json"
Try it

Notes: Reads full file then selects columns. With literal path and columns, validated at compile time.

See also: DataFrame.readJson

DataFrame.readParquetColumns

[String] -> String -> DataFrame

Read only the specified columns from a Parquet file.

Example:
import DataFrame

DataFrame.readParquetColumns ["id", "score"] "data.parquet"
Try it

Notes: True columnar projection — unneeded columns are never read from disk. With literal path and columns, validated at compile time.

See also: DataFrame.readParquet

DataFrame.recode

String -> [(Int, Int)] -> DataFrame -> DataFrame

Recode values in a column according to a mapping. Automatically updates value labels.

Example:
import DataFrame
import ValueLabelSet

let labels = ValueLabelSet.fromList [(1, "Low"), (2, "Medium"), (3, "High")]
let df = DataFrame.fromRecords [{ score = 1 }, { score = 2 }, { score = 3 }]
    |> DataFrame.withValueLabels "score" labels

-- Collapse categories: 1 stays 1, 2->1, 3->2
df |> DataFrame.recode "score" [(2, 1), (3, 2)]
-- Labels are automatically remapped: 1->"Low", 2->"High"
Try it

Notes: Value labels are automatically updated based on the recode mapping. Values not in the mapping remain unchanged.

See also: DataFrame.withValueLabels, DataFrame.mutate

DataFrame.removeValueLabels

String -> DataFrame -> DataFrame

Remove value labels from a column.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
    |> DataFrame.withValueLabels "gender" gender
df |> DataFrame.removeValueLabels "gender"
Try it

See also: DataFrame.withValueLabels, DataFrame.getValueLabels

DataFrame.removeVarLabel

String -> DataFrame -> DataFrame

Remove the variable label from a column.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }]
    |> DataFrame.withVarLabel "name" "Person's name"
df |> DataFrame.removeVarLabel "name"
Try it

See also: DataFrame.withVarLabel, DataFrame.getVarLabel

DataFrame.searchVariables

String -> DataFrame -> DataFrame

Search for variables by name, label, value labels, or metadata. Returns matching variables as a DataFrame.

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ name = "Alice", gender = 1, income = 50000 }]
    |> DataFrame.withVarLabel "name" "Person's name"
    |> DataFrame.withVarLabel "income" "Annual income in USD"
    |> DataFrame.withValueLabels "gender" gender

-- Search by variable name
DataFrame.searchVariables "name" df

-- Search by label text
DataFrame.searchVariables "income" df

-- Search by value label text
DataFrame.searchVariables "Male" df
Try it

Notes: Case-insensitive substring search across all variable metadata: name, label (description), value labels, and column metadata. Returns a DataFrame with columns: name, type, label, values, metadata.

See also: DataFrame.describeVariables, DataFrame.describeMeta, DataFrame.getVarLabels

DataFrame.withColumns

[Expr] -> DataFrame -> DataFrame

Add or replace multiple columns using a list of DataFrame.Expr expressions.

Example:
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr

let df = DataFrame.fromRecords [{ x = 1, y = 10 }, { x = 2, y = 20 }]
let xDoubled = col "x" |> Expr.mul (lit 2) |> Expr.named "x_doubled"
let sumExpr = col "x" |> Expr.add (col "y") |> Expr.named "sum"

df |> DataFrame.withColumns [xDoubled, sumExpr]
Try it

Notes: Each expression should have an alias set using Expr.named. This defines the output column name.

See also: DataFrame.withColumn, DataFrame.Expr.named

DataFrame.withCumMax

String -> String -> WindowedDataFrame -> WindowedDataFrame

Add a cumulative maximum column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withCumMax "running_max" "sales"
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column. Tracks the maximum value seen so far in the partition.

See also: DataFrame.withCumMin, DataFrame.withRollingMax

DataFrame.withCumMean

String -> String -> WindowedDataFrame -> WindowedDataFrame

Add a cumulative mean column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withCumMean "running_avg" "sales"
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column. Computes running average over all preceding rows in the partition.

See also: DataFrame.withCumSum, DataFrame.withRollingMean

DataFrame.withCumMin

String -> String -> WindowedDataFrame -> WindowedDataFrame

Add a cumulative minimum column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withCumMin "running_min" "sales"
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column. Tracks the minimum value seen so far in the partition.

See also: DataFrame.withCumMax, DataFrame.withRollingMin

DataFrame.withCumSum

String -> String -> WindowedDataFrame -> WindowedDataFrame

Add a cumulative sum column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withCumSum "running_total" "sales"
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column. Computes running total over all preceding rows in the partition.

See also: DataFrame.withCumMean, DataFrame.withRollingSum

DataFrame.withDenseRank

String -> WindowedDataFrame -> WindowedDataFrame

Add a dense rank column within each partition. Ties receive the same rank, with no gaps (e.g., 1, 2, 2, 3).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", score = 90 }, { dept = "Sales", score = 85 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["score"]
    |> DataFrame.withDenseRank "dense_rank"
    |> DataFrame.collect
Try it

Notes: Argument is the name of the new column. Requires orderBy to be set first.

See also: DataFrame.withRank, DataFrame.withRowNumber, DataFrame.orderBy

DataFrame.withDisplayMode

String -> String -> DataFrame -> DataFrame

Set how a column's values should be displayed (Raw, Labeled, or Both).

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ gender = 1 }]
    |> DataFrame.withValueLabels "gender" gender
    |> DataFrame.withDisplayMode "gender" "Labeled"
Try it

Notes: Display modes: "Raw" shows only the value, "Labeled" shows only the label, "Both" (default) shows "value (label)".

See also: DataFrame.getDisplayMode, DataFrame.withValueLabels

DataFrame.withLag

String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a column with the value from a previous row within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withLag "prev_sales" "sales" 1
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column, offset (number of rows back). Produces Nothing for rows without a previous value. Requires orderBy.

See also: DataFrame.withLead, DataFrame.orderBy

DataFrame.withLead

String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a column with the value from a subsequent row within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withLead "next_sales" "sales" 1
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column, offset (number of rows forward). Produces Nothing for rows without a subsequent value. Requires orderBy.

See also: DataFrame.withLag, DataFrame.orderBy

DataFrame.withRank

String -> WindowedDataFrame -> WindowedDataFrame

Add a rank column within each partition. Ties receive the same rank, with gaps after ties (e.g., 1, 2, 2, 4).

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", score = 90 }, { dept = "Sales", score = 85 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["score"]
    |> DataFrame.withRank "rank"
    |> DataFrame.collect
Try it

Notes: Argument is the name of the new column. Requires orderBy to be set first.

See also: DataFrame.withDenseRank, DataFrame.withRowNumber, DataFrame.orderBy

DataFrame.withRollingMax

String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a rolling maximum column computed over a fixed-size window within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withRollingMax "max_3d" "sales" 3
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column, window size. Window includes the current row and preceding rows. Requires orderBy.

See also: DataFrame.withRollingMin, DataFrame.withCumMax

DataFrame.withRollingMean

String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a rolling mean column computed over a fixed-size window within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withRollingMean "avg_3d" "sales" 3
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column, window size. Window includes the current row and preceding rows. Requires orderBy.

See also: DataFrame.withRollingSum, DataFrame.withCumMean

DataFrame.withRollingMin

String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a rolling minimum column computed over a fixed-size window within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withRollingMin "min_3d" "sales" 3
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column, window size. Window includes the current row and preceding rows. Requires orderBy.

See also: DataFrame.withRollingMax, DataFrame.withCumMin

DataFrame.withRollingSum

String -> String -> Int -> WindowedDataFrame -> WindowedDataFrame

Add a rolling sum column computed over a fixed-size window within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", date = 1, sales = 100 }, { dept = "Sales", date = 2, sales = 200 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.orderBy ["date"]
    |> DataFrame.withRollingSum "sum_3d" "sales" 3
    |> DataFrame.collect
Try it

Notes: Args: new column name, source column, window size. Window includes the current row and preceding rows. Requires orderBy.

See also: DataFrame.withRollingMean, DataFrame.withCumSum

DataFrame.withRowNumber

String -> WindowedDataFrame -> WindowedDataFrame

Add a sequential row number column within each partition.

Example:
import DataFrame

let df =
    DataFrame.fromRecords [{ dept = "Sales", val = 1 }, { dept = "Sales", val = 2 }]

df
    |> DataFrame.partitionBy ["dept"]
    |> DataFrame.withRowNumber "row_num"
    |> DataFrame.collect
Try it

Notes: Argument is the name of the new column. Row numbers start at 1. Does not require orderBy.

See also: DataFrame.withRank, DataFrame.withDenseRank

DataFrame.withValueLabels

String -> ValueLabelSet -> DataFrame -> DataFrame

Attach value labels to a column (lenient - allows unlabeled values).

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ id = 1, gender = 1 }]
df |> DataFrame.withValueLabels "gender" gender
Try it

Notes: Value labels map numeric codes to human-readable labels. Use withValueLabelsStrict for exhaustive validation.

See also: DataFrame.withValueLabelsStrict, DataFrame.getValueLabels

DataFrame.withValueLabelsStrict

String -> ValueLabelSet -> DataFrame -> DataFrame

Attach value labels to a column (strict - all values must have labels).

Example:
import DataFrame
import ValueLabelSet
let gender = ValueLabelSet.fromList [(1, "Male"), (2, "Female")]
let df = DataFrame.fromRecords [{ id = 1, gender = 1 }]
df |> DataFrame.withValueLabelsStrict "gender" gender
Try it

Notes: Returns an error if any value in the column doesn't have a corresponding label.

See also: DataFrame.withValueLabels, DataFrame.getValueLabels

DataFrame.withVarLabel

String -> String -> DataFrame -> DataFrame

Set a variable label (description) for a column.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice", age = 30 }]
df |> DataFrame.withVarLabel "name" "Person's full name"
Try it

Notes: Variable labels describe what a column represents. They are preserved in STATA files.

See also: DataFrame.getVarLabel, DataFrame.removeVarLabel

DataFrame.writeDta

String -> DataFrame -> Unit

Write a DataFrame to a STATA .dta file with metadata.

Example:
import DataFrame
let df = DataFrame.fromRecords [{ name = "Alice" }]
df |> DataFrame.writeDta "data.dta"
Try it

Notes: Preserves variable labels, value labels, and dataset label from metadata.

See also: DataFrame.readDta, DataFrame.writeCsv