
DataFrame Expressions

The DataFrame.Expr module provides composable, type-safe column expressions that compile directly to Polars operations. Unlike closures (which may fall back to slow row-by-row evaluation), expressions always use Polars' optimized SIMD and parallel execution.

Getting Started

Import the Expr module with an alias for concise usage:

import DataFrame
import DataFrame.Expr as Expr

Column References and Literals

Build expressions from column references and literal values:

-- tags: dataframe, expr, expressions
-- expect: ["name", "revenue", "double_revenue"]
-- DataFrame.Expr for composable column operations
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "Alice", revenue = 100 }
    , { name = "Bob", revenue = 200 }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "name"
        , Expr.col "revenue"
        , Expr.col "revenue"
            |> Expr.mul (Expr.lit 2)
            |> Expr.named "double_revenue"
        ]
    |> DataFrame.columns
  • Expr.col "name" references a column by name
  • Expr.lit value creates a constant expression from an Int, Float, or String
  • Expr.named "alias" expr renames the output column

Arithmetic

Combine expressions with arithmetic operators:

import DataFrame.Expr as Expr

-- Column arithmetic
Expr.col "price" |> Expr.mul (Expr.col "quantity")

-- Mixed column and literal
Expr.col "score" |> Expr.add (Expr.lit 10)

Available: add, sub, mul, div, mod, pow.
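Because each operator returns another expression, several operations can be chained in one pipeline. The following is a minimal sketch that reuses only `mul`, `add`, and `named` as shown above; the sample records, column names, and the `total` alias are made up for illustration.

```keel
-- Illustrative sketch: total = price * quantity + shipping
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { price = 5, quantity = 3, shipping = 2 }
    , { price = 8, quantity = 1, shipping = 4 }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "price"
            |> Expr.mul (Expr.col "quantity")
            |> Expr.add (Expr.col "shipping")
            |> Expr.named "total"
        ]
    |> DataFrame.columns
```

Each step in the pipeline wraps the expression built so far, so the multiplication is applied before the addition.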

Comparison and Boolean Logic

import DataFrame.Expr as Expr

-- Filter-style expressions
Expr.col "age" |> Expr.gte (Expr.lit 18)

-- Combine with boolean logic
let isAdult = Expr.col "age" |> Expr.gte (Expr.lit 18)
let isActive = Expr.col "status" |> Expr.eq (Expr.lit "active")
Expr.and isAdult isActive

Comparison: eq, neq, gt, gte, lt, lte. Boolean: and, or, not.
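A boolean expression can also be materialized as a column of its own. The sketch below combines the two comparisons from the example above inside `selectExpr`; the sample records and the `eligible` alias are illustrative only.

```keel
-- Illustrative sketch: a boolean column built from two comparisons
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "Alice", age = 30, status = "active" }
    , { name = "Bob", age = 16, status = "active" }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "name"
        , Expr.and
            (Expr.col "age" |> Expr.gte (Expr.lit 18))
            (Expr.col "status" |> Expr.eq (Expr.lit "active"))
            |> Expr.named "eligible"
        ]
    |> DataFrame.columns
```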

Conditional Expressions

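Use cond for if-then-else logic. It takes the condition first, then the value to use when the condition is true, then the value to use otherwise: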

-- tags: dataframe, expr, conditional
-- expect: ["name", "score", "grade"]
-- Conditional expressions with DataFrame.Expr
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "Alice", score = 95 }
    , { name = "Bob", score = 72 }
    , { name = "Carol", score = 88 }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "name"
        , Expr.col "score"
        , Expr.cond
            (Expr.col "score" |> Expr.gte (Expr.lit 90))
            (Expr.lit "A")
            (Expr.lit "B")
            |> Expr.named "grade"
        ]
    |> DataFrame.columns
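Assuming the else branch of `cond` accepts any expression, including another `cond`, multi-way logic can be sketched by nesting. The thresholds and grade labels below are illustrative and not part of the library.

```keel
-- Illustrative sketch: three-way grading with a nested cond
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "Alice", score = 95 }
    , { name = "Bob", score = 72 }
    , { name = "Carol", score = 88 }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "name"
        , Expr.cond
            (Expr.col "score" |> Expr.gte (Expr.lit 90))
            (Expr.lit "A")
            (Expr.cond
                (Expr.col "score" |> Expr.gte (Expr.lit 80))
                (Expr.lit "B")
                (Expr.lit "C")
            )
            |> Expr.named "grade"
        ]
    |> DataFrame.columns
```

The outer condition is checked first, so the inner `cond` only decides between "B" and "C" for scores below 90.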

Aggregations

Reduce columns to summary values:

-- tags: dataframe, expr, aggregation
-- expect: ["total", "average"]
-- Aggregation with DataFrame.Expr
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { value = 10 }
    , { value = 20 }
    , { value = 30 }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "value" |> Expr.sum |> Expr.named "total"
        , Expr.col "value" |> Expr.mean |> Expr.named "average"
        ]
    |> DataFrame.columns

Available: sum, mean, min, max, count, first, last, std, var, median.
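The other aggregations in this list presumably follow the same pipe pattern as `sum` and `mean`. The sketch below works under that assumption: the source does not show the signatures of `min`, `max`, or `count`, so treat those calls as unverified.

```keel
-- Illustrative sketch: assumes min, max, and count are unary like sum and mean
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { value = 10 }
    , { value = 20 }
    , { value = 30 }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "value" |> Expr.min |> Expr.named "lowest"
        , Expr.col "value" |> Expr.max |> Expr.named "highest"
        , Expr.col "value" |> Expr.count |> Expr.named "rows"
        ]
    |> DataFrame.columns
```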

String Operations

Transform string columns:

-- tags: dataframe, expr, string
-- expect: ["name", "upper_name"]
-- String operations with DataFrame.Expr
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "alice" }
    , { name = "bob" }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "name"
        , Expr.col "name"
            |> Expr.strUpper
            |> Expr.named "upper_name"
        ]
    |> DataFrame.columns

Available: strLength, strUpper, strLower, strTrim, strContains, strStartsWith, strEndsWith, strReplace.
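String operations can be chained like any other expression. The sketch below assumes `strTrim` and `strLower` take no extra arguments, as `strUpper` does in the example above; their signatures are not shown in the source, and the `clean_name` alias is illustrative.

```keel
-- Illustrative sketch: assumes strTrim and strLower are unary like strUpper
import DataFrame
import DataFrame.Expr as Expr

DataFrame.fromRecords
    [ { name = "  Alice " }
    , { name = "BOB" }
    ]
    |> DataFrame.selectExpr
        [ Expr.col "name"
            |> Expr.strTrim
            |> Expr.strLower
            |> Expr.named "clean_name"
        ]
    |> DataFrame.columns
```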

Math Functions

import DataFrame.Expr as Expr

Expr.col "value" |> Expr.abs
Expr.col "value" |> Expr.sqrt
Expr.col "value" |> Expr.round

Available: abs, sqrt, floor, ceil, round.
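Math functions compose with the arithmetic operators. As a small sketch, the expression below computes a Euclidean distance from two hypothetical columns `x` and `y`, using only `mul`, `add`, `sqrt`, and `named` as they appear elsewhere on this page.

```keel
-- Illustrative sketch: distance = sqrt(x * x + y * y)
import DataFrame.Expr as Expr

Expr.col "x"
    |> Expr.mul (Expr.col "x")
    |> Expr.add (Expr.col "y" |> Expr.mul (Expr.col "y"))
    |> Expr.sqrt
    |> Expr.named "distance"
```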

Null Handling

import DataFrame.Expr as Expr

-- Replace nulls with a default
Expr.col "score" |> Expr.fillNull (Expr.lit 0)

-- Check for nulls
Expr.col "email" |> Expr.isNull
Expr.col "email" |> Expr.isNotNull
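Null checks are ordinary boolean expressions, so they can drive `cond` as well. The following sketch assumes the expressions are later passed to `selectExpr`; the aliases `score_filled` and `email_status` and the two labels are illustrative.

```keel
import DataFrame.Expr as Expr

-- Fill missing scores and give the result an explicit name
Expr.col "score"
    |> Expr.fillNull (Expr.lit 0)
    |> Expr.named "score_filled"

-- Label each row by whether an email is present
Expr.cond
    (Expr.col "email" |> Expr.isNull)
    (Expr.lit "missing")
    (Expr.lit "present")
    |> Expr.named "email_status"
```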

Window Functions

Apply expressions over partitions (SQL-style window functions):

import DataFrame.Expr as Expr

-- Sum per group, repeated on every row of that group
Expr.col "revenue" |> Expr.sum |> Expr.over ["region"]

-- Ranking within groups
Expr.col "score" |> Expr.rank |> Expr.over ["department"]
Expr.col "score" |> Expr.denseRank |> Expr.over ["department"]

-- Access previous/next rows
Expr.col "value" |> Expr.lag 1   -- previous row's value
Expr.col "value" |> Expr.lead 1  -- next row's value
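A windowed expression yields one value per row, so it should be usable as an operand of other expressions. The sketch below assumes that is allowed and flags rows whose score is at least the mean of their department; the alias is illustrative.

```keel
import DataFrame.Expr as Expr

-- True where a row's score is at least its department's mean score
Expr.col "score"
    |> Expr.gte (Expr.col "score" |> Expr.mean |> Expr.over ["department"])
    |> Expr.named "at_or_above_dept_mean"
```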

When to Use Expressions vs Closures

| Use Expressions When              | Use Closures When                            |
|-----------------------------------|----------------------------------------------|
| Column arithmetic and comparisons | Complex logic needing full language features |
| Aggregations and window functions | Pattern matching on values                   |
| String transformations on columns | Calling other Keel functions                 |
| Performance is critical           | Prototyping or one-off transforms            |

Expressions compile to native Polars operations and benefit from SIMD vectorization, parallel execution, and query optimization. Use them for performance-critical data pipelines.

Next Steps

See the DataFrame stdlib page for the complete function reference, including how expressions integrate with selectExpr, filterExpr, and other DataFrame operations.