DataFrame.Expr Module
Composable column expressions for DataFrame operations.
The DataFrame.Expr module provides a functional, pipe-friendly API for building column expressions that compile directly to Polars with SIMD optimization and parallel execution. Use expressions with DataFrame.withColumns, DataFrame.filterExpr, and DataFrame.aggExprs.
Why Expressions?
- Performance: Expressions compile directly to Polars — always fast, no fallback
- Composability: Expressions are values that can be bound, passed, and composed
- Window functions: Impossible with closures, natural with expressions
- Aggregations: Sum, mean, count as composable operations
Common patterns
import DataFrame
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Add computed columns
df |> DataFrame.withColumns
[ col "price" |> Expr.mul (col "qty") |> Expr.named "total"
, col "price" |> Expr.mul (lit 1.1) |> Expr.named "with_tax"
]
-- Filter with expressions
df |> DataFrame.filterExpr (col "status" |> Expr.eq (lit "active"))
-- Conditional logic
Expr.cond
[ (col "age" |> Expr.lt (lit 18), lit "minor")
, (col "age" |> Expr.lt (lit 65), lit "adult")
] (lit "senior")
-- Window functions
col "sales" |> Expr.sum |> Expr.over ["region"] |> Expr.named "region_total"
Aggregation with groupBy
df
|> DataFrame.groupBy ["department"]
|> DataFrame.aggExprs
[ col "salary" |> Expr.mean |> Expr.named "avg_salary"
, col "id" |> Expr.count |> Expr.named "employee_count"
]
Functions
Constructors
DataFrame.Expr.col
String -> Expr
Reference a DataFrame column by name. This is the primary way to start building expressions.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Reference a single column
col "name"
-- Use in arithmetic
col "price" |> Expr.mul (col "quantity")
-- Use in comparisons
col "age" |> Expr.gte (lit 18)
Notes: Column names are case-sensitive and must exactly match the DataFrame column names.
See also: lit, named
DataFrame.Expr.lit
a -> Expr
Create a literal (constant) expression from a value. Supports Int, Float, String, Bool, and Unit (null).
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Integer literal
lit 42
-- Float literal
lit 3.14159
-- String literal
lit "active"
-- Boolean literal
lit True
-- Use in expressions
col "price" |> Expr.mul (lit 1.1) -- 10% markup
col "status" |> Expr.eq (lit "active")
Notes: Unit values become null. Wrap constants in lit wherever an expression is expected.
See also: col
DataFrame.Expr.named
String -> Expr -> Expr
Assign a name (alias) to an expression's output column. Required when using withColumns to define the result column name.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Name the output of a computation
col "price" |> Expr.mul (col "qty") |> Expr.named "total"
-- Multiple named expressions
df |> DataFrame.withColumns
[ col "a" |> Expr.add (col "b") |> Expr.named "sum_ab"
, col "a" |> Expr.mul (col "b") |> Expr.named "product_ab"
]
Notes: The column name cannot be empty. The alias only affects the output column name, not the expression itself.
See also: col, DataFrame.withColumns
Arithmetic
DataFrame.Expr.add
Expr -> Expr -> Expr
Add two expressions element-wise. Works with numeric columns (Int, Float).
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Add two columns
col "a" |> Expr.add (col "b")
-- Add a constant
col "price" |> Expr.add (lit 10)
-- Chain operations
col "a" |> Expr.add (col "b") |> Expr.add (col "c")
Notes: Follows pipe convention: lhs |> add rhs = lhs + rhs. Type coercion follows Polars rules (Int + Float = Float).
See also: sub, mul, div
DataFrame.Expr.sub
Expr -> Expr -> Expr
Subtract two expressions element-wise (lhs - rhs).
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Column difference
col "revenue" |> Expr.sub (col "cost")
-- Subtract a constant
col "score" |> Expr.sub (lit 5)
Notes: Follows pipe convention: lhs |> sub rhs = lhs - rhs.
See also: add, mul, div
DataFrame.Expr.mul
Expr -> Expr -> Expr
Multiply two expressions element-wise.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Calculate total
col "price" |> Expr.mul (col "quantity")
-- Apply percentage
col "salary" |> Expr.mul (lit 1.05) -- 5% raise
-- Named result
col "hours" |> Expr.mul (col "rate") |> Expr.named "pay"
Notes: Follows pipe convention: lhs |> mul rhs = lhs * rhs.
See also: add, sub, div, pow
DataFrame.Expr.div
Expr -> Expr -> Expr
Divide two expressions element-wise (lhs / rhs). Returns Float for integer division.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Calculate ratio
col "completed" |> Expr.div (col "total")
-- Per-unit value
col "total_cost" |> Expr.div (col "quantity")
-- Normalize (0-1 range)
col "value" |> Expr.div (col "max_value")
Notes: Division by zero returns null (not an error). Integer division produces a Float result.
See also: mul, mod
DataFrame.Expr.mod
Expr -> Expr -> Expr
Modulo (remainder) of two expressions (lhs % rhs).
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Check if even
col "n" |> Expr.mod (lit 2) |> Expr.eq (lit 0)
-- Get last digit
col "id" |> Expr.mod (lit 10)
Notes: Result has the same sign as the dividend (lhs).
See also: div
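The sign rule in the note can be illustrated in plain Python. This is only a sketch of the documented behaviour, not the library's implementation, and `truncated_mod` is a hypothetical helper name. Python's own `%` follows the sign of the divisor, so the sketch uses `math.fmod`, which follows the dividend.

```python
import math


def truncated_mod(lhs: int, rhs: int) -> int:
    """Remainder whose sign follows the dividend (lhs), as the note describes."""
    return int(math.fmod(lhs, rhs))


print(truncated_mod(7, 3))    # 1
print(truncated_mod(-7, 3))   # -1 (Python's -7 % 3 would give 2)
print(truncated_mod(7, -3))   # 1
```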
DataFrame.Expr.pow
Expr -> Expr -> Expr
Raise base to exponent power (base ^ exp).
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Square a column
col "x" |> Expr.pow (lit 2)
-- Approximate cube root (exponent close to 1/3)
col "volume" |> Expr.pow (lit 0.333333)
-- Compound interest
col "principal" |> Expr.mul (lit 1.05 |> Expr.pow (col "years"))
Notes: Follows pipe convention: base |> pow exp = base ^ exp.
See also: sqrt, mul
Comparison
DataFrame.Expr.eq
Expr -> Expr -> Expr
Test equality (==). Returns a boolean expression.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Filter by status
col "status" |> Expr.eq (lit "active")
-- Compare columns
col "actual" |> Expr.eq (col "expected")
-- Use with filterExpr
df |> DataFrame.filterExpr (col "country" |> Expr.eq (lit "USA"))
Notes: Null values: null == null returns null, not True. Use isNull for null checks.
See also: neq, gt, lt, isNull
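The null behaviour in the note can be sketched in plain Python, with `None` standing in for null. This is an illustration under the assumption that a null on either side propagates to the result; `null_eq` is a hypothetical helper, not part of the library.

```python
from typing import Optional


def null_eq(lhs: Optional[object], rhs: Optional[object]) -> Optional[bool]:
    """Equality where None stands in for null and propagates to the result."""
    if lhs is None or rhs is None:
        return None
    return lhs == rhs


print(null_eq("active", "active"))  # True
print(null_eq("active", "deleted")) # False
print(null_eq(None, None))          # None, not True
```

Because a filter keeps only rows where the condition is true, rows that compare to null would be dropped, which is why a dedicated null check is the safer tool.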
DataFrame.Expr.neq
Expr -> Expr -> Expr
Test inequality (!=). Returns a boolean expression.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Exclude a status
col "status" |> Expr.neq (lit "deleted")
-- Filter non-matching
df |> DataFrame.filterExpr (col "type" |> Expr.neq (lit "test"))
Notes: Null values: null != value returns null, not True.
See also: eq, isNotNull
DataFrame.Expr.gt
Expr -> Expr -> Expr
Greater than comparison (>). Returns a boolean expression.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Age filter
col "age" |> Expr.gt (lit 18)
-- Compare columns
col "revenue" |> Expr.gt (col "cost")
-- Chain with boolean ops
col "score" |> Expr.gt (lit 90) |> Expr.and (col "passed" |> Expr.eq (lit True))
Notes: Follows pipe convention: lhs |> gt rhs = lhs > rhs.
See also: gte, lt, lte
DataFrame.Expr.gte
Expr -> Expr -> Expr
Greater than or equal comparison (>=). Returns a boolean expression.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Minimum threshold
col "quantity" |> Expr.gte (lit 10)
-- Date comparison
col "year" |> Expr.gte (lit 2020)
Notes: Follows pipe convention: lhs |> gte rhs = lhs >= rhs.
See also: gt, lte
DataFrame.Expr.lt
Expr -> Expr -> Expr
Less than comparison (<). Returns a boolean expression.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Below threshold
col "temperature" |> Expr.lt (lit 0)
-- Range check (combine with gt)
let inRange = col "x" |> Expr.gt (lit 0) |> Expr.and (col "x" |> Expr.lt (lit 100))
Notes: Follows pipe convention: lhs |> lt rhs = lhs < rhs.
See also: lte, gt, gte
DataFrame.Expr.lte
Expr -> Expr -> Expr
Less than or equal comparison (<=). Returns a boolean expression.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Maximum threshold
col "price" |> Expr.lte (lit 100)
-- Cohort bucketing with cond
Expr.cond
[ (col "age" |> Expr.lte (lit 17), lit "minor")
, (col "age" |> Expr.lte (lit 64), lit "adult")
]
(lit "senior")
Notes: Follows pipe convention: lhs |> lte rhs = lhs <= rhs.
See also: lt, gte
Boolean
DataFrame.Expr.and
Expr -> Expr -> Expr
Logical AND of two boolean expressions. Both must be true for result to be true.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Combine conditions
let isActiveAdult =
col "age" |> Expr.gte (lit 18)
|> Expr.and (col "status" |> Expr.eq (lit "active"))
-- Multiple conditions
col "a" |> Expr.gt (lit 0)
|> Expr.and (col "b" |> Expr.gt (lit 0))
|> Expr.and (col "c" |> Expr.gt (lit 0))
Notes: Short-circuit evaluation is not guaranteed. Null AND True = Null, Null AND False = False.
See also: or, not
DataFrame.Expr.or
Expr -> Expr -> Expr
Logical OR of two boolean expressions. Either being true makes result true.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Either condition
let isSpecial =
col "status" |> Expr.eq (lit "vip")
|> Expr.or (col "status" |> Expr.eq (lit "admin"))
-- Fallback check
col "primary_email" |> Expr.isNotNull
|> Expr.or (col "backup_email" |> Expr.isNotNull)
Notes: Null OR True = True, Null OR False = Null.
See also: and, not
DataFrame.Expr.not
Expr -> Expr
Logical NOT (negation) of a boolean expression.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Negate a condition
col "is_deleted" |> Expr.not
-- Filter for NOT matching
df |> DataFrame.filterExpr (col "status" |> Expr.eq (lit "spam") |> Expr.not)
Notes: NOT Null = Null.
See also: and, or
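The null rules listed in the notes for and, or, and not match three-valued (Kleene) logic. The plain-Python sketch below reproduces those truth tables with `None` standing in for null. The helper names are hypothetical and this is not how the library evaluates expressions.

```python
from typing import Optional

Tri = Optional[bool]  # None stands in for null


def kleene_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False  # a definite False decides the result
    if a is None or b is None:
        return None   # otherwise any null makes the result unknown
    return True


def kleene_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True   # a definite True decides the result
    if a is None or b is None:
        return None
    return False


def kleene_not(a: Tri) -> Tri:
    return None if a is None else not a


print(kleene_and(None, True), kleene_and(None, False))  # None False
print(kleene_or(None, True), kleene_or(None, False))    # True None
print(kleene_not(None))                                 # None
```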
Aggregation
DataFrame.Expr.sum
Expr -> Expr
Sum of all values in a column. Use with groupBy/agg for group-wise sums.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Total sum
col "amount" |> Expr.sum |> Expr.named "total_amount"
-- Group-wise sum
df
|> DataFrame.groupBy ["category"]
|> DataFrame.aggExprs [col "sales" |> Expr.sum |> Expr.named "total_sales"]
-- Window sum
col "value" |> Expr.sum |> Expr.over ["group_id"] |> Expr.named "group_total"
Notes: Null values are ignored (not treated as 0). Returns null for empty groups.
See also: mean, count, over
DataFrame.Expr.mean
Expr -> Expr
Arithmetic mean (average) of values. Returns Float.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Overall average
col "score" |> Expr.mean |> Expr.named "avg_score"
-- Group average
df
|> DataFrame.groupBy ["department"]
|> DataFrame.aggExprs [col "salary" |> Expr.mean |> Expr.named "avg_salary"]
Notes: Null values are excluded from both numerator and count. Empty groups return null.
See also: sum, median, std
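To make the null handling concrete, here is a plain-Python sketch of a mean that drops nulls from both the numerator and the count. `mean_ignoring_nulls` is a hypothetical helper, and returning `None` for a group that contains only nulls is an assumption that goes beyond what the note states.

```python
from typing import Optional, Sequence


def mean_ignoring_nulls(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the non-null values; None stands in for null."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # empty group (assumed: also a group of only nulls)
    return sum(present) / len(present)


print(mean_ignoring_nulls([10.0, None, 20.0]))  # 15.0, not 10.0
print(mean_ignoring_nulls([]))                  # None
```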
DataFrame.Expr.min
Expr -> Expr
Minimum value in a column. Works with numeric, string, and date types.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Find minimum
col "price" |> Expr.min |> Expr.named "lowest_price"
-- Group minimum
df
|> DataFrame.groupBy ["product"]
|> DataFrame.aggExprs [col "date" |> Expr.min |> Expr.named "first_sale"]
Notes: Null values are ignored. Returns null for empty groups.
See also: max, first
DataFrame.Expr.max
Expr -> Expr
Maximum value in a column. Works with numeric, string, and date types.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Find maximum
col "temperature" |> Expr.max |> Expr.named "peak_temp"
-- Group maximum
df
|> DataFrame.groupBy ["user_id"]
|> DataFrame.aggExprs [col "login_time" |> Expr.max |> Expr.named "last_login"]
Notes: Null values are ignored. Returns null for empty groups.
See also: min, last
DataFrame.Expr.count
Expr -> Expr
Count of non-null values in a column.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Count non-null values
col "email" |> Expr.count |> Expr.named "emails_provided"
-- Group counts
df
|> DataFrame.groupBy ["status"]
|> DataFrame.aggExprs [col "id" |> Expr.count |> Expr.named "n"]
Notes: Counts non-null values only. For total rows including nulls, count a non-nullable column like id.
See also: sum, first, last
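The difference between a non-null count and a row count can be seen in a small plain-Python sketch, with `None` standing in for null. The sample data is made up for illustration.

```python
emails = ["a@example.com", None, "b@example.com", None]

# What count is documented to return: non-null values only.
non_null_count = sum(1 for e in emails if e is not None)

# Total rows, nulls included.
total_rows = len(emails)

print(non_null_count)  # 2
print(total_rows)      # 4
```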
DataFrame.Expr.first
Expr -> Expr
First value in a group. Order depends on the DataFrame's current row order.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Get first value after sorting
df
|> DataFrame.sort "date"
|> DataFrame.groupBy ["customer"]
|> DataFrame.aggExprs [col "order_id" |> Expr.first |> Expr.named "first_order"]
Notes: Returns first non-null value. Sort the DataFrame first if you need a specific ordering.
See also: last, min
DataFrame.Expr.last
Expr -> Expr
Last value in a group. Order depends on the DataFrame's current row order.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Get last value after sorting
df
|> DataFrame.sort "timestamp"
|> DataFrame.groupBy ["user"]
|> DataFrame.aggExprs [col "action" |> Expr.last |> Expr.named "last_action"]
Notes: Returns last non-null value. Sort the DataFrame first if you need a specific ordering.
See also: first, max
DataFrame.Expr.std
Expr -> Expr
Sample standard deviation (with Bessel's correction, ddof=1).
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Calculate spread
col "score" |> Expr.std |> Expr.named "score_stddev"
-- Group variation
df
|> DataFrame.groupBy ["treatment"]
|> DataFrame.aggExprs [col "response" |> Expr.std |> Expr.named "response_std"]
Notes: Uses ddof=1 (sample standard deviation). Requires at least 2 values.
See also: var, mean
DataFrame.Expr.var
Expr -> Expr
Sample variance (with Bessel's correction, ddof=1).
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Calculate variance
col "measurement" |> Expr.var |> Expr.named "measurement_var"
Notes: Uses ddof=1 (sample variance). Variance = std^2.
See also: std, mean
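For readers unsure what ddof=1 changes, the sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` also divide by n - 1, and contrasts them with the population version. The sample values are made up, and nothing here calls the library itself.

```python
import math
import statistics

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_var = statistics.variance(scores)    # divides by n - 1 (ddof=1)
sample_std = statistics.stdev(scores)       # square root of the sample variance
population_std = statistics.pstdev(scores)  # divides by n (ddof=0), for contrast

print(round(sample_var, 4))      # 4.5714
print(round(sample_std, 4))      # 2.1381
print(round(population_std, 4))  # 2.0

# Variance = std^2, as the note states.
print(math.isclose(sample_var, sample_std ** 2))  # True
```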
DataFrame.Expr.median
Expr -> Expr
Median (50th percentile) of values. More robust to outliers than mean.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Robust central tendency
col "income" |> Expr.median |> Expr.named "median_income"
-- Compare mean vs median
df
|> DataFrame.groupBy ["region"]
|> DataFrame.aggExprs
[ col "price" |> Expr.mean |> Expr.named "mean_price"
, col "price" |> Expr.median |> Expr.named "median_price"
]
Notes: For even-length groups, returns average of the two middle values.
See also: mean, min, max
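The even-length rule and the robustness claim can both be checked with Python's standard `statistics.median`, which follows the same averaging convention. This is an illustration with made-up numbers, not a call into the library.

```python
import statistics

odd_group = [1, 3, 5]
even_group = [1, 3, 5, 7]

print(statistics.median(odd_group))   # 3
print(statistics.median(even_group))  # 4.0, the average of 3 and 5

# One outlier moves the mean much more than the median.
incomes = [30, 32, 35, 1000]
print(statistics.mean(incomes))    # 274.25
print(statistics.median(incomes))  # 33.5
```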
String
DataFrame.Expr.strLength
Expr -> Expr
Length of string in characters (not bytes). Returns Int.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- String length
col "name" |> Expr.strLength |> Expr.named "name_len"
-- Filter by length
df |> DataFrame.filterExpr (col "code" |> Expr.strLength |> Expr.eq (lit 5))
Notes: Counts Unicode characters, not bytes. Null strings return null.
See also: strUpper, strLower, strTrim
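The character-versus-byte distinction matters as soon as a string contains non-ASCII text. The plain-Python sketch below assumes "characters" means Unicode code points and that the byte count would be measured in UTF-8; both are assumptions for illustration.

```python
name = "Zo\u00eb"  # "Zoë", with ë as a single code point

char_count = len(name)                  # code points
byte_count = len(name.encode("utf-8"))  # UTF-8 bytes

print(char_count)  # 3
print(byte_count)  # 4, because ë takes two bytes in UTF-8
```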
DataFrame.Expr.strUpper
Expr -> Expr
Convert string to uppercase.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Normalize to uppercase
col "country_code" |> Expr.strUpper |> Expr.named "country_code_upper"
-- Case-insensitive comparison
col "status" |> Expr.strUpper |> Expr.eq (lit "ACTIVE")
Notes: Uses Unicode case mapping rules.
See also: strLower
DataFrame.Expr.strLower
Expr -> Expr
Convert string to lowercase.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Normalize to lowercase
col "email" |> Expr.strLower |> Expr.named "email_normalized"
Notes: Uses Unicode case mapping rules.
See also: strUpper
DataFrame.Expr.strTrim
Expr -> Expr
Remove leading and trailing whitespace from string.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Clean up user input
col "user_input" |> Expr.strTrim |> Expr.named "cleaned_input"
Notes: Removes spaces, tabs, newlines, and other Unicode whitespace.
See also: strReplace
DataFrame.Expr.strContains
String -> Expr -> Expr
Check if string contains the given pattern. Returns boolean.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Check for substring
col "email" |> Expr.strContains "@gmail.com"
-- Filter emails
df |> DataFrame.filterExpr (col "email" |> Expr.strContains "@company.com")
Notes: Literal string matching (not regex). Case-sensitive.
See also: strStartsWith, strEndsWith
DataFrame.Expr.strStartsWith
String -> Expr -> Expr
Check if string starts with the given prefix. Returns boolean.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Check prefix
col "phone" |> Expr.strStartsWith "+1"
-- Filter by title
df |> DataFrame.filterExpr (col "name" |> Expr.strStartsWith "Dr.")
Notes: Case-sensitive comparison.
See also: strEndsWith, strContains
DataFrame.Expr.strEndsWith
String -> Expr -> Expr
Check if string ends with the given suffix. Returns boolean.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Check file extension
col "filename" |> Expr.strEndsWith ".csv"
-- Filter by domain
df |> DataFrame.filterExpr (col "url" |> Expr.strEndsWith ".org")
Notes: Case-sensitive comparison.
See also: strStartsWith, strContains
DataFrame.Expr.strReplace
String -> String -> Expr -> Expr
Replace first occurrence of a pattern with replacement string.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Replace substring
col "text" |> Expr.strReplace "old" "new"
-- Remove prefix
col "id" |> Expr.strReplace "ID_" ""
Notes: Only replaces the first occurrence. Pattern is literal (not regex).
See also: strTrim
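Both points in the note, first occurrence only and literal matching, can be mirrored with Python's `str.replace` and a count of 1. This is a sketch of the documented semantics using made-up strings, not the library's code.

```python
text = "old habits, old friends"

# Only the first occurrence is replaced.
print(text.replace("old", "new", 1))  # new habits, old friends

# The pattern is literal: "." matches a dot, not "any character".
print("a.b.c".replace(".", "-", 1))   # a-b.c

# Removing a prefix by replacing it with the empty string.
print("ID_42_ID_7".replace("ID_", "", 1))  # 42_ID_7
```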
Math
DataFrame.Expr.abs
Expr -> Expr
Absolute value. Works with Int and Float.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Absolute difference
col "actual" |> Expr.sub (col "predicted") |> Expr.abs |> Expr.named "abs_error"
Notes: Returns same type as input.
See also: sqrt, round
DataFrame.Expr.sqrt
Expr -> Expr
Square root. Returns Float.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Calculate RMSE component
col "squared_error" |> Expr.sqrt
-- Distance calculation
col "x" |> Expr.pow (lit 2)
|> Expr.add (col "y" |> Expr.pow (lit 2))
|> Expr.sqrt
|> Expr.named "distance"
Notes: Negative values return NaN, not an error.
See also: pow, abs
DataFrame.Expr.floor
Expr -> Expr
Round down to nearest integer (toward negative infinity).
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Truncate to integer
col "price" |> Expr.floor |> Expr.named "price_floor"
-- floor(2.7) = 2, floor(-2.3) = -3
Notes: Returns Float type, not Int. Use for rounding, not type conversion.
See also: ceil, round
DataFrame.Expr.ceil
Expr -> Expr
Round up to nearest integer (toward positive infinity).
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Round up
col "quantity" |> Expr.ceil |> Expr.named "quantity_ceil"
-- ceil(2.1) = 3, ceil(-2.7) = -2
Notes: Returns Float type, not Int.
See also: floor, round
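The direction of rounding for negative inputs is the usual source of confusion between floor and ceil. The plain-Python sketch below uses `math.floor` and `math.ceil` and converts the results to float to mirror the documented Float return type. It illustrates the semantics only.

```python
import math

values = [2.7, -2.3, 2.1, -2.7]

# floor rounds toward negative infinity, ceil toward positive infinity.
floors = [float(math.floor(v)) for v in values]
ceils = [float(math.ceil(v)) for v in values]

print(floors)  # [2.0, -3.0, 2.0, -3.0]
print(ceils)   # [3.0, -2.0, 3.0, -2.0]
```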
DataFrame.Expr.round
Int -> Expr -> Expr
Round to specified number of decimal places.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Round to 2 decimal places
col "price" |> Expr.round 2 |> Expr.named "price_rounded"
-- Round to whole number
col "average" |> Expr.round 0
Notes: Uses banker's rounding (round half to even). Negative decimals round to tens, hundreds, etc.
See also: floor, ceil
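Banker's rounding and negative decimal places are easiest to see with exact decimal arithmetic. The sketch below uses Python's `decimal` module with `ROUND_HALF_EVEN`; `round_half_even` is a hypothetical helper that takes its input as a string to avoid binary floating-point noise, and it is not the library's implementation.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: str, decimals: int) -> Decimal:
    """Round to `decimals` places, sending exact halves to the even neighbour."""
    quantum = Decimal(1).scaleb(-decimals)  # 10 ** -decimals
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(round_half_even("2.5", 0))         # 2 (half goes to the even neighbour)
print(round_half_even("3.5", 0))         # 4
print(round_half_even("2.675", 2))       # 2.68
print(int(round_half_even("1250", -2)))  # 1200 (negative decimals: hundreds)
print(int(round_half_even("1350", -2)))  # 1400
```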
Null
DataFrame.Expr.fillNull
Expr -> Expr -> Expr
Replace null values with a default value.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Fill with constant
col "score" |> Expr.fillNull (lit 0)
-- Fill with another column
col "nickname" |> Expr.fillNull (col "name")
-- Chain to handle multiple fallbacks
col "preferred_email"
|> Expr.fillNull (col "work_email")
|> Expr.fillNull (col "personal_email")
Notes: The default expression is only evaluated for null values.
See also: isNull, isNotNull
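Chaining fillNull behaves like SQL's COALESCE: each step fills only the rows that are still null. The plain-Python sketch below shows this per row with `None` standing in for null. `fill_null` and the sample rows are hypothetical.

```python
from typing import Optional


def fill_null(value: Optional[str], default: Optional[str]) -> Optional[str]:
    """Keep value unless it is null, in which case fall back to default."""
    return default if value is None else value


rows = [
    {"preferred": "p@x.org", "work": "w@x.org", "personal": "h@x.org"},
    {"preferred": None, "work": "w@y.org", "personal": "h@y.org"},
    {"preferred": None, "work": None, "personal": "h@z.org"},
    {"preferred": None, "work": None, "personal": None},
]

contacts = [
    fill_null(fill_null(r["preferred"], r["work"]), r["personal"]) for r in rows
]
print(contacts)  # ['p@x.org', 'w@y.org', 'h@z.org', None]
```

The last row stays null because every fallback is null as well.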
DataFrame.Expr.isNull
Expr -> Expr
Check if value is null. Returns boolean.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Find missing values
col "email" |> Expr.isNull
-- Filter for nulls
df |> DataFrame.filterExpr (col "deleted_at" |> Expr.isNull)
-- Count nulls
col "score" |> Expr.isNull |> Expr.sum |> Expr.named "missing_count"
Notes: Null represents missing data. Use isNull rather than comparing against a null literal with eq, because eq returns null when a null is involved.
See also: isNotNull, fillNull
DataFrame.Expr.isNotNull
Expr -> Expr
Check if value is not null. Returns boolean.
import DataFrame.Expr exposing (col)
import DataFrame.Expr as Expr
-- Find present values
col "email" |> Expr.isNotNull
-- Filter for non-nulls
df |> DataFrame.filterExpr (col "verified_at" |> Expr.isNotNull)
Notes: Equivalent to Expr.isNull |> Expr.not, but more readable.
See also: isNull, fillNull
Conditional
DataFrame.Expr.cond
[(Expr, Expr)] -> Expr -> Expr
Multi-branch conditional expression (like SQL CASE WHEN). Takes a list of (condition, result) pairs and a default value.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Age categories
let ageGroup = Expr.cond
[ (col "age" |> Expr.lt (lit 18), lit "minor")
, (col "age" |> Expr.lt (lit 65), lit "adult")
]
(lit "senior")
|> Expr.named "age_group"
-- Numeric bucketing
let cohort = Expr.cond
[ (col "year" |> Expr.lte (lit 1949), lit 1)
, (col "year" |> Expr.lte (lit 1959), lit 2)
, (col "year" |> Expr.lte (lit 1969), lit 3)
]
(lit 4)
|> Expr.named "cohort"
-- Use with withColumns
df |> DataFrame.withColumns [ageGroup, cohort]
Notes: Conditions are evaluated in order; first match wins. The default is required and used when no conditions match.
See also: and, or, eq
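The "first match wins" rule is why the age example can test `< 65` without also testing `>= 18`. A per-row plain-Python sketch of that evaluation order follows. The `cond` function here is a hypothetical stand-in that works on already-evaluated booleans, not the library's columnar implementation.

```python
from typing import List, Tuple


def cond(branches: List[Tuple[bool, str]], default: str) -> str:
    """Return the result of the first branch whose condition is true."""
    for condition, result in branches:
        if condition:
            return result
    return default  # used when no condition matches


def age_group(age: int) -> str:
    return cond([(age < 18, "minor"), (age < 65, "adult")], "senior")


print(age_group(10))  # minor (also < 65, but the first match wins)
print(age_group(40))  # adult
print(age_group(70))  # senior
```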
Window
DataFrame.Expr.over
[String] -> Expr -> Expr
Apply an expression as a window function over partition columns. Enables row-level access to aggregated values.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Per-customer total, broadcast to every row
col "amount" |> Expr.sum |> Expr.over ["customer_id"] |> Expr.named "customer_total"
-- Percentage of group
col "sales"
|> Expr.div (col "sales" |> Expr.sum |> Expr.over ["region"])
|> Expr.mul (lit 100)
|> Expr.named "pct_of_region"
-- Multiple partitions
col "value" |> Expr.mean |> Expr.over ["year", "category"] |> Expr.named "avg_by_year_cat"
-- Global window (no partitions)
col "score" |> Expr.mean |> Expr.over [] |> Expr.named "global_avg"
Notes: Empty partition list [] means global window (entire DataFrame). Results are broadcast back to each row.
See also: sum, mean, rowNumber, lag, lead
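Unlike a groupBy aggregation, a window keeps every input row and attaches the group-level value to each one. The plain-Python sketch below imitates the "percentage of group" example on a list of dicts. The data and field names are made up, and this is not how the library computes windows.

```python
from collections import defaultdict

rows = [
    {"region": "east", "sales": 30.0},
    {"region": "east", "sales": 10.0},
    {"region": "west", "sales": 60.0},
]

# Step 1: aggregate per partition.
totals = defaultdict(float)
for r in rows:
    totals[r["region"]] += r["sales"]

# Step 2: broadcast the aggregate back to every row of its partition.
with_window = [
    {
        **r,
        "region_total": totals[r["region"]],
        "pct_of_region": r["sales"] / totals[r["region"]] * 100,
    }
    for r in rows
]

for r in with_window:
    print(r["region"], r["region_total"], r["pct_of_region"])
# east 40.0 75.0
# east 40.0 25.0
# west 60.0 100.0
```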
DataFrame.Expr.rowNumber
Expr
Assign sequential row numbers within partitions (1-based). Use with over to partition.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Row number within groups
Expr.rowNumber |> Expr.over ["customer_id"] |> Expr.named "order_seq"
-- Global row number
Expr.rowNumber |> Expr.over [] |> Expr.named "row_num"
-- Get first row per group (filter where rowNumber == 1)
df
|> DataFrame.withColumns [Expr.rowNumber |> Expr.over ["group"] |> Expr.named "rn"]
|> DataFrame.filterExpr (col "rn" |> Expr.eq (lit 1))
Notes: Starts at 1. Order depends on current DataFrame sort order.
See also: rank, denseRank, over
DataFrame.Expr.rank
Expr
Rank values with gaps for ties. Ties get the same rank; next rank skips accordingly.
import DataFrame.Expr as Expr
-- Rank with gaps: [1, 2, 2, 4] for values [10, 20, 20, 30]
Expr.rank |> Expr.over ["department"] |> Expr.named "sales_rank"
Notes: Use with over for partitioned ranking. Sort the DataFrame first to control rank ordering.
See also: denseRank, rowNumber
DataFrame.Expr.denseRank
Expr
Rank values without gaps for ties. Consecutive ranks even when ties exist.
import DataFrame.Expr as Expr
-- Dense rank: [1, 2, 2, 3] for values [10, 20, 20, 30]
Expr.denseRank |> Expr.over ["category"] |> Expr.named "price_rank"
Notes: Unlike rank, denseRank doesn't skip numbers after ties.
See also: rank, rowNumber
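The three numbering schemes differ only in how they treat ties. The plain-Python sketch below reproduces the `[10, 20, 20, 30]` examples from the comments above for a single partition. The helper names are hypothetical, and the input is assumed to be already sorted ascending.

```python
from typing import List


def row_number(values: List[int]) -> List[int]:
    """Sequential numbering; ties still get distinct numbers."""
    return list(range(1, len(values) + 1))


def rank_with_gaps(values: List[int]) -> List[int]:
    """Ties share a rank; the next rank skips ahead by the number of ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def dense_rank(values: List[int]) -> List[int]:
    """Ties share a rank; the next rank is always the next integer."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


values = [10, 20, 20, 30]
print(row_number(values))      # [1, 2, 3, 4]
print(rank_with_gaps(values))  # [1, 2, 2, 4]
print(dense_rank(values))      # [1, 2, 2, 3]
```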
DataFrame.Expr.lag
Int -> Expr -> Expr
Get value from n rows before the current row. Useful for comparing to previous values.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Previous row's value
col "price" |> Expr.lag 1 |> Expr.over ["stock"] |> Expr.named "prev_price"
-- Calculate change from previous
col "value" |> Expr.sub (col "value" |> Expr.lag 1 |> Expr.over [])
|> Expr.named "change"
-- Look back 7 periods
col "sales" |> Expr.lag 7 |> Expr.over [] |> Expr.named "sales_last_week"
Notes: Returns null for rows without enough history (first n rows). Sort first for meaningful order.
See also: lead, over
DataFrame.Expr.lead
Int -> Expr -> Expr
Get value from n rows after the current row. Useful for comparing to future values.
import DataFrame.Expr exposing (col, lit)
import DataFrame.Expr as Expr
-- Next row's value
col "price" |> Expr.lead 1 |> Expr.over ["stock"] |> Expr.named "next_price"
-- Days until next event
col "event_date" |> Expr.lead 1 |> Expr.over ["user"]
|> Expr.sub (col "event_date")
|> Expr.named "days_to_next"
Notes: Returns null for rows without enough future values (last n rows). Sort first for meaningful order.
See also: lag, over
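lag and lead are mirror images: one shifts values down, the other up, and both leave nulls where no neighbour exists. The plain-Python sketch below shows this for a single, already-sorted partition with `None` standing in for null. The helper functions and prices are hypothetical.

```python
from typing import List, Optional


def lag(values: List[int], n: int) -> List[Optional[int]]:
    """Value from n rows earlier; None for the first n rows."""
    return [values[i - n] if i - n >= 0 else None for i in range(len(values))]


def lead(values: List[int], n: int) -> List[Optional[int]]:
    """Value from n rows later; None for the last n rows."""
    return [values[i + n] if i + n < len(values) else None for i in range(len(values))]


prices = [100, 102, 101, 105]
print(lag(prices, 1))   # [None, 100, 102, 101]
print(lead(prices, 1))  # [102, 101, 105, None]

# Change from the previous row, null where there is no previous row.
change = [None if p is None else cur - p for cur, p in zip(prices, lag(prices, 1))]
print(change)  # [None, 2, -1, 4]
```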