AQLearn 2.5 / Concept

Pipe (2) and Source Table

Source lesson: AQLearn 2.5: Pipe (2). Tangible win: see the pipe as "first-argument sugar", and predict how the source table changes a metric's meaning.

Purpose of this lesson

Have you ever tried to answer "average product price among ordered items in 2022" and gotten a different number than expected?

You hit these pain points:

Two queries look almost identical (products | avg(products.price) vs order_items | avg(products.price)), but the numbers differ. You cannot explain why on a call.
You read a long pipeline and lose track of which model AQL is iterating over.
You want to read deeply nested AQL but the nesting is intimidating; the pipe and the function-call form look like different languages.

This lesson solves it: pipe is just first-argument sugar, and the left-side input is the source table that defines which rows the function iterates over. Same expression, different source, different number, by design.

Why this matters for your mission: "wrong number" debates almost always trace to a source-table mismatch. Knowing where the iteration happens lets you defend or fix the answer in one sentence.

Pipe is first-argument sugar

The pipe character does one thing: it takes the value on its left and uses it as the first argument of the function on its right.

left | f(arg2, arg3)
// is exactly the same as
f(left, arg2, arg3)

The pipe and the comma are two ways to write the same thing. Pipe just makes long pipelines readable.

De-sugaring exercise

Practice rewriting pipe form as function form.

Pipe form

products
  | select(double: products.price * 2)
  | select(quadruple: double * 2)

Function form

select(
  select(products, double: products.price * 2),
  quadruple: double * 2,
)

Read right-to-left: the outermost call is the last step in the pipeline. Each previous step is the first argument of the next.

The deeper idea: source table semantics

The left-of-pipe value is more than a parameter. It is the source table. The function:

iterates over the rows of the source table,
can reach fields in other models through many-to-one relationships from the source,
and aggregates with the source as the row counter.

Change the source table and you change what each iteration counts. Two metrics with the same body can return very different numbers.

Worked comparison: products vs order_items as source

Both pipelines compute avg(products.price) after filtering to 2022. The numbers will differ. Why?

Source = order_items

order_items
  | filter(orders.created_at matches @2022)
  | avg(products.price)

Iterates over each order item. Looks up the product's price for each item. Average gives extra weight to products that were ordered more often.

Reading: "average product price weighted by how often each product appears in order items in 2022".

Source = products

products
  | filter(orders.created_at matches @2022)
  | avg(products.price)

Iterates over each distinct product (filtered to those with 2022 orders). Each product contributes once. Pure unweighted average across the distinct ordered products.

Reading: "average price of distinct products ordered in 2022".

SQL parallel

In SQL the source table is whatever is in FROM. The same observations apply: counts, averages, and sums are weighted by the row count of the source.

-- order_items source
select avg(products.price)
from order_items
left join orders on order_items.order_id = orders.id
left join products on order_items.product_id = products.id
where date_trunc('year', orders.created_at) = '2022-01-01'

-- products source
select avg(products.price)
from products
where exists (
  select 1
  from order_items
  left join orders on order_items.order_id = orders.id
  where order_items.product_id = products.id
    and date_trunc('year', orders.created_at) = '2022-01-01'
)

Cross-model aggregation: `products | sum(order_items.quantity)`

The source is products. The function reaches into order_items through the relationship order_items.product_id > products.id. AQL iterates per product, summing the quantity across that product's items.

explore {
  dimensions {
    products.name,
  }
  measures {
    items_sold: products | sum(order_items.quantity),
  }
}

One row per product, total quantity ordered for that product. Switching the source to order_items would change "iterate per item" semantics; the dimension grain dictates the rest.

Dataset diagram

Source-table choice determines which model is the "iterator" and which fields are reachable.

Source A (order_items) iterates per item, reaching into products and orders via N:1. Source B (products) iterates per distinct product. Same aggregate body, different number.

primary key foreign key

Customer-safe explanation: "AQL averages over the rows of the source table. Pick the wrong source and you weight your numbers wrong, even if the formula looks right."

Retrieval practice

1. The pipe x | f(y) is syntactic sugar for... f(x, y) (x as first argument) f(y, x) (x as last argument) two separate function calls

2. An aggregate iterates over the rows of... the source table on the left of the pipe any related model the function names the explore dimensions only

3. order_items | avg(products.price) tends to give a value that is... weighted by how often each product appears in order items the unweighted average across distinct products zero, because order_items has no price column

4. From the source table, a function can reach a related model through... many-to-one relationships from the source toward the related model one-to-many fanout from the source any relationship in any direction

Do in AQLearn

Open AQLearn Pipe (2).
Rewrite the starter pipe expression as a single nested function call. Confirm the result is identical.
Compute average product price across order items in 2022. Note the value.
Compute average product price across products in 2022. Compare.
Add products | sum(order_items.quantity) as a measure and observe one row per product.

Next local lesson

Next: Function: select & group (2). The most common AQL pitfall and how to fix it with nested aggregations.

Pipe (2) and Source Table

Purpose of this lesson

Pipe is first-argument sugar

De-sugaring exercise

Pipe form

Function form

The deeper idea: source table semantics

Worked comparison: products vs order_items as source

Source = order_items

Source = products

SQL parallel

Cross-model aggregation: products | sum(order_items.quantity)

Dataset diagram

Retrieval practice

Do in AQLearn

Next local lesson

Cross-model aggregation: `products | sum(order_items.quantity)`