AQLearn 2.5 / Concept

Pipe (2) and Source Table

Source lesson: AQLearn 2.5: Pipe (2). Tangible win: see the pipe as "first-argument sugar", and predict how the source table changes a metric's meaning.

Purpose of this lesson

Have you ever tried to answer "average product price among ordered items in 2022" and gotten a different number than expected?

You hit these pain points:

  1. Two queries look almost identical (products | avg(products.price) vs order_items | avg(products.price)), but the numbers differ. You cannot explain why on a call.
  2. You read a long pipeline and lose track of which model AQL is iterating over.
  3. You want to read deeply nested AQL but the nesting is intimidating; the pipe and the function-call form look like different languages.

This lesson solves it: pipe is just first-argument sugar, and the left-side input is the source table that defines which rows the function iterates over. Same expression, different source, different number, by design.

Why this matters for your mission: "wrong number" debates almost always trace to a source-table mismatch. Knowing where the iteration happens lets you defend or fix the answer in one sentence.

Pipe is first-argument sugar

The pipe character does one thing: it takes the value on its left and uses it as the first argument of the function on its right.

left | f(arg2, arg3)
// is exactly the same as
f(left, arg2, arg3)
Pipe form products | count(products.id) inserted as first argument Function form (equivalent) count(products, products.id)
The pipe and the comma are two ways to write the same thing. Pipe just makes long pipelines readable.

De-sugaring exercise

Practice rewriting pipe form as function form.

Pipe form

products
  | select(double: products.price * 2)
  | select(quadruple: double * 2)

Function form

select(
  select(products, double: products.price * 2),
  quadruple: double * 2,
)

Read right-to-left: the outermost call is the last step in the pipeline. Each previous step is the first argument of the next.

The deeper idea: source table semantics

The left-of-pipe value is more than a parameter. It is the source table. The function:

Change the source table and you change what each iteration counts. Two metrics with the same body can return very different numbers.

Worked comparison: products vs order_items as source

Both pipelines compute avg(products.price) after filtering to 2022. The numbers will differ. Why?

Source = order_items

order_items
  | filter(orders.created_at matches @2022)
  | avg(products.price)

Iterates over each order item. Looks up the product's price for each item. Average gives extra weight to products that were ordered more often.

Reading: "average product price weighted by how often each product appears in order items in 2022".

Source = products

products
  | filter(orders.created_at matches @2022)
  | avg(products.price)

Iterates over each distinct product (filtered to those with 2022 orders). Each product contributes once. Pure unweighted average across the distinct ordered products.

Reading: "average price of distinct products ordered in 2022".

SQL parallel

In SQL the source table is whatever is in FROM. The same observations apply: counts, averages, and sums are weighted by the row count of the source.

-- order_items source
select avg(products.price)
from order_items
left join orders on order_items.order_id = orders.id
left join products on order_items.product_id = products.id
where date_trunc('year', orders.created_at) = '2022-01-01'

-- products source
select avg(products.price)
from products
where exists (
  select 1
  from order_items
  left join orders on order_items.order_id = orders.id
  where order_items.product_id = products.id
    and date_trunc('year', orders.created_at) = '2022-01-01'
)

Cross-model aggregation: products | sum(order_items.quantity)

The source is products. The function reaches into order_items through the relationship order_items.product_id > products.id. AQL iterates per product, summing the quantity across that product's items.

explore {
  dimensions {
    products.name,
  }
  measures {
    items_sold: products | sum(order_items.quantity),
  }
}

One row per product, total quantity ordered for that product. Switching the source to order_items would change "iterate per item" semantics; the dimension grain dictates the rest.

Dataset diagram

Source-table choice determines which model is the "iterator" and which fields are reachable.

N:1 N:1 N:1 N:1 N:1 countries codetext, pk nametext continent_nametext cities idnumber, pk nametext country_codetext, fk users idnumber, pk nametext gendertext agenumber sign_up_atdatetime city_idnumber, fk orders idnumber, pk statustext created_atdatetime delivery_attemptsnumber discountnumber user_idnumber, fk order_items source A order_idnumber, fk product_idnumber, fk quantitynumber products source B idnumber, pk nametext pricenumber category_idnumber, fk merchant_idnumber, fk
Source A (order_items) iterates per item, reaching into products and orders via N:1. Source B (products) iterates per distinct product. Same aggregate body, different number.
primary key foreign key
Customer-safe explanation: "AQL averages over the rows of the source table. Pick the wrong source and you weight your numbers wrong, even if the formula looks right."

Retrieval practice

1. The pipe x | f(y) is syntactic sugar for...
2. An aggregate iterates over the rows of...
3. order_items | avg(products.price) tends to give a value that is...
4. From the source table, a function can reach a related model through...

Do in AQLearn

  1. Open AQLearn Pipe (2).
  2. Rewrite the starter pipe expression as a single nested function call. Confirm the result is identical.
  3. Compute average product price across order items in 2022. Note the value.
  4. Compute average product price across products in 2022. Compare.
  5. Add products | sum(order_items.quantity) as a measure and observe one row per product.

Next local lesson

Next: Function: select & group (2). The most common AQL pitfall and how to fix it with nested aggregations.