AQLearn 2.7 / Pattern

Nested Aggregations (1)

Source lesson: AQLearn 2.7: Nested Aggregations (1). Tangible win: write any "aggregate of aggregates" metric using one fixed pipeline shape.

Purpose of this lesson

Have you ever tried to answer "what is the highest number of users among all cities of each country"?

You hit these pain points:

  1. The question has two aggregates stacked on top of each other and you cannot tell AQL "first count, then take the max".
  2. You try max(count(users.id)) or count(users.id) | max() and get errors or wrong numbers.
  3. You explain to a customer "we are taking the highest of the per-city counts", but you cannot point at the part of the query that says it.

This lesson solves it: name the inner and outer parts of the question, then map them onto the fixed AQL shape group + select + outer aggregate. Same pattern works for highest, lowest, average, sum, and median of per-group values.

Why this matters for your mission: "average per X" and "top-of-something-per-something" questions are everywhere in presales. With this pattern, you stop guessing and write the right shape on first try.

Parse the question first

Pick a sentence with two aggregate words. Color the inner part one color and the outer part another:

"Highest number of users among all cities of each country."

LevelAggregate (verb)Dimension (what to group by)
Inner (blue)count of userscity
Outer (orange)max of the inner countscountry (set by the explore)

The fixed pipeline shape

metric my_metric =
  source_table
  | group(inner_dimension)
  | select(inner_aggregation)
  | outer_aggregate();
  1. source_table: the model that holds the rows you are counting (often the row source of the inner aggregate).
  2. group(inner_dimension): one row per inner group (city, month, customer, etc.).
  3. select(inner_aggregation): per-group value (count, sum, avg).
  4. outer_aggregate(): a scalar function that collapses those per-group values into one (max, avg, min, etc.). The explore dimensions partition this outer aggregate.

Visualize the two levels

Inner level (per inner_dimension) Source table users many user rows | group(inner dim) group(cities.id) one row per city | select(inner agg) select(count(users.id)) per-city user count Table [89, 103, 84, ...] one value per inner group Outer level (collapses inner table to a scalar) outer aggregate | max() highest among inner values Scalar one number per row of the explore feeds outer step
Inner level produces an intermediate Table (one value per inner group). The outer aggregate collapses that Table to one Scalar per row of the explore.

Worked example, step by step

Step 1: build the inner aggregation

metric my_metric =
  users
  | group(cities.id)
  | select(users | count(users.id))
;

explore {
  measures { my_metric }
}

Each result row currently holds a list of inner counts, one per city. With no outer dimension, all cities collapse into a single list.

Example output: [89, 88, 84, 97, 87, 95, 103, 101, 83, 91, 74] (the per-city counts).

Step 2: split the inner list by an outer dimension

Add an explore dimension so each row of the result holds only the per-city counts within that outer group.

explore {
  measures { my_metric }
  dimensions { countries.name }
}

Australia has two cities, so its row now holds two counts: [103, 89]. Vietnam holds [94, 83]. And so on.

Step 3: outer aggregate to collapse each inner list

metric my_metric =
  users
  | group(cities.id)
  | select(users | count(users.id))
  | max()
;

Now each row is one number. Australia: 103. Vietnam: 94. The original question is answered.

Step 4: swap the outer aggregate to change the question

Want "average per country" instead of "max per country"? Change max() to avg(). The shape of the metric stays the same.

... | select(users | count(users.id)) | avg();

Step 5: change the outer dimension to change the slice

Want "average per continent"? Keep the metric, change the explore dimension.

explore {
  measures { my_metric }
  dimensions { countries.continent_name }
}

The metric stayed the same. The explore did the rest.

SQL parallel

A subquery emits the inner table; the outer query aggregates again with the outer dimension as the partition.

select
  countries.name
  , max(per_city_count) as max_per_city_count
from (
  select
    countries.name
    , cities.id
    , count(users.id) as per_city_count
  from users
  left join cities on users.city_id = cities.id
  left join countries on cities.country_code = countries.code
  group by countries.name, cities.id
) t
group by countries.name

Reading shortcuts

"X per Y per Z"

Two "pers" means two levels. The innermost is the group() dimension. The outer is the explore dimension.

"Highest of, average of"

Any superlative or summary word over a count or average is an outer aggregate. Plug it in after select().

Dataset diagram

Inner group: cities.id. Outer dimension: countries.name or countries.continent_name. Source: users.

N:1 N:1 N:1 N:1 N:1 countries outer dim codetext, pk nameouter dim A continent_nameouter dim B .. cities inner group idinner group key nametext country_codetext, fk users source idcounted nametext gendertext agenumber sign_up_atdatetime city_idnumber, fk orders idnumber, pk statustext created_atdatetime delivery_attemptsnumber discountnumber user_idnumber, fk order_items order_idnumber, fk product_idnumber, fk quantitynumber products idnumber, pk nametext pricenumber category_idnumber, fk merchant_idnumber, fk
Three roles in one chain: users (source) is grouped by cities.id (inner) and rolled up by countries.name or continent_name (outer).
primary key foreign key
Customer-safe explanation: "We compute one number per inner group, then summarize those numbers per outer slice. AQL writes the subquery and the outer aggregate from one short pipeline."

Retrieval practice

1. The fixed shape for a nested aggregation is...
2. In "average number of users among all cities of each country", which level is the city grouping?
3. To switch from "per country" to "per continent", you...
4. Replacing | max() with | avg() changes which level of the question?

Do in AQLearn

  1. Open AQLearn Nested Aggregations (1).
  2. Run the starter. Confirm the result is one row with a list of per-city counts.
  3. Add countries.name as a dimension. Read Australia's row as [103, 89].
  4. Append | max() to the metric. Now Australia is 103.
  5. Swap max() with avg() and then change the dimension to countries.continent_name.

Next local lesson

Next: Nested aggregations (2). More patterns, including questions with three "pers" stacked.