AQLearn 2.7 / Pattern

Nested Aggregations (1)

Source lesson: AQLearn 2.7: Nested Aggregations (1). Tangible win: write any "aggregate of aggregates" metric using one fixed pipeline shape.

Purpose of this lesson

Have you ever tried to answer "what is the highest number of users among all cities of each country"?

You hit these pain points:

The question has two aggregates stacked on top of each other and you cannot tell AQL "first count, then take the max".
You try max(count(users.id)) or count(users.id) | max() and get errors or wrong numbers.
You explain to a customer "we are taking the highest of the per-city counts", but you cannot point at the part of the query that says it.

This lesson solves it: name the inner and outer parts of the question, then map them onto the fixed AQL shape group + select + outer aggregate. Same pattern works for highest, lowest, average, sum, and median of per-group values.

Why this matters for your mission: "average per X" and "top-of-something-per-something" questions are everywhere in presales. With this pattern, you stop guessing and write the right shape on first try.

Parse the question first

Pick a sentence with two aggregate words. Color the inner part one color and the outer part another:

"Highest number of users among all cities of each country."

Level	Aggregate (verb)	Dimension (what to group by)
Inner (blue)	count of users	city
Outer (orange)	max of the inner counts	country (set by the explore)

The fixed pipeline shape

metric my_metric =
  source_table
  | group(inner_dimension)
  | select(inner_aggregation)
  | outer_aggregate();

source_table: the model that holds the rows you are counting (often the row source of the inner aggregate).
group(inner_dimension): one row per inner group (city, month, customer, etc.).
select(inner_aggregation): per-group value (count, sum, avg).
outer_aggregate(): a scalar function that collapses those per-group values into one (max, avg, min, etc.). The explore dimensions partition this outer aggregate.

Visualize the two levels

Inner level produces an intermediate Table (one value per inner group). The outer aggregate collapses that Table to one Scalar per row of the explore.

Worked example, step by step

Step 1: build the inner aggregation

metric my_metric =
  users
  | group(cities.id)
  | select(users | count(users.id))
;

explore {
  measures { my_metric }
}

Each result row currently holds a list of inner counts, one per city. With no outer dimension, all cities collapse into a single list.

Example output: [89, 88, 84, 97, 87, 95, 103, 101, 83, 91, 74] (the per-city counts).

Step 2: split the inner list by an outer dimension

Add an explore dimension so each row of the result holds only the per-city counts within that outer group.

explore {
  measures { my_metric }
  dimensions { countries.name }
}

Australia has two cities, so its row now holds two counts: [103, 89]. Vietnam holds [94, 83]. And so on.

Step 3: outer aggregate to collapse each inner list

metric my_metric =
  users
  | group(cities.id)
  | select(users | count(users.id))
  | max()
;

Now each row is one number. Australia: 103. Vietnam: 94. The original question is answered.

Step 4: swap the outer aggregate to change the question

Want "average per country" instead of "max per country"? Change max() to avg(). The shape of the metric stays the same.

... | select(users | count(users.id)) | avg();

Step 5: change the outer dimension to change the slice

Want "average per continent"? Keep the metric, change the explore dimension.

explore {
  measures { my_metric }
  dimensions { countries.continent_name }
}

The metric stayed the same. The explore did the rest.

SQL parallel

A subquery emits the inner table; the outer query aggregates again with the outer dimension as the partition.

select
  countries.name
  , max(per_city_count) as max_per_city_count
from (
  select
    countries.name
    , cities.id
    , count(users.id) as per_city_count
  from users
  left join cities on users.city_id = cities.id
  left join countries on cities.country_code = countries.code
  group by countries.name, cities.id
) t
group by countries.name

Reading shortcuts

"X per Y per Z"

Two "pers" means two levels. The innermost is the group() dimension. The outer is the explore dimension.

"Highest of, average of"

Any superlative or summary word over a count or average is an outer aggregate. Plug it in after select().

Order of operations

The outer aggregate runs at step 6 of the AQL pipeline. The explore dimensions partition that step.

Dataset diagram

Inner group: cities.id. Outer dimension: countries.name or countries.continent_name. Source: users.

Three roles in one chain: users (source) is grouped by cities.id (inner) and rolled up by countries.name or continent_name (outer).

primary key foreign key

Customer-safe explanation: "We compute one number per inner group, then summarize those numbers per outer slice. AQL writes the subquery and the outer aggregate from one short pipeline."

Retrieval practice

Do in AQLearn

Open AQLearn Nested Aggregations (1).
Run the starter. Confirm the result is one row with a list of per-city counts.
Add countries.name as a dimension. Read Australia's row as [103, 89].
Append | max() to the metric. Now Australia is 103.
Swap max() with avg() and then change the dimension to countries.continent_name.

Next local lesson

Next: Nested aggregations (2). More patterns, including questions with three "pers" stacked.