AQLearn 2.6 / Pitfall

select() and group() (2)

Source lesson: AQLearn 2.6: Function: select & group (2). Tangible win: avoid the most common AQL beginner mistake and replace it with a proper nested-aggregation pattern.

Purpose of this lesson

Have you ever tried to compute "average age per continent, but only for countries" and gotten the error "column may refer to multiple values"?

You hit these pain points:

  1. You wrote users | group(countries.name) | avg(users.age) because it reads naturally. AQL refused.
  2. The error message says "not grouped and may refer to multiple values", but you cannot map it back to the AQL you wrote.
  3. You do not know if you want "average across the whole continent" or "highest average age among countries". Without naming that intent, you cannot pick the right syntax.

This lesson solves it: name the rule (group() and select() always travel together), then choose between flattening the group or doing a real nested aggregation. Both options are short, both are unambiguous.

Why this matters for your mission: the next two lessons are entirely about nested aggregations. This lesson is the door you walk through to get there.

The trap, in one line

// Invalid. AQL will reject this.
users | group(countries.name) | avg(users.age)

Reads like "group by country, then average age". The problem: avg() is a Scalar-output aggregate (see Lesson 0012). It must produce exactly one value per row. But group(countries.name) packs many user ages into each country row. The aggregate cannot decide which set of ages to average.

Error: The column users.age is not grouped and may refer to multiple values. AQL is asking you "which slice of users.age do you want me to average?"

Visualize the conflict

Table (users) many users, each row has its own age group(countries.name) one row per country, each row holds many users.age values avg(users.age) expects exactly one users.age per row, finds many. Error. Rejected "may refer to multiple values" Each grouped row holds many users.age values: Vietnam: [34, 29, 41, 27, 52] Singapore: [38, 45] USA: [22, 36, 28, 31, 40]
After grouping, each row holds a set of ages. A scalar aggregate cannot reduce a set into one value without an explicit instruction.

The rule

Rule: group() and select() go hand-in-hand. After grouping, the next step must be select() with explicit aggregates that say "for each group, give me this single value".

If you do not have a select() after group(), you are not building a derived table; you are confusing AQL.

Two valid alternatives

Pick based on what you actually want.

Option A: drop the group

If you want a single number across the whole filtered set (one average per explore dimension), just remove group():

users | avg(users.age)

Result: average age across all users in the current explore context. AQL respects the explore dimensions (for example, countries.continent_name) so you still get one average per continent.

Option B: nested aggregation

If you want "highest average age among countries in each continent", build the inner table with group + select, then aggregate again:

users
  | group(countries.name)
  | select(
      countries.name,
      avg_age: avg(users.age),
    )
  | max(avg_age)

Read bottom-up: average age per country, then take the max across countries. The explore dimension (continent) shapes the outer max.

How to read your own intent

Intent (plain English)AQL pattern
Average across the whole group set as defined by the exploreusers | avg(users.age)
An aggregate of per-group aggregates (max of averages, average of counts, etc.)users | group(X) | select(metric: avg(users.age)) | max(metric)
A derived table you want to keep usingusers | group(X) | select(...) (Table out)

SQL parallel

SQL emits the same warning, with different words. If you write a SELECT with non-aggregated columns that are not in GROUP BY, the database refuses. AQL is enforcing the same shape rule at metric-expression time.

-- SQL equivalent of Option B
select
  countries.continent_name
  , max(country_avg_age) as max_avg_age
from (
  select
    countries.continent_name
    , countries.name as country_name
    , avg(users.age) as country_avg_age
  from users
  left join cities on users.city_id = cities.id
  left join countries on cities.country_code = countries.code
  group by countries.continent_name, countries.name
) t
group by countries.continent_name

Dataset diagram

This lesson groups by countries.name while keeping countries.continent_name as the outer explore dimension. Both are reached from users via the relationship chain.

N:1 N:1 N:1 N:1 N:1 countries groups + outer codetext, pk nameinner group continent_nameouter dim codetext cities idnumber, pk nametext country_codetext, fk users source idnumber, pk nametext gendertext ageaggregated sign_up_atdatetime city_idnumber, fk orders idnumber, pk statustext created_atdatetime delivery_attemptsnumber discountnumber user_idnumber, fk order_items order_idnumber, fk product_idnumber, fk quantitynumber products idnumber, pk nametext pricenumber category_idnumber, fk merchant_idnumber, fk
Source: users. Inner group: countries.name (one row per country). Outer dimension: countries.continent_name (set by the explore). Inner aggregate: avg(users.age). Outer aggregate: max(avg_age).
primary key foreign key
Customer-safe explanation: "If we want a single average, we ask for one average. If we want an aggregate over groups, we tell AQL the group and the per-group value, then the outer aggregate. Never both vague at once."

Retrieval practice

1. users | group(countries.name) | avg(users.age) is...
2. Which function must follow group() to keep the pipeline meaningful?
3. To get the average age across the whole explore (no nested aggregation), the simplest fix is to...
4. After group() | select(inner_agg), the next step in a nested aggregation is...

Do in AQLearn

  1. Open AQLearn select & group (2).
  2. Run the starter expression and read the "may refer to multiple values" error.
  3. Replace | avg(users.age) with | select(countries.name, avg_age: avg(users.age)). Run.
  4. Append | max(avg_age) to get the highest country average per continent.
  5. Now try the simpler fix: remove group() entirely and run. Compare.

Next local lesson

Next: Nested aggregations (1). Apply the group + select + outer aggregate pattern to real questions.