AQLearn 2.4 / Functions

select() and group()

Source lesson: AQLearn 2.4: Function: select & group (1). Tangible win: build derived tables on the fly with group() for grain and select() for columns.

Purpose of this lesson

Have you ever tried to answer "show me one row per city with the city name, its user count, and its order count"?

You hit these pain points:

  1. You need a derived table, not just an explore. The result has a fixed grain (one row per city) and multiple computed columns.
  2. In SQL you write a subquery with GROUP BY plus several aggregates and aliases. That much boilerplate breaks the flow of building a metric.
  3. Once you have the derived table, you want to feed it into another aggregate (for example, "average user count across cities"). Going through explore + report twice is awkward.

This lesson solves it: group() sets the grain inside a pipeline; select() picks the output columns, including inline aggregates. Together they build a Table that the next pipe step can keep transforming.

Why this matters for your mission: nested aggregations and "average per X" metrics all start with group() + select(). This is the foundation for the rest of chapter 2.

One sentence each

group(dim1, dim2, ...)

Sets the grain of the table. Each unique combination of dimensions becomes one output row. Like SQL GROUP BY.

select(col1, col2, ...)

Picks columns to include. Each column can be a plain field, a renamed field with name:, or an inline aggregate. Like SQL SELECT.

Both are Table in, Table out (see Lesson 0012). They keep the pipeline open for more steps.

Shape diagram

Table users many rows, all columns | group() group(cities.name) grain: one row per city | select() select(cities.name, user_count: count(users.id)) pick output columns Table one row per city, 2 columns
Pipe a base table into group() to set the grain, then into select() to pick the columns.

Worked example 1: rename in select()

metric my_metric = users
  | select(
      hi_city: concat('hi ', cities.name),
    );

One column. Each user row gets a label like "hi London". select() here behaves like SQL SELECT with an alias: pick one output column, rename it.

SQL analogue:

select
  'hi ' || cities.name as hi_city
from users
left join cities on users.city_id = cities.id

Worked example 2: select() with inline aggregates

metric my_metric = users
  | select(
      hi_city: concat('hi ', cities.name),
      user_count: users | count(users.id),
      order_count: orders | count(orders.id),
    );

select() can include aggregates in any column. AQL groups by the non-aggregate columns implicitly, similar to how a BI tool would.

SQL analogue:

select
  'hi ' || cities.name as hi_city
  , count(users.id) as user_count
  , count(orders.id) as order_count
from users
left join cities on users.city_id = cities.id
left join orders on orders.user_id = users.id
group by 'hi ' || cities.name

Worked example 3: explicit group() for a derived table

Same answer, with the grain stated explicitly.

metric city_user_counts = users
  | group(cities.name)
  | select(
      city: cities.name,
      user_count: count(users.id),
    );

One row per city, two columns. The shape stays a Table.

SQL analogue:

select
  cities.name as city
  , count(users.id) as user_count
from users
left join cities on users.city_id = cities.id
group by cities.name

Preview: nested aggregation

Because group + select outputs a Table, you can pipe it into another aggregate to compute "an aggregate of aggregates". This is the nested aggregation pattern (covered fully in the next lesson).

// "Average number of users per city"
metric avg_users_per_city = users
  | group(cities.name)
  | select(count(users.id))
  | avg();

Read it bottom-up: count users per city, then average those counts. The inner table has one row per city; the outer avg() collapses them to a single scalar.

Customer-safe explanation: "We can build a small intermediate table on the fly, give it the grain we want, then keep aggregating. No subquery boilerplate."

SQL mapping

AQLSQL
group(cities.name)group by cities.name
select(city: cities.name)select cities.name as city
select(user_count: count(users.id))select count(users.id) as user_count
users | group(...) | select(...)derived table (subquery or CTE)

Dataset diagram

This lesson groups and selects across users, cities, and orders.

N:1 N:1 N:1 N:1 N:1 countries codetext, pk nametext continent_nametext cities group key idnumber, pk nametext country_codetext, fk users source idnumber, pk nametext gendertext agenumber sign_up_atdatetime city_idnumber, fk orders extra agg idnumber, pk statustext created_atdatetime delivery_attemptsnumber discountnumber user_idnumber, fk order_items order_idnumber, fk product_idnumber, fk quantitynumber products idnumber, pk nametext pricenumber category_idnumber, fk merchant_idnumber, fk
Source table: users. Group key: cities.name (joined automatically). Extra inline aggregate from orders follows the relationship chain back.
primary key foreign key

Retrieval practice

1. The output shape of users | group(cities.name) | select(...) is...
2. group(cities.name) maps to which SQL clause?
3. Can select() contain inline aggregates like count(orders.id)?
4. Piping group() | select(...) into avg() produces a...

Do in AQLearn

  1. Open AQLearn select & group (1).
  2. Add | select(hi_city: concat('hi ', cities.name)) to the metric and run.
  3. Extend the select(...) to include user_count and order_count.
  4. Add | group(cities.name) before the select and confirm the result has one row per city.

Next local lesson

Next: Pipe (2) and Source Table. The pipe as first-argument sugar, and how the source table changes a metric's meaning.