Asked  7 Months ago    Answers:  5   Viewed   38 times

This code works as expected, but I it's long and creepy.

select p.name, p.played, w.won, l.lost from

(select users.name, count(games.name) as played
from users
inner join games on games.player_1_id = users.id
where games.winner_id > 0
group by users.name
union
select users.name, count(games.name) as played
from users
inner join games on games.player_2_id = users.id
where games.winner_id > 0
group by users.name) as p

inner join

(select users.name, count(games.name) as won
from users
inner join games on games.player_1_id = users.id
where games.winner_id = users.id
group by users.name
union
select users.name, count(games.name) as won
from users
inner join games on games.player_2_id = users.id
where games.winner_id = users.id
group by users.name) as w on p.name = w.name

inner join

(select users.name, count(games.name) as lost
from users
inner join games on games.player_1_id = users.id
where games.winner_id != users.id
group by users.name
union
select users.name, count(games.name) as lost
from users
inner join games on games.player_2_id = users.id
where games.winner_id != users.id
group by users.name) as l on l.name = p.name

As you can see, it consists of 3 repetitive parts for retrieving:

  • player name and the amount of games they played
  • player name and the amount of games they won
  • player name and the amount of games they lost

And each of those also consists of 2 parts:

  • player name and the amount of games in which they participated as player_1
  • player name and the amount of games in which they participated as player_2

How could this be simplified?

The result looks like so:

           name            | played | won | lost 
---------------------------+--------+-----+------
 player_a                  |      5 |   2 |    3
 player_b                  |      3 |   2 |    1
 player_c                  |      2 |   1 |    1

 Answers

89

The aggregate FILTER clause in Postgres 9.4 or newer is shorter and faster:

SELECT u.name
     , count(*) FILTER (WHERE g.winner_id  > 0)    AS played
     , count(*) FILTER (WHERE g.winner_id  = u.id) AS won
     , count(*) FILTER (WHERE g.winner_id <> u.id) AS lost
FROM   games g
JOIN   users u ON u.id IN (g.player_1_id, g.player_2_id)
GROUP  BY u.name;
  • The manual
  • Postgres Wiki
  • Depesz blog post

In Postgres 9.3 (or any version) this is still shorter and faster than nested sub-selects or CASE expressions:

SELECT u.name
     , count(g.winner_id  > 0 OR NULL)    AS played
     , count(g.winner_id  = u.id OR NULL) AS won
     , count(g.winner_id <> u.id OR NULL) AS lost
FROM   games g
JOIN   users u ON u.id IN (g.player_1_id, g.player_2_id)
GROUP  BY u.name;

Details:

  • For absolute performance, is SUM faster or COUNT?
Tuesday, June 1, 2021
 
samrap
answered 7 Months ago
36

Simple query

This can be much simpler with PostgreSQL 9.1 or later. As explained in this closely related answer:

  • PGError: ERROR: aggregates not allowed in WHERE clause on a AR query of an object and its has_many objects

It is enough to GROUP BY the primary key of a table. Since:

foo1 is a primary key

.. you can simplify your example to:

SELECT foo1, foo2, foo3, foo4, foo5, foo6, string_agg(aggregated_field, ', ')
FROM   tbl1
GROUP  BY 1
ORDER  BY foo7, foo8;  -- have to be spelled out, since not in select list!

Query with multiple tables

However, since you have:

many more fields and LEFT JOINs, the important part is that all these fields have 1 to 1 or 1 to 0 relationship except one field that is 1 to n which I want to aggregate

.. it should be faster and simpler to aggregate first, join later:

SELECT t1.foo1, t1.foo2, ...
     , t2.bar1, t2.bar2, ...
     , a.aggregated_col 
FROM   tbl1 t1
LEFT   JOIN tbl2 t2 ON ...
...
LEFT   JOIN (
   SELECT some_id, string_agg(agg_col, ', ') AS aggregated_col
   FROM   agg_tbl a ON ...
   GROUP  BY some_id
   ) a ON a.some_id = ?.some_id
ORDER  BY ...

This way the big portion of your query does not need aggregation at all.

I recently provided a test case in an SQL Fiddle to prove the point in this related answer:

  • PostgreSQL - order by an array

Since you are referring to this related answer: No, DISTINCT is not going to help at all in this case.

Tuesday, June 15, 2021
 
turson
answered 6 Months ago
97

The following method gets rid of the in-line view to fetch duplicates, it uses REGEXP_REPLACE and RTRIM on the LISTAGG function to get the distinct result set in the aggregated list. Thus, it won't do more than one scan.

Adding this piece to your code,

RTRIM(REGEXP_REPLACE(listagg (tm_redir.team_code, ',') 
                     WITHIN GROUP (ORDER BY tm_redir.team_code),
                     '([^,]+)(,1)+', '1'),
                     ',')

Modified query-

SQL> with tran_party as -- ALL DUMMY DATA ARE IN THESE CTE FOR YOUR REFERENCE
  2           (select 1 tran_party_id, 11 transaction_id, 101 team_id_redirect
  3              from dual
  4            union all
  5            select 2, 11, 101 from dual
  6            union all
  7            select 3, 11, 102 from dual
  8            union all
  9            select 4, 12, 103 from dual
 10            union all
 11            select 5, 12, 103 from dual
 12            union all
 13            select 6, 12, 104 from dual
 14            union all
 15            select 7, 13, 104 from dual
 16            union all
 17            select 8, 13, 105 from dual),
 18       tran as
 19           (select 11 transaction_id, 1001 account_id, 1034.93 amount from dual
 20            union all
 21            select 12, 1001, 2321.89 from dual
 22            union all
 23            select 13, 1002, 3201.47 from dual),
 24       account as
 25           (select 1001 account_id, 111 team_id from dual
 26            union all
 27            select 1002, 112 from dual),
 28       team as
 29           (select 101 team_id, 'UUU' as team_code from dual
 30            union all
 31            select 102, 'VV' from dual
 32            union all
 33            select 103, 'WWW' from dual
 34            union all
 35            select 104, 'XXXXX' from dual
 36            union all
 37            select 105, 'Z' from dual)
 38  -- The Actual Query
 39  select a.account_id,
 40         t.transaction_id,
 41         (SELECT  RTRIM(
 42           REGEXP_REPLACE(listagg (tm_redir.team_code, ',')
 43                     WITHIN GROUP (ORDER BY tm_redir.team_code),
 44             '([^,]+)(,1)+', '1'),
 45           ',')
 46            from tran_party tp_redir
 47                 inner join team tm_redir
 48                     on tp_redir.team_id_redirect = tm_redir.team_id
 49                 inner join tran t_redir
 50                     on tp_redir.transaction_id = t_redir.transaction_id
 51           where     t_redir.account_id = a.account_id
 52                 and t_redir.transaction_id != t.transaction_id)
 53             AS teams_redirected
 54    from tran t inner join account a on t.account_id = a.account_id
 55  /

ACCOUNT_ID TRANSACTION_ID TEAMS_REDIRECTED
---------- -------------- --------------------
      1001             11 WWW,XXXXX
      1001             12 UUU,VV
      1002             13

SQL>
Friday, July 30, 2021
 
o_flyer
answered 5 Months ago
52

Building on Laurence's answer, here's a pure SQL way to aggregate the summed key/value pairs into a new hstore using array_agg and the hstore(text[], text[]) constructor.

http://sqlfiddle.com/#!1/9f1fb/17

SELECT hstore(array_agg(hs_key), array_agg(hs_value::text))
FROM (
  SELECT
    s.hs_key, sum(s.hs_value::integer)
  FROM (
    SELECT (each(goals)).* FROM statistics
  ) as s(hs_key, hs_value)
  GROUP BY hs_key
) x(hs_key,hs_value)

I've also replaced to_number with a simple cast to integer and simplified the key/value iteration.

Wednesday, August 11, 2021
 
Gabriele
answered 4 Months ago
41

In PostgreSQL 9.0 or later you can order elements inside aggregate functions:

SELECT company_id, array_agg(employee ORDER BY company_id DESC)::text
FROM   tbl
GROUP  BY 1;

That's not available for PostgreSQL 8.4. You have to pre-order values to be aggregated. Use a subselect or CTE (8.4+) for this purpose:

SELECT company_id, array_agg(employee)::text
FROM  (SELECT * FROM tbl ORDER BY company_id, employee  DESC) x
GROUP  BY 1;

I order by company_id in addition, because that should speed up the aggregation.

I also use the "trick" of just casting the array to text (array_agg(employee)::text), which gives you a basic, comma-separated string without additional white space and is the fastest way.
For more sophisticated formatting, use array_to_string(array_agg(employee), ', ') like you have in your question.

Friday, October 29, 2021
 
John Kugelman
answered 1 Month ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share