
Hi, I have been working on this problem in MySQL for many days, but I cannot figure it out. Do any of you have suggestions?

Basically, I have a category table with columns: id, name (name of the category), and parent (id of the category's parent).

Example Data:

1  Fruit        0
2  Apple        1
3  pear         1
4  FujiApple    2
5  AusApple     2
6  SydneyAPPLE  5
....
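
For reference, the table and example data can be reproduced roughly like this (the column types and the use of 0 for "no parent" are a guess on my part; only the column names id, name, parent are fixed):

drop table if exists category;
create table category
(
id int unsigned not null auto_increment primary key,
name varchar(255) not null,
parent int unsigned not null default 0,
key (parent)
)
engine = innodb;

insert into category (id, name, parent) values
(1, 'Fruit', 0),
(2, 'Apple', 1),
(3, 'pear', 1),
(4, 'FujiApple', 2),
(5, 'AusApple', 2),
(6, 'SydneyAPPLE', 5);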

There are many levels, possibly more than 3. I want to create an SQL query that groups the data according to the hierarchy: parent > child > grandchild > etc.

It should output the tree structure, as follows:

1 Fruit 0
 ^ 2 Apple 1
   ^ 4 FujiApple 2
   - 5 AusApple 2
     ^ 6 SydneyApple 5
 - 3 pear 1

Can I do this using a single SQL query? The alternative, which I tried and does work, is the following:

SELECT * FROM category WHERE parent=0

After this, I loop through the results and, for each row, select the rows where parent = id. This seems like a bad solution. Because it is MySQL, CTEs cannot be used.

 Answers

59

You can do it in a single call from PHP to MySQL if you use a stored procedure:

Example calls

mysql> call category_hier(1);

+--------+---------------+---------------+----------------------+-------+
| cat_id | category_name | parent_cat_id | parent_category_name | depth |
+--------+---------------+---------------+----------------------+-------+
|      1 | Location      |          NULL | NULL                 |     0 |
|      3 | USA           |             1 | Location             |     1 |
|      4 | Illinois      |             3 | USA                  |     2 |
|      5 | Chicago       |             3 | USA                  |     2 |
+--------+---------------+---------------+----------------------+-------+
4 rows in set (0.00 sec)


$sql = sprintf("call category_hier(%d)", $id);


Full script

Test table structure:

drop table if exists categories;
create table categories
(
cat_id smallint unsigned not null auto_increment primary key,
name varchar(255) not null,
parent_cat_id smallint unsigned null,
key (parent_cat_id)
)
engine = innodb;

Test data:

insert into categories (name, parent_cat_id) values
('Location', null),
('Color', null),
('USA', 1),
('Illinois', 3),
('Chicago', 3),
('Black', 2),
('Red', 2);
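
Because cat_id is auto_increment, the order of the insert above determines the IDs, and therefore what each parent_cat_id points at. For reference, this is what the table should hold afterwards (derived from the insert statement, not a captured run):

select cat_id, name, parent_cat_id from categories order by cat_id;

+--------+----------+---------------+
| cat_id | name     | parent_cat_id |
+--------+----------+---------------+
|      1 | Location |          NULL |
|      2 | Color    |          NULL |
|      3 | USA      |             1 |
|      4 | Illinois |             3 |
|      5 | Chicago  |             3 |
|      6 | Black    |             2 |
|      7 | Red      |             2 |
+--------+----------+---------------+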

Procedure:

drop procedure if exists category_hier;

delimiter #

create procedure category_hier
(
in p_cat_id smallint unsigned
)
begin

declare v_done tinyint unsigned default 0;
declare v_depth smallint unsigned default 0;

create temporary table hier(
 parent_cat_id smallint unsigned, 
 cat_id smallint unsigned, 
 depth smallint unsigned default 0
)engine = memory;

/* seed the hierarchy with the starting category at depth 0 */
insert into hier select parent_cat_id, cat_id, v_depth from categories where cat_id = p_cat_id;

/* http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */

create temporary table tmp engine=memory select * from hier;

/* add one level of children per pass until a pass finds no new rows */
while not v_done do

    if exists( select 1 from categories p inner join hier on p.parent_cat_id = hier.cat_id and hier.depth = v_depth) then

        insert into hier 
            select p.parent_cat_id, p.cat_id, v_depth + 1 from categories p 
            inner join tmp on p.parent_cat_id = tmp.cat_id and tmp.depth = v_depth;

        set v_depth = v_depth + 1;          

        truncate table tmp;
        insert into tmp select * from hier where depth = v_depth;

    else
        set v_done = 1;
    end if;

end while;

select 
 p.cat_id,
 p.name as category_name,
 b.cat_id as parent_cat_id,
 b.name as parent_category_name,
 hier.depth
from 
 hier
inner join categories p on hier.cat_id = p.cat_id
left outer join categories b on hier.parent_cat_id = b.cat_id
order by
 hier.depth, hier.cat_id;

drop temporary table if exists hier;
drop temporary table if exists tmp;

end #

Test runs:

delimiter ;

call category_hier(1);

call category_hier(2);

Some performance testing using Yahoo GeoPlanet places data:

drop table if exists geoplanet_places;
create table geoplanet_places
(
woe_id int unsigned not null,
iso_code  varchar(3) not null,
name varchar(255) not null,
lang varchar(8) not null,
place_type varchar(32) not null,
parent_woe_id int unsigned not null,
primary key (woe_id),
key (parent_woe_id)
)
engine=innodb;

mysql> select count(*) from geoplanet_places;
+----------+
| count(*) |
+----------+
|  5653967 |
+----------+

So that's 5.6 million rows (places) in the table; let's see how the adjacency list implementation/stored procedure called from PHP handles that.

     1 records fetched with max depth 0 in 0.001921 secs
   250 records fetched with max depth 1 in 0.004883 secs
   515 records fetched with max depth 1 in 0.006552 secs
   822 records fetched with max depth 1 in 0.009568 secs
   918 records fetched with max depth 1 in 0.009689 secs
  1346 records fetched with max depth 1 in 0.040453 secs
  5901 records fetched with max depth 2 in 0.219246 secs
  6817 records fetched with max depth 1 in 0.152841 secs
  8621 records fetched with max depth 3 in 0.096665 secs
 18098 records fetched with max depth 3 in 0.580223 secs
238007 records fetched with max depth 4 in 2.003213 secs

Overall I'm pretty pleased with those cold runtimes. I wouldn't even begin to consider returning tens of thousands of rows of data to my front end, but would rather build the tree dynamically, fetching only several levels per call. Oh, and just in case you were thinking InnoDB is slower than MyISAM: the MyISAM implementation I tested was twice as slow on all counts.
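
If you only want to fetch a few levels per call, one way is to cap the loop depth. The following is just a sketch of that idea (the procedure name category_hier_limited and the p_max_depth parameter are made up for illustration; they are not part of the script above):

drop procedure if exists category_hier_limited;

delimiter #

create procedure category_hier_limited
(
in p_cat_id smallint unsigned,
in p_max_depth smallint unsigned
)
begin

declare v_done tinyint unsigned default 0;
declare v_depth smallint unsigned default 0;

create temporary table hier(
 parent_cat_id smallint unsigned, 
 cat_id smallint unsigned, 
 depth smallint unsigned default 0
)engine = memory;

insert into hier select parent_cat_id, cat_id, v_depth from categories where cat_id = p_cat_id;

create temporary table tmp engine=memory select * from hier;

/* stop when a pass adds no rows or when the depth cap is reached */
while not v_done and v_depth < p_max_depth do

    if exists( select 1 from categories p inner join hier on p.parent_cat_id = hier.cat_id and hier.depth = v_depth) then

        insert into hier 
            select p.parent_cat_id, p.cat_id, v_depth + 1 from categories p 
            inner join tmp on p.parent_cat_id = tmp.cat_id and tmp.depth = v_depth;

        set v_depth = v_depth + 1;

        truncate table tmp;
        insert into tmp select * from hier where depth = v_depth;

    else
        set v_done = 1;
    end if;

end while;

select 
 p.cat_id,
 p.name as category_name,
 b.cat_id as parent_cat_id,
 b.name as parent_category_name,
 hier.depth
from 
 hier
inner join categories p on hier.cat_id = p.cat_id
left outer join categories b on hier.parent_cat_id = b.cat_id
order by
 hier.depth, hier.cat_id;

drop temporary table if exists hier;
drop temporary table if exists tmp;

end #

delimiter ;

/* e.g. fetch only the first two levels below category 1 */
call category_hier_limited(1, 2);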

More stuff here : http://pastie.org/1672733

Hope this helps :)

Tuesday, June 1, 2021
 
NaeiKinDus
answered 7 Months ago
92

As pointed out above, this isn't truly recursive, but if you know the maximum number of levels you need to go down, you can use something along these lines (perhaps using PHP to generate the query):

I'd first set parent ID to NULL rather than 0, but that's personal preference.

SELECT * FROM table t1
LEFT JOIN table t2 ON t2.parent_id = t1.role_id
LEFT JOIN table t3 ON t3.parent_id = t2.role_id
WHERE t1.parent_id IS NULL

^^ however deep you need to go in that case.

[next bit not strictly relevant]

You can then manipulate the output something along these lines:

SELECT
        (CASE 
        WHEN (t1.name IS NULL AND t2.name IS NULL) THEN t3.name
        WHEN (t1.name IS NULL AND t2.name IS NOT NULL) THEN t2.name
        ELSE t1.name END)  AS first,
        (CASE 
        WHEN (t1.name IS NOT NULL AND t2.name IS NOT NULL) THEN t2.name
        WHEN (t2.name IS NULL AND t3.name IS NOT NULL) THEN NULL
        ELSE t3.name END)  AS second,
        (CASE 
        WHEN (t1.name IS NOT NULL) THEN t3.name
        ELSE  NULL END)  AS third
FROM table t1
LEFT JOIN table t2 ON t2.parent_id = t1.role_id
LEFT JOIN table t3 ON t3.parent_id = t2.role_id
WHERE t1.parent_id IS NULL
Friday, July 30, 2021
 
Gersom
answered 5 Months ago
36

In PostgreSQL you can use recursive CTEs (Common Table Expression) to walk trees in your queries.

Here are two relevant links into the docs:

  • Syntax
  • Examples

EDIT

Since there is no subselect required, it might run a little better on a larger dataset than Arion's query.

WITH RECURSIVE children AS (
    -- select leaf nodes
    SELECT id, value, parent
        FROM t
        WHERE value IS NOT NULL
    UNION ALL
    -- propagate values of leaf nodes up, adding rows 
    SELECT t.id, children.value, t.parent
        FROM children JOIN t ON children.parent = t.id
)
SELECT id, sum(value) 
    FROM children 
    GROUP BY id   -- sum up appropriate rows
    ORDER BY id;
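
Applied to the category table from the question, a walk down the tree with the same technique looks something like this (a sketch: the WITH RECURSIVE form below works in PostgreSQL, and also in MySQL 8.0+, which the asker's MySQL version lacked):

WITH RECURSIVE tree AS (
    -- anchor: the top-level categories (parent = 0 in the question's data)
    SELECT id, name, parent, 0 AS depth
    FROM category
    WHERE parent = 0
    UNION ALL
    -- recurse: attach each category to a row already in the tree
    SELECT c.id, c.name, c.parent, tree.depth + 1
    FROM category c
    JOIN tree ON c.parent = tree.id
)
SELECT id, name, parent, depth
FROM tree
ORDER BY depth, id;

Note that this returns the rows level by level; reproducing the exact indented ordering from the question would additionally need a path column to sort on.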
Monday, August 2, 2021
 
iJade
answered 4 Months ago
24

In your INSERT statement in the Libary_Update trigger you have the following line:

WHERE subtree.`ancestor` = NEW.`iD`

but you aren't updating the ID field, so I don't think you will have a NEW.iD value. Should that line possibly use OLD.iD instead?
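
For reference, a generic sketch of what OLD and NEW refer to in a MySQL UPDATE trigger (the audit table and trigger below are made up for illustration and are unrelated to the Libary_Update trigger being discussed):

drop table if exists audit_log;
create table audit_log (old_id smallint unsigned, new_id smallint unsigned, changed_at datetime);

drop trigger if exists categories_after_update;

delimiter #

create trigger categories_after_update after update on categories
for each row
begin
    /* OLD.cat_id is the value before the UPDATE, NEW.cat_id the value after it */
    insert into audit_log (old_id, new_id, changed_at)
    values (OLD.cat_id, NEW.cat_id, now());
end #

delimiter ;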

Thursday, September 30, 2021
 
Sanguine
answered 2 Months ago
69

The whole closure table is redundant if you can use recursive queries :)

I think it's much better to have a complicated recursive query that you have to figure out once than deal with the extra IO (and disk space) of a separate table and associated triggers.

I have done some simple tests with recursive queries in Postgres. With a few million rows in the table, queries were still < 10 ms for returning all parents of a particular child. Returning all children was fast too, depending on the level of the parent. It seemed to depend more on disk IO fetching the rows than on the query speed itself. This was done single-user, so I am not sure how it would perform under load. I suspect it would still be very fast if you can also hold most of the table in memory (and set up Postgres correctly). Clustering the table by parent id also seemed to help.
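
As an illustration of the kind of query being timed, "all parents of a particular child" as a recursive CTE against the question's category table might look like this (a sketch, not the actual test query):

WITH RECURSIVE ancestors AS (
    -- start from one child row, e.g. id 6 (SydneyAPPLE)
    SELECT id, name, parent
    FROM category
    WHERE id = 6
    UNION ALL
    -- follow the parent pointer one level up per step
    SELECT c.id, c.name, c.parent
    FROM category c
    JOIN ancestors a ON a.parent = c.id
)
SELECT * FROM ancestors;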

Friday, October 8, 2021
 
Reid
answered 2 Months ago