Asked  7 Months ago    Answers:  5   Viewed   43 times

This query is very simple, all I want to do, is get all the articles in given category ordered by last_updated field:

SELECT
    `articles`.*
FROM
    `articles`,
    `articles_to_categories`
WHERE
        `articles`.`id` = `articles_to_categories`.`article_id`
        AND `articles_to_categories`.`category_id` = 1
ORDER BY `articles`.`last_updated` DESC
LIMIT 0, 20;

But it runs very slow. Here is what EXPLAIN said:

select_type  table                   type     possible_keys           key         key_len  ref                                rows  Extra
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SIMPLE       articles_to_categories  ref      article_id,category_id  article_id  5        const                              5016  Using where; Using temporary; Using filesort
SIMPLE       articles                eq_ref   PRIMARY                 PRIMARY     4        articles_to_categories.article_id  1

Is there a way to rewrite this query or add additional logic to my PHP scripts to avoid Using temporary; Using filesort and speed thing up?

The table structure:

*articles*
id | title | content | last_updated

*articles_to_categories*
article_id | category_id

UPDATE

I have last_updated indexed. I guess my situation is explained in documentation:

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

The key used to fetch the rows is not the same as the one used in the ORDER BY: SELECT * FROM t1 WHERE key2=constant ORDER BY key1;

You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)

but I still have no idea how to fix this.

 Answers

62

Here's a simplified example I did for a similar performance related question sometime ago that takes advantage of innodb clustered primary key indexes (obviously only available with innodb !!)

  • http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
  • http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

You have 3 tables: category, product and product_category as follows:

drop table if exists product;
create table product
(
prod_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists product_category;
create table product_category
(
cat_id mediumint unsigned not null,
prod_id int unsigned not null,
primary key (cat_id, prod_id) -- **note the clustered composite index** !!
)
engine = innodb;

The most import thing is the order of the product_catgeory clustered composite primary key as typical queries for this scenario always lead by cat_id = x or cat_id in (x,y,z...).

We have 500K categories, 1 million products and 125 million product categories.

select count(*) from category;
+----------+
| count(*) |
+----------+
|   500000 |
+----------+

select count(*) from product;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

select count(*) from product_category;
+-----------+
| count(*)  |
+-----------+
| 125611877 |
+-----------+

So let's see how this schema performs for a query similar to yours. All queries are run cold (after mysql restart) with empty buffers and no query caching.

select
 p.*
from
 product p
inner join product_category pc on 
    pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
 p.prod_id desc -- sry dont a date field in this sample table - wont make any difference though
limit 20;

+---------+----------------+
| prod_id | name           |
+---------+----------------+
|  993561 | Product 993561 |
|  991215 | Product 991215 |
|  989222 | Product 989222 |
|  986589 | Product 986589 |
|  983593 | Product 983593 |
|  982507 | Product 982507 |
|  981505 | Product 981505 |
|  981320 | Product 981320 |
|  978576 | Product 978576 |
|  973428 | Product 973428 |
|  959384 | Product 959384 |
|  954829 | Product 954829 |
|  953369 | Product 953369 |
|  951891 | Product 951891 |
|  949413 | Product 949413 |
|  947855 | Product 947855 |
|  947080 | Product 947080 |
|  945115 | Product 945115 |
|  943833 | Product 943833 |
|  942309 | Product 942309 |
+---------+----------------+
20 rows in set (0.70 sec) 

explain
select
 p.*
from
 product p
inner join product_category pc on 
    pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
 p.prod_id desc -- sry dont a date field in this sample table - wont make any diference though
limit 20;

+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref           | rows | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
|  1 | SIMPLE      | pc    | ref    | PRIMARY       | PRIMARY | 3       | const           |  499 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY       | PRIMARY | 4       | vl_db.pc.prod_id |    1 |                                              |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)

So that's 0.70 seconds cold - ouch.

Hope this helps :)

EDIT

Having just read your reply to my comment above it seems you have one of two choices to make:

create table articles_to_categories
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(article_id, category_id), -- good for queries that lead with article_id = x
key (category_id)
)
engine=innodb;

or.

create table categories_to_articles
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(category_id, article_id), -- good for queries that lead with category_id = x
key (article_id)
)
engine=innodb;

depends on your typical queries as to how you define your clustered PK.

Wednesday, March 31, 2021
 
ammezie
answered 7 Months ago
53

If this is a web application and you are trying to hang onto the transaction from one page to the next, don't; it won't work.

What do you mean by "just after"? If you are doing nothing between the two statements, even a timeout of 1 second should be big enough.

mysql> SET GLOBAL innodb_lock_wait_timeout = 1;
mysql> SELECT @@innodb_lock_wait_timeout;
+----------------------------+
| @@innodb_lock_wait_timeout |
+----------------------------+
|                         50 |
+----------------------------+
mysql> SET SESSION innodb_lock_wait_timeout = 1;
mysql> SELECT @@innodb_lock_wait_timeout;
+----------------------------+
| @@innodb_lock_wait_timeout |
+----------------------------+
|                          1 |
+----------------------------+

To explain GLOBAL vs SESSION for VARIABLES: The GLOBAL value is used to initialize the SESSION value when your connection starts. After that, you can change the SESSION value to affect what you are doing. And changing the GLOBAL value has no effect on your current connection.

Changing the timeout to 1 is quite safe (once you understand GLOBAL vs SESSION). The only thing that will change is the frequency of getting that error.

Wednesday, March 31, 2021
 
daniel__
answered 7 Months ago
58

As per @Rahul's answer I created two queries:

    //INSERT IF NOT EXISTS
        INSERT INTO `archive_course_users`
        (`course_id`, `user_id`, `course_qty`)
        SELECT @new_course_id, '$user_id', `course_qty`
        FROM `current_course_users`
        WHERE NOT EXISTS (
            SELECT `version`
            FROM `archive_courses`
            WHERE `original_course_id` = '$course_id'
            AND `version` = '$current_version'
        )
        AND `course_id` = '$course_id'
        AND `user_id` = '$user_id'

    //UPDATE IF EXISTS
        UPDATE `archive_course_users`
        SET `course_qty` = (SELECT `course_qty` FROM `current_course_users` WHERE `user_id` = '$user_id')
        WHERE EXISTS (
            SELECT `version`
            FROM `archive_courses`
            WHERE `original_course_id` = '$course_id'
            AND `version` = '$current_version'
        )
        AND `original_course_id` = '$course_id'
        AND `user_id` = '$user_id'
Saturday, May 29, 2021
 
Lloydworth
answered 5 Months ago
34

Use a CompiledQuery!

var filter = CompiledQuery.Compile(
    (DatabaseDataContext dc, Record record, string term) =>
        record.Field1.ToLower().Contains(term) ||
        record.Field2.ToLower().Contains(term) ||
        record.Field3.ToLower().Contains(term)
);

var results = from record in DataContext.Records
              where filter(DataContext, record, term)
              select record;

For more information, see How to: Store and Reuse Queries.

Wednesday, July 28, 2021
 
rorymorris
answered 3 Months ago
84
query = (
    session.query(Post)
           .join(Post.tags)     # It's necessary to join the "children" of Post
           .filter(Post.date_out.between(start_date, end_date))
           # here comes the magic: 
           # you can filter with Tag, even though it was not directly joined)
           .filter(Tag.accepted == 1)
)

Disclaimer: this is a veeery reduced example of my actual code, I might have made a mistake while simplifying.

I hope it helps somebody.

Sunday, August 1, 2021
 
Addev
answered 3 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :