I am wondering if there is a way (possibly a better way) to order by the order of the values in an IN() clause.

The problem is that I have 2 queries, one that gets all of the IDs and the second that retrieves all the information. The first creates the order of the IDs which I want the second to order by. The IDs are put in an IN() clause in the correct order.

So it'd be something like (extremely simplified):

SELECT id FROM table1 WHERE ... ORDER BY display_order, name

SELECT name, description, ... FROM ... WHERE id IN ([IDs from first query])

The issue is that the second query does not return the results in the same order that the IDs are put into the IN() clause.

One solution I have found is to put all of the IDs into a temp table with an auto incrementing field which is then joined into the second query.
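
For reference, the temp-table approach can be sketched roughly as follows. This uses Python's sqlite3 module purely as a runnable stand-in for MySQL; the `items` table, its columns, and the `id_order` temp table are hypothetical names, and SQLite's auto-assigned INTEGER PRIMARY KEY plays the role of the auto-incrementing field.

```python
import sqlite3


def fetch_in_id_order(conn, ordered_ids):
    """Return (id, name) rows from `items` in the order given by ordered_ids."""
    cur = conn.cursor()
    # Rebuild the temp table on every call so positions start from 1 again.
    cur.execute("DROP TABLE IF EXISTS temp.id_order")
    # In SQLite an INTEGER PRIMARY KEY is assigned automatically on insert,
    # which stands in for MySQL's AUTO_INCREMENT column here.
    cur.execute(
        "CREATE TEMP TABLE id_order ("
        " pos INTEGER PRIMARY KEY,"
        " id INTEGER NOT NULL)"
    )
    # Insert the IDs in the order produced by the first query.
    cur.executemany(
        "INSERT INTO id_order (id) VALUES (?)", [(i,) for i in ordered_ids]
    )
    # Join on id and sort by the insertion position.
    cur.execute(
        "SELECT items.id, items.name"
        " FROM items JOIN id_order ON id_order.id = items.id"
        " ORDER BY id_order.pos"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "one"), (2, "two"), (3, "three"), (4, "four")],
)
rows = fetch_in_id_order(conn, [3, 1, 4])
print(rows)
```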

Is there a better option?

Note: As the first query is run "by the user" and the second is run in a background process, there is no way to combine the 2 into 1 query using subqueries.

I am using MySQL, but I'm thinking it might be useful to have it noted what options there are for other DBs as well.

 Answers

24

Use MySQL's FIELD() function:

SELECT name, description, ...
FROM ...
WHERE id IN([ids, any order])
ORDER BY FIELD(id, [ids in order])

FIELD() returns the 1-based position of the first argument, after the first one, that is equal to the first argument. It returns 0 if none of them match.

FIELD('a', 'a', 'b', 'c')

will return 1

FIELD('a', 'c', 'b', 'a')

will return 3

This will do exactly what you want if you paste the ids into the IN() clause and the FIELD() function in the same order.
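
For databases that do not have FIELD(), one possible substitute is a CASE expression that maps each id to its position. The sketch below is not MySQL code: it uses Python's sqlite3 module so it can be run as-is, and the `items` table and the helper names are made up for illustration. The ids are bound as parameters rather than pasted into the SQL string.

```python
import sqlite3


def order_by_case(ids):
    """Build 'CASE id WHEN ? THEN 0 WHEN ? THEN 1 ... END' for the id list."""
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(ids)))
    return f"CASE id {whens} END"


def fetch_ordered(conn, ids):
    """Return (id, name) rows from the hypothetical `items` table, in ids order."""
    ids = list(ids)
    if not ids:
        return []  # an empty CASE expression would be a syntax error
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        f"SELECT id, name FROM items WHERE id IN ({placeholders}) "
        f"ORDER BY {order_by_case(ids)}"
    )
    # The ids are bound twice: once for IN(), once for the CASE branches.
    return conn.execute(sql, ids + ids).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "one"), (2, "two"), (3, "three"), (4, "four")],
)
ordered = fetch_ordered(conn, [3, 1, 4])
print(ordered)
```

With MySQL itself, the same placeholder-building idea should carry over to ORDER BY FIELD(id, ?, ?, ...), again binding the id list twice.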

Tuesday, June 1, 2021
 
EurekA
answered 7 Months ago
36

To reverse the order (and so get the last 10 instead of the first 10), use DESC instead of ASC.

EDIT

Based on your comment:

SELECT * FROM (
  SELECT * 
  FROM chat 
  WHERE (userID = $session AND toID = $friendID) 
    OR (userID = $friendID AND toID = $session)  
  ORDER BY id DESC
  LIMIT 10
) AS `table` ORDER BY id ASC
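
A runnable sketch of the same pattern, assuming a minimal `chat` table (the `body` column and the function name are invented for the example). It uses Python's sqlite3 module instead of MySQL and binds the two user ids as parameters instead of interpolating $session and $friendID into the SQL string.

```python
import sqlite3


def last_messages(conn, session_id, friend_id, limit=10):
    """Return the newest `limit` messages between two users, oldest first."""
    sql = (
        "SELECT * FROM ("
        "  SELECT id, userID, toID, body FROM chat"
        "  WHERE (userID = ? AND toID = ?) OR (userID = ? AND toID = ?)"
        "  ORDER BY id DESC LIMIT ?"  # inner query: newest rows first
        ") AS recent ORDER BY id ASC"  # outer query: restore reading order
    )
    params = (session_id, friend_id, friend_id, session_id, limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chat ("
    " id INTEGER PRIMARY KEY, userID INTEGER, toID INTEGER, body TEXT)"
)
# 25 messages alternating between users 1 and 2 (ids 1..25).
for n in range(1, 26):
    sender, receiver = (1, 2) if n % 2 else (2, 1)
    conn.execute(
        "INSERT INTO chat (userID, toID, body) VALUES (?, ?, ?)",
        (sender, receiver, f"msg {n}"),
    )
# One message from a third user that must not show up (id 26).
conn.execute("INSERT INTO chat (userID, toID, body) VALUES (3, 1, 'unrelated')")

ids = [row[0] for row in last_messages(conn, 1, 2)]
print(ids)
```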
Saturday, May 29, 2021
 
inieto
answered 7 Months ago
95

SQL was designed to be a declarative language, not a procedural one. So the query optimizer should not consider the order of the where clause predicates in determining how to apply them.

I'm probably going to waaaay over-simplify the following discussion of an SQL query optimizer. I wrote one years ago, along these lines (it was tons of fun!). If you really want to dig into modern query optimization, see Dan Tow's SQL Tuning, from O'Reilly.

In a simple SQL query optimizer, the SQL statement first gets compiled into a tree of relational algebra operations. These operations each take one or more tables as input and produce another table as output. Scan is a sequential scan that reads a table in from the database. Sort produces a sorted table. Select produces a table whose rows are selected from another table according to some selection condition. Project produces a table with only certain columns of another table. Cross Product takes two tables and produces an output table composed of every conceivable pairing of their rows.

Confusingly, the SQL SELECT clause is compiled into a relational algebra Project, while the WHERE clause turns into a relational algebra Select. The FROM clause turns into one or more Joins, each taking two tables in and producing one table out. There are other relational algebra operations involving set union, intersection, difference, and membership, but let's keep this simple.

This tree really needs to be optimized. For example, if you have:

select E.name, D.name 
from Employee E, Department D 
where E.id = 123456 and E.dept_id = D.dept_id

with 5,000 employees in 500 departments, executing an unoptimized tree will blindly produce all possible combinations of one Employee and one Department (a Cross Product) and then Select out just the one combination that was needed. The Scan of Employee will produce a 5,000 record table, the Scan of Department will produce a 500 record table, the Cross Product of those two tables will produce a 2,500,000 record table, and the Select on E.id will take that 2,500,000 record table and discard all but one, the record that was wanted.

[Real query processors will try not to materialize all of these intermediate tables in memory of course.]

So the query optimizer walks the tree and applies various optimizations. One is to break up each Select into a chain of Selects, one for each of the original Select's top level conditions, the ones and-ed together. (This is called "conjunctive normal form".) Then the individual smaller Selects are moved around in the tree and merged with other relational algebra operations to form more efficient ones.

In the above example, the optimizer first pushes the Select on E.id = 123456 down below the expensive Cross Product operation. This means the Cross Product just produces 500 rows (one for each combination of that employee and one department). Then the top level Select for E.dept_id = D.dept_id filters out the 499 unwanted rows. Not bad.
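
To make the row counts concrete, here is a toy simulation in plain Python (not a real query engine). The data is synthetic, with 5,000 employees and 500 departments as in the example, and the function names are just for illustration. It counts how many employee/department pairs the Cross Product produces with and without pushing the Select on E.id below it.

```python
from itertools import product

# Synthetic data: ids 123000..127999, so id 123456 exists (dept_id 456).
employees = [
    {"id": 123000 + i, "name": f"emp{i}", "dept_id": i % 500}
    for i in range(5000)
]
departments = [{"dept_id": d, "name": f"dept{d}"} for d in range(500)]


def unoptimized(target_id):
    """Cross Product first, then apply both conditions to every pair."""
    pairs = 0
    result = []
    for e, d in product(employees, departments):
        pairs += 1
        if e["id"] == target_id and e["dept_id"] == d["dept_id"]:
            result.append((e["name"], d["name"]))
    return result, pairs


def select_pushed_down(target_id):
    """Select on E.id first, then Cross Product with Department."""
    selected = [e for e in employees if e["id"] == target_id]
    pairs = 0
    result = []
    for e, d in product(selected, departments):
        pairs += 1
        if e["dept_id"] == d["dept_id"]:
            result.append((e["name"], d["name"]))
    return result, pairs


slow_result, slow_pairs = unoptimized(123456)
fast_result, fast_pairs = select_pushed_down(123456)
print(slow_pairs, fast_pairs)
```

Both versions return the same single row; only the size of the intermediate result differs.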

If there's an index on Employee's id field, then the optimizer can combine the Scan of Employee with the Select on E.id = 123456 to form a fast index Lookup. This means that only one Employee row is read into memory from disk instead of 5,000. Things are looking up.

The final major optimization is to take the Select on E.dept_id = D.dept_id and combine it with the Cross Product. This turns it into a relational algebra Equijoin operation. This doesn't do much by itself. But if there's an index on Department.dept_id, then the lower level sequential Scan of Department feeding the Equijoin can be turned into a very fast index Lookup of our one employee's Department record.

Lesser optimizations involve pushing Project operations down. If the top level of your query just needs E.name and D.name, and the conditions need E.id, E.dept_id, and D.dept_id, then the Scan operations don't have to build intermediate tables with all the other columns, saving space during the query execution. We've turned a horribly slow query into two index lookups and not much else.
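
As a rough picture of where the optimizer ends up, the two index lookups can be mimicked with Python dicts standing in for the indexes. This is only an analogy: the data is synthetic and `lookup` is a made-up name, not anything a database exposes.

```python
# Synthetic tables matching the sizes used in the example.
employees = [
    {"id": 123000 + i, "name": f"emp{i}", "dept_id": i % 500, "age": 30}
    for i in range(5000)
]
departments = [
    {"dept_id": d, "name": f"dept{d}", "floor": d % 10} for d in range(500)
]

# "Indexes": built once ahead of time, keyed by the indexed column.
employee_index = {e["id"]: e for e in employees}
department_index = {d["dept_id"]: d for d in departments}


def lookup(target_id):
    """Two keyed lookups instead of any scan or cross product."""
    e = employee_index.get(target_id)  # index Lookup on Employee.id
    if e is None:
        return None
    d = department_index.get(e["dept_id"])  # index Lookup on Department.dept_id
    if d is None:
        return None
    # Project: only the two columns the query asked for are returned.
    return (e["name"], d["name"])


print(lookup(123456))
```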

Getting more towards the original question, let's say you've got:

select E.name 
from Employee E 
where E.age > 21 and E.state = 'Delaware'

The unoptimized relational algebra tree, when executed, would Scan in the 5,000 employees and produce, say, the 126 ones in Delaware who are older than 21. The query optimizer also has some rough idea of the values in the database. It might know that the E.state column has the 14 states that the company has locations in, and something about the E.age distributions. So first it sees if either field is indexed. If E.state is, it makes sense to use that index to just pick out the small number of employees the query processor suspects are in Delaware based on its last computed statistics. If only E.age is, the query processor likely decides that it's not worth it, since 96% of all employees are 22 and older. So if E.state is indexed, our query processor breaks the Select and merges the E.state = 'Delaware' with the Scan to turn it into a much more efficient Index Scan.

Let's say in this example that there are no indexes on E.state and E.age. The combined Select operation takes place after the sequential "Scan" of Employee. Does it make a difference which condition in the Select is done first? Probably not a great deal. The query processor might leave them in the original order in the SQL statement, or it might be a bit more sophisticated and look at the expected expense. From the statistics, it would again find that the E.state = 'Delaware' condition should be more highly selective, so it would reverse the conditions and do that first, so that E.age > 21 is evaluated only for the employees in Delaware (somewhat more than the 126 who pass both tests) instead of for all 5,000. Or it might realize that string equality comparisons are much more expensive than integer compares and leave the order alone.
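
The effect of condition order on the number of comparisons can be counted with a small simulation. The data here is invented (125 of 5,000 employees in Delaware, ages spread over 18 to 67), so the figures will not match the 126 and 96% used above; the point is only how short-circuit evaluation shifts the counts.

```python
# Synthetic employees: every 40th one is in Delaware, ages cycle 18..67.
employees = [
    {
        "id": i,
        "state": "Delaware" if i % 40 == 0 else "Other",
        "age": 18 + (i % 50),
    }
    for i in range(5000)
]


def is_delaware(e):
    return e["state"] == "Delaware"


def over_21(e):
    return e["age"] > 21


def run(conditions):
    """Apply conditions in order with short-circuiting; count evaluations."""
    counts = [0] * len(conditions)
    matches = []
    for e in employees:
        for n, cond in enumerate(conditions):
            counts[n] += 1
            if not cond(e):
                break  # later conditions are skipped for this row
        else:
            matches.append(e["id"])
    return matches, counts


state_first_matches, state_first_counts = run([is_delaware, over_21])
age_first_matches, age_first_counts = run([over_21, is_delaware])
print(state_first_counts, age_first_counts)
```

Either order yields the same rows. Putting the more selective condition first just means the second condition runs far fewer times.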

At any rate, all this is very complex and your syntactic condition order is very unlikely to make a difference. I wouldn't worry about it unless you have a real performance problem and your database vendor uses the condition order as a hint.

Wednesday, July 14, 2021
 
hnkk
answered 5 Months ago
28

Of course you can. Your query looks like SQL Server, where this will likely do what you want:

  Select top 3500 a.customer_no 
  From T_CUSTOMER a  WITH (NOLOCK) JOIN
       (Select a1.customer_no
        From VXS_CUST_TKW a1 WITH (NOLOCK)
        Where a1.tkw in (141)
       ) e
       ON e.customer_no = a.customer_no
  Where 1 = 1
  group by a.customer_no
  order by max(a.create_dt) desc;

The equivalent query in MySQL would look like:

  Select a.customer_no 
  From T_CUSTOMER a JOIN
       (Select a1.customer_no
        From VXS_CUST_TKW a1 
        Where a1.tkw in (141)
       ) e
       ON e.customer_no = a.customer_no
  Where 1 = 1
  order by a.create_dt desc
  limit 3500;

I removed the distinct because it may not be necessary. If it is, add it back in.

Tuesday, August 24, 2021
 
unutbu
answered 4 Months ago
39

Change the datatype of score from a string (e.g. varchar/text) to a number (e.g. int). That should solve the sorting issue.

When the values are sorted on string basis (alphabetically), the '6' in '60' comes before '8'.

As a temporary workaround, you can also try order by score+0 asc to coerce the value into a number before sorting.
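
The difference is easy to reproduce. The sketch below uses Python's sqlite3 module rather than MySQL, with a hypothetical `results` table whose score column is stored as text. Sorting on the column compares strings, while score + 0 forces a numeric comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# score is deliberately declared as TEXT to reproduce the problem.
conn.execute("CREATE TABLE results (score TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?)", [("8",), ("60",), ("100",), ("9",)]
)

# String comparison: '100' < '60' < '8' < '9'.
as_text = [
    r[0] for r in conn.execute("SELECT score FROM results ORDER BY score ASC")
]
# Adding 0 converts each value to a number before sorting.
as_number = [
    r[0]
    for r in conn.execute("SELECT score FROM results ORDER BY score + 0 ASC")
]
print(as_text)
print(as_number)
```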

Monday, August 30, 2021
 
Jeff Yates
answered 4 Months ago