
This is a pretty simple question, and I'm assuming the answer is "It doesn't matter", but I have to ask anyway...

I have a generic SQL statement built in PHP:

$sql = 'SELECT * FROM `users` WHERE `id` IN(' . implode(', ', $object_ids) . ')';

Assuming prior validity checks ($object_ids is an array with at least 1 item and all numeric values), should I do the following instead?

if(count($object_ids) == 1) {
    $sql = 'SELECT * FROM `users` WHERE `id` = ' . array_shift($object_ids);
} else {
    $sql = 'SELECT * FROM `users` WHERE `id` IN(' . implode(', ', $object_ids) . ')';
}

Or is the overhead of checking count($object_ids) not worth what would be saved in the actual sql statement (if any at all)?

 Answers

97

Neither of them really matters in the grand scheme of things. The network latency of communicating with the database will far outweigh either the count($object_ids) overhead or the = vs IN overhead. I would call this a case of premature optimization.

You should profile and load-test your application to learn where the real bottlenecks are.
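If you're curious anyway, a rough sketch like the one below (timing only the PHP string-building step, with made-up data) is enough to show that the count() check costs next to nothing. Real profiling should of course include the actual round trip to the database.

$object_ids = range(1, 50);   // made-up data for illustration
$iterations = 100000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $sql = 'SELECT * FROM `users` WHERE `id` IN(' . implode(', ', $object_ids) . ')';
}
$always_in = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    if (count($object_ids) == 1) {
        $sql = 'SELECT * FROM `users` WHERE `id` = ' . $object_ids[0];
    } else {
        $sql = 'SELECT * FROM `users` WHERE `id` IN(' . implode(', ', $object_ids) . ')';
    }
}
$with_check = microtime(true) - $start;

printf("always IN: %.4fs, with count() check: %.4fs\n", $always_in, $with_check);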

Wednesday, March 31, 2021
 
MassiveAttack
answered 7 Months ago
15

You are accessing 420 rows by primary key, which will probably lead to an index access path. This could touch roughly two index pages and one data page per key. If those pages are in cache, the query should run fast; if not, every page access that goes to disk incurs the usual disk latency. If we assume 5 ms disk latency and an 80% cache hit rate, we arrive at 420 * 3 * 0.2 * 5 ms ≈ 1.26 seconds, which is on the order of what you're seeing.
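For clarity, here is the same back-of-envelope estimate as a few lines of PHP; the figures are the assumptions from the paragraph above, not measurements from your system:

$keys            = 420;    // rows fetched by primary key
$pages_per_key   = 3;      // ~2 index pages + 1 data page each
$cache_miss_rate = 0.20;   // assuming 80% of pages are already cached
$disk_latency_s  = 0.005;  // assuming 5 ms per page read that hits disk

$estimate = $keys * $pages_per_key * $cache_miss_rate * $disk_latency_s;
echo $estimate . " seconds\n"; // 1.26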

Wednesday, March 31, 2021
 
commonpike
answered 7 Months ago
16

Why not use MySQL's datetime field and the vast array of functions that make date-related queries a snap?
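For example, here is a minimal sketch of the kind of query a DATETIME column makes trivial (the table and column names, `orders` and `created_at`, are hypothetical):

// last 7 days of orders
$last_week_sql = 'SELECT * FROM `orders` WHERE `created_at` >= NOW() - INTERVAL 7 DAY';

// totals per day
$per_day_sql = 'SELECT DATE(`created_at`) AS `day`, COUNT(*) AS `total`
                FROM `orders`
                GROUP BY DATE(`created_at`)';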

Wednesday, March 31, 2021
 
saad
answered 7 Months ago
80

Creating 20,000 tables is a bad idea. You'll need 40,000 tables before long, and then more.

I called this syndrome Metadata Tribbles in my book SQL Antipatterns. You see this happen every time you plan to create a "table per X" or a "column per X".

This does cause real performance problems when you have tens of thousands of tables. Each table requires MySQL to maintain internal data structures, file descriptors, a data dictionary, etc.

There are also practical operational consequences. Do you really want to create a system that requires you to create a new table every time a new user signs up?

Instead, I'd recommend you use MySQL Partitioning.

Here's an example of partitioning the table:

CREATE TABLE statistics (
  id INT AUTO_INCREMENT NOT NULL,
  user_id INT NOT NULL,
  PRIMARY KEY (id, user_id)
) PARTITION BY HASH(user_id) PARTITIONS 101;

This gives you the benefit of defining one logical table, while also dividing the table into many physical tables for faster access when you query for a specific value of the partition key.

For example, when you run a query like your example, MySQL accesses only the partition containing the specific user_id:

mysql> EXPLAIN PARTITIONS SELECT * FROM statistics WHERE user_id = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: statistics
   partitions: p1    <--- this shows it touches only one partition 
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 2
        Extra: Using where; Using index

The HASH method of partitioning means that the rows are placed in a partition by a modulus of the integer partition key. This does mean that many user_id's map to the same partition, but each partition would have only 1/Nth as many rows on average (where N is the number of partitions). And you define the table with a constant number of partitions, so you don't have to expand it every time you get a new user.
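Just to make the modulus mapping concrete, here is a tiny sketch (101 matches the CREATE TABLE example above):

$partitions = 101;

foreach (array(1, 42, 101, 102, 30000000) as $user_id) {
    // MySQL's HASH partitioning places the row in partition (user_id MOD 101)
    printf("user_id %d -> partition p%d\n", $user_id, $user_id % $partitions);
}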

You can choose any number of partitions up to 1024 (or 8192 in MySQL 5.6), but some people have reported performance problems when they go that high.

It is recommended to use a prime number of partitions. If your user_id values follow a pattern (for example, only even numbers), a prime number of partitions helps distribute the data more evenly.


Re your questions in comment:

How could I determine a reasonable number of partitions?

For HASH partitioning, if you use 101 partitions like I show in the example above, then any given partition has about 1% of your rows on average. You said your statistics table has 30 million rows, so if you use this partitioning, you would have only 300k rows per partition. That is much easier for MySQL to read through. You can (and should) use indexes as well -- each partition will have its own index, and it will be only 1% as large as the index on the whole unpartitioned table would be.

So the answer to how can you determine a reasonable number of partitions is: how big is your whole table, and how big do you want the partitions to be on average?
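In other words, the arithmetic is just this (the 30 million rows and the 300k target come from the discussion above):

$total_rows      = 30000000;  // current size of the statistics table
$target_per_part = 300000;    // how many rows you want per partition on average

$partitions = (int) ceil($total_rows / $target_per_part);  // 100

// then round to a nearby prime (e.g. 101) as suggested above, so patterned
// user_id values still spread evenly
echo $partitions . " -> use 101\n";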

Shouldn't the amount of partitions grow over time? If so: How can I automate that?

The number of partitions doesn't necessarily need to grow if you use HASH partitioning. Eventually you may have 30 billion rows in total, but I have found that when your data volume grows by orders of magnitude, it demands a new architecture anyway. If your data grows that large, you probably need sharding over multiple servers as well as partitioning into multiple tables.

That said, you can re-partition a table with ALTER TABLE:

ALTER TABLE statistics PARTITION BY HASH(user_id) PARTITIONS 401;

This has to restructure the table (like most ALTER TABLE changes), so expect it to take a while.

You may want to monitor the size of data and indexes in partitions:

SELECT table_schema, table_name, table_rows, data_length, index_length
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE partition_method IS NOT NULL;

Like with any table, you want the total size of active indexes to fit in your buffer pool, because if MySQL has to swap parts of indexes in and out of the buffer pool during SELECT queries, performance suffers.
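A rough way to sanity-check that is to compare the total index size across partitions against the buffer pool size. A sketch of one way to do it from PHP (the connection details are placeholders):

$pdo = new PDO('mysql:host=127.0.0.1;dbname=information_schema', 'user', 'pass');

$index_bytes = $pdo->query(
    'SELECT SUM(index_length) FROM PARTITIONS WHERE partition_method IS NOT NULL'
)->fetchColumn();

$pool_bytes = $pdo->query('SELECT @@innodb_buffer_pool_size')->fetchColumn();

printf("partition indexes: %.1f MB, buffer pool: %.1f MB\n",
    $index_bytes / 1048576, $pool_bytes / 1048576);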

If you use RANGE or LIST partitioning, then adding, dropping, merging, and splitting partitions is much more common. See http://dev.mysql.com/doc/refman/5.6/en/partitioning-management-range-list.html

I encourage you to read the manual section on partitioning, and also check out this nice presentation: Boost Performance With MySQL 5.1 Partitions.

Wednesday, June 16, 2021
 
coolguy
answered 5 Months ago
19

PHP is attempting to open a connection to localhost. Because your computer is connected to your network via IPv6, it tries the IPv6 version of 'localhost' first, which is the IP address ::1.

http://en.wikipedia.org/wiki/IPv6_address#Special_addresses

::1/128 — The loopback address is a unicast localhost address. If an application in a host sends packets to this address, the IPv6 stack will loop these packets back on the same virtual interface (corresponding to 127.0.0.0/8 in IPv4).

It looks like your MySQL server isn't listening on that address; instead it's only bound to an IPv4 address, so once PHP fails to open the IPv6 connection it falls back and tries localhost via IPv4, i.e. 127.0.0.1.

I personally prefer to use either IP addresses directly, or the Windows hosts file (or its Mac equivalent) to define 'fake' domain names that resolve to specific IP addresses, and then use those when connecting to MySQL. Either way I know exactly whether an IPv4 or IPv6 address will be used.
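For example, being explicit in the PHP connection removes the ambiguity entirely (the credentials and database name here are placeholders):

$db = mysqli_connect('127.0.0.1', 'user', 'pass', 'mydb');  // forces a TCP connection over IPv4
// $db = mysqli_connect('::1', 'user', 'pass', 'mydb');     // IPv6 loopback, only if MySQL is bound to it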

Both MySQL and Apache support IPv6 but you have to tell them to use an IPv6 address explicitly. For MySQL see: http://dev.mysql.com/doc/refman/5.5/en/ipv6-server-config.html

For Apache config see: http://httpd.apache.org/docs/2.2/bind.html

Apache supports multiple IP addresses, so you can use both at once if the machine's network card has both an IPv4 and an IPv6 address. MySQL only binds to one address.

Wednesday, June 30, 2021
 
Stefan
answered 4 Months ago