Asked  6 Months ago    Answers:  5   Viewed   31 times

I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?

 Answers

42

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) AS c FROM your_table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
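To try the idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The table name people, the helper find_duplicate_names, and the sample rows are assumptions made up for illustration; only the GROUP BY / HAVING pattern comes from the answer above.

```python
import sqlite3


def find_duplicate_names(names):
    """Return (name, count) pairs for every name that occurs more than once."""
    # In-memory SQLite database used as a stand-in for MySQL (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])
    # Group identical names and keep only the groups with more than one row.
    rows = conn.execute(
        "SELECT name, COUNT(*) AS c FROM people "
        "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(find_duplicate_names(["ann", "bob", "ann", "cy", "bob", "ann"]))
```

With the sample list, ann appears three times and bob twice, so those are the only two groups that survive the HAVING filter.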

Tuesday, June 1, 2021
 
treeface
answered 6 Months ago
94

You do not need to do that. You are using prepared statements, which escape the variables automatically.
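As a rough illustration of why no manual escaping is needed, here is a sketch with Python's built-in sqlite3 module, whose ? placeholders play the same role as prepared-statement parameters. The notes table and the insert_and_fetch helper are hypothetical; the original question's code is not shown here, so this is not a reconstruction of it.

```python
import sqlite3


def insert_and_fetch(value):
    """Store a string via a bound parameter and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (body TEXT)")
    # The value is passed separately from the SQL text, never spliced into it.
    conn.execute("INSERT INTO notes (body) VALUES (?)", (value,))
    rows = conn.execute(
        "SELECT body FROM notes WHERE body = ?", (value,)
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(insert_and_fetch("O'Brien'); DROP TABLE notes;--"))
```

The quote characters in the input are stored and matched as plain data, which is the behaviour the answer is relying on.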

Wednesday, March 31, 2021
 
laurent
answered 9 Months ago
30
SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

Simply group on both of the columns.
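A small sketch of the two-column version, again using Python's sqlite3 as a stand-in for MySQL. The users table and its name and email columns match the query above; the helper find_duplicate_users and the sample data are assumptions for illustration.

```python
import sqlite3


def find_duplicate_users(users):
    """Return (name, email, count) for each (name, email) pair seen more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", users)
    # A row counts as a duplicate only when BOTH columns match another row.
    rows = conn.execute(
        "SELECT name, email, COUNT(*) FROM users "
        "GROUP BY name, email HAVING COUNT(*) > 1 "
        "ORDER BY name, email"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [
        ("ann", "a@x.org"),
        ("ann", "a@x.org"),
        ("ann", "other@x.org"),
        ("bob", "b@x.org"),
    ]
    print(find_duplicate_users(sample))
```

Note that ("ann", "other@x.org") is not reported: the name repeats, but the pair does not.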

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

  • Recent PostgreSQL supports it.
  • SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
  • MySQL is unpredictable and you need sql_mode=only_full_group_by:
    • GROUP BY lname ORDER BY showing wrong results;
    • Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
  • Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Tuesday, June 1, 2021
 
JakeGR
answered 6 Months ago
81

When you use just "localhost" the MySQL client library tries to use a Unix domain socket for the connection instead of a TCP/IP connection. The error is telling you that the socket, called MySQL, cannot be used to make the connection, probably because it does not exist (error number 2).

From the MySQL Documentation:

On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file. This occurs even if a --port or -P option is given to specify a port number. To ensure that the client makes a TCP/IP connection to the local server, use --host or -h to specify a host name value of 127.0.0.1, or the IP address or name of the local server. You can also specify the connection protocol explicitly, even for localhost, by using the --protocol=TCP option.

There are a few ways to solve this problem.

  1. You can just use TCP/IP instead of the Unix socket. You would do this by using 127.0.0.1 instead of localhost when you connect. The Unix socket might be faster and safer to use, though.
  2. You can change the socket in php.ini: open the MySQL configuration file my.cnf to find where MySQL creates the socket, and set PHP's mysqli.default_socket to that path. On my system it's /var/run/mysqld/mysqld.sock.
  3. Configure the socket directly in the PHP script when opening the connection. For example:

    $db = new MySQLi('localhost', 'kamil', '***', '', 0, 
                                  '/var/run/mysqld/mysqld.sock');
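To make the localhost rule from the documentation quote easier to see, here is a toy Python model of it. The function choose_transport is purely hypothetical and is not part of any MySQL client library; it only encodes the behaviour described above for Unix systems.

```python
def choose_transport(host, protocol=None):
    """Toy model of how MySQL programs on Unix pick a transport (illustration only)."""
    # An explicit --protocol=TCP wins, even for localhost.
    if protocol is not None and protocol.upper() == "TCP":
        return "tcp"
    # The literal host name "localhost" means "use the Unix socket file".
    if host == "localhost":
        return "unix_socket"
    # Anything else, including 127.0.0.1, goes over TCP/IP.
    return "tcp"


if __name__ == "__main__":
    for host in ("localhost", "127.0.0.1"):
        print(host, "->", choose_transport(host))
```

This is why swapping localhost for 127.0.0.1 (option 1) sidesteps a missing socket file, while options 2 and 3 fix the socket path instead.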
    
Tuesday, June 1, 2021
 
Anand
answered 6 Months ago
56
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
    SELECT orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
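The query above joins the grouped counts back to the table, so you get one row per duplicate record (with its id) rather than one row per duplicated value. A minimal sketch with Python's sqlite3 standing in for MySQL; the helper list_duplicate_orgs, the ORDER BY clause, and the sample rows are additions for illustration.

```python
import sqlite3


def list_duplicate_orgs(orgs):
    """Return (orgName, dupeCount, id) for every row whose orgName is duplicated."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT)"
    )
    conn.executemany(
        "INSERT INTO organizations (id, orgName) VALUES (?, ?)", orgs
    )
    # The subquery finds duplicated names; the join recovers each matching row.
    rows = conn.execute(
        """
        SELECT o.orgName, oc.dupeCount, o.id
        FROM organizations o
        INNER JOIN (
            SELECT orgName, COUNT(*) AS dupeCount
            FROM organizations
            GROUP BY orgName
            HAVING COUNT(*) > 1
        ) oc ON o.orgName = oc.orgName
        ORDER BY o.orgName, o.id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [(1, "Acme"), (2, "Globex"), (3, "Acme"), (4, "Initech"), (5, "Acme")]
    for row in list_duplicate_orgs(sample):
        print(row)
```

Having the ids in the result is handy when the next step is to review or delete specific duplicate rows.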
Wednesday, June 2, 2021
 
Xatoo
answered 6 Months ago