Difference between revisions of "SQL recipes"

Revision as of 22:41, 3 January 2012

I use a database often, mostly for structuring and holding the data at EnergyTeachers.org. I usually use MySQL, but also sometimes sqlite. Here are some of my tricks.

Finding Duplicates

It's one thing to identify data that is duplicate, and you see hints how to do it by counting members in a group grouped by what field(s) should be duplicate and showing those groupings that have a count larger than one. But we often want to do something to the duplicate, like delete it. A better method in SQL is to create two tables from one and find pairings across them. Here's the code for showing all duplicate rows where two fields match exactly:

SELECT t1.*
FROM tablename as t1, tablename as t2
WHERE t1.uniqueid<>t2.uniqueid AND t1.field1=t2.field1 AND t1.field2=t2.field2
ORDER BY t1.field1, t1.field2

The first WHERE condition ensures we don't match rows to themselves. This only works on tables where there is a unique id.

We use ORDER BY to show the duplicates in adjacent rows in the result.

Difference between revisions of "SQL recipes"

Revision as of 22:41, 3 January 2012

Finding Duplicates

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools

Advertisements

@@ Line 1: / Line 1: @@
-Stands back from the kyeboard in amazement! Thanks!
+I use a database often, mostly for structuring and holding the data at EnergyTeachers.org. I usually use MySQL, but also sometimes sqlite. Here are some of my tricks.
+===Finding Duplicates===
+It's one thing to identify data that is duplicate, and you see hints how to do it by counting members in a group grouped by what field(s) should be duplicate and showing those groupings that have a count larger than one. But we often want to do something to the duplicate, like delete it. A better method in SQL is to create two tables from one and find pairings across them. Here's the code for showing all duplicate rows where two fields match exactly:
+ SELECT t1.*
+ FROM tablename as t1, tablename as t2
+ WHERE t1.uniqueid<>t2.uniqueid AND t1.field1=t2.field1 AND t1.field2=t2.field2
+ ORDER BY t1.field1, t1.field2
+The first WHERE condition ensures we don't match rows to themselves. This only works on tables where there is a unique id.
+We use ORDER BY to show the duplicates in adjacent rows in the result.
+[[Category:Computers]]