Rails Migration: When you can’t add a uniqueness constraint because you already have duplicates

LEFT OUTER JOIN `reports` ON `reports`.`id` = `people`.`report_id`
LEFT JOIN `addresses` ON `addresses`.`people_id` = `people`.`id`
LEFT JOIN `favorite_colors` ON `favorite_colors`.`people_id` = `people`.`id`
Borrowed from https://robots.thoughtbot.com/the-perils-of-uniqueness-validations

What’s the easy fix?

The “easy” fix is simple: remove the existing index on the column you want to enforce uniqueness on, then re-add it with a uniqueness constraint:

class AddUniquenessConstraintToFavoriteColors < ActiveRecord::Migration
  def change
    remove_index :favorite_colors, :person_id
    add_index :favorite_colors, :person_id, unique: true
  end
end

Up/down migration

I am going to be explicit about the up and down methods, so the migration can be rolled back:

class AddUniquenessConstraintToFavoriteColors < ActiveRecord::Migration
  def up
    remove_index :favorite_colors, :person_id
    # you can't MODIFY the index, just remove it, then re-add it with the changes
    add_index :favorite_colors, :person_id, unique: true, algorithm: :inplace
    # unique forces uniqueness (duh)
  end

  def down
    remove_index :favorite_colors, :person_id
    # removing the index that has the uniqueness constraint
    add_index :favorite_colors, :person_id
    # adding one without the constraint
  end
end
Running this migration against a table that already contains duplicate person_id values blows up:

== 20180914203948 AddUniquenessConstraintToFavoriteColors: migrating ========================
-- remove_index(:favorite_colors, :person_id)
-> 0.0388s
-- add_index(:favorite_colors, :person_id, {:unique=>true, :algorithm=>:inplace})
rake aborted!
StandardError: An error has occurred, all later migrations canceled:

Mysql2::Error: Duplicate entry '44' for key 'index_favorite_colors_on_person_id': CREATE UNIQUE INDEX `index_favorite_colors_on_person_id` ON `favorite_colors` (`person_id`) ALGORITHM = INPLACE

Clean up duplicates and add uniqueness constraint

The big question was: how do I clean out the existing duplicates so the unique index can actually be added?

The Solution

Writing a one-off rake task to clean up the duplicates was not a good solution. Fortunately, I work with many people who are much smarter than I am, and they put me on the right track. I needed to update the migration to run a query that would:

  1. Find all duplicate rows
  2. Get the IDs of those duplicate rows
  3. Trim one ID (the row we keep) from each group of duplicates
  4. Delete the rows with the remaining IDs
Here is the query we ended up with; the rest of this post builds it up piece by piece:

SELECT substring(dups.id_groups, position(',' IN dups.id_groups) + 1)
FROM (
  SELECT group_concat(fc.id) AS id_groups
  FROM favorite_colors fc
  WHERE EXISTS (
    SELECT 1
    FROM favorite_colors tmp
    WHERE tmp.person_id = fc.person_id
    LIMIT 1, 1
  )
  GROUP BY fc.person_id
) AS dups
To see how it works, start with a small example table and seed it with some duplicate favorite colors:

CREATE TABLE favorite_colors (
  ID INT AUTO_INCREMENT PRIMARY KEY,
  person_id INT,
  color VARCHAR(255)
);
INSERT INTO `favorite_colors` (`ID`, `person_id`, `color`)
VALUES
(1, 1, 'yellow'),
(2, 1, 'yellow'),
(3, 2, 'blue'),
(4, 2, 'blue'),
(5, 3, 'green'),
(6, 2, 'purple'),
(7, 3, 'yellow'),
(8, 4, 'black');
In this data, person_ids 1, 2, and 3 each have more than one favorite color, while person_id 4 has exactly one. The first step is a query that finds every row whose person_id shows up more than once:

SELECT *
FROM favorite_colors fc
WHERE EXISTS (
  SELECT 1
  FROM favorite_colors tmp
  WHERE tmp.person_id = fc.person_id
  LIMIT 1, 1
)
ORDER BY fc.person_id
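
With these eight rows, the result should be everything except id 8 (the order of rows within each person_id is not guaranteed):

ID  person_id  color
1   1          yellow
2   1          yellow
3   2          blue
4   2          blue
6   2          purple
5   3          green
7   3          yellow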

LIMIT offset, count

MySQL's LIMIT clause takes an optional offset followed by a row count, so LIMIT 1, 1 means "skip the first matching row, then return at most one more." That detail is what makes the subquery work: it only returns anything when there are at least two rows with the same person_id. A few examples against the whole table:

SELECT 1
FROM favorite_colors
LIMIT 1, 1

SELECT *
FROM favorite_colors
LIMIT 2, 4

SELECT *
FROM favorite_colors
LIMIT 2

SELECT *
FROM favorite_colors
LIMIT 2, 2
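
With the eight sample rows, and assuming they come back in primary-key order (MySQL guarantees no order without an ORDER BY), these should return roughly:

LIMIT 1, 1  ->  one row (the first row is skipped)
LIMIT 2, 4  ->  ids 3, 4, 5, 6
LIMIT 2     ->  ids 1, 2
LIMIT 2, 2  ->  ids 3, 4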

EXISTS

In the docs, we learn that EXISTS is true as soon as the subquery returns any rows at all, and false otherwise; the columns the subquery selects are ignored, which is why SELECT 1 is enough.
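
To see why the LIMIT 1, 1 matters, consider the same pattern without it (a quick sketch, not part of the final query): every row matches itself in the subquery, so EXISTS is true for all eight rows and nothing gets filtered.

SELECT *
FROM favorite_colors fc
WHERE EXISTS (
  SELECT 1
  FROM favorite_colors tmp
  WHERE tmp.person_id = fc.person_id
)

Skipping the first match with LIMIT 1, 1 means the subquery only returns a row when a second row with the same person_id exists.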

Get IDs from results

We don’t want to SELECT * anymore; we want SELECT group_concat(fc.id) AS id_groups, which collapses the IDs for each person_id into a single comma-separated string:

SELECT group_concat(fc.id) AS id_groups
FROM favorite_colors fc
WHERE EXISTS (
  SELECT 1
  FROM favorite_colors tmp
  WHERE tmp.person_id = fc.person_id
  LIMIT 1, 1
)
GROUP BY fc.person_id
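
Given the sample data, that should give one group per duplicated person_id (the IDs will typically come back in insertion order, though group_concat makes no ordering promise without an ORDER BY inside it):

id_groups
1,2
3,4,6
5,7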
Each group contains the IDs for one person_id, and we want to keep exactly one of them. Wrapping the query in substring(..., position(',' IN ...) + 1) returns everything after the first comma, dropping the first ID in each group and leaving only the IDs we want to delete:

SELECT substring(dups.id_groups, position(',' IN dups.id_groups) + 1)
FROM (
  SELECT group_concat(fc.id) AS id_groups
  FROM favorite_colors fc
  WHERE EXISTS (
    SELECT 1
    FROM favorite_colors tmp
    WHERE tmp.person_id = fc.person_id
    LIMIT 1, 1
  )
  GROUP BY fc.person_id
) AS dups
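
Run against the sample data, that should return one string of deletable IDs per group, keeping whichever ID came first in each group:

2
4,6
7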
Putting that query into the migration, we delete the IDs it returns and then swap the index:

class AddUniquenessConstraintToFavoriteColors < ActiveRecord::Migration
  disable_ddl_transaction!

  def up
    results = execute <<-SQL
      SELECT substring(dups.id_groups, position(',' IN dups.id_groups) + 1)
      FROM (
        SELECT group_concat(fc.id) AS id_groups
        FROM favorite_colors fc
        WHERE EXISTS (
          SELECT 1
          FROM favorite_colors tmp
          WHERE tmp.person_id = fc.person_id
          LIMIT 1, 1
        )
        GROUP BY fc.person_id
      ) AS dups
    SQL

    results.each do |row|
      # I know find_in_batches would normally be a better fit
      # each row is a one-element array holding a comma-separated string of IDs
      # (e.g. ["4,6"]), so split it before handing it to `where`
      ids = row.first.to_s.split(",")
      FavoriteColors.where(id: ids).delete_all
    end

    remove_index :favorite_colors, :person_id
    add_index :favorite_colors, :person_id, unique: true, algorithm: :inplace
  end

  def down
    remove_index :favorite_colors, :person_id
    add_index :favorite_colors, :person_id
  end
end

In Conclusion

You can use this pattern to add a uniqueness constraint to a table that already has duplicate values. It cleans out the duplicates while keeping one original row per person_id, and the unique index stops any new duplicates from being written to the table.
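
Once the unique index is in place, MySQL should reject a second row for an existing person_id with a duplicate-entry error along these lines (illustrative, using the sample table from above):

INSERT INTO favorite_colors (person_id, color)
VALUES (4, 'red');
-- ERROR 1062 (23000): Duplicate entry '4' for key 'index_favorite_colors_on_person_id'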

Update: Do not leave a column without an index at any point in the migration

I made a mistake in all this. In the up method above, there is a window between remove_index and add_index where favorite_colors has no index on person_id at all. If the add_index fails (as it did the first time, thanks to the duplicates), or simply takes a while on a large table, you are left with an unindexed column and every query that filters on person_id gets slower. The safer approach is to never drop the only index: rename the existing index, add the unique one, and only then remove the renamed original.

Here is what the tail of the up method looked like before:

  results.each do |row|
    # I know find_in_batches would normally be a better fit
    ids = row.first.to_s.split(",")
    FavoriteColors.where(id: ids).delete_all
  end

  remove_index :favorite_colors, :person_id
  add_index :favorite_colors, :person_id, unique: true, algorithm: :inplace
end
And here is the safer ordering, which keeps an index on person_id at every point:

  results.each do |row|
    # I know find_in_batches would normally be a better fit
    ids = row.first.to_s.split(",")
    FavoriteColors.where(id: ids).delete_all
  end

  # rename the existing index (it was created as index_favorite_colors_on_person_id)
  # so the column is never left without one
  rename_index :favorite_colors, :index_favorite_colors_on_person_id, :favorite_colors_non_unique

  # add the uniqueness index (if we hadn't renamed the existing index, we'd get an
  # error saying an index by this name already exists)
  add_index :favorite_colors, :person_id, unique: true, algorithm: :inplace

  # this line won't run unless the index above was added successfully; remove the
  # old index by the name we gave it
  remove_index :favorite_colors, name: :favorite_colors_non_unique
end
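
For reference, on a MySQL version that supports RENAME INDEX (5.7 and later), that sequence corresponds roughly to this DDL, with an index on person_id present the whole time:

ALTER TABLE favorite_colors RENAME INDEX index_favorite_colors_on_person_id TO favorite_colors_non_unique;
CREATE UNIQUE INDEX index_favorite_colors_on_person_id ON favorite_colors (person_id) ALGORITHM = INPLACE;
ALTER TABLE favorite_colors DROP INDEX favorite_colors_non_unique;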


Josh Thompson

Software, rock climbing, books. I love to learn. https://josh.works