Deleting duplicated rows (SOLVED)

tmrd · July 12, 2015, 1:04pm

Hi,

I have a junction table like below.

junction_id (primary key & auto increment)
person_id (foreign key from person table)
person_job_id (foreign key from job table)

The thing is, there are duplicate records in this table and I want to remove all but one of each. For instance, person1 can have three jobs, but these should be unique job_ids. Some of these people have been matched with the same job_id two or more times.

How can I delete the duplicated ones only?

EDIT : The query below worked.

DELETE n1
FROM personxperson_category n1, personxperson_category n2
WHERE n1.joint_id < n2.joint_id
  AND n1.person_id = n2.person_id
  AND n1.person_job_id = n2.person_job_id;
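
For anyone who wants to try the idea on throwaway data first, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name `junction` and the sample rows are assumptions for illustration, and because SQLite has no multi-table DELETE, the sketch swaps in a `NOT IN (SELECT MAX(...))` form. Like the query above, it keeps the row with the highest id in each duplicated (person_id, person_job_id) group.

```python
import sqlite3


def dedupe_keep_latest(conn):
    """Delete duplicate (person_id, person_job_id) rows, keeping the highest junction_id."""
    conn.execute(
        """
        DELETE FROM junction
        WHERE junction_id NOT IN (
            SELECT MAX(junction_id)
            FROM junction
            GROUP BY person_id, person_job_id
        )
        """
    )
    conn.commit()


def demo_dedupe():
    # Hypothetical in-memory table mirroring the columns described in the question.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE junction ("
        " junction_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " person_id INTEGER NOT NULL,"
        " person_job_id INTEGER NOT NULL)"
    )
    # Made-up sample data: (1, 10) appears twice, (2, 10) three times.
    rows = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10), (2, 10)]
    conn.executemany(
        "INSERT INTO junction (person_id, person_job_id) VALUES (?, ?)", rows
    )
    dedupe_keep_latest(conn)
    return conn.execute(
        "SELECT junction_id, person_id, person_job_id"
        " FROM junction ORDER BY junction_id"
    ).fetchall()


if __name__ == "__main__":
    print(demo_dedupe())
```

Running the same SELECT before and after the DELETE on a copy of the real table is a cheap way to confirm that only the duplicates disappear.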

RT_ · July 13, 2015, 1:14pm

Now that you have cleaned up, if you would like to prevent the problem from recurring, you could add a UNIQUE index on person_id and person_job_id.

ALTER TABLE junction ADD UNIQUE (person_id, person_job_id);
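
To see what the constraint buys you, here is a small sketch, again using Python's sqlite3 as a stand-in for MySQL. The index name `uq_person_job` and the sample values are assumptions. SQLite spells the statement `CREATE UNIQUE INDEX` rather than `ALTER TABLE ... ADD UNIQUE`, but the effect shown is the same: a second insert of an existing pair is rejected.

```python
import sqlite3


def demo_unique_index():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE junction ("
        " junction_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " person_id INTEGER NOT NULL,"
        " person_job_id INTEGER NOT NULL)"
    )
    # The index can only be created once the existing duplicates are gone.
    conn.execute(
        "CREATE UNIQUE INDEX uq_person_job"
        " ON junction (person_id, person_job_id)"
    )
    conn.execute("INSERT INTO junction (person_id, person_job_id) VALUES (1, 10)")
    try:
        # Same person/job pair again: the unique index should refuse it.
        conn.execute("INSERT INTO junction (person_id, person_job_id) VALUES (1, 10)")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    print("duplicate rejected:", demo_unique_index())
```

Application code that inserts into the table would then need to handle that error, or check for the pair first.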

r937 · July 13, 2015, 3:57pm

RT_ wrote: "… you could add a UNIQUE index on person_id and person_job_id."

this actually should be the primary key, and the auto_increment column should be dropped
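
A rough sketch of the composite-key layout, using Python's sqlite3 as a stand-in for MySQL. The table is created fresh here with made-up rows. On an existing MySQL table the change would presumably be an ALTER TABLE that drops the auto_increment column and adds PRIMARY KEY (person_id, person_job_id), which is not what this sketch runs.

```python
import sqlite3


def demo_composite_key():
    conn = sqlite3.connect(":memory:")
    # No surrogate id: the pair of foreign keys is itself the primary key.
    conn.execute(
        "CREATE TABLE junction ("
        " person_id INTEGER NOT NULL,"
        " person_job_id INTEGER NOT NULL,"
        " PRIMARY KEY (person_id, person_job_id))"
    )
    conn.execute("INSERT INTO junction VALUES (1, 10)")
    conn.execute("INSERT INTO junction VALUES (1, 11)")
    try:
        # Repeating an existing pair violates the primary key.
        conn.execute("INSERT INTO junction VALUES (1, 10)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    count = conn.execute("SELECT COUNT(*) FROM junction").fetchone()[0]
    return rejected, count


if __name__ == "__main__":
    print(demo_composite_key())
```

With this layout the duplicate problem cannot come back, and there is one column and one index fewer to maintain.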

RT_ · July 13, 2015, 5:51pm

Assuming the rest of the database can be modified to accommodate that change.

r937 · July 13, 2015, 6:31pm

sure, except i cannot think of any scenario where there would be a consequence

RT_ · July 13, 2015, 6:44pm

I was thinking along the lines of the possibility that a lot of various queries may be in the middleware. Depending on how the software is implemented, it may be too costly or risky (business-wise) to alter the queries and test the results. Using UNIQUE, while technically incorrect, could be a cheaper business option, because it doesn’t interfere with the legacy implementation as much.

r937 · July 13, 2015, 11:24pm

i completely understand your concern

yet i still cannot imagine a query which would be affected –

such a query would have to directly reference the auto_increment column –

why would you need to do that? how is the id value of any use when it’s the person/job foreign keys that hold 100% of the relationship information?

RT_ · July 14, 2015, 6:55am

I don’t think you would need to do that. And if the auto increment column isn’t used then getting rid of it would definitely be the best move.

But, it is very common for people to come up with unexpected ways of doing things. It could be possible that the id is referenced as a FK by a table that holds information about the relationship between the person and job for instance. Then there could be a series of queries that join that table to other results. Or there might be a query that (for some reason) orders by id. I’ve seen a lot of funny ways people have implemented things.
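
One way to check the foreign key scenario before dropping anything is to ask the database which tables reference the id column. Below is a minimal sketch with Python's sqlite3 as a stand-in. The `job_note` table is purely hypothetical, and the `PRAGMA foreign_key_list` call is SQLite-specific. In MySQL the comparable information lives in `information_schema.KEY_COLUMN_USAGE`. Note that this only finds declared foreign keys, not ad hoc queries in application code.

```python
import sqlite3


def tables_referencing(conn, target_table, target_column):
    """Return (table, column) pairs whose declared foreign key points at the target column."""
    hits = []
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    for name in tables:
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            if fk[2] == target_table and fk[4] == target_column:
                hits.append((name, fk[3]))
    return hits


def demo_fk_check():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE junction ("
        " junction_id INTEGER PRIMARY KEY,"
        " person_id INTEGER NOT NULL,"
        " person_job_id INTEGER NOT NULL)"
    )
    # Hypothetical table holding extra details about a person/job relationship.
    conn.execute(
        "CREATE TABLE job_note ("
        " note_id INTEGER PRIMARY KEY,"
        " junction_id INTEGER REFERENCES junction (junction_id),"
        " note TEXT)"
    )
    return tables_referencing(conn, "junction", "junction_id")


if __name__ == "__main__":
    print(demo_fk_check())
```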

Mittineague · July 14, 2015, 2:17pm

True enough. I guess to be safe, instead of deleting the column it could be renamed, and then the unit tests run:
Problems found - rename back
No problems - delete to improve efficiency
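
The rename-then-test idea could be sketched roughly as follows, with Python's sqlite3 as a stand-in for the real database. The temporary name `junction_id_unused` and the probe queries are assumptions standing in for a real test suite, and `RENAME COLUMN` needs SQLite 3.25 or newer.

```python
import sqlite3


def rename_and_probe(conn, probe_queries):
    """Rename the id column, run the probes, and rename back if any probe breaks."""
    conn.execute(
        "ALTER TABLE junction RENAME COLUMN junction_id TO junction_id_unused"
    )
    failures = []
    for sql in probe_queries:
        try:
            conn.execute(sql).fetchall()
        except sqlite3.OperationalError:
            # Typically "no such column": the query depended on the old name.
            failures.append(sql)
    if failures:
        # Problems found: restore the original column name.
        conn.execute(
            "ALTER TABLE junction RENAME COLUMN junction_id_unused TO junction_id"
        )
    return failures


def demo_rename():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE junction ("
        " junction_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " person_id INTEGER NOT NULL,"
        " person_job_id INTEGER NOT NULL)"
    )
    # Hypothetical stand-ins for queries found in the application.
    probes = [
        "SELECT person_id, person_job_id FROM junction",
        "SELECT person_id FROM junction ORDER BY junction_id",
    ]
    failures = rename_and_probe(conn, probes)
    columns = [row[1] for row in conn.execute("PRAGMA table_info(junction)")]
    return failures, columns


if __name__ == "__main__":
    print(demo_rename())
```

Here the second probe orders by the id, so it fails and the column is renamed back. If no probe failed, the column would stay renamed and could be dropped later.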
