Deleting duplicated rows (SOLVED)

Hi,

I have a junction table like below.

junction_id (primary key & auto increment)
person_id (foreign key from person table)
person_job_id (foreign key from job table)

The thing is, there are multiple records in this table and I want to remove them except one. For instance, person1 can have three jobs, but these should be unique job_id’s. Some of these people have been matched with the same job_id twice or multiple times.

How can I delete the duplicated ones only?

EDIT : The query below worked.

DELETE n1 FROM personxperson_category n1, personxperson_category n2 WHERE n1.joint_id < n2.joint_id AND n1.person_id = n2.person_id AND n1.person_job_id = n2.person_job_id

Now you have cleaned up, if you would like to stop getting the problem again, you could add a UNIQUE index on person_id and person_job_id.

ALTER TABLE junction ADD UNIQUE ('person_id' ,'person_job_id');
1 Like

[quote=“RT_, post:2, topic:195589, full:true”]… you could add a UNIQUE index on person_id and person_job_id.[/quote]this actually should be the primary key, and the auto_increment column should be dropped

1 Like

Assuming the rest of the database can be modified to accommodate that change.

sure, except i cannot think of any scenario where there would be a consequence

I was thinking along the lines of the possibility that a lot of various queries may be in the middle-ware. Depending on how the software is implemented, it may be too costly/risky (business wise) to alter the queries, and test the results. Using UNIQUE, while technically incorrect, could be a cheaper business option, because it doesn’t interfere with the legacy implementation as much.

i completely understand your concern

yet i still cannot imagine a query which would be affected –

such a query would have to directly reference the auto_increment column –

why would you need to do that? how is the id value of any use when it’s the person/job foreign keys that hold 100% of the relationship information?

I don’t think you would need to do that. And if the auto increment column isn’t used then getting rid of it would definitely be the best move.

But, it is very common for people to come up with unexpected ways of doing things. It could be possible that the id is referenced as a FK by a table that holds information about the relationship between the person and job for instance. Then there could be a series of queries that join that table to other results. Or there might be a query that (for some reason) orders by id. I’ve seen a lot of funny ways people have implemented things.

True enough. I guess to be safe instead of Deleting the column it could be renamed, and then unit tests run,
Problems found - rename back
No problems - delete to improve efficiency

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.