I need some help with writing an SQL query.
I have 2 tables with some duplicate data that I would like to associate with each other.
Table1 - ID-Group1
ID - primary key
Table2 - IDGroup2
ID - primary key
There are a large number of duplicate Firstname/lastName/add/city/state/zip in Table1 and Table2, but they have different ID's.
Is there a way to write a query to look for matches between the tables and then update Table1ID on the appropriate Table2 record?
I'm very new to all this but I think you need a JOIN query.
How many groups can a person be a member of?
Does any person have more than one (different) addresses to their name?
In post #4 you say that there are lots of misspelled/alternate spelling of name/address in both Table One and Table Two I suggest that you first identify records with spelling mistakes and correct them, then where there are alternate spellings, choose one.
It has been years of bad information management, but ultimately I have to sort through it.
There are lots of what-if scenarios here that will cause problems - misspelled/alternate spelling of name/address, duplicate records in Table1 and/or Table2, same person with two different addresses.
I am debating setting up a seperate table just to manage the duplicates where I could associate one ID with multiple other IDs, but I am not sure the best way to do that either.
Possibly one table for addresses, maybe using a lookup table between the address table and the table storing a person's name. Does any address have more than one person against it?
Is a person allowed to be a member of more than one group?
yes, there are husband/wife and/or multiple residents of the same address in our data.
Group1 and Group2 has thousands of duplicates. That is what I am trying to indentify & clean up
Can I suggest:
Group - GroupMembership - Member - MemberAdress - Address
Group: Records details about each group
Member: Records details about members
GroupMembership: Two foreign key fields, one with group id and one with member id
Address: Records a single address
MemberAddress: Two foreign key fields, one with member id and one with address id
phoenix, I think I follow what you are saying. I assume you are saying to merge the two tables into these five tables. and I understand the two foreign key fields.
But I dont know how to convert the data in order to programmatically ID the duplicates
That needs to be done, but that is about a month-long data entry job.
In the mean time, any help on how I could write some SQL to tag the ones I can programmatically would be great.
Is the combination of first name and last name what you will be using to identify duplicate user entities? The obvious problem with this approach is that two people with the same first and last name will be combined into a single person. Not ideal but is this something you're willing to except considering there isn't really any other way to differentiate separate users with the same first and last name? You could maybe lag and isolate users with the same first name and last that appear more then x times in the table. For example, if John Smith appears 10 times its most likely the case there are separate users with the same name?
every field would have to match - fistname, lastname, address, city, state, zip