Help with SQL

dkchapuis · February 10, 2010, 6:19pm

I need some help with writing an SQL query.

I have 2 tables with some duplicate data that I would like to associate with each other.
Table1 - ID-Group1
ID - primary key
FirstName
LastName
Address
City
State
Zip

Table2 - IDGroup2
ID - primary key
FirstName
LastName
Address
City
State
Zip
Table1ID

There are a large number of duplicate Firstname/lastName/add/city/state/zip in Table1 and Table2, but they have different ID’s.

Is there a way to write a query to look for matches between the tables and then update Table1ID on the appropriate Table2 record?

thanks

m300zx · February 10, 2010, 6:47pm

I’m very new to all this but I think you need a JOIN query.

SpacePhoenix · February 10, 2010, 7:35pm

How many groups can a person be a member of?

Does any person have more than one (different) addresses to their name?

SpacePhoenix · February 10, 2010, 10:32pm

In post #4 you say that there are lots of misspelled/alternate spelling of name/address in both Table One and Table Two I suggest that you first identify records with spelling mistakes and correct them, then where there are alternate spellings, choose one.

dkchapuis · February 10, 2010, 7:45pm

It has been years of bad information management, but ultimately I have to sort through it.

There are lots of what-if scenarios here that will cause problems - misspelled/alternate spelling of name/address, duplicate records in Table1 and/or Table2, same person with two different addresses.

I am debating setting up a seperate table just to manage the duplicates where I could associate one ID with multiple other IDs, but I am not sure the best way to do that either.

SpacePhoenix · February 10, 2010, 9:28pm

Possibly one table for addresses, maybe using a lookup table between the address table and the table storing a person’s name. Does any address have more than one person against it?

Is a person allowed to be a member of more than one group?

dkchapuis · February 10, 2010, 9:33pm

yes, there are husband/wife and/or multiple residents of the same address in our data.

Group1 and Group2 has thousands of duplicates. That is what I am trying to indentify & clean up

SpacePhoenix · February 10, 2010, 9:57pm

Can I suggest:

Group - GroupMembership - Member - MemberAdress - Address

Group: Records details about each group
Member: Records details about members
GroupMembership: Two foreign key fields, one with group id and one with member id
Address: Records a single address
MemberAddress: Two foreign key fields, one with member id and one with address id

dkchapuis · February 10, 2010, 10:05pm

phoenix, I think I follow what you are saying. I assume you are saying to merge the two tables into these five tables. and I understand the two foreign key fields.

But I dont know how to convert the data in order to programmatically ID the duplicates

dkchapuis · February 11, 2010, 12:09am

That needs to be done, but that is about a month-long data entry job.

In the mean time, any help on how I could write some SQL to tag the ones I can programmatically would be great.

oddz · February 11, 2010, 1:42am

Is the combination of first name and last name what you will be using to identify duplicate user entities? The obvious problem with this approach is that two people with the same first and last name will be combined into a single person. Not ideal but is this something you’re willing to except considering there isn’t really any other way to differentiate separate users with the same first and last name? You could maybe lag and isolate users with the same first name and last that appear more then x times in the table. For example, if John Smith appears 10 times its most likely the case there are separate users with the same name?

dkchapuis · February 11, 2010, 2:30am

every field would have to match - fistname, lastname, address, city, state, zip

I