mittineague — 2011-01-22T03:28:49-05:00 — #1
OK, I've been collecting data in a multi-dimensional array. The array is getting quite large now, and it might be nice to put the data online so others could use it too. Before it gets any more difficult to maintain I'm thinking it's about time to get this into a database.
I'll eventually be doing this with PHP and MySQL, but I don't think that's important at this point. I should be able to do OK once I figure out how to get started, and I'm sure I'll be able to import the data into tables, but ....
I can't figure out how to determine what table relationships to use.
I have 3 types of data. By far, the vast majority of data sets are a single Member ID with a single IP and a single URL. If this was the case for all of the data sets I would have little to no problem. Unfortunately there are a few "troublemakers".
and each has at least one IP - but may have more
and may have none - or several - URLs
IPs can be associated with one or more MemberIds
and none or several URLs
URLs can be associated with one or more MemberIds
and one or several IPs
At some point I may want to add a "comment" type.
I'm not looking for the finished answer, but I would like to be pointed to something that explains what logic to use in determining how relationships dictate table structure, or if someone's up to it, their own explanation. What I'm looking for is a kind of "logic kick-start".
guido2004 — 2011-01-22T04:45:54-05:00 — #2
Assuming you have more member data than just the id (member name for example), the first table needed would be 'members'
Then at first sight it would be logical to say that you'd need 'memberips' and 'memberurls' tables (1 to many relationship).
But, if you have more info about each IP and url (wouldn't know what, but who knows), then you might need the tables 'ips' and 'urls' to store that info, and then the 'memberips' and 'memberurls' would become the connection between the 'members' and the 'ips' and 'urls' tables (many to many relationship).
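A minimal sketch of the many-to-many layout described above. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The table names come from the post; the column names and the member names are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ips     (ip_id  INTEGER PRIMARY KEY, ip  TEXT NOT NULL UNIQUE);
CREATE TABLE urls    (url_id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
-- Connection tables: one row per (member, ip) or (member, url) pair.
CREATE TABLE memberips (
    member_id INTEGER NOT NULL REFERENCES members(member_id),
    ip_id     INTEGER NOT NULL REFERENCES ips(ip_id),
    PRIMARY KEY (member_id, ip_id)
);
CREATE TABLE memberurls (
    member_id INTEGER NOT NULL REFERENCES members(member_id),
    url_id    INTEGER NOT NULL REFERENCES urls(url_id),
    PRIMARY KEY (member_id, url_id)
);
""")

# Member names are placeholders, not real data.
conn.executemany("INSERT INTO members VALUES (?, ?)",
                 [(12345, "member_a"), (12348, "member_b"), (23467, "member_c")])
conn.executemany("INSERT INTO urls VALUES (?, ?)",
                 [(1, "sitepoint"), (2, "google")])
conn.executemany("INSERT INTO memberurls VALUES (?, ?)",
                 [(12345, 1), (23467, 1), (12348, 2)])

# Which members are connected to a given URL?
rows = conn.execute("""
    SELECT m.member_id, u.url
      FROM members m
      JOIN memberurls mu ON mu.member_id = m.member_id
      JOIN urls u        ON u.url_id = mu.url_id
     WHERE u.url = ?
     ORDER BY m.member_id
""", ("sitepoint",)).fetchall()
print(rows)
```

Each URL is stored once in 'urls', and the connection table carries only the pairings, so a URL shared by several members is not duplicated.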
The really complicating factor (IMO) is the relation between ips and urls. You could make an 'ipurl' table, but that would only work if the relations between ip and url are completely independent. And somehow I have the feeling that they aren't, that the relationship between ip and url depends also on their relationship with member. Right?
Maybe some more info about the data, its meaning and its purpose would be helpful?
And where would you want to add that comment type? And what do you mean by comment type?
mittineague — 2011-01-22T12:58:49-05:00 — #3
Simplified, the array looks like:
-- sitepoint (same as 1)
-- 126.96.36.199 (same as 1)
-- 188.8.131.52 (same as 3)
-- apache (same as 3)
-- google (same as 2)
So tables might look like:
MemberIds:
0 - 12345
1 - 12348
2 - 12356
3 - 12362
4 - 23467
5 - 23469
6 - 23567

IPs:
0 - 184.108.40.206
1 - 220.127.116.11
2 - 18.104.22.168
3 - 22.214.171.124
4 - 126.96.36.199
5 - 188.8.131.52
5 - 184.108.40.206
6 - 220.127.116.11
6 - 18.104.22.168

URLs:
0 - sitepoint
1 - google
2 - apache
2 - php
4 - sitepoint
5 - mysql
6 - apache
6 - google
But that doesn't feel right to me. And because MemberIds only ever occur once, I get the feeling I could be using them to advantage somehow.
guido2004 — 2011-01-22T14:15:28-05:00 — #4
First of all, you don't put duplicate values in the ip and url tables. That would defeat the purpose of those tables (take away data redundancy).
Second, I don't know how much you've simplified your data, but if there's no data related to the memberid (like membername, password, etc) other than the ip numbers and the urls, then two tables would be enough:
the 'ip' table with two columns:
the 'url' table with two columns
Or even just one table with three columns:
typeofdata ('ip' or 'url')
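A rough sketch of the single-table option, run through Python's sqlite3 module as a stand-in for MySQL. Only typeofdata is named in the post; the table name memberdata and the columns memberid and value are assumptions, and the IP shown is a documentation placeholder address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL
conn.execute("""
    CREATE TABLE memberdata (
        memberid   INTEGER NOT NULL,
        typeofdata TEXT    NOT NULL CHECK (typeofdata IN ('ip', 'url')),
        value      TEXT    NOT NULL,
        PRIMARY KEY (memberid, typeofdata, value)
    )
""")
conn.executemany(
    "INSERT INTO memberdata VALUES (?, ?, ?)",
    [
        (12345, "url", "sitepoint"),
        (23467, "url", "sitepoint"),
        (12356, "url", "apache"),
        (12356, "url", "php"),
        (12345, "ip", "192.0.2.10"),  # placeholder address, not real data
    ],
)

# Values that occur for more than one member.
shared = conn.execute("""
    SELECT typeofdata, value, COUNT(*) AS member_count
      FROM memberdata
     GROUP BY typeofdata, value
    HAVING COUNT(*) > 1
""").fetchall()
print(shared)
```

The GROUP BY ... HAVING query finds the shared values inside the database, so little application code is needed to spot them.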
mittineague — 2011-01-22T17:27:15-05:00 — #5
You mean substitute the MemberIds for the array keys in my previous post's example?
Might it be better to keep the IPs and URLs unique like:
22.214.171.124 - 12345 - 23469
126.96.36.199 - 12348
188.8.131.52 - 12356 - 23567
184.108.40.206 - 12362
220.127.116.11 - 23467
18.104.22.168 - 23469
22.214.171.124 - 23567
sitepoint - 12345 - 23467
google - 12348 - 23567
apache - 12356 - 23567
php - 12356
mysql - 23469
I'd like to be able to know the MemberId, but keep the identical values connected somehow in such a way as I can minimize the PHP processing to determine the connectedness.
guido2004 — 2011-01-22T17:45:01-05:00 — #6
I guess what you'd need to do is study a bit of database normalization. Google for it, or maybe someone already has a link to a good resource at hand (I don't).
If you want to put your data in a database, don't create repetitive columns (memberid1, memberid2). It limits the number of possible memberids linked to a url or ip. And it gets real difficult retrieving the data.
If you build the database correctly (normalized) the code needed to manage it will be minimal.
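To illustrate the retrieval problem with repetitive columns, here is a small comparison using Python's sqlite3 module as a stand-in for MySQL. The table names url_wide and url_members are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL
conn.executescript("""
-- Repetitive columns: capped at two members per URL.
CREATE TABLE url_wide (url TEXT PRIMARY KEY, memberid1 INTEGER, memberid2 INTEGER);
-- One row per (url, member) pair: no cap.
CREATE TABLE url_members (url TEXT NOT NULL, memberid INTEGER NOT NULL,
                          PRIMARY KEY (url, memberid));
INSERT INTO url_wide VALUES ('sitepoint', 12345, 23467), ('php', 12356, NULL);
INSERT INTO url_members VALUES ('sitepoint', 12345), ('sitepoint', 23467), ('php', 12356);
""")

# Wide layout: every memberidN column has to be named in the query.
wide = conn.execute(
    "SELECT url FROM url_wide WHERE memberid1 = ? OR memberid2 = ?",
    (23467, 23467),
).fetchall()

# Pair layout: one condition, however many members share a URL.
narrow = conn.execute(
    "SELECT url FROM url_members WHERE memberid = ?", (23467,)
).fetchall()
print(wide, narrow)
```

Both queries return the same answer here, but the wide version needs a new column and a longer WHERE clause each time a URL gains another member.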
Of course, everything I'm writing here is based on the scarce info you posted and a lot of assumptions, but you should be able to put all data in the three column table (like I said before):
typeofdata ('ip' or 'url')
where memberid would be the values 12345 etc.
"but keep the identical values connected somehow in such a way as I can minimize the PHP processing to determine the connectedness."
What are the 'connected values'? Aren't they the ip and/or url values connected to each memberid? Why do you want to group them by ip/url value? They aren't grouped like that in your array.
That's why I asked for more info about the data, its meaning and its purpose. It's not easy to give a meaningful answer based on so little info.
mittineague — 2011-01-22T18:34:42-05:00 — #7
OK, I'll take a look for "normalization" and see if I can find anything that can work its way through my thick skull.
mittineague — 2011-02-22T22:18:53-05:00 — #8
Well, I did some studying re Normalization. I then took some time away from it hoping it might help me get my head around it, but I'm afraid I'm still rather clueless.
I was reluctant to divulge my intentions, but what the hey.
I'm developing an anti-SPAM tool for Mods of a popular forum.
Even if I leave out IPs (low ROI for higher overhead) and Comments, I'm still at a loss determining how to not have rows with unused fields (up to around a dozen of them per row). eg.
Member_1 - URL_1 - URL_2 - URL_3
Member_2 - URL_1 - NULL - NULL
Member_3 - URL_1 - URL_2 - NULL
Member_4 - URL_1 - NULL - NULL
Member_5 - URL_1 - NULL - NULL
URL_1 - Member_1 - Member_2 - Member_3
URL_2 - Member_1 - NULL - NULL
URL_3 - Member_1 - Member_2 - NULL
URL_4 - Member_1 - NULL - NULL
URL_5 - Member_1 - NULL - NULL
and I'd rather not do 2 fields with a comma delimiter
Member_1 - URL_1, URL_2, URL_3
Member_2 - URL_1
Member_3 - URL_1, URL_2
Member_4 - URL_1
Member_5 - URL_1
URL_1 - Member_1, Member_2, Member_3
URL_2 - Member_1
URL_3 - Member_1, Member_2
URL_4 - Member_1
URL_5 - Member_1
Am I over-thinking this or am I as clueless as I feel?
r937 — 2011-02-22T22:42:09-05:00 — #9
your rows should have member and url only, i.e. one of each
if a member has two urls, there are two rows, and so on
that's properly normalized
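As a sketch of what that means for data already held in an array, the snippet below flattens a member-to-URLs mapping into one (member, url) row per pair. The dictionary shape is an assumption about how the collected data might look, using the placeholder names from the example above.

```python
# Hypothetical shape of the collected data: member -> list of URLs.
collected = {
    "Member_1": ["URL_1", "URL_2", "URL_3"],
    "Member_2": ["URL_1"],
    "Member_3": ["URL_1", "URL_2"],
}

# One row per (member, url) pair; a member with three URLs yields three rows.
rows = [(member, url) for member, urls in collected.items() for url in urls]

for member, url in rows:
    print(member, "-", url)
```

No row has unused fields, and a member with a dozen URLs simply contributes a dozen rows.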
mittineague — 2011-02-23T01:10:06-05:00 — #10
So something like
AI_0 - Member_1 - URL_1
AI_1 - Member_1 - URL_2
AI_2 - Member_1 - URL_3
AI_3 - Member_2 - URL_1
AI_4 - Member_3 - URL_1
AI_5 - Member_3 - URL_2
AI_6 - Member_4 - URL_1
AI_7 - Member_5 - URL_1
I guess what was hanging me up was thinking I could use the unique memberIds/URLs for keys instead of an auto_increment.
Thanks for the help
r937 — 2011-02-23T08:27:35-05:00 — #11
what are the AI thingies? auto_increments? you do not need them
mittineague — 2011-02-23T10:52:25-05:00 — #12
It might be because every example I've ever seen seems to have dratted auto_increment Id keys, but I thought (hmmm, now that I think of it, why?) that a row needed a key, and the key needed to be unique.
I've missed something basic since Wayback.
r937 — 2011-02-23T10:59:48-05:00 — #13
you're right, every table should have a primary key
CREATE TABLE member_urls
( member_id INTEGER NOT NULL
, url_id INTEGER NOT NULL
, PRIMARY KEY ( member_id , url_id )
);
or
CREATE TABLE member_urls
( member_id INTEGER NOT NULL
, url VARCHAR(123) NOT NULL
, PRIMARY KEY ( member_id , url )
);
(note: foreign keys omitted for brevity)
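A quick way to see the composite key at work, sketched with Python's sqlite3 module as a stand-in for MySQL. The table definition follows the variant that stores the URL text directly; the inserted values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL
conn.execute("""
    CREATE TABLE member_urls
    ( member_id INTEGER NOT NULL
    , url VARCHAR(123) NOT NULL
    , PRIMARY KEY ( member_id , url )
    )
""")

insert = "INSERT INTO member_urls VALUES (?, ?)"
conn.execute(insert, (12356, "apache"))
conn.execute(insert, (12356, "php"))      # same member, different URL: allowed
conn.execute(insert, (23567, "apache"))   # same URL, different member: allowed

# Repeating an existing (member_id, url) pair violates the primary key.
try:
    conn.execute(insert, (12356, "apache"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM member_urls").fetchone()[0]
print(duplicate_rejected, row_count)
```

Neither column is unique by itself, but the pair is, so the pair serves as the key and no auto_increment column is needed.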
mittineague — 2011-02-23T11:13:09-05:00 — #14
Ahh, the light comes on.
Using both columns together as the key would give unique key values.
I've never seen a CREATE like that, but that's my fault for using the PHP documentation as a tutor. They're great for the specific function they're dealing with but they often short-cut other parts of the code example to simplify it.
It is simple
Kind of like how I can miss seeing something in the store when it's on the shelf in front at eye level but will spot it when it's top/bottom or behind :eye: