OK, I’ve been collecting data in a multi-dimensional array. The array is getting quite large now, and it might be nice to put the data online so others could use it too. Before it gets any more difficult to maintain I’m thinking it’s about time to get this into a database.
I’ll eventually be doing this with PHP and MySQL, but I don’t think that’s important at this point, I should be able to do OK once I figure out how to get started, and I’m sure I’ll be able to import the data into tables, but …
I can’t figure out how to determine what table relationships to use.
I have 3 types of data. By far, the vast majority of data sets are a single Member ID with a single IP and a single URL. If this was the case for all of the data sets I would have little to no problem. Unfortunately there are a few “troublemakers”.
MemberIds
and each has at least one IP - but may have more
and may have none - or several - URLs
IPs can be associated with one or more MemberIds
and none or several URLs
URLs can be associated with one or more MemberIds
and one or several IPs
At some point I may want to add a “comment” type.
I’m not looking for the finished answer, but I would like to be pointed to something that explains what logic to use in determining how relationships dictate table structure, or if someone’s up to it, their own explanation. What I’m looking for is a kind of “logic kick-start”.
Assuming you have more member data than just the id (member name for example), the first table needed would be ‘members’
Then at first sight it would be logical to say that you’d need ‘memberips’ and ‘memberurls’ tables (1 to many relationship).
But, if you have more info about each IP and url (wouldn’t know what, but who knows), then you might need the tables ‘ips’ and ‘urls’ to store that info, and then the ‘memberips’ and ‘memberurls’ would become the connection between the ‘members’ and the ‘ips’ and ‘urls’ tables (many to many relationship).
The really complicating factor (IMO) is the relation between ips and urls. You could make a ‘ipurl’ table, but that would only work if the relations between ip and url are completely independent. And somehow I have the feeling that they aren’t, that the relationship between ip and url depends also on their relationship with member. Right?
Maybe some more info about the data, its meaning and its purpose would be helpful?
And where would you want to add that comment type? And what do you mean by comment type?
First of all, you don’t put duplicate values in the ip and url tables. That would defeat the purpose of those tables (take away data redundancy).
Second, I don’t know how much you’ve simplified your data, but if there’s no data related to the memberid (like membername, password, etc) but the ip numbers and the urls, then two tables would be enough:
the ‘ip’ table with two columns:
memberid
ip
the ‘url’ table with two columns
memberid
url
Or even just one table with three columns:
memberid
typeofdata (‘ip’ or ‘url’)
data
I’d like to be able to know the MemberId, but keep the identical values connected somehow in such a way as I can minimize the PHP processing to determine the connectedness.
I guess what you’d need to do is study a bit of database normalization. Google for it, or maybe someone already has a link to a valid resource by hand (I don’t).
If you want to put your data in a database, don’t create repetitive columns (memberid1, memberid2). It limits the number of possible memberids linked to a url or ip. And it gets real difficult retrieving the data.
If you build the database correctly (normalized) the code needed to manage it will be minimal.
Of course, everything I’m writing here is based on the scarce info you posted and a lot of assumptions, but you should be able to put all data in the three column table (like I said before):
memberid
typeofdata (‘ip’ or ‘url’)
data
where memberid would be the values 12345 etc.
but keep the identical values connected somehow in such a way as I can minimize the PHP processing to determine the connectedness.
What are the ‘connected values’? Aren’t they the ip and/or url values connected to each memberid? Why do you want to group them by ip/url value? They aren’t grouped like that in your array.
That’s why I asked for more info about the data, its meaning and its purpose. It’s not easy to give a meaningful answer based on so little info.
Well, I did some studying re Normalization. I then took some time away from it hoping it might help me get my head around it, but I’m afraid I’m still rather clueless.
I was reluctant to divulge my intentions, but what the hey.
I’m developing an anti-SPAM tool for Mods of a popular forum
Even if I leave out IPs (low ROI for higher overhead) and Comments, I’m still at a loss determining how to not have rows with unused fields (up to around a dozen of them per row). eg.
It might be because every example I’ve ever seen seems to have dratted auto_increment Id keys, but I thought (hmmm, now that I think of it, why?) that a row needed a key, and the key needed to be unique.
I’ve never seen a CREATE like that, but that’s my fault for using the PHP documentation as a tutor. They’re great for the specific function they’re dealing with but they often short-cut other parts of the code example to simplify it.
It is simple
Kind of like how I can miss seeing something in the store when it’s on the shelf in front at eye level but will spot it when it’s top/bottom or behind