Multi-dimensional array to Database schema

Mittineague · January 22, 2011, 8:28am

OK, I’ve been collecting data in a multi-dimensional array. The array is getting quite large now, and it might be nice to put the data online so others could use it too. Before it gets any more difficult to maintain I’m thinking it’s about time to get this into a database.

I’ll eventually be doing this with PHP and MySQL, but I don’t think that’s important at this point, I should be able to do OK once I figure out how to get started, and I’m sure I’ll be able to import the data into tables, but …
I can’t figure out how to determine what table relationships to use.

I have 3 types of data. By far, the vast majority of data sets are a single Member ID with a single IP and a single URL. If this was the case for all of the data sets I would have little to no problem. Unfortunately there are a few “troublemakers”.

MemberIds
and each has at least one IP - but may have more
and may have none - or several - URLs

IPs can be associated with one or more MemberIds
and none or several URLs

URLs can be associated with one or more MemberIds
and one or several IPs

At some point I may want to add a “comment” type.

I’m not looking for the finished answer, but I would like to be pointed to something that explains what logic to use in determining how relationships dictate table structure, or if someone’s up to it, their own explanation. What I’m looking for is a kind of “logic kick-start”.

Thanks.

DarthGuido · January 22, 2011, 9:45am

Assuming you have more member data than just the id (member name for example), the first table needed would be ‘members’

Then at first sight it would be logical to say that you’d need ‘memberips’ and ‘memberurls’ tables (1 to many relationship).

But, if you have more info about each IP and url (wouldn’t know what, but who knows), then you might need the tables ‘ips’ and ‘urls’ to store that info, and then the ‘memberips’ and ‘memberurls’ would become the connection between the ‘members’ and the ‘ips’ and ‘urls’ tables (many to many relationship).

The really complicating factor (IMO) is the relation between ips and urls. You could make a ‘ipurl’ table, but that would only work if the relations between ip and url are completely independent. And somehow I have the feeling that they aren’t, that the relationship between ip and url depends also on their relationship with member. Right?

Maybe some more info about the data, its meaning and its purpose would be helpful?

And where would you want to add that comment type? And what do you mean by comment type?

Mittineague · January 22, 2011, 5:58pm

Simplified, the array looks like:

[0]
12345
– 1.2.3.4
– sitepoint

[1]
12348
– 3.4.5.6
– google

[2]
12356
– 3.4.7.9
– apache
– php

[3]
12362
– 4.5.6.7

[4]
23467
– 5.6.7.8
– sitepoint (same as 1)

[5]
23469
– 5.6.8.9
– 1.2.3.4 (same as 1)
– mysql

[6]
23567
– 6.7.8.9
– 3.4.7.9 (same as 3)
– apache (same as 3)
– google (same as 2)

So tables might look like:

MemberId table
0 - 12345
1 - 12348
2 - 12356
3 - 12362
4 - 23467
5 - 23469
6 - 23567

IP table
0 - 1.2.3.4
1 - 3.4.5.6
2 - 3.4.7.9
3 - 4.5.6.7
4 - 5.6.7.8
5 - 5.6.8.9
5 - 1.2.3.4
6 - 6.7.8.9
6 - 3.4.7.9

URL table
0 - sitepoint
1 - google
2 - apache
2 - php
4 - sitepoint
5 - mysql
6 - apache
6 - google

But that doesn’t feel right to me. And because MemberIds only ever occur once, I get the feeling I could be using them to advantage somehow.

DarthGuido · January 22, 2011, 7:15pm

First of all, you don’t put duplicate values in the ip and url tables. That would defeat the purpose of those tables (take away data redundancy).

Second, I don’t know how much you’ve simplified your data, but if there’s no data related to the memberid (like membername, password, etc) but the ip numbers and the urls, then two tables would be enough:

the ‘ip’ table with two columns:
memberid
ip

the ‘url’ table with two columns
memberid
url

Or even just one table with three columns:
memberid
typeofdata (‘ip’ or ‘url’)
data

Mittineague · January 22, 2011, 10:27pm

You mean substitute the MemberIds for the array keys in my previous post’s example?

Might it be better to keep the IPs and URLs unique like:

IP table
1.2.3.4 - 12345 - 23469
3.4.5.6 - 12348
3.4.7.9 - 12356 - 23567
4.5.6.7 - 12362
5.6.7.8 - 23467
5.6.8.9 - 23469
6.7.8.9 - 23567

URL table
sitepoint - 12345 - 23467
google - 12348 - 23567
apache - 12356 - 23567
php - 12356
mysql - 23469

I’d like to be able to know the MemberId, but keep the identical values connected somehow in such a way as I can minimize the PHP processing to determine the connectedness.

DarthGuido · January 22, 2011, 10:45pm

I guess what you’d need to do is study a bit of database normalization. Google for it, or maybe someone already has a link to a valid resource by hand (I don’t).

If you want to put your data in a database, don’t create repetitive columns (memberid1, memberid2). It limits the number of possible memberids linked to a url or ip. And it gets real difficult retrieving the data.
If you build the database correctly (normalized) the code needed to manage it will be minimal.

Of course, everything I’m writing here is based on the scarce info you posted and a lot of assumptions, but you should be able to put all data in the three column table (like I said before):

memberid
typeofdata (‘ip’ or ‘url’)
data

where memberid would be the values 12345 etc.

but keep the identical values connected somehow in such a way as I can minimize the PHP processing to determine the connectedness.

What are the ‘connected values’? Aren’t they the ip and/or url values connected to each memberid? Why do you want to group them by ip/url value? They aren’t grouped like that in your array.

That’s why I asked for more info about the data, its meaning and its purpose. It’s not easy to give a meaningful answer based on so little info.

Mittineague · January 22, 2011, 11:34pm

OK, I’ll take a look for “normalization” and see if I can find anything that can work it’s way through my thick skull

Mittineague · February 23, 2011, 3:18am

Well, I did some studying re Normalization. I then took some time away from it hoping it might help me get my head around it, but I’m afraid I’m still rather clueless.

I was reluctant to divulge my intentions, but what the hey.

I’m developing an anti-SPAM tool for Mods of a popular forum

Even if I leave out IPs (low ROI for higher overhead) and Comments, I’m still at a loss determining how to not have rows with unused fields (up to around a dozen of them per row). eg.

Member_1 - URL_1 - URL_2 - URL_3
Member_2 - URL_1 - NULL - NULL
Member_3 - URL_1 - URL_2 - NULL
Member_4 - URL_1 - NULL - NULL
Member_5 - URL_1 - NULL - NULL

or

URL_1 - Member_1 - Member_2 - Member_3
URL_2 - Member_1 - NULL - NULL
URL_3 - Member_1 - Member_2 - NULL
URL_4 - Member_1 - NULL - NULL
URL_5 - Member_1 - NULL - NULL

and Id rather not do 2 fields with comma delimiter

Member_1 - URL_1, URL_2, URL_3
Member_2 - URL_1
Member_3 - URL_1, URL_2
Member_4 - URL_1
Member_5 - URL_1

or

URL_1 - Member_1, Member_2, Member_3
URL_2 - Member_1
URL_3 - Member_1, Member_2
URL_4 - Member_1
URL_5 - Member_1

Am I over-thinking this or am I as clueless as I feel?

r937 · February 23, 2011, 3:42am

your rows should have member and url only, i.e. one of each

if a member has two urls, there are two rows, and so on

that’s properly normalized

Mittineague · February 23, 2011, 6:10am

So something like

AI_0 - Member_1 - URL_1
AI_1 - Member_1 - URL_2
AI_2 - Member_1 - URL_3
AI_3 - Member_2 - URL_1
AI_4 - Member_3 - URL_1
AI_5 - Member_3 - URL_2
AI_6 - Member_4 - URL_1
AI_7 - Member_5 - URL_1

I guess what was hanging me up was thinking I could use the unique memberIds/URLs for keys instead of an auto_increment.

Thanks for the help

r937 · February 23, 2011, 1:27pm

what are the AI thingies? auto_increments? you do not need them

Mittineague · February 23, 2011, 3:52pm

It might be because every example I’ve ever seen seems to have dratted auto_increment Id keys, but I thought (hmmm, now that I think of it, why?) that a row needed a key, and the key needed to be unique.

I’ve missed something basic since Wayback.

r937 · February 23, 2011, 3:59pm

you’re right, every table should have a primary key

CREATE TABLE member_urls
( member_id INTEGER NOT NULL 
, url_id    INTEGER NOT NULL 
, PRIMARY KEY ( member_id , url_id )
);

or alternatively

CREATE TABLE member_urls
( member_id INTEGER NOT NULL 
, url   VARCHAR(123) NOT NULL 
, PRIMARY KEY ( member_id , url )
);

simple, innit

(note: foreign keys omitted for brevity)

Mittineague · February 23, 2011, 4:13pm

Ahh, the light comes on.

Using both as the key would be unique key values.

I’ve never seen a CREATE like that, but that’s my fault for using the PHP documentation as a tutor. They’re great for the specific function they’re dealing with but they often short-cut other parts of the code example to simplify it.

It is simple

Kind of like how I can miss seeing something in the store when it’s on the shelf in front at eye level but will spot it when it’s top/bottom or behind