Problem with 'array_intersect' when some parameters may not exist

I’m developing a custom database search (MySQL). The results come down to a group of several arrays (in this case sub-arrays of $freelistings, see below), and I need to find out what values are common to ALL of these arrays. I’ve got ‘array_intersect’ to do that, as shown below (though I’m not entirely sure whether it will take unlimited parameters):

	$finalresult = array_intersect($freelistings['dbl'], $freelistings['twn'], $freelistings['sgl'], $freelistings['fam']);

It works a treat when all the sub-arrays are present. The final result is an array containing ONLY those values that are common to all the sub-arrays.
My problem is that I can’t be sure that all of these sub-arrays will actually exist, and I can’t insert dummies as that would invalidate the results. I do know that at least one must exist, and it’s quite easy to deal with that situation:

		if (count($freelistings) == 1) {
			$finalresult = $freelistings;
		} else {
		        Do something else	
		}

The ‘Do something else’ would be to assemble the parameters for ‘array_intersect’ using only those sub-arrays that actually exist. So far I’ve not come up with a solution. I’ve tried assembling those sub-arrays that do exist into a string that could be used as the argument for ‘array_intersect’, but I get a (fatal) whitespace error on the third line of the snippet shown, presumably because PHP doesn’t like a $ inside a quote.

			$textlist = "";
			if (isset($freelistings['dbl'])) {
				$textlist .= "$freelistings['dbl'],";
			}
			if (isset($freelistings['twn'])) {
				$textlist .= "$freelistings['twn'],";
			}
			if (isset($freelistings['sgl'])) {
				$textlist .= "$freelistings['sgl'],";
			}
			if (isset($freelistings['fam'])) {
				$textlist .= "$freelistings['fam'],";
			}
			$textlist = substr($textlist, 0, -1); // cuts off final comma
			echo $textlist . '<br />';
			$finalresult = array_intersect($textlist);

None of the sub-arrays are huge. In my tests they only have about four values each, but this could go up to around twenty or more in production. Of these values there may be several that are found in all sub-arrays. There may eventually be more than four possible sub-arrays, but it seems unlikely there would be more than four or five in any one instance.
Can anyone help, please ?

So I’m guilty of not reading the whole thing, but I’m just curious why you are not handling this within the SQL statement rather than relying on PHP’s array_intersect() to do the comparisons? 9/10 anything that can be accomplished within SQL should be done there for performance and ease of use.

Thank you for your reply. The best answer I can give to your question is that I’ve not worked out how to do that. I’m already making four searches of an accommodation database for (up to) four types of rooms. Now I’m trying to find out which destination is included in all four results. The complication arises because the (human) searcher may not always look for all type of room.

Well if you want, break down your tables, relationships and the different searches / results you want and we’ll see where we can help.

If using mysql, use show create table your_tbl_name

Thank you. I can do that, but I think it merits a new thread on the MySQL forum or we’ll have the Admins after us. It’s late here, I’ll try to have it ready for you by tomorrow morning (your time).

Nah, keep the thread, it’ll be moved if need be. There’s still plenty of PHP involved with this.

OK. Do you actually want to see ‘Show create table’ ? Essentially it’s a calendar (columns are mostly days, rows are rooms). For development purposes I’ve actually only created January because any more slows down PHPMyAdmin too much.

Here’s the ‘Create Table’ stuff. I’ve shortened it by removing most of the very similar lines creating the date columns. For simplicity, and to avoid slowing PHPMyAdmin too much, I’ve only created ‘January’, but it’s enough to test out the Search system.

CREATE TABLE `avail_2013` (  `room_id` decimal(7,2) NOT NULL,  `member_id` int(4) NOT NULL,  `type` varchar(6) COLLATE utf8_bin NOT NULL,  `2013-01-01` varchar(6) COLLATE utf8_bin DEFAULT 'free',  `2013-01-02` varchar(6) COLLATE utf8_bin DEFAULT 'free',...[many similar lines inserting dates 3-29 January]...  `2013-01-30` varchar(6) COLLATE utf8_bin DEFAULT 'free',  `2013-01-31` varchar(6) COLLATE utf8_bin DEFAULT 'free',  UNIQUE KEY `room_id` (`room_id`)) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin

‘type’ (the third column) means type of room, and options are currently ‘double’, ‘twin’, ‘single’, ‘family’, abbreviated to ‘dbl’, ‘twn’, ‘sgl’, ‘fam’ respectively.

Every room has it’s own record (row) on the database. It’s status (shown in the appropriate date column) may be either ‘free’ or ‘booked’, a further condition of ‘provisional’ (‘prov’) will be introduced when I get further into this.

There’s a form where the user selects the type and number of rooms required. On submission the information from this form is used to create four search queries, of which the following are typical examples:

Double rooms:

SELECT room_id, member_id FROM avail_2013 WHERE type = 'dbl' AND `2013-01-01` = 'free' AND `2013-01-02` = 'free'

After PHP processing, the result is an array showing (as keys) the ID of those properties that have rooms available for the given dates, and the number of rooms (as values)
Result: Array ( [2043] => 3 [2008] => 1 [2066] => 2 )

Twin Rooms:

SELECT room_id, member_id FROM avail_2013 WHERE type = 'twn' AND `2013-01-01` = 'free' AND `2013-01-02` = 'free'

Result: Array ( [2043] => 2 [2008] => 1 [2066] => 3 )

Single Rooms:

SELECT room_id, member_id FROM avail_2013 WHERE type = 'sgl' AND `2013-01-01` = 'free' AND `2013-01-02` = 'free'

Result: Array ( [2043] => 1 [2066] => 1 )

Family Rooms:

SELECT room_id, member_id FROM avail_2013 WHERE type = 'fam' AND `2013-01-01` = 'free' AND `2013-01-02` = 'free'

Result: Array ( [2073] => 1 )

All the above arrays are stored as sub-arrays. $freelisting[‘dbl’], $freelisting[‘twn’], $freelisting[‘sgl’], $freelisting[‘faml’]

If the user selects only (say) double and twin rooms, the Single and Family room queries won’t run (or if they do, will give blank results). This leads to trouble later.

Now there’s further PHP processing to compare the number(s) of rooms available to the numbers required, eliminating those properties that don’t have sufficient rooms. All that works fine, though I’m sure there are other ways to do it.

Finally I want to know which properties have ALL the required rooms available which means finding out which IDs are included in ALL the arrays. This is where ‘array_intersect’ came in.

$finalresult = array_intersect($freelistings['dbl'], $freelistings['twn'],$freelistings['sgl'],$freelistings['fam']);

But it fails if any of the sub-arrays aren’t present. I can’t create dummy sub-arrays, because they wouldn’t contain the right property IDs, so the search would fail (or be wrong). I need a way of searching those sub-arrays that are present, and finding those property IDs that are common to them all. Perhaps a ‘foreach’ loop, but I can see it getting complex.

Can you fix the top portion to use the basic code or mysql instead of php highlighting? its all squishied.

So I can tell right from the get go that you need a slight redesign to your tables. Not suprising, I don’t know EXACTLY what your app is supposed to do from that table design. You never want to create a table with columns with things such as dates. From the very top view, what does this app / table / search’s goal? Show free promos for rooms during a certain month?

From what I’ve seen so far you should have the following tables:

tbl_members, tbl_rooms, tbl_room_types, tbl_mystery_table (this will contain ROWS of your dates, once I find out what they are for)

Once we get this figured out we can talk about, most likely, an over complicated task turned EXTREMELY easy query with only a few lines of PHP :slight_smile:

Well, it’s not pretty, but here’s the entire ‘Create Table’ bit, as text:

CREATE TABLE avail_2013 ( room_id decimal(7,2) NOT NULL, member_id int(4) NOT NULL, type varchar(6) COLLATE utf8_bin NOT NULL, 2013-01-01 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-02 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-03 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-04 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-05 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-06 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-07 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-08 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-09 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-10 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-11 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-12 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-13 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-14 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-15 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-16 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-17 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-18 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-19 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-20 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-21 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-22 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-23 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-24 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-25 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-26 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-27 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-28 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-29 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-30 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, 2013-01-31 varchar(6) COLLATE utf8_bin DEFAULT ‘free’, UNIQUE KEY room_id (room_id)) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin

I’ve only created January for development purposes. It’s a matrix of rooms and dates plus little bit of extra information about each room. It did seem awkward having dates in columns, but even worse having the rooms there.
I did wonder about a design with more but simpler tables, but thought it would probably complicate things and perhaps be beyond my current skills (although I’m keen to learn).
I’ll make a further posting explaining the project from the beginning.

Now here’s the detail on the project.

I’m web master for an accommodation listing site for the Isle of Mull. We list subscriber’s (‘members’) properties, and the site visitors can view the listings and click through to members web sites. A typical listing page is at http://www.holidaymull.co.uk/accommodation/SelfCatering/Central. Few members have their own on-line booking systems, and there’s no way for site visitors to make a global search for available accommodation - they just have to contact members individually and ask. (‘Global’ here means site-wide, not world-wide)

I’m looking at the feasibility of creating a global availability listing, leading on to a booking system. I’m aware that the biggest problem here will be keeping it up to date (when, say, a member takes a telephone booking outwith the system), but I’ve undertaken to see what can be done.

My objective is that the visitor will be able to make a search and get a listing (looking like the one linked above), showing only those properties offering the requested availability on the target dates.

In the listings database we have tables showing quite a lot of information about the properties, including the number of rooms (though not whether they are double, twin, single or family). We also know the location (both generally as NW, SE etc. and as lat & long in separate table), and quality grading of the properties. We’ll need to be able to search on those criteria as well (probably on general location rather than lat & long).

The properties divide into two types: Serviced accommodation, like hotels and B&Bs, which offer ‘rooms’, and self-catering accommodation that offers ‘units’ (which might be an apartment, a cottage or even a large house, each sleeping x number of people). Most members have 1-5 rooms or units, a few have more. The largest hotel has 82 rooms (the next has only 26).

There are about 125 members (with potential to double that) and perhaps 500-600 rooms or units currently available (I will get a more accurate figure).

I think that summarises the position accurately without overmuch detail, but I can explain more as we progress. Essentially I need to be able to record the status of every room/unit on every day of the year. Another way to approach that might be to know how many rooms (of each type) a property has, and then how many of those are ‘free’, ‘booked’, or ‘provisional’ on each day.

Ok, perfect!

So, this is why I was a little skeptical, with your current set up, you’d have to create columns by hand as time progresses, and things have a very horizontal display, not very clean right?

So I’ve not looked for a good source on this before, perhaps someone else can chime in and throw a better one in, but have a look here for a bit on relationships: http://www.deeptraining.com/litwin/dbdesign/fundamentalsofrelationaldatabasedesign.aspx

So the design is going to completely depend on what you currently want to track, as well as what you hope to expand to later one. What’s currently feasible for you to track and maintain?

About how it should currently look from what I’ve heard:


tbl_users
------
user_id auto increment
user_type (need this to determine if a user has access to edit / add property, or just rent rooms)
... all other user info, passwd, etc

tbl_properties
------
property_id auto increment
user_id  (owner of the prop)
property_type
property_address
...etc

tbl_units
-------
unit_id
property_id
unit_type
max_occupancy
beds
....etc

Obviously you’ll need more columns than this, but this is what you need to understand the relationships. 1 user has many properties, 1 property has many units. For the example of your hotel, if the user owns only that property, he would have 1 record in tbl_properties, prop type of hotel possibly? and 82 records on tbl_units, and unit_type could determine the size of the room. This setup might be a little much to start off with, but it allows separate types of units on the same property to be added.

Now from here we are obviously missing the booking data. Are we going to assume that all units are available to rent (assuming its not taken for the day) on any given day? Holidays? Either way we’d most likely add this in:


tbl_bookings
------
unit_id
user_id
book_date (here's your date *columns*)

Now, its only a matter of using JOINS in SQL, because we created proper relationships in order to deliver great search results to the user. This is most likely not going to be your final path, but it should get you in the right mindset. :slight_smile:

Hello Kyle,

Thanks for that.

So, this is why I was a little skeptical, with your current set up, you’d have to create columns by hand as time progresses, and things have a very horizontal display, not very clean right?

Yes, horribly horizontal. I’d have to make a new table every year, and probably allow an overlap between tables to avoid problems running queries over the year-end.

I already have a table of ‘members’, and a separate one for ‘passwords’, which allows them to log in to edit their details, and we could extend that to include their own bookings. This would be your ‘user’ table, I assume.
Visitors to the site (who may be making searches) won’t be required to log in at that stage. Things might become different when they book. Even so there’s no way they would be able to ‘edit’ entries. I might well need a ‘guests’ table for those who’ve booked.

There’s already an extensive ‘properties’ table with the sort of data you indicate (it might need some new fields).

I’d need a new ‘Units’ table with room/unit information (not unlike my current ‘avail_2013’ table but without the date columns)
And finally the ‘Bookings’ table which would tie a specific room/unit to a specific date or dates (am I right ?). This could become quite large, since we’ve got 500 rooms/units x 365 days. Most self-catering is booked by the week, and serviced accommodation (hotels, B&B) may be booked a few days at a time, reducing the rate at which the table would grow. Even so it could become 50,000 rows (if I’ve understood correctly).

So then there isn’t a table that looks like a calendar at all ?

The large hotel is part of a Scottish chain, and it’s probably NOT going to want to take part in this scheme, as it’ll have it’s own.

Now from here we are obviously missing the booking data. Are we going to assume that all units are available to rent (assuming its not taken for the day) on any given day? Holidays?

Yes, the starting position is that all rooms/units are available to rent on all days. It would be for member’s to edit their own availability to reflect their holidays. The tourist season up here is mostly March-October, and some places do close during the winter.

I’ll digest it some more and get back to you if necessary.

Correct, its not supposed to LOOK like a calendar, if you want to present a calendar to your user, you can do that, but this would be the correct way to house the data in the table.

50k records is NOT a big deal. Setting up indexes will insure that everything runs quickly. Composite primary keys should be used to prevent duplicate entries. If a record does not exist in tbl_bookings for a specific unit number on a specific date, then it is available.

Hello Kyle,

Thank you, that’s been a huge help. I’m sure there will be more queries as I take this forward.

For example I’ve never used ‘Composite Primary keys’. I take it that means indexing on a combination of two or more columns ?

I’ll convert my present method to your suggested one, and see how I go.

BTW: I did resolve the problem I started out with, it used 8 lines of code with two nested foreach loops. But I shouldn’t need it now.

Correct, this literally combines the two columns before checking to make sure they are unique.

PRIMARY KEY (col1, col2)
col1 col2


foo bar
foo bar – would not be inserted
foo doo — would be inserted

So in the ‘bookings’ table I’d index on ‘room_id’ and ‘book_date’ so that any particular combination couldn’t be repeated ? Excellent.

However, it occurs to me as I create and populate the tables (with dummy data) that this will only work if each night booked gets its own record. Using a start date and number of nights booked would not create an unique entry covering the second and subsequent nights.

I can find the booked rooms, but not the free ones. I’ve posted a thread on the Sitepoint MySQL forum.