Create dynamic include files or query database each time?

I’m working on a project at the moment where a couple of pieces of information from the database will be used in multiple locations throughout the site in a drop-down selector. The list of information will not change very often.

My two lines of thinking on a solution are:

  1. Query the database each time to retrieve the list into an array (it will be a single piece of data with its associated ID). Then compile that with a while (or for) loop into a series of <option> statements (roughly sketched below this list).

  2. Hook into the form the database is edited from: when the data is edited, THEN run the query, pass it through the while loop to construct the <option> statements, and save the result into a text file to use as an include wherever it’s needed in the other forms.
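For reference, option 1 is just the familiar query-and-loop, something along these lines (table and column names are placeholders, not my actual schema):

```php
<?php
// Option 1: query on every page load and build the <option> list live.
// Table/column names below are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$stmt = $pdo->query('SELECT id, title FROM items WHERE active = 1 ORDER BY title');

echo '<select name="item">';
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    printf('<option value="%s">%s</option>',
        htmlspecialchars($row['id']),
        htmlspecialchars($row['title']));
}
echo '</select>';
```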

It’s basically a list consisting of active and inactive records. The only two operations are adding a new record or marking an old one inactive; one list would then consist of ONLY active records and another of ALL records. I have full control over the record edit form, so I could easily add some code to create the text file.

So, which is considered the “better” approach? Option 1 would end up making numerous calls daily, retrieving the same list each time, while option 2 would only run the query a few times a week, when the active list is modified.

Greg

For something like that, querying the DB won’t really impose any performance limitations, and you get the added benefit of always having the most up-to-date info. If you cache it in a file on the HDD, you may have to worry about whether or not the info is “fresh”.

I don’t really see any performance gains either way. You may as well pull the data live from the DB.

My 2 cents.

That’s basically what I was curious about: is there a performance difference? The data can only become “stale” when one particular form is used; nothing else changes the data outside of that single form. So I was thinking that by re-creating the static text file whenever those changes are made, database queries could be reduced. Though it is a fairly basic query, not like it’s compiling data from several tables or anything.

OK, I’ll just do it live each time one of those forms is requested.

Thanks for the input.

Greg

A simple SELECT query only takes a few thousandths of a second to execute. How many queries per day are you expecting? 100,000? If you are only talking a few dozen, or even hundreds, I wouldn’t worry about any performance issues with the query approach. Doing it with queries is going to be much easier than having to cache the data in a file.

I’d cache it and include it every time, as long as the admin screen responsible for updating the text file is guaranteed to update the cached file.

To satisfy yourself that this is robust and dependable, you could develop a few simple classes to do the work, which can then be unit tested.
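As a minimal sketch of what I mean (class and method names are purely illustrative, assuming the cache is a plain file of pre-built markup):

```php
<?php
// A minimal, unit-testable file cache (illustrative names only).
// The admin screen calls rebuild() after every edit; all other pages
// call fetch() and fall back to a live query only if the file is missing.
class OptionCache
{
    private $file;

    public function __construct($file)
    {
        $this->file = $file;
    }

    // Write the pre-built <option> markup atomically so readers
    // never see a half-written file.
    public function rebuild($html)
    {
        $tmp = $this->file . '.tmp';
        file_put_contents($tmp, $html);
        rename($tmp, $this->file); // atomic on the same filesystem
    }

    // Return the cached markup, or false if no cache exists yet.
    public function fetch()
    {
        return is_readable($this->file) ? file_get_contents($this->file) : false;
    }
}
```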

The cached file will end up being held in memory anyway if you use an in-memory cache such as APC (and MySQL has caches of its own which you could take into consideration).
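If you do go the APC route, its user cache is only a couple of calls; the key name and TTL below are arbitrary, and build_option_list_from_db() is a hypothetical stand-in for your existing query/loop:

```php
<?php
// APC user-cache variant: keep the option markup in shared memory.
$html = apc_fetch('option_list', $success);
if (!$success) {
    $html = build_option_list_from_db(); // hypothetical: your query/loop
    apc_store('option_list', $html, 3600); // cache for one hour
}
echo $html;
```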

This might only be the simplest and most basic building block of an overall caching strategy, but to my mind it is a good one to have, and I use it as often as I can.

All the questions about caching pivot on the “well, it depends…” dilemma – which I accept – but starting from this basic file caching I went on to do a lot of nightly caches and builds of complex data structures: menus, A-to-Z indexes, maps containing multiple datasets, and so on. Now, this “cron-driven caching” won’t suit everyone – it ties the site code tightly to the OS – but golly, it can speed up your site (don’t forget “it depends…”).
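The cron side of it is nothing exotic; a nightly rebuild script might be as simple as this (script name, paths and schedule are just examples, and build_option_list_from_db() again stands in for your own query/loop):

```php
<?php
// rebuild_caches.php - run nightly from cron, e.g.:
//   0 2 * * * /usr/bin/php /var/www/cron/rebuild_caches.php
// Regenerates each cached fragment so daytime requests only ever
// include pre-built files. Paths and helpers are examples only.
require __DIR__ . '/../lib/OptionCache.php';

$cache = new OptionCache('/var/www/cache/options.html');
$cache->rebuild(build_option_list_from_db()); // hypothetical query/loop
```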

I’d say do it (for a while at least) just to make yourself think more about caching as a strategy; then, if/when you reject the idea, at least you will have given yourself some concrete reasons why you don’t use it – and you’ll be slightly better placed to decide when to use it in the future.

cheesedude - the values returned by this are used on the homepage (as well as a few other navigational pages), with multiple users accessing the system throughout the day (maybe up to a dozen at any given time during the workday). So it can be accessed several hundred times a day, as the system is a key component in the daily activities of the business.

It is currently written in Java (with all the additional requirements that imposes) but is actually a fairly straightforward database front-end. Java is its second iteration: it was originally coded in Access (when the company was small), rewritten in Java about 10 years ago, and is now being restructured again in PHP/JavaScript (per the original programmer’s recommendation, since PHP has matured a great deal since he did the rewrite 10 years ago). He is loosely affiliated with the project but has moved on to other work and doesn’t have time to devote to this one at the moment, so I have inherited the rewrite and, eventually, the addition of new features to cover the needs of a “sister” company that was started about 3 years ago.

Cups - I was also considering how this would be beneficial as a caching strategy and for keeping things clean. While there are multiple users, very few of them have the ability to manipulate the data in question, and that would only be done through the Add/Edit form for that data. That is why I started to consider storing a simple text file of basic HTML (<option value="dbvalue">dbtitle</option> lines) and then using a simple include wherever it’s needed on the various forms. The system could then rebuild that file as part of adding/editing the data in the database (something like the function below). Database calls would then be reduced to ONLY retrieving data the end user is really concerned with.
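The save-on-edit hook I’m picturing is only a few lines; something like this (names are placeholders for my actual schema):

```php
<?php
// Called at the end of the Add/Edit form's save routine. Rebuilds the
// static includes from the current table contents (placeholder names).
function rebuild_option_include(PDO $pdo, $path, $activeOnly)
{
    $sql = 'SELECT id, title FROM items'
         . ($activeOnly ? ' WHERE active = 1' : '')
         . ' ORDER BY title';

    $html = '';
    foreach ($pdo->query($sql) as $row) {
        $html .= sprintf("<option value=\"%s\">%s</option>\n",
            htmlspecialchars($row['id']),
            htmlspecialchars($row['title']));
    }
    file_put_contents($path, $html);
}

// One file of ONLY active records, and one of ALL records:
rebuild_option_include($pdo, '/var/www/includes/options_active.html', true);
rebuild_option_include($pdo, '/var/www/includes/options_all.html', false);
```

The consuming forms would then just wrap a plain include of the relevant file inside their <select> tags.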

I still think that neither solution is the “right” one; they both have their pros and cons, though I also don’t think either would be detrimental overall.

I do have a second group of data that is very similar and is used on different navigation screens. Maybe I’ll write one in each format and see whether I notice any difficulties with either once multiple users are accessing the system. Rewriting either one would basically mean duplicating the other’s code and making a few adjustments to the SQL statement and the returned-data display.

Greg

I would go with the 2nd method, as reading a text file is a lot faster than querying a database when no sorting is required. Updating the text file is usually slower, but as it’s not done very often, that’s OK.

I use the method a lot in my programming. My menu is a text file, the spam words/phrases are in text, the current month is an int in a text file, blocked IPs, privileged IPs, company (corporate) colours (I could go on).
Only when a lot of updating, searching or sorting is needed do I use a database.

I’ve created a textFileDatabase library for PHP and ASP which contains file2text, file2array, line2fileStart, line2fileEnd, replaceLineinFile and replaceFile.
I also use this library for other things on my site, like storing guests’ input so I can manually add it after checking; that keeps user input away from the main databases.

A simple library to create, and very handy to have :slight_smile:
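To give an idea of how small these helpers can be, here is my guess at what a couple of them might look like (the real library may well differ):

```php
<?php
// Rough sketches of two helpers; the actual library code may differ.

// file2array: read a text file into an array of lines (newlines stripped).
function file2array($path)
{
    return file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
}

// line2fileEnd: append a single line to the end of a file.
function line2fileEnd($path, $line)
{
    file_put_contents($path, $line . PHP_EOL, FILE_APPEND | LOCK_EX);
}
```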

@GHicks

Depending on how complex the SQL statement that generates the data is, you may want to consider a hybrid approach. Complexity increases processing time, so if you have things in your SQL that may cause full table scans, multiple joins, or any kind of aggregate fields, the live approach isn’t great. An example of this would be a forum system, similar to this one, that reports ‘Latest Posts’ on the home page and takes into account the permissions of the currently logged-in user.

In the above case, it is the processing of the SQL that causes the real slowdown in high-traffic situations, not fetching the results. My recommendation, therefore, is to author a cron job. Use it to generate the results periodically and store them in a new table, then change the SQL on your index page to simply read the latest row(s), based on your timeout or some other ‘trigger’ action, from the results table. By eliminating the need to scan and collate, the hit on performance becomes negligible.
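In outline, it would look something like this (the table, key name and helper function are invented for the example):

```php
<?php
// Cron script: run the expensive query periodically and store the
// rendered result in a small results table (names invented).
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$html = build_latest_posts_html($pdo); // hypothetical: the slow multi-join query

$stmt = $pdo->prepare(
    'INSERT INTO cached_results (name, html, built_at) VALUES (?, ?, NOW())'
);
$stmt->execute(array('latest_posts', $html));

// Index page: one cheap indexed read instead of the full scan-and-collate.
$row = $pdo->query(
    "SELECT html FROM cached_results
     WHERE name = 'latest_posts'
     ORDER BY built_at DESC LIMIT 1"
)->fetch(PDO::FETCH_ASSOC);
echo $row ? $row['html'] : '';
```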

This has several benefits: no need to mess with file permissions should you switch host OS; it makes backups and replication easier by keeping all site-pertinent data in the same storage mechanism; and it requires minimal change to your current application.