Is retrieving frm database faster or from a file

technov5 · January 8, 2010, 7:41am

hi,

Is retrieving data from the database faster or getting it from text files on the server. Each entry will have a different file and the file name can be something like the username or something which is known and the database wont have to be searched to get the filename.

Thank You

disgracian · January 8, 2010, 10:12am

The filesystem will be faster with a small amount of files/records, but a database will scale much better. Having said that, the filesystem will never be perceptible faster (the difference would be measurable in milliseconds).

If a database is available, you’re almost certainly better off going that way.

Cheers,
D.

SpacePhoenix · January 8, 2010, 6:52pm

How big will the files be? Do you really want 1000s of files potentially a couple of MB in size, clogging up your database when they could be stored in a file system which surprisingly is designed for the storage of files?

felgall · January 8, 2010, 7:26pm

A database is just a collection of files with a fast lookup system built in.

r937 · January 8, 2010, 7:34pm

oh stephen, that’s ridiculous

a “fast lookup system” indeed

felgall · January 8, 2010, 8:59pm

I was just keeping it simple in connection with the question. It is basically a simple definition of what a database is though so it can’t possibly be ridiculous.

The whole purpose of a database is to put a layer on top of the files that makes finding the required information from those files in a quick and efficient manner.

So with any significant amount of data the files with the extra layer that using a database provides will generally allow the data to be accessed more quickly than any search you wrote yourself to do the same lookup.

If using the extra layer provided by the database didn’t make things quicker then people would write their programs to access the data directly bypassing that layer.

r937 · January 8, 2010, 9:04pm

on the contrary, if there is any possibility at all, no matter how small–and there is!!!–then, yes, it is ridiculous

see, there you go again

the “whole” purpose? no, not even close – what about enforcing relational integrity?

see what i mean? your over-simplification is ridiculous

if i’ve told you once, i’ve told you a million times, don’t exaggerate!!!

this is ludicrous, as only a handful of extremely skilled engineers know how to read a database file natively

disgracian · January 8, 2010, 10:50pm

One thing I didn’t think of until the reply after mine, was the scenario of storing the files themselves in the filesystem, and paths to those files in the database.

It’s a slightly contentious issue among DBAs (and r937 will probably have a more informed opinion on this than I do, so I’ll just present the options and leave value judgements out), but you can go one of two ways:

Store the contents of the files directly in the database as some sort of BLOB field (depends on the DBMS). Most databases have a datatype for arbitrarily large amounts of binary or text data.
Store the files in the filesystem, and store the path to those files in the database instead of the files themselves, along with other metadata about them that the filesystem doesn’t provide. The files then can be named anything because the database will handle the lookups. This will prevent, in your situation, potentially having to rename a file if somebody’s username changes.

Cheers,
D.

r937 · January 8, 2010, 10:59pm

in that scenario, d, i would opt to store the files in the filesystem

the meta-data (file creator, description of contents, keywords, tags, number of hits, et cetera) would of course be supremely suited for database storage and searching

the only advantage i can think of that storing files as BLOBs gives you is that the backup strategy is rather simple – just back up the database, no worries about synchronizing the database backup with the filesystem backup

the disadvantages, of course, are legion

disgracian · January 9, 2010, 12:11am

Another advantage to the BLOB approach is that access to those files is now regulated by the DBMS, there’s no need to duplicate it in the FS and possibly have to recreate it if the database moves to another server (and possibly a different FS).

Cheers,
D.

felgall · January 9, 2010, 12:28am

Flat file databases, heirarchical databases, network databases, and object oriented databases don’t even have the concept of relational integrity to enforce. So since referential integrity as a concept isn’t even a part of most database model theory there in no way that the implementation of those databases would even attempt to enforce it. Only with a relational database is referential integrity even a concept that the database might choose to enforce and in many cases even relational databases do not properly enforce referential integrity (eg. it is impossible to properly enforce referential integrity where any fields in the database are allowed to exist without having a specific value in the field).

Obviously if you are not going to use the database layer of a particular database model then you’d use a different format for the actual data. Each database uses its own internal file format based on what the actual database processing layer is going to need and so in many cases your generalisation is completely wrong since the data in many database models is designed to be able to be read directly as well as via the database calls. Again you are limiitiing yourself to a specifiic implementation of a database rather than considering databases in general.

A flat file IS a database where you get to build your own access layer on it and so if you built it properly then that database would be the quickest way of accessing your data. So the question of database versus flat file where a flat file can be a database is a rather meaningless question anyway.

You are obviously taking a very narrow view as to just what a database is that only includes one particular database model since relational integrity is not even considered for most types of databases since those models don’t require referential integrity as a part of that model.

Providing a standard way of accessing the data that will be the quickest way under normal circumstances is the ONLY thing that all the different types of database have in common.

r937 · January 9, 2010, 12:35am

stephen, give it a rest

i have years of experience with IMS and IDMS (precursors to the relational model)

you’re just back-pedalling, as usual

(eg. it is impossible to properly enforce referential integrity where any fields in the database are allowed to exist without having a specific value in the field)…

Again you are limiitiing yourself to a specifiic implementation of a database rather than considering databases in general…

So the question of database versus flat file where a flat file can be a database is a rather meaningless question anyway.
i’m calling you out, this is just more of your bafflegab and bullcrap

felgall · January 9, 2010, 12:52am

Those aren’t precursors to the relational model.

IMS is a heirarchical database that has no connection to relational databases whatsoever and no way to build any meaningful relational layer onto it.
IDMS was a network database the last time I used it (admittedly 20 years ago) it may have had a relational database layer tacked on since then but I’d expect that it would still be able to be used as a network database.

Neither heirarchical nor network databases enforce referential integrity.

When I first learnt to use IMS there was a lot made of how selecting the appropriate file structure with the database in order to allow the database to access the data in the most efficient manner. The same sort of information was also presented with regard to IDMS. When I first learnt about relational databases the point was made that the difference between a network and a relational database is that a relational database has an optimiser built in whic optimises the access to the data so as to make it as efficiently as possible based on what is known about the data.

With every database that I have ever worked with (including those I used before learning IMS and the several relational databases I have worked with since) the main point that has always been emphasised as to why a database is used rather than accessing the data in the files directly is all down to providing a consistent and FAST method of handling all the different sorts of access requests. You can write your own access module to handle a specific query on a flat file that works more efficiently than a database call but any other accesses to that data will then be less efficient than using a database would have been.

So I’ll repeat the correct information that I said in my first post A database is just a collection of files with a fast lookup system built in. That remains a correct summary regardless of what names anyone calls me for stating it that way. Everything else I’ve said in this thread explains what that sentence means in more detail.

Dan_Grossman · January 9, 2010, 5:05am

A database need not have multiple files (a MySQL InnoDB database has only one data file, in-memory databases have none), precursor does not mean “implements the relational model”, and fast lookup is not a universal property or goal of all databases. When you get called out for making things up, and you know the people calling you out aren’t likely to be ignorant about the subject, maybe you could back down sometimes. It’d save a lot of writing.

SpacePhoenix · January 9, 2010, 7:20am

disgracian:

One thing I didn’t think of until the reply after mine, was the scenario of storing the files themselves in the filesystem, and paths to those files in the database.

It’s a slightly contentious issue among DBAs (and r937 will probably have a more informed opinion on this than I do, so I’ll just present the options and leave value judgements out), but you can go one of two ways:

Store the contents of the files directly in the database as some sort of BLOB field (depends on the DBMS). Most databases have a datatype for arbitrarily large amounts of binary or text data.

Store the files in the filesystem, and store the path to those files in the database instead of the files themselves, along with other metadata about them that the filesystem doesn’t provide. The files then can be named anything because the database will handle the lookups. This will prevent, in your situation, potentially having to rename a file if somebody’s username changes.

Cheers,
D.

I was considering which way to go myself for file storage, i went for storing it in the file system and recording the necessary info (file type, mine, etc) in a db table.

With regard to storing them in BLOBs, what would you do if you had to move servers and for whatever reason BLOBs were not supported, you’d be screwed. By storing them in the file system you don’t have to worry about that. Say you were storing photos and say they were about 5MB each (they would probably be bigger files than that with the number of megapixels really modern cameras have), would you want a few thousand or even a few hundred of them clogging up your database? I wouldn’t.

By keeping the files in the file system they wouldn’t necessarily have to be kept on the same server, which would give a bit more flexibility as to data backup. What happens if your database becomes (for whatever reason) corrupted beyond recovery, if you store the files in the database, you loose absolutely everything but if you store them in the file system, you will have still lost your database but you will still have the files.

Dan_Grossman · January 9, 2010, 7:41am

It also opens the possibility of offloading the file storage to a 3rd party like Amazon S3 while running the database locally, or serving the files from a web server optimized for static files rather than the one running the application/code.

disgracian · January 9, 2010, 8:15am

I think most popular/serious DBMSs have this, but the differences between them would be a pain, as well as some of the arbitrary limitations that aren’t imposed by a decent filesystem. I might be wrong, but I think Postgres has a 1GB limit.

would you want a few thousand or even a few hundred of them [image files] clogging up your database? I wouldn’t.

I’m not sure what the difference would be in this case. I don’t see how they’d “clog up” a database any worse than they’d clog up a filesystem.

By keeping the files in the file system they wouldn’t necessarily have to be kept on the same server, which would give a bit more flexibility as to data backup.

Wouldn’t a distributed database give you the same option?

What happens if your database becomes (for whatever reason) corrupted beyond recovery

That’s more of an arbitrary “what if”. What if the filesystem becomes corrupt? What if, because of a lack of proper locking, the files become corrupt by two simultaneous writes?

Cheers,
D.

SpacePhoenix · January 9, 2010, 11:02am

Found this, it is an intresting read. Looks like flickr store there images in their file system. I dread to think how huge their database would be if they stored the actual images in it (or how long it would take to execute queries).

disgracian · January 9, 2010, 11:17am

With suitable indexing I can’t see how it would affect query speed in any significant way.

Cheers,
D.

Dan_Grossman · January 9, 2010, 7:32pm

AFAIK, the DB would only store a pointer to the BLOB data in each row, so the table would still have similar performance characteristics as only storing a filename. Only when the final result set needs to be returned does it need to follow that pointer and get the BLOB data off disk.