File Uploads & Version Control

Stormrider · February 4, 2010, 2:37pm

I just wanted to see what people’s techniques were for dealing with file uploads on a site in version control.

I’m sure many of you, like me, have your websites under some kind of version control - for me it’s Subversion but there are many different ones out there.

Websites quite often allow users to upload files, or even generate files themselves and put them in a specified folder, but these files aren’t under version control like the codebase or the other files in the project. When it comes to checking the code out elsewhere, or updating the site, these files get ‘left out’ and aren’t carried over to the new version. I export the site from SVN when I want to update rather than serving from a working copy; serving from a working copy can be a bad idea if your server isn’t set up correctly.

I have thought about using a symbolic link to deal with it, but I wanted to see if anyone else has come across this problem, and how they solved it.

Any suggestions?

TigerStripes · February 5, 2010, 5:30am

It is good practice to use test data, rather than live data, on your development server, so SVN should never see the user files.

Just set the svn:ignore property to exclude the folders where your application stores these files, and you should then be able to rsync your exported application code to your web server. Just don’t set the --delete flag, or rsync will remove the uploads on the server because they aren’t in the export!
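
To make the "no --delete" point concrete, here is a minimal Python sketch of the same behaviour on a local filesystem: files from the export overwrite their counterparts in the live folder, and anything that exists only in the live folder (such as uploads) is left alone. The function name and folder layout are made up for illustration, and this is a local stand-in for rsync, not a replacement for it.

```python
import shutil
from pathlib import Path


def deploy_without_delete(export_dir, live_dir):
    """Copy the exported code over the live site without removing anything.

    Files that exist only in live_dir (for example user uploads) are kept,
    which mirrors what rsync does when --delete is not given.
    """
    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(Path(export_dir), Path(live_dir), dirs_exist_ok=True)
```

After a call like deploy_without_delete("export", "live"), the code files in the live folder are the new ones, while an uploads folder that only exists there is untouched.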

Personally, I don’t always do it that way. If I know I’ve only changed a few files, I’ll check the svn history and manually scp them.

kyberfabrikken · February 5, 2010, 9:39am

I generally have a deploy script that exports the repository and then symlinks certain folders into a shared place. So yes, I use symlinks for that. I think that’s a pretty common solution.
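
A rough Python sketch of the symlinking step, assuming a POSIX server, a fresh export that does not itself contain the shared folders, and a shared directory that lives outside any export. The function name, the "uploads" folder name and the paths are hypothetical, and this is not the actual deploy script described above.

```python
import os
from pathlib import Path


def link_shared_folders(export_dir, shared_dir, names=("uploads",)):
    """Symlink each named folder inside the export to its copy in shared_dir."""
    export_dir, shared_dir = Path(export_dir), Path(shared_dir)
    for name in names:
        target = shared_dir / name
        # The shared folder is created once and survives every deploy.
        target.mkdir(parents=True, exist_ok=True)
        link = export_dir / name
        if link.is_symlink():
            link.unlink()  # stale link from an earlier run
        elif link.exists():
            raise FileExistsError(f"{link} exists in the export and is not a symlink")
        # Use an absolute target so the link does not depend on the working directory.
        os.symlink(target.resolve(), link, target_is_directory=True)
```

Anything the application then writes to uploads inside the export actually lands in the shared folder, so the next export picks it up through its own symlink.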

Stormrider · February 5, 2010, 10:06am

Yeah, I have a rollout script at the moment that does an export to a folder like exports/r56/ or whatever, then points a symlink called ‘site’ at the new export - Apache serves from this symlink.
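
For reference, repointing a symlink like ‘site’ can be sketched in Python as below. This is an illustration under assumptions (a POSIX server, a hypothetical function name, and a temporary link name chosen here), not the actual rollout script. The new link is created under a temporary name and then renamed over the old one, so there is no moment at which ‘site’ is missing.

```python
import os
from pathlib import Path


def point_site_at(export_dir, site_link):
    """Repoint the 'site' symlink at a new export in a single rename."""
    export_dir = Path(export_dir).resolve()
    site_link = Path(site_link)
    tmp_link = site_link.with_name(site_link.name + ".tmp")
    if tmp_link.is_symlink():
        tmp_link.unlink()  # left over from an interrupted rollout
    os.symlink(export_dir, tmp_link, target_is_directory=True)
    # On POSIX, rename() replaces the old link itself (it does not follow it)
    # and does so atomically.
    os.replace(tmp_link, site_link)
```

Rolling back is then just another call with the path of the previous export.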

The only trouble with using symlinks for this is that I do my development under Windows… which has no such thing. But I’m sure I can figure out something.

I already have an ‘environment’ config that sets up different paths etc. depending on whether it is the dev, test or live environment, so I should be able to use that to change the uploads path as well.
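
As a sketch of that idea in Python: one table of upload folders keyed by environment, with a lookup function. Every path below, the APP_ENV variable name and the function name are invented for illustration; the real config will look different.

```python
import os

# Hypothetical paths; the real ones depend on each machine.
UPLOAD_DIRS = {
    "dev": r"C:\projects\site\uploads",
    "test": "/srv/test/shared/uploads",
    "live": "/srv/live/shared/uploads",
}


def uploads_path(environment=None):
    """Return the uploads folder for the given environment.

    Falls back to the APP_ENV environment variable (an assumed name),
    and to "dev" if that is not set either.
    """
    environment = environment or os.environ.get("APP_ENV", "dev")
    try:
        return UPLOAD_DIRS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
```

With the path coming from config, the dev machine needs no symlink at all; only the test and live servers point at a shared folder outside the export.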

joebert · February 7, 2010, 2:54am

I keep three things going.

1. Traditional version control with SVN for the programming logic, templates, etc.
2. An uncompressed backup of the site maintained via rsync that includes generated content/etc.
3. A compressed version of the rsync backup with the last known good configuration
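
The third item, the compressed copy, could be produced with something like the following Python sketch. It assumes the rsync mirror already exists as a local folder and that the archive folder is outside it; the function name and the archive label are made up here.

```python
import shutil
from pathlib import Path


def save_last_known_good(mirror_dir, archive_dir, label="last-known-good"):
    """Compress the rsync mirror into <archive_dir>/<label>.tar.gz.

    Returns the path of the archive that was written. An existing archive
    with the same label is overwritten.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # make_archive adds the ".tar.gz" suffix itself.
    return shutil.make_archive(
        str(archive_dir / label), "gztar", root_dir=str(mirror_dir)
    )
```

Because the same label is reused, this keeps exactly one "last known good" copy; passing a different label per run would keep several.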

AlienDev · February 7, 2010, 5:32am

Vista has symlinks.

I’m using them right now.

G_Schuster · February 7, 2010, 8:28am

They’re called “junction points”.
To create them, use a tool like “JunctionLink Magic”: http://www.rekenwonder.com/linkmagic.htm

kyberfabrikken · February 7, 2010, 1:21pm

I believe that Vista and Windows 7 (or is it just Windows 7?) have real symlinks. Older versions of NTFS had junction points, which are only sort-of-like symlinks.

Ren · February 7, 2010, 1:50pm

I’d use content addressing: all uploaded files get stored based on the hash of their content.
Then a database maps the names the application uses to those hashes over time.

In essence these are application-implemented symlinks, with the added benefit of history, so you can revert/roll back to any prior version.

It would also allow branching, so you could run the same application with different skins/templates.
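
A minimal Python sketch of content addressing with history, to show the moving parts. The choice of SHA-256 and SQLite, the class name, the table layout and the steps_back parameter are all assumptions made for this example, not part of the suggestion above. Branching is left out.

```python
import hashlib
import sqlite3
from pathlib import Path


class ContentStore:
    """Store uploads by the SHA-256 of their bytes and keep name -> hash history."""

    def __init__(self, blob_dir, db_path=":memory:"):
        self.blob_dir = Path(blob_dir)
        self.blob_dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS versions ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name TEXT NOT NULL, hash TEXT NOT NULL)"
        )

    def put(self, name, data):
        """Save data under its hash and record it as the newest version of name."""
        digest = hashlib.sha256(data).hexdigest()
        blob = self.blob_dir / digest
        if not blob.exists():  # identical content is stored only once
            blob.write_bytes(data)
        self.db.execute(
            "INSERT INTO versions (name, hash) VALUES (?, ?)", (name, digest)
        )
        self.db.commit()
        return digest

    def get(self, name, steps_back=0):
        """Return the current content for name, or an older one via steps_back."""
        row = self.db.execute(
            "SELECT hash FROM versions WHERE name = ? "
            "ORDER BY id DESC LIMIT 1 OFFSET ?",
            (name, steps_back),
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return (self.blob_dir / row[0]).read_bytes()
```

Storing a new version under an existing name never touches the old blob, so get(name, steps_back=1) still returns the previous content. Reverting is then a matter of recording the older hash as the newest version again.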

jshpro2 · February 10, 2010, 12:02pm

You can create a repo just for backups. SVN will work with binary data and even SQL files. Kind of neat how it only stores the difference in the SQL exports, so it’s an incremental backup solution of sorts.

But again you may want to just have a separate repo so it keeps the logs and revision #s separate. I don’t like the rsync idea because rsync would also copy over, say… a corrupted file, overwriting the backed up version.

joebert · February 11, 2010, 10:13am

That’s what a compressed “last good known configuration” copy of the rsync directory is for.

jshpro2 · February 11, 2010, 12:46pm

Scenario.

You overwrite your “last good” with a new backup you think is good. You realize you did not test everything and some files got corrupted. How do you get them back?

sk89q · February 11, 2010, 4:44pm

Vista and up have real symlinks. Junction points are also handled much better than in XP (it’s harder to do something stupid like delete the original files).

Use mklink in command prompt to create either symlinks or junction points. It’s pretty useful.
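
For scripted setups, the same thing can be done from Python. The sketch below tries a real symlink first and, on Windows only, falls back to a junction created with mklink /J if the symlink call fails (creating symlinks on Windows can require extra privileges). The function name is made up for this example.

```python
import os
import subprocess


def make_dir_link(target, link):
    """Create a directory link at `link` that points to `target`."""
    try:
        os.symlink(target, link, target_is_directory=True)
    except OSError:
        if os.name != "nt":
            raise
        # mklink is a cmd.exe built-in, so it has to be run through cmd /c.
        # Argument order is: mklink /J <link> <target>
        subprocess.run(
            ["cmd", "/c", "mklink", "/J", str(link), str(target)],
            check=True,
        )
```

At the command prompt itself, mklink /D creates a directory symlink and mklink /J creates a junction, with the link name first and the target second in both cases.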

And a better analogy for “content addressing” might be inodes.