I’m starting work on a big project to publish a lot of government development data online. The data consists of different kinds of datasets: budgets, HR and personnel records, development milestones for various projects, zip codes, census info, etc. I’m looking to build a scalable architecture for this and would like some thoughts on which technology and architecture to pick. I have worked with smaller databases and have a working knowledge of them, but this is the first time I'm exploring connecting and integrating several databases.
Here are the key requirements:
- All datasets can be assumed to be in the same database format (most likely MySQL).
- The datasets need to be able to connect to each other. In other words, I need to be able to run queries that span multiple datasets (for example: what is the average income of families with at least 2 kids in area code X?).
- The architecture needs to be scalable to easily allow plugging in more datasets. I should be able to add a whole new dataset and maybe write a small wrapper, and everything should continue functioning normally.
- Any technologies used need to be open-source and/or free.
Eventually the goal is for people to be able to form and run queries via a web interface.
I’d really appreciate any thoughts or pointers on this.
And where is your question? You can run queries across multiple databases as long as they are on the same server; speed and extensibility depend only on the database design.
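To make the cross-database point concrete, here is a minimal sketch. It is not MySQL: it uses Python's built-in `sqlite3` module, with SQLite's `ATTACH DATABASE` standing in for several databases on one server, because that runs without any setup. In MySQL the equivalent is to qualify each table as `database_name.table_name` in a single query. The database names (`census`, `geo`), tables, columns, and sample rows are all made up for illustration.

```python
import sqlite3


def build_demo_connection():
    """Create two separate in-memory databases reachable from one connection."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates an independent database, addressed by its alias.
    conn.execute("ATTACH DATABASE ':memory:' AS census")
    conn.execute("ATTACH DATABASE ':memory:' AS geo")

    # Hypothetical schemas: the shared zip_code column is what links the datasets.
    conn.execute(
        "CREATE TABLE census.households (zip_code TEXT, income REAL, num_kids INTEGER)"
    )
    conn.execute(
        "CREATE TABLE geo.zip_codes (zip_code TEXT PRIMARY KEY, area_code TEXT)"
    )

    # Made-up sample rows, only to have something to query.
    conn.executemany(
        "INSERT INTO census.households VALUES (?, ?, ?)",
        [
            ("10001", 50000.0, 2),
            ("10001", 70000.0, 3),
            ("10002", 90000.0, 1),
            ("20001", 40000.0, 2),
        ],
    )
    conn.executemany(
        "INSERT INTO geo.zip_codes VALUES (?, ?)",
        [("10001", "212"), ("10002", "212"), ("20001", "202")],
    )
    conn.commit()
    return conn


def average_income(conn, area_code, min_kids=2):
    """Average income of households with at least min_kids kids in an area code."""
    row = conn.execute(
        """
        SELECT AVG(h.income)
        FROM census.households AS h
        JOIN geo.zip_codes AS z ON z.zip_code = h.zip_code
        WHERE z.area_code = ? AND h.num_kids >= ?
        """,
        (area_code, min_kids),
    ).fetchone()
    return row[0]  # None when no household matches


if __name__ == "__main__":
    conn = build_demo_connection()
    print(average_income(conn, "212"))
```

The join only works because both datasets carry a common key (`zip_code` here), which is the kind of design decision that determines how easily a new dataset can be plugged in.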
"The architecture needs to be scalable to easily allow plugging in more datasets": that comes down to your database design, which is very important, but we cannot tell you how to do it because we do not know the details.