Is this possible? Detect changes on one DB and apply those changes on another DB

It's not often that we ask whether something is possible. I'm clueless about the job I have on my hands, and I'm wondering what you would do in this scenario.

We have two different web applications, from different companies (both with encrypted PHP files), and the task is:

When the user makes changes in one application, those changes somehow get "replicated" to the database of the new application.
I say "somehow" because they have different database names, different tables, and different columns and fields as well.

Both DBs share the same server, and they can share the same DSN details.

The idea is to smooth the new web system implementation process.

MySQL 5.1 and PHP 5.2 are in place.

What are the possibilities on a case like this?

Best Regards,
Márcio

Hmmm… upon first reading your post title I was thinking a slave DB would work for you. But after reading the entire post I'm not sure, because the DBs don't have the same structure.

How many columns/tables are we talking about mirroring here? Maybe it's just a matter of making a script that is called whenever there is an update; the script then updates the other table?

82 tables, more or less, and around 560 columns.

That was the request made to me. My question would be: what kind of script would that be? How would it interact in this scenario? Is it even possible, given that the PHP files on both systems are encrypted? That's why I thought a pure database solution could do it, but maybe this is a hell of a quest… :s

Actually, this is what I’m expecting as a workflow:

1- Paper - analyse/write down each table in the two databases and understand whether there is any relation in the data they can contain.

2- Paper - analyse/write down each column of each table and understand the relations.

3- Machine - Test every situation that can insert data into the database tables, in order to see whether there is any generated-data requirement, or columns that must hold specific values.

4- Machine - Write 82 complex (for me at least) queries using INSERT INTO … SELECT … ON DUPLICATE KEY UPDATE (see the sketch after this list). :shifty:

5- Put it all together in a PHP file, and make that script run as a cron job.
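
For step 4, one of those queries might look roughly like this, if I got the syntax right (the column names are made up, and "Support_Tickets"/"Tickets" are just example tables):

-- copy every row from the old table into the new one; rows whose
-- primary key already exists get updated instead of inserted
INSERT INTO newDatabase.Tickets (id, subject, created_at)
SELECT st.id, st.title, st.date_added
FROM oldDatabase.Support_Tickets AS st
ON DUPLICATE KEY UPDATE
    subject    = VALUES(subject),
    created_at = VALUES(created_at);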

What do you guys think?

Regards,
Márcio

@#5. If you want the databases to be in sync, the scripts will need to be run on every update, not just via cron.

You will have to have some type of relation table that maps keys from one DB to the other. Then you will have to have a way to maintain this mapping (i.e. if a record is added on one site, it has to be added to the other DB and the mapping has to be noted).
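
For instance, the mapping table could be as simple as this (the names are just placeholders):

-- hypothetical mapping table: links a row's key in one database
-- to the corresponding row's key in the other database
CREATE TABLE sync_id_map (
    old_table VARCHAR(64)  NOT NULL,  -- table name in the old database
    old_id    INT UNSIGNED NOT NULL,  -- primary key value in the old database
    new_table VARCHAR(64)  NOT NULL,  -- table name in the new database
    new_id    INT UNSIGNED NOT NULL,  -- primary key value in the new database
    PRIMARY KEY (old_table, old_id)
) ENGINE=InnoDB;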

I don’t really see this as 82 queries. I would shoot toward developing one function in a class available to both sites. Whenever an update or insert is performed (on either database) the class is called and passed some parameters, and the function performs the sync in real time.

just some freehand logic for you…


function sync_databases($entry_id, $table_updated)
{
    // find out which table was just updated;
    // we will want to sync that data to the other table
    $table_to_update = $opposite_of_table_updated; // look this up in your mapping

    if ($table_updated == 'user_db1' || $table_updated == 'user_db2') {
        // here we can put some custom mapping in, or call our mapping table,
        // so we will know which fields to update in either case
    } elseif ($table_updated == 'content_db1' || $table_updated == 'content_db2') {
        // here is some mapping for the content:
        // use the id to determine which entry to pull the data from,
        // use the mapping table to determine which entry to update
    }
}
  

Just a thought of how I ‘might’ approach the situation. There are probably many other ways to do this.

I'm following, but I'm such a newbie that I need to learn more about that mapping part. From your excerpt, however, it seems that I will need a conditional for each table… or am I seeing it wrongly? (That is the most probable.)

About the cron job: if the synchronization has to be done that way, it will still be more than good, because, and this is the hardest part of all, I see no way to "detect" when an UPDATE, INSERT or DELETE (yes, I will need to track the delete operations as well) has occurred, in order to run the corresponding script.

I’m so in the mud. :injured:

Yeah, it was just an example using the ifs. If two or more tables are similar you might be able to combine them into one statement.

I was suggesting that you find all the DB statements that update or insert certain content and add a call to your class function after the DB is updated or inserted into. This way all this sync stuff would happen automatically.

The problem with the cron job is this… the cron script will find two non-matching entries that should be the same. How will it know which of them is the newest? What if both entries have been changed since the last cron run? Then you will inevitably overwrite somebody's data, which is never good.

I think for your situation, if you can't make it happen in real time you are setting yourself up for failure.

Best of Luck!!

Depending on your database size, you may not be able to do it using a PHP script/cron job… unless you want that script to load the entire data set from the two databases, transfer all that crap over the network, parse it, and send it back for updates.

There are 2 things you can do (without touching the code):
#1 Create triggers on both databases, and some queue tables for each table you have.
Every time an update/insert/delete occurs, the trigger inserts that record into your queue table. Then a cron script runs, parses all the entries in the queue tables, moves them to the other database, and clears the queues. (There's a sketch of this further down.)
If your systems are very similar, your triggers could even do the inserts in the other database directly, so there would be no need for a PHP script.

#2 Add a master/slave configuration on your databases, and log every query that goes to the slave.
You need the master/slave because the logging would otherwise slow down your master.
Then, get a PHP script to parse those queries, and translate them to the other database.

In both cases, make sure you don't create a loop where you move the same data from system A to B, and then back to A, B, A, etc. (e.g. primary keys might be different, so you need to account for this in your porting scripts).
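
To make #1 a bit more concrete, a rough sketch for a single table could look like this (the queue table, the "id" column and the table names are just placeholders):

-- queue table recording which rows changed and how
CREATE TABLE oldDatabase.queue_support_tickets (
    queue_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    row_id    INT UNSIGNED NOT NULL,                 -- id of the changed row
    action    ENUM('insert','update','delete') NOT NULL,
    queued_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (queue_id)
) ENGINE=InnoDB;

-- one of the triggers: record every insert so the cron/PHP script
-- can replay it against the other database later
CREATE TRIGGER oldDatabase.queue_support_tickets_ai
AFTER INSERT ON oldDatabase.Support_Tickets
FOR EACH ROW
    INSERT INTO oldDatabase.queue_support_tickets (row_id, action)
    VALUES (NEW.id, 'insert');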

I have used these techniques with high-load DB servers (tens of millions of records, thousands of queries a second, and multiple DB servers) when moving from one system to another.

Hope it helps.

a)
The database is not that big, it will be more or less… 51MB I suppose.
Both databases are accessible from the same user.

b)
The destination database, at the beginning of this process, will be empty.

About the first possibility, which I presume is the best one I have:

I will investigate what triggers are and what queue tables are. I do have a first question here:
Will this work when we have different database schemas (meaning that the column and table names are different between the databases)?

On inserting, since the destination database will be empty, the ID’s will be equal on all tables on both databases.

Later in the process, if some records are deleted on the original database, they should be deleted as well on the destination database. The inverse process will not be required but, if we do it, we will keep the ID correspondence between both databases. If I see this correctly.

Given this new information, a) and b), if you believe something could be done differently, please let me know, so that I can investigate this subject further.

Thank you both, again,
Márcio

If we have the same user for both databases, we can access both databases in one query. In this scenario, do we need queue tables?

If we do something like this, does it make sense?


CREATE TRIGGER oldDatabase.updateTablexTrigger AFTER INSERT ON oldDatabase.oldTablex
  FOR EACH ROW
    INSERT INTO newDatabase.Tablex2 (new_id) VALUES (NEW.old_id);

When we create a trigger, does it "stay attached" to the database itself?

Thanks a lot,
Márcio

It sounds like you want to look at the Replication features of MySQL.

I don't know if replication is the correct thing here since, I believe, a straight Master/Slave relation won't fit: one database has a table named "Tickets", the other has "Support_Tickets", and the same goes for each column of each table.

I don't mind if I have to do 82 queries, really; I will learn a lot from each of them, so no issues there. I do mind, however, if after doing it nothing works as expected.

But if you tell me replication is the way to go, then please tell me:
in that vast ocean of possibilities related to replication, and given this scenario, what should I look at in more detail?

UPDATE: I have read some documentation on replication, but I keep ending up at synchronization instead. I assumed they were mutually exclusive, that if we have one we will not have the other, but this may not be the case, and perhaps we can have DATA replication together with DATABASE synchronization?

UPDATE 2: From the MySQL website: "Replication enables data from one MySQL database server (the master) to be replicated to one or more MySQL database servers (the slaves)".
I'm running both databases on one server, with the same user, nothing too fancy. Can replication still apply in this case?

Thanks,
Márcio

Do we need to create 3 triggers for each table on both databases, one for UPDATE, one for INSERT and another for DELETE, so that when an update is made on the old database the new database gets the values, and when it's done on the new database the old database gets the values as well?

Since there are 82 tables to work with in the "new database" and 55 in the "old database", I need to create 137 x 3 (insert, update, delete) = 411 triggers!!

This is getting insane, no? Or am I getting insane?

Again,
Márcio

oikram: if you don’t care for scalability, and you can keep your two DBs on the same server, then all you need are TRIGGERs.

And you are correct, you need one trigger for each action.
That is why I suggested having a master/slave DB for your main database (the one that moves the data over to the other one).

You would use the master normally and replicate it to the slave, where you have your 1000 triggers that move the data over to the new database. (But I'm not sure whether this will slow down your system or not, since you're on the same server for everything…)
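
To make the "one trigger for each action" point concrete, here is a rough sketch of the DELETE trigger for one table, one-way from the old database to the new one (the table and column names are only placeholders); the INSERT and UPDATE triggers for the same table would follow the same pattern:

-- when a row is deleted in the old database,
-- delete the matching row in the new database
CREATE TRIGGER oldDatabase.support_tickets_ad
AFTER DELETE ON oldDatabase.Support_Tickets
FOR EACH ROW
    DELETE FROM newDatabase.Tickets WHERE id = OLD.id;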

You have a point on scalability, sure. But I'm far from being a database administrator and I'm trying to ease the process. I'm not refusing to do whatever it takes, but I do accept the fact that the system will not be optimal. So, yes, no issues there.

I'm not getting the relation between having triggers for each action and having a master/slave relation between the two databases. Can you please clarify this relation? (Sorry.)

You said:

So will I have two databases or three? I'm confused by this master/slave part.

Can I please ask for some references about the master/slave procedure in these cases?

Thanks a lot,
Márcio

Is the need for master/slave due to the fact that, if we put the triggers on both tables, we will get a loop?

Nope, it's so your SELECTs don't slow down.

By adding triggers, you basically multiply your queries by 2 (for the ones that change the DB).
That in turn can slow down your select queries, depending on your actual queries.

OK, but in order to have a Master/Slave configuration, I will need to separate the databases and put one on one server and the other on a second server. That would allow the application to run smoothly. There is no way to have a Master/Slave relation between the databases without two separate servers, right?

The way to avoid a loop with triggers is by using “User-Defined Variables” .
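
If I understood it right, a sketch for one pair of tables would look something like this (the column names are just guesses): each trigger skips its body when the session variable @syncing is set, so the row written by the mirror trigger does not bounce back.

DELIMITER //

-- old -> new: fires on normal application inserts, but not on the
-- inserts performed by the mirror trigger below (guarded by @syncing)
CREATE TRIGGER oldDatabase.sync_support_tickets_ai
AFTER INSERT ON oldDatabase.Support_Tickets
FOR EACH ROW
BEGIN
    IF @syncing IS NULL THEN
        SET @syncing = 1;
        INSERT INTO newDatabase.Tickets (id, subject)
        VALUES (NEW.id, NEW.title);
        SET @syncing = NULL;
    END IF;
END//

-- new -> old: the mirror trigger, with the same guard
CREATE TRIGGER newDatabase.sync_tickets_ai
AFTER INSERT ON newDatabase.Tickets
FOR EACH ROW
BEGIN
    IF @syncing IS NULL THEN
        SET @syncing = 1;
        INSERT INTO oldDatabase.Support_Tickets (id, title)
        VALUES (NEW.id, NEW.subject);
        SET @syncing = NULL;
    END IF;
END//

DELIMITER ;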

While this process occurs, should we lock the tables? If so, do we need to create a mechanism to alert the user that a record cannot be changed at this time, because someone else is changing it?

Am I on the right track here?

If you wish, simple boolean answers will suffice.

Thanks in advance,
Márcio

You can have master/slave databases on the same machine.
And you really don't want to lock your live tables while you move the data over… that will slow things down too much.

Try the triggers first, and if that's too slow, then look into master/slave and so on.
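
If you do end up going the master/slave route, the slave side boils down to something like this (assuming a second mysqld instance on the same machine acting as the slave, a "repl" replication account, and binary-log coordinates taken from SHOW MASTER STATUS on the master; all of those details are assumptions you would adapt):

-- run on the slave instance (the copy that would carry the triggers);
-- the master is the existing server with log-bin enabled
CHANGE MASTER TO
    MASTER_HOST = '127.0.0.1',
    MASTER_PORT = 3306,
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',  -- from SHOW MASTER STATUS
    MASTER_LOG_POS = 106;                  -- from SHOW MASTER STATUS

START SLAVE;

-- check that both replication threads are running
SHOW SLAVE STATUS\G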

Roger! thanks a lot.

I will now inform the bosses that, in order to accomplish this, I need to analyse each table, one by one, each column… and prepare the 1000 triggers.

Vali, one last question: this is, normally, a time-consuming task, right?
I mean, I will need months to accomplish this by myself. :s

:goof:
Márcio