diff and patch for database tables using PDI

Pentaho Data Integration (Kettle) has two cool steps, Merge Rows (diff) and Synchronize after merge, that together can "diff" and "patch" database tables.


They are useful when you need to keep two tables in sync across databases without having to "drop-and-load" the entire table each time, or when you receive a large set of changes in batch files and only want to apply the inserts and updates they contain.


The diff step lets you compare two tables (possibly in different databases). It reads all rows from both tables, matches them on the key fields you specify, compares the fields you choose, and flags each row as identical, changed, new or deleted. Both inputs must be sorted on the key fields for the comparison to work.
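To make the diff idea concrete, here is a minimal Python sketch of what such a comparison does; it is not PDI's actual implementation. It assumes each table has already been read into a list of dicts sorted on the key fields. The names `diff_rows` and `row_key` are hypothetical, and the flags are relative to the second (target) table: "new" means the row exists only in the first table, "deleted" means it exists only in the second.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def row_key(row: Row, key_fields: Sequence[str]) -> Tuple[Any, ...]:
    # Build the key as a tuple so multi-column keys compare correctly
    return tuple(row[k] for k in key_fields)


def diff_rows(source: List[Row], target: List[Row],
              key_fields: Sequence[str],
              value_fields: Sequence[str]) -> List[Tuple[str, Row]]:
    """Flag rows as identical/changed/new/deleted relative to `target`.

    Both lists must be sorted on key_fields: the loop walks them in
    step, like a merge join, so each side is read only once.
    """
    flagged: List[Tuple[str, Row]] = []
    i = j = 0
    while i < len(source) or j < len(target):
        if j == len(target) or (
                i < len(source)
                and row_key(source[i], key_fields) < row_key(target[j], key_fields)):
            flagged.append(("new", source[i]))        # only in source
            i += 1
        elif i == len(source) or (
                row_key(target[j], key_fields) < row_key(source[i], key_fields)):
            flagged.append(("deleted", target[j]))    # only in target
            j += 1
        else:                                         # same key on both sides
            same = all(source[i][f] == target[j][f] for f in value_fields)
            flagged.append(("identical" if same else "changed", source[i]))
            i += 1
            j += 1
    return flagged
```

For example, with source ids 1, 2, 4 and target ids 1, 2, 3, where only row 2's `name` differs, `diff_rows(source, target, ["id"], ["name"])` flags the rows as identical, changed, deleted, new. The sorted-input requirement is what keeps this to a single pass over each table.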


The second step, Synchronize after merge, then runs the matching inserts, updates and deletes against the second table, so that at the end it is identical to the first one.
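The patch side can be sketched the same way. The hypothetical `synchronize` function below (not part of PDI) takes a list of (flag, row) pairs and issues the corresponding INSERT, UPDATE or DELETE statements against an SQLite table through Python's standard `sqlite3` module. A single key column is assumed for brevity, and table and column names are interpolated into the SQL, so they must come from trusted configuration; row values are passed as parameters.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def synchronize(conn: sqlite3.Connection, table: str, key: str,
                flagged: List[Tuple[str, Row]]) -> None:
    """Apply flagged rows to `table` so it ends up matching the source."""
    with conn:  # one transaction: commit on success, roll back on error
        for flag, row in flagged:
            if flag == "new":
                cols = ", ".join(row)
                marks = ", ".join("?" for _ in row)
                conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                             list(row.values()))
            elif flag == "changed":
                others = [c for c in row if c != key]
                if others:  # nothing to update if the row is only a key
                    sets = ", ".join(f"{c} = ?" for c in others)
                    conn.execute(f"UPDATE {table} SET {sets} WHERE {key} = ?",
                                 [row[c] for c in others] + [row[key]])
            elif flag == "deleted":
                conn.execute(f"DELETE FROM {table} WHERE {key} = ?",
                             (row[key],))
            # "identical" rows need no statement
```

Running all statements inside one transaction means a failure halfway through leaves the target table as it was.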


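If you add a filter after the diff that drops the rows flagged as deleted, only inserts and updates are applied. This is what you want when loading changes from batch files: rows that are missing from the file are left untouched in the target table instead of being deleted.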
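In code, that filter is just one more pass over the flagged rows. The helper below is hypothetical and stands in for the filter step; it works on (flag, row) pairs and keeps everything except rows flagged "deleted".

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Flagged = List[Tuple[str, Row]]


def drop_deletes(flagged: Flagged) -> Flagged:
    # Keep new/changed/identical rows; rows only present in the target
    # (flagged "deleted") are discarded, so the target keeps them
    return [(flag, row) for flag, row in flagged if flag != "deleted"]
```

For instance, if the batch file holds only rows 2 (changed) and 4 (new) while the target also contains rows 1 and 3, the diff flags 1 and 3 as deleted; after the filter only the update for 2 and the insert for 4 remain.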