Guide to replicating SQL Server to PostgreSQL
There can be many reasons for wanting to replicate your data from a MS SQL Server installation to your PostgreSQL installation. For example, as a step in migration or to be able to use PostgreSQL features for data analysis while not having to touch existing clients working with MSSQL. For me in this case, I wanted to use tsearch2 to search some fulltext data, because the fulltext indexer in SQL Server really isn’t very good.
It turns out that SQL Server ships with replication functionality that can solve this problem with relatively little pain (depending on your schema of course), providing full transactional replication. It’s master/slave only, and SQL Server will be the master, but it’s still quite useful.
Here are the steps to do this for a simple example database – should work for more complex database as well of course. It expects you to set up a user named sqlrepl in the PostgreSQL database, that the replication system will use to connect with. Make sure that this user has permissions to connect from the SQL Server machine in pg_hba.conf.
- Make sure you have the PostgreSQL ODBC drivers installed on the SQL Server machine (I’m using version 8.01.02).
- Create the example databases:
- In SQL Server:
CREATE DATABASE origin go USE origin go CREATE TABLE tab1( id int identity not null primary key, t varchar(128) not null) INSERT INTO tab1 (t) VALUES ('Test 1') INSERT INTO tab1 (t) VALUES ('Test 2')
- Then, in PostgreSQL:
CREATE DATABASE slave OWNER sqlrepl; \connect slave CREATE TABLE tab1(id int not null primary key, t varchar(128) not null); ALTER TABLE tab1 OWNER TO sqlrepl;
- Create a ODBC System Datasource on the SQL Server. This is done using the odbcad32 command. Make sure that you create a system datasource, and make sure you use the PostgreSQL ANSI driver (there are some problems with the UNICODE driver in the way SQL Server uses it)
- Start SQL Server Enterprise Manager. Create a new linked server. This is done by right-clicking the Linked Servers node under Security and picking New Linked Server. Enter the name of the linked server (in all uppercase, in our case PGSLAVE), and pick the driver Microsoft OLE DB Provider for ODBC Drivers. Note that you should not pick the PostgreSQL ODBC driver here. Finally, enter the name of the ODBC datasource just created. Make sure the link works by clicking the Tables node and verify that you can see the tables of you database.
- Configure the subscriber:
- Right-click on the Replication node and select Configure publishing, subscribers, and distribution.
- Select the tab Subscribers
- Click New
- Select OLE DB data source
- Pick the linked server you created (PGSLAVE). Re-enter the login information.
- Click OK and close all dialogs
- Create the publication:
- Right-click Publication under Replication and select New publication
- Select your database, click Next
- Select Transactional publication, click Next
- Uncheck SQL Server 2000 and check Heterogeneous data sources, click Next
- Click Article Defaults
- Open the Snapshot tab
- Change name conflicts to Keep existing table unchanged. In some cases it will work with drop and recreate, but I prefer creating the tables manually to make sure there is no mixup with datatypes and such (considering MSSQL doesn’t really know about PostgreSQL datatypes)
- Click OK
- Put a checkbox on the tables to replicate (tab1 in this example). Click Next
- You will get a warning about IDENTITY properties not being replicated to subscribers. This will happen if you have any IDENTITY columns in your table. In most cases, you can just ignore it. Click Next
- Possibly modify description if you want to, click Next
- Click next through the rest of the Wizard, and click Finish
- Create the subscription:
- Open properties for the publication
- Open the Subscriptions tab
- Click Push new
- Select the created subscriber (PGSLAVE)
- Click Next through the rest of the wizard, make sure to check the box for Start snapshot agent
- Now sit back and watch your MSSQL data first being bulk-loaded into PostgreSQL, and then transactionally replicated (you will see a couple of seconds delay after a commit, same as when you replicate between two SQL Servers).