Skip to content

Sync data from Postgres

Sync updates to your primary Postgres database with Tiger Cloud in real time

You use the source PostgreSQL connector in Tiger Cloud to synchronize all data or specific tables from a PostgreSQL database instance to your service, in real time. You run the connector continuously, turning PostgreSQL into a primary database with your service as a logical replica. This enables you to leverage Tiger Cloud‘s real-time analytics capabilities on your replica data.

Connectors overview in Tiger Console

The source PostgreSQL connector in Tiger Cloud leverages the well-established PostgreSQL logical replication protocol. By relying on this protocol, Tiger Cloud ensures compatibility, familiarity, and a broader knowledge base, making it easier for you to adopt the connector and integrate your data.

You use the source PostgreSQL connector for data synchronization, rather than migration. This includes:

  • Copy existing data from a PostgreSQL instance to a Tiger Cloud service:

    • Copy data at up to 150 GB/hr.

      You need at least a 4 CPU/16 GB source database, and a 4 CPU/16 GB target service.

    • Copy the publication tables in parallel.

      Large tables are still copied using a single connection. Parallel copying is in the backlog.

    • Forget foreign key relationships.

      The connector disables foreign key validation during the sync. For example, if a metrics table refers to the id column on the tags table, you can still sync only the metrics table without worrying about their foreign key relationships.

    • Track progress.

      PostgreSQL exposes COPY progress under pg_stat_progress_copy.

  • Synchronize real-time changes from a PostgreSQL instance to a Tiger Cloud service.

  • Add and remove tables on demand using the PostgreSQL PUBLICATION interface.

  • Enable features such as hypertables, columnstore, and continuous aggregates on your logical replica.

  • Indexes, primary key, unique constraints, and sequences are not migrated. Create needed indexes on the target for your queries.

  • TimescaleDB as source has limited support (e.g. no continuous aggregates).

  • Schema changes must be coordinated: apply compatible changes on the target first, then on the source.

  • WAL volume on the source increases during large table copy.

  • Continuous aggregates: The connector uses session_replication_role=replica during copy, so triggers (including continuous aggregate invalidation) do not run. Data synced during initial load below a continuous aggregate’s materialization watermark may not appear in the aggregate until you manually refresh. If the aggregate exists on the source, include it in the connector’s publication; if only on the target, use the force option of refresh_continuous_aggregate to refresh affected ranges.

Prerequisites for this integration guide

To follow these steps, you'll need:

  • The PostgreSQL client tools installed on your sync machine.

  • The source PostgreSQL instance and the target Tiger Cloud service must have the same extensions installed.

    The source PostgreSQL connector does not create extensions on the target. If the table uses column types from an extension, first create the extension on the target Tiger Cloud service before syncing the table.

Limitations

  • Indexes, including the primary key, unique constraints, and sequences are not migrated to the target Tiger Cloud service.

    We recommend that, depending on your query patterns, you create only the necessary indexes on the target Tiger Cloud service.

  • Using TimescaleDB as the source has limited support (no CAGGs).

  • The source must be running PostgreSQL 13 or later.

  • Schema changes must be co-ordinated.

    Make compatible changes to the schema in your Tiger Cloud service first, then make the same changes to the source PostgreSQL instance.

  • Ensure that the source PostgreSQL instance and the target Tiger Cloud service have the same extensions installed.

    The source PostgreSQL connector does not create extensions on the target. If the table uses column types from an extension, first create the extension on the target Tiger Cloud service before syncing the table.

  • There is WAL volume growth on the source PostgreSQL instance during large table copy.

  • Continuous aggregate invalidation

    The connector uses session_replication_role=replica during data replication, which prevents table triggers from firing. This includes the internal triggers that mark continuous aggregates as invalid when underlying data changes.

    If you have continuous aggregates on your target database, they do not automatically refresh for data inserted during the migration. This limitation only applies to data below the continuous aggregate’s materialization watermark. For example, backfilled data. New rows synced above the continuous aggregate watermark are used correctly when refreshing.

    This can lead to:

    • Missing data in continuous aggregates for the migration period.
    • Stale aggregate data.
    • Queries returning incomplete results.

    If the continuous aggregate exists in the source database, best practice is to add it to the PostgreSQL connector publication. If it only exists on the target database, manually refresh the continuous aggregate using the force option of refresh_continuous_aggregate.

Set your connection string

This variable holds the connection information for the source database. In the terminal on your migration machine, set the following:

Terminal window
export SOURCE="postgres://<user>:<password>@<source host>:<source port>/<db_name>"
Tips

Avoid using connection strings that route through connection poolers like PgBouncer or similar tools. This tool requires a direct connection to the database to function properly.

Tune your source database

Updating parameters on a PostgreSQL instance will cause an outage. Choose a time that will cause the least issues to tune this database.

  1. Tune the Write Ahead Log (WAL) on the RDS/Aurora PostgreSQL source database
    1. In RDS console, select the RDS instance to migrate.

    2. Click Configuration, scroll down and note the DB instance parameter group, then click Parameter Groups.

      RDS parameter groups in the AWS console
    3. Click Create parameter group, fill in the form with the following values, then click Create:

      • Parameter group name, whatever suits your fancy.
      • Description, knock yourself out with this one.
      • Engine type, {C.PG}
      • Parameter group family, the same as DB instance parameter group in your Configuration.
    4. In Parameter groups, select the parameter group you created, then click Edit.

    5. Update the following parameters, then click Save changes:

      • rds.logical_replication set to 1: record the information needed for logical decoding.
      • wal_sender_timeout set to 0: disable the timeout for the sender process.
    6. In RDS, navigate back to your databases, select the RDS instance to migrate, and click Modify.

    7. Scroll down to Database options, select your new parameter group, and click Continue.

    8. Click Apply immediately or choose a maintenance window, then click Modify DB instance.

    Changing parameters will cause an outage. Wait for the database instance to reboot before continuing. After it comes back up, verify that the new settings are in effect on your database.

  2. Create a user for the source PostgreSQL connector and assign permissions
    1. Create <pg connector username>:

      Terminal window
      psql $SOURCE -c "CREATE USER <pg connector username> PASSWORD '<password>'"

      You can use an existing user. However, you must ensure that the user has the following permissions.

    2. Grant permissions to create a replication slot:

      Terminal window
      psql $SOURCE -c "GRANT rds_replication TO <pg connector username>"
    3. Grant permissions to create a publication:

      Terminal window
      psql $SOURCE -c "GRANT CREATE ON DATABASE <database name> TO <pg connector username>"
    4. Assign the user permissions on the source database:

      Terminal window
      psql $SOURCE <<EOF
      GRANT USAGE ON SCHEMA "public" TO <pg connector username>;
      GRANT SELECT ON ALL TABLES IN SCHEMA "public" TO <pg connector username>;
      ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT SELECT ON TABLES TO <pg connector username>;
      EOF

      If the tables you are syncing are not in the public schema, grant the user permissions for each schema you are syncing:

      Terminal window
      psql $SOURCE <<EOF
      GRANT USAGE ON SCHEMA <schema> TO <pg connector username>;
      GRANT SELECT ON ALL TABLES IN SCHEMA <schema> TO <pg connector username>;
      ALTER DEFAULT PRIVILEGES IN SCHEMA <schema> GRANT SELECT ON TABLES TO <pg connector username>;
      EOF
    5. On each table you want to sync, make <pg connector username> the owner:

      Terminal window
      psql $SOURCE -c 'ALTER TABLE <table name> OWNER TO <pg connector username>;'

      You can skip this step if the replicating user is already the owner of the tables.

  3. Enable replication DELETE and UPDATE operations

    Replica identity assists data replication by identifying the rows being modified. Your options are that each table and hypertable in the source database should either have:

    • A primary key: data replication defaults to the primary key of the table being replicated. Nothing to do.
    • A viable unique index: each table has a unique, non-partial, non-deferrable index that includes only columns marked as NOT NULL. If a UNIQUE index does not exist, create one to assist the migration. You can delete it after migration. For each table, set REPLICA IDENTITY to the viable unique index:
      psql -X -d $SOURCE -c ‘ALTER TABLE <table name> REPLICA IDENTITY USING INDEX <_index_name>‘
    • No primary key or viable unique index: use brute force. For each table, set REPLICA IDENTITY to FULL:
      psql -X -d $SOURCE -c ‘ALTER TABLE <table_name> REPLICA IDENTITY FULL’
      For each UPDATE or DELETE statement, PostgreSQL reads the whole table to find all matching rows. This results in significantly slower replication. If you are expecting a large number of UPDATE or DELETE operations on the table, best practice is to not use FULL.

Synchronize data to your Tiger Cloud service

To sync data from your PostgreSQL database to your Tiger Cloud service using Tiger Console:

  1. Connect to your Tiger Cloud service

    In Tiger Console, select the service to sync live data to.

  2. Prepare the source database
    PostgreSQL connector wizard in Tiger Console

    a. Click Connectors > {C.PG}. b. Set the name for the new connector by clicking the pencil icon. c. Check the boxes for Set wal_level to logical and Update your credentials, then click Continue.

  3. Connect the source database and the target service
    PostgreSQL connector connection string in Tiger Console

    a. Enter your database credentials or a PostgreSQL connection string. This is the connection string for <pg connector username>.

    b. Recommended for private databases. If the source database is not reachable from the public Internet, toggle Enable SSH tunneling and route the connector through a bastion host:

    • Copy the public key shown in the wizard and append it to the authorized_keys file of the user the connector will log in as on the bastion host:

      Terminal window
      echo "<public key from the connector wizard>" >> ~/.ssh/authorized_keys
      # Verify
      cat ~/.ssh/authorized_keys
    • Enter the bastion Hostname, User, and Port in the wizard.

    c. Click Connect to database. Tiger Console connects to the source database and retrieves the schema information.

  4. Select the data to synchronize
    Starting the PostgreSQL connector in Tiger Console

    Choose where the connector picks tables from:

    • From an existing publication: select a PostgreSQL PUBLICATION name, then pick schemas and tables from those it includes. Use this when you cannot grant the connector user CREATE privilege on the source — your DBA can pre-create the publication — or when you want to apply row or column filters on published tables.
    • Directly from the source: select schemas, then pick the tables to sync from them. Tiger Console creates the publication for you.

    Click Select tables +. For each selected table, Tiger Console checks the schema and, if possible, suggests the column to use as the time dimension in a hypertable.

  5. Configure Initial Data Copy workers
    Configure Initial Data Copy worker count in Tiger Console

    Specify the number of parallel connections — Initial Data Copy (IDC) workers — used to process the initial data copy. Higher values can speed up the initial copy but may increase load on the source database. Defaults to 4.

    Tune this based on the resources available on the source database and the number of tables being synced. Large individual tables still copy through a single connection, so higher worker counts help most when you have many small to medium tables.

    Click Create Connector. Tiger Console starts the source PostgreSQL connector between the source database and the target service and displays the progress.

  6. Monitor and manage the connector
    Editing a PostgreSQL connector in Tiger Console
    1. To review the syncing progress for each table, click Connectors > Source connectors, then select the name of your connector in the table.

    2. To edit the connector, click Connectors > Source connectors, then select the name of your connector in the table. You can rename the connector, delete or add new tables for syncing.

    3. To pause a connector, click Connectors > Source connectors, then open the three-dot menu on the right and select Pause.

    4. To delete a connector, click Connectors > Source connectors, then open the three-dot menu on the right and select Delete. You must pause the connector before deleting it.

And that is it, you are using the source PostgreSQL connector to synchronize all the data, or specific tables, from a PostgreSQL database instance to your Tiger Cloud service, in real time.