We’re currently running an upgrade from a v24 APDB to v29 using dax_apdb_migrate, and we’re facing a severe performance bottleneck due to the scale of our data.
Our table sizes are:
DiaSource: 243 million rows
DiaObject: 243 million rows
DiaForcedSource: 16.5 billion rows
So far:
Updating DiaSource (filling new columns) took ~16 hours
The DiaForcedSource update has been running for over 6 days and is still in progress
Given that DiaForcedSource is ~70× larger than DiaSource, we estimate the full migration could take weeks or even months at this rate.
Is there any way to accelerate large-table migrations, e.g. by enabling parallel / multi-threaded data processing?
Any guidance—or plans to optimize dax_apdb_migrate for large-scale deployments—would be extremely valuable.
I’m afraid there is no way to parallelize it; the migration needs to be done in one transaction/session, and Postgres internally runs it as a single thread. Scaling is normally determined not by the number of rows but by the volume of data that Postgres has to rewrite. DiaForcedSource is much narrower than DiaSource, so even though it has ~70× the rows, it does not necessarily take that much longer. Is it a particular migration that takes this long (e.g. schema_1.0.0)?
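Since the cost scales with rewritten bytes rather than rows, it may be informative to compare the actual on-disk sizes of the two tables. A standard psql query for that (the quoted mixed-case table names are assumed to match your schema) would be something like:

```sql
-- Compare on-disk size (heap + indexes + TOAST) of the two tables.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('DiaSource', 'DiaForcedSource');
```

If DiaForcedSource is, say, only 10–20× the bytes of DiaSource, that gives a much more realistic scaling factor than the 70× row ratio.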
One potential way to accelerate things is better hardware on the server side, e.g. faster storage or more memory. Note that the Postgres-based APDB is not intended for production-scale data volumes; we built it only for testing purposes. In general, single-server RDBMS solutions do not work well with very large datasets, so if you plan to scale your setup even further you may want to look at Cassandra as a backend.
One more point: there may be server-side parameters that could improve Postgres performance. Your DBA could probably help you identify bottlenecks and change the relevant settings.
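For a one-off bulk rewrite like this, the settings most often worth reviewing are the memory and WAL/checkpoint parameters. A sketch of what a DBA might look at — the parameter names are standard Postgres settings, but the values below are illustrative placeholders, not recommendations for your hardware:

```
# postgresql.conf fragment -- illustrative values only
maintenance_work_mem = 2GB    # speeds up index rebuilds and VACUUM
max_wal_size = 16GB           # fewer forced checkpoints during bulk writes
checkpoint_timeout = 30min    # spreads checkpoint I/O over a longer window
shared_buffers = 8GB          # commonly ~25% of system RAM
synchronous_commit = off      # acceptable to relax for a restartable test migration
```

Most of these require a server restart or reload, so they are best changed before kicking off the migration rather than mid-run.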
Thanks for the detailed explanation! We’ll look into tuning Postgres parameters for now, and start evaluating Cassandra as a long-term backend for production-scale APDB.
The schema_1.0.0 migration is very heavy because it needs to recalculate the visit and detector columns from ccdVisitId, which is done on the client side. With 16.5 billion rows, that will take a very long time. The number of rows in the DiaForcedSource table is certainly a problem, not just for migration but also when querying that table. I believe the algorithms in the Rubin pipelines were modified to reduce the number of generated forced sources by doing forced photometry only on DiaObjects with nDiaSources > 1 (i.e. skipping false positives that appear only once).
Andy
EDIT: I checked the code, and it’s not 16.5B calculations happening on the client side; the bottleneck is still on the server side.
Sure! Here’s the output so far from the upgrade command:
2026-01-29 09:24:59,604 INFO schema_1_0_0_py - All table names: ['SSObject', 'DiaObject_To_Object_Match', 'alembic_version', 'DiaObject', 'metadata', 'DiaSource', 'DiaForcedSource']
2026-01-29 14:32:54,993 INFO schema_1_0_0_py - Found 1565508 distinct ccdVisitIds
2026-01-29 14:33:14,826 INFO schema_1_0_0_py - Found instrument: {instrument: 'WFC'}
2026-01-29 14:33:14,883 WARNING astro_metadata_translator.translator - WfcTranslator: observation_type is defined explicitly but will be replaced by _trivial_map
2026-01-29 14:33:14,886 INFO schema_1_0_0_py - creating migration_ccdvisit_map table
2026-01-29 14:33:54,541 INFO schema_1_0_0_py - inserted total 1565508 rows into migration_ccdvisit_map
2026-01-29 14:33:54,552 INFO schema_1_0_0_py - Filling new columns in DiaSource table
2026-01-30 06:48:11,510 INFO schema_1_0_0_py - Filling new columns in DiaForcedSource table
The last line was printed on Jan 30, and it’s still running (now Feb 6). No further logs since then.
Looks reasonable (not saying that one week is reasonable). Do you have access to monitoring information on the server side? I have no idea what kind of setup you have for Postgres, but we can try to figure out whether there are anomalies anywhere, including networking issues.
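Even without a dedicated monitoring stack, a few catalog queries run from a second psql session can show whether the migration backend is actively working or stuck waiting. These views are standard Postgres (`pg_stat_activity` has had the wait-event columns since 9.6):

```sql
-- What is the migration backend doing right now?
SELECT pid, state, wait_event_type, wait_event,
       now() - xact_start AS xact_age,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle';
```

A backend that shows `state = 'active'` with no wait event is CPU-bound; repeated `IO` or `Lock` wait events would point at storage or locking problems instead.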
We’re discussing internally how best to gather the system information you suggested (we don’t have a dedicated DBA or monitoring setup). We’ll get back to you as soon as we can.
Dear @caimx , I’m from Rubin’s Community Science Team, and I’m serving as the forum watcher this week. I just wanted to follow up to see whether you and your team need any additional assistance at the moment or if you’re still gathering information. No rush — I just wanted to check in!
At the moment, we’re letting the migration run as-is since it’s on a test database and we can afford to wait. We don’t have dedicated DBA support or extra bandwidth to investigate performance tuning right now, so we’ve decided to just monitor how long the full upgrade takes.
We really appreciate the offer of help — and may reach out again if we hit a blocker in the future!