The UK DAC plans to host non-LSST survey datasets (e.g. from VISTA near-IR surveys) within qserv alongside the LSST release databases. Some of these external survey databases contain tables large enough to require spatial partitioning, and, clearly, we need them to follow the same partitioning scheme as the LSST data to ensure that spatial joins involve data on the same machine.
The problem is that the VISTA and LSST objects may not be related in a way that readily maps onto the director-child relationship assumed for partitioning in qserv. For our current tests (which are using a UKIDSS data release), we are trying to fudge things by adding an extra column to the child table that will contain the value of the ID attribute for an Object (director) that is close to each Source (child).
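In SQL terms, the idea is roughly the following (a sketch only: the directorObjectId column name is illustrative, as is the 1 arcsec match radius; scisql_angSep returns the angular separation in degrees):

-- Tag each Source (child) with the ID of the nearest nearby Object (director),
-- so that partitioning by that ID colocates the Source with the Object's chunk.
UPDATE Source s
SET s.directorObjectId =
    (SELECT o.objectId
     FROM Object o
     WHERE scisql_angSep(s.ra, s.decl, o.ra, o.decl) < 0.000278
     ORDER BY scisql_angSep(s.ra, s.decl, o.ra, o.decl)
     LIMIT 1);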
That should work, but it is clunky, so we were wondering whether it might be possible to sidestep the director-child relationship and do the partitioning using the (ra, dec) ranges of the chunks directly. That is, if we have ingested an LSST data release partitioned spatially according to the positions in the Object table, are the (ra, dec) ranges of the chunks so defined recorded somewhere, such that we could then ingest our VISTA tables into corresponding chunks using those range values directly?
The VISTA Object tables would just need to have the same partitioning scheme (aka “chunking scheme”) as the LSST tables, but they would effectively be their own, mutually compatible, director (and may also have their own child tables), so that qserv can distribute the chunks to the servers in a colocated manner. As you’ve pointed out, director-child partitioning is really a reference partitioning scheme in which the child is colocated next to the director, which is spatially partitioned according to the chunking scheme. Spatial joins between the LSST Object and VISTA Object tables should then be efficient and executed on-node. Joining between child (Source) tables should be possible and efficient too, though qserv may need some work to handle that optimally.
I think this works right now in PDAC; the rest of the qserv team should chime in soon.
Many thanks for that prompt reply, Brian; much appreciated.
Can you point me to some documentation (or example config files) showing how we would specify the VISTA partitioning scheme to ensure that its chunks would get sent to the same nodes as the corresponding LSST chunks?
Assuming you are using the existing qserv-data-loader.py, it’s a matter of making sure that partitioning parameters “num-stripes”, “num-sub-stripes”, and “overlap” match for each invocation. Those three parameters determine the chunk-id to sky area mapping completely, and the chunk-id to worker mapping is persisted within the Qserv cluster itself where it is reused on each invocation of the data loader script.
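As an illustration (a sketch in the partitioner's config-file style, not a verbatim file; the values are yours to choose, but they must be identical for every catalog in the group), the shared part of the configuration would look something along these lines:

part = {
    num-stripes     = 340
    num-sub-stripes = 12
    overlap         = 0.01667
}

Per-table config files would then add only table-specific settings (position columns, ID column, and so on) on top of this shared block.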
Documentation is pretty sparse, but as a few suggestions you might look at:
As Bob has said, it will be difficult to apply a director-child relationship between the VISTA and LSST objects, so I have been running spatial join test queries between two identically partitioned tables.
I partitioned two tables from the UKIDSS data release (dxsSource and dxsDetection) and ingested the chunks into a 3-node Qserv instance (one head node and two worker nodes). For reference, the partitioning config files (common.cfg, dxsSource.cfg and dxsDetection.cfg) are attached. The two tables are partitioned and ingested with the same set of parameters.
Then I tried to perform a simple spatial join query between them:
SELECT s.sourceID, d.objID
FROM UKIDSSDR8_dxsSource.dxsSource s,
     UKIDSSDR8_dxsDetection.dxsDetection d
WHERE scisql_angSep(s.ra, s.decl, d.ra, d.decl) < 0.0167;
But Qserv complains:
ERROR 4110 (Proxy): Query processing error: QI=?: Failed to instantiate query: AnalysisError:Query involves partitioned table joins that Qserv does not know how to evaluate using only partition-local data.
I used one of the recent Qserv Docker containers (travis_master) for the test on a CentOS 7 machine. For reference, the log file of the proxy during the query is also attached. I’m wondering whether Qserv currently supports this kind of query, and if so, what have I done wrong?
I think this is an instance of a similar problem we ran into in the PDAC: the Qserv loader uses different partitioning scheme identifiers for each load even if the partitioning parameters are identical. We found it necessary to manually patch the identifier in the Qserv-internal metadata to enable this join to work. The Qserv developers can provide more detail.
For now, some minor surgery on the CSS tables is required. (Longer term, we’ll be adding the ability to reference the partitioning scheme of an already-loaded database explicitly at load time, so that the CSS ends up correct.)
@gapon, could you comment here with the details please?
Synchronizing partitioning schemes of catalogs in Qserv CSS is a straightforward operation. It has the following steps:
connect to the MariaDB service of the Qserv “czar” as user root:
mysql --protocol=tcp -hlocalhost -P 3306 -uroot -p****
for any catalog you wish to get synchronized with others, do the following (using catalog sdss_stripe82_01, which we have at NCSA/PDAC, as an example):
SELECT kvKey,kvVal FROM qservCssData.kvData WHERE kvKey='/DBS/sdss_stripe82_01/.packed.json';
This will produce something like this:
+---------------------------------+--------------------------------------------------------------------------------+
| kvKey                           | kvVal                                                                          |
+---------------------------------+--------------------------------------------------------------------------------+
| /DBS/wise_4band_00/.packed.json | {"partitioningId":"0000000026","releaseStatus":"RELEASED","storageClass":"L2"} |
+---------------------------------+--------------------------------------------------------------------------------+
update the kvVal of the other similar entries in CSS so that they all have the same string (see the example UPDATE after these steps).
ATTENTION: the above-mentioned output specifically refers to “partitioningId”:“0000000026”. In your case the identifier may be different. Just make sure all catalogs in your synchronized group have the same number, which you will see in your CSS. You may list all partitioning identifiers in your CSS, along with their partitioning parameters, with:
SELECT kvKey,kvVal FROM qservCssData.kvData WHERE kvKey LIKE '/PARTITIONING/%';
Also make sure the synchronized catalogs have the same values reported by that query. For example, the one I was using earlier would have:
| /PARTITIONING/_0000000026/.packed.json | {"nStripes":"340","nSubStripes":"12","overlap":"0.016670"} |
Restart your Qserv so that it reloads the updated CSS.
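For the update step above, the statement would look something like this (your_other_db is a placeholder for each other catalog in the group; paste in the exact kvVal string, with the partitioningId you actually see in your own CSS):

UPDATE qservCssData.kvData
SET kvVal='{"partitioningId":"0000000026","releaseStatus":"RELEASED","storageClass":"L2"}'
WHERE kvKey='/DBS/your_other_db/.packed.json';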
Yes, that is most likely the reason for this problem. You would need to manually colocate the chunks of your catalogs on the worker nodes. I should also let you know that I’ve been working on the Qserv Replication System, which also takes care of this problem. Unfortunately, the system hasn’t reached a level where it is easy to set up. I only have two customized setups of the system (next to Qserv): NCSA/PDAC and IN2P3 (the DM development cluster).
So, if you can, redistribute the chunks in your setup manually. Otherwise I will be able to help you set up the Replication System after I’m back from the LSST All Hands Meeting (next week, Aug 13-17).
Thanks. I’ll try to manually move the chunks first, as they are at a small scale currently. Could you give me some basic instructions for that? I think this also involves surgery on kvData, which I don’t fully understand.
First of all, you don’t need to make any further changes to CSS because of this chunk move. However, you will need to tell each Qserv node that its chunk list has been updated (after you finish moving the files and restart Qserv). I will give you instructions later.
Here is the idea of how to move the files. You need to look at each worker node’s MySQL directory where you have your databases. Let’s suppose you have 2 databases whose chunks need to be colocated:
<qserv-data-dir>/mysql/db1
<qserv-data-dir>/mysql/db2
And let’s suppose you have the following tables (which can be named differently) in those databases:
db1.Object
db1.Source
db2.OtherObject
db2.OtherSource
In that case you should see the following files for each chunk (using chunk number 123 as an example) in the corresponding mysql directory of database db1:
.../mysql/db1/Object_123.frm
.../mysql/db1/Object_123.MYD
.../mysql/db1/Object_123.MYI
.../mysql/db1/ObjectFullOverlap_123.frm
.../mysql/db1/ObjectFullOverlap_123.MYD
.../mysql/db1/ObjectFullOverlap_123.MYI
.../mysql/db1/Source_123.frm
.../mysql/db1/Source_123.MYD
.../mysql/db1/Source_123.MYI
.../mysql/db1/SourceFullOverlap_123.frm
.../mysql/db1/SourceFullOverlap_123.MYD
.../mysql/db1/SourceFullOverlap_123.MYI
And a similar picture will hold for the second database:
.../mysql/db2/OtherObject_123.frm
.../mysql/db2/OtherObject_123.MYD
.../mysql/db2/OtherObject_123.MYI
.../mysql/db2/OtherObjectFullOverlap_123.frm
.../mysql/db2/OtherObjectFullOverlap_123.MYD
.../mysql/db2/OtherObjectFullOverlap_123.MYI
.../mysql/db2/OtherSource_123.frm
.../mysql/db2/OtherSource_123.MYD
.../mysql/db2/OtherSource_123.MYI
.../mysql/db2/OtherSourceFullOverlap_123.frm
.../mysql/db2/OtherSourceFullOverlap_123.MYD
.../mysql/db2/OtherSourceFullOverlap_123.MYI
Now, what you need is to move (not copy!) those files next to each other, so that all files of chunk 123 end up on the same worker node:
.../mysql/db1/*_123.*
.../mysql/db2/*_123.*
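As a shell sketch (the hostname worker1 and the paths are illustrative, and this assumes the MySQL/Qserv services on both workers are stopped while the files are in flight):

# On the worker that should give up chunk 123 of db2: push all of the
# chunk's files to the worker that already holds chunk 123 of db1,
# then remove the originals so the chunk lives in exactly one place.
cd <qserv-data-dir>/mysql/db2
scp *_123.* worker1:<qserv-data-dir>/mysql/db2/
rm *_123.*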
NOTES:
Please ignore the special chunk number 1234567890. This chunk should already be present on all worker nodes, for all databases and tables.
The chunk colocation requirement only applies to chunk numbers which are present in both databases; you need not care about the other chunks. Though, quite frankly, I’m not sure what Qserv will do if you try to do JOINs in those areas where the other database doesn’t have any data.
And it’s up to you to decide how to balance chunks across worker nodes. It doesn’t matter much, except that you want a more-or-less equal spread of chunks across the workers, to prevent some of them being overloaded and others underutilized while processing shared-scan queries.
The next step will be to tell Qserv that you moved the files. I will explain this in a separate message.