The UK DAC plans to host non-LSST survey datasets (e.g. from VISTA near-IR surveys) within qserv alongside the LSST release databases. Some of these external survey databases contain tables large enough to require spatial partitioning, and, clearly, we need them to follow the same partitioning scheme as the LSST data to ensure that spatial joins involve data on the same machine.
The problem is that the VISTA and LSST objects may not be related in a way that readily maps onto the director-child relationship assumed for partitioning in qserv. For our current tests (which are using a UKIDSS data release), we are trying to fudge things by adding an extra column to the child table that will contain the value of the ID attribute for an Object (director) that is close to each Source (child).
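In SQL terms, the idea is roughly the following (a sketch only: the directorObjectId column name is illustrative, as is the 1 arcsec match radius; scisql_angSep returns the angular separation in degrees):

-- Tag each Source (child) with the ID of the nearest nearby Object (director),
-- so that partitioning by that ID colocates the Source with the Object's chunk.
UPDATE Source s
SET s.directorObjectId =
    (SELECT o.objectId
     FROM Object o
     WHERE scisql_angSep(s.ra, s.decl, o.ra, o.decl) < 0.000278
     ORDER BY scisql_angSep(s.ra, s.decl, o.ra, o.decl)
     LIMIT 1);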
That should work, but it is clunky, so we were wondering whether it might be possible to sidestep the director-child relationship and do the partitioning using the (ra, dec) ranges of the chunks directly. That is, if we have ingested an LSST data release partitioned spatially according to the positions in the Object table, are the (ra, dec) ranges of the chunks so defined recorded somewhere, such that we could then ingest our VISTA tables into corresponding chunks using those range values directly?
The VISTA Object tables would just need to have the same partitioning scheme (aka “chunking scheme”) as the LSST tables, but they would effectively be their own, mutually compatible, director (and may also have their own child tables), so that qserv can distribute the chunks to the servers in a colocated manner. As you’ve pointed out, director-child partitioning is really a reference partitioning scheme in which the child is colocated next to the director, which is spatially partitioned according to the chunking scheme. Spatial joins between the LSST Object and VISTA Object tables should then be efficient and executed on-node. Joining between child (Source) tables should be possible and efficient too, though qserv may need some work to handle that optimally.
I think this works right now in PDAC; the rest of the qserv team should chime in soon.
Many thanks for that prompt reply, Brian; much appreciated.
Can you point me to some documentation (or example config files) showing how we would specify the VISTA partitioning scheme to ensure that its chunks would get sent to the same nodes as the corresponding LSST chunks?
Assuming you are using the existing qserv-data-loader.py, it’s a matter of making sure that partitioning parameters “num-stripes”, “num-sub-stripes”, and “overlap” match for each invocation. Those three parameters determine the chunk-id to sky area mapping completely, and the chunk-id to worker mapping is persisted within the Qserv cluster itself where it is reused on each invocation of the data loader script.
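As an illustration (a sketch in the partitioner's config-file style, not a verbatim file; the values are yours to choose, but they must be identical for every catalog in the group), the shared part of the configuration would look something along these lines:

part = {
    num-stripes     = 340
    num-sub-stripes = 12
    overlap         = 0.01667
}

Per-table config files would then add only table-specific settings (position columns, ID column, and so on) on top of this shared block.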
Documentation is pretty sparse, but as a few suggestions you might look at:
As Bob has said, it will be difficult to apply a director-child relationship between the VISTA and LSST objects, so I have been running spatial join test queries between two identically partitioned tables.
I partitioned two tables from the UKIDSS data release (dxsSource and dxsDetection) and ingested the chunks into a 3-node Qserv instance (one head node and two worker nodes). For reference, the partitioning config files (common.cfg, dxsSource.cfg and dxsDetection.cfg) are attached. The two tables are partitioned and ingested with the same set of parameters.
Then I tried to perform a simple spatial join query between them:
SELECT s.sourceID, d.objID
FROM UKIDSSDR8_dxsSource.dxsSource s,
     UKIDSSDR8_dxsDetection.dxsDetection d
WHERE scisql_angSep(s.ra, s.decl, d.ra, d.decl) < 0.0167;
But Qserv complains:
ERROR 4110 (Proxy): Query processing error: QI=?: Failed to instantiate query: AnalysisError:Query involves partitioned table joins that Qserv does not know how to evaluate using only partition-local data.
I used one of the recent Qserv Docker containers (travis_master) for the test on a CentOS 7 machine. For reference, the log file of the proxy during the query is also attached. I’m wondering whether Qserv currently supports this kind of query, and if so, what have I done wrong?
I think this is an instance of a similar problem we ran into in the PDAC: the Qserv loader uses different partitioning scheme identifiers for each load even if the partitioning parameters are identical. We found it necessary to manually patch the identifier in the Qserv-internal metadata to enable this join to work. The Qserv developers can provide more detail.
For now, some minor surgery on the CSS tables is required. (Longer term, we’ll be adding the ability to reference the partitioning scheme of an already-loaded database explicitly at load time, so that the CSS ends up correct.)
@gapon, could you comment here with the details please?
Synchronizing partitioning schemes of catalogs in Qserv CSS is a straightforward operation. It has the following steps:
connect to the MariaDB service of the Qserv “czar” as user root:
mysql --protocol=tcp -hlocalhost -P 3306 -uroot -p****
for any catalog you wish to get synchronized with others, do the following (using catalog sdss_stripe82_01, which we have at NCSA/PDAC, as an example):
SELECT kvKey,kvVal FROM qservCssData.kvData WHERE kvKey='/DBS/sdss_stripe82_01/.packed.json';
This will produce something like this:
+---------------------------------+--------------------------------------------------------------------------------+
| kvKey                           | kvVal                                                                          |
+---------------------------------+--------------------------------------------------------------------------------+
| /DBS/wise_4band_00/.packed.json | {"partitioningId":"0000000026","releaseStatus":"RELEASED","storageClass":"L2"} |
+---------------------------------+--------------------------------------------------------------------------------+
update the kvVal of the other similar entries in CSS so that they all have the same string (see the example UPDATE after these steps).
ATTENTION: the above-mentioned output specifically refers to “partitioningId”:“0000000026”. In your case the identifier may be different. Just make sure all catalogs in your synchronized group have the same number, which you will see in your CSS. You may list all partitioning identifiers in your CSS, along with their partitioning parameters, with:
SELECT kvKey,kvVal FROM qservCssData.kvData WHERE kvKey LIKE '/PARTITIONING/%';
Also make sure the synchronized catalogs have the same values reported by that query. For example, the one I was using earlier would have:
| /PARTITIONING/_0000000026/.packed.json | {"nStripes":"340","nSubStripes":"12","overlap":"0.016670"} |
Restart your Qserv so that it reloads the updated CSS.
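For the update step above, the statement would look something like this (your_other_db is a placeholder for each other catalog in the group; paste in the exact kvVal string, with the partitioningId you actually see in your own CSS):

UPDATE qservCssData.kvData
SET kvVal='{"partitioningId":"0000000026","releaseStatus":"RELEASED","storageClass":"L2"}'
WHERE kvKey='/DBS/your_other_db/.packed.json';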
Yes, that is most likely the reason for this problem. You would need to manually colocate the chunks of your catalogs on the worker nodes. I should also let you know that I’ve been working on the Qserv Replication System, which also takes care of this problem. Unfortunately, the system hasn’t reached a level where it is easy to set up. I only have two customized setups of the system (next to Qserv): NCSA/PDAC and IN2P3 (the DM development cluster).
So, if you can, redistribute the chunks in your setup manually. Otherwise I will be able to help you set up the Replication System after I’m back from the LSST All Hands Meeting (next week, Aug 13-17).
Thanks. I’ll try to manually move the chunks first, as they are at a small scale currently. Could you give me some basic instructions for that? I think this also involves surgery on kvData, which I don’t fully understand.
First of all, you don’t need to make any further changes to CSS because of this chunk move. However, you will need to tell each Qserv node that its chunk list has been updated (after you finish moving the files and restart Qserv). I will give you instructions later.
Here is the idea of how to move the files. You need to look at each worker node’s MySQL directory where you have your databases. Let’s suppose you have 2 databases whose chunks need to be colocated:
<qserv-data-dir>/mysql/db1
<qserv-data-dir>/mysql/db2
And let’s suppose you have the following tables (which can be named differently) in those databases:
db1.Object
db1.Source
db2.OtherObject
db2.OtherSource
In that case you should see the following files for each chunk (using chunk number 123 as an example) in the corresponding mysql directory of database db1:
.../mysql/db1/Object_123.frm
.../mysql/db1/Object_123.MYD
.../mysql/db1/Object_123.MYI
.../mysql/db1/ObjectFullOverlap_123.frm
.../mysql/db1/ObjectFullOverlap_123.MYD
.../mysql/db1/ObjectFullOverlap_123.MYI
.../mysql/db1/Source_123.frm
.../mysql/db1/Source_123.MYD
.../mysql/db1/Source_123.MYI
.../mysql/db1/SourceFullOverlap_123.frm
.../mysql/db1/SourceFullOverlap_123.MYD
.../mysql/db1/SourceFullOverlap_123.MYI
And a similar picture will hold for the second database:
.../mysql/db2/OtherObject_123.frm
.../mysql/db2/OtherObject_123.MYD
.../mysql/db2/OtherObject_123.MYI
.../mysql/db2/OtherObjectFullOverlap_123.frm
.../mysql/db2/OtherObjectFullOverlap_123.MYD
.../mysql/db2/OtherObjectFullOverlap_123.MYI
.../mysql/db2/OtherSource_123.frm
.../mysql/db2/OtherSource_123.MYD
.../mysql/db2/OtherSource_123.MYI
.../mysql/db2/OtherSourceFullOverlap_123.frm
.../mysql/db2/OtherSourceFullOverlap_123.MYD
.../mysql/db2/OtherSourceFullOverlap_123.MYI
Now, what you need is to move (not copy!) those files next to each other, so that all files of chunk 123 end up on the same worker node:
.../mysql/db1/*_123.*
.../mysql/db2/*_123.*
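As a shell sketch (the hostname worker1 and the paths are illustrative, and this assumes the MySQL/Qserv services on both workers are stopped while the files are in flight):

# On the worker that should give up chunk 123 of db2: push all of the
# chunk's files to the worker that already holds chunk 123 of db1,
# then remove the originals so the chunk lives in exactly one place.
cd <qserv-data-dir>/mysql/db2
scp *_123.* worker1:<qserv-data-dir>/mysql/db2/
rm *_123.*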
NOTES:
Please ignore the special chunk number 1234567890. This chunk should already be present on all worker nodes, for all databases and tables.
The chunk colocation requirement only applies to chunk numbers which are present in both databases; you need not care about the other chunks. Though, quite frankly, I’m not sure what Qserv will do if you try to do JOINs in those areas where the other database doesn’t have any data.
And it’s up to you to decide how to balance chunks across worker nodes. It doesn’t matter much, except that you want a more-or-less equal spread of chunks across the workers, to prevent some of them being overloaded and others underutilized while processing shared-scan queries.
The next step will be to tell Qserv that you moved the files. I will explain this in a separate message.