IVOA Interop meeting notes (May 2016)

IVOA Meeting Summary

These are some of my notes from the IVOA Interop meeting in South Africa. I mainly attended Data Access Layer (DAL) and Grid and Web Services (GWS) talks, and because of the parallel tracks I missed some others. I’m probably leaving quite a bit out, so feel free to ask questions about specific things and I’ll answer in the comments. My actual talk is linked in this DM Highlights post.

Motivation for attending the meeting

LSST has several interests in VO interfaces and their implementations. First off, it is imperative that we support standardized interfaces for data access so that our users can reuse community-standard tools, such as TOPCAT. Secondly, for Data Access, HTTP interfaces are an extremely flexible way of providing and managing data access both inside a datacenter and across the internet, and the IVOA has several standards we could potentially leverage where they make sense. Finally, it’s in our best interest to reuse data access interface implementations and code where we can in order to reduce our workload and hopefully improve the existing body of work where we can.

Implementations

IRSA/IPAC

IRSA is likely the largest repository of VO-accessible data in terms of data volume, as many legacy catalogs are available in addition to WISE data. Most of the VO code is in C++.

SIA (v1)
  • ~10k queries a day for most of 2015, bursting up to 30k average since February.
SCS
  • 150 - 300 unique IPs across 100-150 subnets.
  • 15k average queries/day, with a spike up to 225k queries a day for a few months (one user)
  • Small search radius queries are run in-process; larger queries are farmed out to SLURM.
TAP
  • ~80k-120k average queries/day over a year, mostly North America
    • Completely dominated by one NEOWISE-R user, otherwise average of 300-10k queries a day.
  • sync: 20-170 IPs, 10-70 subnets
  • async: 21 IPs, 12 subnets
    • Germans (ab)used this to tile WISE data in search for Planet Nine
    • async not used as much as sync
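For anyone who hasn’t looked at TAP, the sync/async split above comes down to which endpoint the same query is posted to. Here is a minimal sketch using the standard TAP parameters (REQUEST, LANG, QUERY). The base URL and the table name are placeholders I made up, not IRSA’s, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute a real TAP service.
TAP_BASE = "https://example.org/tap"


def tap_request(adql, mode="sync"):
    """Return (url, form_body) for submitting an ADQL query to a TAP service.

    mode="sync"  -> the response to the POST is the result table itself.
    mode="async" -> the POST creates a job that is then started and polled.
    """
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{TAP_BASE}/{mode}", urlencode(params)


# "my_catalog" is a made-up table name.
url, body = tap_request("SELECT TOP 10 ra, dec FROM my_catalog")
print(url)
print(body)
```

With async, the client additionally has to start the job (PHASE=RUN), poll it, and fetch the result afterwards, which may be part of why sync sees more traffic for short queries.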
ADQL

IRSA plans on submitting a variant of ADQL with restricted geometry for standardization (could be useful to collaborate here)

Registry

Due to the variety of data they have across several experiments, they have decided to run their own registry, but have had quite a bit of pain in doing so.

CADC

The Data Center at CADC is heavily built around VO standards and interfaces. Most of the software is implemented in Java, and quite a bit of work goes into implementing most of the VO standards wherever possible. In general, some people use CADC implementations as reference implementations (e.g. NED).

SIA v2

SIA v2 is heavily oriented around SODA, a standard that more formally describes how transformations, or more generally, server actions, can be implemented. CADC has a production SODA implementation that can operate on multi-dimensional data, reducing data according to spatial, spectral, temporal and polarization parameters.
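To make the four cutout axes concrete, here is a rough sketch of building a SODA request URL. The parameter names (ID, CIRCLE, BAND, TIME, POL) are the ones in the SODA 1.0 Recommendation, which was finalized after this meeting, so the draft discussed here may have differed. The endpoint URL and the dataset ID are made up, and this is not CADC’s code.

```python
from urllib.parse import urlencode

# Hypothetical SODA sync endpoint.
SODA_SYNC = "https://example.org/soda/sync"


def soda_cutout_url(dataset_id, ra, dec, radius, band=None, time=None, pol=None):
    """Build a SODA cutout URL.

    ra, dec, radius : circle on the sky, in degrees (spatial axis)
    band            : (min, max) wavelength in metres (spectral axis)
    time            : (min, max) in MJD (temporal axis)
    pol             : list of polarization states, e.g. ["I", "Q"]
    """
    params = [("ID", dataset_id), ("CIRCLE", f"{ra} {dec} {radius}")]
    if band is not None:
        params.append(("BAND", f"{band[0]} {band[1]}"))
    if time is not None:
        params.append(("TIME", f"{time[0]} {time[1]}"))
    if pol is not None:
        # One POL parameter per requested state.
        params.extend(("POL", state) for state in pol)
    return f"{SODA_SYNC}?{urlencode(params)}"


# The dataset ID below is a made-up identifier.
print(soda_cutout_url("ivo://example/data?obs1", 12.0, 34.0, 0.5,
                      band=(5e-7, 5.5e-7), pol=["I", "Q"]))
```

Each optional axis is simply left out of the request when the caller does not want to cut along it.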

TAP

CADC is currently implementing TAP 1.1 as the standard progresses. They export both CAOM2 and ObsCore 1.1 archive metadata through their TAP interfaces. They have additional endpoints for the same TAP service which are specific to authentication schemes (i.e. anon, username+password, x509).

Other news from IVOA

CDS/VizieR

CDS has put a lot of work into their web portal, which would be interesting for the SUIT team.

CASDA

CASDA (CSIRO ASKAP Science Data Archive) is implementing SODA for their images, but they’ve only implemented async as they have quite a bit of data on tape. Their service returns full images or can perform spatial and plane cutouts.

Cosmopterix

  • Docker containers for database platforms. This is intended to quickly stand up, test, and/or validate TAP and ADQL platforms.

Data Formats

  • Tom McGlynn had a good talk on issues with VOTable validation. Several services aren’t actually producing proper VOTable output. TOPCAT has to be extremely lenient in what it accepts.
  • Mark Taylor is working on TAPLint to validate TAP services (and VOTable output); this could be extremely useful for us as we’re working on our own implementation.
  • Consensus that HDF5 is important, no consensus on what to do about it in the current term.

Collaborations

Qserv

  • Stelios Voutsinas, Dave Morris (both at Edinburgh) are interested in Qserv for Cosmopterix

  • Dave Morris is interested in running a Qserv Cluster for testing

    • Both were notified this may be possible around fall
  • Stelios and Dave also have a history of ADQL queries from their services. I believe Gregory Mantelet (GAVO) does too.

    • We would like to mine that data and understand common query patterns, maybe use a parser to identify structure. This could help inform Qserv team as well as future ADQL implementations
  • Matthew Graham interested in Qserv as well (not sure if this is for Caltech or AURA)
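As a rough idea of what mining the query history could look like before reaching for a real parser, here is a sketch that masks literals so that queries differing only in their constants collapse into one "shape", then counts the shapes. The regexes are a crude stand-in for an ADQL parser, and the sample queries, table name and column names are invented for illustration.

```python
import re
from collections import Counter

# String literals: single-quoted, with '' as an escaped quote.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Numeric literals that are not part of an identifier such as "w1mpro".
NUMBER_LITERAL = re.compile(r"(?<![\w.])[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def query_shape(adql):
    """Reduce a query to a rough shape by masking literals with '?'."""
    shape = adql.strip().rstrip(";")
    shape = STRING_LITERAL.sub("?", shape)
    shape = NUMBER_LITERAL.sub("?", shape)
    shape = re.sub(r"\s+", " ", shape)   # collapse whitespace
    return shape.upper()                 # treat identifiers case-insensitively


# Made-up log entries.
log = [
    "SELECT TOP 10 ra, dec FROM wise WHERE w1mpro < 12.5",
    "select top 500 ra, dec   from wise where w1mpro < 9",
    "SELECT * FROM wise WHERE designation = 'J0000+0000'",
]

for shape, count in Counter(query_shape(q) for q in log).most_common():
    print(count, shape)
```

Something this simple would already show which query patterns dominate; a proper parser would be needed to look at joins, geometry functions and subqueries.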

TAP

  • Mark Taylor says if LSST implemented TAP many, many people would be really happy

    • Doesn’t think it makes sense to bother with SIA right now because dust hasn’t settled
  • Consensus that you can go ahead and implement whichever response formats you want from TAP, just understand they might not become a standard

  • Near universal agreement there should be some JSON output from TAP

    • But no agreement on whether that should be a 1:1 mapping from VOTable+XML, or more of an enhanced version of CSV output
  • Walter Landry has some statistics on the popularity of IPAC’s catalogs

    • Walter is curious about Simple Cone Search response time (Serge advertised 30ms)
  • No real traction for officially adding LIMIT keyword to ADQL (TOP is descended from SQL Server)

  • Christophe Arviset enthusiastically asked during the talk about LSST contributions to the TAP/ADQL specifications. I mentioned it’s probably too late in this cycle of TAP/ADQL to do much about anything, and that TAP 2/ADQL 3 would be our target if anything.
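To illustrate the two camps in the JSON discussion above, here is a small sketch that serializes the same two-column result both ways. Neither layout is an IVOA format or anyone’s actual proposal; the key names and the sample values are my own invention, just to show the trade-off.

```python
import json

# Column metadata as VOTable would carry it in FIELD elements.
fields = [
    {"name": "ra", "datatype": "double", "unit": "deg", "ucd": "pos.eq.ra"},
    {"name": "dec", "datatype": "double", "unit": "deg", "ucd": "pos.eq.dec"},
]
rows = [[10.68, 41.27], [83.82, -5.39]]  # made-up values


def votable_like(fields, rows):
    """Keep the full column metadata next to positional row data."""
    return {"table": {"fields": fields, "data": rows}}


def csv_like(fields, rows):
    """Keep only column names, one object per row; units and UCDs are lost."""
    names = [f["name"] for f in fields]
    return [dict(zip(names, row)) for row in rows]


print(json.dumps(votable_like(fields, rows), indent=2))
print(json.dumps(csv_like(fields, rows), indent=2))
```

The first form preserves everything a VO client needs but is awkward for generic web tooling; the second is trivial to consume but drops the metadata that makes the table self-describing.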
