The transientBucket
table has become a little unruly and can be difficult to manage and navigate. I’ve therefore sliced this table up in to various sub-tables. Incidentally this has helped me generalise the marshall so that the same core-tables can be used by multiple survey webapps, each survey having its own set of book-keeping tables much like the current pesstoObjects
table.
The new core tables are:
marshall_sources
- one row for each unique source in the marshall with RA, DEC and Sherlock Classificationmarshall_sources_discoveries
- discovery information for each transient from each of the surveys that have ‘discovered’ itmarshall_sources_photometry
- all survey photometry info for each transient, including non-detection limitsmarshall_sources_spectra
- all spectral info for each transientmarshall_sources_related_files
- a list of related files associated with each transient
Table of Contents
Refactoring and Cleaning
Before slicing the table up it was important to remove any duplicate entries from the transientBucket
table. Here’s the query I have been using to find duplicates:
SELECT DISTINCT
transientBucketId, survey, name, spectralType
FROM
transientBucket
WHERE
name IN (SELECT
name
FROM
(SELECT
COUNT(*) AS count, transientBucketId, name, survey
FROM
(SELECT DISTINCT
transientBucketId, name, survey
FROM
transientBucket
WHERE
survey NOT IN ('atel-coords' , 'atel-names', 'bright sn list', 'atel', 'pessto', 'TNS', 'ATLAS FP')
AND name NOT LIKE 'AT2%'
AND name NOT LIKE 'SN20%'
AND name NOT LIKE 'SN20%'
AND name NOT LIKE 'SN21%'
AND name NOT LIKE 'SN22%'
and spectralType is null
ORDER BY name) AS c
GROUP BY name) d
WHERE
count > 1)
AND survey NOT IN ('atel-coords' , 'atel-names',
'bright sn list',
'atel',
'pessto',
'TNS',
'ATLAS FP')
and spectralType is null
ORDER BY name , dateCreated;
The table is now free of duplicates.
I’ve also made an attempt at homogenising survey names across the marshall as a survey name entered by a user or from other external sources often does not match the survey names used by the automated import scripts. Here’s the current survey cleaning script I run before syncing the transientBucket with the new core-tables:
update transientBucket set survey = "CRTS-SSS" where survey = "CRTS" and name like "SSS%";
update transientBucket set survey = "CRTS-MLS" where survey = "CRTS" and name like "MLS%";
update transientBucket set survey = "CRTS-CSS" where survey = "CRTS" and name like "CSS%";
update transientBucket set survey = "CRTS-SSS" where survey = "CRTS-SSS" and name like "SSS%";
update transientBucket set survey = "CRTS-MLS" where survey = "CRTS-MLS" and name like "MLS%";
update transientBucket set survey = "CRTS-CSS" where survey = "CRTS-CSS" and name like "CSS%";
update transientBucket set survey = "CBET" where survey = "" or survey = "unknown" and name like "SN%";
update transientBucket set survey = "Gaia" where survey = "" or survey = "unknown" and name like "Gaia%";
update transientBucket set survey = "ASASSN" where survey = "" or survey = "unknown" and name like "ASASSN%";
update transientBucket set survey = "TOCP" where survey = "" or survey = "unknown" and name like "PSN%";
update transientBucket set survey = "MASTER" where survey = "" or survey = "unknown" and name like "MASTER%";
update transientBucket set survey = "LSQ" where survey = "" or survey = "unknown" and name like "LSQ%";
update transientBucket set survey = "PS1" where survey = "" or survey = "unknown" and name like "PS1%";
update transientBucket set survey = "DECam" where name in ("EMILIA", "GRETA", "GRETA_2", "PAMELA");
update transientBucket set survey = "amateur" where survey like "amateur%";
update transientBucket set survey = "Berto Monard" where survey like "%Monard%";
update transientBucket set survey = "ASASSN" where survey like "ASAS%";
update transientBucket set survey = "CBET" where survey like "cbat%";
update transientBucket set survey = "CBET" where survey like "CBET%";
update transientBucket set survey = "Gaia" where survey = "GaiaAlerts";
update transientBucket set survey = "MASTER" where survey = "master";
update transientBucket set survey = "TOCP" where survey = "TOPC";
update transientBucket set survey = "TNTS" where survey like "TNTS%";
update transientBucket set survey = "CHASE" where survey = "chase";
update transientBucket set survey = "OGLE" where survey = "OGLE-IV";
update transientBucket set survey = "Pan-STARRS1" where survey = "PanSTARRS";
update transientBucket set survey = "TOCP" where survey = "tocp";
update transientBucket set survey = "Pan-STARRS1" where survey = "PSST";
update transientBucket set survey = "Pan-STARRS1" where survey like "PS1%";
update transientBucket set survey = "CRTS" where name like "MLS%" and spectralType is null;
update transientBucket set survey = "MASTER" where name like "MASTER%" and spectralType is null;
update transientBucket set survey = "TOCP" where name like "PSN%" and spectralType is null;
Table Creation and Sync
Here is the sync_transientbucket_to_core_tables.sql script and a summary of what’s going on:
marshall_sources
Here’s the command to sync transientBucket
> marshall_sources
INSERT IGNORE into marshall_sources (marshallId,raDeg,decDeg,dateCreated,sherlockClassification)
SELECT transientBucketId,raDeg,decDeg,dateCreated,sherlockClassification from transientBucket t where masterIdFlag = 1
ON DUPLICATE KEY UPDATE marshallId=t.transientBucketId,raDeg=t.raDeg,decDeg=t.decDeg,dateCreated=t.dateCreated,sherlockClassification=t.sherlockClassification;
Note, on first import of this table there are 32,000 unique sources
marshall_sources_discoveries
The marshall_sources_discoveries
table hosts the survey specific data for a transient such as URL, survey name etc. The survey name of the transient must be unique in this table.
Here’s the command to export data from the transientBucket
into the marshall_sources_discoveries
table. Note this command tries to always update the column value with non-null entries from the transientBucket
.
INSERT IGNORE into marshall_sources_discoveries (marshallId,surveyId,survey,discoveryPhase,surveyObjectUrl,transientTypePrediction,transientTypePredicationSource,hostRedshift,hostRedshiftType,referenceImageUrl,targetImageUrl,subtractedImageUrl,tripletImageUrl,finderImageUrl,lightcurveURL,dateCreated,masterId,raDeg,decDeg)
select transientBucketId,name,survey,discoveryPhase,surveyObjectUrl,transientTypePrediction,transientTypePredicationSource,hostRedshift,hostRedshiftType,referenceImageUrl,targetImageUrl,subtractedImageUrl,tripletImageUrl,finderImageUrl,lightcurveURL,dateCreated,masterIdFlag,raDeg,decDeg from transientBucket t
where survey not in ("atel","atel-coords","atel-names","ATLAS FP","bright sn list","TNS","LSQ discoveries","non-pessto atel")
ON DUPLICATE KEY UPDATE
marshallId = IF(t.transientBucketId is not null, VALUES(marshallId), marshall_sources_discoveries.marshallId),
surveyId = IF(t.name is not null, VALUES(surveyId), marshall_sources_discoveries.surveyId),
survey = IF(t.survey is not null, VALUES(survey), marshall_sources_discoveries.survey),
discoveryPhase = IF(t.discoveryPhase is not null, VALUES(discoveryPhase), marshall_sources_discoveries.discoveryPhase),
surveyObjectUrl = IF(t.surveyObjectUrl is not null, VALUES(surveyObjectUrl), marshall_sources_discoveries.surveyObjectUrl),
transientTypePrediction = IF(t.transientTypePrediction is not null, VALUES(transientTypePrediction), marshall_sources_discoveries.transientTypePrediction),
transientTypePredicationSource = IF(t.transientTypePredicationSource is not null, VALUES(transientTypePredicationSource), marshall_sources_discoveries.transientTypePredicationSource),
hostRedshift = IF(t.hostRedshift is not null, VALUES(hostRedshift), marshall_sources_discoveries.hostRedshift),
hostRedshiftType = IF(t.hostRedshiftType is not null, VALUES(hostRedshiftType), marshall_sources_discoveries.hostRedshiftType),
referenceImageUrl = IF(t.referenceImageUrl is not null, VALUES(referenceImageUrl), marshall_sources_discoveries.referenceImageUrl),
targetImageUrl = IF(t.targetImageUrl is not null, VALUES(targetImageUrl), marshall_sources_discoveries.targetImageUrl),
subtractedImageUrl = IF(t.subtractedImageUrl is not null, VALUES(subtractedImageUrl), marshall_sources_discoveries.subtractedImageUrl),
tripletImageUrl = IF(t.tripletImageUrl is not null, VALUES(tripletImageUrl), marshall_sources_discoveries.tripletImageUrl),
finderImageUrl = IF(t.finderImageUrl is not null, VALUES(finderImageUrl), marshall_sources_discoveries.finderImageUrl),
lightcurveURL = IF(t.lightcurveURL is not null, VALUES(lightcurveURL), marshall_sources_discoveries.lightcurveURL),
dateCreated = IF(t.dateCreated is not null, VALUES(dateCreated), marshall_sources_discoveries.dateCreated),
masterId = IF(t.masterIdFlag is not null, VALUES(masterId), marshall_sources_discoveries.masterId),
raDeg = IF(t.raDeg is not null, VALUES(raDeg), marshall_sources_discoveries.raDeg),
decDeg = IF(t.decDeg is not null, VALUES(decDeg), marshall_sources_discoveries.decDeg);
The first export resulted in 30,000 new rows. The 2,000 difference from the marshall_sources
and marshall_sources_discoveries
is mainly due to old sources from the TNS not having an internal survey name.
marshall_sources_photometry
Here’s the command to export from the transientBucket
to marshall_sources_photometry
:
INSERT IGNORE INTO marshall_sources_photometry (marshallId,surveyId,observationDate,observationMJD,magnitude,magnitudeError,filter,telescope,instrument,dateDeleted,limitingMag,dateCreated)
select transientBucketId,name,observationDate,observationMJD,magnitude,magnitudeError,filter,telescope,instrument,dateDeleted,limitingMag,dateCreated from transientBucket t where survey not in ("atel","atel-coords","atel-names","ATLAS FP","bright sn list","TNS","LSQ discoveries","non-pessto atel") and magnitude is not null
ON DUPLICATE KEY UPDATE
marshallId = IF(t.transientBucketId is not null, VALUES(marshallId), marshall_sources_photometry.marshallId),
observationDate = IF(t.observationDate is not null, VALUES(observationDate), marshall_sources_photometry.observationDate),
magnitudeError = IF(t.magnitudeError is not null, VALUES(magnitudeError), marshall_sources_photometry.magnitudeError),
filter = IF(t.filter is not null, VALUES(filter), marshall_sources_photometry.filter),
telescope = IF(t.telescope is not null, VALUES(telescope), marshall_sources_photometry.telescope),
instrument = IF(t.instrument is not null, VALUES(instrument), marshall_sources_photometry.instrument),
dateDeleted = IF(t.dateDeleted is not null, VALUES(dateDeleted), marshall_sources_photometry.dateDeleted),
limitingMag = IF(t.limitingMag is not null, VALUES(limitingMag), marshall_sources_photometry.limitingMag),
dateCreated = IF(t.dateCreated is not null, VALUES(dateCreated), marshall_sources_photometry.dateCreated);
The first export resulted in 365,000 new rows.
marshall_sources_spectra
Here’s the spectra export command:
INSERT IGNORE INTO marshall_sources_spectra (marshallId,surveyId,survey,observationDate,observationMJD,transientRedshift,transientRedshiftNotes,spectralType,telescope,instrument,reducer,dateCreated,classificationWRTMax,classificationPhase)
select transientBucketId,name,survey,observationDate,observationMJD,transientRedshift,transientRedshiftNotes,spectralType,telescope,instrument,reducer,dateCreated,classificationWRTMax,classificationPhase from transientBucket t where spectralType is not null
ON DUPLICATE KEY UPDATE
marshallId = IF(t.transientBucketId is not null, VALUES(marshallId), marshall_sources_spectra.marshallId),
survey = IF(t.survey is not null, VALUES(survey), marshall_sources_spectra.survey),
observationDate = IF(t.observationDate is not null, VALUES(observationDate), marshall_sources_spectra.observationDate),
transientRedshift = IF(t.transientRedshift is not null, VALUES(transientRedshift), marshall_sources_spectra.transientRedshift),
transientRedshiftNotes = IF(t.transientRedshiftNotes is not null, VALUES(transientRedshiftNotes), marshall_sources_spectra.transientRedshiftNotes),
spectralType = IF(t.spectralType is not null, VALUES(spectralType), marshall_sources_spectra.spectralType),
telescope = IF(t.telescope is not null, VALUES(telescope), marshall_sources_spectra.telescope),
instrument = IF(t.instrument is not null, VALUES(instrument), marshall_sources_spectra.instrument),
reducer = IF(t.reducer is not null, VALUES(reducer), marshall_sources_spectra.reducer),
dateCreated = IF(t.dateCreated is not null, VALUES(dateCreated), marshall_sources_spectra.dateCreated),
classificationWRTMax = IF(t.classificationWRTMax is not null, VALUES(classificationWRTMax), marshall_sources_spectra.classificationWRTMax),
classificationPhase = IF(t.classificationPhase is not null, VALUES(classificationPhase), marshall_sources_spectra.classificationPhase);
The first export resulted in 11,000 new rows.
marshall_sources_related_files
Not yet implemented but I will use it to keep track of source related files (spectra, finder charts, image stamps, lightcurve pngs etc) from around the web.