Automated Active Learning for ZTF Data

  • Formulated a robust active learning framework that automates the selection of data points to be labeled, reducing human labeling effort by 90% and improving the performance of existing classification models such as DNN and XGBoost.
  • Analyzed billions of astronomical sources and their time-series representations of varying intensity (light curves) from the Zwicky Transient Facility (ZTF) survey, using API queries and data visualization for preprocessing tasks.

Quick Links:

Active Learning Pipeline

[Figure: active learning pipeline flowchart]

Active learning pipeline that selects the most impactful sources using: (i) the 6 types based on the classification scores, (ii) spatial cross-matching that requires agreement between the g and r bands, (iii) braai scores, and (iv) amplitudes.
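The selection criteria in the caption can be illustrated with a minimal sketch. It is not the project's actual implementation: the `Source` fields, the thresholds, and the function name `select_for_labeling` are hypothetical, the sources are assumed to be already cross-matched between bands, and ranking by closeness of the mean score to 0.5 is a simple stand-in for the 6 score-based types, which are not specified here.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One cross-matched source (all field names are hypothetical)."""
    source_id: str
    score_g: float    # classification score from g-band data, in [0, 1]
    score_r: float    # classification score from r-band data, in [0, 1]
    braai: float      # braai real/bogus score, in [0, 1]
    amplitude: float  # light-curve amplitude in magnitudes


def select_for_labeling(sources, max_band_gap=0.2, min_braai=0.8,
                        min_amplitude=0.1, budget=2):
    """Return up to `budget` sources worth sending to a human labeler.

    All thresholds are illustrative assumptions, not values from the project.
    """
    kept = [
        s for s in sources
        if abs(s.score_g - s.score_r) <= max_band_gap  # g and r bands agree
        and s.braai >= min_braai                       # likely a real detection
        and s.amplitude >= min_amplitude               # variability is measurable
    ]
    # Stand-in ranking: most uncertain first (mean score closest to 0.5).
    kept.sort(key=lambda s: abs((s.score_g + s.score_r) / 2 - 0.5))
    return kept[:budget]


if __name__ == "__main__":
    candidates = [
        Source("a", 0.55, 0.50, 0.90, 0.30),
        Source("b", 0.90, 0.20, 0.95, 0.50),  # bands disagree
        Source("c", 0.95, 0.90, 0.99, 0.40),
        Source("d", 0.50, 0.50, 0.30, 0.40),  # low braai score
        Source("e", 0.45, 0.50, 0.90, 0.05),  # amplitude too small
    ]
    for s in select_for_labeling(candidates):
        print(s.source_id)
```

Filtering before ranking keeps the labeling budget for sources that pass the quality checks, so expert time is not spent on likely bogus or inconsistent detections.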

Consolidated Data Format

Relabelling has to be done by trained professionals, who should be given all the useful information in a consolidated form to save time. To that end, all the required information is combined into a single image, as shown below.

[Figure: combined image of consolidated data]

Combined image shown to the domain expert for labelling. It contains the light curve, phase-folded light curve, cutouts, Gaia HR diagram, class scores, and metadata such as ra, dec, period, amplitude, standard deviation, and inverse von Neumann ratio.
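Two of the quantities in the combined image, the phase-folded light curve and the inverse von Neumann ratio, can be sketched as follows. This is an illustrative sketch rather than the project's code: the function names are hypothetical, and the ratio uses one common definition (mean squared successive difference divided by the sample variance), which may differ in normalization from the one used in the pipeline.

```python
import statistics


def phase_fold(times, period, t0=0.0):
    """Map observation times to phases in [0, 1) for a given period."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [((t - t0) / period) % 1.0 for t in times]


def inverse_von_neumann(mags):
    """Inverse von Neumann ratio of a time-ordered magnitude series.

    Assumed definition: eta = mean((m[i+1] - m[i])**2) / sample variance,
    and this function returns 1 / eta. Larger values indicate smoother,
    more correlated variability.
    """
    if len(mags) < 2:
        raise ValueError("need at least two measurements")
    variance = statistics.variance(mags)  # sample variance (n - 1 denominator)
    if variance == 0:
        raise ValueError("constant series: ratio is undefined")
    diffs = [(b - a) ** 2 for a, b in zip(mags, mags[1:])]
    mean_sq_diff = sum(diffs) / len(diffs)
    return variance / mean_sq_diff


if __name__ == "__main__":
    print(phase_fold([0.0, 0.5, 1.0, 2.5], period=2.0))
    print(inverse_von_neumann([1.0, 2.0, 3.0, 4.0]))    # smooth trend
    print(inverse_von_neumann([1.0, -1.0, 1.0, -1.0]))  # point-to-point scatter
```

A smooth trend yields a larger value than point-to-point scatter, which is why the statistic is a useful quick indicator of coherent variability alongside the period and amplitude.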

Slide Deck