Machine Learning for Flare Prediction Study

Explore the impact of features and training set generation on flare prediction using supervised machine learning. Focus on space weather forecasting, feature-based modeling, and the importance of data preparation for accurate results. Utilizes SHARP data products, GOES data for labeling, and various machine learning methods for prediction and feature ranking.

  • Machine Learning
  • Flare Prediction
  • Space Weather Forecasting
  • Data Preparation
  • Feature Ranking

Presentation Transcript


  1. Supervised machine learning for flare prediction: the impact of features and of the training set generation process on the forecasting performance. Michele Piana, the MIDA group, Dipartimento di Matematica, Università di Genova; CNR-SPIN Genova. ESWW16, Liège, November 21, 2019.

  2. The context: space weather forecasting using machine learning, with a focus on flares.
     • Feature-based, supervised machine learning.
     • SDO/HMI magnetograms as input data; GOES data for labelling.
     • Main messages: data preparation matters; the information hidden in active regions is (highly) redundant; binary, deterministic prediction is still a dream.

  3. Data, features and labels.
     • Data: SHARP data products in the SDO/HMI database (2D images of the continuum intensity, the full three-component magnetic field vector, and the line-of-sight component of each photospheric HARP).
     • Time range: from 2012 September 14 to 2016 April 30, with 24-hour sampling.
     • Features: property extraction algorithms (credit: FLARECAST) led to 167 features.
     • Flare association: each SHARP AR is associated with either a GOES solar flare of class C1 and above (C1+) or a GOES solar flare of class M1 and above (M1+). Flare association leads to four more features (peak magnitude, time difference with start time, time difference with peak time, time difference with end time).
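
The C1+/M1+ association described above can be sketched by converting a GOES class string (e.g. "C3.2", "M1.0") to its peak soft X-ray flux and thresholding. The function names and the flux table below are illustrative assumptions, not part of the FLARECAST pipeline.

```python
# Approximate peak soft X-ray flux (W/m^2) for each GOES letter class.
_CLASS_FLUX = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}

def goes_peak_flux(goes_class):
    """Convert a GOES class string such as 'M1.5' to a peak flux in W/m^2."""
    letter, magnitude = goes_class[0].upper(), float(goes_class[1:])
    return _CLASS_FLUX[letter] * magnitude

def flare_label(goes_class, threshold="C1"):
    """True if the flare reaches the threshold class (C1+ or M1+)."""
    return goes_peak_flux(goes_class) >= goes_peak_flux(threshold)
```

For example, an M2.3 flare is labelled positive under both the C1+ and M1+ thresholds, while a C5.0 flare is positive only under C1+.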

  4. Data preparation.
     • Four training sets are generated, corresponding to four issuing times (00:00 UT, 06:00 UT, 12:00 UT, 18:00 UT).
     • Focus on ARs rather than feature vectors:
       ◦ 2/3 of the ARs are randomly extracted from the set of all ARs belonging to a specific issuing time;
       ◦ the 171-dimensional feature vectors belonging to the extracted ARs are labelled according to whether a C1+ flare occurred in the next 24 hours;
       ◦ the test set is made of the feature vectors belonging to the remaining 1/3 of the ARs.
     • Training/test set generation is randomly repeated 100 times to enable statistical analysis.
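
The AR-level split above (2/3 of the ARs for training, 1/3 for testing, so that no AR contributes feature vectors to both sets) can be sketched as follows; the `(ar_id, feature_vector, label)` tuple layout is an assumption for illustration, not the actual FLARECAST data format.

```python
import random

def split_by_active_region(samples, train_fraction=2 / 3, seed=0):
    """Split samples into train/test so that all feature vectors from a
    given AR land on the same side of the split.

    `samples` is a list of (ar_id, feature_vector, label) tuples.
    """
    rng = random.Random(seed)
    ar_ids = sorted({ar for ar, _, _ in samples})
    rng.shuffle(ar_ids)                          # random AR-level extraction
    n_train = round(train_fraction * len(ar_ids))
    train_ars = set(ar_ids[:n_train])
    train = [s for s in samples if s[0] in train_ars]
    test = [s for s in samples if s[0] not in train_ars]
    return train, test
```

Repeating this with 100 different seeds reproduces the paper's strategy of 100 random train/test realizations for statistical analysis.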

  5. Prediction and feature ranking.
     • Machine learning methods (credit: FLARECAST):
       ◦ hybrid LASSO;
       ◦ random forest;
       ◦ other supervised methods considered (SVC, logit).
     • Optimization: unsupervised fuzzy clustering (Benvenuto, Piana, Campi, Massone; ApJ; 2018).
     • Feature ranking:
       ◦ both ML methods compute a relative importance associated to each feature;
       ◦ recursive feature elimination: 1. train the classifier; 2. compute the importance of each feature; 3. remove the feature with the smallest importance and begin again.
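
The recursive elimination loop in the last point can be written generically. Here `train_and_rank` is a hypothetical callback standing in for retraining hybrid LASSO or random forest and reading off the per-feature importances.

```python
def recursive_feature_elimination(features, train_and_rank, n_keep=10):
    """Generic RFE loop.

    `train_and_rank(features)` retrains the classifier on the given feature
    subset and returns a {feature: importance} dict.
    """
    remaining = list(features)
    while len(remaining) > n_keep:
        importance = train_and_rank(remaining)          # 1. train, 2. rank
        worst = min(remaining, key=importance.get)
        remaining.remove(worst)                          # 3. drop the weakest
    return remaining
```

Because the classifier is retrained after every elimination, the ranking adapts as correlated features are removed, which is what makes RFE a useful probe of the redundancy discussed later in the talk.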

  6. Results: about training and scores (Campi, Benvenuto, Massone, Bloomfield, Georgoulis, Piana; ApJ; 2019). Training according to active regions vs. training according to features.

  7. Results: top ten (Campi, Benvenuto, Massone, Bloomfield, Georgoulis, Piana; ApJ; 2019). Number of times each feature is selected in the top-10 rankings, on average over the 100 random realizations of the test set, for all issuing times.

  8. Results: redundancy of information (Campi, Benvenuto, Massone, Bloomfield, Georgoulis, Piana; ApJ; 2019). TSS scores obtained by using just the ten top-ranked features, added one at a time.
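
The TSS (true skill statistic, also known as the Hanssen–Kuipers discriminant) used for these scores is the probability of detection minus the false-alarm rate. A minimal sketch:

```python
def true_skill_statistic(y_true, y_pred):
    """TSS = TP/(TP+FN) - FP/(FP+TN).

    Assumes both classes are present in y_true (otherwise a term is
    undefined). Labels are truthy for flaring, falsy for quiet.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp / (tp + fn) - fp / (fp + tn)
```

TSS ranges from -1 to 1, with 1 for perfect prediction and 0 for no skill, and, unlike accuracy, is insensitive to the strong class imbalance typical of flare catalogs.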

  9. Conclusions.
     • Main messages: data preparation matters; the information hidden in active regions is (highly) redundant; binary, deterministic prediction is still a dream.
     • In progress: systematic analysis of the top-ten features in the ranking (robustness with respect to method, issuing time, training set); comparison with deep learning; exploring the connections between machine learning and numerical models.
