Date: Wednesday 26 June 2013
Time: 01.00 PM to 02.00 PM
Venue: 9138 Cantor

Speaker: Dr Kassim Mwitondi

Kassim’s research interests are in the enhancement and application of data mining models for detecting potentially useful information in large, high-dimensional datasets, and in the global promotion of open-source applications in scientific research.

Title: Using optimized distributional parameters as inputs in a sequential unsupervised and supervised modelling of sunspots data

Detecting naturally arising structures in data is central to knowledge extraction. In most applications, the main challenge lies in choosing an appropriate model for exploring the data features. That choice is generally poorly understood, and any tentative choice may be too restrictive. Growing volumes of data, disparate data sources and a widening range of modelling techniques call for model optimization via adaptability rather than comparability. We propose a novel two-stage algorithm for modelling continuous data: an unsupervised stage, in which the algorithm searches through the data for optimal parameter values, followed by a supervised stage that adapts those parameters for predictive modelling.

The method is applied to the sunspots data, which have inherently Gaussian distributional properties and are assumed to be bimodal. Optimal values separating high cycles from low cycles are obtained via multiple simulations. Early patterns for each recorded cycle reveal that the first three years provide a sufficient basis for predicting the peak. Multiple Support Vector Machine runs using repeatedly improved data parameters show that the approach yields greater accuracy and reliability than conventional approaches and provides a good basis for model selection. Model reliability is established via multiple simulations of this type.
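As an illustration only, and not the speaker's implementation, the unsupervised stage could be sketched as fitting a two-component Gaussian mixture by expectation-maximisation and reading off the value that separates the two modes. The peak values below are synthetic (drawn from two assumed Gaussians, not real sunspot records), the function names are hypothetical, and the supervised Support Vector Machine stage is left out.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=200):
    """EM for a two-component 1-D Gaussian mixture; returns (weights, means, sds)."""
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Initialise the two means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    centre = sum(values) / n
    spread = math.sqrt(sum((v - centre) ** 2 for v in values) / n)
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each value belongs to the "high" component.
        resp = []
        for v in values:
            p_low = weights[0] * normal_pdf(v, means[0], sds[0])
            p_high = weights[1] * normal_pdf(v, means[1], sds[1])
            total = p_low + p_high
            resp.append(p_high / total if total > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n_high = sum(resp)
        n_low = n - n_high
        weights = [n_low / n, n_high / n]
        means = [
            sum((1 - r) * v for r, v in zip(resp, values)) / n_low,
            sum(r * v for r, v in zip(resp, values)) / n_high,
        ]
        var_low = sum((1 - r) * (v - means[0]) ** 2 for r, v in zip(resp, values)) / n_low
        var_high = sum(r * (v - means[1]) ** 2 for r, v in zip(resp, values)) / n_high
        sds = [max(math.sqrt(var_low), 1e-6), max(math.sqrt(var_high), 1e-6)]
    return weights, means, sds


def find_cutoff(weights, means, sds, steps=60):
    """Bisect between the two means for the point where both components are
    equally likely; assumes the modes are well separated (single crossing)."""
    lo, hi = means[0], means[1]
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        low_density = weights[0] * normal_pdf(mid, means[0], sds[0])
        high_density = weights[1] * normal_pdf(mid, means[1], sds[1])
        if high_density > low_density:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


# Synthetic cycle peaks (assumption): 60 "low" and 60 "high" cycles.
rng = random.Random(0)
peaks = [rng.gauss(80, 15) for _ in range(60)] + [rng.gauss(160, 15) for _ in range(60)]

weights, means, sds = fit_two_gaussians(peaks)
cutoff = find_cutoff(weights, means, sds)
print("component means:", means)
print("cutoff separating low from high cycles:", cutoff)
```

In a sequential set-up of the kind described, a cutoff like this would label each cycle as high or low, and those labels could then feed a supervised classifier trained on early-cycle observations.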

