Machine Learning

Statgraphics contains a large number of procedures for machine learning. Machine learning is a branch of AI (Artificial Intelligence) where algorithms are used to recognize patterns in data. It is used in many fields, including speech recognition, email filtering, medicine, agriculture, and quality assurance.

Three major sets of algorithms are available in Statgraphics:

Methodology
Unsupervised methods for dividing data into clusters.
Unsupervised methods for identifying anomalies.
Supervised methods for implementing predictive analytics.

Cluster Analysis

Statgraphics contains 4 procedures for clustering multivariate data:

  1. Affinity Propagation - The Affinity Propagation procedure implements a machine-learning process to create groups or clusters of multivariate quantitative observations. A distinct advantage of affinity propagation is that it does not require the user to specify the number of clusters. Instead, it searches for exemplars, which are observations that best represent a set of observations that then are placed in a cluster.
  2. DBSCAN - The DBSCAN procedure (Density-Based Spatial Clustering of Applications with Noise) implements a different machine-learning process to create groups or clusters. It is capable of identifying high-density clusters of arbitrary shapes. Two distinct advantages of DBSCAN over other clustering techniques are: (1) it does not require that the user specify beforehand the number of clusters present in the data; and (2)  observations in low-density areas may be classified as noise and not assigned to any cluster.
  3. K-Means Clustering - The K-Means Clustering procedure implements a machine-learning process that divides data into a pre-specified number of clusters. Clusters are created by grouping observations which are close together in the space of the input variables. 
  4. Spectral Clustering - In spectral clustering, data points are treated as nodes of a graph. The points are projected into a lower dimensional space where the clusters are formed.

Anomaly (Outlier) Detection

Statgraphics contains 4 procedures for detecting unusual data values:

  1. Local Outlier Factor 
  2. Isolation Forest 
  3. One-Class SVM
  4. Grubb's Test
Details are available on the Outlier Detection page.


Step 1: selects the output variable and features that will be considered as predictors.
Step 2: divides the cases into training, test and prediction sets.
Step 3: sets the values of any SML Assistant options.
Step 4: applies one or more of 10 methods for constructing predictive models.
Step 5: uses the models to make predictions for cases in which the output value is unknown.