Machine Learning

Machine Learning Algorithms

ML Modelling

Machine learning models are trained programs (deployed as files, software, services, or other kinds of systems) intended to produce outcomes such as finding patterns, classifying items, detecting and tracking objects, or predicting values. They arrive at these outcomes through decisions learned from self-learning, exploration, and data, both labelled and unlabelled. ML modelling operates through mathematical algorithms.

This page summarises some of the most popular and commonly used machine learning algorithms.

Supervised learning

Supervised learning uses labelled data. It is divided into two main techniques: regression and classification.

Regression

Regression measures the relationship between dependent and independent variables.

  • Simple Linear Regression makes predictions by modelling the relationship between one dependent and one independent variable (a short sketch follows this list).
  • Multiple Linear Regression makes predictions by modelling the relationship between one dependent variable and two or more independent variables.
  • Polynomial Regression models a non-linear relationship between the variables to predict outcomes in non-linear problems.
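
A minimal sketch of simple and polynomial regression, using scikit-learn and synthetic data (the variable names and example values are illustrative assumptions, not part of this page):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    # Synthetic data: one independent variable x, one dependent variable y.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=(50, 1))
    y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 1, size=50)

    # Simple linear regression: fit y = a*x + b.
    linear = LinearRegression().fit(x, y)
    print("slope:", linear.coef_[0], "intercept:", linear.intercept_)

    # Polynomial regression: expand x into [x, x^2], then fit a linear model on it.
    x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
    poly = LinearRegression().fit(x_poly, y)
    print("polynomial coefficients:", poly.coef_)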

Classification

Classification estimates the probability of an observation falling into a predetermined category.

  • Logistic Regression predicts a categorical output by estimating the probability of each class (a short sketch follows this list).
  • Naïve Bayes is a probabilistic classifier based on Bayes' theorem, assuming independence between predictors.
  • K-Nearest Neighbour (KNN) classifies data points by measuring their proximity and association with neighbouring points.
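
A minimal sketch comparing the three classifiers on scikit-learn's built-in iris dataset (an illustrative setup assumed for this example):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    # Labelled data: flower measurements (features) and species (categories).
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit each classifier and report its accuracy on held-out data.
    for model in (LogisticRegression(max_iter=1000),
                  GaussianNB(),
                  KNeighborsClassifier(n_neighbors=5)):
        model.fit(X_train, y_train)
        print(type(model).__name__, model.score(X_test, y_test))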

Regression & Classification

  • Decision Trees make predictions or categorise items by learning decision rules from the data (a short sketch follows this list).
  • Neural Networks generate an outcome by learning the relationship between input and output through a layered structure loosely modelled on the human brain.
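
A minimal sketch showing a decision tree and a small neural network trained on the same labelled data, again using scikit-learn (the dataset choice and hyperparameters are illustrative assumptions):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A decision tree learns a hierarchy of if/else rules over the features.
    tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

    # A feed-forward neural network learns the same mapping through a hidden layer.
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                        random_state=0).fit(X_train, y_train)

    print("decision tree accuracy:", tree.score(X_test, y_test))
    print("neural network accuracy:", net.score(X_test, y_test))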

Unsupervised learning

Unsupervised learning uses unlabelled data. It can be categorised into three parts: clustering, dimensionality reduction, and association rules.

Clustering

Clustering separates data points into groups based on their similarities and differences.

  • K-means Clustering groups data points into k clusters by similarity and reveals hidden patterns (a short sketch follows this list).
  • Hierarchical Clustering groups similar data points into a hierarchy of clusters.
  • Probabilistic Clustering groups data points, assigning each point a probability of belonging to each cluster.
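
A minimal sketch of k-means and probabilistic (Gaussian mixture) clustering on unlabelled synthetic data, using scikit-learn (the data and cluster count are illustrative assumptions):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    # Unlabelled data: two groups of 2-D points with different centres.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, size=(100, 2)),
                   rng.normal(5, 1, size=(100, 2))])

    # K-means assigns every point to exactly one of k clusters.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print("cluster centres:", kmeans.cluster_centers_)

    # A Gaussian mixture gives each point a probability of belonging to each cluster.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    print("membership probabilities of the first point:", gmm.predict_proba(X[:1]))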

Dimensionality reduction

Dimensionality reduction transforms datasets from high-dimensional to low-dimensional space, by reducing the number of features.

  • Principal Component Analysis (PCA) reduces dimensionality by projecting the data onto the components that capture the most variance, keeping the most notable features (a short sketch follows this list).
  • Singular Value Decomposition (SVD) reduces dimensionality through matrix factorisation.
  • Autoencoders are neural networks that reduce dimensionality by learning a compressed, noise-reduced encoding of the data.
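
A minimal sketch reducing scikit-learn's 64-dimensional digits dataset to two dimensions with PCA and truncated SVD (an illustrative setup; an autoencoder would need a deep-learning library and is omitted here):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA, TruncatedSVD

    # Each digit image is a 64-dimensional feature vector.
    X, _ = load_digits(return_X_y=True)

    # PCA keeps the two components that explain the most variance.
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X)
    print("PCA output shape:", X_pca.shape)
    print("variance explained:", pca.explained_variance_ratio_)

    # Truncated SVD factorises the data matrix directly (also works on sparse data).
    svd = TruncatedSVD(n_components=2)
    X_svd = svd.fit_transform(X)
    print("SVD output shape:", X_svd.shape)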

Association rules (data mining)

Association rule learning finds patterns and relationships between variables in a dataset.

  • Apriori generates association rules by extracting frequent item sets from a dataset (a short sketch follows this list).
  • Eclat generates association rules by extracting frequent item sets, operating on a vertical data layout (item-to-transaction lists).
  • FP-growth extracts frequent item set patterns from a dataset by building an FP-tree (frequent-pattern tree).
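
A minimal, self-contained sketch of the Apriori idea (level-wise frequent item set mining); the transactions and the support threshold are made-up illustrations, and a full implementation would go on to derive association rules from the frequent item sets:

    from itertools import combinations

    # Toy transactions; each transaction is the set of items that occur together.
    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk", "butter"},
    ]
    min_support = 0.5  # an item set must appear in at least half of the transactions

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Level-wise search: build size-k candidates only from unions of frequent
    # size-(k-1) sets, relying on the Apriori property that every subset of a
    # frequent item set is itself frequent.
    items = sorted({item for t in transactions for item in t})
    frequent = [frozenset([item]) for item in items if support({item}) >= min_support]
    all_frequent = list(frequent)
    k = 2
    while frequent:
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        frequent = [c for c in candidates if support(c) >= min_support]
        all_frequent.extend(frequent)
        k += 1

    for itemset in all_frequent:
        print(set(itemset), "support:", round(support(itemset), 2))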

There are many more machine learning algorithms; this page summarises only a fraction of them.


Next: Large Language Models (LLMs)