Glossary
Definitions of important terms and metrics
These are some of the key concepts mentioned and used in this documentation.
Predictive AI vs Generative AI
- Generative AI: Creates new, original content (such as text, images, or code) by learning patterns from existing data, producing entirely new output that wasn’t in the original dataset.
- Predictive AI: Uses historical data to forecast future events, making informed guesses about what might happen.
Binary vs Multi-class classification
- Binary: Binary classification predicts one of two possible outcomes. For example, a binary classifier might identify an insurance claim as “fraudulent” or “not fraudulent”.
- Multi-class: Multi-class classification predicts one of three or more possible outcomes. For example, a multi-class classifier could identify an image as a “banana,” “apple,” “peach” or “orange”.
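The distinction is in the label set, not the algorithm. A minimal sketch using scikit-learn, where the features and labels are made-up illustrations:

```python
# Hypothetical example: the same classifier API handles binary and
# multi-class targets; only the set of possible labels differs.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2, 1.0], [0.9, 0.3], [0.4, 0.8], [0.7, 0.1]])

# Binary: exactly two possible outcomes.
y_binary = np.array(["fraudulent", "not fraudulent", "not fraudulent", "fraudulent"])

# Multi-class: three or more possible outcomes.
y_multi = np.array(["banana", "apple", "peach", "orange"])

binary_clf = LogisticRegression().fit(X, y_binary)
multi_clf = LogisticRegression().fit(X, y_multi)

print(binary_clf.predict([[0.5, 0.5]]))  # one of two labels
print(multi_clf.predict([[0.5, 0.5]]))   # one of four labels
```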
Key terms
- Feature: A measurable property of a data point used as input to a model (e.g., height, house size, email subject line). Features provide the information a model needs to find patterns.
- Dataset: A collection of data used in machine learning, either for training (to learn patterns) or testing (to evaluate predictions).
- Training: The process of teaching an algorithm by feeding it labelled data so it adjusts internal parameters (like weights) to minimize errors. The goal is to produce a model that generalizes well to unseen data.
- Hyperparameter: A configuration set before training that controls how learning happens. Unlike learned parameters, hyperparameters (e.g., learning rate, number of layers, tree depth) are chosen by the practitioner and affect performance.
- Model: The trained output of machine learning. Essentially a function that has learned patterns from data and can make predictions or decisions on new inputs.
- Prediction: Also called inference; the process of obtaining outcomes from a trained model when it is presented with new, unseen data.
Features ──► Dataset ──► Training (with Hyperparameters) ──► Model ──► Predictions
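As a concrete illustration of this flow, here is a minimal sketch using scikit-learn; the iris dataset and the max_depth hyperparameter are illustrative assumptions, not part of the platform:

```python
# Features -> Dataset -> Training (with Hyperparameters) -> Model -> Predictions
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Dataset: feature matrix X plus labels y.
X, y = load_iris(return_X_y=True)

# Split into training data (to learn patterns) and test data (to evaluate).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

# Hyperparameter: max_depth is chosen before training, not learned from data.
model = DecisionTreeClassifier(max_depth=3)

# Training: the algorithm adjusts internal parameters to fit the labelled data.
model.fit(X_train, y_train)

# Prediction (inference): apply the trained model to new, unseen data.
predictions = model.predict(X_test)
print(predictions)
```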
Advanced training and performance metrics
- True Positive (TP): Positive instances correctly predicted as positive.
- True Negative (TN): Negative instances correctly predicted as negative.
- False Positive (FP): Negative instances incorrectly predicted as positive.
- False Negative (FN): Positive instances incorrectly predicted as negative.
- Precision: A performance metric for classification models that measures the proportion of correct positive predictions out of all positive predictions made: Precision = TP / (TP + FP) (see the metrics sketch at the end of this section).
- Recall: The proportion of actual positive instances that were correctly identified: Recall = TP / (TP + FN).
- F1: The F1 score is an evaluation metric that represents the harmonic mean of precision (minimizing false positives) and recall (minimizing false negatives), providing a single value to measure a model’s accuracy. A perfect F1 score is 1: F1 = 2 × (Precision × Recall) / (Precision + Recall).
- False Positive Rate (FPR): The proportion of actual negatives that were incorrectly identified as positive: FPR = FP / (FP + TN).
- ROC Curve: The ROC curve is generated by plotting the true positive rate (TPR, equal to recall) on the y-axis against the FPR on the x-axis for every possible threshold value of the classifier (see the ROC sketch at the end of this section).
- AUC: The Area Under the Curve (AUC) is the area under the ROC curve. A perfect classifier will have an AUC of 1, while a random classifier will have an AUC of 0.5.
- Training performance: The average per-class accuracy on the training data (see the balanced-performance sketch at the end of this section):
  - For binary classification, Training Performance = ((True Positives / Total Positives) + (True Negatives / Total Negatives)) / 2.
  - For multi-class classification, training performance is the average of the ratio (correctly classified / total class size) over all classes in the training data.
- Test performance: The same average as above, but computed on the test data instead of the training data.
- Overall achieved performance: The average of the training and test performances, where the test data is unseen during training. Datasets in the Butterfly AI platform are split 90% for training and 10% for testing.
- Overfitting: This phenomenon occurs when a model becomes too complex and fits the training data extremely closely, including its noise and irrelevant details, to the point where it fails to generalize and make accurate predictions on new, unseen data. This leads to a model that performs exceptionally well on the training set but poorly in real-world applications (demonstrated in the last sketch below).
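The metric formulas above translate directly into code. A minimal sketch in plain Python, where the confusion-matrix counts are made-up numbers for illustration:

```python
# Helpers implementing the metric formulas from this glossary.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def false_positive_rate(fp, tn):
    return fp / (fp + tn)

# Made-up example counts.
tp, tn, fp, fn = 80, 90, 10, 20
print(f"precision = {precision(tp, fp):.3f}")            # 80 / 90  ≈ 0.889
print(f"recall    = {recall(tp, fn):.3f}")               # 80 / 100 = 0.800
print(f"F1        = {f1(tp, fp, fn):.3f}")               # ≈ 0.842
print(f"FPR       = {false_positive_rate(fp, tn):.3f}")  # 10 / 100 = 0.100
```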
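For the ROC curve and AUC, scikit-learn provides ready-made helpers; the labels and classifier scores below are made-up example values:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                    # actual classes
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5])  # classifier scores

# roc_curve sweeps every threshold and returns the (FPR, TPR) pairs
# that make up the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr.round(2), tpr.round(2))))

# AUC: 1.0 for a perfect classifier, 0.5 for a random one.
print("AUC =", roc_auc_score(y_true, y_score))
```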
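The training/test performance defined above is an average of per-class accuracies, which (assuming the definition is read as stated) matches scikit-learn's balanced_accuracy_score in both the binary and multi-class cases. A sketch with made-up predictions:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def average_per_class_accuracy(y_true, y_pred):
    """Mean of (correctly classified / total class size) over all classes."""
    classes = np.unique(y_true)
    return np.mean([np.mean(y_pred[y_true == c] == c) for c in classes])

# Made-up binary example: class 1 = positive, class 0 = negative.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

# ((True Positives / Total Positives) + (True Negatives / Total Negatives)) / 2
print(average_per_class_accuracy(y_true, y_pred))  # (3/4 + 4/6) / 2 ≈ 0.708
print(balanced_accuracy_score(y_true, y_pred))     # same value
```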
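Finally, a small overfitting demonstration: an unconstrained decision tree fitted to noisy synthetic data typically memorizes the training set and scores noticeably lower on held-out data. The dataset and hyperparameter values here are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset with deliberately noisy labels (flip_y adds label noise).
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # no depth limit
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The deep tree typically reaches ~1.0 on training data but drops on test data;
# the constrained tree generalizes better.
print("deep    train/test:", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("shallow train/test:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```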