Training

Prepare datasets, train models, and generate predictions

This guide outlines the core process of preparing data, training models, and running predictions within the Butterfly AI (BAI) platform.

Overview

You can interact with BAI in two ways:

Web Dashboard — For research, PoC, and interactive model training with hyperparameter tuning (what this guide is about)
REST API/Python — For programmatic, production-ready use of your trained models via the Butterfly AI API (see API guide)

All data and models are securely logged and stored on the cloud under your account.

How classification type is determined

The BAI platform automatically detects whether the task is binary or multi-class classification.
A balanced test set is generated using your labelled training data.
Because that test set is balanced, high accuracy on it generally implies a good F1 score and AUC (Area Under the Curve).
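
The platform's detection logic is not documented in this guide. As a rough illustration only, one common way to tell a binary task from a multi-class task is to count the distinct labels in the training data. The function name and sample labels below are hypothetical and are not part of the BAI platform.

```python
def detect_task_type(labels):
    """Classify a task as 'binary' or 'multi-class' from its distinct labels.

    Assumption: this mirrors a common heuristic, not the platform's actual logic.
    """
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct labels to train a classifier")
    return "binary" if len(distinct) == 2 else "multi-class"


print(detect_task_type(["spam", "ham", "spam"]))            # binary
print(detect_task_type(["low", "medium", "high", "low"]))   # multi-class
```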

Video walkthrough

You can continue reading this detailed step-by-step guide, or watch the video walkthrough (also available in the Getting started guide) on how to create datasets, train models, and predict outcomes within the Butterfly AI platform.

Dataset creation

Before proceeding, ensure your dataset CSV is formatted correctly by following the CSV Format Guide.

Log in to the BAI dashboard.
Click “Datasets” in the top-right menu.
Click “+ Create” to start a new dataset.
Enter a name for your dataset.
Set the number of buckets: default is 20 (range: 4–100).
Upload your labelled CSV file.
Click “Save”.

After you save, the dataset starts processing. This may take anywhere from a few minutes to several hours, depending on the size and complexity of the data.

Model training

The dataset must have finished processing before you attempt to train a model with it.

Navigate to “Trainings” in the sidebar.
Click “Create”.
Configure the training:
- Scaling Factor: (default: 19, range: 8–499)
- Performance Threshold: (value between 0 and 1)
- Select your dataset from the dropdown.
Click “Save”.
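
If you keep training settings in a script or notebook before entering them in the dashboard, a small check against the documented ranges can catch typos early. The sketch below is illustrative: the class and field names are hypothetical, the threshold default of 0.5 is an assumption (the guide gives only the 0 to 1 range), and both range endpoints are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical container for the dashboard's training settings."""
    dataset_name: str
    scaling_factor: int = 19             # documented default, range 8-499
    performance_threshold: float = 0.5   # assumed default; guide gives only the 0-1 range

    def validate(self):
        """Return a list of problems; an empty list means the settings are in range."""
        errors = []
        if not self.dataset_name:
            errors.append("dataset_name is required")
        if not 8 <= self.scaling_factor <= 499:
            errors.append("scaling_factor must be between 8 and 499")
        if not 0 <= self.performance_threshold <= 1:
            errors.append("performance_threshold must be between 0 and 1")
        return errors


print(TrainingConfig("turbines").validate())  # []
print(TrainingConfig("turbines", scaling_factor=5, performance_threshold=1.2).validate())
```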

The platform runs 4 algorithms in parallel and selects a new champion model if one outperforms the rest.
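
The champion selection can be pictured with a short sketch. This is not the platform's implementation: the algorithm names are placeholders, accuracy is assumed to be the comparison metric, and a challenger is assumed to replace the existing champion only when it scores strictly higher.

```python
def pick_champion(candidates, current_champion=None):
    """Pick the champion from one training run.

    candidates: dict mapping algorithm name -> accuracy on the test set.
    current_champion: (name, accuracy) tuple from earlier runs, or None.
    Assumption: ties keep the existing champion.
    """
    best_name = max(candidates, key=candidates.get)
    challenger = (best_name, candidates[best_name])
    if current_champion is None or challenger[1] > current_champion[1]:
        return challenger
    return current_champion


# Placeholder names and scores for the four algorithms in one run.
scores = {"algo_a": 0.81, "algo_b": 0.86, "algo_c": 0.79, "algo_d": 0.84}
print(pick_champion(scores))                      # ('algo_b', 0.86)
print(pick_champion(scores, ("previous", 0.90)))  # ('previous', 0.9)
```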

Understanding training results

The accuracy reflects the performance on a balanced, unseen 10% test set.
Logs and metrics are saved with the model.
Further training with tuned hyperparameters can improve performance. See the Hyperparameter tuning guide.
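
To make the idea of a balanced, unseen test set concrete, here is a minimal sketch of one way such a holdout could be drawn: hold back the same number of rows from every class, totalling roughly 10% of the data. The platform's actual sampling procedure is not described in this guide, so the function, its parameters, and the sample data are all assumptions.

```python
import random
from collections import defaultdict


def balanced_holdout(rows, label_key, test_fraction=0.10, seed=0):
    """Split rows into (train, test) with an equal number of test rows per class."""
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Spread the target test size evenly across classes, leaving at least
    # one training row in the smallest class.
    per_class = round(len(rows) * test_fraction) // len(by_class)
    smallest = min(len(group) for group in by_class.values())
    per_class = min(per_class, smallest - 1)
    if per_class < 1:
        raise ValueError("not enough rows per class for a balanced test set")

    train, test = [], []
    for group in by_class.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        test.extend(shuffled[:per_class])
        train.extend(shuffled[per_class:])
    return train, test


# Imbalanced toy data: 80 rows of class "a", 20 rows of class "b".
rows = [{"label": "a"}] * 80 + [{"label": "b"}] * 20
train, test = balanced_holdout(rows, "label")
print(len(train), len(test))  # 90 10
```

Even though the toy data is imbalanced 80/20, the held-out set contains five rows of each class, which is why accuracy on it is not inflated by the majority class.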

Prediction running

Predictions can be run once training has completed and the first champion model has been created.

Click “Predictions” in the left menu.
Click “Create”.
Select:
- Your original training dataset
- Your batch prediction CSV (see CSV Format Guide for instructions on how to create)

Click “Save”.

Downloading results

Monitor progress on the Predictions listing page. Once finished, click “Download” to retrieve your prediction results.

Step 10 - Download Predictions

Results include predicted labels and their probability scores.

Recap

Stage	Description
Dataset	Upload labelled CSV, set number of buckets
Training	Set scaling factor & threshold, train model
Prediction	Upload unlabelled CSV, get prediction results

Regression use case

The BAI platform is designed mainly for classification. However, you can solve regression-type prediction problems with high accuracy by converting continuous targets into discrete classes.

Example: Wind Turbine Power prediction

Bin the output range (e.g., 0–2000 kW) into classes:
- 0–200 kW → Class 1
- 200–400 kW → Class 2
- …
- 1800–2000 kW → Class 10
Train a multi-class classifier on these buckets.
To increase granularity:
- Take the winning bin (e.g., 400–600 kW)
- Divide it further (e.g., 10 × 20 kW sub-bins)
- Train again for finer predictions.
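
The binning steps above can be sketched in a few lines of Python for preparing the label column before upload. The helper names are hypothetical. The guide does not say which class a value on a shared edge belongs to, so this sketch assumes it falls into the upper bin (200 kW becomes Class 2) and that the top edge falls into the last bin.

```python
def make_bins(low, high, n_bins):
    """Return (lower, upper) edges splitting [low, high] into equal-width bins."""
    width = (high - low) / n_bins
    return [(low + i * width, low + (i + 1) * width) for i in range(n_bins)]


def to_class(value, bins):
    """Map a continuous value to a 1-based class index.

    Assumption: shared edges belong to the upper bin; the top edge
    belongs to the last bin.
    """
    if not bins[0][0] <= value <= bins[-1][1]:
        raise ValueError(f"{value} is outside the binned range")
    for index, (lower, upper) in enumerate(bins, start=1):
        if lower <= value < upper:
            return index
    return len(bins)


# First pass: 0-2000 kW in ten 200 kW classes.
coarse = make_bins(0, 2000, 10)
print(to_class(450, coarse))   # 3  (the 400-600 kW bin)

# Second pass: split the winning 400-600 kW bin into ten 20 kW sub-bins.
fine = make_bins(400, 600, 10)
print(to_class(450, fine))     # 3  (the 440-460 kW sub-bin)
```

Each pass yields its own label column, so the coarse and fine classifiers are trained as separate datasets.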

Troubleshooting NaN predictions

Predictions may occasionally return NaN probability values when the number of buckets is insufficient for the model to make a confident prediction. To resolve this, create a new dataset with a higher number of buckets and retrain. If NaN values persist, your training dataset may need more labelled samples.
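
A quick scan of the downloaded results can show how many rows are affected before you decide to rebuild the dataset. The sketch below uses only the Python standard library. The column names `predicted_label` and `probability` are assumptions, since the guide does not specify the layout of the results file, so adjust them to match your download.

```python
import csv
import io
import math


def find_nan_rows(csv_text, probability_column="probability"):
    """Return 1-based data-row numbers whose probability is NaN, empty, or non-numeric."""
    bad_rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        raw = (row.get(probability_column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            bad_rows.append(row_number)   # empty or non-numeric cell
            continue
        if math.isnan(value):
            bad_rows.append(row_number)
    return bad_rows


# Made-up sample in the assumed layout; real files may differ.
sample = "predicted_label,probability\nClass 3,0.91\nClass 1,NaN\nClass 2,\n"
print(find_nan_rows(sample))  # [2, 3]
```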
