This guide outlines the core process of preparing data, training models, and running predictions on the Butterfly AI (BAI) platform.


Overview

You can interact with BAI in two ways:

  1. Web Dashboard — For research, PoC, and interactive model training with hyperparameter tuning (what this guide is about)
  2. REST API/Python — For programmatic, production-ready use of your trained models via the Butterfly AI API (see API guide)

All data and models are securely logged and stored in the cloud under your account.

How classification type is determined
  • The BAI platform automatically detects whether the task is binary or multi-class classification.
  • A balanced test set is generated using your labelled training data.
  • Because the test set is balanced, high accuracy generally goes together with a good F1 score and AUC (Area Under the Curve).
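The two ideas above can be sketched in a few lines of Python. This is only an illustration of the concepts, not the platform's actual implementation: the function names `detect_task` and `balanced_holdout`, the 10% fraction, and the sampling strategy are all assumptions made for this example.

```python
import random
from collections import defaultdict

def detect_task(labels):
    """Return 'binary' for two distinct labels, 'multi-class' for more."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("need at least two distinct labels")
    return "binary" if n_classes == 2 else "multi-class"

def balanced_holdout(labels, fraction=0.1, seed=0):
    """Pick test-row indices with the same number of rows from every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Rows per class: an equal share of the target size, capped by the rarest class.
    target = max(1, int(len(labels) * fraction) // len(by_class))
    per_class = min(target, min(len(rows) for rows in by_class.values()))
    rng = random.Random(seed)
    test = []
    for rows in by_class.values():
        test.extend(rng.sample(rows, per_class))
    return sorted(test)

# Made-up, imbalanced label column: 70 "yes" rows and 30 "no" rows.
labels = ["yes"] * 70 + ["no"] * 30
test_idx = balanced_holdout(labels)
```

With equal class counts in the held-out rows, a model cannot score well simply by always predicting the majority class, which is why accuracy on such a set is informative.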

Video walkthrough

You can continue with this detailed step-by-step guide or watch the video walkthrough (also included in the Getting started guide), which shows how to create datasets, train models, and predict outcomes on the Butterfly AI platform.



Dataset creation



  1. Log in to the BAI dashboard.

  2. Click “Datasets” in the top-right menu.

    Step 1 - Datasets Menu

  3. Click “+ Create” to start a new dataset.

    Step 2 - Create Dataset

  4. Enter a name for your dataset.

  5. Set the number of buckets: default is 20 (range: 4–100).

  6. Upload your labelled CSV file.

  7. Click “Save”.

    Step 3 - Save Dataset


After this, the dataset starts processing. This may take from a few minutes to several hours, depending on the size and complexity of the data.


Model training



  1. Navigate to “Trainings” in the sidebar.

    Step 4 - Open Trainings

  2. Click “Create”.

    Step 5 - Create Training

  3. Configure the training:

    • Scaling Factor: (default: 19, range: 8–499)
    • Performance Threshold: (value between 0 and 1)
    • Select your dataset from the dropdown.

  4. Click “Save”.

    Step 6 - Save Training
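Before filling in the dashboard forms, it can help to check the chosen values against the ranges quoted in this guide. The sketch below is a hypothetical local helper, not part of any BAI API; the function name `check_settings`, the inclusive range endpoints, and the example threshold of 0.8 are assumptions.

```python
def check_settings(buckets, scaling_factor, threshold):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    # Ranges as quoted in this guide; treating the endpoints as inclusive is an assumption.
    if not 4 <= buckets <= 100:
        problems.append(f"buckets={buckets} is outside 4-100")
    if not 8 <= scaling_factor <= 499:
        problems.append(f"scaling_factor={scaling_factor} is outside 8-499")
    if not 0 <= threshold <= 1:
        problems.append(f"threshold={threshold} is outside 0-1")
    return problems

# The documented defaults (20 buckets, scaling factor 19) with an example threshold.
defaults_ok = check_settings(buckets=20, scaling_factor=19, threshold=0.8)

# Two out-of-range values, so two problems are reported.
bad = check_settings(buckets=2, scaling_factor=500, threshold=0.8)
```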


Understanding training results

  • The accuracy reflects the performance on a balanced, unseen 10% test set.
  • Logs and metrics are saved with the model.
  • Further training with tuned hyperparameters can improve performance. See the Hyperparameter tuning guide.

Prediction running



  1. Click “Predictions” in the left menu.

    Step 7 - Open Predictions

  2. Click “Create”.

    Step 8 - Create Prediction

  3. Select:

    • Your original training dataset
    • Your batch prediction CSV (see the CSV Format Guide for instructions on how to create it)

  4. Click “Save”.

    Step 9 - Save Prediction

Downloading results

Monitor progress on the Predictions listing page. Once finished, click “Download” to retrieve your prediction results.

Step 10 - Download Predictions


Results include predicted labels and their probability scores.
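Once downloaded, the results can be post-processed with ordinary CSV tooling, for example to flag predictions whose probability score is low. The sketch below uses only the Python standard library. The column names `id`, `predicted_label` and `probability`, the sample rows, and the 0.6 cutoff are all made up for illustration; check the header of your own results file for the real names.

```python
import csv
import io

# Made-up stand-in for a downloaded results file; real column names may differ.
SAMPLE = """id,predicted_label,probability
1,A,0.93
2,B,0.58
3,A,0.71
"""

def low_confidence_rows(csv_text, cutoff=0.6):
    """Return the rows whose probability score falls below the cutoff."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["probability"]) < cutoff]

uncertain = low_confidence_rows(SAMPLE)
```

For a real file, pass the file's contents instead of `SAMPLE`, for example the text returned by reading the downloaded CSV.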



Recap

Stage       Description
Dataset     Upload labelled CSV, set number of buckets
Training    Set scaling factor & threshold, train model
Prediction  Upload unlabelled CSV, get prediction results

Regression use case

The BAI platform is designed mainly for classification. However, you can still solve regression-type prediction problems with high accuracy by converting continuous targets into discrete classes; the resolution of the prediction is then set by the bin width.

Example: Wind turbine power prediction

  1. Bin the output range (e.g., 0–2000 kW) into 10 classes of 200 kW each:

    • 0–200 kW → Class 1
    • 200–400 kW → Class 2
    • …
    • 1800–2000 kW → Class 10

  2. Train a multi-class classifier on these classes.

  3. To increase granularity:

    • Take the winning bin (e.g., 400–600 kW)
    • Divide it further (e.g., 10 × 20 kW sub-bins)
    • Train again for finer predictions.
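The binning and refinement steps can be sketched as follows. This is a minimal illustration rather than a BAI feature: the helper names `to_class` and `bin_range` are hypothetical, and since the example's bin edges overlap (200 kW appears in two bins), the sketch assumes each bin includes its lower edge, with the very top of the range assigned to the last bin.

```python
def to_class(value_kw, low=0.0, high=2000.0, n_bins=10):
    """Map a continuous value to a 1-based class label using equal-width bins."""
    if not low <= value_kw <= high:
        raise ValueError(f"{value_kw} is outside {low}-{high}")
    width = (high - low) / n_bins
    # Lower edge inclusive; the top of the range falls into the last bin.
    return min(int((value_kw - low) // width) + 1, n_bins)

def bin_range(cls, low=0.0, high=2000.0, n_bins=10):
    """Return the (lower, upper) edges of a 1-based class label."""
    width = (high - low) / n_bins
    return low + (cls - 1) * width, low + cls * width

# Coarse pass: 450 kW lands in Class 3, which covers 400-600 kW.
coarse = to_class(450.0)
coarse_low, coarse_high = bin_range(coarse)

# Refinement: split the winning bin into 10 sub-bins of 20 kW and relabel.
fine = to_class(450.0, low=coarse_low, high=coarse_high, n_bins=10)
fine_low, fine_high = bin_range(fine, low=coarse_low, high=coarse_high, n_bins=10)
```

In practice, `to_class` would be applied to the target column of the training CSV before upload, and the refinement pass would use only the rows that fall inside the winning bin.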