Overview

Butterfly AI is a predictive AI toolkit that allows you to easily turn your tabular, labelled data into actionable predictions on new, unseen data.

For that, the following is needed:

  • CSV file with tabular, labelled data
  • CSV file with unseen data, in the same format as the labelled data

From there, getting your first predictions is only a few steps away:

  • Access the platform
  • Create a dataset by uploading your CSV file with labelled data
  • Train a model from the created dataset
  • Create a prediction that uses the model by uploading your unseen data CSV file
  • Download your CSV unseen data file populated with results


Examine sample CSV data

To fully understand how a CSV file should be prepared for this platform, follow the input CSV creation guide. For convenience, a couple of pre-created labelled and blind CSV files are provided.

This sample data represents a set of readings from IoT devices present in an oil plant, aimed at predicting failure in key infrastructure.


Key considerations:

  • The timestamp acts as the unique ID
  • The relevant features are the columns temperature, flow_rate, vibration_level, valve_position, motor_speed, chemical_concentration
  • The outcome (binary classification for this example) is coded in the anomaly_label column (0=no anomaly, 1=anomaly)
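The considerations above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library; it is not part of Butterfly AI, the function name check_labelled_csv is hypothetical, and the two sample rows are made-up values used purely for illustration. Only the column names come from the sample data description.

```python
import csv
import io

# Column names taken from the sample data description above.
ID_COLUMN = "timestamp"
FEATURE_COLUMNS = [
    "temperature", "flow_rate", "vibration_level",
    "valve_position", "motor_speed", "chemical_concentration",
]
LABEL_COLUMN = "anomaly_label"


def check_labelled_csv(stream):
    """Return a list of problems found in a labelled CSV (empty list means OK)."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    required = [ID_COLUMN, *FEATURE_COLUMNS, LABEL_COLUMN]
    problems = [f"missing column: {name}" for name in required if name not in header]
    if problems:
        return problems

    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # The timestamp acts as the unique ID, so it must not repeat.
        if row[ID_COLUMN] in seen_ids:
            problems.append(f"line {line_no}: duplicate timestamp {row[ID_COLUMN]}")
        seen_ids.add(row[ID_COLUMN])
        # Binary classification: the label must be 0 or 1.
        if row[LABEL_COLUMN] not in ("0", "1"):
            problems.append(f"line {line_no}: {LABEL_COLUMN} must be 0 or 1")
    return problems


# Hypothetical two-row sample (made-up values, for illustration only).
sample = io.StringIO(
    "timestamp,temperature,flow_rate,vibration_level,valve_position,"
    "motor_speed,chemical_concentration,anomaly_label\n"
    "2024-01-01T00:00:00,71.2,3.4,0.02,0.5,1480,0.11,0\n"
    "2024-01-01T00:01:00,98.7,1.1,0.35,0.9,1720,0.42,1\n"
)
print(check_labelled_csv(sample))
```

To check a real file, pass an open file handle instead of the in-memory sample, for example open("labelled.csv", newline="") with a path of your own.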

More data and use cases can be explored in depth in the Use cases section.

Access the platform

First of all, request access to Butterfly AI.

Request Access

Once the form is submitted, you should shortly receive an email with your credentials. These credentials work for both the Dashboard and REST API.


Create your first dataset

To get started, let’s create a dataset from the sample labelled CSV file.


Train a model

Once the dataset has finished processing (reached COMPLETED status), it’s time to train the first model from it.

At this point, a model exists with the following training metrics:

  • Overall achieved performance: 0.997219
  • Training performance: 0.9934427
  • Test performance: 1.0000

Each new training run replaces this model only if its achievedPerformance is greater than that of the existing model. This model can now be used to create any number of Predictions on unseen data.
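The replacement rule can be pictured with a short sketch. This is an illustration of the rule as described in this guide, not the platform's actual implementation; the TrainedModel class, the keep_best function and the candidate scores 0.95 and 0.9985 are hypothetical. Only the value 0.997219 comes from the metrics above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainedModel:
    name: str
    achieved_performance: float


def keep_best(existing: Optional[TrainedModel], candidate: TrainedModel) -> TrainedModel:
    """Keep the candidate only if it strictly beats the existing model."""
    if existing is None or candidate.achieved_performance > existing.achieved_performance:
        return candidate
    return existing


current = TrainedModel("first training", 0.997219)

# A weaker run (hypothetical score) leaves the existing model in place.
current = keep_best(current, TrainedModel("second training", 0.95))
print(current.name)

# A stronger run (hypothetical score) replaces it.
current = keep_best(current, TrainedModel("third training", 0.9985))
print(current.name)
```

Under this reading, a run that only ties the existing score does not replace the model, since the guide says the new value must be greater.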

Check the Training guide for more details on the process.


Obtain a prediction

Once a model with the desired performance has been successfully created from training, it can be used to run predictions on unseen and unlabelled data (inference) as many times as needed. For this guide, this blind CSV data is used to try out prediction creation. This is a batch prediction, in which each row represents a single, individual inference.
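Since the unseen data must follow the format of the labelled data, it can help to compare the two file headers before uploading. The sketch below is a local check written with the Python standard library, not a Butterfly AI feature. It assumes that the blind file carries the same ID and feature columns in the same order, and that the anomaly_label column may be either absent or present; the function names are hypothetical.

```python
import csv
import io

LABEL_COLUMN = "anomaly_label"  # label column name from the sample data


def read_header(stream):
    """Return the first CSV row (the header) as a list of column names."""
    return next(csv.reader(stream), [])


def blind_matches_labelled(labelled_header, blind_header, label_column=LABEL_COLUMN):
    """True when both files share the same non-label columns, in the same order."""
    expected = [name for name in labelled_header if name != label_column]
    actual = [name for name in blind_header if name != label_column]
    return expected == actual


labelled = io.StringIO(
    "timestamp,temperature,flow_rate,vibration_level,valve_position,"
    "motor_speed,chemical_concentration,anomaly_label\n"
)
blind = io.StringIO(
    "timestamp,temperature,flow_rate,vibration_level,valve_position,"
    "motor_speed,chemical_concentration\n"
)
print(blind_matches_labelled(read_header(labelled), read_header(blind)))
```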


This is what the prediction results look like:

Result CSV

Each ID in the original blind file has been populated with a predicted label. Overall, this predicted label would be correct in 99% of cases.


Video walkthrough for Parkinson diagnosis data

This video walkthrough outlines the step-by-step process of analysing Parkinson diagnosis data within the Butterfly AI platform.



Where to go from here

This guide has walked through the key workflow within the Butterfly AI platform, up to obtaining a first prediction. In a nutshell, that’s what the platform is about: CSV with data -> training -> prediction on unseen data.

For binary predictions, the Butterfly AI Platform assigns a probability that reflects the model’s confidence. The predicted label is 1 if the probability is above 0.5 and 0 if it is below 0.5. Values closer to 0 or 1 indicate higher certainty. The probability appears as a column in the prediction results file.
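The thresholding described above can be sketched in a few lines. This is an illustration only, not the platform's code. The guide does not say how a probability of exactly 0.5 is handled, so this sketch assumes it maps to label 0. The certainty helper is a hypothetical convenience that rescales the distance from 0.5 to the range 0 to 1; it is not a value the platform reports.

```python
def label_from_probability(probability: float, threshold: float = 0.5) -> int:
    """Map a probability to a binary label: 1 above the threshold, otherwise 0."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumption: a probability exactly at the threshold is labelled 0.
    return 1 if probability > threshold else 0


def certainty(probability: float) -> float:
    """Hypothetical score: 0 at the 0.5 boundary, 1 at either extreme."""
    return abs(probability - 0.5) * 2


# Hypothetical probabilities, for illustration only.
for p in (0.03, 0.48, 0.97):
    print(p, label_from_probability(p), round(certainty(p), 2))
```

Rows whose probability sits close to 0.5 are the ones most worth reviewing by hand, since the chosen label could flip with a small change in the inputs.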

Next steps:

  • Get an in-depth understanding of the dataset creation and training processes via the Training guide
  • Gradually improve the performance of the trained model via Hyperparameter tuning
  • Explore real-life use cases in the Use cases section
  • Integrate Butterfly AI in your existing workflow using the API