For full details on preparing CSV files for this platform, see the input CSV creation guide. For convenience, pre-built labelled and blind CSV files are also provided.
This sample data represents a set of readings from IoT devices in an oil plant and is intended for predicting failures in key infrastructure.
Once the form is submitted, you should receive an email with your credentials shortly. These credentials work for both the Dashboard and the REST API.
The Butterfly AI Dashboard is a simple web application that allows you to upload a CSV file with sample data, train a model, and run predictions on new, unseen data.
Use the link and the provided credentials to log in to the platform.
After successful login, the following page should appear:
To access the platform via the REST API, use the provided links and credentials to log in. The curl commands in this guide use endpoints that are fully described in the API Reference. A Postman collection with some useful built-in automations is also available.
Select Datasets from the side menu, then click the Create button on the right-hand side and fill in the form:
Dataset name: a descriptive name, which must be unique across the platform.
Number of buckets: set it to 10 for this dataset. This is one of the hyperparameters that can later be adjusted to improve model performance; see the Hyperparameter tuning guide for details.
Click Save and wait for the dataset to finish processing.
Once the dataset is in COMPLETED status, it’s ready for training.
Creating a dataset via the API is a two-step process:
Create an empty Dataset
Use the signed URL returned by the previous step to upload the labelled CSV data
To upload the CSV data to the newly created dataset, a curl command like the following can be used:
```shell
# {uploadUrl}: the signed URL returned by the previous step
# X-Goog-Content-Length-Range: the `extraHeaders` content (only 1 header in the current release)
# {token}: the token obtained after a successful login
curl -i -X PUT '{uploadUrl}' \
  --header 'X-Goog-Content-Length-Range: 10,534773760' \
  --header 'Content-Type: text/csv' \
  --header 'Authorization: Bearer {token}' \
  --data-binary '@/path/to/dataset-anomaly-gas-oil-plant.csv'
```
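For scripted uploads, the same PUT request can be built with Python's standard library. This is a minimal sketch, not an official client: `build_upload_request` and `upload_csv` are hypothetical helper names, while the signed `uploadUrl`, the `extraHeaders` content and the Bearer token are the values described above.

```python
import urllib.request
from pathlib import Path


def build_upload_request(upload_url: str, extra_headers: dict[str, str],
                         token: str, csv_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the PUT request that uploads labelled CSV data.

    Hypothetical helper: upload_url and extra_headers are the values returned
    when the empty dataset was created; token is the one obtained at login.
    """
    headers = dict(extra_headers)  # e.g. X-Goog-Content-Length-Range
    headers["Content-Type"] = "text/csv"
    headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(upload_url, data=csv_bytes,
                                  headers=headers, method="PUT")


def upload_csv(upload_url: str, extra_headers: dict[str, str],
               token: str, csv_path: str) -> int:
    """Send the upload and return the HTTP status code (errors raise HTTPError)."""
    req = build_upload_request(upload_url, extra_headers, token,
                               Path(csv_path).read_bytes())
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Keeping request construction separate from sending makes it easy to inspect the headers before anything leaves your machine.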
If the upload succeeds, the output of the curl command above should be similar to:
Once the dataset has finished processing (reached COMPLETED status), it’s time to train the first model from it.
Choose Trainings from the sidebar, then click the Create button and fill in the form as follows:
Scaling Factor: set it to 19. This is one of the key training hyperparameters. Full details are available in the Training guide and the Hyperparameter tuning guide.
Performance threshold: set it to 0.99. This is another training hyperparameter, representing the target prediction accuracy (99%).
Dataset: select the newly created dataset.
The training process starts and shows the real-time performance of Butterfly AI's four proprietary training algorithms:
Training on this dataset should take no more than 10 minutes to complete and produce the initial Champion model:
Training via the API can be started with the following command:
In this example, a training job with key 3f856c87-ceb2-4988-ad2b-60719741c38b has been created. Its progress can be polled directly through the trainingJobProgressUrl to obtain basic progress and status data:
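The polling step can be scripted. The following is a minimal Python sketch under stated assumptions: the trainingJobProgressUrl returns JSON, accepts the same Bearer token as the other calls, and reports a `status` field. The terminal values `COMPLETED` (the status name this guide uses for datasets) and `FAILED`, the helper names, the 30-second interval and the 15-minute timeout are all assumptions.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal states: COMPLETED appears in this guide, FAILED is hypothetical.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def fetch_progress(progress_url: str, token: str) -> dict:
    """GET the trainingJobProgressUrl and decode its body (assumed to be JSON)."""
    req = urllib.request.Request(progress_url,
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until_done(fetch: Callable[[], dict], interval_s: float = 30.0,
                    timeout_s: float = 15 * 60,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the (assumed) status field is terminal or time runs out."""
    waited = 0.0
    while True:
        progress = fetch()
        if progress.get("status") in TERMINAL_STATES:
            return progress
        if waited >= timeout_s:
            raise TimeoutError(
                f"training still {progress.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Usage (progress_url and token come from the training response and login):
# final = poll_until_done(lambda: fetch_progress(progress_url, token))
```

The fetch function is passed in so the loop can be exercised without network access.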
At this point, a model exists with the following training metrics:
Overall achieved performance: 0.997219
Training performance: 0.9934427
Test performance: 1.0000
Each new training replaces this model only if its achievedPerformance is greater than that of the existing model. This model can now be used to create any number of predictions on unseen data.
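The replacement rule can be expressed as a small function. This is an illustrative sketch of the rule as worded above, not the platform's implementation; the `Model` type and the `next_champion` name are hypothetical, and `achieved_performance` stands in for `achievedPerformance`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    key: str                     # hypothetical identifier
    achieved_performance: float  # mirrors the platform's achievedPerformance


def next_champion(current: Optional[Model], candidate: Model) -> Model:
    """Return the champion after a new training finishes.

    The first trained model becomes champion; afterwards a candidate replaces
    the current champion only if its achieved performance is strictly greater.
    """
    if current is None or candidate.achieved_performance > current.achieved_performance:
        return candidate
    return current
```

With the metrics above, a later training reaching 0.99 would leave the current model (0.997219) in place, and so would an exact tie, since the rule as worded requires strictly greater performance.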
Check the Training guide for more details on the process.
Once training has produced a model with the desired performance, it can be used to run predictions on unseen, unlabelled data (inference) as many times as needed. For this guide, the blind CSV file is used to try out prediction creation. This is a batch prediction, in which each row represents a single, individual inference.
Select Predictions from the sidebar, then click Create:
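Because a blind CSV is unlabelled data, one way to produce your own test file from a labelled one is to drop the label column. The following Python sketch assumes the label lives in a single column whose name you pass in (`failure` in the example data is hypothetical); the input CSV creation guide remains the authority on the exact blind-file format. It also returns the row count, since each row becomes one inference in the batch.

```python
import csv
import io


def make_blind_csv(labelled_csv: str, label_column: str) -> tuple[str, int]:
    """Drop label_column from a labelled CSV; return (blind_csv_text, row_count)."""
    reader = csv.DictReader(io.StringIO(labelled_csv))
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise ValueError(f"label column {label_column!r} not found")
    kept = [name for name in reader.fieldnames if name != label_column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the label value in each row dict.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1  # one data row == one inference in a batch prediction
    return out.getvalue(), rows
```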
Select the original dataset
Upload the blind CSV
After a few moments (depending on file size), the prediction completes and the resulting CSV can be downloaded using the Download link:
To create a prediction via the API, provide the modelKey of the model to use. The command looks like:
This guide has walked through the key workflows of the Butterfly AI platform, up to obtaining a first prediction. In a nutshell, that is what the platform is about: CSV data → training → predictions on unseen data.
For binary predictions, the Butterfly AI Platform assigns each row a probability that reflects the model's confidence. The predicted label is 1 if the probability is above 0.5 and 0 if it is below 0.5. Values closer to 0 or 1 indicate higher certainty. The probability appears as a column in the prediction results file.
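The thresholding rule can be sketched in Python for post-processing a results file. The column name `probability` and the helper names are assumptions (check the header of your results file). This guide does not specify how a probability of exactly 0.5 is handled, so mapping it to 0 here is an arbitrary choice, and `confidence` is an illustrative measure of certainty, not a platform metric.

```python
import csv
import io


def label_from_probability(p: float, threshold: float = 0.5) -> int:
    """Map a binary-prediction probability to a 0/1 label (exactly 0.5 -> 0, assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    return 1 if p > threshold else 0


def confidence(p: float) -> float:
    """Distance from the 0.5 boundary rescaled to [0, 1]: 0 = unsure, 1 = certain."""
    return abs(p - 0.5) * 2


def summarize(results_csv: str, prob_column: str = "probability") -> list[tuple[int, float]]:
    """Return (label, confidence) for each row of a prediction results CSV."""
    reader = csv.DictReader(io.StringIO(results_csv))
    return [(label_from_probability(float(row[prob_column])),
             confidence(float(row[prob_column])))
            for row in reader]
```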
Next steps:
Gain an in-depth understanding of the dataset creation and training processes in the Training guide