Butterfly AI (0.1.1)


Butterfly AI is a toolkit to build predictive AI applications.

Gather your data, label it, train, predict. All in minutes.

Harness the power of fast training at high accuracy for both binary and multi-class scenarios.

Datasets

Manage data in CSV format

List all datasets

Returns a paginated list of all created datasets

query Parameters
offset
required
integer <int64> >= 0

Specifies the number of elements to skip before starting to collect the result

limit
required
integer <int64> >= 1

Specifies the maximum number of items to return

state
string (DatasetCreationStatus)
Enum: "PENDING" "PROCESSING" "COMPLETED" "FAILED"

Specifies the dataset state to filter for

withModelOnly
boolean

Specifies whether only datasets that have an associated model should be returned

Responses

Response samples

Content type
application/json
{
  "datasets": [
    {
      "datasetKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "datasetName": "Dataset 11",
      "description": "Description 11",
      "status": "PROCESSING",
      "datasetCreationProgressUrl": "http://server/3a1a9a83-d549-4f25-b7ed-8174e0c955de/progress",
      "createdOn": ""
    },
    {
      "datasetKey": "7a1a9a83-d549-4f25-b7ed-8174e0c955dd",
      "datasetName": "Dataset 12",
      "description": "Description 12",
      "status": "COMPLETED",
      "datasetCreationProgressUrl": "http://server/7a1a9a83-d549-4f25-b7ed-8174e0c955dd/progress",
      "createdOn": ""
    }
  ],
  "offset": 10,
  "limit": 10,
  "nextOffset": 127919133,
  "total": 12
}
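All list endpoints in this API share the offset/limit/total pagination shown in this sample. The Python sketch below builds a list-datasets URL with the documented query parameters and walks every page. The base URL and the /datasets route are assumptions (the route is not given on this page), and the walker advances by the number of items received instead of using nextOffset, whose meaning the sample values do not make clear.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

# Hypothetical base URL and route: adjust both to match the OpenAPI specification.
BASE_URL = "https://server/api"
LIST_DATASETS_PATH = "/datasets"
DATASET_STATES = {"PENDING", "PROCESSING", "COMPLETED", "FAILED"}


def list_datasets_url(offset: int, limit: int, state: Optional[str] = None,
                      with_model_only: Optional[bool] = None) -> str:
    """Build the list-datasets URL, enforcing the documented parameter bounds."""
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if limit < 1:
        raise ValueError("limit must be >= 1")
    params: Dict[str, Any] = {"offset": offset, "limit": limit}
    if state is not None:
        if state not in DATASET_STATES:
            raise ValueError(f"unknown dataset state: {state}")
        params["state"] = state
    if with_model_only is not None:
        # Send booleans as lowercase "true"/"false".
        params["withModelOnly"] = "true" if with_model_only else "false"
    return f"{BASE_URL}{LIST_DATASETS_PATH}?{urlencode(params)}"


def iter_pages(fetch_page: Callable[[int, int], Dict[str, Any]],
               items_key: str, limit: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every item across pages. fetch_page(offset, limit) must return
    the decoded JSON page; iteration stops at `total` or on an empty page."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items = page.get(items_key, [])
        yield from items
        offset += len(items)
        if not items or offset >= page.get("total", 0):
            break
```

A caller supplies fetch_page, for example a function that sends an authenticated GET for list_datasets_url(offset, limit) and decodes the JSON. With items_key set to "trainingJobs", "predictions" or "models", the same walker should also fit the other list endpoints.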

Create empty dataset

Creates a new dataset. A dataset is a collection of labelled data (rows in a CSV file) that can be used to train models. A newly created dataset is empty and in PENDING status. Data must then be added in CSV format using the endpoint POST /datasets/{datasetKey}/data.

Request Body schema: application/json
required
datasetName
required
string [ 3 .. 30 ]

The unique dataset name

description
string or null [ 3 .. 150 ]

The dataset description in plain text

numberOfBuckets
integer [ 4 .. 100 ]

Responses

Request samples

Content type
application/json
{
  "datasetName": "Example Dataset",
  "numberOfBuckets": 10
}

Response samples

Content type
application/json
{
  "datasetKey": "ca4c2644-9446-4ebb-9f38-bf314666aedb",
  "numberOfBuckets": 0,
  "status": "PENDING",
  "datasetCreationProgressUrl": "string",
  "datasetUploadInfo": {}
}
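A minimal sketch of the create call, assuming the route is POST /datasets under a hypothetical base URL and that requests carry the same Bearer token as the upload example further down this page. It checks the documented length and range limits locally and builds the request without sending it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical base URL; the create route "/datasets" is also an assumption.
BASE_URL = "https://server/api"


def create_dataset_request(token: str, name: str,
                           description: Optional[str] = None,
                           number_of_buckets: Optional[int] = None
                           ) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates an empty dataset."""
    if not 3 <= len(name) <= 30:
        raise ValueError("datasetName must be 3 to 30 characters")
    if description is not None and not 3 <= len(description) <= 150:
        raise ValueError("description must be 3 to 150 characters, or None")
    if number_of_buckets is not None and not 4 <= number_of_buckets <= 100:
        raise ValueError("numberOfBuckets must be between 4 and 100")
    body: Dict[str, Any] = {"datasetName": name}
    if description is not None:
        body["description"] = description
    if number_of_buckets is not None:
        body["numberOfBuckets"] = number_of_buckets
    return urllib.request.Request(
        f"{BASE_URL}/datasets",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Bearer auth is assumed from the upload example on this page.
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it would be urllib.request.urlopen(req); the datasetKey in the JSON response is then needed for the add-data call.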

Retrieve dataset details

Retrieve dataset details by dataset key

path Parameters
datasetKey
required
string <uuid> ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[4][0-9a-fA-F]...

Responses

Response samples

Content type
application/json
{
  "datasetKey": "ca4c2644-9446-4ebb-9f38-bf314666aedb",
  "datasetName": "string",
  "status": "PENDING",
  "createdOn": "string",
  "updatedOn": "string",
  "numberOfBuckets": 0
}

Delete a dataset

Deletes a dataset

path Parameters
datasetKey
required
string <uuid> ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[4][0-9a-fA-F]...

Responses
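Retrieve and delete take the same datasetKey path parameter. The sketch below builds either request. The GET/DELETE routes on /datasets/{datasetKey} and the base URL are assumptions, and because the key pattern above is truncated, check_dataset_key only approximates it: it requires a canonical hyphenated UUID whose third group starts with 4, which is the part of the pattern that is visible.

```python
import urllib.request
import uuid

# Hypothetical base URL; GET and DELETE on /datasets/{datasetKey} are assumed
# by analogy with the documented POST /datasets/{datasetKey}/data route.
BASE_URL = "https://server/api"


def check_dataset_key(key: str) -> str:
    """Approximate the datasetKey pattern: a hyphenated UUID whose third
    group starts with 4 (the only part of the pattern visible here)."""
    parsed = uuid.UUID(key)  # raises ValueError for malformed input
    # str(parsed) is the canonical lowercase hyphenated form; index 14 is the
    # first character of the third group.
    if str(parsed) != key.lower() or key[14] != "4":
        raise ValueError(f"not a hyphenated version-4 style UUID: {key}")
    return key


def dataset_request(token: str, key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a retrieve (GET) or delete (DELETE) request."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be GET or DELETE")
    return urllib.request.Request(
        f"{BASE_URL}/datasets/{check_dataset_key(key)}",
        method=method,
        # Bearer auth is assumed from the upload example on this page.
        headers={"Authorization": f"Bearer {token}"},
    )
```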

Add data to dataset

Adds data in CSV format to an existing, empty dataset. On success, it returns the dataset in PROCESSING status. Data can only be added to datasets in PENDING status.

The CSV file should be structured as follows:

  • A header row must be included at the top
  • The first column is treated as the ID, a unique identifier for each row in the file
  • The last column(s) vary depending on the type of data:
    • For binary data: a single last column containing the labels for negative (0, 'N', ...) or positive (1, 'Y', ...) samples
    • For multi-class data: multiple label columns, each headed with a class name, placed as the rightmost columns of the CSV file
  • Between the first and last columns, any number of feature columns can be included. These features should be relevant to the domain and to the nature of the predictions. There is no hard limit, but it is recommended not to exceed 200 feature columns.
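Checking a file against these rules before uploading can save a failed processing run. The hypothetical helper below covers the binary case only: it verifies the header, the column count of each row, the uniqueness of the ID column and the label values. The accepted label set is an assumption, since the description lists 0/N and 1/Y followed by "...", and requiring at least one feature column is also an assumption. Exceeding 200 features is reported as a message because it is only a recommendation.

```python
import csv
import io
from typing import List, Set

# Assumed subset of accepted binary labels; the service may accept more.
BINARY_LABELS = {"0", "1", "N", "Y"}
MAX_RECOMMENDED_FEATURES = 200


def check_binary_csv(text: str) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return ["expected a header row followed by at least one data row"]
    header, data = rows[0], rows[1:]
    problems: List[str] = []
    n_features = len(header) - 2  # minus the ID column and the label column
    if n_features < 1:
        # Assumption: at least one feature column sits between ID and label.
        problems.append("expected an ID column, feature column(s) and a label column")
    elif n_features > MAX_RECOMMENDED_FEATURES:
        problems.append(f"{n_features} feature columns exceeds the recommended 200")
    seen: Set[str] = set()
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} columns, got {len(row)}")
            continue
        if row[0] in seen:
            problems.append(f"line {line_no}: duplicate ID {row[0]!r}")
        seen.add(row[0])
        if row[-1] not in BINARY_LABELS:
            problems.append(f"line {line_no}: unexpected label {row[-1]!r}")
    return problems
```

Multi-class files would also need to know how many class columns sit at the right-hand end, so that case is left out of this sketch.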

The dataset is then prepared for training; once it is ready, its status becomes COMPLETED. Progress can be monitored via the returned datasetCreationProgressUrl.

Important note: this endpoint only accepts CSV files of up to 32MB. For larger files, PUT the file directly to the uploadUrl returned when the empty dataset was created, setting the X-Goog-Content-Length-Range header, for example:

curl --location --request PUT \
'{signedUrl}' \
--header 'X-Goog-Content-Length-Range: 10,534773760' \
--header 'Content-Type: text/csv' \
--header 'Authorization: Bearer {accessToken}' \
--data-binary '@/path/to/LoanRisk.csv'

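Which upload route to use depends only on file size. The sketch below picks the route and builds, without sending, a PUT request equivalent to the curl example. Reading "32MB" as 32 MiB is an assumption, and the Authorization header is optional here because the page does not say whether the signed URL requires it.

```python
import urllib.request
from typing import Dict, Optional

# Assumption: "32MB" is read as 32 MiB; confirm the exact limit with the service.
MAX_DIRECT_UPLOAD_BYTES = 32 * 1024 * 1024


def upload_route(size_bytes: int) -> str:
    """Choose between the multipart endpoint and the signed-URL PUT."""
    return "multipart" if size_bytes <= MAX_DIRECT_UPLOAD_BYTES else "signed-url"


def signed_url_put(signed_url: str, csv_bytes: bytes, token: Optional[str] = None,
                   min_bytes: int = 10, max_bytes: int = 534773760
                   ) -> urllib.request.Request:
    """Build (but do not send) a PUT mirroring the curl example.
    The range defaults copy the example's X-Goog-Content-Length-Range value."""
    if not min_bytes <= len(csv_bytes) <= max_bytes:
        raise ValueError("file size is outside the allowed content-length range")
    headers: Dict[str, str] = {
        "Content-Type": "text/csv",
        "X-Goog-Content-Length-Range": f"{min_bytes},{max_bytes}",
    }
    if token is not None:
        # The curl example also sends a bearer token; whether the signed URL
        # needs it is not stated, so it is optional here.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(signed_url, data=csv_bytes, method="PUT", headers=headers)
```

For files close to the upper limit, streaming from disk would be preferable to holding the whole CSV in memory as this sketch does.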
path Parameters
datasetKey
required
string <uuid> ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[4][0-9a-fA-F]...
Request Body schema: multipart/form-data
required
file
required
string <binary>

Responses

Response samples

Content type
application/json
{
  "datasetKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
  "datasetName": "Example dataset",
  "description": "This is an example dataset description",
  "status": "PROCESSING",
  "datasetCreationProgressUrl": "http://server/api/datasets/2b0a9a83-d549-4f25-b7ed-8174e0c955cd"
}

Training

Manage training of models using a small number of hyperparameters

List training jobs

Lists all training jobs, paginated.

query Parameters
offset
required
integer <int64> >= 0
limit
required
integer <int64> >= 1
parentOnly
boolean
Default: false

Responses

Response samples

Content type
application/json
{
  "trainingJobs": [],
  "offset": 0,
  "nextOffset": 0,
  "limit": 0,
  "total": 0
}

List training jobs by dataset

Lists all training jobs run on a given dataset, paginated.

path Parameters
datasetKey
required
string <uuid>
query Parameters
offset
required
integer <int64> >= 0
limit
required
integer <int64> >= 1

Responses

Response samples

Content type
application/json
{
  "trainingJobs": [
    {
      "trainingJobKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "status": "PENDING",
      "scalingFactor": 19,
      "targetPerformance": 0.85,
      "datasetKey": "5a1a9a83-d549-4f25-b7ed-8174e0c955de"
    },
    {
      "trainingJobKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "status": "COMPLETED",
      "targetPerformance": 0.85,
      "achievedPerformance": 0.87,
      "modelKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "datasetKey": "5a1a9a83-d549-4f25-b7ed-8174e0c955de"
    }
  ],
  "offset": 10,
  "limit": 10,
  "total": 20
}

Create training from dataset

Creates a new training job for the dataset. The dataset must be in COMPLETED status before training can be run on it. Training needs only two mandatory hyperparameters:

  • scalingFactor: a number between 4 and 400; an internal parameter that varies by use case. A reference document is available at https://butterfly-ai.mathficast.com
  • performanceThreshold: a floating-point number between 0 and 1 giving the performance at which training stops. It is the desired fraction of correct predictions (0 = 0%, 0.5 = 50%, 1 = 100%). Both 0 and 1 are accepted, but neither is a realistic target. The lower this value, the faster training completes.

Four algorithms run in parallel and compete to reach the desired performanceThreshold. As soon as one of them reaches it, the process finishes in COMPLETED status. Training progress can be obtained with GET /api/training/{trainingJobKey}/progress, for example GET /api/training/3a1a9a83-d549-4f25-b7ed-8174e0c955de/progress.

path Parameters
datasetKey
required
string <uuid>
Request Body schema: application/json
required
scalingFactor
required
integer <int32> [ 0 .. 499 ]
performanceThreshold
required
number <double> [ 0 .. 1 ]

Responses

Request samples

Content type
application/json
{
  "scalingFactor": 19,
  "targetPerformance": 0.85
}
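The request body schema names the threshold performanceThreshold, while this request sample uses targetPerformance. The sketch below therefore makes the field name a parameter (defaulting to the schema's name) and validates both hyperparameters against the stricter ranges given in the description. The base URL and the route template are hypothetical.

```python
import json
import urllib.request

# Hypothetical base URL and route; the page does not state the create-training path.
BASE_URL = "https://server/api"
CREATE_TRAINING_PATH = "/datasets/{datasetKey}/training"


def create_training_request(token: str, dataset_key: str, scaling_factor: int,
                            performance_threshold: float,
                            threshold_field: str = "performanceThreshold"
                            ) -> urllib.request.Request:
    """Build (but do not send) a create-training request with validated hyperparameters."""
    # The description gives 4..400; the schema is looser (0..499), so use the stricter range.
    if not 4 <= scaling_factor <= 400:
        raise ValueError("scalingFactor should be between 4 and 400")
    if not 0.0 <= performance_threshold <= 1.0:
        raise ValueError("the performance threshold must be between 0 and 1")
    body = {"scalingFactor": scaling_factor, threshold_field: performance_threshold}
    return urllib.request.Request(
        BASE_URL + CREATE_TRAINING_PATH.format(datasetKey=dataset_key),
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},  # Bearer auth assumed
    )
```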

Response samples

Content type
application/json
{
  "trainingJobKey": "41eac90e-de55-4d60-8b10-8a57ee27db2e",
  "datasetKey": "7215eec1-f233-419e-bad2-8fda560dff75",
  "status": "PENDING",
  "trainingJobProgressUrl": "https://server/api/v1/training/41eac90e-de55-4d60-8b10-8a57ee27db2e/progress"
}

Get training job detail

Get training job detail by key

path Parameters
trainingJobKey
required
string <uuid>

Responses

Response samples

Content type
application/json
{
  "trainingJobKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
  "datasetKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
  "status": "PENDING",
  "scalingFactor": 19,
  "targetPerformance": 0.85
}

Cancel a training job

Cancel a training job given its key

path Parameters
trainingJobKey
required
string <uuid>

Responses

Response samples

Content type
application/json
{
  "status": "CANCELLED"
}

Get training job progress

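Gets the progress of a training job by its key. The top-level status reports the overall job; each entry in jobs reports the status and recent performance of one of the competing algorithms.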
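The progress response lists one entry per competing algorithm. Given a response decoded from JSON and shaped like the sample for this endpoint, the helpers below pick the current leader, list the algorithms that have reached the target, and compute a simple stall indicator from recentPerformances. The spread heuristic is an illustration, not something the API defines.

```python
from typing import Any, Dict, List, Optional, Tuple


def best_algorithm(progress: Dict[str, Any]) -> Optional[Tuple[str, float]]:
    """Return (algorithm, latestPerformance) of the best-performing algorithm,
    or None when no jobs are listed."""
    jobs = progress.get("jobs", [])
    if not jobs:
        return None
    best = max(jobs, key=lambda job: job.get("latestPerformance", 0.0))
    return best["algorithm"], best["latestPerformance"]


def completed_algorithms(progress: Dict[str, Any]) -> List[str]:
    """Names of the algorithms whose own status is COMPLETED."""
    return [job["algorithm"] for job in progress.get("jobs", [])
            if job.get("status") == "COMPLETED"]


def performance_spread(job: Dict[str, Any]) -> float:
    """Hypothetical stall heuristic: the range of recentPerformances
    (0.0 means the latest readings did not move)."""
    recent = job.get("recentPerformances", [])
    return max(recent) - min(recent) if recent else 0.0
```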

path Parameters
trainingJobKey
required
string <uuid>

Responses

Response samples

Content type
application/json
{
  "status": "COMPLETED",
  "jobs": [
    {
      "trainingJobKey": "bc9aa412-8d4f-4b00-89af-0e6e3dcf8e57",
      "algorithm": "BSEV01",
      "status": "NOT_COMPLETED",
      "latestPerformance": 0.6198999,
      "recentPerformances": [0.6198999, 0.6198999, 0.6198999, 0.6198999, 0.6198999]
    },
    {
      "trainingJobKey": "d9ab4260-6416-45d6-91e4-7b0b8a0ad56b",
      "algorithm": "BFIF01",
      "status": "NOT_COMPLETED",
      "latestPerformance": 0.7714181,
      "recentPerformances": [0.7712554, 0.7712554, 0.77137744, 0.7714181]
    },
    {
      "trainingJobKey": "e7edd490-3ee2-49f8-a49c-e9f9759965e8",
      "algorithm": "BSIX01",
      "status": "NOT_COMPLETED",
      "latestPerformance": 0.859485,
      "recentPerformances": [0.85789835, 0.86017656, 0.8659534, 0.8659534, 0.859485]
    },
    {
      "trainingJobKey": "f2275fde-73ed-43d0-948a-17ff8f586874",
      "algorithm": "BSEV02",
      "status": "COMPLETED",
      "latestPerformance": 0.92002606,
      "recentPerformances": [0.9193752, 0.91998535, 0.91998535, 0.9197413, 0.92002606]
    }
  ]
}

Predictions

Predict outcomes from unseen data using trained models

Create a prediction

Creates a new prediction from a blind CSV file (unseen data) using the model given in the modelKey query parameter.

query Parameters
modelKey
required
string <uuid>

The key of the model to use for the prediction

Request Body schema: multipart/form-data
required
file
required
string <binary>

Responses

Response samples

Content type
application/json
{
  "predictionKey": "b4c94095-79f1-452d-801c-e3b9bcf6faa2",
  "status": "PENDING",
  "predictionCreationProgressUrl": "string"
}

Create a prediction

Creates a new prediction from a blind CSV file (unseen data) using the model given in the modelKey path parameter.

path Parameters
modelKey
required
string <uuid>
Request Body schema: multipart/form-data
required
file
required
string <binary>

Responses

Response samples

Content type
application/json
{
  "predictionKey": "b4c94095-79f1-452d-801c-e3b9bcf6faa2",
  "status": "PENDING",
  "predictionCreationProgressUrl": "string"
}
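Both prediction endpoints, like the add-data endpoint, take a multipart/form-data body with a single file field. The Python standard library has no multipart encoder, so the sketch below builds the body by hand following the standard multipart/form-data layout. It targets the query-parameter variant; the /predictions route and the base URL are assumptions.

```python
import urllib.request
import uuid
from typing import Tuple
from urllib.parse import urlencode

# Hypothetical base URL and route for the query-parameter variant.
BASE_URL = "https://server/api"


def multipart_file_body(field: str, filename: str, content: bytes,
                        content_type: str = "text/csv") -> Tuple[bytes, str]:
    """Encode a single file as multipart/form-data; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex  # random, so very unlikely to appear in the CSV
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def create_prediction_request(token: str, model_key: str, csv_bytes: bytes,
                              filename: str = "blind.csv") -> urllib.request.Request:
    """Build (but do not send) a prediction request; the CSV goes in the "file" field."""
    body, content_type = multipart_file_body("file", filename, csv_bytes)
    url = f"{BASE_URL}/predictions?{urlencode({'modelKey': model_key})}"
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Bearer {token}"},
    )
```

The same multipart_file_body helper could encode the body for POST /datasets/{datasetKey}/data, which also expects a file field.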

List predictions by model

Lists all predictions made with a given model, paginated.

path Parameters
modelKey
required
string <uuid>
query Parameters
offset
required
integer <int64> >= 0
limit
required
integer <int64> >= 1

Responses

Response samples

Content type
application/json
{
  "predictions": [
    {
      "predictionKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "modelKey": "4a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "datasetKey": "2a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "status": "COMPLETED",
      "predictionResultUrl": "https://server/prediction-result.csv"
    }
  ],
  "offset": 10,
  "nextOffset": 172881991,
  "limit": 10,
  "total": 20
}

Get prediction progress

Get prediction progress. The status indicates the progress

path Parameters
predictionKey
required
string <uuid>

Responses

Response samples

Content type
application/json
{
  "status": "PENDING"
}
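Datasets, training jobs and predictions are all asynchronous and report a status to poll. A generic polling loop is sketched below; get_status is any callable that fetches the current status, for example from this progress endpoint. The set of terminal statuses is an assumption: COMPLETED appears in the samples, FAILED is a documented dataset state, and CANCELLED is returned when a training job is cancelled. sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

# Assumed terminal statuses (see the note above).
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_until_done(get_status: Callable[[], str], interval_s: float = 5.0,
                    timeout_s: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Poll get_status() until it returns a terminal status or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_s} s")
        sleep(interval_s)
```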

Get prediction detail

Gets the details of a prediction by its key.

path Parameters
predictionKey
required
string <uuid>

Responses

Response samples

Content type
application/json
{
  "predictionKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
  "modelKey": "4a1a9a83-d549-4f25-b7ed-8174e0c955da",
  "datasetKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
  "targetPerformance": 0.87,
  "achievedPerformance": 0.89,
  "status": "COMPLETED",
  "predictionResultUrl": "https://server/prediction-result.csv"
}

List predictions by dataset

List all predictions done for a given dataset, paginated

path Parameters
datasetKey
required
string <uuid>
query Parameters
offset
required
integer <int64> >= 0
limit
required
integer <int64> >= 1

Responses

Response samples

Content type
application/json
{
  "predictions": [
    {
      "predictionKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "modelKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "status": "COMPLETED",
      "predictionResultUrl": "https://server/prediction-result.csv"
    }
  ],
  "offset": 10,
  "limit": 10
}

Models

Trained models

List models by dataset

List all models trained against a given dataset, paginated.

path Parameters
datasetKey
required
string <uuid>
query Parameters
offset
required
integer <int64> >= 0
limit
required
integer <int64> >= 1

Responses

Response samples

Content type
application/json
{
  "models": [
    {
      "modelKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
      "version": 1,
      "achievedPerformance": "0.85"
    }
  ],
  "offset": 10,
  "limit": 10
}

Get model detail

Gets the details of a model by its key.

path Parameters
modelKey
required
string <uuid>

Responses

Response samples

Content type
application/json
{
  "modelKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
  "version": 1,
  "datasetKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
  "trainingJobKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
  "achievedPerformance": "0.85",
  "createdOn": "2024-08-07T06:38:20Z"
}
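Both model samples serialise achievedPerformance as a string ("0.85"), so comparing models needs a numeric conversion. A small hypothetical helper for choosing among the models returned by the list endpoint; breaking ties in favour of the higher version is a design choice of this sketch, not documented behaviour.

```python
from typing import Any, Dict, List, Optional


def best_model(models: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the model with the highest achievedPerformance (newest version
    wins ties), or None for an empty list."""
    if not models:
        return None
    # The samples serialise achievedPerformance as a string ("0.85"), so
    # convert with float() before comparing; float() also accepts numbers.
    return max(models, key=lambda m: (float(m["achievedPerformance"]),
                                      m.get("version", 0)))
```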