Butterfly AI is a toolkit to build predictive AI applications.
Gather your data, label it, train, predict. All in minutes.
Harness the power of fast training at high accuracy for both binary and multi-class scenarios.
Returns a paginated list of all created datasets
| offset
required
|
integer <int64>
>= 0
Specifies the number of elements to skip before starting to collect the result |
| limit
required
|
integer <int64>
>= 1
Specifies the maximum number of items to return |
| state |
string (DatasetCreationStatus)
Enum: "PENDING" "PROCESSING" "COMPLETED" "FAILED"
Specifies the dataset state to filter for |
| withModelOnly |
boolean
Specifies whether or not the datasets returned should have a model associated |
{- "datasets": "[{\n \"datasetKey\": \"3a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"datasetName\": \"Dataset 11\",\n \"description\": \"Description 11\",\n \"status\": \"PROCESSING\",\n \"datasetCreationProgressUrl\": \"http://server/3a1a9a83-d549-4f25-b7ed-8174e0c955de/progress\",\n \"createdOn\": \"\"\n},\n{\n \"datasetKey\": \"7a1a9a83-d549-4f25-b7ed-8174e0c955dd\",\n \"datasetName\": \"Dataset 12\",\n \"description\": \"Description 12\",\n \"status\": \"COMPLETED\",\n \"datasetCreationProgressUrl\": \"http://server/3a1a9a83-d549-4f25-b7ed-8174e0c955de/progress\",\n \"createdOn\": \"\"\n}]\n",
- "offset": 10,
- "limit": 10,
- "nextOffset": 127919133,
- "total": 12
}
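The offset and limit parameters above can be combined into a simple paging loop. The following is a minimal sketch, not the official client: `fetch_page` is a hypothetical callable that performs the HTTP request and returns the decoded JSON body, and the sketch assumes `datasets` arrives as a list of objects. Because the meaning of `nextOffset` is not documented here, the loop advances by the number of items actually received and stops once `total` is reached.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical signature: fetch_page(offset, limit) -> decoded JSON response body.
FetchPage = Callable[[int, int], Dict[str, Any]]


def iter_datasets(fetch_page: FetchPage, limit: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield every dataset by walking the paginated list endpoint."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items = page.get("datasets", [])  # assumed to be a list of objects
        yield from items
        offset += len(items)
        # Stop on an empty page or once all reported items have been seen.
        if not items or offset >= page.get("total", 0):
            break
```

Injecting `fetch_page` keeps the paging logic independent of any particular HTTP library or base URL, neither of which is specified in this section.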
Create a brand new dataset. A dataset represents a collection of labelled data (rows in a CSV file)
that can be used to train models.
Upon creation, the dataset is initially empty and in PENDING status.
Data must be added in CSV format using the endpoint POST /datasets/{datasetKey}/data
| datasetName
required
|
string [ 3 .. 30 ]
The unique dataset name |
| description |
string or null [ 3 .. 150 ]
The dataset description in plain text |
| numberOfBuckets |
integer [ 4 .. 100 ]
|
{- "datasetName": "Example Dataset",
- "numberOfBuckets": 10
}
{- "datasetKey": "ca4c2644-9446-4ebb-9f38-bf314666aedb",
- "numberOfBuckets": 0,
- "status": "PENDING",
- "datasetCreationProgressUrl": "string",
- "datasetUploadInfo": {
- "uploadUrl": "string",
- "extraHeaders": "string"
}
}
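The length and range constraints listed for the creation request can be checked on the client before the request is sent. The sketch below is illustrative only: the function name `build_create_dataset_body` is hypothetical, and it assumes the bracketed ranges for the two string fields refer to character counts.

```python
from typing import Any, Dict, Optional


def build_create_dataset_body(
    dataset_name: str,
    description: Optional[str] = None,
    number_of_buckets: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate the documented constraints and return the JSON request body."""
    if not 3 <= len(dataset_name) <= 30:
        raise ValueError("datasetName must be 3 to 30 characters long")
    body: Dict[str, Any] = {"datasetName": dataset_name}
    if description is not None:
        if not 3 <= len(description) <= 150:
            raise ValueError("description must be 3 to 150 characters long")
        body["description"] = description
    if number_of_buckets is not None:
        if not 4 <= number_of_buckets <= 100:
            raise ValueError("numberOfBuckets must be between 4 and 100")
        body["numberOfBuckets"] = number_of_buckets
    return body
```

Calling `build_create_dataset_body("Example Dataset", number_of_buckets=10)` produces the same body as the sample request shown for this endpoint.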
Retrieve dataset details by dataset key
| datasetKey
required
|
string <uuid>
^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[4][0-9a-fA-F]...
|
{- "datasetKey": "ca4c2644-9446-4ebb-9f38-bf314666aedb",
- "datasetName": "string",
- "status": "PENDING",
- "createdOn": "string",
- "updatedOn": "string",
- "numberOfBuckets": 0
}
Adds data in CSV format to an existing (empty) dataset. On success, it returns the dataset in PROCESSING status. Data can only be added to datasets in PENDING status.
The CSV file should be structured as follows:
ID, a unique identifier for each row in the file
The dataset is prepared for training and when ready its status will become COMPLETED. Progress can
be monitored via the returned datasetCreationProgressUrl.
Important Note
This endpoint can only be used as-is if the CSV file is 32MB or smaller.
If the file is larger than 32MB, use the uploadUrl returned after successful
creation of the empty dataset with the X-Goog-Content-Length-Range header to directly
PUT a larger file for processing.
curl --location --request PUT \
'{signedUrl}' \
--header 'X-Goog-Content-Length-Range: 10,534773760' \
--header 'Content-Type: text/csv' \
--header 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJidXR0ZXJmbHktYWkiLCJzdWIiOiJzYWllZGFiZWRpMTk5OUBnbWFpbC5jb20iLCJuYmYiOjE3NTgyMzY3MjYsImV4cCI6MTc1ODI0MDMyNiwiaWF0IjoxNzU4MjM2NzI2LCJyb2xlcyI6W119.aeL5R_mONXFo2_HIBicZN5_55sNFudy512qYwXt4RbU' \
--data-binary '@/path/to/LoanRisk.csv'
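The size rule above can be turned into a small routing decision on the client. This is a sketch under stated assumptions: it treats 32MB as 32 * 1024 * 1024 bytes (the text does not say whether MB or MiB is meant), the function names are hypothetical, and the header values simply mirror the curl example.

```python
# Assumption: "32MB" is interpreted as 32 MiB; adjust if the service counts differently.
MAX_DIRECT_UPLOAD_BYTES = 32 * 1024 * 1024


def choose_upload_route(size_bytes: int) -> str:
    """Return "direct" for the POST endpoint, "signed_url" for the PUT to uploadUrl."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "direct" if size_bytes <= MAX_DIRECT_UPLOAD_BYTES else "signed_url"


def signed_upload_headers(min_bytes: int = 10, max_bytes: int = 534773760) -> dict:
    """Headers for the PUT to the signed URL; the defaults copy the curl example."""
    return {
        "X-Goog-Content-Length-Range": f"{min_bytes},{max_bytes}",
        "Content-Type": "text/csv",
    }
```

In practice the size would come from something like `os.path.getsize(path)` before deciding which request to issue.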
| datasetKey
required
|
string <uuid>
^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[4][0-9a-fA-F]...
|
| file
required
|
string <binary>
|
"{\n \"datasetKey\": \"2b0a9a83-d549-4f25-b7ed-8174e0c955cd\",\n \"datasetName\": \"Example dataset\",\n \"description\": \"This is an example dataset description\",\n \"status\": \"PROCESSING\",\n \"datasetCreationProgressUrl\": \"http://server/api/datasets/2b0a9a83-d549-4f25-b7ed-8174e0c955cd\"\n}\n"
List all training jobs, paginated.
| offset
required
|
integer <int64>
>= 0
|
| limit
required
|
integer <int64>
>= 1
|
| parentOnly |
boolean
Default: false
|
{- "trainingJobs": [
- {
- "trainingJobKey": "16c71e6c-9cb4-428a-948e-d456de707552",
- "datasetKey": "ca4c2644-9446-4ebb-9f38-bf314666aedb",
- "datasetName": "string",
- "status": "PENDING",
- "scalingFactor": 0,
- "targetPerformance": 0.1,
- "modelKey": "24e46901-af1c-449b-be2d-fea0c7b3f573",
- "achievedPerformance": 0.1,
- "trainingPerformance": 0.1,
- "testPerformance": 0.1,
- "createdOn": "string"
}
], - "offset": 0,
- "nextOffset": 0,
- "limit": 0,
- "total": 0
}
List all training jobs run against a given dataset, paginated.
| datasetKey
required
|
string <uuid>
|
| offset
required
|
integer <int64>
>= 0
|
| limit
required
|
integer <int64>
>= 1
|
" {\n \"trainingJobs\": [{\n \"trainingJobKey\": \"3a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"status\": \"PENDING\",\n \"scalingFactor\": 19,\n \"targetPerformance\": 0.85,\n \"datasetKey\": \"5a1a9a83-d549-4f25-b7ed-8174e0c955de\"\n },\n {\n \"trainingJobKey\": \"3a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"status\": \"COMPLETED\",\n \"targetPerformance\": 0.85,\n \"achievedPerformance\": 0.87,\n \"modelKey\": \"3a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"datasetKey\": \"5a1a9a83-d549-4f25-b7ed-8174e0c955de\"\n }],\n \"offset\": 10,\n \"limit\": 10,\n \"total\": 20\n }\n"
Creates a new training job for the dataset. The dataset must be in COMPLETED state before training can run against it. Training needs only 2 mandatory hyperparameters:
scalingFactor: a number between 4 and 400, an internal parameter that varies by use
case. A reference document can be
found here: https://butterfly-ai.mathficast.com
performanceThreshold: a floating point number between 0 and 1, indicating the
desired performance at which training will stop.
It represents the desired percentage of correct predictions (0 = 0%, 0.5 = 50%, 1 = 100%). Both 0
and 1 are allowed but
they are unrealistic values. The smaller this value is, the faster training will complete.
4 algorithms run in parallel and compete to achieve the desired performanceThreshold.
Once one of them reaches it, the process finishes in COMPLETED status.
Progress of the training process can be obtained using endpoint
GET /api/training/3a1a9a83-d549-4f25-b7ed-8174e0c955de/progress
| datasetKey
required
|
string <uuid>
|
| scalingFactor
required
|
integer <int32>
[ 0 .. 499 ]
|
| performanceThreshold
required
|
number <double>
[ 0 .. 1 ]
|
{- "scalingFactor": 19,
- "targetPerformance": 0.85
}
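The two hyperparameters can be validated before a training job is submitted. The sketch below rests on two assumptions that this page does not settle: it enforces the 4 to 400 range given in the description of scalingFactor rather than the wider bounds in the parameter list, and it names the second field `targetPerformance` as in the sample body, although the parameter list calls it performanceThreshold. The function name is hypothetical.

```python
from typing import Any, Dict


def build_training_body(scaling_factor: int, target_performance: float) -> Dict[str, Any]:
    """Validate the two mandatory hyperparameters and return a request body."""
    # Assumption: use the 4..400 range from the endpoint description.
    if not 4 <= scaling_factor <= 400:
        raise ValueError("scalingFactor must be between 4 and 400")
    if not 0.0 <= target_performance <= 1.0:
        raise ValueError("performance threshold must be between 0 and 1")
    # Assumption: key name taken from the sample request body.
    return {"scalingFactor": scaling_factor, "targetPerformance": target_performance}
```

If the service turns out to expect `performanceThreshold`, only the key in the returned dictionary needs to change.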
"{\n \"trainingJobKey\": \"41eac90e-de55-4d60-8b10-8a57ee27db2e\",\n \"datasetKey\": \"7215eec1-f233-419e-bad2-8fda560dff75\",\n \"status\": \"PENDING\",\n \"trainingJobProgressUrl\": \"https://server/api/v1/training/41eac90e-de55-4d60-8b10-8a57ee27db2e/progress\"\n }\n"
Get training job detail by key
| trainingJobKey
required
|
string <uuid>
|
{- "trainingJobKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
- "datasetKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
- "status": "PENDING",
- "scalingFactor": 19,
- "targetPerformance": 0.85
}
Get training job progress by its key. The status indicates the progress.
| trainingJobKey
required
|
string <uuid>
|
"{\n \"status\": \"COMPLETED\",\n \"jobs\": [\n {\n \"trainingJobKey\": \"bc9aa412-8d4f-4b00-89af-0e6e3dcf8e57\",\n \"algorithm\": \"BSEV01\",\n \"status\": \"NOT_COMPLETED\",\n \"latestPerformance\": 0.6198999,\n \"recentPerformances\": [\n 0.6198999,\n 0.6198999,\n 0.6198999,\n 0.6198999,\n 0.6198999\n ]\n },\n {\n \"trainingJobKey\": \"d9ab4260-6416-45d6-91e4-7b0b8a0ad56b\",\n \"algorithm\": \"BFIF01\",\n \"status\": \"NOT_COMPLETED\",\n \"latestPerformance\": 0.7714181,\n \"recentPerformances\": [\n 0.7712554,\n 0.7712554,\n 0.77137744,\n 0.7714181\n ]\n },\n {\n \"trainingJobKey\": \"e7edd490-3ee2-49f8-a49c-e9f9759965e8\",\n \"algorithm\": \"BSIX01\",\n \"status\": \"NOT_COMPLETED\",\n \"latestPerformance\": 0.859485,\n \"recentPerformances\": [\n 0.85789835,\n 0.86017656,\n 0.8659534,\n 0.8659534,\n 0.859485\n ]\n },\n {\n \"trainingJobKey\": \"f2275fde-73ed-43d0-948a-17ff8f586874\",\n \"algorithm\": \"BSEV02\",\n \"status\": \"COMPLETED\",\n \"latestPerformance\": 0.92002606,\n \"recentPerformances\": [\n 0.9193752,\n 0.91998535,\n 0.91998535,\n 0.9197413,\n 0.92002606\n ]\n }\n ]\n}\n"
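A progress payload of this shape can be reduced to the one job that reached the target. The following is a minimal sketch that assumes the response has already been decoded into a dictionary with a top-level `status` and a `jobs` list, as in the sample; the helper name `winning_job` is hypothetical.

```python
from typing import Any, Dict, Optional


def winning_job(progress: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first per-algorithm job marked COMPLETED, or None if there is none."""
    if progress.get("status") != "COMPLETED":
        return None  # overall training has not finished yet
    for job in progress.get("jobs", []):
        if job.get("status") == "COMPLETED":
            return job
    return None
```

Applied to the sample response, this would select the job whose algorithm is BSEV02, the only entry with status COMPLETED.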
Creates a new prediction using the specified model and blind CSV file
| modelKey
required
|
string <uuid>
The key of the model to use for the prediction |
| file
required
|
string <binary>
|
{- "predictionKey": "b4c94095-79f1-452d-801c-e3b9bcf6faa2",
- "status": "PENDING",
- "predictionCreationProgressUrl": "string"
}
Creates a new prediction using the specified model and blind CSV file
| modelKey
required
|
string <uuid>
|
| file
required
|
string <binary>
|
{- "predictionKey": "b4c94095-79f1-452d-801c-e3b9bcf6faa2",
- "status": "PENDING",
- "predictionCreationProgressUrl": "string"
}
List all predictions done using a given model, paginated
| modelKey
required
|
string <uuid>
|
| offset
required
|
integer <int64>
>= 0
|
| limit
required
|
integer <int64>
>= 1
|
" {\n \"predictions\": [{\n \"predictionKey\": \"3a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"modelKey\": \"4a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"datasetKey\": \"2a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"status\": \"COMPLETED\",\n \"predictionResultUrl\": \"https://server/prediction-result.csv\"\n }],\n \"offset\": 10,\n \"nextOffset\": 172881991,\n \"limit\": 10,\n \"total\": 20\n }\n"
{- "predictionKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
- "modelKey": "4a1a9a83-d549-4f25-b7ed-8174e0c955da",
- "datasetKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
- "targetPerformance": 0.87,
- "achievedPerformance": 0.89,
- "status": "COMPLETED"
}
List all predictions done for a given dataset, paginated
| datasetKey
required
|
string <uuid>
|
| offset
required
|
integer <int64>
>= 0
|
| limit
required
|
integer <int64>
>= 1
|
" {\n \"predictions\": [{\n \"predictionKey\": \"3a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"modelKey\": \"3a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"status\": \"COMPLETED\",\n \"predictionResultUrl\": \"https://server/prediction-result.csv\"\n }],\n \"offset\": 10,\n \"limit\": 10\n }\n"
List all models trained against a given dataset, paginated.
| datasetKey
required
|
string <uuid>
|
| offset
required
|
integer <int64>
>= 0
|
| limit
required
|
integer <int64>
>= 1
|
" {\n \"models\": [{\n \"modelKey\": \"3a1a9a83-d549-4f25-b7ed-8174e0c955de\",\n \"version\": 1,\n \"achievedPerformance\": \"0.85\"\n }],\n \"offset\": 10,\n \"limit\": 10\n }\n"
{- "modelKey": "3a1a9a83-d549-4f25-b7ed-8174e0c955de",
- "version": 1,
- "datasetKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
- "trainingJobKey": "2b0a9a83-d549-4f25-b7ed-8174e0c955cd",
- "achievedPerformance": "0.85",
- "createdOn": "2024-08-07T06:38:20Z"
}