API Recipes
These are examples of how to use the API to perform typical workflows within Butterfly AI. See also the API overview or the full API reference for more details on the API.
Auth and smoke test
Log in to obtain a token
List datasets to verify access
curl -i -XPOST 'https://butterfly-ai-api.mathficast.com/api/login' \
--header 'Content-Type: application/json' \
--data-raw '{ "username": "email@email.com", "password": "password"}'
{"access_token":"eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJidXR0ZX...","token_type":"Bearer","expires_in":3600,"username":"email@email.com"}
curl -i -XGET 'https://butterfly-ai-api.mathficast.com/api/v1/datasets?offset=0&limit=10' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"
HTTP/2 200
{"datasets":[{"datasetKey":"feebb52a-f098-4e0e-b1eb-459d833e5aa4","datasetName":"oilplantv4","numberOfBuckets":10,"status":"COMPLETED","createdOn":"2025-10-24T08:35:36Z"},{"datasetKey":"c2bd4dbd-b9e9-4fd2-9e52-36652b082060","datasetName":"oilplantv3","numberOfBuckets":10,"status":"COMPLETED","createdOn":"2025-10-02T18:09:48Z"},{"datasetKey":"c8203752-33a0-44ae-a1ed-d29811cbcdf7","datasetName":"Example1Crop","numberOfBuckets":10,"status":"COMPLETED","createdOn":"2025-10-02T18:09:06Z"},{"datasetKey":"a2a53149-dce5-4bcf-a790-d510902cc488","datasetName":"OilPlantAnomalyV2","numberOfBuckets":10,"status":"COMPLETED","createdOn":"2025-09-29T05:49:31Z"},{"datasetKey":"f611a02e-024b-4302-a48a-8cfe9222365d","datasetName":"airq4","numberOfBuckets":20,"status":"COMPLETED","createdOn":"2025-08-12T06:35:19Z"},{"datasetKey":"3569559c-1b4c-416c-a63a-f32a1d4bd5fa","datasetName":"airQ3","numberOfBuckets":20,"status":"COMPLETED","createdOn":"2025-08-11T06:43:06Z"},{"datasetKey":"3de96540-a270-4786-b445-7d298aab950a","datasetName":"airQ2","numberOfBuckets":20,"status":"COMPLETED","createdOn":"2025-08-11T06:04:59Z"},{"datasetKey":"77f8445a-68ee-4f9f-a49b-39365418db1c","datasetName":"airQ1","numberOfBuckets":20,"status":"COMPLETED","createdOn":"2025-08-11T05:46:57Z"},{"datasetKey":"6d59e328-3165-4663-8d81-fc5a47b114fe","datasetName":"parkinsonspeechlocalv4","numberOfBuckets":40,"status":"COMPLETED","createdOn":"2025-08-02T09:47:20Z"},{"datasetKey":"0eeba1c3-b58b-4946-9b4b-f6a78a62534b","datasetName":"ParkinsonSpeechlocalv3","numberOfBuckets":40,"status":"FAILED","createdOn":"2025-08-02T09:27:55Z"}],"offset":0,"nextOffset":1754126588,"limit":10,"total":219}
Create a Dataset
First, create an empty dataset
Then, add CSV data to the created dataset
Lastly, poll for progress until COMPLETED
curl -i -XPOST 'https://{baseUrl}/api/v1/datasets' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" \
--data '{
"datasetName": "OilPlantAnomalyV22",
"numberOfBuckets": 10
}'
{
"datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
"numberOfBuckets": 10,
"status": "PENDING",
"datasetCreationProgressUrl": "https://{baseUrl}/api/datasets/bbb4d0fb-7287-44b0-860d-d81bea692648",
"datasetUploadInfo": {
"uploadUrl": "https://storage.googleapis.com/mathfi-test-data/inputs/datasets/bbb4d0fb-7287-44b0-860d-d81bea692648/OilPlantAnomalyV22.csv?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=butterfly-ai-runtime-dev%40mathficast-dev.iam.gserviceaccount.com%2F20251008%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20251008T214325Z&X-Goog-Expires=3600&X-Goog-SignedHeaders=host%3Bx-goog-content-length-range&X-Goog-Signature=834adb48fbbbf18d82d3bc352af19b6e2c63cd8102ddb923467a5970001cde4e944f5f2714c50075c0a59fa4159882f0fd5b3f819e1867bd2f23910842705c829bf9def31e8d12d704395c37cb2e36ec9049fd500ec486213d7787971babd74890f3dfe5886ccca222a5c36325e45e053b9d541e6293ac36ac0cf31e65d2877a2ee86b7dd19a23202971f1476d70d68ad587ed9103b956c76cc8c6e62223759e8ec971a20a3e9f916d906f32557a7a26487d4fa189941183408e1f788369f410c85f93cf161799908866adfda0243b0e5984c5a3ba048a60fd18a7951d4fe188f8d4cd8651217805cbbc524dcf0c6dca9822eb95c70a9a7f750b999825cd5dc8",
"extraHeaders": "X-Goog-Content-Length-Range:10,534773760"
}
}
Take the uploadUrl from datasetUploadInfo in the response and craft a PUT request that includes the extraHeaders and your CSV data:
curl -i -XPUT 'https://storage.googleapis.com/mdata/inputs/datasets/bbb4d0fb-7287-44b0-860d-d81bea692648/OilPlantAnomalyV22.csv?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=butterfly-ai-runtime-dev%40mathficast-dev.iam.gserviceaccount.com%2F20251008%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20251008T214325Z&X-Goog-Expires=3600&X-Goog-SignedHeaders=host%3Bx-goog-content-length-range&X-Goog-Signature=834adb48fbbbf18d82d3bc352af19b6e2c63cd8102ddb923467a5970001cde4e944f5f2714c50075c0a59fa4159882f0fd5b3f819e1867bd2f23910842705c829bf9def31e8d12d704395c37cb2e36ec9049fd500ec486213d7787971babd74890f3dfe5886ccca222a5c36325e45e053b9d541e6293ac36ac0cf31e65d2877a2ee86b7dd19a23202971f1476d70d68ad587ed9103b956c76cc8c6e62223759e8ec971a20a3e9f916d906f32557a7a26487d4fa189941183408e1f788369f410c85f93cf161799908866adfda0243b0e5984c5a3ba048a60fd18a7951d4fe188f8d4cd8651217805cbbc524dcf0c6dca9822eb95c70a9a7f750b999825cd5dc8' \
--header 'X-Goog-Content-Length-Range: 10,534773760' \
--header 'Content-Type: text/csv' \
--data-binary '@/path/to/dataset-anomaly-gas-oil-plant.csv'
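If you script this step instead of copying the signed URL by hand, the uploadUrl, extraHeaders and datasetKey can be pulled from the creation response. A minimal sketch, assuming jq is installed and $TOKEN is set as above ({baseUrl} is the same placeholder used throughout):
# Create the dataset and capture the response
CREATE_RESPONSE=$(curl -s -XPOST "https://{baseUrl}/api/v1/datasets" \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" \
--data '{ "datasetName": "OilPlantAnomalyV22", "numberOfBuckets": 10 }')
# Extract the signed upload URL, the extra header and the dataset key
UPLOAD_URL=$(echo "$CREATE_RESPONSE" | jq -r '.datasetUploadInfo.uploadUrl')
EXTRA_HEADER=$(echo "$CREATE_RESPONSE" | jq -r '.datasetUploadInfo.extraHeaders')
DATASET_KEY=$(echo "$CREATE_RESPONSE" | jq -r '.datasetKey')
# Upload the CSV to the signed URL with the required length-range header
curl -i -XPUT "$UPLOAD_URL" \
--header "$EXTRA_HEADER" \
--header 'Content-Type: text/csv' \
--data-binary '@/path/to/dataset-anomaly-gas-oil-plant.csv'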
Finally, poll the dataset for progress until the status is COMPLETED (or FAILED):
curl --location 'https://{baseUrl}/api/v1/datasets/bbb4d0fb-7287-44b0-860d-d81bea692648' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"
{
"datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
"datasetName": "OilPlantAnomalyV22",
"status": "PROCESSING",
"createdOn": "2025-10-08T21:43:25Z",
"numberOfBuckets": 10
}
...
{
"datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
"datasetName": "OilPlantAnomalyV22",
"status": "COMPLETED",
"createdOn": "2025-10-08T21:43:25Z",
"numberOfBuckets": 10
}
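When scripting, the wait can be a simple polling loop that stops once the status leaves the in-progress states. A minimal sketch, assuming jq is installed and DATASET_KEY holds the key returned at creation time:
while true; do
STATUS=$(curl -s "https://{baseUrl}/api/v1/datasets/$DATASET_KEY" \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" | jq -r '.status')
echo "dataset status: $STATUS"
# Stop as soon as dataset processing has finished, successfully or not
if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ]; then break; fi
sleep 10
done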
Key values to note for next stages:
The datasetKey uniquely identifies the dataset
The datasetName is a descriptive name to distinguish it from other datasets or versions of the same underlying data
The numberOfBuckets is a key hyperparameter that determines how data is shaped for training
Train a Model
Once the dataset is created and in the COMPLETED state, training can be started with the following commands and endpoints.
Target 0.9 performance
Pass 19 as the scaling factor
curl -i -XPOST 'https://{baseUrl}/api/v1/training/datasets/bbb4d0fb-7287-44b0-860d-d81bea692648' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" \
--data '{
"performanceThreshold": 0.9,
"scalingFactor": 19
}'
{
"trainingJobKey": "8ae21b68-a65d-4993-974e-264f742457eb",
"datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
"status": "PENDING",
"trainingJobProgressUrl": "https://{baseUrl}/api/v1/training/8ae21b68-a65d-4993-974e-264f742457eb"
}
Training progress can be monitored by polling the trainingJobProgressUrl directly, or by appending /progress to it for a more detailed view:
curl -i -XGET 'https://{baseUrl}/api/v1/training/8ae21b68-a65d-4993-974e-264f742457eb' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"
{
"trainingJobKey": "8ae21b68-a65d-4993-974e-264f742457eb",
"datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
"status": "RUNNING",
"scalingFactor": 19,
"targetPerformance": 0.90
}
curl -i -XGET 'https://{baseUrl}/api/v1/training/8ae21b68-a65d-4993-974e-264f742457eb/progress' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"
{
"status": "RUNNING",
"jobs": [
{
"trainingJobKey": "23de1e06-a126-4719-90a8-8874d9f63a92",
"algorithm": "BSEV02",
"status": "COMPLETED",
"latestPerformance": 0.9000122,
"recentPerformances": [
0.8674287,
0.89866984,
0.89997154,
0.89997154,
0.9000122
]
},
{
"trainingJobKey": "54af946f-1f89-4560-99ef-f4934bfab237",
"algorithm": "BSEV01",
"status": "RUNNING",
"latestPerformance": 0.59744537,
"recentPerformances": [
0.59744537,
0.59744537,
0.59744537,
0.59744537,
0.59744537
]
},
{
"trainingJobKey": "c46820a8-f5fc-49ab-aa34-bdf71186eff1",
"algorithm": "BFIF01",
"status": "RUNNING",
"latestPerformance": 0.7716215,
"recentPerformances": [
0.7712554,
0.7712554,
0.77137744,
0.7715808,
0.7716215
]
},
{
"trainingJobKey": "ea4bd529-e886-4167-b800-8a97a147396b",
"algorithm": "BSIX01",
"status": "RUNNING",
"latestPerformance": 0.8725845,
"recentPerformances": [
0.8725845,
0.8725845,
0.8725845,
0.8725845,
0.8725845
]
}
]
}
As can be seen in the output, there are four algorithms (each wrapped in a training job) and one overall status for the whole training process.
The job/algorithm with the highest achieved performance is the winner (the first to reach COMPLETED status)
When a job does not reach the target performance within the timeout (currently 1 hour), it is marked with TIMEOUT status
When a job does not improve by at least 0.02 over a given period (5 minutes), it is stopped and marked as NOT_COMPLETED (stalled)
When a job has an irrecoverable failure, it is marked as FAILED and stopped
The overall process stops only when no job is still running (all are COMPLETED, NOT_COMPLETED, TIMEOUT, or FAILED). As long as at least one job completes successfully, the overall process is COMPLETED (see the polling sketch after this list)
A champion model is created from the successful training, marking it as the best-performing model so far for the dataset. If training is repeated with different hyperparameters (performance threshold, scaling factor, or the dataset recreated with a different number of buckets) and better performance is achieved, the champion model is overridden with the new best one
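Following these rules, a script can simply poll the progress endpoint until the top-level status is no longer PENDING or RUNNING. A minimal sketch, assuming jq is installed and TRAINING_JOB_KEY holds the key returned when training was started:
while true; do
PROGRESS=$(curl -s "https://{baseUrl}/api/v1/training/$TRAINING_JOB_KEY/progress" \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN")
OVERALL=$(echo "$PROGRESS" | jq -r '.status')
# Print one line per algorithm so stalled or failing jobs are visible while waiting
echo "$PROGRESS" | jq -r '.jobs[] | "\(.algorithm): \(.status) (latest \(.latestPerformance))"'
if [ "$OVERALL" != "PENDING" ] && [ "$OVERALL" != "RUNNING" ]; then break; fi
sleep 30
done
echo "overall training status: $OVERALL"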
The completed process response from the progress monitoring endpoint looks like this:
{
"status": "COMPLETED",
"jobs": [
{
"trainingJobKey": "23de1e06-a126-4719-90a8-8874d9f63a92",
"algorithm": "BSEV02",
"status": "COMPLETED",
"latestPerformance": 0.9000122,
"recentPerformances": [
0.8674287,
0.89866984,
0.89997154,
0.89997154,
0.9000122
]
},
{
"trainingJobKey": "54af946f-1f89-4560-99ef-f4934bfab237",
"algorithm": "BSEV01",
"status": "NOT_COMPLETED",
"latestPerformance": 0.6260017,
"recentPerformances": [
0.6260017,
0.625961,
0.6260017,
0.625961,
0.6260017
]
},
{
"trainingJobKey": "c46820a8-f5fc-49ab-aa34-bdf71186eff1",
"algorithm": "BFIF01",
"status": "NOT_COMPLETED",
"latestPerformance": 0.7718656,
"recentPerformances": [
0.7715808,
0.7716215,
0.77174354,
0.77178425,
0.7718656
]
},
{
"trainingJobKey": "ea4bd529-e886-4167-b800-8a97a147396b",
"algorithm": "BSIX01",
"status": "NOT_COMPLETED",
"latestPerformance": 0.87388635,
"recentPerformances": [
0.87388635,
0.87388635,
0.87388635,
0.87388635,
0.87388635
]
}
]
}
The winner is BSEV02, with a latest performance of 0.9000122; the training detail shows the created model and further training performance details:
curl -i -XGET 'https://{baseUrl}/api/v1/training/7e36fc7e-ffea-4051-8872-8b021370a72b' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"
{
"trainingJobKey": "7e36fc7e-ffea-4051-8872-8b021370a72b",
"datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
"status": "COMPLETED",
"scalingFactor": 19,
"targetPerformance": 0.9,
"achievedPerformance": 0.89604133,
"trainingPerformance": 0.9000122,
"testPerformance": 0.8920705,
"modelKey": "8a86bd31-ff0c-47c1-bdb8-d08331904508"
}
The model with modelKey=8a86bd31-ff0c-47c1-bdb8-d08331904508 can then be used to run predictions on unseen data.
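When scripting, the modelKey can be read with jq from the same training detail response shown above. A minimal sketch, assuming jq is installed and TRAINING_JOB_KEY points at the training job whose detail contains the model:
# Extract the champion model key from the training detail response
MODEL_KEY=$(curl -s "https://{baseUrl}/api/v1/training/$TRAINING_JOB_KEY" \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" | jq -r '.modelKey')
echo "champion model: $MODEL_KEY"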
Run Predictions
After the desired champion model has been trained, it is time to run predictions with it. Before running a prediction, check that the returned modelKey exists and has the desired performance:
curl -i -XGET 'https://{baseUrl}/api/v1/models/8a86bd31-ff0c-47c1-bdb8-d08331904508' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"
{
"modelKey": "8a86bd31-ff0c-47c1-bdb8-d08331904508",
"datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
"version": 1,
"achievedPerformance": 0.89604133,
"trainingPerformance": 0.9000122,
"testPerformance": 0.8920705,
"createdOn": "2025-10-08T22:21:06Z"
}
Then, create a prediction using this model. Remember that, as of now, the blind CSV file to run the prediction on must not exceed 32 MB in size:
curl -i -XPOST 'https://{baseUrl}/api/v1/predictions/models/8a86bd31-ff0c-47c1-bdb8-d08331904508' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" \
--form '_file=@"/path/to/blind-anomaly-gas-oil-plant.csv";type=text/csv'
{
"predictionKey": "c87c26cd-8b5f-4e5f-8fd9-7b3a5f281c34",
"status": "PENDING",
"predictionCreationProgressUrl": "https://{baseUrl}/api/v1/predictions/c87c26cd-8b5f-4e5f-8fd9-7b3a5f281c34"
}
Poll the URL for progress on the prediction result:
curl -i -XGET 'https://{baseUrl}/api/v1/predictions/a58a2ae6-b294-4888-a970-3f033a1f8210/progress' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"
Once the status moves to COMPLETED, the result CSV can be downloaded:
curl --location 'https://{baseUrl}/api/v1/predictions/a58a2ae6-b294-4888-a970-3f033a1f8210/download' \
--header 'Content-Type: text/csv' \
--header "Authorization: Bearer $TOKEN"
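Scripted, the last two steps combine into a short loop: poll until the prediction is COMPLETED, then download the result CSV. A minimal sketch, assuming jq is installed, PREDICTION_KEY holds the key returned when the prediction was created, and the progress response exposes a status field as above:
while true; do
STATUS=$(curl -s "https://{baseUrl}/api/v1/predictions/$PREDICTION_KEY/progress" \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" | jq -r '.status')
echo "prediction status: $STATUS"
if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ]; then break; fi
sleep 10
done
# Download the result CSV once the prediction has completed
if [ "$STATUS" = "COMPLETED" ]; then
curl --location "https://{baseUrl}/api/v1/predictions/$PREDICTION_KEY/download" \
--header "Authorization: Bearer $TOKEN" \
--output prediction-result.csv
fi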
Hyperparameter Tuning
Hyperparameter tuning is done by combining the endpoints above to incrementally obtain better performance by adjusting the hyperparameters (a scripted sketch follows the list):
Create new versions of the dataset with an increased or decreased number of buckets
Re-train using the training endpoints, monitoring progress, with a higher or lower scaling factor until the training completes successfully (COMPLETED)
Gradually increase or reduce the performance threshold until no further improvement is visible while still obtaining COMPLETED results
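A tuning run can be scripted as a sweep over one hyperparameter at a time, keeping whichever run achieves the best performance. A rough sketch of the scaling-factor part, assuming jq is installed, DATASET_KEY is set, the scaling-factor values are arbitrary examples, and wait_for_training is a hypothetical helper that polls the progress endpoint as in the training section:
# Sweep a few scaling factors against the same dataset and report the achieved performance of each run
for SF in 10 15 19 25; do
RESPONSE=$(curl -s -XPOST "https://{baseUrl}/api/v1/training/datasets/$DATASET_KEY" \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" \
--data "{ \"performanceThreshold\": 0.9, \"scalingFactor\": $SF }")
JOB_KEY=$(echo "$RESPONSE" | jq -r '.trainingJobKey')
wait_for_training "$JOB_KEY"   # hypothetical helper: poll .../training/$JOB_KEY/progress until finished
PERF=$(curl -s "https://{baseUrl}/api/v1/training/$JOB_KEY" \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" | jq -r '.achievedPerformance')
echo "scalingFactor=$SF achievedPerformance=$PERF"
done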