These are some examples of how to use the API to perform typical workflows within Butterfly AI. See also the API overview or the full API reference for more details on the API.

Auth and smoke test

  • Log in to obtain a token
  • List datasets to verify access (see the sketch below)
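
A minimal smoke test in Python could look like the sketch below; the /auth/login and /datasets paths, the payload shape, and the bearer-token header are assumptions, so check the API reference for the actual endpoints:

  import requests

  BASE_URL = "https://api.butterfly.example"  # assumed base URL; replace with the real host

  # Log in to obtain a token (endpoint and payload shape are assumptions)
  resp = requests.post(f"{BASE_URL}/auth/login",
                       json={"username": "me@example.com", "password": "secret"})
  resp.raise_for_status()
  token = resp.json()["token"]  # assumed response field
  headers = {"Authorization": f"Bearer {token}"}

  # List datasets to verify the token grants access
  datasets = requests.get(f"{BASE_URL}/datasets", headers=headers)
  datasets.raise_for_status()
  print(datasets.json())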

Create a Dataset

  • First, create an empty dataset
  • Then, add CSV data to the created dataset
  • Lastly, poll for progress until COMPLETED (sketched below)
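
These three steps could be wired together as in the following Python sketch (continuing with BASE_URL and headers from the smoke test above); the /datasets endpoints and the multipart upload are assumptions, apart from the datasetKey, datasetName, and numberOfBuckets values listed below:

  import time
  import requests

  # Create an empty dataset (hypothetical endpoint and payload)
  resp = requests.post(f"{BASE_URL}/datasets", headers=headers,
                       json={"datasetName": "churn-v1", "numberOfBuckets": 10})
  resp.raise_for_status()
  dataset_key = resp.json()["datasetKey"]

  # Add CSV data to the created dataset (multipart upload is an assumption)
  with open("train.csv", "rb") as f:
      requests.post(f"{BASE_URL}/datasets/{dataset_key}/data",
                    headers=headers, files={"file": f}).raise_for_status()

  # Poll for progress until the dataset reaches COMPLETED
  while True:
      status = requests.get(f"{BASE_URL}/datasets/{dataset_key}/progress",
                            headers=headers).json()["status"]
      if status == "COMPLETED":
          break
      time.sleep(10)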

Key values to note for the next stages:

  • The datasetKey uniquely identifies the dataset
  • The datasetName is a descriptive name to distinguish the dataset from others, or from versions of the same underlying data
  • The numberOfBuckets is a key hyperparameter that determines how the data is shaped for training

Train a Model

Once the dataset is created and in the COMPLETED state, training can be launched with the following commands and endpoints.

  • Target 0.99 performance
  • Pass 19 as the scaling factor (see the sketch below)
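
A sketch of launching training with those hyperparameters, continuing from the earlier snippets; the /training path and the field names performanceThreshold and scalingFactor are assumptions, not the documented API:

  import requests

  # Launch training on the COMPLETED dataset (hypothetical endpoint and payload)
  resp = requests.post(f"{BASE_URL}/datasets/{dataset_key}/training",
                       headers=headers,
                       json={"performanceThreshold": 0.99, "scalingFactor": 19})
  resp.raise_for_status()
  training_key = resp.json()["trainingKey"]  # assumed response field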

As can be observed in the output, there are 4 algorithms (wrapped into trainingJobs) and one overall status for the whole training process.

  • The job/algorithm with the highest achieved performance is the winner (the first to reach COMPLETED status)
  • When a job doesn’t reach the target performance within the timeout (currently 1 hour), it’s marked with TIMEOUT status
  • When a job doesn’t progress by at least 0.02 over a given period (5 minutes), it’s stopped and marked as NOT_COMPLETED (stalled)
  • When a job hits an irrecoverable failure, it’s marked as FAILED and stopped
  • The overall process stops only when all jobs are out of PROCESSING (either COMPLETED, NOT_COMPLETED, TIMEOUT or FAILED). As long as at least one job completes successfully, the overall process is COMPLETED (see the polling sketch after this list)
  • A champion model is created from the successful training, marking it as the best performing model so far for the dataset. If training is repeated with different hyperparameters (performance threshold, scaling factor, or the dataset recreated with a different number of buckets) and better performance is achieved, the champion model is overridden with the new best one
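
These status rules can be condensed into a small polling helper; only the status names come from this document, while the progress endpoint and response shape are assumptions:

  import time
  import requests

  TERMINAL = {"COMPLETED", "NOT_COMPLETED", "TIMEOUT", "FAILED"}

  def wait_for_training(training_key):
      """Poll until every job is out of PROCESSING, then return the winner, if any."""
      while True:
          progress = requests.get(f"{BASE_URL}/training/{training_key}/progress",
                                  headers=headers).json()
          jobs = progress["jobs"]
          if all(job["status"] in TERMINAL for job in jobs):
              completed = [j for j in jobs if j["status"] == "COMPLETED"]
              # Overall status is COMPLETED as long as at least one job succeeded;
              # the winner is the completed job with the highest performance
              if completed:
                  return max(completed, key=lambda j: j["latestPerformance"])
              return None
          time.sleep(30)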

The completed process response from the progress monitoring endpoint looks like this:

  {
    "status": "COMPLETED",
    "jobs": [
        {
            "trainingJobKey": "23de1e06-a126-4719-90a8-8874d9f63a92",
            "algorithm": "BSEV02",
            "status": "COMPLETED",
            "latestPerformance": 0.9000122,
            "recentPerformances": [
                0.8674287,
                0.89866984,
                0.89997154,
                0.89997154,
                0.9000122
            ]
        },
        {
            "trainingJobKey": "54af946f-1f89-4560-99ef-f4934bfab237",
            "algorithm": "BSEV01",
            "status": "NOT_COMPLETED",
            "latestPerformance": 0.6260017,
            "recentPerformances": [
                0.6260017,
                0.625961,
                0.6260017,
                0.625961,
                0.6260017
            ]
        },
        {
            "trainingJobKey": "c46820a8-f5fc-49ab-aa34-bdf71186eff1",
            "algorithm": "BFIF01",
            "status": "NOT_COMPLETED",
            "latestPerformance": 0.7718656,
            "recentPerformances": [
                0.7715808,
                0.7716215,
                0.77174354,
                0.77178425,
                0.7718656
            ]
        },
        {
            "trainingJobKey": "ea4bd529-e886-4167-b800-8a97a147396b",
            "algorithm": "BSIX01",
            "status": "NOT_COMPLETED",
            "latestPerformance": 0.87388635,
            "recentPerformances": [
                0.87388635,
                0.87388635,
                0.87388635,
                0.87388635,
                0.87388635
            ]
        }
    ]
  }

The winner is BSEV02 with a latest performance of 0.9000122; the training detail shows the created model and further training performance details:
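
Fetching the training detail might look like the following sketch, continuing from the earlier snippets; the /training/{trainingKey} path and the placement of the modelKey field are assumptions:

  # Fetch the training detail to obtain the created model (hypothetical endpoint)
  detail = requests.get(f"{BASE_URL}/training/{training_key}", headers=headers).json()
  model_key = detail["modelKey"]  # e.g. 8a86bd31-ff0c-47c1-bdb8-d08331904508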

The model with modelKey=8a86bd31-ff0c-47c1-bdb8-d08331904508 can then be used to run predictions on unseen data.

Run Predictions

After the desired champion model is trained, it’s time to run predictions with it. Before running a prediction, we should check that the returned modelKey is there with the desired performance:
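
A check along these lines could work; the /models/{modelKey} path and the performance field are assumptions:

  # Verify the champion model exists and meets the desired performance (endpoint assumed)
  model = requests.get(f"{BASE_URL}/models/{model_key}", headers=headers).json()
  assert model["performance"] >= 0.90  # field name is an assumption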

Then, create a prediction using this model. Remember that, as of now, the blind CSV file to run the prediction on must not exceed 32 MB in size:
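
A sketch of creating the prediction; the /models/{modelKey}/predictions path, the multipart field, and the returned polling URL are assumptions:

  import os

  # The blind CSV must not exceed 32 MB
  assert os.path.getsize("blind.csv") <= 32 * 1024 * 1024

  # Create a prediction with the champion model (hypothetical endpoint)
  with open("blind.csv", "rb") as f:
      resp = requests.post(f"{BASE_URL}/models/{model_key}/predictions",
                           headers=headers, files={"file": f})
  resp.raise_for_status()
  prediction_url = resp.json()["url"]  # assumed: the response carries a polling URL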

Poll the URL for progress on the prediction result:
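
For example, assuming the polling URL returned above and a status field shaped like the training responses:

  import time

  # Poll the returned URL until the prediction reaches COMPLETED
  while True:
      result = requests.get(prediction_url, headers=headers).json()
      if result["status"] == "COMPLETED":
          break
      time.sleep(10)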

Once the status moves to COMPLETED, the result CSV can be downloaded:
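
Downloading could then look like this; the resultUrl field in the final polling response is an assumption:

  # Download the result CSV (the download URL field is an assumption)
  csv_resp = requests.get(result["resultUrl"], headers=headers)
  csv_resp.raise_for_status()
  with open("predictions.csv", "wb") as f:
      f.write(csv_resp.content)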

Hyperparameter Tuning

Hyperparameter tuning is done by combining the above endpoints to incrementally obtain better performance by changing the hyperparameters:

  • Create new versions of the dataset with an increased or decreased number of buckets
  • Re-train using the training endpoints, monitoring progress, with a higher or lower scaling factor until the result is COMPLETED successfully
  • Gradually increase or reduce the performance threshold until no more improvements are visible while still obtaining COMPLETED results (see the loop sketch below)
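
A rough tuning loop tying these steps together; the grid values and the train_and_evaluate helper (wrapping the dataset, training, and polling calls sketched above) are hypothetical:

  best_performance = 0.0

  for buckets in (5, 10, 20):        # assumed candidate bucket counts
      for scaling in (10, 19, 30):   # assumed candidate scaling factors
          # Recreate the dataset with this number of buckets, retrain with this
          # scaling factor, and return the winning job's performance
          # (or None if no job reached COMPLETED)
          perf = train_and_evaluate(buckets, scaling)  # hypothetical helper
          if perf is not None and perf > best_performance:
              best_performance = perf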