
maestro

coming: when it's ready...

maestro is a tool designed to streamline and accelerate the fine-tuning process for multimodal models. It provides ready-to-use recipes for fine-tuning popular vision-language models (VLMs) such as Florence-2, PaliGemma, and Qwen2-VL on downstream vision-language tasks.

install

Install the maestro package with pip in a Python>=3.8 environment.

pip install maestro
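Before or after installing, it can help to confirm that the environment meets the stated requirement. The following is a minimal sketch using only the standard library; the helper names python_is_supported and package_is_installed are hypothetical and not part of maestro.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the install note above


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def package_is_installed(name):
    """Return True when a top-level package can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("maestro installed:", package_is_installed("maestro"))
```

If the second line prints False after installation, pip most likely installed into a different interpreter than the one running the script.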

quickstart

CLI

VLMs can be fine-tuned on downstream tasks directly from the command line with the maestro command:

maestro florence2 train --dataset='<DATASET_PATH>' --epochs=10 --batch-size=8
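If the command needs to be launched from a Python script (for example, from a job scheduler), one option is to build the argument list and pass it to subprocess. This is a sketch under stated assumptions: build_train_command and run_training are hypothetical helpers, and only the subcommand and flags shown in the command above are used.

```python
import shlex
import subprocess


def build_train_command(dataset, epochs=10, batch_size=8):
    """Build the argv list for the florence2 training command shown above."""
    return [
        "maestro",
        "florence2",
        "train",
        f"--dataset={dataset}",
        f"--epochs={epochs}",
        f"--batch-size={batch_size}",
    ]


def run_training(dataset, **kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    command = build_train_command(dataset, **kwargs)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_training(...) to actually execute it.
    print(shlex.join(build_train_command("<DATASET_PATH>")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the dataset path contains spaces.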

SDK

Alternatively, you can fine-tune VLMs using the Python SDK, which accepts the same arguments as the CLI example above. The example here also passes a list of metrics:

from maestro.trainer.common import MeanAveragePrecisionMetric
from maestro.trainer.models.florence_2 import train, Configuration

# dataset, epochs, and batch_size mirror the CLI flags shown earlier
config = Configuration(
    dataset='<DATASET_PATH>',
    epochs=10,
    batch_size=8,
    metrics=[MeanAveragePrecisionMetric()],
)

# Start fine-tuning with this configuration
train(config)