JSONL

Overview

A JSONL dataset is a simple, text-based format for working with multimodal data. Each line of a JSONL file is a valid JSON object, and each object must contain the following keys:

  • image: A string specifying the image file name associated with the dataset item.
  • prefix: A string representing the prompt that will be sent to the model.
  • suffix: A string representing the expected model response.

Warning

suffix can be as simple as a single number or string, a full paragraph, or even a structured JSON output. Whatever its content, make sure it is serialized as a valid JSON string value.
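A minimal sketch of that serialization step, using Python's standard json module. The helper name make_record is hypothetical and not part of any Roboflow tool; it assumes that a non-string suffix should be stored as its compact JSON text.

```python
import json

def make_record(image: str, prefix: str, suffix) -> str:
    """Return one JSONL line with the three required keys."""
    if not isinstance(suffix, str):
        # Numbers, lists, and dicts are turned into their JSON text first,
        # so the stored suffix is always a string (for example 1.54 -> "1.54").
        suffix = json.dumps(suffix, separators=(",", ":"))
    # json.dumps escapes quotes and backslashes and never emits a raw newline,
    # so the result is safe to write as a single line.
    return json.dumps({"image": image, "prefix": prefix, "suffix": suffix})

lines = [
    make_record("image1.jpg", "What is the ratio of yes to no?", 1.54),
    make_record("image2.jpg", "extract document data in JSON format",
                {"route": "J414-YG-624", "load": "1"}),
]
for line in lines:
    print(line)
```

Writing each returned string followed by a newline to annotations.jsonl yields one record per line.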

Tip

Use Roboflow's tools to annotate and export your multimodal datasets in JSONL format, streamlining data preparation for model training.

Dataset Structure

Divide your dataset into three subdirectories: train, valid, and test. Each subdirectory should contain its own annotations.jsonl file that holds the annotations for that particular split, along with the corresponding image files. Below is an example of the directory structure:

dataset/
├── train/
│   ├── annotations.jsonl
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ... (other image files)
├── valid/
│   ├── annotations.jsonl
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ... (other image files)
└── test/
    ├── annotations.jsonl
    ├── image1.jpg
    ├── image2.jpg
    └── ... (other image files)
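The layout above can be checked before training with a short script. This is a sketch under stated assumptions: the function names load_split and load_dataset are hypothetical, and the checks (required keys present, image file exists next to annotations.jsonl) are inferred from the description above rather than taken from any official validator.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"image", "prefix", "suffix"}

def load_split(split_dir):
    """Read annotations.jsonl from one split directory and check every record."""
    split_dir = Path(split_dir)
    records = []
    with open(split_dir / "annotations.jsonl", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, such as a trailing newline
            record = json.loads(line)
            missing = REQUIRED_KEYS - set(record)
            if missing:
                raise ValueError(
                    f"{split_dir.name} line {line_number}: missing keys {sorted(missing)}"
                )
            # The image name is resolved relative to the split directory.
            if not (split_dir / record["image"]).is_file():
                raise FileNotFoundError(
                    f"{split_dir.name} line {line_number}: {record['image']} not found"
                )
            records.append(record)
    return records

def load_dataset(root):
    """Load the train, valid, and test splits found under the dataset root."""
    return {split: load_split(Path(root) / split) for split in ("train", "valid", "test")}
```

Calling load_dataset("dataset") on the tree shown above would return a dictionary with one list of records per split.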

JSONL Examples

JSONL is a versatile format that can represent datasets for a wide range of visual-language tasks. Its flexible structure supports multiple annotation styles, making it an ideal choice for integrating diverse data sources.

Optical Character Recognition (OCR)

OCR extracts textual content from images, converting printed or handwritten text into machine-readable data.

{"image":"image1.jpg","prefix":"read equation in LATEX","suffix":"H = \\dot { x } _ { i } \\Pi _ { x ^ { i } } + \\Pi _ { x ^ { i } } \\dot { x } _ { i } ^ { * } + \\dot { \\psi } _ { i } \\Pi _ { \\psi _ { i } } - \\Pi _ { \\psi _ { i } ^ { * } } \\dot { \\psi } _ { i } ^ { * } +"}
{"image":"image2.jpg","prefix":"read equation in LATEX","suffix":"\\psi _ { j } ( C _ { r } ^ { \\vee } , t ) = \\frac { 4 \\sinh 2 j t ( \\cosh ( 2 w _ { 1 } t ) \\cosh ( 2 w _ { 2 } t ) - \\cos ^ { 2 } ( x t ) ) } { \\sinh 2 t \\cosh h t } ."}
{"image":"image3.jpg","prefix":"read equation in LATEX","suffix":"- \\frac { h ^ { 2 } } { 2 \\lambda } \\int d t d ^ { 2 } x d ^ { 2 } x ^ { \\prime } ( { \\tilde { J } } _ { k } - \\frac { J _ { k } ^ { 0 } } { \\rho _ { 0 } } { \\tilde { J } } _ { 0 } ) ( t , x ) \\Delta ^ { - 1 } ( x - x ^ { \\prime } ) ( { \\tilde { J } } _ { k } - \\frac { J _ { k } ^ { 0 } } { \\rho _ { 0 } } { \\tilde { J } } _ { 0 } ) ( t , x ^ { \\prime } ) ."}
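Every backslash in these LaTeX suffixes appears doubled because JSON requires backslashes inside strings to be escaped. A small sketch of how that escaping arises and round-trips, assuming Python's json module is used to write the file; the image name equation.jpg and the shortened formula are illustrative only.

```python
import json

# Raw string: the LaTeX source has a single backslash before each command.
latex = r"\frac { h ^ { 2 } } { 2 \lambda }"

line = json.dumps({
    "image": "equation.jpg",  # hypothetical file name
    "prefix": "read equation in LATEX",
    "suffix": latex,
})

# The serialized line contains \\frac and \\lambda, matching the records above.
print(line)

# Parsing the line restores the single-backslash LaTeX string.
decoded = json.loads(line)["suffix"]
print(decoded)
```

Letting the serializer do the escaping avoids hand-doubling backslashes, which is easy to get wrong in long formulas.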

Visual Question Answering (VQA)

VQA tasks require models to answer natural language questions based on the visual content of an image.

{"image":"image1.jpg","prefix":"What is the ratio of yes to no?","suffix":"1.54"}
{"image":"image2.jpg","prefix":"What was the leading men's magazine in the UK from April 2019 to March 2020?","suffix":"GQ"}
{"image":"image3.jpg","prefix":"Which country has the greatest increase from 1975 to 1980?","suffix":"Taiwan"}

JSON Data Extraction

This task extracts structured information from images of documents and returns it as JSON, ready for use in downstream data pipelines. Because suffix is a string, the JSON document is stored in escaped form inside each record.

{"image":"image1.jpg","prefix":"extract document data in JSON format","suffix":"{\"route\": \"J414-YG-624\",\"pallet_number\": \"17\",\"delivery_date\": \"9/18/2024\",\"load\": \"1\",\"dock\": \"D08\",\"shipment_id\": \"P18941494362\",\"destination\": \"595 Navarro Radial Suite 559, Port Erika, HI 29655\",\"asn_number\": \"4690787672\",\"salesman\": \"CAROL FREDERICK\",\"products\": [{\"description\": \"159753 - BOX OF PAPER CUPS\",\"cases\": \"32\",\"sales_units\": \"8\",\"layers\": \"2\"},{\"description\": \"583947 - BOX OF CLOTH RAGS\",\"cases\": \"8\",\"sales_units\": \"2\",\"layers\": \"5\"},{\"description\": \"357951 - 6PK OF HAND SANITIZER\",\"cases\": \"2\",\"sales_units\": \"32\",\"layers\": \"4\"},{\"description\": \"847295 - CASE OF DISPOSABLE CAPS\",\"cases\": \"16\",\"sales_units\": \"4\",\"layers\": \"3\"}],\"total_cases\": \"58\",\"total_units\": \"46\",\"total_layers\": \"14\",\"printed_date\": \"12/05/2024 10:14\",\"page_number\": \"60\"}"}
{"image":"image2.jpg","prefix":"extract document data in JSON format","suffix":"{\"route\": \"V183-RZ-924\",\"pallet_number\": \"14\",\"delivery_date\": \"5/3/2024\",\"load\": \"4\",\"dock\": \"D20\",\"shipment_id\": \"P29812736099\",\"destination\": \"706 Meghan Brooks, Amyberg, IA 67863\",\"asn_number\": \"2211190904\",\"salesman\": \"RYAN GREEN\",\"products\": [{\"description\": \"293847 - ROLL OF METAL WIRE\",\"cases\": \"16\",\"sales_units\": \"8\",\"layers\": \"4\"},{\"description\": \"958273 - CASE OF SPRAY MOPS\",\"cases\": \"16\",\"sales_units\": \"8\",\"layers\": \"3\"},{\"description\": \"258963 - CASE OF MULTI-SURFACE SPRAY\",\"cases\": \"2\",\"sales_units\": \"4\",\"layers\": \"2\"}],\"total_cases\": \"34\",\"total_units\": \"20\",\"total_layers\": \"9\",\"printed_date\": \"12/05/2024 10:14\",\"page_number\": \"91\"}"}
{"image":"image3.jpg","prefix":"extract document data in JSON format","suffix":"{\"route\": \"A702-SG-978\",\"pallet_number\": \"19\",\"delivery_date\": \"4/7/2024\",\"load\": \"5\",\"dock\": \"D30\",\"shipment_id\": \"Y69465838537\",\"destination\": \"31976 French Wall, East Kimport, NY 87074\",\"asn_number\": \"4432967070\",\"salesman\": \"PATRICIA ROSS\",\"products\": [{\"description\": \"384756 - CASE OF BUCKET LIDS\",\"cases\": \"32\",\"sales_units\": \"4\",\"layers\": \"3\"},{\"description\": \"384756 - CASE OF BUCKET LIDS\",\"cases\": \"8\",\"sales_units\": \"32\",\"layers\": \"4\"},{\"description\": \"958273 - CASE OF SPRAY MOPS\",\"cases\": \"32\",\"sales_units\": \"2\",\"layers\": \"5\"},{\"description\": \"345678 - BOX OF DISPOSABLE GLOVES\",\"cases\": \"64\",\"sales_units\": \"16\",\"layers\": \"3\"}],\"total_cases\": \"136\",\"total_units\": \"54\",\"total_layers\": \"15\",\"printed_date\": \"11/29/2024 17:03\",\"page_number\": \"28\"}"}
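Since the suffix in these records is itself a JSON document stored as a string, reading it back takes two parsing passes. The sketch below shows this on a shortened version of the first record; only two of its fields are kept for brevity.

```python
import json

# Shortened form of the first record above (raw string keeps the \" escapes intact).
line = (
    r'{"image":"image1.jpg","prefix":"extract document data in JSON format",'
    r'"suffix":"{\"route\": \"J414-YG-624\",\"pallet_number\": \"17\"}"}'
)

record = json.loads(line)                # first pass: the JSONL record
document = json.loads(record["suffix"])  # second pass: the embedded JSON document

print(type(record["suffix"]).__name__)   # the suffix is still a plain string here
print(document["route"])
```

The same second pass can be applied to a model's predicted text in order to compare predictions and targets field by field.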

Object Detection

This task involves detecting and localizing multiple objects within an image by drawing bounding boxes around them. Each Vision-Language Model (VLM) may require a different text representation of these bounding boxes to interpret the spatial data correctly. The annotations below are compatible with PaliGemma and PaliGemma 2.

Tip

We are rolling out support for COCO and YOLO formats soon, and will handle conversion between bounding box representations and the format required by each supported VLM.

{"image":"image1.jpg","prefix":"detect figure ; table ; text","suffix":"<loc0412><loc0102><loc0734><loc0920> figure ; <loc0744><loc0102><loc0861><loc0920> text ; <loc0246><loc0102><loc0404><loc0920> text ; <loc0085><loc0102><loc0244><loc0920> text"}
{"image":"image2.jpg","prefix":"detect figure ; table ; text","suffix":"<loc0516><loc0114><loc0945><loc0502> text ; <loc0084><loc0116><loc0497><loc0906> figure ; <loc0517><loc0518><loc0945><loc0907> text"}
{"image":"image3.jpg","prefix":"detect figure ; table ; text","suffix":"<loc0784><loc0174><loc0936><loc0848> text ; <loc0538><loc0174><loc0679><loc0848> table ; <loc0280><loc0177><loc0533><loc0847> figure ; <loc0068><loc0174><loc0278><loc0848> figure ; <loc0686><loc0174><loc0775><loc0848> text"}
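To make the suffix format concrete, here is a sketch that parses such a string back into pixel coordinates. It assumes the convention PaliGemma documents for detection output: four loc tokens per box in the order y_min, x_min, y_max, x_max, each an integer on a 0 to 1023 grid normalized to the image size. The function name parse_detections is hypothetical, and the image size has to be supplied by the caller because the annotation does not store it.

```python
import re

# Four consecutive <locNNNN> tokens followed by a label that runs up to the next ';'.
BOX_PATTERN = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;]+)")

def parse_detections(suffix, image_width, image_height):
    """Convert a suffix into (label, x_min, y_min, x_max, y_max) tuples in pixels.

    Assumption: tokens are ordered y_min, x_min, y_max, x_max on a 1024-bin grid.
    """
    detections = []
    for y1, x1, y2, x2, label in BOX_PATTERN.findall(suffix):
        detections.append((
            label.strip(),
            int(x1) / 1024 * image_width,
            int(y1) / 1024 * image_height,
            int(x2) / 1024 * image_width,
            int(y2) / 1024 * image_height,
        ))
    return detections

suffix = (
    "<loc0516><loc0114><loc0945><loc0502> text ; "
    "<loc0084><loc0116><loc0497><loc0906> figure ; "
    "<loc0517><loc0518><loc0945><loc0907> text"
)
for detection in parse_detections(suffix, image_width=1024, image_height=1024):
    print(detection)
```

Going the other way, from pixel boxes to tokens, would divide by the image size, multiply by 1024, and zero-pad the integer to four digits under the same assumption.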