JSONL
Overview¶
JSONL (JSON Lines) is a simple, text-based format that makes it easy to work with multimodal data. Each line in a JSONL file is a valid JSON object. Each JSON object must contain the following keys:

- `image`: A string specifying the image file name associated with the dataset item.
- `prefix`: A string representing the prompt that will be sent to the model.
- `suffix`: A string representing the expected model response.
Warning

`suffix` can be as simple as a single number or string, a full paragraph, or even a structured JSON output. Regardless of its content, ensure that it is properly serialized to conform with JSON value requirements.
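One way to satisfy this serialization requirement is to build each line with `json.dumps`, which handles escaping automatically. A minimal sketch (the field values are illustrative, not from a real dataset):

```python
import json

# A structured suffix must be serialized to a JSON *string* before
# being embedded as a value in the JSONL line.
record = {
    "image": "image1.jpg",
    "prefix": "extract document data in JSON format",
    # json.dumps escapes the nested quotes so the line stays valid JSON.
    "suffix": json.dumps({"route": "J414-YG-624", "pallet_number": "17"}),
}

line = json.dumps(record)  # one line, ready to append to annotations.jsonl

# Round-trip check: the line parses back, and the suffix decodes cleanly.
parsed = json.loads(line)
nested = json.loads(parsed["suffix"])
```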
Tip
Use Roboflow's tools to annotate and export your multimodal datasets in JSONL format, streamlining data preparation for model training.
Dataset Structure¶
Divide your dataset into three subdirectories: `train`, `valid`, and `test`. Each subdirectory should contain its own `annotations.jsonl` file that holds the annotations for that particular split, along with the corresponding image files. Below is an example of the directory structure:
dataset/
├── train/
│ ├── annotations.jsonl
│ ├── image1.jpg
│ ├── image2.jpg
│ └── ... (other image files)
├── valid/
│ ├── annotations.jsonl
│ ├── image1.jpg
│ ├── image2.jpg
│ └── ... (other image files)
└── test/
├── annotations.jsonl
├── image1.jpg
├── image2.jpg
└── ... (other image files)
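A quick way to catch malformed annotations before training is to walk each split and check every line. A sketch under the layout above; the helper name and sample records are ours, and the snippet builds a throwaway dataset only to exercise the check:

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"image", "prefix", "suffix"}

def validate_split(split_dir: Path) -> int:
    """Check every line of a split's annotations.jsonl; return the record count."""
    count = 0
    with open(split_dir / "annotations.jsonl", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)  # raises if the line is not valid JSON
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {missing}")
            if not (split_dir / record["image"]).exists():
                raise FileNotFoundError(f"line {line_no}: {record['image']} not found")
            count += 1
    return count

# Build a tiny temporary dataset with the expected layout, then validate it.
root = Path(tempfile.mkdtemp())
for split in ("train", "valid", "test"):
    d = root / split
    d.mkdir()
    (d / "image1.jpg").touch()
    (d / "annotations.jsonl").write_text(
        json.dumps({"image": "image1.jpg", "prefix": "describe", "suffix": "a cat"}) + "\n",
        encoding="utf-8",
    )

counts = {split: validate_split(root / split) for split in ("train", "valid", "test")}
```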
JSONL Examples¶
JSONL is a versatile format that can represent datasets for a wide range of visual-language tasks. Its flexible structure supports multiple annotation styles, making it an ideal choice for integrating diverse data sources.
Optical Character Recognition (OCR)¶
OCR extracts textual content from images, converting printed or handwritten text into machine-readable data.
{"image":"image1.jpg","prefix":"read equation in LATEX","suffix":"H = \\dot { x } _ { i } \\Pi _ { x ^ { i } } + \\Pi _ { x ^ { i } } \\dot { x } _ { i } ^ { * } + \\dot { \\psi } _ { i } \\Pi _ { \\psi _ { i } } - \\Pi _ { \\psi _ { i } ^ { * } } \\dot { \\psi } _ { i } ^ { * } +"}
{"image":"image2.jpg","prefix":"read equation in LATEX","suffix":"\\psi _ { j } ( C _ { r } ^ { \\vee } , t ) = \\frac { 4 \\sinh 2 j t ( \\cosh ( 2 w _ { 1 } t ) \\cosh ( 2 w _ { 2 } t ) - \\cos ^ { 2 } ( x t ) ) } { \\sinh 2 t \\cosh h t } ."}
{"image":"image3.jpg","prefix":"read equation in LATEX","suffix":"- \\frac { h ^ { 2 } } { 2 \\lambda } \\int d t d ^ { 2 } x d ^ { 2 } x ^ { \\prime } ( { \\tilde { J } } _ { k } - \\frac { J _ { k } ^ { 0 } } { \\rho _ { 0 } } { \\tilde { J } } _ { 0 } ) ( t , x ) \\Delta ^ { - 1 } ( x - x ^ { \\prime } ) ( { \\tilde { J } } _ { k } - \\frac { J _ { k } ^ { 0 } } { \\rho _ { 0 } } { \\tilde { J } } _ { 0 } ) ( t , x ^ { \\prime } ) ."}
Visual Question Answering (VQA)¶
VQA tasks require models to answer natural language questions based on the visual content of an image.
{"image":"image1.jpg","prefix":"What is the ratio of yes to no?","suffix":"1.54"}
{"image":"image2.jpg","prefix":"What was the leading men's magazine in the UK from April 2019 to March 2020?","suffix":"GQ"}
{"image":"image3.jpg","prefix":"Which country has the greatest increase from 1975 to 1980?","suffix":"Taiwan"}
JSON Data Extraction¶
This task involves identifying and extracting structured information formatted as JSON from images or documents, facilitating seamless integration into data pipelines.
{"image":"image1.jpg","prefix":"extract document data in JSON format","suffix":"{\"route\": \"J414-YG-624\",\"pallet_number\": \"17\",\"delivery_date\": \"9/18/2024\",\"load\": \"1\",\"dock\": \"D08\",\"shipment_id\": \"P18941494362\",\"destination\": \"595 Navarro Radial Suite 559, Port Erika, HI 29655\",\"asn_number\": \"4690787672\",\"salesman\": \"CAROL FREDERICK\",\"products\": [{\"description\": \"159753 - BOX OF PAPER CUPS\",\"cases\": \"32\",\"sales_units\": \"8\",\"layers\": \"2\"},{\"description\": \"583947 - BOX OF CLOTH RAGS\",\"cases\": \"8\",\"sales_units\": \"2\",\"layers\": \"5\"},{\"description\": \"357951 - 6PK OF HAND SANITIZER\",\"cases\": \"2\",\"sales_units\": \"32\",\"layers\": \"4\"},{\"description\": \"847295 - CASE OF DISPOSABLE CAPS\",\"cases\": \"16\",\"sales_units\": \"4\",\"layers\": \"3\"}],\"total_cases\": \"58\",\"total_units\": \"46\",\"total_layers\": \"14\",\"printed_date\": \"12/05/2024 10:14\",\"page_number\": \"60\"}"}
{"image":"image2.jpg","prefix":"extract document data in JSON format","suffix":"{\"route\": \"V183-RZ-924\",\"pallet_number\": \"14\",\"delivery_date\": \"5/3/2024\",\"load\": \"4\",\"dock\": \"D20\",\"shipment_id\": \"P29812736099\",\"destination\": \"706 Meghan Brooks, Amyberg, IA 67863\",\"asn_number\": \"2211190904\",\"salesman\": \"RYAN GREEN\",\"products\": [{\"description\": \"293847 - ROLL OF METAL WIRE\",\"cases\": \"16\",\"sales_units\": \"8\",\"layers\": \"4\"},{\"description\": \"958273 - CASE OF SPRAY MOPS\",\"cases\": \"16\",\"sales_units\": \"8\",\"layers\": \"3\"},{\"description\": \"258963 - CASE OF MULTI-SURFACE SPRAY\",\"cases\": \"2\",\"sales_units\": \"4\",\"layers\": \"2\"}],\"total_cases\": \"34\",\"total_units\": \"20\",\"total_layers\": \"9\",\"printed_date\": \"12/05/2024 10:14\",\"page_number\": \"91\"}"}
{"image":"image3.jpg","prefix":"extract document data in JSON format","suffix":"{\"route\": \"A702-SG-978\",\"pallet_number\": \"19\",\"delivery_date\": \"4/7/2024\",\"load\": \"5\",\"dock\": \"D30\",\"shipment_id\": \"Y69465838537\",\"destination\": \"31976 French Wall, East Kimport, NY 87074\",\"asn_number\": \"4432967070\",\"salesman\": \"PATRICIA ROSS\",\"products\": [{\"description\": \"384756 - CASE OF BUCKET LIDS\",\"cases\": \"32\",\"sales_units\": \"4\",\"layers\": \"3\"},{\"description\": \"384756 - CASE OF BUCKET LIDS\",\"cases\": \"8\",\"sales_units\": \"32\",\"layers\": \"4\"},{\"description\": \"958273 - CASE OF SPRAY MOPS\",\"cases\": \"32\",\"sales_units\": \"2\",\"layers\": \"5\"},{\"description\": \"345678 - BOX OF DISPOSABLE GLOVES\",\"cases\": \"64\",\"sales_units\": \"16\",\"layers\": \"3\"}],\"total_cases\": \"136\",\"total_units\": \"54\",\"total_layers\": \"15\",\"printed_date\": \"11/29/2024 17:03\",\"page_number\": \"28\"}"}
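Note that the `suffix` in these examples is itself a JSON string, so consuming it takes two decoding steps: one for the JSONL line and one for the nested document. A minimal sketch using a truncated version of the first example line:

```python
import json

# One JSONL line, shortened to two fields of the nested document for brevity.
line = (
    '{"image":"image1.jpg","prefix":"extract document data in JSON format",'
    '"suffix":"{\\"route\\": \\"J414-YG-624\\",\\"pallet_number\\": \\"17\\"}"}'
)

record = json.loads(line)                 # decode the JSONL line itself
document = json.loads(record["suffix"])   # decode the nested JSON payload
```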
Object Detection¶
This task involves detecting and localizing multiple objects within an image by drawing bounding boxes around them. Each Vision-Language Model (VLM) may require a different text representation of these bounding boxes to interpret the spatial data correctly. The annotations below are compatible with PaliGemma and PaliGemma 2.
Tip
We are rolling out support for COCO and YOLO formats soon, and will handle conversion between bounding box representations and the format required by each supported VLM.
{"image":"image1.jpg","prefix":"detect figure ; table ; text","suffix":"<loc0412><loc0102><loc0734><loc0920> figure ; <loc0744><loc0102><loc0861><loc0920> text ; <loc0246><loc0102><loc0404><loc0920> text ; <loc0085><loc0102><loc0244><loc0920> text"}
{"image":"image2.jpg","prefix":"detect figure ; table ; text","suffix":"<loc0516><loc0114><loc0945><loc0502> text ; <loc0084><loc0116><loc0497><loc0906> figure ; <loc0517><loc0518><loc0945><loc0907> text"}
{"image":"image3.jpg","prefix":"detect figure ; table ; text","suffix":"<loc0784><loc0174><loc0936><loc0848> text ; <loc0538><loc0174><loc0679><loc0848> table ; <loc0280><loc0177><loc0533><loc0847> figure ; <loc0068><loc0174><loc0278><loc0848> figure ; <loc0686><loc0174><loc0775><loc0848> text"}
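A minimal sketch of decoding such a suffix back into pixel boxes, assuming the PaliGemma convention that each `<locDDDD>` token carries a coordinate normalized to 0–1023, in the order y_min, x_min, y_max, x_max (verify against the PaliGemma documentation for your model version):

```python
import re

# Four <locDDDD> tokens followed by a label; detections are separated by ";".
DETECTION = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;]+)")

def parse_suffix(suffix: str, width: int, height: int):
    """Convert a detection suffix into (label, x_min, y_min, x_max, y_max) pixel boxes."""
    boxes = []
    for y1, x1, y2, x2, label in DETECTION.findall(suffix):
        boxes.append((
            label.strip(),
            int(x1) / 1024 * width,   # scale normalized coords to pixels
            int(y1) / 1024 * height,
            int(x2) / 1024 * width,
            int(y2) / 1024 * height,
        ))
    return boxes

boxes = parse_suffix(
    "<loc0412><loc0102><loc0734><loc0920> figure ; <loc0744><loc0102><loc0861><loc0920> text",
    width=1024,
    height=1024,
)
```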