New Open-Source Tool 'Lift' Extracts Structured JSON from PDFs with 90% Accuracy, Outperforming Azure and NuExtract3

Jun 19, 2026

GitHub

Summary

Lift, a new open-source tool released on GitHub, extracts structured JSON from PDFs and images with 90.2% field accuracy across 225 benchmark documents, outperforming Azure Content Understanding and NuExtract3. A managed API version raises accuracy to 95.9%.

Key Points

A new open-source tool called 'lift' is now available on GitHub, enabling fast and accurate extraction of structured JSON data from PDFs and images using a 9B vision model with schema-constrained decoding.
Benchmark tests across 225 documents show lift achieving 90.2% field accuracy, outperforming competitors like Azure Content Understanding and NuExtract3, while a managed Datalab API version reaches 95.9% accuracy with added features like citations and confidence scores.
The tool installs via pip, offers both a CLI and a Python API, includes a Schema Studio app for building and testing schemas, and provides a vLLM server option for production and batch-processing deployments.
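
To make the "field accuracy" figure above concrete, here is a minimal sketch of how such a metric could be computed. It assumes exact-match scoring over flattened leaf fields; the article does not say how the benchmark actually scores fields, and this code does not use lift's API. The function names (flatten, field_accuracy) and the invoice record are hypothetical, chosen only for illustration.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into leaf fields keyed by dotted path."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(flatten(item, f"{path}[{i}]"))
                else:
                    flat[f"{path}[{i}]"] = item
        else:
            flat[path] = value
    return flat


def field_accuracy(expected: dict[str, Any], predicted: dict[str, Any]) -> float:
    """Share of expected leaf fields whose predicted value matches exactly.

    Assumption: exact equality per field. A real benchmark may normalize
    values or use fuzzy matching instead.
    """
    exp = flatten(expected)
    pred = flatten(predicted)
    if not exp:
        return 1.0
    correct = sum(
        1 for path, value in exp.items() if path in pred and pred[path] == value
    )
    return correct / len(exp)


# Hypothetical ground truth and extraction result for one document.
expected = {
    "invoice_number": "INV-001",
    "total": 120.5,
    "vendor": {"name": "Acme", "country": "US"},
    "line_items": [{"sku": "A1", "qty": 2}],
}
predicted = {
    "invoice_number": "INV-001",
    "total": 120.5,
    "vendor": {"name": "Acme", "country": "CA"},  # one wrong field
    "line_items": [{"sku": "A1", "qty": 2}],
}

score = field_accuracy(expected, predicted)
print(f"{score:.3f}")  # 5 of 6 leaf fields match -> 0.833
```

Under this definition, a reported 90.2% would mean roughly nine in ten individual fields were extracted correctly, which is a different claim from nine in ten documents being fully correct.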
