Intro

With extrai, you can use LLMs to extract data from text documents. The extracted data is formatted into a SQLModel you provide and stored in your database.

The core of the library is its Consensus Mechanism. The same extraction request is sent several times, to the same or different providers, and only the values that reach a given agreement threshold across those responses are kept.
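
The threshold idea can be made concrete with a minimal, stdlib-only sketch of field-level voting over several JSON outputs. This is not extrai's implementation. The function name `field_consensus`, the flat-dictionary input, and the rule "keep the most common value when the share of outputs agreeing on it is at least `threshold`" are all assumptions made for illustration.

```python
import json
from collections import Counter
from typing import Any


def field_consensus(
    outputs: list[dict[str, Any]], threshold: float = 0.5
) -> dict[str, Any]:
    """Hypothetical helper (not part of extrai): per-field agreement voting.

    For each field, the most common value is kept only if the fraction of
    outputs that produced it is >= threshold. Outputs that omit a field count
    as non-votes, so they lower that field's agreement ratio.
    """
    n = len(outputs)
    if n == 0:
        return {}
    keys = {key for out in outputs for key in out}
    result: dict[str, Any] = {}
    for key in sorted(keys):
        # Serialize with sorted keys so nested dicts/lists become hashable and
        # equal values compare equal (and True is not conflated with 1).
        votes = Counter(
            json.dumps(out[key], sort_keys=True) for out in outputs if key in out
        )
        winner, count = votes.most_common(1)[0]
        if count / n >= threshold:
            result[key] = json.loads(winner)
    return result


# Three hypothetical runs of the same extraction request.
runs = [
    {"name": "Acme Corp", "founded": 1999, "city": "Paris"},
    {"name": "Acme Corp", "founded": 1998, "city": "Lyon"},
    {"name": "Acme Corp", "founded": 1999, "city": "Nice"},
]
print(field_consensus(runs, threshold=0.6))  # {'founded': 1999, 'name': 'Acme Corp'}
```

Here `name` agrees in 3 of 3 runs and `founded` in 2 of 3, so both pass a 0.6 threshold, while `city` never gets more than 1 of 3 votes and is dropped. Raising the threshold trades coverage for confidence.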

extrai also offers other features, such as generating SQLModels from a prompt and a set of documents, and generating few-shot examples. For complex, nested data, Hierarchical Extraction breaks the extraction down into smaller, manageable steps that follow the structure of the data. Built-in analytics let you monitor performance and output quality.

Workflow Overview

The library is built around a few key components that work together to manage the extraction workflow. The following diagram illustrates the high-level workflow (see Architecture Overview for more details):

graph TD
    A[Unstructured Text] --> B(WorkflowOrchestrator);
    C[SQLModel Definition] --> B;
    B --> D{LLM Client};
    D --> E[Multiple JSON Outputs];
    E --> F(SQLAlchemyHydrator);
    F --> G(JSONConsensus);
    G --> H[Consolidated JSON];
    H --> I(SQLAlchemyHydrator);
    I --> J[Structured Data in DB];
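
The last steps of the diagram turn consolidated JSON into database rows. In extrai this is handled by `SQLAlchemyHydrator` working with your SQLModel classes. The sketch below swaps in the standard-library `sqlite3` module to show the general idea: one nested JSON object becomes a parent row plus child rows linked by a foreign key. The `hydrate` function, the `company`/`employee` tables, and their columns are hypothetical and do not reflect extrai's actual hydrator API.

```python
import sqlite3
from typing import Any


def hydrate(conn: sqlite3.Connection, consolidated: dict[str, Any]) -> int:
    """Hypothetical stand-in for the hydration step (not extrai's API).

    Inserts a consolidated company record, then one row per nested employee
    object, each pointing back to its parent via company_id.
    """
    cur = conn.execute(
        "INSERT INTO company (name, founded) VALUES (?, ?)",
        (consolidated["name"], consolidated.get("founded")),
    )
    company_id = cur.lastrowid
    # Nested child objects become rows that reference the parent row.
    conn.executemany(
        "INSERT INTO employee (company_id, name, role) VALUES (?, ?, ?)",
        [(company_id, e["name"], e.get("role")) for e in consolidated.get("employees", [])],
    )
    conn.commit()
    return company_id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL, founded INTEGER);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL REFERENCES company(id),
        name TEXT NOT NULL,
        role TEXT
    );
    """
)

consolidated = {
    "name": "Acme Corp",
    "founded": 1999,
    "employees": [{"name": "Ada", "role": "CTO"}, {"name": "Bo"}],
}
company_id = hydrate(conn, consolidated)
rows = conn.execute(
    "SELECT name, role FROM employee WHERE company_id = ? ORDER BY id", (company_id,)
).fetchall()
print(rows)  # [('Ada', 'CTO'), ('Bo', None)]
```

Missing optional fields (here Bo's `role`) simply become `NULL`, which mirrors how a consensus step that drops low-agreement values can leave gaps in the final record.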

Key Features