Result Processing & Hydration

After the consensus phase produces a clean JSON object, the ResultProcessor is responsible for converting it back into Python objects and persisting them.

Hydration Strategies

The processor employs two strategies depending on the extraction mode:

  1. Direct Hydration (Structured Output)

     When use_structured_output=True (the default), the JSON structure exactly mirrors the Pydantic models derived from your SQLModels.

       • Mechanism: Recursive Pydantic instantiation (Model(**data)).
       • Pros: Fast, type-safe, handles nested objects automatically.
       • Cons: Requires the LLM to strictly follow the schema.

  2. Graph Reconstruction (Flat/Legacy)

     When working with flat JSON output (used in some legacy modes or specific prompt configurations), the processor must reconstruct the graph.

       • Mechanism: It identifies objects by their keys and reconstructs relationships manually.
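To make the second strategy concrete, here is a minimal sketch of key-based graph reconstruction. It is an illustration only, not the ResultProcessor's actual code: the flat record shape (the "type", "key", and "parent_key" fields) and the reconstruct_graph helper are assumptions, and plain dataclasses stand in for the SQLModel classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Employee:
    name: str


@dataclass
class Department:
    name: str
    employees: List[Employee] = field(default_factory=list)


def reconstruct_graph(flat_records: List[dict]) -> List[Department]:
    """Rebuild Department -> Employee nesting from flat records.

    Assumed (hypothetical) record shape: every dict has a "type" and a
    unique "key"; child records also carry a "parent_key".
    """
    departments: Dict[str, Department] = {}

    # First pass: index every parent object by its key.
    for record in flat_records:
        if record["type"] == "department":
            departments[record["key"]] = Department(name=record["name"])

    # Second pass: attach each child to the parent its key points at.
    for record in flat_records:
        if record["type"] == "employee":
            parent = departments.get(record["parent_key"])
            if parent is None:
                raise ValueError(f"unknown parent key: {record['parent_key']!r}")
            parent.employees.append(Employee(name=record["name"]))

    return list(departments.values())


# Children may appear before their parent in flat output.
flat = [
    {"type": "employee", "key": "e1", "parent_key": "d1", "name": "Ada"},
    {"type": "department", "key": "d1", "name": "Research"},
    {"type": "employee", "key": "e2", "parent_key": "d1", "name": "Grace"},
]
graph = reconstruct_graph(flat)
```

Two passes are used so that a child listed before its parent still resolves. With direct hydration this bookkeeping is unnecessary, because the nesting is already present in the JSON.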

Foreign Key Recovery

In Hierarchical Extraction, child objects (like Employees) are extracted in separate steps from their parents (Departments).

  • The Problem: The child objects generated by the LLM don’t know the real database IDs of their parents (since the parents might have just been inserted).

  • The Solution:

      1. The BatchPipeline tracks the “Parent Context” for every child batch.
      2. When hydrating a child, the ResultProcessor injects the correct parent_id based on this context.
      3. As a result, the database relationships are preserved even though parents and children were extracted separately.
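The injection step can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the ParentContext class and the hydrate_children function are hypothetical names, and dataclasses stand in for the real models. Only the idea of overwriting parent_id from tracked context comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Employee:
    name: str
    parent_id: Optional[int] = None  # foreign key; the LLM cannot know it


@dataclass
class ParentContext:
    """Hypothetical record of the persisted parent a child batch belongs to."""
    parent_id: int


def hydrate_children(raw_children: List[dict], context: ParentContext) -> List[Employee]:
    """Build Employee objects and inject the parent's real database ID."""
    employees = []
    for raw in raw_children:
        # Discard any ID the LLM may have guessed; the tracked context wins.
        fields = {k: v for k, v in raw.items() if k != "parent_id"}
        employees.append(Employee(**fields, parent_id=context.parent_id))
    return employees


# Assume the department was inserted first and the database assigned it ID 42.
context = ParentContext(parent_id=42)
llm_output = [{"name": "Ada"}, {"name": "Grace", "parent_id": 999}]
employees = hydrate_children(llm_output, context)
```

Overwriting rather than merging is a deliberate choice in this sketch: any parent_id present in the LLM output is treated as untrusted, so the second record ends up with 42 rather than 999.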