Document To Image


Last updated 3 months ago

Overview

The Document to Image Node converts a document (PDF, DOCX, Excel, etc.) into images, so developers do not have to handle the conversion themselves before passing the content to LLMs for extraction.

This node is critical in workflows where:

✅ Documents need to be summarized before being processed.
✅ Text extraction accuracy needs improvement (reducing formatting errors).
✅ OCR and AI-driven tools require images for better text recognition.
✅ Structured data from Excel sheets needs to be extracted accurately (e.g., financial tables).

Rather than processing a raw document directly, the workflow converts it into images first. This gives AI models a cleaner input and is intended to improve extraction accuracy when the images are passed to Prompt Nodes for further analysis.

Watch the "Build Your First Agent" video to see how this node is used in real-world document processing workflows.

Configurations

| Field | Description |
| --- | --- |
| Document Id | The unique identifier of the document that needs to be converted into images. This documentId is generated by the Upload Document Node. |

Execution Flow

1️⃣ Receives a documentId as input (from an Upload Document Node).
2️⃣ Converts the document into images (one per page for PDFs/DOCX, one per sheet for Excel).
3️⃣ Returns image metadata, including URLs, page numbers, and sheet names (if applicable).
4️⃣ The resulting image URLs can be passed to an LLM for data extraction (via a Prompt Node).

Output Format

The node returns a structured JSON output containing image metadata linked to the original document:

{
  "imagesResult": [
    {
      "images": [
        {
          "key": "/executions/192012/image1",
          "documentId": "985b9706-c3e0-48b3-b6f5-2cb873004e41",
          "imageUrl": "https://storage.googleapis.com/example/image1.jpg"
        }
      ],
      "pageNumber": 1
    },
    {
      "images": [
        {
          "key": "/executions/192012/image2",
          "documentId": "901bd6ce-0ee5-48bb-b4fc-5351c7a9d925",
          "imageUrl": "https://storage.googleapis.com/example/image2.jpg"
        }
      ],
      "pageNumber": 2
    }
  ]
}

  • documentId → The original document’s reference ID.
  • imageUrl → The generated image’s location (can be used for further processing).
  • pageNumber → Page number for multi-page documents (PDF/DOCX); the sample above starts at 1.
  • sheetName (Excel only) → Indicates which sheet the image corresponds to.
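
As an illustration of how a downstream step might consume this output, the following Python sketch flattens imagesResult into (pageNumber, imageUrl) pairs in page order. It is a minimal sketch, not platform code: the function name list_page_images is hypothetical, and the sample data is the JSON shown under Output Format.

```python
import json

# Sample output copied from the Output Format section.
SAMPLE_OUTPUT = json.loads("""
{
  "imagesResult": [
    {
      "images": [
        {
          "key": "/executions/192012/image1",
          "documentId": "985b9706-c3e0-48b3-b6f5-2cb873004e41",
          "imageUrl": "https://storage.googleapis.com/example/image1.jpg"
        }
      ],
      "pageNumber": 1
    },
    {
      "images": [
        {
          "key": "/executions/192012/image2",
          "documentId": "901bd6ce-0ee5-48bb-b4fc-5351c7a9d925",
          "imageUrl": "https://storage.googleapis.com/example/image2.jpg"
        }
      ],
      "pageNumber": 2
    }
  ]
}
""")


def list_page_images(node_output):
    """Return (page_number, image_url) pairs sorted by page number.

    Hypothetical helper: entries without a pageNumber are placed last.
    """
    pairs = []
    for entry in node_output.get("imagesResult", []):
        page_number = entry.get("pageNumber")
        for image in entry.get("images", []):
            pairs.append((page_number, image["imageUrl"]))
    # Sort by page number, keeping entries with no page number at the end.
    return sorted(pairs, key=lambda pair: (pair[0] is None, pair[0] or 0))


if __name__ == "__main__":
    for page_number, image_url in list_page_images(SAMPLE_OUTPUT):
        print(page_number, image_url)
```

Keeping the page number next to each URL lets later steps report which page a value was extracted from.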


Example Use-Cases

Use-Case 1: Extracting Data from a Loan Agreement PDF

A loan processing workflow needs to extract borrower details and loan terms from a PDF document. Instead of directly processing the PDF, the document is converted to images for better OCR and AI-driven text extraction.

Configuration:

| Field | Value |
| --- | --- |
| Document Id | fa5d0517-a479-49a5-b06e-9ed599f8e57a |

Execution Process:

1️⃣ User uploads a PDF (Loan Agreement).
2️⃣ Document to Image Node converts each page into separate images.
3️⃣ The image URLs are passed to the Prompt Node, where an LLM extracts borrower details, interest rates, and loan conditions.

🔹 Why use this approach?

✔ Improves OCR accuracy (eliminates PDF formatting inconsistencies).
✔ Prepares structured image data for AI-based text extraction.
✔ Works with multi-page documents seamlessly.
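
The hand-off from the Document to Image Node to the Prompt Node in this use case could be prepared along the following lines. This is a minimal sketch, assuming the node output has the shape shown under Output Format. The function build_extraction_requests, the keys of the request dictionaries, and the instruction wording are all hypothetical; how the Prompt Node actually accepts image URLs is not specified on this page.

```python
# Illustrative node output for a two-page loan agreement.
# The documentId comes from the configuration above; the URLs are placeholders.
LOAN_OUTPUT = {
    "imagesResult": [
        {
            "images": [
                {
                    "documentId": "fa5d0517-a479-49a5-b06e-9ed599f8e57a",
                    "imageUrl": "https://storage.googleapis.com/example/image1.jpg",
                }
            ],
            "pageNumber": 1,
        },
        {
            "images": [
                {
                    "documentId": "fa5d0517-a479-49a5-b06e-9ed599f8e57a",
                    "imageUrl": "https://storage.googleapis.com/example/image2.jpg",
                }
            ],
            "pageNumber": 2,
        },
    ]
}


def build_extraction_requests(node_output, fields):
    """Pair every page image with one extraction instruction.

    Hypothetical helper: the returned dictionary layout is an assumption,
    not the Prompt Node's real input format.
    """
    instruction = "Extract the following fields from this page: " + ", ".join(fields)
    extraction_requests = []
    for entry in node_output.get("imagesResult", []):
        for image in entry.get("images", []):
            extraction_requests.append(
                {
                    "pageNumber": entry.get("pageNumber"),
                    "imageUrl": image["imageUrl"],
                    "instruction": instruction,
                }
            )
    return extraction_requests


if __name__ == "__main__":
    wanted = ["borrower name", "interest rate", "loan conditions"]
    for request in build_extraction_requests(LOAN_OUTPUT, wanted):
        print(request)
```

Producing one request per page keeps each LLM call small; whether to send pages individually or together is a design choice this page does not prescribe.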


Use-Case 2: Processing Financial Spreadsheets for AI Extraction

A workflow extracts financial summaries from an Excel sheet, ensuring accurate numeric extraction (e.g., revenue, expenses, and net profit values).

Configuration:

| Field | Value |
| --- | --- |
| Document Id | fc9d0517-b479-49a5-b06e-8ed599a8c123 |

Execution Process:

1️⃣ User uploads an Excel file (Balance Sheet).
2️⃣ Document to Image Node converts each sheet into an image.
3️⃣ The images are processed through an LLM, extracting key financial data.

🔹 Why use this approach?

✔ Preserves numeric formatting (avoids misinterpretation of decimal points).
✔ Prepares structured tables for AI analysis.
✔ Enhances accuracy for finance-driven workflows.
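
For Excel inputs, a later step may want the images grouped by sheet. The sketch below shows one way to do that in Python. It rests on an assumption that this page does not confirm: that sheetName appears on each imagesResult entry, in the same position as pageNumber in the PDF sample. The function images_by_sheet and the sheet names "Assets" and "Liabilities" are hypothetical.

```python
from collections import defaultdict

# Illustrative output for a two-sheet workbook.
# ASSUMPTION: sheetName sits on each entry; the real placement may differ.
BALANCE_SHEET_OUTPUT = {
    "imagesResult": [
        {
            "images": [
                {"imageUrl": "https://storage.googleapis.com/example/image1.jpg"}
            ],
            "sheetName": "Assets",
        },
        {
            "images": [
                {"imageUrl": "https://storage.googleapis.com/example/image2.jpg"}
            ],
            "sheetName": "Liabilities",
        },
    ]
}


def images_by_sheet(node_output):
    """Group image URLs by sheet name.

    Hypothetical helper: entries without a sheetName fall back to a
    "page-<n>" label built from pageNumber.
    """
    grouped = defaultdict(list)
    for entry in node_output.get("imagesResult", []):
        label = entry.get("sheetName") or f"page-{entry.get('pageNumber')}"
        for image in entry.get("images", []):
            grouped[label].append(image["imageUrl"])
    return dict(grouped)


if __name__ == "__main__":
    for sheet, urls in images_by_sheet(BALANCE_SHEET_OUTPUT).items():
        print(sheet, urls)
```

Grouping by sheet makes it possible to send only the relevant sheet to the LLM when a prompt targets one table.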


Use-Case 3: Automating Identity Verification from Scanned Documents

A workflow automates KYC (Know Your Customer) verification by extracting text from scanned documents.

Configuration:

| Field | Value |
| --- | --- |
| Document Id | ff5d0517-d179-42a5-a16e-3ed599f8e77b |

Execution Process:

1️⃣ User uploads a scanned image of an ID (PDF format).
2️⃣ Document to Image Node extracts individual pages into images.
3️⃣ The image URLs are sent to an AI model, which verifies identity details.

🔹 Why use this approach?

✔ Ensures compatibility with OCR-driven KYC tools.
✔ Allows for multi-step validation (Face Match, ID Verification, etc.).


Key Takeaways for Developers

✅ Abstracts Document-to-Image Conversion – Developers don’t need to manually process PDFs, Excel sheets, or DOCX files. The node automates image conversion for seamless AI-based processing.

✅ Enhances LLM-Based Data Extraction – Converting documents to images improves AI accuracy, ensuring better text recognition and field extraction.

✅ Supports Multi-Format Inputs – Works with PDFs, Excel Sheets, and Scanned Documents, making it versatile across business use cases.

✅ Integrates with AI & OCR Processing – Images generated from documents can be passed to Prompt Nodes, enabling structured AI-driven data extraction.

✅ Used in Real-World AI Workflows – Watch the "Build Your First Agent" video to see how this node is applied for document-based automation.

By leveraging the Document to Image Node, developers can streamline AI-powered document processing, ensuring higher accuracy, efficiency, and seamless AI integration. 🚀
