
Web Crawler

Overview

The Web Crawler Node in UPTIQ Workbench is designed to extract relevant information from web pages in real time. Unlike traditional web scrapers, this node is optimized for AI-driven workflows, where extracted content can be processed by Large Language Models (LLMs) to generate structured insights.

This node is particularly useful for retrieving dynamic, publicly available information, such as company overviews, industry trends, or competitor insights. The extracted data can be refined, summarized, and structured to fit business needs, making it a valuable component for AI-driven research and automation.

Configurations

URL (Required)

  • The fully-qualified web address from which data should be retrieved.

  • Example: https://www.uptiq.ai/about

Instructions (Required)

  • Defines how the extracted web content should be processed.

  • Instructs AI agents on what aspects of the data to analyze and summarize.

  • Example: "You are an AI assistant tasked with summarizing company information from extracted web content. Analyze the provided data and produce a concise summary in JSON format."
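
A minimal sketch of the two fields expressed together as a plain object, shown in TypeScript. This is purely illustrative; the real configuration is set through the Workbench UI, and the field names below are assumptions rather than the platform's schema:

    // Hypothetical shape of a Web Crawler node configuration.
    // Field names are illustrative, not the actual Workbench schema.
    interface WebCrawlerConfig {
      url: string;          // fully qualified page to crawl
      instructions: string; // how the extracted content should be processed
    }

    const aboutPageCrawler: WebCrawlerConfig = {
      url: "https://www.uptiq.ai/about",
      instructions:
        "You are an AI assistant tasked with summarizing company information " +
        "from extracted web content. Produce a concise summary in JSON format.",
    };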

Output Format

  • The Web Crawler Node outputs structured data in JSON format.

  • Example output:

    {
      "Mission": "To empower businesses with AI-driven solutions for improved decision-making.",
      "Services": "Offers AI workbench for building and deploying intelligent agents.",
      "Achievements": "Recognized as a leader in low-code AI development platforms."
    }
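
Because the output is plain JSON, downstream nodes can consume it directly. Below is a minimal sketch of formatting that output for display, shown in TypeScript; the keys match the example above, and the helper function is hypothetical rather than a Workbench API:

    // Hypothetical downstream handling of the crawler's JSON output.
    // Real pages may yield different keys, so the type is deliberately loose.
    type CompanySummary = Record<string, string>;

    function formatSummary(output: CompanySummary): string {
      // Turn each aspect/summary pair into a readable line.
      return Object.entries(output)
        .map(([aspect, summary]) => `${aspect}: ${summary}`)
        .join("\n");
    }

    const crawlerOutput: CompanySummary = {
      Mission: "To empower businesses with AI-driven solutions for improved decision-making.",
      Services: "Offers AI workbench for building and deploying intelligent agents.",
      Achievements: "Recognized as a leader in low-code AI development platforms.",
    };

    console.log(formatSummary(crawlerOutput));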

Example Use-Case

1. Summarizing a Company's Information

Scenario: A user requests an overview of a company. The Web Crawler Node scrapes the company's "About" page and passes the extracted content to an LLM node, which generates a concise, structured summary.

Workflow Nodes Used in this Use-case

  1. Web Crawler - to scrape the page content.

  2. LLM - to summarize the extracted content.

  3. Display - to display the summary to the user.

Configurations:

  • URL: https://www.uptiq.ai/about

  • Instructions: You are an AI assistant tasked with summarizing company information from extracted web content. Analyze the provided data and produce a concise summary in JSON format. Each key in the JSON should represent one aspect of the company, and the corresponding value should be a brief summary of that aspect. Focus on critical details like the company's mission, services, achievements, and any other notable points.

Workflow Steps:

  1. Web Crawler Node scrapes https://www.uptiq.ai/about to extract relevant content.

  2. LLM Node processes the extracted content and generates a structured summary.

  3. Display Node presents the final output to the user.
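
The same three steps, expressed as a plain TypeScript pipeline to make the data flow explicit. The functions below are hypothetical stand-ins for the Web Crawler, LLM, and Display nodes, not Workbench APIs; inside Workbench the equivalent wiring is done visually between nodes:

    // Hypothetical stand-ins for the three workflow nodes; not Workbench APIs.
    async function crawlPage(url: string): Promise<string> {
      // Web Crawler node: fetch the raw page content.
      const response = await fetch(url);
      return response.text();
    }

    async function summarizeWithLlm(
      content: string,
      instructions: string
    ): Promise<Record<string, string>> {
      // LLM node: in the real workflow the model applies the instructions to the
      // extracted content. A canned response keeps this sketch self-contained.
      void content;
      void instructions;
      return {
        Mission: "To empower businesses with AI-driven solutions for improved decision-making.",
        Services: "Offers AI workbench for building and deploying intelligent agents.",
        Achievements: "Recognized as a leader in low-code AI development platforms.",
      };
    }

    function display(result: Record<string, string>): void {
      // Display node: present the structured summary to the user.
      console.log(JSON.stringify(result, null, 2));
    }

    async function summarizeCompany(): Promise<void> {
      const content = await crawlPage("https://www.uptiq.ai/about"); // step 1
      const summary = await summarizeWithLlm(
        content,
        "Summarize the company's mission, services, and achievements as JSON." // step 2
      );
      display(summary); // step 3
    }

    summarizeCompany();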

Final Output:

{
  "Mission": "To empower businesses with AI-driven solutions for improved decision-making.",
  "Services": "Offers AI workbench for building and deploying intelligent agents.",
  "Achievements": "Recognized as a leader in low-code AI development platforms."
}

Key Takeaways for Developers

✅ Real-Time Data Extraction – The Web Crawler Node retrieves fresh, publicly available content from websites for AI processing.

✅ Structured AI-Driven Summarization – Extracted content is refined using LLMs, ensuring concise and contextually relevant outputs.

✅ Customizable Processing Instructions – Developers can tailor how extracted data is interpreted and structured by modifying the instructions field.

✅ JSON-Formatted Output – Ensures compatibility with other workflow components for seamless data handling.

By integrating the Web Crawler Node into workflows, developers can automate web-based data retrieval and AI-powered summarization, significantly enhancing information accessibility and decision-making. 🚀

