Extension

Scraper

Extract structured data from URLs with our service

by Dashkit

Download

Using the extension requires purchasing credits.

  • Version 0.1.0
  • Operating systems Linux & Mac (ARM)

This SQLite extension fetches web pages and extracts structured data from them. Dashkit servers do the heavy lifting, pulling HTML tables, inline JSON, meta tags, and other details out of the pages.

Example

select
  value->>'code' as code,
  value->>'country_name' as name
from scrape_list(
  'https://en.wikipedia.org/wiki/ISO_3166-1_numeric'
);
┌──────┬──────────────────────────────┐
│ code │ name                         │
├──────┼──────────────────────────────┤
│ 834  │ Tanzania, United Republic of │
│ 840  │ United States of America     │
│ 850  │ Virgin Islands (U.S.)        │
└──────┴──────────────────────────────┘

Key points

  • Requests made from secure Dashkit servers
  • Works with any SQLite program or programming language
  • Five minute cache for repeatedly fetched URLs
  • Purchase credits as you need them
  • Lifetime updates

Functions

scrape_list( string url ) ⭢ [json]

Extract the best structured list from a URL.
Fetches the given URL and returns the best structured data found in it. Data is returned as JSON rows, each with normalized, typed fields.
-- From URL
select * from scrape_list(
  'https://en.wikipedia.org/wiki/ISO_3166-1_numeric'
);
┌───────────────────────────────────────────────────┐
│ value                                             │
├───────────────────────────────────────────────────┤
│ {                                                 │
│   "code": 834,                                    │
│   "country_name": "Tanzania, United Republic of", │
│   "notes": ""                                     │
│ }                                                 │
└───────────────────────────────────────────────────┘
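Since scrape_list yields one JSON object per row, the value->>'field' extraction pattern is the same one used with SQLite's built-in json_each. As a rough sketch, you can prototype such queries against sample JSON without the extension; the sample data below is illustrative, not real scraper output:

```python
import sqlite3

# Stand-in for what scrape_list might return: a JSON array of objects.
rows_json = """[
  {"code": 834, "country_name": "Tanzania, United Republic of"},
  {"code": 840, "country_name": "United States of America"}
]"""

con = sqlite3.connect(":memory:")

# json_extract(value, '$.field') is the portable spelling;
# on SQLite 3.38+ it is equivalent to value->>'field'.
result = con.execute(
    """
    select json_extract(value, '$.code')         as code,
           json_extract(value, '$.country_name') as name
    from json_each(?)
    """,
    (rows_json,),
).fetchall()
print(result)
```

Once the extension is loaded, swapping json_each(?) for scrape_list('https://...') gives the real query.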

Configuration

Every new SQLite connection should call the scrape_config function with authentication details. This information is not saved; it is kept in memory only for the duration of the connection.

select scrape_config('key: ...');

  • key Credit key for scraper extension
  • prefix Prefix for function names, defaults to scrape_
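In a host language, configuration is one extra statement after loading the extension. A minimal sketch in Python, where the extension path and credit key are placeholders and the sqlite3 build is assumed to permit loadable extensions:

```python
import sqlite3

def configure_scraper(con: sqlite3.Connection, key: str) -> None:
    # scrape_config keeps the key in memory for this connection only,
    # so call it once per new connection.
    con.execute("select scrape_config(?)", (f"key: {key}",))

con = sqlite3.connect(":memory:")
if hasattr(con, "enable_load_extension"):
    con.enable_load_extension(True)  # needs a build with extension support
# Placeholder path and key -- substitute your own:
# con.load_extension("./scrape")
# configure_scraper(con, "YOUR-CREDIT-KEY")
```

Calling scrape_config before the extension is loaded fails with an "no such function" error, which is a quick way to confirm whether loading succeeded.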

FAQ

For any questions, support, or feedback regarding this extension, please contact support.

Can you scrape dynamic web page content?
No, this service does not use headless browsers to scrape pages where JavaScript generates the content.
How do I scrape a URL behind a login?
This is not currently supported. We only make URL requests to publicly available URLs.
How are my credits consumed?
One credit is consumed for every real-time URL retrieval that returns some structured data. Thus, both cached URLs and failed scraping attempts are free.
What information is stored on your service?

We temporarily store responses from the URLs you fetch in secure storage on our Azure servers. We do not keep the URLs you fetch in plain text.

To ensure service quality, we may monitor usage according to our terms of service.

What other software do I need?

You will need a 64-bit Linux or Mac (ARM) operating system. Windows is currently not supported.

In addition, you should have SQLite version 3.45.1 or newer installed, with support for extensions.
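You can check both requirements from Python before installing. A quick sketch (the 3.45.1 minimum comes from the requirement above; whether extension loading is available depends on how your Python was built):

```python
import sqlite3

# Version of the SQLite library this Python links against.
print("SQLite runtime version:", sqlite3.sqlite_version)

meets_minimum = sqlite3.sqlite_version_info >= (3, 45, 1)
print("Meets 3.45.1 minimum:", meets_minimum)

# enable_load_extension only exists when the sqlite3 module was
# compiled with loadable-extension support.
con = sqlite3.connect(":memory:")
can_load = hasattr(con, "enable_load_extension")
print("Extension loading available:", can_load)
```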

How do I receive updates?

We will email you when an update to the extension has been released. Updates are made available to you free of charge.