Scrape Website with Clay

Scrape information like emails, phone numbers, and keywords from a webpage

Updated over 4 months ago

Introduction

Clay’s Scrape Website action extracts data from the webpages at the URLs you specify. It can return any content on a page, including body text, links, emails, phone numbers, and keywords, so you can bring web-based data directly into your Clay workflows without collecting it by hand.

Input

| Name | Is Optional | Description | Type |
| --- | --- | --- | --- |
| Website URL | | | url |
| Scrape Delay in Seconds | true | The number of seconds to wait before scraping the website, which gives the site's JavaScript time to load. If results are missing, try adding a delay. The maximum is 10 seconds. | number |
| Keep Non-Text in Body | true | When scraping body text, any scripts, styles, or images are automatically removed from the returned text. To keep this content, set this to true. | boolean |
| Output Fields | true | Optionally, select the fields you want to receive in your output data. | text |
| Extract Custom Regex | true | Use this field to extract custom data from the website. For example, to extract all of the Wikipedia links from the website, use `https?://([a-z]{2,3}\.)?wikipedia\.org/wiki/[a-zA-Z0-9_-]*` | text |
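Before pasting a pattern into Extract Custom Regex, it can help to try it on sample text first. The sketch below is one way to do that locally with Python's `re` module. It uses the Wikipedia pattern from the example with the dots escaped so they match only literal periods. The helper name `extract_matches` and the sample text are made up for illustration. The regex flavor Clay uses is not stated here, so behavior inside Clay may differ.

```python
import re

# The Wikipedia pattern from the example, with literal dots escaped.
WIKI_LINK = re.compile(r"https?://([a-z]{2,3}\.)?wikipedia\.org/wiki/[a-zA-Z0-9_-]*")


def extract_matches(pattern, text):
    """Return every full match of `pattern` in `text`, in order (hypothetical helper)."""
    # finditer + group(0) yields the whole match even though the
    # pattern contains a capturing group for the language subdomain.
    return [m.group(0) for m in pattern.finditer(text)]


# Made-up sample text standing in for a scraped page.
sample = (
    'See <a href="https://en.wikipedia.org/wiki/Web_scraping">this</a> and '
    "http://wikipedia.org/wiki/Regex, but not https://example.com/wiki/Other."
)

matches = extract_matches(WIKI_LINK, sample)
print(matches)
```

Only the two Wikipedia URLs match. The `example.com` link is skipped because the host does not match `wikipedia\.org`.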

Output

| Name | Type |
| --- | --- |
| Title | text |
| Keywords | text |
| Description | text |
| Favicon | url |
| Social Links | object |
| Extracted Keywords | array |
| Links | array |
| Emails | array |
| Phone Numbers | array |
| Images | array |
| Body Text | text |
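If you pass scraped results to your own code, a small type check against the output fields listed above can catch surprises early. The sketch below is a minimal illustration, not Clay's actual schema. The snake_case key names, the mapping of text and url to `str`, object to `dict`, and array to `list`, and the example record are all assumptions.

```python
from typing import Any

# Hypothetical key names mirroring the Output fields; Clay's real
# field identifiers may differ.
EXPECTED_TYPES = {
    "title": str,
    "keywords": str,
    "description": str,
    "favicon": str,            # url
    "social_links": dict,      # object
    "extracted_keywords": list,
    "links": list,
    "emails": list,
    "phone_numbers": list,
    "images": list,
    "body_text": str,
}


def check_result(result: dict[str, Any]) -> list[str]:
    """Return the names of known fields whose value has an unexpected type."""
    return [
        name
        for name, value in result.items()
        if name in EXPECTED_TYPES and not isinstance(value, EXPECTED_TYPES[name])
    ]


# Made-up record for illustration only.
example = {
    "title": "Example Domain",
    "emails": ["hello@example.com"],
    "social_links": {"twitter": "https://twitter.com/example"},
    "body_text": "This domain is for use in illustrative examples.",
}

problems = check_result(example)
print(problems)
```

Fields missing from a record are ignored rather than flagged, since Output Fields lets you choose which fields to receive.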
