Introduction
This integration lets you extract data from webpages using Clay’s Scrape Website action. Given a URL, the action visits the page and returns its content, including body text, links, emails, phone numbers, and keywords. You can then use the scraped data in your Clay workflows, which makes collecting web-based information for business or research purposes faster and more complete.
Input
Name | Is Optional | Description | Type |
Website URL | | | url |
Scrape Delay in Seconds | true | The number of seconds to wait before scraping the website. This gives the JavaScript on the page time to load. If many results are missing, try adding a delay. The maximum is 10 seconds. | number |
Keep Non-Text in Body | true | When scraping body text, we automatically remove any scripts, styles, or images that may be present in the returned text. However, in certain cases, you may want to keep this content. If so, set this to true. | boolean |
Output Fields | true | Optionally, select the fields you want to receive in your output data. | text |
Extract Custom Regex | true | Use this field to extract custom data from the website. For example, to extract all of the Wikipedia links from the website, you can use https?://([a-z]{2,3}\.)?wikipedia\.org/wiki/[a-zA-Z0-9_-]* | text |
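
To preview what a pattern for the Extract Custom Regex field will match before running the action, you can test it locally. The sketch below is only an illustration: it uses Python's `re` module, which may not behave identically to the regex engine Clay uses, and the `extract_wikipedia_links` helper and sample HTML are hypothetical. The optional subdomain group is written as non-capturing so that `findall` returns whole URLs.

```python
import re

# Wikipedia-link pattern from the field description, with literal dots escaped.
# Assumption: Python's regex syntax is close enough to Clay's for a local check.
WIKI_LINK = re.compile(
    r"https?://(?:[a-z]{2,3}\.)?wikipedia\.org/wiki/[a-zA-Z0-9_-]*"
)


def extract_wikipedia_links(page_text: str) -> list[str]:
    """Return every Wikipedia article URL found in the given page text."""
    return WIKI_LINK.findall(page_text)


# Hypothetical page content used only for this local test.
sample = (
    '<a href="https://en.wikipedia.org/wiki/Web_scraping">Scraping</a> '
    '<a href="http://wikipedia.org/wiki/Regex">Regex</a> '
    '<a href="https://example.com/wiki/Other">Other</a>'
)

links = extract_wikipedia_links(sample)
for link in links:
    print(link)
```

Here the first two links match and the `example.com` link does not, because the pattern requires the `wikipedia.org` host.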
Output
Name | Type |
Title | text |
Keywords | text |
Description | text |
Favicon | url |
Social Links | object |
Extracted Keywords | array |
Links | array |
Emails | array |
Phone Numbers | array |
Images | array |
Body Text | text |
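
If you process the scraped output in your own code, it can help to model it as a typed record. The following is a minimal sketch, not Clay's actual schema: the snake_case key names, the `ScrapeResult` type, and the `contact_summary` helper are all assumptions made for illustration, and only the field types come from the table above. Every key is treated as optional because the Output Fields input lets you select which fields are returned.

```python
from typing import Any, TypedDict


class ScrapeResult(TypedDict, total=False):
    """Hypothetical shape of the action's output; key names are assumed."""

    title: str
    keywords: str
    description: str
    favicon: str  # a URL
    social_links: dict[str, Any]  # "object" in the table; inner shape unknown
    extracted_keywords: list[Any]
    links: list[Any]
    emails: list[Any]
    phone_numbers: list[Any]
    images: list[Any]
    body_text: str


def contact_summary(result: ScrapeResult) -> dict[str, int]:
    """Count the contact details present, treating missing fields as empty."""
    return {
        "emails": len(result.get("emails", [])),
        "phone_numbers": len(result.get("phone_numbers", [])),
    }


# Made-up record in which only two fields were selected for output.
example: ScrapeResult = {"title": "Example", "emails": ["a@example.com"]}
summary = contact_summary(example)
print(summary)
```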