Downloading the HTML For Every Page on a Group of Websites

In this walkthrough, we'll explain how you can use 80legs to download the full HTML source for every page on a group of websites.

The steps we'll take are:

  1. Upload a URL list containing each website we're interested in
  2. Create a crawl using the URL list and the CrawlInternalLinks 80app
  3. Download the results of the crawl

Using the Web Portal

1. Upload a URL list containing the websites

Log in to the web portal and go to the "My URL Lists" page.  Click on "Create a URL list" and then "Type In a List".  Enter the websites you want to crawl, like so:
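For example, a list might look like this (one site per line; these are placeholder URLs):

```
http://www.example.com/
http://www.example.org/
http://www.example.net/
```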

If you already have a text file containing all your websites, you can upload that file as well.  Click "Create URL list" when done.

2. Create a crawl using the URL list and the CrawlInternalLinks 80app

Go back to the "My Crawls" page and click "Create a new crawl".  Give your crawl a name.  Select the URL list you just uploaded, set the crawl depth to 20, and set the maximum URLs as high as your account allows.  Select the CrawlInternalLinks 80app.  This 80app will only crawl links on the domains in your URL list.

Click "Create Crawl" when you're done.

3. Download the results of the crawl

Once the crawl completes, go to the dashboard for that crawl.  You'll see a link to download its results.  And you're done!
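Once you've downloaded a result file, you'll likely want to split it back into individual HTML pages.  The sketch below assumes each result file is a JSON array of objects with `url` and `result` fields, where `result` holds the page's HTML; check what your 80app actually emits, as the exact format depends on the app and your account settings.

```python
import json
from pathlib import Path

def save_pages(result_json, out_dir):
    """Write each crawled page's HTML to its own file; return the paths written.

    Assumes `result_json` is a JSON array of {"url": ..., "result": ...}
    objects, with "result" holding the raw HTML (an assumption -- verify
    against a real result file from your crawl).
    """
    pages = json.loads(result_json)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, page in enumerate(pages):
        path = out / f"page_{i}.html"
        path.write_text(page["result"], encoding="utf-8")
        paths.append(path)
    return paths

# Usage, once a result file has been downloaded:
# paths = save_pages(Path("crawl_results.json").read_text(), "html_pages")
```

Naming files by index keeps the example short; in practice you'd probably derive file names from each entry's `url` instead.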

Using the API

Here are those same steps using the API:
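As a sketch, the three steps might look like the following Python, using only the standard library.  The endpoint paths (`/urllists/...`, `/crawls/...`, `/results/...`), the payload field names, the `CrawlInternalLinks.js` app name, and the token-as-basic-auth scheme are assumptions modeled on the general shape of the 80legs API; check the current API reference before relying on them.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.80legs.com/v2"  # assumed v2 base URL
API_TOKEN = "YOUR_API_TOKEN"            # your account's API token

def build_request(method, path, body=None):
    """Assemble an authenticated JSON request (token sent as basic auth)."""
    creds = base64.b64encode((API_TOKEN + ":").encode()).decode()
    headers = {"Authorization": "Basic " + creds}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode()
    return urllib.request.Request(API_BASE + path, data=data,
                                  headers=headers, method=method)

def call(method, path, body=None):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(method, path, body)) as resp:
        return json.loads(resp.read())

def run_walkthrough():
    # 1. Upload a URL list containing the websites.
    call("PUT", "/urllists/my_websites",
         ["http://www.example.com/", "http://www.example.org/"])
    # 2. Create a crawl from that list with the CrawlInternalLinks 80app.
    call("PUT", "/crawls/my_crawl", {
        "app": "CrawlInternalLinks.js",
        "urllist": "my_websites",
        "max_depth": 20,
        "max_urls": 1000000,
    })
    # 3. Once the crawl completes, fetch the result download links.
    return call("GET", "/results/my_crawl")
```

Calling `run_walkthrough()` with a real token would perform all three requests; the final `GET` returns the download links for your results.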

The last call returns one or more links from which you can download your results.