Before accessing the Giant Web Crawl (GWC), you will need the following:
Step 1: Request the list of available results
As the GWC runs, it posts the results relevant to your account to your account's results directory. To access that directory, issue a request like the following:
Step 2: Download the available result files
The response you receive from your request in Step 1 will look something like:
This is a list of links to result files. Download the result files to get the data posted to your GWC account.
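Steps 1 and 2 can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the GWC's documented client: it assumes the Step 1 response contains absolute http(s) links that can be picked out with a pattern match, and that each link can be fetched with a plain GET. The function names and the `listing_url` argument are hypothetical; substitute the request details supplied for your account.

```python
import os
import re
import urllib.request
from urllib.parse import urlparse

# Assumption: result-file links appear as absolute http(s) URLs in the listing.
LINK_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(listing_text):
    """Return the unique links found in a listing, in order of first appearance."""
    seen = set()
    links = []
    for url in LINK_PATTERN.findall(listing_text):
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def local_name(url):
    """Derive a local file name from the last path segment of a link."""
    name = os.path.basename(urlparse(url).path)
    return name or "result"


def download_results(listing_url, dest_dir):
    """Fetch the listing (Step 1), then download every linked file (Step 2)."""
    os.makedirs(dest_dir, exist_ok=True)
    with urllib.request.urlopen(listing_url) as response:
        listing_text = response.read().decode("utf-8", errors="replace")
    saved = []
    for url in extract_links(listing_text):
        path = os.path.join(dest_dir, local_name(url))
        urllib.request.urlretrieve(url, path)
        saved.append(path)
    return saved
```

If your account's requests need credentials or extra headers, the two `urllib.request` calls are the places to add them.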
Each file will contain one or more records matching the specifications supplied to us for your GWC account. For example, if you requested to receive any emails found on URLs crawled by the GWC, your data will look something like:
Please note that we have added tabs and line breaks to this example to make it more readable. The actual data will not contain this extra whitespace.
Result files expire after 7 days. Once expired, they cannot be retrieved, so check the results directory for newly available files regularly, and well within that 7-day window.
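Because result files cannot be recovered once they expire, one way to avoid missing any is to poll on a schedule and remember which links have already been downloaded. The sketch below is illustrative only: `list_links` and `download` are hypothetical callables that you supply (for example, wrappers around your Step 1 and Step 2 requests), and the hourly interval is an arbitrary choice, not a GWC recommendation.

```python
import time


def poll_once(list_links, download, already_seen):
    """Run one polling pass and return the links downloaded in this pass."""
    fetched = []
    for url in list_links():
        if url in already_seen:
            continue  # downloaded in an earlier pass (or listed twice)
        download(url)
        already_seen.add(url)  # mark only after the download returns
        fetched.append(url)
    return fetched


def poll_forever(list_links, download, interval_seconds=3600):
    """Poll repeatedly; keep the interval well under the 7-day expiry."""
    already_seen = set()
    while True:
        poll_once(list_links, download, already_seen)
        time.sleep(interval_seconds)
```

The set of seen links lives only in memory here. If the process may restart, persisting that set to disk avoids downloading the same files again.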