Once you've started running crawls, you may want to customize how your crawler behaves. By creating and using your own custom 80app, you can:

  • Tell the crawler which links to crawl next from the URL it's currently crawling
  • Tell the crawler what data to return from the URL it's currently crawling
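
To make those two jobs concrete, here is a minimal sketch in JavaScript, the language 80apps are written in. The names processDocument and parseLinks are assumed to mirror the methods used by the example apps in the public repo, but treat the signatures as assumptions. A real 80app defines them as methods, receives extra arguments such as headers, status, and the Cheerio object, and uses Cheerio instead of the regular expressions below, which are only there so the sketch runs in plain Node.js.

```javascript
// Hypothetical sketch of the two decisions an 80app makes for each URL.
// Real 80apps implement these as methods and use the Cheerio `$` supplied by
// the crawler; plain regexes stand in for Cheerio so this runs in Node.js.

// Which links to crawl next: absolute URLs on the same host, without duplicates.
function parseLinks(html, url) {
  const base = new URL(url);
  const hrefPattern = /<a\s[^>]*href="([^"]+)"/gi;
  const links = [];
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    let link;
    try {
      link = new URL(match[1], base); // resolve relative hrefs against the page URL
    } catch (e) {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    link.hash = ""; // ignore in-page fragments such as #top
    if (link.hostname === base.hostname && !links.includes(link.href)) {
      links.push(link.href);
    }
  }
  return links;
}

// What data to return: here just the URL and page title, serialized as JSON.
function processDocument(html, url) {
  const titleMatch = /<title>([^<]*)<\/title>/i.exec(html);
  return JSON.stringify({ url: url, title: titleMatch ? titleMatch[1].trim() : null });
}
```

For a page at https://example.com/index.html, parseLinks keeps same-host links such as https://example.com/about and drops links to other hosts, while processDocument returns a JSON string holding the page URL and title.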

The steps we'll take to create a new 80app are:

  1. Clone the existing public 80app repo.
  2. Copy an existing 80app to a new file and make changes.
  3. Test your 80app.
  4. Upload the new 80app to your 80legs account.

1. Clone the existing public 80app repo

To get the public 80app repo, clone it with git:

    git clone git@github.com:datafiniti/EightyApps.git

The repo is hosted at https://github.com/datafiniti/EightyApps.

2. Copy an existing 80app to a new file and make changes

Once you have the repo on your local file system, change into its directory and copy the simple example app to a new 80app file:

    cd EightyApps
    cp apps/SimpleEightyApp.js apps/NewEightyApp.js

Open NewEightyApp.js and make your changes. Examples of various functionality are available at https://github.com/datafiniti/EightyApps/tree/master/apps. To test your new 80app, follow the README instructions in the public repo.

Note: The 80legs web crawler uses an extended version of Cheerio, a lightweight, server-side implementation of core jQuery. Although Cheerio implements many of jQuery's core functions, it does not support all of them. Please go here to learn more about how this affects the way you build 80apps.

3. Test your 80app

Use our testing site to test your 80app.
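
Alongside the testing site, a quick local check can catch obvious mistakes before you upload. The sketch below is a hypothetical helper, not part of the 80legs tooling, and it assumes an 80app exposes processDocument() returning a JSON string and parseLinks() returning an array of absolute URL strings; adjust it if your app's methods take or return something different.

```javascript
// Hypothetical pre-upload sanity check; not part of the 80legs tooling.
// ASSUMPTION: processDocument() returns a JSON string and parseLinks()
// returns an array of absolute URL strings.
function checkEightyApp(app, html, url) {
  const problems = [];

  const data = app.processDocument(html, url);
  if (typeof data !== "string") {
    problems.push("processDocument should return a string");
  } else {
    try {
      JSON.parse(data);
    } catch (e) {
      problems.push("processDocument output is not valid JSON");
    }
  }

  const links = app.parseLinks(html, url);
  if (!Array.isArray(links)) {
    problems.push("parseLinks should return an array");
  } else {
    for (const link of links) {
      try {
        new URL(link); // throws for relative or malformed URLs
      } catch (e) {
        problems.push("not an absolute URL: " + link);
      }
    }
  }
  return problems; // an empty array means no problems were found
}
```

Because the helper only calls the two methods on the object you pass in, you can point it at a small stand-in object first and then at your real app once it loads in Node.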

4. Upload the new 80app to your 80legs account

After you're satisfied with your new code, it's time to make it available to your 80legs crawls. To do this, call the 80legs API to upload the 80app to your account.
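
As a hedged sketch of such an upload call, the JavaScript below builds (but does not send) the request. The endpoint URL, the HTTP Basic scheme with your API token as the username and an empty password, and sending the raw file as the body are all assumptions to verify against the 80legs API reference; buildUploadRequest is a hypothetical helper name.

```javascript
// Hypothetical helper: builds, but does not send, an 80app upload request.
// ASSUMPTIONS (verify against the 80legs API reference):
//   - endpoint: PUT https://api.80legs.com/v2/apps/<appName>
//   - auth: HTTP Basic, API token as username, empty password
//   - body: the raw JavaScript source of the 80app
function buildUploadRequest(apiToken, appName, source) {
  // Basic auth encodes "username:password"; here the password is empty.
  const credentials = Buffer.from(apiToken + ":").toString("base64");
  return {
    method: "PUT",
    url: "https://api.80legs.com/v2/apps/" + encodeURIComponent(appName),
    headers: {
      Authorization: "Basic " + credentials,
      "Content-Type": "application/octet-stream",
    },
    body: source,
  };
}
```

With Node.js 18 or later, which provides a global fetch, you could read apps/NewEightyApp.js with fs.readFileSync and send the result with fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).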

Once the new 80app has been uploaded successfully, you can use it when creating a new crawl.