As you become more proficient at building custom web scrapers with 80apps, you'll probably want to make them as efficient as possible. Faster 80apps mean faster crawls, since the time spent running an 80app can account for a significant portion of total crawl time.
We'll be using this URL to illustrate each approach: http://www.supplyhouse.com/Zurn-GT2700-50-100-Grease-Trap-50gpm-4385000-p
Here's the specific part of the page we'll be focusing on:
We want to find the fastest way to scrape the length, width, and height. The result will be a dimensions object with length, width, and height properties, each holding its respective value.
Our goal is to produce this object:
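As a rough sketch of the shape only: the property names come from the description above, while the values are placeholders rather than figures taken from the product page.

```javascript
// Shape of the target object. The string values are placeholders,
// not real measurements scraped from the page.
const dimensions = {
  length: "<length>",
  width: "<width>",
  height: "<height>"
};
```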
We've found 4 different ways that you can approach this:
Time Taken: 4181 milliseconds
This approach is very slow because the whole document has to be searched again for every property. The :contains pseudo-selector is also something you want to avoid, because it has to compare the text of every candidate element, descendants included, against the search string.
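A minimal sketch of what this slow pattern tends to look like. It assumes `$html` is the Cheerio-wrapped document and that each spec sits in a row of the form `<tr><td>Length</td><td>VALUE</td></tr>` inside `#feature_list`. The markup, the labels, and the function name `getDimensionsWithContains` are illustrative assumptions, not taken from the page.

```javascript
// Assumed markup (not confirmed by the page): inside #feature_list, each
// spec is a row like <tr><td>Length</td><td>VALUE</td></tr>.
// $html is assumed to be a Cheerio selection wrapping the whole document.
function getDimensionsWithContains($html) {
  return {
    // Every lookup starts from the document root and runs :contains again.
    length: $html.find('#feature_list td:contains("Length")').next().text().trim(),
    width: $html.find('#feature_list td:contains("Width")').next().text().trim(),
    height: $html.find('#feature_list td:contains("Height")').next().text().trim()
  };
}
```

Each property costs a full search from the root plus a text comparison on every candidate cell, which is where the time goes.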
Time Taken: 913 milliseconds
In the first approach, our starting point was always #feature_list. Here we cache #feature_list by saving the result of $html.find("#feature_list") in a variable, then search within that selection instead of searching the entire DOM for #feature_list every time. Caching a section of the DOM and searching only that section is significantly faster than searching the whole document.
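The caching idea could be sketched as follows. The row markup (`<tr><td>Length</td><td>VALUE</td></tr>` inside `#feature_list`) and the function name `getDimensionsCached` are assumptions made for illustration. Only the `$html.find("#feature_list")` call comes from the description above.

```javascript
// $html is assumed to be a Cheerio selection wrapping the whole document.
// The row layout (label cell followed by value cell) is an assumption.
function getDimensionsCached($html) {
  // Search the document for #feature_list once and keep the selection.
  const $features = $html.find("#feature_list");
  return {
    // Later lookups only search inside the cached selection.
    length: $features.find('td:contains("Length")').next().text().trim(),
    width: $features.find('td:contains("Width")').next().text().trim(),
    height: $features.find('td:contains("Height")').next().text().trim()
  };
}
```

The :contains selector is still in use here, so the savings come purely from narrowing the search to one cached subtree.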
Time Taken: 2 milliseconds
Instead of searching through "#feature_list" multiple times, this approach iterates through the table rows in that div once, checking whether each row's label matches one of our desired property names.
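One way the iterative version might look, as a sketch. It assumes each row holds a label cell followed by a value cell, and that the labels read "Length", "Width" and "Height". The `wanted` map and the function name `getDimensionsByLoop` are hypothetical helpers introduced for this example.

```javascript
// $html is assumed to be a Cheerio selection wrapping the whole document.
// Assumed row layout: <tr><td>LABEL</td><td>VALUE</td></tr>.
function getDimensionsByLoop($html) {
  // Maps the (assumed) label text on the page to our property names.
  const wanted = { Length: "length", Width: "width", Height: "height" };
  const dimensions = {};
  // Select the rows once, then walk them with a plain for loop.
  const $rows = $html.find("#feature_list tr");
  for (let i = 0; i < $rows.length; i++) {
    const $cells = $rows.eq(i).find("td");
    const label = $cells.eq(0).text().trim();
    // Keep the row only if its label is one of the properties we want.
    if (Object.prototype.hasOwnProperty.call(wanted, label)) {
      dimensions[wanted[label]] = $cells.eq(1).text().trim();
    }
  }
  return dimensions;
}
```

A single pass with exact string comparisons replaces three :contains searches, and rows for other specs are simply skipped.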
Time Taken: 1 millisecond
This approach is similar to the iterative approach, except it uses Cheerio's .each method to loop through the table rows. It shaves off 1 millisecond.
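A sketch of the `.each` variant under the same assumptions about the markup (a label cell followed by a value cell in each row of `#feature_list`). Here `$` is assumed to be the Cheerio function used to wrap a raw element, and `wanted` and `getDimensionsWithEach` are names made up for this example.

```javascript
// $html is assumed to be a Cheerio selection wrapping the whole document;
// $ is assumed to be the Cheerio function that wraps a raw element.
function getDimensionsWithEach($html, $) {
  // Maps the (assumed) label text on the page to our property names.
  const wanted = { Length: "length", Width: "width", Height: "height" };
  const dimensions = {};
  // .each calls the callback once per matched row with (index, element).
  $html.find("#feature_list tr").each(function (i, row) {
    const $cells = $(row).find("td");
    const label = $cells.eq(0).text().trim();
    if (Object.prototype.hasOwnProperty.call(wanted, label)) {
      dimensions[wanted[label]] = $cells.eq(1).text().trim();
    }
  });
  return dimensions;
}
```

The logic matches the plain loop. The difference is that Cheerio drives the iteration and hands each row element to the callback.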