Utter the word “ScrapeBox” around a group of white hat SEOs and you’ll notice a sea of icy glares pointing in your direction. At some level, this ire is understandable: most of those annoying, spammy blog comments you see in your WordPress Akismet spam folder likely stemmed from ScrapeBox. But like any tool, ScrapeBox is all about how you use it. In fact, many white hat SEO agencies consider the software one of their secret weapons. And in this chapter I’m going to teach you how to use ScrapeBox for good…not evil.
For those of you new to this tool, ScrapeBox essentially does two things: it scrapes search engine results and posts automatic blog comments. We’re going to ignore the blog commenting feature because it’s a spammy tactic that doesn’t work. Don’t be fooled by its simplicity: ScrapeBox is very powerful. You can easily streamline dozens of monotonous white hat link building processes with this tool.
But before we get into that, let me give you a quick primer on how the tool works.
There are 4 boxes in the ScrapeBox user interface. Here’s what they do:
We’re going to ignore the bottom right corner as this is only used for automatically posting blog comments.
There’s one other important area to point out: manage lists.
This is where you can easily sort and filter the results that ScrapeBox finds for you.
How to Harvest Results
Let’s start with the “Harvester” area.
There are two important sections here:
The footprint is what you include if you want to look for something that tends to appear on certain sites. For example, “Powered by WordPress” is a common footprint used to find WordPress blogs.
Let’s say you wanted to find .gov websites that have pages about nutrition. First, you’d put site:.gov in the footprint field.
And you’d include any keywords that you want to combine with the footprint. For example, if you enter the keyword “weight loss,” ScrapeBox would automatically search for: site:.gov weight loss.
You can add hundreds of keywords and ScrapeBox will automatically combine them with your footprint. When you’re ready to scrape, head down to the search engine and proxy settings.
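Under the hood, this combination step is very simple. Here’s a minimal Python sketch of the same idea (the helper name and keywords are made up for illustration):

```python
# Combine a search footprint with a list of keywords into full
# search queries, the way ScrapeBox's harvester does internally.
def build_queries(footprint, keywords):
    return [f"{footprint} {kw}" for kw in keywords]

queries = build_queries("site:.gov", ["weight loss", "nutrition tips"])
# Each query pairs the footprint with one keyword:
# ['site:.gov weight loss', 'site:.gov nutrition tips']
```

With hundreds of keywords, you get hundreds of queries from a single footprint, which is exactly why the harvester scales so well.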
And choose which search engines you want to use and how many results you want to find. I usually stick to Google and scrape 500-1000 results (after about 200 results, most of what you get is either irrelevant or from sites without much authority).
When you have a few search strings set up, click “Start Harvesting”:
You’ll now have a list of URLs in your “URL’s Harvested” area:
To get the most from your scraped list, you should check the PR of each page that you found. Under “Manage Lists”, choose “Check PageRank”:
Choose “Check URL PageRank”.
Now you can easily sort the pages by PR.
And delete pages that fall below a threshold. Let’s say you don’t want pages with a PR below 3. Scroll until you see pages with a PR of 2.
Click and scroll to highlight those results (you can also hold shift and use the direction buttons on your keyboard to select):
Right click and choose “Remove selected URL from list.”
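Selecting and deleting rows by hand is the manual version of a simple filter. Here’s a sketch of the logic in Python (the URLs and PR values are made-up example data):

```python
# Keep only pages at or above a minimum PageRank threshold,
# mirroring the manual sort-and-delete step in ScrapeBox.
def filter_by_pr(pages, min_pr=3):
    return [(url, pr) for url, pr in pages if pr >= min_pr]

pages = [
    ("http://a.gov/diet", 5),    # kept
    ("http://b.gov/tips", 2),    # dropped: below threshold
    ("http://c.gov/health", 3),  # kept
]
kept = filter_by_pr(pages, min_pr=3)
# kept == [("http://a.gov/diet", 5), ("http://c.gov/health", 3)]
```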
Filtering Your Results
If you scrape from multiple search engines you’ll probably get a few duplicate results in your list. You can easily remove them from your list by clicking the “Remove/Filter” button.
And choose “Remove Duplicate URLs.”
If you don’t want the same domain showing up in your results, you can delete duplicate domains by choosing “Remove Duplicate Domains” from the Remove/Filter options:
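Both filters boil down to keeping the first occurrence of each URL or domain. A minimal Python sketch of what these two filters do (helper names are hypothetical):

```python
from urllib.parse import urlparse

# Drop exact duplicate URLs, keeping the first occurrence,
# like ScrapeBox's "Remove Duplicate URLs" filter.
def dedupe_urls(urls):
    seen = set()
    out = []
    for u in urls:
        if u not in seen:
            seen.add(u)
            out.append(u)
    return out

# Keep only one URL per domain, like "Remove Duplicate Domains".
def dedupe_domains(urls):
    seen = set()
    out = []
    for u in urls:
        domain = urlparse(u).netloc.lower()
        if domain not in seen:
            seen.add(domain)
            out.append(u)
    return out
```

For example, `dedupe_domains(["http://site.com/a", "http://site.com/b"])` keeps only the first page from site.com.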
Now you have a clean list sorted by PR. You can export that information to Excel or a text file by clicking the “Export URL List” button. And choosing the export option that works best for you (I personally like Excel).
If you’re going to be using ScrapeBox on a regular basis, proxies are a must. If you scrape from your personal IP regularly, Google will likely ban it. Meaning no more Google searches. Fortunately, you can find free, working public proxies fairly easily.
And you don’t need any technical skills to set them up.
Using ScrapeBox’s Built-In Proxy Service
ScrapeBox has a cool feature that actually finds and adds free proxies for you.
Head over to the “Select Engines and Proxies” box. Hit “Manage”:
In the next window, choose “Harvest Proxies.” Choose all of the supported sources. Click “Start.”
It’s important to test the proxies before using them. If you use non-functional proxies, ScrapeBox simply won’t work. Hit the “Test Proxies” button.
Choose “Test all proxies.”
Wait for ScrapeBox to test the proxies (it can take a while). When it’s finished you’ll see something like this:
Hit the filter button and choose “Keep Google Proxies.”
Hit the save button and choose “Save selected proxies to ScrapeBox.”
This will save the working, Google-approved proxies.
Now that you have a handle on how it works, it’s time to use ScrapeBox to help you build incredible backlinks.
Resource Page Link Building
Resource page link building is one of the most under-utilized white hat link building strategies on the planet. Where else can you find pages that exist for the sole purpose of linking out to other sites? However, most people shy away from this strategy because it’s extremely time consuming to find resource pages, hunt for contact information and reach out to webmasters. Fortunately, you can dramatically streamline the resource page link building process with ScrapeBox.
First, enter one of these tested footprints into ScrapeBox:
In conjunction with niche-related keywords.
And hit “Start Harvesting.”
Sort your pages by PR to focus on the highest-value targets.
Now export your list, check for broken links, or just email site owners and beg for a link!
Competitor Backlink Analysis
There’s nothing better than reverse engineering your competition. It’s one of the only ways to quickly find an incredible list of high-value, niche related sites to get links from. While OSE, Majestic and Ahrefs are fantastic tools, they’re hard to use for sites with thousands of links. Enter ScrapeBox.
Open ScrapeBox and click Addons → Show available addons.
Choose ScrapeBox Backlink Checker 2:
And click “Install Addon.”
For the addon to work, you need to have your competitor’s homepage in the harvester results area. To do this, just enter the site’s name:
Set the results to 10.
And scrape the results.
Delete any pages that you’re not interested in grabbing backlink information from.
Go back to the Addon menu and select the Backlink checker addon.
Click “Load URL List.” Choose “Load from ScrapeBox Harvester.”
When it’s done, choose “Download Backlinks.”
And save the file as a .txt file.
Close the Backlink checker and head back to the ScrapeBox main menu. Under “Manage Lists” choose “Import URL List.”
And upload the text file you saved.
Check the PR of the links in your list.
Now you can sort by PR so that you spend your time on backlink targets that meet your page PR or homepage PR threshold:
Find Guest Post Opportunities
Searching for relevant, authoritative sites to guest post on is one of the most monotonous link building tasks on the planet. Armed with ScrapeBox you can find thousands of potential guest post targets — and weed out low PR sites — in a matter of minutes.
Start off with a few footprints that sites which accept guest posts tend to have, such as:
- allintitle:guest post guidelines
- intitle:write for us
- “guest blogger”
And combine them with your target keywords.
Harvest your results. But this time, you want to delete duplicate domains. After all, you only need to see one published guest post or list of guest blogger guidelines to know that they accept guest posts.
Click “Remove/Filter” and choose “Remove Duplicate Domains.”
Check the PR. Because the PR of the guest post guidelines page doesn’t matter, choose the “Get Domain PageRank” option. This will show you the site’s homepage PR.
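Since it’s the site, not the page, that matters here, each harvested URL effectively gets mapped back to its homepage before the PR check. A quick Python sketch of that mapping (the example URL is made up):

```python
from urllib.parse import urlparse

# Map a harvested page back to its homepage, since it's the
# domain-level PR that matters for guest post targets.
def homepage(url):
    p = urlparse(url)
    return f"{p.scheme}://{p.netloc}/"

homepage("http://blog.example.com/write-for-us")
# 'http://blog.example.com/'
```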
Now sort by PR and get crackin’!
Outbound Link Checker
You already know that PageRank is finite. And it’s a waste to work your tail off to land a backlink on a high PR page if it’s going to be surrounded by hundreds of others. Fortunately, using ScrapeBox, you can instantly find the number of outbound links of any page (or pages).
Click Addons → Show available addons. Choose the ScrapeBox Outbound Link Checker.
Click “Install Addon.”
If you have a list of domains loaded into ScrapeBox from a harvest you can use those. Open the program from the addon menu and click “Load List.” Click “Load from ScrapeBox.”
If you’d prefer, you can upload the list of URLs from a text file. Copy and paste your target pages into a text file. Then click “Load List” from the addon and “Load from File.”
When the URLs display in the addon, click “Start.”
And the addon will display the number of internal and external links.
If you want to maximize the link juice you get from each link you may want to limit your targets to pages with 50-100 or fewer external links. To do that, click the “Filter” button.
And choose your threshold:
And the addon will automatically delete any URLs that exceed your chosen threshold.
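The internal-versus-external distinction the addon makes is straightforward: a link is external when it points at a different domain than the page it sits on. Here’s a self-contained Python sketch using the standard library’s HTML parser (the page URL and HTML are made-up examples):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Count internal vs. external links on a page, similar to what
# the Outbound Link Checker addon reports.
class LinkCounter(HTMLParser):
    def __init__(self, page_url):
        super().__init__()
        self.domain = urlparse(page_url).netloc.lower()
        self.internal = 0
        self.external = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        netloc = urlparse(href).netloc.lower()
        # Relative links (empty netloc) and same-domain links
        # count as internal; everything else is external.
        if netloc and netloc != self.domain:
            self.external += 1
        else:
            self.internal += 1

html = '<a href="/about">About</a> <a href="http://other.com/">Other</a>'
counter = LinkCounter("http://example.com/page")
counter.feed(html)
# counter.internal == 1, counter.external == 1
```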
Find and Help Malware Infected Sites
A labor-intensive, but effective, white hat link building strategy is to help webmasters with infected sites. Some site owners neglect their sites for months at a time — leaving them ripe for hackers. If you can swoop in and save the day, they’ll often be more than happy to reward you with a link. You can find dozens of niche-relevant infected sites using ScrapeBox.
There’s no footprint to use for malware infected sites. However, the CMS Pligg tends to have an unusual number of infections. You can find Pligg sites using footprints like:
- “Five character minimum”
Once the URLs are loaded up, install the Malware and Phishing Filter addon.
Start the addon, and choose “Load URLs from Harvester.”
The tool will show you if your list has any infected sites.
If you find any, do not visit the sites! They can (and will) infect your PC with malware.
Instead, choose “Save Bad URL’s to File”.
And save the list.
We’re going to use another ScrapeBox addon to get the contact information of the infected site owners: the ScrapeBox Whois Scraper. This addon finds the Whois information for the infected sites without you having to actually visit them.
Once installed, open the addon. Load your file of infected sites.
Once finished you’ll see a list of names, emails, etc.
Save the file. Now put on your cape, reach out to the infected site owners and go save the day!
Local SEO Citation Reverse Engineering
If you do local SEO, you already know that citations are the lifeblood of your campaign. However, reverse engineering your competition’s local citations with OSE or other tools doesn’t always work. Why? Because NAP (Name, Address, and Phone Number) citations aren’t always backlinks, so they don’t show up in link analysis tools. And without reverse engineering, finding local community pages and directories is almost impossible. But not with ScrapeBox.
For this example, let’s assume you’re trying to rank a dentist in Pawtucket, Rhode Island. First, conduct a local search in Google:
And visit the site of one of the top results.
Look for their address on the sidebar or contact us page.
And copy that address into the keyword area of ScrapeBox.
Important: Make sure the address is on a single line.
And put the street address in quotes (if you don’t, the search engines will sometimes return results that don’t have that exact street address on the page).
And add a few variations of the street name. This way, if the citation is listed as “ave.” instead of “avenue” or “rd.” instead of “road,” you’ll still find it.
Finally, you don’t want to see pages of the business you’re reverse engineering in the results. And if the site has their address listed in the sidebar or footer (as many local businesses do), you’ll find that your results are littered with hundreds of pages from that domain. You can avoid this by adding the -site: operator to your keyword search. This operator prevents any results from that site from showing up in your search.
Add this to the end of the keywords that you already entered into ScrapeBox.
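Putting the pieces together — quoted address, street-name variations, and the -site: exclusion — each keyword line is just a templated string. A Python sketch with made-up address and domain data:

```python
# Build citation-hunting queries: the address in quotes (for an
# exact-phrase match), one query per street-name variation, and a
# -site: operator to exclude the business's own domain.
def citation_queries(address_variants, exclude_domain):
    return [f'"{addr}" -site:{exclude_domain}' for addr in address_variants]

queries = citation_queries(
    ["123 Main Street, Pawtucket, RI", "123 Main St, Pawtucket, RI"],
    "example-dentist.com",  # hypothetical competitor domain
)
# ['"123 Main Street, Pawtucket, RI" -site:example-dentist.com',
#  '"123 Main St, Pawtucket, RI" -site:example-dentist.com']
```

Paste one query per line into the keyword area and ScrapeBox handles the rest.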
Hit “Start harvesting.” And you should find tons of otherwise impossible-to-find citation targets:
White Hat Blog Commenting
While ScrapeBox is infamous for its automatic blog commenting feature, it is surprisingly useful for white hat blog commenting. You can use ScrapeBox to quickly find tons of authoritative, niche-targeted pages to drop manual blog comments on.
First, enter one of these example footprints (there are hundreds) for finding pages that allow blog comments into the ScrapeBox harvester:
- site:.edu “you must be logged in to comment” (This is a good one because sites that require login usually don’t get spammed to death.)
- site:.edu “post a comment”
- “post new comment”
Then, enter a few niche keywords into the keyword field.
Click “Start Harvesting.” When you get your results, sort them by PR and delete any that fall below a certain threshold (for blog commenting, it’s best to sort by page PR, not homepage PR).
Sort by PR:
And delete any that don’t seem worthwhile.
If you have a large list and want to choose your targets carefully, you might also want to check the number of outbound links.
This time, load the list from ScrapeBox:
And filter out any that seem to have too many external links.
Now you can save the results and use that as your working list.