Advanced Data Research

The Advanced
Guide to SEO

Chapter 05

By Neil Patel
and
Sujan Patel

After the first 5 sections, you should have a rock-solid website. But there's way more to SEO than speed, indexation and metadata.

We're going to begin our off-site SEO techniques with some ImportXML!

Intro To Import XML

What is ImportXML? ImportXML is a Google Docs spreadsheet function that retrieves information from file types such as HTML, XML, CSV and more using xpath queries.

This can be incredibly useful for scraping and sourcing information from websites, because it pulls the data right into a Google Docs spreadsheet, and you can run some fairly advanced queries to collect information that would otherwise be hard to gather.

I'm going to walk you through a few example uses of ImportXML.

Basic Syntax

ImportXML is just like any other Excel or Google Docs formula - it uses a pretty straightforward syntax:

=importXML(URL, Query)

URL
The URL of the page you want to scrape
Query
The xpath query to run on that URL
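
Put together, a complete formula looks something like this (the URL here is just an illustration; any page will work):

=IMPORTXML("http://www.quicksprout.com", "//title")

That would pull the contents of the page's title tag into the cell. The example below does the same thing for an H1 tag.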

Basic Example

Scraping Quicksprout for H1 Tags

  1. Create a new Google Doc Spreadsheet

    Intro to Import XML #1
  2. Set up your URL

    Intro to Import XML #2
  3. Create a basic xpath function to grab the H1 of the page. (Obviously we could do this via Screaming Frog or otherwise crawling the site — but we're just using this as a simple example)

  4. Add the importxml function to cell B2

    Intro to Import XML #3

Note that we're referencing cell A2, where the URL is.

The query gets wrapped in quotes.

Then the xpath defines what portion of the page should be returned. "//h1" tells it to return the contents of every h1 on the page (that's what the "//" does: it asks for every occurrence of the h1 "path", no matter how many levels deep it's nested).
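
While we're at it, here are a few other common xpath patterns you can drop into the Query field (generic examples, so adjust them to whatever page you're scraping):

//title
Returns the contents of the page's title tag
//a/@href
Returns the destination URL of every link on the page
//meta[@name='description']/@content
Returns the page's meta description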

Here's what our //h1 formula returns:

Intro to Import XML #4

Cool! We've got the H1 of the post pulled right into Google Docs. So let's get into some useful examples of importxml for Google Docs.

ImportXML — Quora/Twitter

We're going to use Quora again, this time to find users who may be influential or authoritative and scrape their Twitter URLs.

Here's the final product, so you can see what we'll be building:

ImportXML - Quora/Twitter #1

Find a Group or Topic

Let's use the blogging topic in this example - http://www.quora.com/Blogging/followers

ImportXML - Quora/Twitter #2

Enter The Quora URL In Column A

We're going to be referencing this cell in the function that's going in Column B.

ImportXML - Quora/Twitter #3

Create the importxml Function To Scrape Usernames

ImportXML - Quora/Twitter #4

The function is:

=importxml(A2, "//h2/a/@href")

Let's break that down, so you understand and can create your own.

=importxml()
this is the empty function
A2
this is the "URL" field, which references the cell containing the Quora URL
//h2
this references every h2 on that page
/a
this references the anchor (a) tags nested within those h2s
/@href
finally, this references only the links contained in those anchor tags

As you can see, this returns a list of the top 20 users from the Blogging Topic.

ImportXML - Quora/Twitter #5

Create Full URLs

As you may have noticed, Quora uses relative URLs, so we need to convert them to absolute URLs.

A simple concatenation function will do the trick:

ImportXML - Quora/Twitter #6

In case you're not sure, here's the concatenate function:

=CONCATENATE("http://quora.com",B2)

Let's break that down as well:

=CONCATENATE()
The empty concatenate function (combines multiple strings of text into one)
"http://quora.com"
The beginning of the Quora URL (anything not referencing another cell needs to be in quotes)
B2
References the cell with the incomplete user URL
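
By the way, if you'd rather skip CONCATENATE, the & operator does the same job in a slightly shorter formula:

="http://quora.com"&B2

Either version works; use whichever you find easier to read.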

Once you've done that, grab and drag the formula down the rest of the column:

ImportXML - Quora/Twitter #7

Scrape For Twitter URLs

Now for the last step, let's get those Twitter URLs!

ImportXML - Quora/Twitter #8

Here's the function. It's a long one, so we'll break it down piece by piece:

=ImportXML(C2,"//div[contains(@class,'profile_action_links_section')]//a[contains(@href,'twitter.com')]/@href")

=importxml()
The empty importxml function
C2
The cell of the complete Quora profile URL we're referencing
//div
Referencing any div tag in the HTML
[contains()]
contains() allows us to narrow down which div tag we want
(@class,'profile_action_links_section')
Here we're selecting the div by its class name

Here's a screenshot of Quora's code showing that class in the HTML:

ImportXML - Quora/Twitter #9
//a[contains(@href,'twitter.com')]
Selects the anchor tags whose link points to twitter.com
/@href
Scrapes the actual link (the href value) from within those anchor tags
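
One optional refinement: not every Quora profile links to Twitter, so some rows will return an error. If you'd prefer those rows to stay blank, you can wrap the function in IFERROR (a standard spreadsheet function, nothing specific to this scrape):

=IFERROR(ImportXML(C2,"//div[contains(@class,'profile_action_links_section')]//a[contains(@href,'twitter.com')]/@href"),"")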

Don't forget to grab and drag the formula down the rest of the column:

And now you can instantly get lists of 20 Twitter users at a time! This being a technical how-to guide, it's of course your decision how you use such a list, but I'm sure you can think of many applications :-)

ImportXML - Quora/Twitter #10

Scraping Ubersuggest for Keyword Ideas

  1. Create a new Google Doc spreadsheet

    Click Create > Spreadsheet

    Scraping Ubersuggest for Keyword Ideas #1
  2. In cell A1, type in something that you want to query Ubersuggest for

    In this example, we typed in "how to ..." to start the query.

    Scraping Ubersuggest for Keyword Ideas #2
  3. In cell A2, type in the following formula and press enter

    =ImportXML("http://ubersuggest.org/?query="&A1& "&format=html&language=English%2FUSA&source=web&submit=Suggest", "//li/span")

    Scraping Ubersuggest for Keyword Ideas #3

    The spreadsheet will fill up with Ubersuggest's answers:

    Scraping Ubersuggest for Keyword Ideas #4
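
One note on that formula: if your seed keyword contains spaces or other special characters, it's worth URL-encoding it before it gets glued into the query string. Google Docs spreadsheets have an ENCODEURL function for this, so a slightly more robust version would be:

=ImportXML("http://ubersuggest.org/?query="&ENCODEURL(A1)&"&format=html&language=English%2FUSA&source=web&submit=Suggest", "//li/span")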

Finding any HTML in a List of Web Pages

I'm going to show you a really fast way to prospect 100 (or more) sites at one time. You could do this with expensive link prospecting tools, but if you're on a budget or want to minimize your toolset, this is a fantastic method that's just as easy. And fun!

This works when you're looking for HTML in a list of documents that is not part of the visible content; in other words, code. In this example we're going to look for the presence of a 'rel=author' tag, because it tells us two things. First, the website owner is likely to be "on top of things" from a marketing standpoint if they have taken the time to set this up. Second, they (or someone helping them) must have some amount of technical skill, so they may be an easier prospect to work with.

Google Searches

The type of results you're trying to get is a list of possible sites you could get a link from. Let's say you're a food blogger and you want to find other blogs to guest post on. You might do a search like:

  • food inurl:blog intitle:submit post
  • food inurl:blog intitle:contribute post

Or you might get more specific with keywords:

  • gourmet food inurl:blog intitle:submit post
  • eclectic desserts inurl:blog intitle:submit post

When you nail down a good search, you should see a number of potential sites in the results, without an overwhelming number of total results. For example:

Finding any HTML in a List of Web Pages #1

The above is an excellent example of a query to start with.

Scrape the URLs from the Google Results

We need to get all of those Google results into a text document to prep for running through Screaming Frog.

  1. To prep for scraping, set Google to return 100 results per page.

    1. Go to search settings

    2. Set to 100 results per page

      Finding any HTML in a List of Web Pages #2
    3. Go back to Google and run the results again.

    4. Then use the SERP redux bookmarklet — click the link.

      Finding any HTML in a List of Web Pages #3
    5. Copy the list of URLs onto your clipboard

      Finding any HTML in a List of Web Pages #4
    6. Paste them into your text editor

      Finding any HTML in a List of Web Pages #5
    7. Save as a .txt file.

Filter the URLs through Screaming Frog

  1. Set Screaming Frog to list mode.

    Finding any HTML in a List of Web Pages #6
  2. Select your text file and open

    Finding any HTML in a List of Web Pages #7
  3. Go To Custom Settings

    Finding any HTML in a List of Web Pages #8
  4. Enter HTML to filter

    The way we're doing this part is key.

    We want four lists created (the exact filter text is shown after these steps):

    1. Contains rel=author
    2. Doesn't contain rel=author
    3. Contains rel=me
    4. Doesn't contain rel=me
    Finding any HTML in a List of Web Pages #9
  5. View Your Custom Results

    Click the 'Custom' tab, and then you can select filters 1-4

    Finding any HTML in a List of Web Pages #10
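
For reference, the text you enter into each custom filter is just the snippet of HTML you want Screaming Frog to look for, with Contains or Does Not Contain set accordingly; something like this:

Filter 1 (Contains): rel="author"
Filter 2 (Does Not Contain): rel="author"
Filter 3 (Contains): rel="me"
Filter 4 (Does Not Contain): rel="me"

One caveat: some sites write the attribute without quotes (rel=author), so if a filter comes back empty, it's worth trying the unquoted form as well.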

In this case, this particular list found only one rel=author blog. But that's OK! That's actually good. Imagine having to sift manually through all of those results to find the one with authorship. Now you have one much more targeted prospect, and you can easily get many more by repeating this process.

Use Citation Finder To Find Link Opportunities

This section covers a paid tool. There is a free version, but it doesn't have all the features; you can probably still try some of the things in this section with it. I am not affiliated with the tool in any way.

Before we start, go to https://www.whitespark.ca/local-citation-finder/ and register for your free or paid account.

Search
(without a project)

  1. Go to the first tab and enter your info in the fields

    Citation Finder To Find Link Opportunities #1
  2. You have to wait a few minutes:

    Citation Finder To Find Link Opportunities #2
  3. You should receive an email alert though when your report is ready:

    Citation Finder To Find Link Opportunities #3
  4. Next, you're going to see a report like this. Click 'compare citations for these businesses'.

    Citation Finder To Find Link Opportunities #4
  5. Then you should export as a csv

    Citation Finder To Find Link Opportunities #5
  6. You can open it up and save it as an Excel file, and we're going to customize it a little so you can easily see WHO has the most citations.

    We're going to use a little Excel formula:

    =COUNTIF(A2:A111,"*Y*")

    like this (assuming you're in column B):

    Citation Finder To Find Link Opportunities #6

    And of course you can autofilter to see just the Y's or N's

    Citation Finder To Find Link Opportunities #7
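
The exact ranges depend on how your export is laid out, but the idea extends across the whole sheet: assuming each business has its own column of Y/N cells, put one COUNTIF under each column and point it at that column's range, for example:

=COUNTIF(B2:B111,"*Y*")
=COUNTIF(C2:C111,"*Y*")

The columns with the highest counts belong to the businesses with the most citations.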

Part one is great for general prospecting, but what if your business isn't included in the report? Then you can use the "search by phone number" feature.

  1. You can use your phone number OR your business name (the title is a little deceptive, although phone number works best).

    Enter your info, and we're also going to add this to a project:

    Citation Finder To Find Link Opportunities #8
  2. Make sure you have a project created

    Citation Finder To Find Link Opportunities #9
  3. The report you see will show all citation sources, not tied to any keyword. It's just a raw list.

    Citation Finder To Find Link Opportunities #10
  4. Click on the little plus to see all pages with the citation (usually meaning a phone number)

    Citation Finder To Find Link Opportunities #11
  5. And you have a few more options when it comes to exporting the data:

    Citation Finder To Find Link Opportunities #12
    Re-run and append
    runs the same report again, except it adds in any NEW results that weren't there before.
    Export CSV
    Exports the data, but without individual URLs, just the name of the website
    Export CSV w/URLs
    includes the URLs (what you see when clicking the plus signs) in the full report.

Harvesting email addresses

For this section of advanced scraping we're going to use the Citation Labs Contact Finder (http://citationlabs.com/tools/). You should register for an account before we begin.

This tool is amazing if you have a list of prospect URLs; you can quickly gather most of the email addresses needed for outreach.

Gather URLs

I'm going to assume you either have a list of prospects already, or (with the help of this guide!) you know how to get one quickly.

For this example, I'm going to take a list of scraped Google URLs — let's say I was a food blogger and wanted to submit recipes. I might use a search like:

recipe inurl:submit

Harvesting email address #1

Once you have your search, navigate to the contact finder

Harvesting email address #2

Then fill out the form

Harvesting email address #3

You can experiment with regular expressions (regex) to fine-tune your results. This expression:

^(Contact|About|Email|Submit)

will match anchor text that begins with the word Contact, About, Email or Submit.

I also don't limit the anchor text to a certain number of words.
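
You can adapt that pattern to whatever kind of pages you're hunting for. For example, if you were prospecting for guest post opportunities instead, something along these lines (purely illustrative) would match anchor text starting with Write, Guest or Contribute:

^(Write|Guest|Contribute)

The ^ anchors the match to the beginning of the anchor text, and the pipe character means "or".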

Click on the contact tab to get your results (you might have to wait a few minutes for processing).

Harvesting email address #4

As you can see, there are a few types of results:

Emails
email addresses found
Forms
form submissions found
Contact Pages
pages with contact information where an email address couldn't be found
Empty
no results of any kind

You then have the option of downloading any report, or all of them, as a CSV.

Harvesting email address #5

As you can see in the above report, out of 100 URLs it captured:

  • 38 emails
  • 47 form URLs
  • 7 contact pages
  • only 8 empty

Social Listening: Advanced Listening to Twitter

Before we set up your searches in some different tools, the first step is to develop lists of advanced searches to follow.

Let's say your main topic is interior decorating. You'd want to create a list of as many variations of that as possible; much of this won't look different from keyword research:

  • interiordecorating
  • interior decorating
  • #interiordecorating
  • interiordesign
  • interior design
  • #interiordesign

These are your core words. Then you can have a list of words to gauge intent, like:

  • need
  • help
  • trouble
  • looking for
  • tips
  • question

And if you're looking to target anything location-based:

  • Los Angeles
  • CA
  • California
  • LA

Don't forget some of your brand terms (mine would be):

  • Quicksprout
  • neil patel
  • kissmetrics
  • kiss metrics
  • crazy egg
  • crazyegg
  • I'm kind of a big deal

These keyword and search combinations will surface any mention of the terms, by anyone. More on monitoring specific users below.
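
Combining the lists is just a matter of stacking operators into a single search. A couple of illustrative combinations (quotes keep a phrase together, OR broadens the match, and near:/within: restrict by location; operator support changes from time to time, so test anything you build):

"interior design" (need OR help OR "looking for")
"interior decorating" tips near:"Los Angeles" within:25mi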

You can create and test your own here: https://twitter.com/#!/search-advanced

Social Listening: Advanced Listening to Twitter #1

A good search may have a few good results in the last 24-48 hours.

Social Listening: Advanced Listening to Twitter #2

Create an IFTTT Recipe

Next, once you've found the searches you want to monitor, you can create an IFTTT recipe to watch for them. The beauty of IFTTT is that you can receive your alerts across a few dozen different platforms. We're going to set it up to send you an email or a text message when an alert is triggered.

NOTE: These recipes work best for searches with less frequent results.

Create an account (it's free) and make a new recipe

Social Listening: Advanced Listening to Twitter #3

Use Twitter as the "trigger"

Social Listening: Advanced Listening to Twitter #4

Fill out the search field. If it's just a simple search you can use plain text, but you may need to use advanced operators.

Social Listening: Advanced Listening to Twitter #5

Select either email or Gmail as the "Action" channel.

Social Listening: Advanced Listening to Twitter #6

Fill out the fields and customize as needed

Social Listening: Advanced Listening to Twitter #7

If you want to receive a text message instead, choose that as the action and set your fields.

Social Listening: Advanced Listening to Twitter #8

And wait for the emails to come in!

Social Listening: Advanced Listening to Twitter #9

Bonus: Set Up Email Filters

Take your listening to the next level with some Gmail filtering. Create a filter to get all your alerts sent to a folder:

Social Listening: Advanced Listening to Twitter #10

Advanced Twitter Search Syntax

Fortunately, if you use Twitter's advanced search creator, it will come up with the search for you:

Just go to https://twitter.com/#!/search-advanced and run the search; the results page will include the full search syntax, operators and all:

Social Listening: Advanced Listening to Twitter #11

Twitter for Influencer Listening

Let's continue with the interior design niche. Let's say you want to connect with more interior designers who are also bloggers. You'll want to know when they need help with something.

First, find people you can listen to with a tool like Followerwonk: http://followerwonk.com

Then, create an advanced search for when she mentions something you can help with. Maybe you're a computer guru. You could do a search like this:

Social Listening: Advanced Listening to Twitter #12

She may only tweet about that once a year. But if you're trying to connect with really high-authority people, it will be worth creating an IFTTT recipe so you know via text message when she needs help.

Again, Twitter gives you the syntax for the search when you run it via the Twitter search page:

Social Listening: Advanced Listening to Twitter #13
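
The syntax it hands back is usually just the person's handle plus your keywords, something like this (the handle and keywords here are placeholders, not a real account):

from:designerjane (computer OR laptop OR wifi) help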

Track With Monitter

Monitter is a great free tool to listen for tweets that contain a certain keyword in large volume. Think of it as a live Twitter monitor.

Go to http://monitter.com and create an account

Start creating some columns with your search terms

Social Listening: Advanced Listening to Twitter #14

Here we've added four streams for four different interior design type searches:

Social Listening: Advanced Listening to Twitter #15

Then, you can use the advanced settings to track tweets from a certain geographic location:

Social Listening: Advanced Listening to Twitter #16

When you spot a tweet to respond to, you can do so right within Monitter

Social Listening: Advanced Listening to Twitter #17

More Twitter Tools

There are dozens of other tools to monitor Twitter:

http://ifttt.com
connect multiple online platforms together to automate things
http://monitter.com/
set up multiple columns and track Twitter searches live
http://tweetmeme.com/
view popular articles being shared
http://trendsmap.com/
see what's trending in particular locations, with a nice visual setup
http://tweetbeep.com/
get all mentions of your brand, you or anything else emailed to you (like what IFTTT can do)
http://www.ubervu.com/
(paid tool)

Browser Plugins

Browser plugins can greatly speed up your workflow and efficiency. I'm going to show you some plugins for Google Chrome, and a little bit about how to use them in more advanced ways.

This section of browser plugins revolves around the ones that help optimize your site's accessibility and indexation.

First, here's the list.

I'm going to show you how to use some of these in an advanced way.

Broken Links Checker

Not only is the Broken Links Checker a great plugin for quickly finding broken links on your site, but you can also use it in creative ways on other people's sites to get ideas for link building and prospecting.

For example, try running it on the sitemap of a competitor's website. Here's how:

  1. Find a competitor with an HTML sitemap. For this example I'm going to randomly use www.bizchair.com; their sitemap is http://www.bizchair.com/site-map.html

  2. Run the Link Checker

    Click the icon for the extension

    Browser Plugins #1

    Wait for it to find the broken links — in this case there are quite a few.

    Browser Plugins #2

    A great one to immediately notice is the "resources" page. It's often easier to recreate resource content or otherwise use it to get some links.

Chrome Sniffer

This plugin automatically shows you the CMS or script library a website uses. Extremely handy if you are looking to reach out to only WordPress site owners, for example.

As you browse the web, the icon to the far right of the URL will change to match which CMS or library is being used.

For example, you can see that my site is built on WordPress

Browser Plugins #3

Here is a site built with Drupal

Browser Plugins #4

Redirect Path Checker

This plugin will automatically alert you if you were taken to a page via any kind of redirect. It can be very useful when browsing your own site, in case you are internally linking to outdated URLs (or externally, for that matter).

For example, I just found that this link on my site to Gizmodo 302 redirects:

Browser Plugins #5

How did I know? Because the plugin alerted me to the 302.

Browser Plugins #6

And then you can click on the icon and it will show you the redirect (or series of redirects) that the browser took to get to a page.

Browser Plugins #7

The SEOmoz Toolbar & Plugin

You can do many things with the Moz plugin. A few of the more advanced things you might use it to look for are:

Quickly finding followed vs nofollowed links

Browser Plugins #8

Or finding the country and IP address for the website

Browser Plugins #9

Using a Proxy

What is a proxy and why would you want to use one?

A proxy acts like a middle man between you and other servers. In other words, it makes you anonymous on the web: you appear to be using the IP address of the proxy, not your own. This is perfect for rank checking if you use local software like Rank Tracker. Run too many automated Google searches to check rankings from your location, and you run the risk of raising a red flag with Google. Note that some people use proxies for less-than-ethical means, and I do not recommend doing so. But it is a fantastic way to check your rankings without sending unusual activity to Google from your IP address.

So how do you use a proxy? I have a simple but little-known method for you to find and check dozens of free proxy addresses all at once.

Go to http://www.rosinstrument.com/proxy/

You will see a list of free public proxy IP addresses. These change often, so be sure to refresh your browser if you have had the window open for a while.

Using a Proxy #1

Copy and paste proxies into Scrapebox to Test Them

This is the magic step! Since proxy addresses go bad quickly and often, it's a huge waste of time to try them all individually.

Hit "Manage", then "Test". After a few minutes, your proxies will have been tested. Keep following the steps, and you'll have a clean list of dozens of proxies to choose from.

Using a Proxy #2

Return Good Proxies Back To Main List

Select "Transfer Good Proxies to Main List" under "Export". You will then be left with a clean list of working proxies.

Using a Proxy #3

Copy Proxy Address Into Your Rank Checking Software

In Rank Tracker, you can enter the proxy address like this:

Using a Proxy #4

Since the addresses do go bad after a while, you may want to retest your list and/or cut and paste more from http://www.rosinstrument.com/proxy/

Bonus:
Want an alternative to a proxy?

The above method is free, which is the best part, but what if you want something more robust? You can get a "Virtual Private Server" (VPS); most web hosting companies offer this. It's like having your own private, dedicated IP address. It costs a small monthly fee, but as something more reliable than public proxies, it may be worth it for you!

I'd be surprised if you're not an extreme data collection expert now! But we're not done! On to some less-traveled paths to keyword research.

Well done! You made it through chapter five! Are you ready for chapter six:
Keyword Research?