Advanced Data Research
The Advanced
Guide to SEO
Chapter 05
After the first 5 sections, you should have a rock solid website. But there's way more to SEO than speed, indexation and metadata.
We're going to begin our off-site SEO techniques with some ImportXML!
Intro To Import XML
What is ImportXML? ImportXML is a way to retrieve information from structured files such as HTML, XML, CSV and more, using xpath queries.
This can be incredibly useful for scraping and sourcing information off of websites, as it imports it right into a Google Docs Spreadsheet, and you can also run some advanced searches to scrape information that would otherwise be hard to collect.
I'm going to walk you through a few example uses of ImportXML.
Basic Syntax
ImportXML works just like any other Excel or Google Docs formula. It uses a pretty straightforward syntax:
=importXML(URL, Query)
- URL
- the url you will be scraping
- Query
- the xpath query to run on the url
Basic Example
Scraping Quicksprout for H1 Tags
- Create a new Google Doc Spreadsheet
- Set up your URL
- Create a basic xpath function to grab the H1 of the page. (Obviously we could do this via Screaming Frog or otherwise crawling the site, but we're just using this as a simple example.)
- Add the importxml function to cell B2
Note that we're referencing cell A2, where the URL is.
The query gets wrapped in quotes.
Then the xpath defines what portion of the file should be returned. //h1 tells it to return the contents of every h1 on the page (this is what the "//" part does: it asks for every occurrence of the h1 "path" no matter how many levels deep or how it's nested).
Here's what it returns:
Cool! We've got the H1 of the post pulled right into Google Docs. So let's get into some useful examples of importxml for Google Docs.
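If you'd like to see what the //h1 query does outside of a spreadsheet, here is a minimal Python sketch. It is only an illustration, not how Google implements ImportXML. The sample markup and the function name extract_h1s are made up for this example. Python's built-in ElementTree needs well-formed markup and understands only a subset of xpath, so the query is written as .//h1.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not the real Quicksprout markup).
SAMPLE_PAGE = """
<html>
  <body>
    <div>
      <h1>Example Post Title</h1>
    </div>
    <p>Some body copy.</p>
  </body>
</html>
"""


def extract_h1s(markup):
    """Return the text of every h1, no matter how deeply it is nested."""
    root = ET.fromstring(markup)
    # ".//h1" is ElementTree's spelling of the "//h1" query used above.
    return [h1.text for h1 in root.findall(".//h1")]


print(extract_h1s(SAMPLE_PAGE))
```

Real-world pages are rarely well-formed XML, so a forgiving HTML parser would likely be needed for anything beyond a demo like this.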
ImportXML — Quora/Twitter
We're going to use Quora again, this time to find users who may be influential or authoritative and collect their Twitter URLs.
Here's the final product, so you can see what we'll be building:
Find a Group or Topic
Let's use the blogging topic in this example - http://www.quora.com/Blogging/followers
Enter The Quora URL In Column A
We're going to be referencing this cell in the function that's going in Column B.
Create the importxml Function To Scrape Usernames
The function is:
=importxml(A2, "//h2/a/@href")
Let's break that down, so you understand and can create your own.
- =importxml()
- this is the empty function
- A2
- this is the "URL" field which references the Quora URL
- //h2
- this references every h2 from that URL
- /a
- this references a tags nested within the h2's
- /@href
- finally, this references only the links contained in the anchor tags
As you can see, this returns a list of the top 20 users from the Blogging Topic.
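To make the breakdown above concrete, here is a rough Python sketch of the same //h2/a/@href idea. The sample markup is invented (it is not Quora's real HTML) and extract_h2_links is a hypothetical helper name. ElementTree cannot return attributes directly from a query, so the sketch finds the a tags first and then reads their href.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup standing in for a followers page.
SAMPLE_FOLLOWERS = """
<html><body>
  <h2><a href="/Example-User-One">Example User One</a></h2>
  <h2><a href="/Example-User-Two">Example User Two</a></h2>
  <p><a href="/not-in-a-heading">Ignored link</a></p>
</body></html>
"""


def extract_h2_links(markup):
    """Return the href of every a tag that sits directly inside an h2."""
    root = ET.fromstring(markup)
    # ".//h2/a" matches a tags nested in any h2; .get("href") plays the
    # role of the "/@href" step in the spreadsheet query.
    return [a.get("href") for a in root.findall(".//h2/a")]


print(extract_h2_links(SAMPLE_FOLLOWERS))
```

Notice that the link inside the p tag is skipped, because the query only looks at anchors nested in an h2.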
Create Full URLs
As you may have noticed, Quora links with relative URLs, so we need to convert them to absolute URLs.
A simple concatenation function will do the trick:
In case you're not sure, the concatenate function is this:
=CONCATENATE("http://quora.com",B2)
Let's break that down as well
- =CONCATENATE()
- The empty concatenate function (combines multiple strings of text into one)
- "http://quora.com"
- The beginning of the Quora URL (anything not referencing another cell needs to be in quotes)
- B2
- References the cell with the incomplete user URL
Once you do that, you need to grab and drag the formula down the rest of the column:
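If you ever do this step outside of a spreadsheet, the same relative-to-absolute conversion can be sketched in Python with the standard library's urljoin. The example paths and the to_absolute helper are hypothetical and only there for illustration.

```python
from urllib.parse import urljoin

# Same base as the CONCATENATE formula above.
BASE_URL = "http://quora.com"


def to_absolute(relative_urls, base=BASE_URL):
    """Turn relative profile paths into full URLs."""
    return [urljoin(base, rel) for rel in relative_urls]


# Hypothetical relative URLs, like the ones returned in column B.
print(to_absolute(["/Example-User-One", "/Example-User-Two"]))
```

Unlike plain concatenation, urljoin also copes with a path that is missing its leading slash.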
Scrape For Twitter URLs
Now for the last step, let's get those Twitter URLs!
Here's the function. It's a long one, so we'll break it out piece by piece:
=ImportXML(C2,"//div[contains(@class,'profile_action_links_section')]//a[contains(@href,'twitter.com')]/@href")
- =importxml()
- The empty importxml function
- C2
- The cell of the complete Quora profile URL we're referencing
- //div
- Referencing any div tag in the HTML
- [contains()]
- Contains will allow us to narrow down the div tag
- (@class,'profile_action_links_section')
- Here we're selecting the class attribute
A screenshot of Quora's code shows it in the HTML
- //a[contains(@href,'twitter.com')]
- Select anchor tags whose link points to Twitter
- /@href
- Return the actual link (the href value) from each of those anchor tags
Don't forget to grab and drag the formula down through the rest of the column:
And now you can instantly get lists of 20 Twitter users at a time! This being a technical guide (a "How-To") it's of course your decision how you can use such a list, but I'm sure you can think of many applications :-)
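Here is a small Python sketch of the logic behind that long query. The profile markup is invented for the example and extract_twitter_links is a hypothetical helper. Python's built-in ElementTree has no contains() support, so the two contains() tests are written as ordinary substring checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile markup; the class name is the one used in the query above.
SAMPLE_PROFILE = """
<html><body>
  <div class="header"><a href="http://twitter.com/not_this_one">Site account</a></div>
  <div class="profile_action_links_section extra">
    <a href="http://www.facebook.com/example">Facebook</a>
    <span><a href="http://twitter.com/example_user">Twitter</a></span>
  </div>
</body></html>
"""


def extract_twitter_links(markup):
    """Return Twitter hrefs found inside the profile links section."""
    root = ET.fromstring(markup)
    links = []
    for div in root.iter("div"):
        # Equivalent of [contains(@class,'profile_action_links_section')]
        if "profile_action_links_section" not in div.get("class", ""):
            continue
        # div.iter("a") walks every nested a tag, like the "//a" step.
        for a in div.iter("a"):
            href = a.get("href", "")
            # Equivalent of [contains(@href,'twitter.com')]
            if "twitter.com" in href:
                links.append(href)
    return links


print(extract_twitter_links(SAMPLE_PROFILE))
```

The Twitter link in the header div is ignored because its div does not carry the profile links class. That is exactly why the query narrows down the div first.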
Scraping Ubersuggest for Keyword Ideas
- Create a new Google Doc spreadsheet
Click Create > Spreadsheet
- In cell A1, type in something that you want to query Ubersuggest for
In this example, we typed in "how to ..." to start the query.
- In cell A2, type in the following formula and press enter
=ImportXML("http://ubersuggest.org/?query="&A1& "&format=html&language=English%2FUSA&source=web&submit=Suggest", "//li/span")
The spreadsheet will fill up with Ubersuggest's answers:
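The formula above simply glues the contents of A1 into the middle of a URL. If you want to build the same URL in a script, a sketch like the one below may help. The parameter names are copied from the formula, while build_ubersuggest_url is a hypothetical helper name. The difference is that urlencode escapes spaces and special characters for you.

```python
from urllib.parse import urlencode


def build_ubersuggest_url(query):
    """Build the same URL as the spreadsheet formula, with the query escaped."""
    params = {
        "query": query,
        "format": "html",
        "language": "English/USA",  # encoded as English%2FUSA, as in the formula
        "source": "web",
        "submit": "Suggest",
    }
    return "http://ubersuggest.org/?" + urlencode(params)


print(build_ubersuggest_url("how to"))
```

Whether the site still answers at this address with these parameters is an assumption carried over from the formula, so treat the result as a URL to try rather than a guarantee.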
Finding any HTML in a List of Web Pages
I'm going to show you a really fast way to prospect 100 (or more) sites at one time. You can do this with expensive link prospecting plans, but if you're on a budget or want to minimize your tools, this method is fantastic and just as easy. And fun!
This works when you're looking for something in a list of documents that is not part of the visible content but part of the HTML code. In this example we're going to look for the presence of a 'rel=author' tag, because it tells us two things. First, the website owner is likely to be "on top of things" from a marketing standpoint if they have taken the time to set this up. Second, they (or someone helping them) must have some amount of technical skill, so they may be an easier prospect to work with.
We're going to follow a process with a few steps:
Google Searches
The type of result you're trying to get is a list of possible sites you could get a link from. Let's say you're a food blogger, and you want to find other blogs to guest post on. You might do a search like:
- food inurl:blog intitle:submit post
- food inurl:blog intitle:contribute post
Or you might get more specific with keywords:
- gourmet food inurl:blog intitle:submit post
- eclectic desserts inurl:blog intitle:submit post
When you nail down a good search, you should see a number of potential sites in the results, but not too many results overall. For example:
The above is an excellent example of a query to start with.
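If you want to try lots of keyword and operator combinations, you can generate the search strings instead of typing each one. Here is a minimal Python sketch, assuming you keep your topics and your search "footprints" in two lists. The build_queries name is made up for this example.

```python
from itertools import product


def build_queries(topics, footprints):
    """Combine every topic with every search footprint."""
    return [f"{topic} {footprint}" for topic, footprint in product(topics, footprints)]


# Topics and footprints taken from the example searches above.
topics = ["food", "gourmet food", "eclectic desserts"]
footprints = [
    "inurl:blog intitle:submit post",
    "inurl:blog intitle:contribute post",
]

for query in build_queries(topics, footprints):
    print(query)
```

This only builds the search strings. You would still paste each one into Google by hand and check that the results look like good prospects.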
Scrape the URLs from the Google Results
We need to get all of those Google results into a text document to prep for running through Screaming Frog.
- To prep for scraping, set Google to return 100 results per page.
- Go to search settings
- Set to 100 results per page
- Go back to Google and run the search again.
- Then use the SERP redux bookmarklet: click the link.
- Copy the list of URLs onto your clipboard
- Paste them into your text editor
- Save as a .txt file.
- Filter the URLs through Screaming Frog
- Set Screaming Frog to list mode.
- Select your text file and open
- Go To Custom Settings
- Enter HTML to filter
The way we're doing this part is key.
We want four lists created:
- Contains rel=author
- Doesn't contain rel=author
- Contains rel=me
- Doesn't contain rel=me
- View Your Custom Results
Click the 'custom' tab, and then you can select filters 1-4
In this case, this particular list only found one rel=author blog. But that's ok! That's actually good. Imagine having to sift manually through all of those results to find the one with authorship? Now you have one much more targeted prospect - and you can easily get many more by running through this process.
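To show what the four filters are doing, here is a small Python sketch that sorts pages into the same four lists. The page sources are invented and classify_pages is a hypothetical helper. It is only a model of the idea, not how Screaming Frog works internally. It uses a plain substring match, just like the filter text above, so a quoted attribute such as rel="author" would need its own extra pattern.

```python
def classify_pages(pages, needles=("rel=author", "rel=me")):
    """Split URLs into 'contains' / 'does not contain' lists for each needle.

    pages maps a URL to the raw HTML source of that page.
    """
    results = {}
    for needle in needles:
        results[f"contains {needle}"] = [
            url for url, html in pages.items() if needle in html
        ]
        results[f"does not contain {needle}"] = [
            url for url, html in pages.items() if needle not in html
        ]
    return results


# Hypothetical page sources keyed by URL.
sample_pages = {
    "http://example.com/blog-a": '<a href="https://plus.google.com/1?rel=author">Me</a>',
    "http://example.com/blog-b": "<p>No authorship markup here</p>",
    "http://example.com/blog-c": '<a href="http://example.org/profile?rel=me">Profile</a>',
}

for label, urls in classify_pages(sample_pages).items():
    print(label, urls)
```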
Use Citation Finder To Find Link Opportunities
This section covers using a tool that is paid. There is a free version, but it does not have all the features. You can probably try some of the things in this section with the free version though. I am not affiliated with the tool in any way.
Before we start, go to https://www.whitespark.ca/local-citation-finder/ and register for your free or paid account.
Search
(without a project)
- Go to the first tab and enter your info in the fields
- You have to wait a few minutes:
- You should receive an email alert though when your report is ready:
- Next, you're going to see a report like this. Click 'compare citations for these businesses'
- Then you should export as a csv
- You can open it up and save it as an Excel file. We're going to customize it a little so you can easily see WHO has the most citations.
We're going to use a little Excel formula, like this (assuming you're in column B):
=COUNTIF(A2:A111,"*Y*")
And of course you can autofilter to see just the Y's or N's
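If you'd rather count the Y's with a script than with COUNTIF, here is a rough Python sketch. The column layout (one column per business, one row per citation source) and the sample rows are assumptions made up for this example, so check them against your actual export. The count_citations name is hypothetical too.

```python
import csv
import io

# Hypothetical export: first column is the citation source,
# every other column is a business marked Y or N.
SAMPLE_EXPORT = """Citation Source,Business A,Business B
yelp.com,Y,N
yellowpages.com,Y,Y
manta.com,N,Y
superpages.com,Y,N
"""


def count_citations(csv_text):
    """Count the cells containing Y in each business column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    source_column = reader.fieldnames[0]
    counts = {}
    for row in reader:
        for column, cell in row.items():
            if column == source_column:
                continue
            counts.setdefault(column, 0)
            # Like COUNTIF with "*Y*": any cell containing a Y, ignoring case.
            if "y" in (cell or "").lower():
                counts[column] += 1
    return counts


print(count_citations(SAMPLE_EXPORT))
```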
Part one is great for general prospecting, but what if your business isn't included in the report? Then you can use the "search by phone number" feature.
- You can use your phone number OR your business name (the title is a little deceptive, although a phone number works best).
Enter your info, and we're also going to add this to a project:
- Make sure you have a project created
- The report you see will tell you all citation sources not tied to any keyword. It's just a raw list.
- Click on the little plus to see all pages with the citation (usually meaning a phone number)
- And you have a few more options when it comes to exporting the data:
And you have a few more options when it comes to exporting the data:
- Re-run and append
- runs the same report again, except adds in any NEW results that weren't there before.
- Export CSV
- Exports the data, but without individual URLs, just the name of the website
- Export CSV w/URLs
- includes the URLs (what you see when clicking the plus signs) in the full report.
Harvesting email addresses
For this section of advanced scraping we're going to use the Citation Labs Contact Finder — http://citationlabs.com/tools/ — you should register to create an account before we begin.
This tool is amazing if you have a list of prospective URLs — you can then quickly gather most of the email addresses needed for outreach.
Gather URLs
I'm going to assume you'll either have a list of prospects already, or (with the help of this guide!) you'll know how to get a list quickly.
For this example, I'm going to take a list of scraped Google URLs — let's say I was a food blogger and wanted to submit recipes. I might use a search like:
recipe inurl:submit
Once you have your search, navigate to the contact finder
Then fill out the form
You can experiment with regular expressions (regex) to fine tune your results. This expression:
^(Contact|About|Email|Submit)
will look for results that begin with the words contact, about, email or submit.
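If you want to test an expression like this before pasting it into the tool, you can try it in Python first. This is just a sketch with made-up anchor texts, and matching_anchors is a hypothetical helper. Python treats the pattern as case sensitive, and the tool may or may not do the same.

```python
import re

# The expression from above: text must START with one of these words.
ANCHOR_PATTERN = re.compile(r"^(Contact|About|Email|Submit)")


def matching_anchors(anchor_texts):
    """Keep only the anchor texts that begin with one of the target words."""
    return [text for text in anchor_texts if ANCHOR_PATTERN.search(text)]


# Hypothetical anchor texts.
samples = ["Contact Us", "About the blog", "Our contact page", "Submit a recipe", "contact"]
print(matching_anchors(samples))
```

"Our contact page" is dropped because the ^ anchors the match to the start of the text. The lowercase "contact" is dropped because of the capital C in the pattern.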
I also do not limit the anchor text to a certain number of words.
Click on the contact tab to get your results (you might have to wait a few minutes for processing).
As you can see, there are a few types of results;
- Emails
- email addresses found
- Forms
- form submissions found
- Contact Pages
- pages with contact information where an email address could not be found
- Empty
- no results of any kind
You then have the option of downloading any report, or all, into a CSV
As you can see in the above report, out of 100 URLs, it captured
- 38 emails
- 47 form URLs
- 7 contact pages
- only 8 empty
Social Listening: Advanced Listening to Twitter
Before we set up your searches in some different tools, the first step is to develop lists of advanced searches to follow.
Let's say your main topic is interior decorating. You'd want to create a list of as many variations of it as possible. Much of this won't look different from keyword research:
- interiordecorating
- interior decorating
- #interiordecorating
- interiordesign
- interior design
- #interiordesign
These are your core words. Then you can have a list of words to gauge intent, like:
- need
- help
- trouble
- looking for
- tips
- question
And if you're looking to target anything location based:
- Los Angeles
- CA
- California
- LA
Don't forget some of your brand terms (mine would be):
- Quicksprout
- neil patel
- kissmetrics
- kiss metrics
- crazy egg
- crazyegg
- I'm kind of a big deal
These keywords and search combinations will give you any mention of these keywords by anyone. More on specific user monitoring below.
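With several word lists, the number of combinations grows quickly, so it can help to generate the search strings. Here is a minimal Python sketch. It assumes Twitter search accepts quoted phrases and OR between alternatives, and the build_twitter_searches name is made up for this example. Test each string in the advanced search page before you rely on it.

```python
def quote_phrase(term):
    """Wrap multi-word terms in quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_twitter_searches(core_terms, intent_words):
    """Pair each core term with a group of intent words joined by OR."""
    intent_group = "(" + " OR ".join(quote_phrase(w) for w in intent_words) + ")"
    return [f"{quote_phrase(core)} {intent_group}" for core in core_terms]


# Core and intent words from the lists above.
core_terms = ["interior design", "#interiordesign"]
intent_words = ["need", "help", "looking for"]

for search in build_twitter_searches(core_terms, intent_words):
    print(search)
```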
You can create and test your own here https://twitter.com/#!/search-advanced
A good search may have a few good results in the last 24-48 hours.
Create an IFTTT Recipe
Next, when you find the searches you want to monitor, you can create an IFTTT recipe to watch for them. The beauty of IFTTT is that you can receive your alert across a few dozen different platforms. We're going to set it to send you an email or a text message when an alert is triggered.
NOTE: These work great for less frequent results.
Create an account (it's free) and make a new recipe
Use Twitter as the "trigger"
Fill out the search field — if it's just a simple search you can use plain text. But you may need to use advanced operators.
Select either email or Gmail as the "Action" channel.
Fill out the fields and customize as needed
If you want to receive a text message instead, choose that as the action and set your fields
And wait for the emails to come in!
Bonus: Set Up Email Filters
Take your listening to the next level with some Gmail filtering. Create a filter to get all your alerts sent to a folder:
Advanced Twitter Search Syntax
Fortunately, if you use Twitter's advanced search creator, it will come up with the search for you:
Just go to: https://twitter.com/#!/search-advanced and run the search - the results will include the search syntax with operators etc in the results:
Twitter for Influencer Listening
Let's continue with the interior design niche. Let's say you want to connect with more interior designers who are also bloggers. You'll want to know when they need help with something.
First — find people who you can listen to with a tool like followerwonk http://followerwonk.com
Then, create an advanced search for when she mentions something you can help with. Maybe you're a computer guru. You could do a search like this:
She may only tweet about that once a year. But if you're trying to connect with really high authority people, it will be worth creating an IFTTT recipe to know when she needs help via text message;
Again, twitter gives you the syntax for the search when you run it via the Twitter search page
Track With Monitter
Monitter is a great free tool to listen for tweets that contain a certain keyword in large volume. Think of it as a live Twitter monitor.
Go to http://monitter.com and create an account
Start creating some columns with your search terms
Here we've added four streams for four different interior design type searches:
Then, you can set advanced settings to track tweets only from a certain geographic location
When you spot a tweet to respond to, you can do so right within Monitter
More Twitter Tools
There are dozens of other tools to monitor Twitter
- http://ifttt.com
- connect multiple online platforms together to automate things
- http://monitter.com/
- set up multiple columns and track twitter searches live
- http://tweetmeme.com/
- view popular articles being shared
- http://trendsmap.com/
- to see what's trending in particular locations - nice visual setup
- http://tweetbeep.com/
- get all mentions of your brand, you or anything else emailed to you (like what IFTTT can do)
- http://www.ubervu.com/
- (paid tool)
Browser Plugins
Browser plugins can greatly speed up your work flow and efficiency. I'm going to show you some plugins for Google Chrome, and a little bit about how to use them in more advanced ways.
This section of browser plugins revolves around the ones that help optimize your site's accessibility and indexation.
First, here's the list.
- Broken Link Checker https://chrome.google.com/webstore/detail/ojkcdipcgfaekbeaelaapakgnjflfglf
- Web Developer http://chrispederick.com/work/web-developer/
- Redirect Path Checker https://chrome.google.com/webstore/detail/aomidfkchockcldhbkggjokdkkebmdll
- SEOmoz Toolbar https://chrome.google.com/webstore/detail/eakacpaijcpapndcfffdgphdiccmpknp
- Chrome Sniffer https://chrome.google.com/webstore/detail/homgcnaoacgigpkkljjjekpignblkeae
- Google Analytics Debugger https://chrome.google.com/webstore/detail/jnkmfdileelhofjcijamephohjechhna
- Microformats for Chrome https://chrome.google.com/webstore/detail/oalbifknmclbnmjlljdemhjjlkmppjjl
- Rulers Guides and Eyedropper Color Picker https://chrome.google.com/webstore/detail/bjpngjgkahhflejneemihpbnfdoafoeh
- Word Count https://chrome.google.com/webstore/detail/kmndjoipobjfjbhocpoeejjimchnbjje
- Source Kit https://chrome.google.com/webstore/detail/iieeldjdihkpoapgipfkeoddjckopgjg?hl=en-US
I'm going to show you how to use some of these in an advanced way.
Broken Links Checker
Not only is the broken links checker a great plugin to find broken links quickly on your site, but you can use it in creative ways on other people's sites to get ideas for linkbuilding and prospecting.
For example, try running it on the sitemap of a competitor's website. Here's how:
- Find a competitor with an HTML sitemap. For this example I'm going to randomly use www.bizchair.com, and their sitemap is http://www.bizchair.com/site-map.html
- Run the Link Checker
Click the icon for the extension
Wait for it to find the broken links. In this case there are quite a few.
A great one to immediately notice is the "resources" page. It's often easier to recreate resource content or otherwise use it to get some links.
Chrome Sniffer
This plugin automatically shows you the CMS or script library a website uses. Extremely handy if you are looking to reach out to only WordPress site owners, for example.
As you browse the web, the icon to the far right of the URL will change to match which CMS or library is being used.
For example, you can see that my site is built on WordPress
Here is a site built with Drupal
Redirect Path Checker
This plugin will automatically alert you if you were taken to a page via any kind of redirect. It can be very useful when browsing your site, in case you are internally linking to outdated URLs (or externally, for that matter).
For example, I just found that this link to Gizmodo on my site 302 redirects:
How did I know? Because the plugin alerted me to the 302.
And then you can click on the icon and it will show you the redirect (or series of redirects) that the browser took to get to a page.
The SEOmoz Toolbar & Plugin
You can do many things with the Moz plugin. A few of the more advanced things you might use it to look for are:
Quickly finding followed vs nofollowed links
Or finding the country and IP address for the website
Using a Proxy
What is a proxy and why would you want to use one?
A proxy acts like a middle man between you and other servers. In other words, it makes you anonymous on the web: you appear to be using the IP address of the proxy, and not yours. This is perfect for rank checking if you use local software like Rank Tracker. Run too many automated Google searches to check rankings from your location, and you run the risk of sending a red flag to Google. Note that some people use proxies for less than ethical means, and I do not recommend doing so. But it is a fantastic way to check your rankings without sending unusual activity to Google from your IP address.
So how do you use a proxy? I have a simple but little known method for you to find and check dozens of free proxy addresses all at once.
Go to http://www.rosinstrument.com/proxy/
You will see a list of free public proxy IP addresses. These change often, so be sure to refresh your browser if you have had the window open for a while.
Copy and paste proxies into Scrapebox to Test Them
This is the magic step! Since proxy addresses go bad so quickly and so often, it's a huge waste of time to try each of them individually.
Hit "Manage", then "Test". After a few minutes, your proxies will have been tested. Keep following the steps, and you'll have a clean list of dozens of proxies to choose from.
Return Good Proxies Back To Main List
Select "Transfer Good Proxies to Main List" under "Export". You will then be left with a clean list of working proxies.
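If you also want to use the clean list in your own scripts, a sketch like the following may be a starting point. It does not replace the Scrapebox test: it only weeds out malformed ip:port lines and shows how one address could be plugged into Python's standard urllib. The addresses are reserved documentation examples rather than real proxies, and parse_proxies and opener_for are hypothetical helper names.

```python
import ipaddress
import urllib.request


def parse_proxies(raw_text):
    """Return well-formed 'ip:port' entries from pasted text, skipping the rest."""
    proxies = []
    for line in raw_text.splitlines():
        entry = line.strip()
        host, sep, port = entry.rpartition(":")
        # Skip lines without a colon or with an impossible port number.
        if not sep or not port.isdecimal() or not 0 < int(port) < 65536:
            continue
        try:
            ipaddress.ip_address(host)  # raises ValueError for a bad IP
        except ValueError:
            continue
        proxies.append(entry)
    return proxies


def opener_for(proxy):
    """Build a urllib opener that sends http requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    return urllib.request.build_opener(handler)


# Placeholder addresses from reserved documentation ranges, not real proxies.
pasted = "192.0.2.10:8080\nnot a proxy\n203.0.113.5:99999\n\n198.51.100.7:3128"
print(parse_proxies(pasted))
```

Calling opener_for(...).open(url, timeout=10) would then send the request through that proxy. Expect plenty of timeouts with free public proxies.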
Copy Proxy Address Into Your Rank Checking Software
In Rank Tracker, you can enter the proxy address like this:
Since the addresses do go bad after a while, you may want to retest your list and/or cut and paste more from http://www.rosinstrument.com/proxy/
Bonus:
Want an alternative to a proxy?
The above method is free, which is the best part. But what if you want something more robust? You can get a "Virtual Private Server" (VPS). Most web hosting companies offer this. It's like having your own private dedicated IP address. There may be a small monthly fee, but as something more robust than public proxies, it may be worth it for you!
I'd be surprised if you're not an extreme data collection expert now! But we're not done! On to some less-traveled paths to keyword research.