Do you think every SEO should know a scripting language? It is mandatory for Technical SEOs to have at least a basic knowledge of scripting. Everyone else can just play around for fun and see what they can automate. Check out this Python SEO guide for your first automation steps (code included so you can try this on your own :)).
Writing scenarios for scripts, discovering use cases and having the results delivered to you can open up a world of opportunities to automate your repetitive daily tasks.
“Delegating” some of your daily SEO work to scripts can help you deliver work for your company and clients more quickly and get things in place faster, so you can start getting results!
Python is a language of choice for scripting, and its capabilities go beyond basic scripts. You can use it both for repetitive tasks (like comparing the canonical URLs in one big file with those in another, or checking your website content) and for competitive analysis (collecting and comparing key metrics for a given set of competing URLs).
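To give a feel for the first kind of task, here is a minimal sketch that compares canonical URLs between two crawl exports. It assumes each export is a CSV file with hypothetical "url" and "canonical" columns; the function names and sample data are made up for illustration.

```python
# A minimal sketch: compare canonical URLs between two crawl exports.
# Assumption: each export is CSV text with hypothetical "url" and "canonical" columns.
import csv
import io


def load_canonicals(csv_text):
    """Return a {page URL: canonical URL} mapping from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["url"].strip(): row["canonical"].strip() for row in reader}


def diff_canonicals(old, new):
    """List (url, old_canonical, new_canonical) for pages whose canonical changed.

    A page missing from one export is reported with None on that side.
    """
    changes = []
    for url in sorted(set(old) | set(new)):
        before, after = old.get(url), new.get(url)
        if before != after:
            changes.append((url, before, after))
    return changes


# Made-up sample exports
old_export = """url,canonical
https://example.com/a,https://example.com/a
https://example.com/b,https://example.com/b
"""
new_export = """url,canonical
https://example.com/a,https://example.com/a
https://example.com/b,https://example.com/b-new
https://example.com/c,https://example.com/c
"""

for url, before, after in diff_canonicals(load_canonicals(old_export), load_canonicals(new_export)):
    print(url, before, "->", after)
```

To use it on real files, read each export with open(path, newline="", encoding="utf-8").read() and pass the text to load_canonicals.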
If you are starting to get as excited about automation as I am, just relax: it all comes down to figuring out which repetitive parts of your existing work can be automated.
That is half the work done!
What do you need to get started with Python?
By now you might be convinced that Python can help you automate some of your daily work, or you might want to use Python for SEO or data analysis.
Every journey starts with a single step. Let’s install Python together and go through the steps you need to follow to run the script successfully.
- Go to the official Python website and download Python 3.5.
This is not rocket science: just install it as you would any regular Windows program. Python will typically be installed on your C: drive.
- Go to your hard disk, find your Python folder and create an empty scripts folder. You can just name it Scripts or whatever you want.
- Download the meta-data zip and unzip the files into the Scripts folder (or whichever folder you created).
- Note that this script will work without simulating a browser.
However, I recommend downloading Chromedriver https://sites.google.com/a/chromium.org/chromedriver/ and placing it in the same folder as your script.
- Command line: open the command prompt by going to your Windows Start menu and typing cmd.
- Alternatively, go to the Python folder where your script is placed and type cmd in the File Explorer address bar.
As long as you see the screen below, you are good to go:
Tip for newbies: I recommend opening cmd from the folder where your script is placed. It is simply easier. If you don’t, you will need to change folders later in the command line, which can get tricky. It is good practice to go to the folder containing your script, open cmd there and stay calm 🙂
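If opening cmd in the right folder is not always convenient, a script can also locate its files relative to its own location. Here is a minimal sketch; script_relative is a hypothetical helper name, and it relies on sys.argv[0], which holds the path of the script when you run something like py meta.py.

```python
# A minimal sketch: build file paths relative to the script's own folder,
# so the script finds input.xlsx even if cmd was opened somewhere else.
# script_relative is a hypothetical helper name.
import os
import sys
from pathlib import Path


def script_relative(filename, script_path):
    """Return the path of filename inside the folder containing script_path."""
    # os.path.abspath works even if the path does not exist yet
    return Path(os.path.abspath(script_path)).parent / filename


if __name__ == "__main__":
    # sys.argv[0] holds the path of the running script, e.g. "meta.py"
    excel_path = script_relative("input.xlsx", sys.argv[0])
    print("Looking for the Excel file at:", excel_path)
    print("Found" if excel_path.exists() else "Not found: check your folder")
```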
Script in action
This example assumes that you either attended my presentation at BrightonSEO 🙂 or have heard about automating some of your daily SEO checks with Python.
Either way, the code is shared so you can test the examples yourself, and my configuration notes apply to any Python script you write or use.
Don’t be afraid; take a deep breath. I am still a newbie and I keep learning. You don’t need to be a Python pro to try this at home. I have also shared some of the resources I have found and learnt from.
I will keep sharing, because with so many resources out there, it’s difficult to know where to start and to separate the wheat from the chaff.
Bear with me and let’s try the following example together. I promise it will be easy as long as you follow the steps:
Problem: Multiple content editors change the meta data of key product pages, and the SEO team doesn’t receive any notifications. Or you simply want a Python script to record your new meta data values in an Excel spreadsheet. Because you can schedule the script to run automatically, this differs from just running your favourite SEO crawling software.
Result: The SEO team is aware of any changes.
Download the files you need here.
Let’s look at the config file: why you need it in the first place and what you need to fill in.
Go to the folder where you saved the script, right-click the config file, select Edit with Notepad++ and open it (you can also use any other code editor, including Python’s built-in IDLE).
This is what you can expect to find:
If your Excel file is in the same folder as your script, just type the following and delete the first two lines of the file:
EXCEL_FILE = 'input.xlsx'  # input.xlsx is the name of your file
Your Excel file should look like this:
The purpose of the config file is to record where your Excel file is located and your email details (if you want to send and receive email notifications).
You just need to include your Gmail address (I recommend a non-personal Gmail account whose settings you are comfortable changing; you can simply create a dummy Gmail account for running scripts), the password for that account (so that the script can log in to it), the recipient’s email address, and the file name if it differs from input.xlsx.
This is what the config file looks like:
Remember that # starts a comment in your Python code. You can include as many comments as you want; Python ignores them, they are just there to help you 🙂
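For illustration, a config file along these lines would do the job. Only EXCEL_FILE comes from the original script; GMAIL_USER, GMAIL_PASSWORD and RECIPIENT are hypothetical names, so check which names your copy of the script actually expects.

```python
# config_script_check.py: a sketch of the config module the script imports.
# Only EXCEL_FILE appears in the original; the other names are hypothetical.

EXCEL_FILE = 'input.xlsx'  # input.xlsx is the name of your file

# Gmail account used to send the notifications (hypothetical names)
GMAIL_USER = 'your.dummy.account@gmail.com'
GMAIL_PASSWORD = 'your-password-here'  # keep this file private

# Who receives the notification email (hypothetical name)
RECIPIENT = 'seo.team@example.com'
```

Because the script imports this file with import config_script_check as config, a value such as the file name is then read as config.EXCEL_FILE.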
The next step is to make sure your Gmail account allows less secure apps (in this case, Python) to access your account and perform the desired action (sending email from this address). This is what you need to do:
- Go to Gmail
- Log in
- Go to https://myaccount.google.com/u/0/security?hl=en&pli=1#activity
- Select Device Activity and Notifications under Sign-in & Security on the left-hand side
- Scroll down to the “Allow less secure apps” setting
- Make sure that Allow less secure apps is turned ON.
This is it 🙂
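Once the account allows it, the sending step could look roughly like the minimal sketch below, using smtplib and the standard email package. It assumes Gmail’s SMTP server over SSL (smtp.gmail.com, port 465); build_alert and send_alert are hypothetical names. Note that, at the time of writing, Google has phased out the “Allow less secure apps” setting for most accounts; if you cannot find it, an App Password (which requires 2-Step Verification) is the usual replacement for the password in your config file.

```python
# A minimal sketch of the email step with the standard library only.
# Assumptions: Gmail's SMTP server over SSL (smtp.gmail.com, port 465);
# build_alert and send_alert are hypothetical names.
import smtplib
from email.message import EmailMessage


def build_alert(changes, sender, recipient):
    """Build a plain-text email from (url, field, expected, actual) tuples."""
    msg = EmailMessage()
    msg["Subject"] = "Meta data check: {} change(s) detected".format(len(changes))
    msg["From"] = sender
    msg["To"] = recipient
    lines = [
        "{}\n  {}: expected {!r}, found {!r}".format(url, field, expected, actual)
        for url, field, expected, actual in changes
    ]
    msg.set_content("\n".join(lines) or "No changes detected.")
    return msg


def send_alert(msg, user, password):
    """Log in to the Gmail account and send the message over SSL."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, password)
        server.send_message(msg)
```

Keeping message building separate from sending lets you check the email text without touching your Gmail account.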
Now, if you have installed Python 3.5, all you need to do to start the script is type py nameofscript.py. In this specific case, you will type py meta.py.
The script starts and goes through each step. First, it reads your input data from the Excel file.
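Reading the input could look something like the sketch below. The column layout is an assumption (row 1 holds headers; columns A, B and C hold the URL, the expected title and the expected meta description), and parse_rows and read_expected are hypothetical names. The openpyxl import sits inside read_expected so the parsing logic can be tried without an Excel file.

```python
# A minimal sketch of the input step. Assumed layout (hypothetical): row 1
# holds headers; columns A, B and C hold the URL, the expected title and the
# expected meta description. Adjust to match your own spreadsheet.


def parse_rows(rows):
    """Turn (url, title, description) rows into {url: expected values}."""
    expected = {}
    for row in rows:
        # Pad short rows so missing cells count as empty values
        url, title, description = (tuple(row) + (None, None, None))[:3]
        if not url:  # skip blank rows
            continue
        expected[str(url).strip()] = {
            "title": str(title or "").strip(),
            "description": str(description or "").strip(),
        }
    return expected


def read_expected(path):
    """Read the expected meta data from the active sheet of an .xlsx file."""
    # Third-party library: install it with py -m pip install openpyxl
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True)
    try:
        # min_row=2 skips the header row; values_only=True yields plain values
        rows = workbook.active.iter_rows(min_row=2, max_col=3, values_only=True)
        return parse_rows(rows)
    finally:
        workbook.close()
```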
Next, it opens the URLs and starts parsing the HTML.
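The original script opens pages with Selenium and Chromedriver. As a lighter alternative that needs no browser, here is a minimal sketch using only the standard library: urllib.request downloads the HTML and html.parser extracts the title and meta description. Keep in mind that, unlike Selenium, it does not execute JavaScript, so it only sees values present in the raw HTML. MetaParser, extract_meta and fetch_html are hypothetical names, and the User-Agent string is only an example.

```python
# A minimal sketch using the standard library instead of Selenium.
# It does not run JavaScript: it only sees values in the raw HTML.
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class MetaParser(HTMLParser):
    """Collect the page title and meta description while parsing HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def extract_meta(html):
    """Return (title, description) from an HTML string."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


def fetch_html(url, timeout=10):
    """Download a page and decode it using the charset the server reports."""
    request = Request(url, headers={"User-Agent": "meta-check-script"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For a live page, extract_meta(fetch_html("https://example.com/")) would return its title and meta description.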
To identify discrepancies between the values you expect to see (recorded in the Excel file) and the actual values on your pages, the script parses the HTML and records the live values. This is also useful if you want to keep historical meta data. Even if you don’t want to receive notifications, you can run the script to check the values and let it record them for you along with the date.
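The comparison and record-keeping step can be sketched as follows, assuming both the expected and the live values are stored as {url: {"title": ..., "description": ...}} dictionaries. That structure, and the names find_changes and history_rows, are hypothetical and not necessarily what the original script uses.

```python
# A minimal sketch of the comparison step. Both inputs map each URL to
# {"title": ..., "description": ...}; this layout is a hypothetical choice.
from datetime import datetime


def find_changes(expected, actual):
    """Return (url, field, expected_value, actual_value) for every mismatch."""
    changes = []
    for url, wanted in expected.items():
        found = actual.get(url, {})
        for field in ("title", "description"):
            if wanted.get(field, "") != found.get(field, ""):
                changes.append((url, field, wanted.get(field, ""), found.get(field, "")))
    return changes


def history_rows(actual, checked_at=None):
    """Build (timestamp, url, title, description) rows for a history log."""
    checked_at = checked_at or datetime.now()
    stamp = checked_at.strftime("%Y-%m-%d %H:%M")
    return [
        (stamp, url, actual[url].get("title", ""), actual[url].get("description", ""))
        for url in sorted(actual)
    ]
```

With openpyxl, each history row could then be added to a worksheet with ws.append(row) and the workbook saved with wb.save(path).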
These are the results 🙂
How do you import Python libraries?
Python is extensible and works with libraries. So how do you know which libraries to import?
Libraries are imported at the beginning of your Python code. Look at the lines below (quoted from the code):
- from datetime import datetime
- import smtplib
- from selenium import webdriver
- from openpyxl import load_workbook
- import config_script_check as config
Are you ready to extend your Python? You should be: this is the power of Python, and its libraries let you do amazing stuff!
If you are using Python 3.5, open the command line as shown below and type the following (the command can vary slightly depending on your Windows version and Python version):
py -m pip install nameofyourlibrary (for example, I typed py -m pip install selenium)
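Easy, isn’t it? Don’t overthink it; just install the libraries my script uses. Here are the libraries you will need:
- datetime (part of Python’s standard library, so there is nothing to install)
- smtplib (also part of the standard library)
- selenium
- openpyxl
- Config (this is the config_script_check.py file that comes with the script, not a package to install)
Repeat the command above for each library that needs installing, replacing selenium with the library name (here, openpyxl).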
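To double-check your setup before running the script, this small sketch reports which of the script’s libraries Python cannot find, using importlib.util.find_spec from the standard library. missing_libraries is a hypothetical helper name; config_script_check is only found if the config file sits in the same folder as the script you run.

```python
# A minimal sketch: report which of the script's libraries cannot be found.
# missing_libraries is a hypothetical helper name.
import importlib.util

REQUIRED = ["datetime", "smtplib", "selenium", "openpyxl", "config_script_check"]


def missing_libraries(names):
    """Return the top-level module names Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries(REQUIRED)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All libraries found, you are good to go!")
```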
Here is a quick overview of what these libraries are used for:
- Datetime – lets the script work with dates and times
- Smtplib – used for sending emails
- Selenium – uses Chromedriver to open your browser and interact with the pages in it
- Openpyxl – used to read and write Excel files
- Config – you guessed it, this is used to read the config file 🙂
Resources
- Automate the Boring Stuff with Python. I have explored many resources, but this free book is worth checking out. I have also taken Al’s Udemy course. While it will not make you a Python expert, it is quite interesting, and Al does his best to present complex concepts in an easy-to-understand manner.
- Get familiar with Beautiful Soup. It is not a delicious soup; it is a Python library for pulling data out of HTML and XML files. Read a bit about it here.
- urllib2 – a Python 2 module for fetching URLs. In Python 3, its functionality lives in urllib.request (and urllib.error).
Together, Beautiful Soup and urllib let you find all the links on a website, or only the links on your (or your competitor’s) website that match a specific pattern: find every “a” element that has an href attribute, then check the HTTP headers of those pages.
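Here is a minimal sketch of the link-finding idea. To keep it dependency-free, it uses the standard library’s html.parser instead of Beautiful Soup (where soup.find_all("a", href=True) does the same job); find_links is a hypothetical name and the /product/ pattern is only an example.

```python
# A minimal sketch: collect every <a> element's href and optionally keep
# only links matching a pattern. Uses html.parser instead of Beautiful Soup.
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> element that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_links(html, base_url, pattern=None):
    """Return absolute link URLs, optionally only those matching a regex."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    # Resolve relative links such as /product/1 against the page URL
    links = [urljoin(base_url, href) for href in parser.hrefs]
    if pattern:
        links = [link for link in links if re.search(pattern, link)]
    return links
```

For example, find_links(page_html, "https://example.com/", r"/product/") would keep only product links from a page you have downloaded.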
- It never hurts to read a line or two about decomposition. Remember that it comes down to breaking a complex task into smaller, manageable pieces that a program can execute.
- Haven’t tried Paul Shapiro’s Python indexation checker yet? Now that you know how to set up the Python environment, you can check it out here.
- Python basics don’t have to be hard. Work through the free online book Learn Python the Hard Way.
Do you have ideas? Have you done something cool with Python? Do you have any issues with following the above? Shoot me an email or leave a comment, I’d love to hear from you!