Web Scraping Using Selenium Python
February 2021
Introduction
In this tutorial, we first provide an overview of some foundational concepts of the World Wide Web. We then lay out some common approaches to web scraping and compare their usage. With this background, we introduce several applications that use the Selenium Python package to scrape websites.
This tutorial is organized into the following parts:
- Basic concepts of the World Wide Web.
- Comparison of some common approaches to web scraping.
- Use cases for the Selenium WebDriver.
- Illustration of how to find web elements using Selenium WebDriver.
- Illustration of how to fill in web forms using Selenium WebDriver.
We plan to add more applications in the near future. The content of this tutorial is a work in progress, and we are happy to receive feedback! If you find anything confusing or think the guide misses important content, please email: help@iq.harvard.edu.
Custom Websites
We decided to build custom websites for many of the examples used in this tutorial instead of scraping live websites, so that we have full control over the web environment. This gives us stability: live websites are updated more often than books, and by the time you try a scraping example, it may no longer work. Also, a custom website allows us to craft examples that illustrate specific skills and avoid distractions. Finally, the maintainers of a live website may not appreciate us using their site to learn about web scraping and could try to block our scrapers. Using our own custom websites avoids these risks. However, the skills learnt in these examples can certainly still be applied to live websites.
Below we list the name and link of each custom website we have built for this tutorial: