Troubleshooting/FAQ
Using Chrome's Developer Tools
If you are using Chrome, its built-in developer tools are extremely helpful. They let you see what is going on behind the scenes of a webpage, including its source code, styles, and calls to external scripts.
To access them, right-click the webpage you wish to examine and click Inspect. This opens a panel with several tabs (Elements, Console, Network, and others) that show what is going on. Alternatively, you can click View Page Source to see the raw HTML of the page as the server sent it, before any JavaScript has run.
I’m unable to select an element.
Double-check that you’re using the correct tag name, attributes, etc. for the element. Try printing soup.prettify() to make sure the element exists and has the expected attributes. If the element doesn’t exist (but should), or looks different from what you expect, keep reading.
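Here is a minimal sketch of that check, using made-up sample markup in place of a downloaded page (the table, its class names, and the data are hypothetical). It shows the most common pitfall: a small typo in a tag name or attribute makes find() quietly return None instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a page you downloaded.
html = """
<html><body>
  <table class="results">
    <tr><td class="name">Alice</td><td class="score">91</td></tr>
    <tr><td class="name">Bob</td><td class="score">78</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Print the parsed tree, one tag per line, to check the element and its attributes.
print(soup.prettify())

# A small mistake ("result" instead of "results") makes find() return None.
wrong = soup.find("table", class_="result")
print(wrong)  # None

table = soup.find("table", class_="results")
if table is None:
    raise ValueError("results table not found; check the tag name and attributes")

# select() takes a CSS selector and returns a list of every matching element.
names = [td.get_text(strip=True) for td in table.select("td.name")]
print(names)  # ['Alice', 'Bob']
```

Checking for None right after find() turns a confusing AttributeError later in the script into a clear message at the point where the selection failed.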
The output of soup.prettify() is different from what I see in my inspection tool.
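Many sites use JavaScript to modify the page after it’s loaded, so what you see in your inspection tool may be different from what BeautifulSoup sees. BeautifulSoup only parses the HTML it is given (for example, the page downloaded with requests); it never runs the page’s JavaScript.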
To have your inspection tool show the same thing as BeautifulSoup, you can disable JavaScript in your browser:
- In Firefox:
  - Type about:config in the address bar, then press Enter (click through the warning page if one appears)
  - Type javascript.enabled into the search box, then press Enter
  - Double-click the row that appears so its value changes to false
- In Chrome:
  - Go to Settings
  - Click “Privacy and security”, then “Site settings”
  - In the Content section, click “JavaScript”, then choose “Don’t allow sites to use JavaScript”
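To see why the two views differ, here is a small sketch with a hypothetical page whose story list is empty in the HTML the server sends and is filled in by a script only when a browser runs it. The markup, the id "stories", and the class "story" are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the <div> is empty until the browser runs the <script>.
html = """
<html><body>
  <div id="stories"></div>
  <script>
    document.getElementById("stories").innerHTML =
      '<div class="story">Breaking news</div>';
  </script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The script's contents are kept as plain text, never executed or parsed as tags,
# so the story <div> that a browser would insert does not exist here.
stories = soup.select("div.story")
print(stories)  # []

# The container exists, but it is still empty.
container = soup.find(id="stories")
print(container.contents)  # []
```

In a browser's inspection tool the same page would show a story inside the container, which is exactly the mismatch described in this question.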
How do I scrape web pages that require JavaScript?
This is beyond the scope of this book, but we’ll summarize two methods here.
Method 1: Use the inspection tool’s network monitor
Websites often load content in response to some event. For example, on the homepage of a news website, scrolling to the bottom of the page might cause more news stories to appear. The requests that fetch this dynamically loaded content show up in the inspection tool’s network monitor. If you can find the request that corresponds to the content you want, you can perform the same request yourself using the requests library.
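To open the network monitor in Chrome, right-click the page, click Inspect, and then click the Network tab. Keep the tab open while you trigger the event (for example, scroll to the bottom of the page) so the request is recorded. The Fetch/XHR filter narrows the list to requests made by the page’s scripts.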
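As a minimal sketch, suppose the Network tab shows that scrolling triggers a GET request to an endpoint that returns JSON and takes a page query parameter. The function name fetch_stories, the page parameter, and the URL in the usage note are hypothetical; copy the real URL, query parameters, and any required headers from the request you actually found.

```python
import requests


def fetch_stories(api_url, page, timeout=10):
    """Replay the page's own request for one batch of stories and return the decoded JSON."""
    response = requests.get(
        api_url,
        params={"page": page},  # hypothetical query parameter copied from the recorded request
        headers={"User-Agent": "Mozilla/5.0"},  # some sites reject clients that don't look like a browser
        timeout=timeout,  # don't hang forever if the server stops responding
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of trying to parse an error page
    return response.json()
```

Usage would look like stories = fetch_stories("https://news.example.com/api/stories", page=2), where the URL is a placeholder. Because the response is usually already structured data, you can often skip HTML parsing entirely; the shape of the JSON depends on the site.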
Method 2: Use Selenium
Selenium is a browser automation tool. Point it at the web page you wish to scrape, and it will load the page in a real browser such as Firefox. It executes all the JavaScript on the page, and it even lets you perform actions such as clicking and scrolling. Once the page is fully loaded, you can select the data you need. The Selenium documentation covers installation and usage in more detail.
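The sketch below assumes Selenium 4 and a local Firefox installation (recent Selenium 4 releases can locate or download geckodriver for you). The function scrape_rendered_page and its parameters are hypothetical. Rather than sleeping for a fixed time, it waits until an element matching a CSS selector appears, then returns the rendered HTML so you can parse it with BeautifulSoup as usual.

```python
def scrape_rendered_page(url, css_selector, timeout=10, headless=True):
    """Load url in Firefox, wait for css_selector to appear, and return the rendered HTML."""
    # Imported inside the function so this sketch can be loaded without Selenium
    # installed; in a real script you would normally import these at the top.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    if headless:
        options.add_argument("-headless")  # run without opening a visible window

    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Block until JavaScript has inserted at least one matching element,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        # page_source reflects the DOM after scripts have run.
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

For example, html = scrape_rendered_page("https://news.example.com/", "div.story") followed by soup = BeautifulSoup(html, "html.parser"); the URL and selector are placeholders for your target page.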