Downloading the Source of the Website

We learned how to view the source code of a website using our browsers. How do we do this in Python?

Create a new file named webscraping.py. We will build up its code step by step.

The requests module lets us download the source code of a website. The first thing we need to do is import it:

import requests

Now let's choose a website to scrape from:

url = "http://www.kosbie.net/cmu/spring-17/15-112/syllabus.html"

Great! Now let's ask the requests module to download the website:

website = requests.get(url)

Finally, let's save the source into a variable. The website object we got above includes lots of information about the response to our request (such as the status code and headers), but we only need the HTML code:

source = website.text
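The request can fail, for example if the URL is mistyped or the server is down. The sketch below wraps the download steps in a function that returns None in that case instead of crashing. The function name fetch_source and the 10-second timeout are assumptions chosen for illustration; they are not part of the original script.

```python
import requests


def fetch_source(url, timeout=10):
    """Return the HTML source of `url`, or None if the download fails.

    `fetch_source` and the default timeout are illustrative choices.
    """
    try:
        website = requests.get(url, timeout=timeout)
        # Turn HTTP error codes such as 404 or 500 into exceptions.
        website.raise_for_status()
    except requests.exceptions.RequestException as error:
        # Covers malformed URLs, connection problems, timeouts, and HTTP errors.
        print("Could not download", url, "-", error)
        return None
    return website.text
```

With this in place, source = fetch_source(url) would behave like the two lines above when the download succeeds, and give None otherwise.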

Let's print the source to see what we got:

print(source)

If everything went well, you should see a large blob of HTML code! This is the same HTML that your browser receives when you visit the real website. If you're curious, you can copy it into a new file, save it as foo.html, and open it in your browser. You will see the page's content, though the styling (CSS) and functionality (JS) may be missing if the page loads them from separate files.
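Instead of copying and pasting, the source can also be written to a file from Python. This is a minimal sketch; the helper name save_source is hypothetical, and writing the file as UTF-8 is an assumption that suits most pages.

```python
def save_source(source, filename="foo.html"):
    """Write the downloaded HTML text to a file and return the file name."""
    # "w" creates the file, or overwrites it if it already exists.
    with open(filename, "w", encoding="utf-8") as f:
        f.write(source)
    return filename
```

Calling save_source(source) after the download creates foo.html next to webscraping.py, ready to open in a browser.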
