Downloading the Source of the Website

We learned how to view the source code of a website using our browsers. How do we do this in Python?


Create a new file named webscraping.py. We'll add code to it one step at a time:

The requests module lets us download the source code of a website. The first thing we need to do is import it:

import requests

Now let's choose a website to scrape from:

url = "http://www.kosbie.net/cmu/spring-17/15-112/syllabus.html"

Great! Now let's ask the requests module to download the website:

website = requests.get(url)

Finally, let's save the source into a variable. The website object we got above includes lots of information about the website we requested, but we only need the HTML code:

source = website.text
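Downloads don't always succeed: the URL might be mistyped, the server might be down, or the page might not exist. Here is a minimal sketch of the same steps wrapped in a function that checks for those problems. The function name fetch_source and the 10-second timeout are our own choices for illustration, not something the requests module requires:

```python
import requests


def fetch_source(url, timeout=10):
    """Return the HTML source of url as a string, or None if the download fails."""
    try:
        # Give up if the server takes longer than `timeout` seconds to respond
        website = requests.get(url, timeout=timeout)
        # Raise an error for "not found", "server error" and similar responses
        website.raise_for_status()
    except requests.exceptions.RequestException as error:
        print("Could not download", url, "-", error)
        return None
    return website.text
```

With this in place, source = fetch_source(url) gives you either the HTML or None, so you can check the result before using it.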

Let's print the source to see what we got:

print(source)

If everything went well, you should see a large blob of HTML code! This is the same source your browser receives when it loads the page. If you're curious, you can copy it into a new file, save it as foo.html, and open it in your browser. You will see the page's content, but any styling (CSS) or functionality (JS) that the site loads from separate files through relative links will be missing.
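Instead of copying and pasting, you can also let Python write the file for you. The sketch below assumes the source is already in a string. It uses a tiny stand-in string (sample_source) so that it runs without a download, and the helper name save_source is just one we made up:

```python
from pathlib import Path


def save_source(source, filename="foo.html"):
    """Write the HTML string to filename and return the path it was saved to."""
    path = Path(filename)
    # utf-8 keeps any non-English characters in the page intact
    path.write_text(source, encoding="utf-8")
    return path


# A tiny stand-in for the real page source, so this runs without a download
sample_source = "<html><body><h1>Hello</h1></body></html>"
saved_path = save_source(sample_source)
print("Saved", len(sample_source), "characters to", saved_path)
```

In webscraping.py you would call save_source(source) with the source variable from above, then open foo.html in your browser.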
