reading-notes2

https://m7madmomani2.github.io/reading-notes2


Web Scraping

Inspecting the Website

The first thing we need to do is figure out where the links to the files we want to download sit within the page's nested HTML tags. Simply put, a web page contains a lot of code, and we want to find the pieces that contain our data.

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

Parse the HTML

# response is assumed to be the result of an earlier requests.get(...) call
soup = BeautifulSoup(response.text, "html.parser")

# Use the .find_all method (findAll is its older alias) to locate all of the <a> tags.

soup.find_all('a')
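To make the idea of pulling links out of nested tags concrete, here is a minimal sketch that does not need a network connection. It swaps BeautifulSoup for the standard library's `html.parser`, so it only approximates what `find_all('a')` followed by reading each tag's `href` would give. The sample markup, the `LinkCollector` class, the `extract_links` helper, and the `https://example.com/files/` base URL are all hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample markup; a real page would come from response.text.
SAMPLE_HTML = """
<html><body>
  <div><ul>
    <li><a href="data/file1.txt">File 1</a></li>
    <li><a href="/data/file2.txt">File 2</a></li>
    <li><a name="top">No href here</a></li>
  </ul></div>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for every <a href=...> found in html."""
    collector = LinkCollector()
    collector.feed(html)
    # Relative links are resolved against the page they came from.
    return [urljoin(base_url, href) for href in collector.hrefs]


links = extract_links(SAMPLE_HTML, "https://example.com/files/")
for link in links:
    print(link)
```

The anchor without an `href` is skipped, and relative paths are turned into absolute URLs, which is usually what a download step needs.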

Semantic annotation recognizing