Python Web Scraping Using BeautifulSoup and lxml
Introduction
Web scraping is the process of extracting data from websites. Python is one of the best tools for web scraping, thanks to libraries like:
- requests – to make HTTP requests
- BeautifulSoup – to parse and navigate HTML
- lxml – for faster XML/HTML parsing (used as a parser with BeautifulSoup)
Install Required Modules
pip install requests beautifulsoup4 lxml
Step-by-Step Web Scraping Example
Step 1: Import Modules
import requests
from bs4 import BeautifulSoup
Step 2: Make an HTTP Request
url = 'https://example.com'
response = requests.get(url)
print(response.status_code) # 200 means OK
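In practice, requests can hang or come back with an error page. A defensive sketch (the helper name `fetch_html` and the 10-second timeout are illustrative choices, not part of the example above):

```python
import requests

def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page, bounding the wait and raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)  # don't wait forever
    response.raise_for_status()                    # 4xx/5xx -> exception
    return response.text
```

`raise_for_status()` turns a failed request into an exception instead of letting you parse an error page by mistake.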
Step 3: Parse HTML with BeautifulSoup and lxml
soup = BeautifulSoup(response.content, 'lxml') # using lxml parser
print(soup.title.text)
Common BeautifulSoup Functions
| Task | Code |
|---|---|
| Get all `<a>` tags | `soup.find_all('a')` |
| Get all `<p>` tags | `soup.find_all('p')` |
| Get tag by ID | `soup.find(id="main")` |
| Get tag by class | `soup.find_all(class_="product-title")` |
| Get text only | `tag.text` |
| Get attribute value | `tag['href']` |
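The calls in the table can be tried out on a small inline document, with no network needed (the HTML snippet below is made up for illustration):

```python
from bs4 import BeautifulSoup

html_doc = """
<div id="main">
  <p class="product-title"><a href="/item/1">Widget</a></p>
  <p class="product-title"><a href="/item/2">Gadget</a></p>
</div>
"""
soup = BeautifulSoup(html_doc, "lxml")

links = soup.find_all("a")                      # all <a> tags
main = soup.find(id="main")                     # tag by ID
titles = soup.find_all(class_="product-title")  # tags by class
first_href = links[0]["href"]                   # attribute value
first_text = links[0].text                      # text only

print(len(links), first_href, first_text)  # 2 /item/1 Widget
```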
Example: Scraping Quotes from a Page
import requests
from bs4 import BeautifulSoup
url = "https://quotes.toscrape.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")
quotes = soup.find_all("span", class_="text")
for i, quote in enumerate(quotes, 1):
    print(f"{i}. {quote.text}")
Output:
1. "The world as we have created it is a process of our thinking."
2. "It is our choices that show what we truly are..."
...
Parsing Tables from HTML
table = soup.find('table')
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    data = [col.text.strip() for col in cols]
    print(data)
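Header cells usually use `<th>` rather than `<td>`, so the loop above would print an empty list for the header row. A sketch that captures both, using an inline table invented for illustration:

```python
from bs4 import BeautifulSoup

table_html = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>4.50</td></tr>
</table>
"""
soup = BeautifulSoup(table_html, "lxml")

table = soup.find("table")
headers = [th.text.strip() for th in table.find_all("th")]
rows = [
    [td.text.strip() for td in tr.find_all("td")]
    for tr in table.find_all("tr")
    if tr.find_all("td")  # skip the header-only row
]
print(headers)  # ['Name', 'Price']
print(rows)     # [['Widget', '9.99'], ['Gadget', '4.50']]
```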
Using lxml Directly (Advanced)
If you want faster parsing and XPath support:
from lxml import html
url = "https://example.com"
response = requests.get(url)
tree = html.fromstring(response.content)
# Extract all links
links = tree.xpath('//a/@href')
print(links)
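XPath also works on an inline document, which makes the syntax easier to experiment with (the snippet below is invented for illustration):

```python
from lxml import html

doc = html.fromstring("""
<ul>
  <li><a href="/a">First</a></li>
  <li><a href="/b">Second</a></li>
</ul>
""")

hrefs = doc.xpath("//a/@href")      # attribute values
texts = doc.xpath("//a/text()")     # text nodes
second = doc.xpath("//li[2]/a")[0]  # positional predicate

print(hrefs)        # ['/a', '/b']
print(texts)        # ['First', 'Second']
print(second.text)  # Second
```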
Handling Headers & User-Agent (To Avoid Blocks)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
}
response = requests.get("https://example.com", headers=headers)
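When you make several requests to the same site, a `requests.Session` keeps the headers (and cookies) across calls, so you set the User-Agent once. A minimal sketch:

```python
import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
})

# Every request made through this session now sends the header:
# response = session.get("https://example.com")
print(session.headers['User-Agent'])
```

Pairing a session with a short `time.sleep()` between requests also helps you stay polite toward the server.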
Exporting Scraped Data to CSV
import csv
with open("quotes.csv", "w", newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["Quote"])
    for quote in quotes:
        writer.writerow([quote.text])
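A quick way to verify the export is to read the file back with `csv.reader`. A self-contained sketch (the rows and the temporary file name are made up for illustration):

```python
import csv
import os
import tempfile

rows = [["Quote"], ["Life is what happens."], ["Carpe diem."]]

path = os.path.join(tempfile.gettempdir(), "quotes_demo.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    back = list(csv.reader(f))

print(back == rows)  # True
```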
Ethical Note on Web Scraping
- Always read a website's robots.txt before scraping.
- Avoid overloading servers.
- Do not scrape private or sensitive data.