Using Excel Spreadsheets in Python
Excel spreadsheets are a popular tool for data analysis, but what if you want to manipulate and analyze that data using Python? Luckily, there’s a library called openpyxl that allows you to work with Excel spreadsheets in Python.
In this article, we’re going to explore how to extract data from a website using web scraping and store it in an Excel file.
What is Web Scraping?
Web scraping refers to a technique used for extracting data from websites. It is a way to automate the process of gathering information that you might otherwise have to collect manually by visiting each website individually.
The process of web scraping involves using software programs called web scrapers or crawlers to collect data from websites. These programs can be customized to scrape specific types of data from specific websites, depending on your needs.
Web scraping is a useful technique for a variety of purposes. For example, it can be used to collect information on competitors, monitor prices on e-commerce sites, or gather data for academic research. It’s also commonly used in industries such as finance and marketing to gather data for analysis.
However, it’s important to note that web scraping is a legally gray area. Make sure you’re not violating any laws or terms of service agreements when scraping data from websites: some sites prohibit scraping, and in some cases scraped content may be protected by copyright. Web scraping also goes by other names, such as web harvesting, web data extraction, or web data mining.
Openpyxl is a powerful library that allows you to read and write Excel files, as well as manipulate the data within them. With openpyxl, you can easily add, delete, and modify data in Excel spreadsheets using Python.
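To make this concrete, here is a minimal sketch of the basic openpyxl workflow; the file name and cell values below are illustrative placeholders, not part of any real dataset:

# A minimal sketch of common openpyxl operations;
# 'report.xlsx' and the cell values are placeholders
from openpyxl import Workbook

wb = Workbook()                    # create a new workbook in memory
ws = wb.active                     # grab the default worksheet
ws.title = 'Sales'                 # rename the sheet

ws.append(['Product', 'Units'])    # add a header row
ws.append(['Widget', 42])          # add a data row

ws['B2'] = 45                      # modify an existing cell
ws.delete_rows(2)                  # delete a row (rows are 1-indexed)

wb.save('report.xlsx')             # write the workbook to disk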
BeautifulSoup is a Python library that makes web scraping easy and efficient. It allows you to navigate HTML and XML documents, search for specific elements, and extract data from them.
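As a quick illustration, here is a small example run on an inline HTML snippet rather than a live website:

# Parse a small HTML fragment and extract the text of matching elements
from bs4 import BeautifulSoup

html = '<ul><li class="item">Apples</li><li class="item">Oranges</li></ul>'
soup = BeautifulSoup(html, 'html.parser')

for li in soup.find_all('li', class_='item'):
    print(li.text)   # prints: Apples, then Oranges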
Web Scraping General Process Flow
- Send an HTTP request to the webpage URL.
- The server responds with the HTML contents of the webpage.
- Collect the relevant data from the response content.
- Organize the data into a structured format.
- Eliminate unnecessary, redundant, or missing data.
- Store the data in a format useful to the user, as sketched below.
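Here is a compact sketch of that flow end to end; the URL and the CSS selector are hypothetical placeholders, not a real target site:

# End-to-end sketch of the scraping flow; URL and selector are hypothetical
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Steps 1-2: send the request and receive the HTML response
response = requests.get('https://example.com/listings')
response.raise_for_status()

# Step 3: parse the response and collect the relevant data
soup = BeautifulSoup(response.text, 'html.parser')
titles = [tag.get_text(strip=True) for tag in soup.select('.listing-title')]

# Steps 4-5: organize into a structured format, drop missing/duplicate rows
df = pd.DataFrame({'title': titles}).dropna().drop_duplicates()

# Step 6: store the data in a useful format (to_excel requires openpyxl)
df.to_excel('listings.xlsx', index=False)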
What is BeautifulSoup?
It is a Python library used for web scraping purposes. It provides a way to parse and extract data from HTML and XML files. The library was created by Leonard Richardson and is maintained by a community of contributors. It’s free and open-source, which means that anyone can use it and contribute to its development.
One of the biggest advantages of using BeautifulSoup is its ease of use. The library is designed to be intuitive and easy to learn, even for beginners. BeautifulSoup provides a variety of functions and methods for extracting data from HTML and XML files, without requiring extensive programming knowledge.
Another advantage of BeautifulSoup is its flexibility. It can be used with a variety of different parsers, including lxml, html5lib, and Python’s built-in html.parser. This means that you can choose the parser that best suits your needs and the requirements of the website you’re scraping.
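For instance, the same document can be parsed with any installed parser just by changing one argument (lxml and html5lib are separate installs; html.parser ships with Python):

# Same document, three parsers; run "pip install lxml html5lib" first
from bs4 import BeautifulSoup

doc = '<p>Hello <b>world</b>'   # note the unclosed tags

print(BeautifulSoup(doc, 'html.parser').b.text)   # built-in, no extra install
print(BeautifulSoup(doc, 'lxml').b.text)          # fast and lenient
print(BeautifulSoup(doc, 'html5lib').b.text)      # parses like a browser

For simple pages the built-in html.parser is usually enough; the alternatives matter mostly for speed or for badly malformed HTML.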
BeautifulSoup is widely used in the web scraping community because of its versatility and ease of use. It can be used for a variety of applications, including data mining, web content extraction, and information retrieval.
Scraping IMDb Top 250 Movies Using BeautifulSoup
Let’s see how we can use BeautifulSoup to fetch the top-rated movies from IMDb’s Top 250 chart. For each movie, we will collect its rank, name, year of release, and IMDb rating, and store them in an Excel file.
We are using Google Colaboratory to execute our code.
Step 1 – Install required libraries
Now, let’s install the required packages in our working environment.
pip install beautifulsoup4
pip install requests
pip install openpyxl
Step 2 – Import the libraries
# Importing required libraries
from bs4 import BeautifulSoup
import requests, openpyxl
Step 3 – Prepare the Excel sheet
# Creating the Excel file
excel = openpyxl.Workbook()
sheet = excel.active

# Changing the sheet name
sheet.title = 'IMDb Top 250 Movies'

# Adding the header row
sheet.append(['Rank', 'Movie Name', 'Year of Release', 'IMDb Rating'])
Step 4 – Create the Web Scraper
# Scraping required data
try:
    # IMDb may reject requests without a browser-like User-Agent header,
    # so we send one explicitly
    headers = {'User-Agent': 'Mozilla/5.0'}
    html = requests.get('https://www.imdb.com/chart/top/', headers=headers)
    html.raise_for_status()

    soup = BeautifulSoup(html.text, 'html.parser')

    # These selectors match IMDb's chart markup at the time of writing
    # and may need updating if the page layout changes
    movies = soup.find('tbody', class_='lister-list').find_all('tr')

    for movie in movies:
        name = movie.find('td', class_='titleColumn').a.text
        rank = movie.find('td', class_='titleColumn').get_text(strip=True).split('.')[0]
        year = movie.find('td', class_='titleColumn').span.text.strip('()')
        rating = movie.find('td', class_='ratingColumn imdbRating').strong.text

        print(rank, name, year, rating)
        sheet.append([rank, name, year, rating])

except Exception as e:
    print(e)
Voila! We have successfully scraped the movie details from the IMDb chart using BeautifulSoup.
Step 5 – Save the Excel file
# Saving the Excel file
excel.save('IMDb Top 250 Movies.xlsx')
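As a quick sanity check, you can reload the saved workbook and print the first few rows:

# Optional check: reload the file we just saved and inspect the first rows
from openpyxl import load_workbook

wb = load_workbook('IMDb Top 250 Movies.xlsx')
sheet = wb.active
for row in sheet.iter_rows(min_row=1, max_row=4, values_only=True):
    print(row)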
Endnotes
By combining openpyxl and BeautifulSoup, you can create a data analysis pipeline that extracts data from websites and stores it in an Excel spreadsheet. This is especially useful for automating data collection and analysis tasks. With these two libraries, you can build repeatable data pipelines for your projects using nothing but Python.