Day 16: Data Analysis with Pandas
Let's dive into **Day 16**, where we focus on **Data Analysis with Pandas**! I'll guide you through the essentials in an interactive way. By the end of this lesson, you'll have a solid understanding of data analysis using the **Pandas** library in Python.
Topics Covered:
- DataFrames and Series
- Reading/Writing CSV and Excel Files
- Data Cleaning and Manipulation
1. Introduction to Pandas
Pandas is a widely used Python library for data manipulation and analysis. It provides two main classes:
- Series: A one-dimensional labeled array, like a list or a single column in a table.
- DataFrame: A two-dimensional table, like a spreadsheet or SQL table, where data is organized in rows and columns.
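To make the relationship between the two classes concrete, here is a minimal sketch (it assumes Pandas is already installed; the column names and values are made up for illustration). It shows that picking a single column out of a DataFrame gives you a Series:

```python
import pandas as pd

# A small table; the column names and values are illustrative only.
df = pd.DataFrame({"item": ["pen", "book"], "price": [1.5, 12.0]})

# Selecting one column of a DataFrame gives back a Series.
prices = df["price"]

print(type(df).__name__)      # DataFrame
print(type(prices).__name__)  # Series
print(df.shape)               # (2, 2): two rows, two columns
print(prices.shape)           # (2,): a single dimension
```

One way to read this: a DataFrame behaves like a collection of Series that share the same row index.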
Install Pandas
If you haven't installed Pandas yet, run this command in your terminal:
pip install pandas
2. Working with Series
A **Series** is essentially a list of values, each paired with an index label. Here's how to create a Series:
import pandas as pd
# Creating a Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
Output:
0    10
1    20
2    30
3    40
4    50
dtype: int64
Each item in the Series has an index (0, 1, 2...) and a value (10, 20, 30...).
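Here's a short sketch of how the index and the values can be used, building on the same numbers. The letter labels in the second Series are an assumption added for illustration; they are not part of the lesson's data.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# Look up a value by its index label.
print(series[2])            # 30

# The index and the values are available separately.
print(list(series.index))   # [0, 1, 2, 3, 4]
print(series.tolist())      # [10, 20, 30, 40, 50]

# Labels do not have to be integers; these letters are illustrative.
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(labeled["b"])         # 20
```

If you don't pass an `index`, Pandas numbers the items starting from 0, which is why the first example prints 0 through 4.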
Exercise:
Create a Series containing the names of five of your favorite movies.
3. Working with DataFrames
A **DataFrame** is like a table, with columns and rows. Let's create a simple DataFrame:
import pandas as pd
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [24, 27, 22, 32],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
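The topic list for today also mentions reading and writing CSV files and data cleaning. The sketch below is one possible way to try those ideas on the table above, not a complete treatment. It uses an in-memory buffer in place of a real file, and the `messy` table with a missing age is made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 27, 22, 32],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# Select rows with a boolean condition on a column.
over_25 = df[df["Age"] > 25]
print(over_25["Name"].tolist())   # ['Bob', 'David']

# Write the table as CSV text and read it back.
# The buffer stands in for a file; a path such as "people.csv"
# (a hypothetical file name) could be passed instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.shape)             # (4, 3)

# Cleaning: a made-up table with one missing age.
messy = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                      "Age": [24, None, 22]})
print(messy["Age"].isna().sum())  # 1

# Option 1: fill the gap with the mean of the known ages.
filled = messy.fillna({"Age": messy["Age"].mean()})
print(filled["Age"].tolist())     # [24.0, 23.0, 22.0]

# Option 2: drop any row that has a missing value.
dropped = messy.dropna()
print(len(dropped))               # 2
```

Excel files follow the same pattern with `df.to_excel(...)` and `pd.read_excel(...)`, though these typically need an extra package such as `openpyxl` to be installed. Whether to fill or drop missing values depends on your data, so treat both options here as starting points.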
Next Steps:
On **Day 17**, we'll move on to **Data Visualization** with Matplotlib and Seaborn!