In this Pandas tutorial, we will explore the Pandas **groupby()** method with the help of examples. Pandas **groupby()** is very similar to GROUP BY in SQL, which groups the result set by matching values so that aggregate functions such as sum, min, max, and avg can be applied to each group.

If you are familiar with GROUP BY in SQL, you can easily understand group by in a Pandas DataFrame.

Pandas provides a DataFrame **groupby()** method that groups rows based on the passed columns. Before diving deeper into this article, let’s look at the Pandas **groupby()** method itself.

## Pandas DataFrame groupby() Method

**groupby()** is a DataFrame method used to group a Pandas DataFrame by a mapper or a series of columns. It splits the object into groups, applies an aggregate function to each group, and finally combines the results.

**Syntax of Pandas groupby() Method**:

This is the syntax of the Pandas **groupby()** method.

`DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)`

**Parameters of groupby() Method:**

The Pandas **groupby()** method accepts several keyword arguments that can be used as required.

- **by**: Required unless **level** is given. A label, a list of labels, a mapping, or a function that specifies how to group the DataFrame.
- **axis**: {0 or 'index', 1 or 'columns'}. Default is 0. For a Series, this parameter is unused and defaults to 0. This parameter is deprecated since Pandas version 2.1.0.
- **level**: int or level name. If the axis is a MultiIndex, group by a particular level or levels. Do not specify both **by** and **level**.
- **as_index**: Boolean, default is True. Return the object with the group labels as the index.
- **sort**: Boolean, default is True. Sort the group keys. You can get better performance by turning this off.
- **group_keys**: Boolean, default is True. Set to False if the result of **apply()** should not add the group keys to the index.
- **observed**: Boolean. Only applies when a grouper is categorical; if True, only the observed category values are shown. Since Pandas version 2.1.0, the default value False is deprecated and is set to change to True in a future version.
- **dropna**: Boolean, default is True. If True and the group keys contain NA values, the NA values together with their rows or columns are dropped.
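To make a few of these parameters concrete, here is a minimal sketch. The tiny DataFrame is hypothetical (only the column names follow the article's dataset), and the missing department is added purely to illustrate **dropna**.

```python
import pandas as pd

# Hypothetical mini dataset; the None department exists only to show dropna.
df = pd.DataFrame({
    "emp_department": ["IT", "BPO", None, "IT"],
    "emp_salary": [40000, 25000, 30000, 60000],
})

# Defaults: group labels become the index, keys are sorted, NA keys are dropped.
default = df.groupby("emp_department").sum()

# as_index=False keeps the group labels as a regular column.
flat = df.groupby("emp_department", as_index=False).sum()

# sort=False keeps the groups in order of first appearance.
unsorted = df.groupby("emp_department", sort=False).sum()

# dropna=False keeps rows with a missing key as a group of their own.
with_na = df.groupby("emp_department", dropna=False).sum()

print(default, flat, unsorted, with_na, sep="\n\n")
```

Comparing the four printed results side by side shows what each flag changes: the index, the group order, and whether the row with the missing key survives.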

Now let’s see how to use groupby in a Pandas DataFrame with the help of some examples.

## How to use GroupBy in Pandas DataFrame

For this article, I have prepared a sample CSV dataset, employees.csv, filled with dummy data. We will use this dataset throughout the article.

I have loaded the sample CSV dataset into a Pandas DataFrame with the help of the Pandas **read_csv()** method.
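If you do not have the CSV file at hand, the following sketch builds a stand-in DataFrame. It is an approximation reconstructed from the rows printed in the output later in this article, with the email column left out, so treat it as an assumption about the file rather than the file itself.

```python
import pandas as pd

# Stand-in for employees.csv, rebuilt from the rows printed later in this
# article. The emp_email column is omitted; this is an approximation only.
df = pd.DataFrame({
    "emp_full_name": [
        "Mayank Kumar", "Vishvajit Rao", "Harshita Mathur", "Kavya Singh",
        "Vishal Kumar", "Vaishali Mehta", "Vaishali Mehta", "James Bond",
        "Mariya Katherine", "Mariya Katherine", "Harshali Kumari",
        "Vinay Singh", "Vinay Mehra", "Akshara Singh",
    ],
    "emp_gender": [
        "Male", "Male", "Female", "Female", "Male", "Female", "Female",
        "Male", "Female", "Female", "Female", "Male", "Male", "Female",
    ],
    "emp_salary": [
        25000, 40000, 20000, 20000, 60000, 35000, 35000,
        42000, 32000, 40000, 21000, 18000, 45000, 55000,
    ],
    "emp_department": [
        "BPO", "IT", "Sales", "SEO", "IT", "SEO", "SEO",
        "IT", "Sales", "Sales", "BPO", "BPO", "IT", "IT",
    ],
    "date_of_joining": [
        "11/1/2023", "11/2/2023", "11/3/2023", "11/4/2023", "11/5/2023",
        "11/6/2023", "11/6/2023", "11/7/2023", "11/8/2023", "11/8/2023",
        "11/9/2023", "11/10/2023", "11/11/2023", "11/12/2023",
    ],
})

print(df.head())
print(df.shape)
```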

### Group By Department in Pandas DataFrame

The DataFrame has an **emp_department** column that holds the department of each employee. Now we want to group the Pandas DataFrame by department and apply some aggregate functions to each group.

To apply an aggregate function, we can pass it to the **agg()** method, or we can call the corresponding aggregate method directly on the **groupby()** object.

The most common aggregate functions are **sum()**, **mean()**, **count()**, **min()**, **max()**, **size()**, **median()** and **var()**.
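The two ways of applying an aggregate function can be sketched as follows. The small DataFrame is a hypothetical stand-in that reuses the article's column names.

```python
import pandas as pd

# Hypothetical stand-in data using the article's column names.
df = pd.DataFrame({
    "emp_department": ["BPO", "IT", "Sales", "IT", "BPO", "Sales"],
    "emp_salary": [25000, 40000, 20000, 60000, 21000, 32000],
})

grouped = df.groupby("emp_department")["emp_salary"]

# Calling the aggregate method directly on the GroupBy object...
direct = grouped.mean()

# ...gives the same result as passing its name to agg().
via_agg = grouped.agg("mean")

# agg() also accepts a list of functions, one output column per function.
summary = grouped.agg(["sum", "mean", "min", "max", "count"])
print(summary)
```

Passing a list to **agg()** is handy when several statistics are needed at once, because the data is grouped only a single time.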

**sum() aggregate function**:

The **sum()** function adds up the values in each group. For example, I want to get the total salary of the employees in each department.

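```python
import pandas as pd

df = pd.read_csv('../../Datasets/employees.csv')

# Select the salary column before summing so that text columns are not aggregated.
x = df.groupby('emp_department')[['emp_salary']].sum()
print(x)
```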

**count() aggregate function**:

The **count()** aggregate function returns the number of non-null values in each column of each group. For example, I want to get the total number of employees in each department.

```python
import pandas as pd

df = pd.read_csv('../../Datasets/employees.csv')

# Count the non-null employee names in each department.
x = df.groupby('emp_department')[['emp_full_name']].count()
x = x.rename(columns={'emp_full_name': 'Total Employees'})
print(x)
```
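One detail worth knowing: **count()** skips missing values, while **size()** counts every row in the group. The sketch below uses a hypothetical DataFrame with one missing name, added only to make the difference visible.

```python
import pandas as pd

# Hypothetical data; the missing name is there only to show the difference.
df = pd.DataFrame({
    "emp_full_name": ["Mayank Kumar", None, "Vishvajit Rao", "Vishal Kumar"],
    "emp_department": ["BPO", "BPO", "IT", "IT"],
})

groups = df.groupby("emp_department")

# count() ignores missing values in the selected column.
non_null_names = groups["emp_full_name"].count()

# size() returns the number of rows in each group, missing values included.
rows_per_group = groups.size()

print(non_null_names)
print(rows_per_group)
```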

## How does Pandas Groupby work?

The Pandas **groupby()** method performs three operations behind the scenes:

- **Splitting**: Splitting the original object into groups based on the defined criteria.
- **Applying**: Applying an aggregate function to each group.
- **Combining**: Combining the results.

This process is also called a **split-apply-combine** chain.
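To illustrate the idea, the three steps can be written out by hand and compared with the built-in call. This is only a sketch of the concept on hypothetical data, not how Pandas implements it internally.

```python
import pandas as pd

# Hypothetical stand-in data using the article's column names.
df = pd.DataFrame({
    "emp_department": ["BPO", "IT", "Sales", "IT", "BPO", "Sales"],
    "emp_salary": [25000, 40000, 20000, 60000, 21000, 32000],
})

# Split: one sub-DataFrame per department, selected with a boolean mask.
departments = sorted(df["emp_department"].unique())
pieces = {dept: df[df["emp_department"] == dept] for dept in departments}

# Apply: compute the total salary of each piece.
totals = {dept: part["emp_salary"].sum() for dept, part in pieces.items()}

# Combine: collect the per-group results into a single Series.
manual = pd.Series(totals, name="emp_salary")

# The same result in one line with groupby().
builtin = df.groupby("emp_department")["emp_salary"].sum()

print(manual)
print(builtin)
```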

Let’s understand each of these operations step by step with the help of examples.

### Splitting the original objects into groups:

When we call the **Pandas groupby() method** on a Pandas DataFrame, it splits the object into groups based on the given criteria. The **groupby()** method maps each row label to the name of its group.

For example, we are grouping the data based on the **emp_department**.

`groups = df.groupby('emp_department')`

We can also group the data by multiple column names.

`groups = df.groupby(['emp_gender', 'emp_department'])`

The **groupby()** method returns a **GroupBy** object (a DataFrameGroupBy when called on a DataFrame) that contains the groups.
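As a quick sketch, the returned object can be inspected before any aggregation is applied. The DataFrame below is a hypothetical stand-in; **ngroups** and **groups** are attributes of the Pandas GroupBy object.

```python
import pandas as pd

# Hypothetical stand-in data using the article's column names.
df = pd.DataFrame({
    "emp_gender": ["Male", "Male", "Female", "Male", "Female", "Female"],
    "emp_department": ["BPO", "IT", "Sales", "IT", "BPO", "Sales"],
    "emp_salary": [25000, 40000, 20000, 60000, 21000, 32000],
})

groups = df.groupby(["emp_gender", "emp_department"])

# The class name of the returned object.
print(type(groups).__name__)

# Number of distinct (gender, department) combinations.
print(groups.ngroups)

# groups.groups maps each group name to the row labels in that group;
# with two grouping columns, every group name is a tuple.
group_names = list(groups.groups)
print(group_names)
```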

We can get all the groups by iterating over the **groupby()** object.

Let’s use a Python for loop to iterate over each item of the GroupBy object. Here, I am grouping the Pandas DataFrame by the **emp_department** column.

```python
import pandas as pd

df = pd.read_csv('../../Datasets/employees.csv')

groups = df.groupby(['emp_department'])

# Each iteration yields the group name and the rows that belong to it.
for group_name, data in groups:
    print("Group Name is:- ", group_name)
    print("-------------------------")
    print("Group Data:- \n\n", data)
```

**The Output will be:**

```
Group Name is:- ('BPO',)
-------------------------
Group Data:-
emp_full_name emp_email emp_gender emp_salary \
0 Mayank Kumar [email protected] Male 25000
10 Harshali Kumari [email protected] Female 21000
11 Vinay Singh [email protected] Male 18000
emp_department date_of_joining
0 BPO 11/1/2023
10 BPO 11/9/2023
11 BPO 11/10/2023
Group Name is:- ('IT',)
-------------------------
Group Data:-
emp_full_name emp_email emp_gender emp_salary \
1 Vishvajit Rao [email protected] Male 40000
4 Vishal Kumar [email protected] Male 60000
7 James Bond [email protected] Male 42000
12 Vinay Mehra [email protected] Male 45000
13 Akshara Singh [email protected] Female 55000
emp_department date_of_joining
1 IT 11/2/2023
4 IT 11/5/2023
7 IT 11/7/2023
12 IT 11/11/2023
13 IT 11/12/2023
Group Name is:- ('SEO',)
-------------------------
Group Data:-
emp_full_name emp_email emp_gender emp_salary \
3 Kavya Singh [email protected] Female 20000
5 Vaishali Mehta [email protected] Female 35000
6 Vaishali Mehta [email protected] Female 35000
emp_department date_of_joining
3 SEO 11/4/2023
5 SEO 11/6/2023
6 SEO 11/6/2023
Group Name is:- ('Sales',)
-------------------------
Group Data:-
emp_full_name emp_email emp_gender emp_salary \
2 Harshita Mathur [email protected] Female 20000
8 Mariya Katherine [email protected] Female 32000
9 Mariya Katherine [email protected] Female 40000
emp_department date_of_joining
2 Sales 11/3/2023
8 Sales 11/8/2023
9 Sales 11/8/2023
```

#### Access Specific Group:

A Pandas GroupBy object has a method called **get_group()** that takes a group name and returns the data of that group as a Pandas DataFrame.

Each department name is a group name. For example, to select the **IT** group, we pass **IT** to the **get_group()** method.

Let’s see.

```python
import pandas as pd

df = pd.read_csv('../../Datasets/employees.csv')

# Group by a single column name so that each group name is a plain string.
groups = df.groupby('emp_department')
it_group = groups.get_group('IT')
print(it_group)
```

This is how you can get the data of a single group as a Pandas DataFrame by using the **get_group()** method. Now, let’s see how we can use aggregate functions.
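When the data is grouped by more than one column, the group name passed to **get_group()** is a tuple with one value per grouping column. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical stand-in data using the article's column names.
df = pd.DataFrame({
    "emp_gender": ["Male", "Male", "Female", "Male", "Female", "Female"],
    "emp_department": ["BPO", "IT", "Sales", "IT", "BPO", "Sales"],
    "emp_salary": [25000, 40000, 20000, 60000, 21000, 32000],
})

groups = df.groupby(["emp_gender", "emp_department"])

# The tuple order must match the order of the grouping columns.
male_it = groups.get_group(("Male", "IT"))
print(male_it)
```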

### Applying Aggregate Functions

After splitting the original object into groups, we can apply aggregate functions to the whole GroupBy object or to an individual group.

Let’s apply aggregate functions on the GroupBy object.

```python
import pandas as pd

df = pd.read_csv('../../Datasets/employees.csv')

groups = df.groupby('emp_department')

# Highest salary in each department
print(groups['emp_salary'].max())
```

The **max()** aggregate function returns the maximum salary of each department.

Aggregate functions can also be applied to a single group.

Let’s apply the **median()** aggregate function to the **IT** group.

```python
import pandas as pd

df = pd.read_csv('../../Datasets/employees.csv')

groups = df.groupby('emp_department')

# Median salary of the IT group only
print(groups.get_group('IT')['emp_salary'].median())
```

This is the way you can use aggregate functions on GroupBy objects or single groups.

### Combining

This is the last step in the GroupBy process. After the aggregate functions have been applied to each group, the results of all the groups are combined into a single object, such as a DataFrame. Pandas performs this step itself.
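The combined result is indexed by the group labels. If an ordinary column is preferred, **reset_index()** can flatten it, as in this small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical stand-in data using the article's column names.
df = pd.DataFrame({
    "emp_department": ["BPO", "IT", "Sales", "IT", "BPO", "Sales"],
    "emp_salary": [25000, 40000, 20000, 60000, 21000, 32000],
})

# The per-group results come back as one Series indexed by department.
combined = df.groupby("emp_department")["emp_salary"].max()

# reset_index() turns the group labels back into an ordinary column.
flat = combined.reset_index()

print(combined)
print(flat)
```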

This is how Pandas GroupBy works.

For the full parameter reference, see the official Pandas documentation for the DataFrame groupby() method.

## Conclusion

In this article, we have seen **how to use groupby in Pandas DataFrame** with some examples, and we have looked at how the **Pandas groupby() method** works. It is one of the most useful features in Data Engineering and Data Analysis, so the Pandas GroupBy method is well worth knowing.

If you found this article helpful, please share it and keep visiting for further Pandas tutorials.