How to Count Unique Values in a Column of a Pandas DataFrame in 2024?

Author: Amresh Mishra | Published On: March 26, 2024

In this post, we’ll look at how to count distinct values in the columns of a Pandas DataFrame.

When working on machine learning, artificial intelligence, or data analysis with pandas, we frequently need the count of unique (distinct) values in a single column or across multiple columns.

You can get the number of unique values in a column of a pandas DataFrame in several ways, for example with Series.unique().size, Series.nunique(), or Series.drop_duplicates().size. Since each DataFrame column is represented internally as a Series, you can call these Series methods on it directly.
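As a quick sanity check, the sketch below applies all three approaches to one small Series (the sample values are made up for illustration) and confirms they agree when the column has no missing values.

```python
import pandas as pd

# Hypothetical sample column with repeated values and no NaN
s = pd.Series(["Spark", "PySpark", "Python", "Spark", "Python"])

# 1) unique() returns a NumPy array of distinct values; .size is its length
count_unique = s.unique().size

# 2) nunique() returns the number of distinct non-NaN values directly
count_nunique = s.nunique()

# 3) drop_duplicates() returns a Series without repeated values; .size is its length
count_dropped = s.drop_duplicates().size

print(count_unique, count_nunique, count_dropped)
```

With missing values the results can differ: nunique() skips NaN by default, while unique() and drop_duplicates() keep one NaN entry.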

Typically, the data in each column represents a different feature of the DataFrame. It could be continuous, categorical, or something entirely different, such as free text. If you don’t know the nature of the values you’re dealing with, checking the count of distinct values is a good exploratory step. In this tutorial, we’ll look at how to get the count of unique values in each column of a pandas DataFrame.
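To get the count of unique values in every column at once, you can call nunique() on the whole DataFrame; it returns a Series indexed by column name. A minimal sketch with a small, made-up DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame: one categorical column, one numeric column
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Pandas"],
    "Fee": [20000, 25000, 20000, 20000],
})

# nunique() on a DataFrame counts distinct values per column (axis=0 by default)
per_column = df.nunique()
print(per_column)
```

This gives a one-line overview of which columns look categorical (few distinct values) and which look continuous or free-form (many distinct values).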

What is Python Pandas?

Pandas is an open-source data analysis library built primarily for working with relational or labeled data easily and intuitively. It provides various data structures and operations for manipulating numerical data and time series. The library is built on top of NumPy, and it is fast, offering high performance and productivity for its users.


  • Quick Examples of Counting Unique Values in a Column
  • By using Series.unique() – Count Unique Values
  • By using Series.nunique()
  • Count Unique Values in Multiple Columns

1] Quick Examples of Counting Unique Values in a Column.

Below are quick examples of how to count unique values in columns. They use the df DataFrame created in the next step.


# Get unique count using Series.unique()
count = df.Courses.unique().size

# Using Series.nunique()
count = df.Courses.nunique()

# Get the frequency of each value
frequency = df.Courses.value_counts()

# By using drop_duplicates()
count = df.Courses.drop_duplicates().size

# Count unique combinations across multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print(count)

# Count unique values in each of several columns (one count per column)
count = df[['Courses','Fee']].nunique()

# Count unique values in each row
count = df.nunique(axis=1)

Let’s create a DataFrame to run these examples on.

import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print(df)

This yields the output below.


   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   Pandas  30000   50days      2000
4   Python  25000   40days      2300
5    Spark  20000   30days      1000
6   Pandas  30000   50days      2000
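Using this DataFrame, the value_counts() quick example gives the frequency of each value rather than a single number, and the number of entries it returns equals nunique(). The sketch below rebuilds the same data and also runs the row-wise variant nunique(axis=1) from the quick examples.

```python
import pandas as pd

# Same data as the technologies DataFrame above
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "Pandas", "Python", "Spark", "Pandas"],
    "Fee": [20000, 25000, 22000, 30000, 25000, 20000, 30000],
    "Duration": ["30days", "40days", "35days", "50days", "40days", "30days", "50days"],
    "Discount": [1000, 2300, 1200, 2000, 2300, 1000, 2000],
})

# Frequency of each course: one entry per distinct value
frequency = df["Courses"].value_counts()
print(frequency)

# The number of entries in value_counts() matches nunique()
print(len(frequency) == df["Courses"].nunique())

# Row-wise: count distinct values across the four columns of each row
per_row = df.nunique(axis=1)
print(per_row.tolist())
```

In this data every row holds four different values, so each row-wise count is 4.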

2] By using Series.unique() – Count Unique Values.

To count the unique values in a column with pandas, first use the Series.unique() function to get the distinct values of the column (duplicates are eliminated), then read the size attribute of the result to get the count. The unique() function returns an array of the unique values in order of appearance; the results are not sorted. Note that unique() also keeps NaN as a value if the column contains missing data.

Syntax: Series.unique()

For example:


# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Unique values count : "+ str(count))

# Output
# Unique values count : 4
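The sketch below, using a made-up Series, illustrates two details from the paragraph above: unique() keeps the order of first appearance (no sorting), and it keeps NaN as a value, so unique().size counts a missing value as one of the distinct entries.

```python
import numpy as np
import pandas as pd

# Hypothetical column with repeats and a missing value
s = pd.Series(["Python", "Spark", "Python", np.nan, "Pandas", np.nan])

# Distinct values in order of first appearance; NaN is included once
values = s.unique()
print(values)

# size counts NaN as a distinct entry, so this is 4 here
print(values.size)

# Sort explicitly if you need ordered output (drop NaN first to avoid comparing str with float)
print(sorted(s.dropna().unique()))
```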

3] By using Series.nunique().

Alternatively, you can use Series.nunique(), which returns the number of unique elements in the object, excluding NaN values. To include NaN values in the count, set the dropna parameter to False.

Syntax: Series.nunique(dropna=True)

Example:

# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Outputs
# Unique values count : 4
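Because nunique() skips NaN by default, the dropna parameter changes the result whenever the column has missing values. A minimal sketch with a made-up Series:

```python
import numpy as np
import pandas as pd

# Hypothetical column containing two missing values
s = pd.Series(["Spark", "Python", np.nan, "Spark", np.nan])

# Default dropna=True: NaN is ignored
without_nan = s.nunique()

# dropna=False: all NaN entries together count as one additional distinct value
with_nan = s.nunique(dropna=False)

print(without_nan, with_nan)
```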

4] Count Unique Values in Multiple Columns.

To count unique combinations of values across multiple columns, use pandas DataFrame.drop_duplicates(), which drops duplicate rows from a pandas DataFrame and returns a DataFrame containing only unique rows.

On the result, use the shape property to get the dimensions of the DataFrame, which returns a (rows, columns) tuple, and use shape[0] to get the row count.

What is a DataFrame?

A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. DataFrames are one of the most common data structures in modern data analytics because they offer a flexible and intuitive way of storing and working with data.

# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Outputs
# Unique multiple columns : 5
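If you also want to know how often each combination occurs, DataFrame.value_counts() (available in pandas 1.1 and later) counts each unique row; the number of entries it returns matches the drop_duplicates() row count. The sketch rebuilds the two columns used above.

```python
import pandas as pd

# Courses and Fee columns from the technologies DataFrame above
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "Pandas", "Python", "Spark", "Pandas"],
    "Fee": [20000, 25000, 22000, 30000, 25000, 20000, 30000],
})

# Frequency of each (Courses, Fee) combination, indexed by a MultiIndex
combo_counts = df[["Courses", "Fee"]].value_counts()
print(combo_counts)

# One entry per unique combination, matching drop_duplicates().shape[0]
print(len(combo_counts), df[["Courses", "Fee"]].drop_duplicates().shape[0])
```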

I hope this post is helpful. Thank you, and have a good day.


FAQs About Counting Unique Values in Pandas DataFrame Columns:

Q: Can I count unique values in multiple columns simultaneously?

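A: Yes. Calling nunique() on the entire DataFrame, or on a subset of columns such as df[['Courses','Fee']], returns the unique count of each column separately. To count unique combinations of values across several columns, use drop_duplicates() followed by shape[0], as shown above.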
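The difference between per-column counts and counts of combinations is easy to miss, so here is a small sketch with made-up data: nunique() on a column subset counts each column on its own, while drop_duplicates() counts distinct rows.

```python
import pandas as pd

# Hypothetical data: each column has 2 distinct values, but rows combine them 3 ways
df = pd.DataFrame({
    "Courses": ["Spark", "Spark", "Python", "Python"],
    "Fee": [20000, 25000, 20000, 20000],
})

# Per-column counts: one number per column
per_column = df[["Courses", "Fee"]].nunique()
print(per_column.to_dict())

# Unique combinations across both columns
combos = df[["Courses", "Fee"]].drop_duplicates().shape[0]
print(combos)
```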

Q: Is there a way to ignore NaN (missing) values when counting unique values?

A: Absolutely! nunique() already excludes NaN values by default (dropna=True). Pass dropna=False if you want NaN counted as a distinct value.

Q: Can I visualize the unique value counts?

A: Sure thing! You can use plotting libraries like Matplotlib or Seaborn to create visualizations of unique value counts for better insights.

Q: What if I want to count unique values based on certain conditions?

A: You can use boolean indexing to filter your DataFrame based on specific conditions and then apply the nunique() method to count unique values within that subset.
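A minimal sketch of that approach, reusing the Courses and Fee columns from the example DataFrame: filter the rows with a boolean mask, then call nunique() on the column of interest. The threshold of 22000 is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Courses and Fee columns from the technologies DataFrame above
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "Pandas", "Python", "Spark", "Pandas"],
    "Fee": [20000, 25000, 22000, 30000, 25000, 20000, 30000],
})

# Boolean mask: keep only rows whose Fee is above an arbitrary threshold
mask = df["Fee"] > 22000

# Count distinct courses within the filtered subset
count = df.loc[mask, "Courses"].nunique()
print(count)
```

Here the filtered rows contain PySpark, Pandas, and Python, so the count is 3.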

Conclusion:

Counting unique values in a Pandas DataFrame column is a breeze with the nunique() method. Whether you’re analyzing data for business or personal projects, having a clear understanding of unique value counts can provide valuable insights into your dataset.

