
Python Code Pills — Compare Two CSV Files with Pandas

2 min read · Nov 12, 2022

Thanks to the Pandas library in Python, data manipulation and comparison take only a few lines of code. Today's code pill shows how to compare two similar CSV files in just a couple of lines.

Photo by Hitesh Choudhary on Unsplash

Here we have two CSV files that both have an “Email” column, and we will find which emails appear in both files and which are different.

Here are the sample CSV files:

first.csv
second.csv
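Since the goal is to match emails, here is a minimal sketch that compares only the Email column instead of whole rows. It swaps in a different technique from the one in the main code: an outer merge with indicator=True, which labels every address as appearing in both files ("both") or in only one of them ("left_only" / "right_only"). The CSV contents are made-up stand-ins for first.csv and second.csv, read through io.StringIO, and compare_emails is a hypothetical helper name.

```python
import io

import pandas as pd

# Hypothetical stand-ins for first.csv and second.csv (made-up addresses)
FIRST_CSV = """Email
alice@example.com
bob@example.com
carol@example.com
"""

SECOND_CSV = """Email
bob@example.com
carol@example.com
dave@example.com
"""


def compare_emails(first_source, second_source) -> pd.DataFrame:
    """Hypothetical helper: outer-join the two Email columns and label where each address appears."""
    # Keep only the Email column and drop duplicate addresses before joining
    first = pd.read_csv(first_source, usecols=["Email"]).drop_duplicates()
    second = pd.read_csv(second_source, usecols=["Email"]).drop_duplicates()
    # indicator=True adds a "_merge" column with "both", "left_only" or "right_only"
    return first.merge(second, on="Email", how="outer", indicator=True)


if __name__ == "__main__":
    merged = compare_emails(io.StringIO(FIRST_CSV), io.StringIO(SECOND_CSV))
    print("Same:", merged.loc[merged["_merge"] == "both", "Email"].tolist())
    print("Only in first:", merged.loc[merged["_merge"] == "left_only", "Email"].tolist())
    print("Only in second:", merged.loc[merged["_merge"] == "right_only", "Email"].tolist())
```

Because read_csv accepts either a file-like object or a path, passing "first.csv" and "second.csv" instead of the io.StringIO objects should read the real files. Dropping duplicates first keeps the merge from multiplying rows when an address appears more than once in a file.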

Python code:

import pandas as pd


def compare_csv(first_file: str, second_file: str) -> None:
    # Read both files; the first row is the header, malformed lines are skipped
    df_first = pd.read_csv(first_file, header=0, on_bad_lines="skip")
    df_second = pd.read_csv(second_file, header=0, on_bad_lines="skip")

    # Turn each row into a tuple so whole rows can be compared with isin
    first_rows = df_first.apply(tuple, axis=1)
    second_rows = df_second.apply(tuple, axis=1)

    # Rows of the first CSV that also appear in the second CSV
    result_same = df_first[first_rows.isin(second_rows)]
    print("Same")
    print(result_same)

    # Use tilde (~) to negate the mask: rows of the first CSV missing from the second
    result_diff = df_first[~first_rows.isin(second_rows)]
    print("Difference")
    print(result_diff)


if __name__ == "__main__":
    compare_csv("first.csv", "second.csv")

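We use the isin method of pandas to find the rows of the first DataFrame that also appear in the second, and the tilde (~) operator to invert that boolean mask and keep the rows that don't. Note that the code compares entire rows, so it matches emails exactly when the files contain only the Email column, and the "Difference" output lists only rows of the first file that are missing from the second, not the other way around.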
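To see what isin and ~ actually produce, here is a small sketch on two made-up in-memory DataFrames with a single Email column (the data is hypothetical). It applies the same row-tuple technique as the code above, prints the intermediate boolean mask, and also swaps the roles of the two frames to get the rows that exist only in the second file.

```python
import pandas as pd

# Small hypothetical frames standing in for the two CSV files
df_first = pd.DataFrame({"Email": ["a@example.com", "b@example.com", "c@example.com"]})
df_second = pd.DataFrame({"Email": ["b@example.com", "c@example.com", "d@example.com"]})

# One tuple per row, so whole rows can be compared
first_rows = df_first.apply(tuple, axis=1)
second_rows = df_second.apply(tuple, axis=1)

# isin returns a boolean Series aligned with df_first:
# True where that row also occurs somewhere in df_second
mask = first_rows.isin(second_rows)
print(mask.tolist())  # [False, True, True]

same = df_first[mask]    # rows present in both frames
diff = df_first[~mask]   # ~ flips every boolean: rows only in the first frame

# The check is one-directional; swap the roles to find rows only in the second frame
only_second = df_second[~second_rows.isin(first_rows)]
print(only_second)
```

Because isin only asks whether each row of the left frame occurs anywhere in the right one, running it in both directions is what gives the full set of differences.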

Result:


In this code pill, we learned how to compare two similar CSV files. See you in the next code pill!
