Beginning of data science with Pandas

Srishty Suman * 28-June-2019
Data Science, Data Analytics, Pandas.

Introduction

In this blog, we will discuss the basics of data analysis in Python using the Pandas library, including code samples.

Let's talk about the Pandas library.

What is Pandas?

The Pandas library is one of the most popular tools data scientists use for data manipulation and analysis, alongside matplotlib for data visualization and NumPy, the fundamental library for scientific computing in Python on which Pandas is built. Unlike NumPy, which provides multi-dimensional array objects, Pandas provides an in-memory two-dimensional table object called a DataFrame. It is like a spreadsheet with column names and row labels.

There are two main data structures in Pandas: Series and DataFrame.

Series: A Pandas Series is a one-dimensional data structure (“a one-dimensional ndarray”) that stores values, and every value has an index label attached to it. Index labels are usually unique, but they do not have to be.

DataFrame: A Pandas DataFrame is a two-dimensional data structure, basically a table with rows and columns. The columns have names and the rows have index labels.
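To make the two structures concrete, here is a minimal sketch that builds one of each. The names, values and index labels are made up for illustration only.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
ages = pd.Series([22, 38, 26], index=["a", "b", "c"], name="age")

# A DataFrame: a two-dimensional table with named columns and a row index.
# (The column names and values here are hypothetical sample data.)
passengers = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cara"], "age": [22, 38, 26]},
    index=["a", "b", "c"],
)

print(ages["b"])                # look up a Series value by its label
print(passengers.shape)         # (number of rows, number of columns)
print(type(passengers["age"]))  # a single DataFrame column is a Series
```

Selecting one column of a DataFrame gives back a Series, which is why the two structures are usually learnt together.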

Why use Pandas?

Pandas has the following advantages:

  • It handles missing data easily
  • It uses Series for one-dimensional data and DataFrame for two-dimensional data
  • It provides an efficient way to slice the data
  • It provides a flexible way to merge, concatenate or reshape the data
  • It includes powerful time series tools
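The points above can be sketched in a few lines. This is only an illustration on made-up data (the column names "id", "age" and "fare" are assumptions), and filling gaps with the mean is just one possible way to treat missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample tables; one age is missing (NaN).
left = pd.DataFrame({"id": [1, 2, 3], "age": [22.0, np.nan, 26.0]})
right = pd.DataFrame({"id": [1, 2, 3], "fare": [7.25, 71.28, 7.92]})

# Missing data: fill the gap with the mean of the known ages.
left["age"] = left["age"].fillna(left["age"].mean())

# Slicing: take the first two rows by position.
first_two = left.iloc[:2]

# Merging: join the two tables on their shared "id" column.
merged = left.merge(right, on="id")

# Time series: index values by date, then total them in 2-day bins.
daily = pd.Series(
    [1, 2, 3, 4],
    index=pd.date_range("2019-06-01", periods=4, freq="D"),
)
two_day_totals = daily.resample("2D").sum()

print(merged)
print(two_day_totals)
```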

How to install Pandas?

To install Pandas, open your command line or terminal and type “pip install pandas”. If you have Anaconda installed on your system, type “conda install pandas” instead. Once the installation is complete, open your IDE (Jupyter, PyCharm, etc.) and import it by typing “import pandas as pd”.
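A quick way to check that the installation worked is to import the library and print its version. The version shown will depend on your environment.

```python
import pandas as pd

# If this import succeeds, pandas is installed.
# The version string shows which release is in use.
print(pd.__version__)
```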

How to open data files in pandas?

Our data might be in .csv files, SQL tables, Excel files, .tsv files, or something else. The goal is the same in all cases: if we want to analyse that data using pandas, the first step is to read it into a data structure that’s compatible with pandas.

Loading a .csv file into a pandas DataFrame

Let’s load a .csv data file into pandas. Pandas has a function for this, called read_csv(). Import the pandas library, then read titanic.csv into a DataFrame by passing the file path to read_csv().
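Here is a minimal sketch of that step. So that it runs anywhere, it first writes a tiny stand-in file named titanic.csv to a temporary folder. The columns and rows in it are made up and are not the real Titanic dataset, so with your own file you would only need the read_csv() line.

```python
import os
import tempfile

import pandas as pd

# Stand-in for titanic.csv: hypothetical columns and rows, for illustration only.
csv_text = "PassengerId,Name,Age\n1,Ann,22\n2,Bob,38\n3,Cara,26\n"
csv_path = os.path.join(tempfile.mkdtemp(), "titanic.csv")
with open(csv_path, "w") as f:
    f.write(csv_text)

# Read the .csv file into a DataFrame.
df = pd.read_csv(csv_path)

print(df.head())  # first rows of the table
print(df.shape)   # (number of rows, number of columns)
```

By default, read_csv() treats the first line of the file as the column names.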

Loading a file in another format into a pandas DataFrame

We can also load data in other formats (.tsv, Excel, SQL and so on) by using functions such as read_csv() with a tab separator, read_excel() and read_sql().
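As a rough sketch of loading other formats, the example below reads tab-separated text and a SQL table. The sample data, the table name "passengers" and the in-memory SQLite database are assumptions made so that the code is self-contained.

```python
import io
import sqlite3

import pandas as pd

# .tsv: read_csv() also handles tab-separated data when sep="\t" is given.
# (io.StringIO stands in for a file on disk; the rows are made up.)
tsv_text = "name\tage\nAnn\t22\nBob\t38\n"
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# SQL: run a query over a database connection and get a DataFrame back.
# An in-memory SQLite database with a hypothetical table is used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?)", [("Ann", 22), ("Bob", 38)]
)
sql_df = pd.read_sql_query("SELECT name, age FROM passengers", conn)
conn.close()

# Excel: pd.read_excel("data.xlsx") follows the same pattern, but it needs an
# optional engine such as openpyxl to be installed, so it is not run here.

print(tsv_df)
print(sql_df)
```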

Exporting a pandas DataFrame to a CSV file and other formats (.tsv, SQL, Excel, ...)

Let’s export the pandas DataFrame by using functions such as to_csv(), to_excel() and to_sql().
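The export side can be sketched in the same way. The DataFrame contents, file names and table name below are made up, and the files are written to a temporary folder so that nothing on your system is overwritten.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical sample data to export.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [22, 38]})
out_dir = tempfile.mkdtemp()

# CSV: index=False leaves the row index out of the file.
csv_path = os.path.join(out_dir, "passengers.csv")
df.to_csv(csv_path, index=False)

# TSV: the same function with a tab separator.
tsv_path = os.path.join(out_dir, "passengers.tsv")
df.to_csv(tsv_path, sep="\t", index=False)

# SQL: write the DataFrame as a table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("passengers", conn, index=False)
row_count = conn.execute("SELECT COUNT(*) FROM passengers").fetchone()[0]
conn.close()

# Excel: df.to_excel("passengers.xlsx", index=False) works the same way, but it
# needs an optional engine such as openpyxl, so it is not run here.

print(row_count)
```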

Conclusion

Through this blog, we have learnt the basics of the pandas library: what it is, why it is useful, how to install it, and how to load and export data.

About Srishty Suman

Srishty Suman has 1+ years of experience in Data Engineering. She has mostly worked on building organised data models from unorganised data from multiple sources. She has also prepared statistical reports and presented them graphically for better visualisation. In her free time, she likes writing blogs, traveling and exploring new places.
