Beginning of data science with Pandas
Srishty Suman * 28-June-2019
Data Science, Data Analytics, Pandas.
In this blog, we will discuss the basics of data analysis in Python using the Pandas library, including code samples.
Let's talk about the Pandas library.
What is Pandas?
The Pandas library is one of the most preferred tools among data scientists for data manipulation and analysis, alongside matplotlib for data visualization and NumPy, the fundamental library for scientific computing in Python, on which Pandas is built. Whereas NumPy provides multi-dimensional array objects, Pandas provides an in-memory two-dimensional table object called a DataFrame. It is like a spreadsheet with column names and row labels.
There are two types of data structures in Pandas: Series and DataFrames.
Series: A Pandas Series is a one-dimensional data structure (“a one-dimensional ndarray”) that stores values, and every value has an index label.
DataFrame: A Pandas DataFrame is a two-dimensional data structure, basically a table with rows and columns. The columns have names and the rows have index labels.
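To make the two structures concrete, here is a minimal sketch. The names, ages and index labels are made up for illustration and are not taken from any real dataset.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
ages = pd.Series([22, 38, 26], index=["a", "b", "c"], name="age")

# A DataFrame: a two-dimensional table with named columns and a row index.
passengers = pd.DataFrame(
    {"name": ["Anna", "Ben", "Cara"], "age": [22, 38, 26]},
    index=["a", "b", "c"],
)

print(ages["b"])                # look up a Series value by its label
print(passengers.shape)         # (number of rows, number of columns)
print(type(passengers["age"]))  # each column of a DataFrame is a Series
```

Selecting a single column from a DataFrame gives back a Series, which is why the two structures are usually learned together.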
Why use Pandas?
Pandas has the following advantages:
- It handles missing data easily
- It uses Series for one-dimensional data and DataFrame for two-dimensional data
- It provides an efficient way to slice the data
- It provides a flexible way to merge, concatenate or reshape the data
- It includes powerful tools for working with time series
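The points above can be sketched in a few lines. This is only an illustration under assumed data: the two small tables, their column names (passenger_id, fare, pclass) and the dates are invented for the example.

```python
import numpy as np
import pandas as pd

fares = pd.DataFrame(
    {"passenger_id": [1, 2, 3, 4], "fare": [7.25, np.nan, 8.05, 53.10]}
)
classes = pd.DataFrame({"passenger_id": [1, 2, 3, 4], "pclass": [3, 1, 3, 1]})

# Missing data: count the gaps, then fill them with the column mean.
n_missing = fares["fare"].isna().sum()
filled = fares.fillna({"fare": fares["fare"].mean()})

# Slicing: the first two rows by position, then rows matching a condition.
first_two = fares.iloc[:2]
expensive = fares[fares["fare"] > 8]

# Merging: join the two tables on their shared key column.
merged = fares.merge(classes, on="passenger_id")

# Time series: daily values resampled into 2-day sums.
daily = pd.Series(
    [1, 2, 3, 4], index=pd.date_range("2019-06-01", periods=4, freq="D")
)
two_day = daily.resample("2D").sum()

print(n_missing, len(expensive), merged.shape, two_day.tolist())
```

Filling with the mean is just one possible choice here; dropping the incomplete rows with dropna() would be another.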
How to install Pandas?
To install Pandas, open your command line or terminal and type “pip install pandas”. If you have Anaconda installed on your system, type “conda install pandas” instead. Once the installation is complete, open your IDE (Jupyter, PyCharm, etc.) and import the library by typing “import pandas as pd”.
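A quick way to check that the installation worked is to import the library and print its version. The version number you see will depend on your environment.

```python
import pandas as pd

# If this import succeeds, pandas is installed; print the version to confirm.
print(pd.__version__)
```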
How to open data files in pandas?
We might have our data in .csv files or SQL tables. Maybe Excel files. Or .tsv files. Or something else. But the goal is the same in all cases. If we want to analyse that data using pandas, the first step will be to read it into a data structure that’s compatible with pandas.
Loading a .csv file into a pandas DataFrame
To load a .csv data file into pandas, we use the read_csv() function. Import the pandas library, then read titanic.csv into a DataFrame.
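A minimal sketch of this step follows. So that it runs without the file on disk, it reads from an in-memory string standing in for titanic.csv. The column names and rows are assumptions made for illustration and may not match the real file.

```python
import io

import pandas as pd

# Stand-in for titanic.csv so the example runs without the file.
# The column names and rows below are made up for illustration.
csv_text = """PassengerId,Name,Age,Survived
1,Anna,22,0
2,Ben,38,1
3,Cara,,1
"""

# With the real file on disk this would be: df = pd.read_csv("titanic.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the table
print(df.dtypes)   # column types inferred by pandas
```

The empty Age field in the last row is read as a missing value (NaN), so that column is inferred as floating point rather than integer.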
Loading a file in another format into a pandas DataFrame
We can also load data in other formats (.tsv, Excel, SQL and so on) using functions such as read_csv() with a tab separator, read_excel() and read_sql().
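Here is a rough sketch of reading two of these formats. The tab-separated text and the in-memory SQLite table named passengers are invented for the example. Reading Excel is shown only as a comment, because it needs an extra engine such as openpyxl.

```python
import io
import sqlite3

import pandas as pd

# Tab-separated values: read_csv with sep="\t".
tsv_text = "name\tage\nAnna\t22\nBen\t38\n"
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# SQL: read the result of a query through a database connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?)", [("Anna", 22), ("Ben", 38)]
)
df_sql = pd.read_sql("SELECT * FROM passengers", conn)
conn.close()

# Excel: pd.read_excel("data.xlsx") follows the same pattern, but it
# requires an Excel engine such as openpyxl, so it is not run here.

print(df_tsv)
print(df_sql)
```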
Exporting a pandas DataFrame to a CSV file and other formats (.tsv, SQL, Excel, ...)
We can export a pandas DataFrame using methods such as to_csv(), to_excel() and to_sql().
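The export step might look like the sketch below. The DataFrame contents, the file names and the table name are assumptions for illustration. Files are written to a temporary directory, and the Excel export is left as a comment because it needs openpyxl.

```python
import os
import sqlite3
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Anna", "Ben"], "age": [22, 38]})

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "passengers.csv")
tsv_path = os.path.join(out_dir, "passengers.tsv")

# index=False leaves the row index out of the written file.
df.to_csv(csv_path, index=False)
df.to_csv(tsv_path, sep="\t", index=False)

# SQL: write the DataFrame into a table of an SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("passengers", conn, index=False)
row_count = conn.execute("SELECT COUNT(*) FROM passengers").fetchone()[0]
conn.close()

# Excel: df.to_excel("passengers.xlsx", index=False) requires openpyxl.

# Read the CSV back to confirm the export round-trips.
round_trip = pd.read_csv(csv_path)
print(round_trip)
```

Without index=False, the row index would be written as an extra first column and would come back as an unnamed column when the file is read again.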
Through this blog, we have learnt the basics of the pandas library.
About Srishty Suman
Srishty Suman has 1+ years of experience in Data Engineering. She has mostly worked on organising data models from unorganised data from multiple sources. She has also prepared statistical information reports and represented them graphically for better visualisation. In her free time, she likes writing blogs, traveling and exploring new places.