Pandas AI: The Generative AI Python Library

Originally posted on kdnuggets.

The road to simpler Data Analysis for data scientists and analysts, powered by OpenAI.

Python Pandas is an open-source toolkit which provides data scientists and analysts with data manipulation and analysis capabilities using the Python programming language. The Pandas library is very popular in the preprocessing phase of machine learning and deep learning. But now you can do more with it…

Enter a new data science library: Pandas AI, a Python library that integrates generative artificial intelligence capabilities into Pandas, making data frames conversational.

What is Pandas AI?


What does making data frames conversational mean?

This means exactly what it says: you can speak with your dataset. Yes, you heard that right. You can talk to your data and get fast responses. As a data scientist or analyst, you won’t need to stare at your dataset, skimming through rows and columns for hours on end. Pandas AI does not replace Pandas; it just gives it a big push!

Data scientists and analysts spend a lot of time cleaning data for the analysis phase. They will now be able to take their data analysis to the next level. Data professionals look into different methods and processes that they can use to minimize the time spent on data preparation, and now they can with Pandas AI.

PandasAI is meant to be used hand-in-hand with Pandas; it is not a replacement for it. Rather than having to skim through the dataset and answer questions about it yourself, you can ask PandasAI these questions and it will return answers in the form of Pandas DataFrames.

With that being said, does this mean that people no longer need to be proficient in Python to achieve data analysis using tools such as the Pandas library?

With the help of the OpenAI API, Pandas AI aims to let you virtually talk with a machine to get the results you want, rather than having to program the task yourself. The machine outputs the result in its own language: machine-interpretable code (a DataFrame).


How Do I Use Pandas AI?


Installing Pandas AI using pip


pip install pandasai


Importing PandasAI with OpenAI


To use the new Pandas AI library, you will need an OpenAI API key. Once you have your notebook open, import the following:

import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Replace the placeholder string with your own OpenAI API key
llm = OpenAI(api_token="your_API_key")


If you do not have an OpenAI API key, you can create an account on the OpenAI platform and create one there. You will receive a $5 credit that can be used towards exploring and experimenting with the API.
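Rather than pasting the key directly into a notebook, a common approach is to read it from an environment variable. The sketch below assumes the key is stored in a variable named `OPENAI_API_KEY`. The helper `load_api_key` is a hypothetical name used for illustration and is not part of Pandas AI.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in env_var, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of Pandas AI.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned string can then be passed as the `api_token` argument when creating the `OpenAI` object.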

Once you are all set up, you’re ready to start using Pandas AI.


Running the Model on Your Dataframe


First, you will need to pass your OpenAI model to Pandas AI:

pandas_ai = PandasAI(llm)


You will then need to run the model on the data frame. The call takes two parameters, the data frame you’re working with and the question you want to ask:

pandas_ai.run(df, prompt='the question you would like to ask?')


For example, you may be looking through your dataset and are interested in the rows where the value of a column is greater than 5. You can do this by using Pandas AI:

import pandas as pd
from pandasai import PandasAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate a LLM
from pandasai.llm.openai import OpenAI
llm = OpenAI()

pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='Which are the 5 happiest countries?')


It will return a Pandas object as output, in this case the matching country names:

6            Canada
7         Australia
1    United Kingdom
3           Germany
0     United States
Name: country, dtype: object
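Because the answer comes from a language model, it is worth cross-checking it against plain Pandas. A minimal sketch, rebuilding the relevant columns of the sample DataFrame and using `DataFrame.nlargest` (this is hand-written code, not what Pandas AI generates internally):

```python
import pandas as pd

# Same countries and happiness scores as in the sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# Take the five rows with the highest happiness_index, then keep the country names
top5 = df.nlargest(5, "happiness_index")["country"]
print(top5)
```

The printed Series lists the same five countries, with the same row labels, as the output above.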


It also has the ability to perform more complex queries, such as mathematical calculations and data visualizations.
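To give a sense of what a calculation query involves, here is a hand-written Pandas version of one such question: the combined GDP of the two least happy countries. The question is chosen here for illustration, and the code is a sketch of the equivalent manual work, not the code Pandas AI would produce.

```python
import pandas as pd

# Countries, GDP, and happiness scores from the sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# Select the two rows with the lowest happiness_index, then add up their GDP
two_unhappiest = df.nsmallest(2, "happiness_index")
total_gdp = int(two_unhappiest["gdp"].sum())
print(two_unhappiest["country"].tolist(), total_gdp)
```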

A data visualization example:

pandas_ai.run(
    df,
    prompt="Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)


Data visualization output:



Image by PandasAI

Wrapping it up


Pandas AI does not replace Pandas, but it is a good tool for boosting your workflow. You can ask Pandas AI questions about your dataset, but you will still need to be proficient in programming to correct and direct the library when it makes mistakes.
