Pandas is an essential Python library for data scientists, data analysts, machine learning engineers, and others. It is widely used for many kinds of data analysis. It can read many types of files and data sources, such as CSV, Excel, JSON, HTML, XML, HDF5, Parquet, Feather, Stata, SQL, Python pickle, and Google BigQuery, and it can also write data in all of these formats. For more details, please click on the link.
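As a small illustration of reading one format and writing another, the sketch below loads CSV text and writes it back out as JSON. The sample rows are made up, and in-memory text is used in place of real files so that the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real CSV file.
csv_text = "employee_id,employee_name\n1,Asha\n2,Ben\n"

# Read CSV text into a DataFrame (a file path works the same way).
df = pd.read_csv(io.StringIO(csv_text))

# Write the same table out as JSON, one object per row.
json_text = df.to_json(orient="records")

# Read the JSON back to check that the data survives the round trip.
df_back = pd.read_json(io.StringIO(json_text))
print(df_back)
```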
It provides many functions for manipulating data, and it is difficult to remember all of them along with their various parameters. We provide a facility in which you simply write what you want in plain English, and it generates Python code that uses the Pandas library. Although column names are not required, it helps to give the file name and the relevant column names. The format is described below:
First give the file as file <file_name_with_extension>, then the columns as columns = [<column_1>, <column_2>, <column_3>, ...]. You do not need to list every column; the relevant ones are enough, though listing all of them is also fine. You must leave one blank line after the table description.
Below we give four examples:
Example 1:
csv file employees.csv separated by comma, columns = [employee_id, employee_name, employee_salary, employee_hiredate]

Create new columns from employee_hiredate for day, month, year, weekday, and day of the year. Then, write to file employee_new.csv
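For a request like Example 1, the generated code might look roughly like the sketch below. This is only an illustration, not the tool's actual output: the sample rows and the new column names (hire_day, hire_month, and so on) are assumptions, and the output file is written to a temporary folder so the snippet runs without touching your files.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows standing in for employees.csv.
csv_text = (
    "employee_id,employee_name,employee_salary,employee_hiredate\n"
    "1,Asha,52000,2020-03-15\n"
    "2,Ben,61000,2019-11-01\n"
)
df = pd.read_csv(io.StringIO(csv_text), sep=",", parse_dates=["employee_hiredate"])

# Derive the date parts from the hire date (column names are hypothetical).
hired = df["employee_hiredate"].dt
df["hire_day"] = hired.day
df["hire_month"] = hired.month
df["hire_year"] = hired.year
df["hire_weekday"] = hired.day_name()
df["hire_day_of_year"] = hired.dayofyear

# Write the result to employee_new.csv (inside a temporary folder here).
out_path = Path(tempfile.mkdtemp()) / "employee_new.csv"
df.to_csv(out_path, index=False)
print(df)
```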
Example 2:
file shares.csv separated by tab, columns = [Open, High, Low, Close]

Compute new columns from 'Close': moving average for 5 days and for 10 days. Then, write to file shares_new.csv
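A request like Example 2 could translate into something like the following sketch. It assumes each row is one trading day in chronological order, so a 5-row rolling window equals a 5-day moving average. The price values, the column names MA_5 and MA_10, and the choice to keep the tab separator on output are all assumptions made for illustration.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Build 12 made-up tab-separated rows standing in for shares.csv.
rows = ["Open\tHigh\tLow\tClose"]
for i in range(1, 13):
    rows.append(f"{i}\t{i + 1}\t{i - 1}\t{i}")
tsv_text = "\n".join(rows)

df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Rolling means over the last 5 and 10 rows; the first rows of each
# column are NaN until the window has enough data.
df["MA_5"] = df["Close"].rolling(window=5).mean()
df["MA_10"] = df["Close"].rolling(window=10).mean()

# Write the result to shares_new.csv (inside a temporary folder here).
out_path = Path(tempfile.mkdtemp()) / "shares_new.csv"
df.to_csv(out_path, sep="\t", index=False)
print(df)
```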
Example 3:
csv file employees.csv separated by comma, columns = [employee_id, employee_name, employee_salary, employee_hiredate]

Sort in descending order of employee_salary and in ascending order of employee_name
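For Example 3, one plausible translation uses sort_values with one sort direction per column. The rows below are made up, and the snippet is a sketch rather than the tool's actual output.

```python
import io

import pandas as pd

# Made-up rows standing in for employees.csv.
csv_text = (
    "employee_id,employee_name,employee_salary,employee_hiredate\n"
    "1,Cara,60000,2018-05-02\n"
    "2,Asha,50000,2020-03-15\n"
    "3,Ben,60000,2019-11-01\n"
)
df = pd.read_csv(io.StringIO(csv_text), sep=",")

# Highest salary first; ties are broken by name in alphabetical order.
df_sorted = df.sort_values(
    by=["employee_salary", "employee_name"],
    ascending=[False, True],
)
print(df_sorted)
```

The ascending list lines up position by position with the by list, which is how the two columns get different sort directions.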
Example 4:
csv file employees.csv separated by comma, columns = [dept_id, employee_id, employee_name, employee_salary, employee_hiredate]

Group by dept_id and compute mean, median and standard deviation of employee_salary
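Example 4 maps naturally onto groupby followed by agg. The sketch below uses made-up rows and is an illustration only. Note that the standard deviation pandas computes here is the sample standard deviation (ddof=1).

```python
import io

import pandas as pd

# Made-up rows standing in for employees.csv.
csv_text = (
    "dept_id,employee_id,employee_name,employee_salary,employee_hiredate\n"
    "10,1,Asha,50000,2020-03-15\n"
    "10,2,Ben,60000,2019-11-01\n"
    "10,3,Cara,70000,2018-05-02\n"
    "20,4,Dev,40000,2021-01-10\n"
    "20,5,Eli,50000,2017-08-23\n"
)
df = pd.read_csv(io.StringIO(csv_text), sep=",")

# One row per department, one column per statistic.
stats = df.groupby("dept_id")["employee_salary"].agg(["mean", "median", "std"])
print(stats)
```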