[Data] 用Python Pandas完成类似Excel的数据处理

博客首页 » Data 用Python Pandas完成类似Excel的数据处理

发布于 11 Sep 2015 09:58
标签 blog
http://pbpython.com/excel-pandas-comp.html
这个文章里介绍了用Python Pandas完成类似Excel的数据处理

import pandas as pd
import numpy as np

#读取
df = pd.read_excel("excel-comp-data.xlsx")
df.head()

#合计
df["total"] = df["Jan"] + df["Feb"] + df["Mar"]
df.head()

#数据描述
df["Jan"].sum(),df["Jan"].mean(),df["Jan"].min(),df["Jan"].max()
(1462000, 97466.666666666672, 10000, 162000)

#合计
sum_row=df[["Jan","Feb","Mar","total"]].sum()
sum_row
Jan 1462000
Feb 1507000
Mar 717000
total 3686000
dtype: int64

#转置
df_sum=pd.DataFrame(data=sum_row).T
df_sum
Jan Feb Mar total
0 1462000 1507000 717000 3686000

#列补齐
df_sum=df_sum.reindex(columns=df.columns)
df_sum
account name street city state postal-code Jan Feb Mar total
0 NaN NaN NaN NaN NaN NaN 1462000 1507000 717000 3686000

#拼接
df_final=df.append(df_sum,ignore_index=True)
df_final.tail()

14 273274 McDermott PLC 8917 Bergstrom Meadow Kathryneborough Delaware 27933 150000 120000 70000 340000
15 NaN NaN NaN NaN NaN NaN 1462000 1507000 717000 3686000

#复杂的转换
process.extractOne("Minnesotta",choices=state_to_code.keys())
('MINNESOTA', 95)
process.extractOne("AlaBAMMazzz",choices=state_to_code.keys(),score_cutoff=80)

def convert_state(row):
abbrev = process.extractOne(row["state"],choices=state_to_code.keys(),score_cutoff=80)
if abbrev:
return state_to_code[abbrev[0]]
return np.nan

#添加列
df_final.insert(6, "abbrev", np.nan)
df_final.head()

account name street city state postal-code abbrev Jan Feb Mar total
0 211829 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 NaN 10000 62000 35000 107000

#apply执行
df_final['abbrev'] = df_final.apply(convert_state, axis=1)
df_final.tail()

account name street city state postal-code abbrev Jan Feb Mar total
11 231907 Hahn-Moore 18115 Olivine Throughway Norbertomouth NorthDakota 31415 ND 150000 10000 162000 322000

#分组统计 subtotal groupby
df_sub=df_final[["abbrev","Jan","Feb","Mar","total"]].groupby('abbrev').sum()
df_sub
Jan Feb Mar total
abbrev
AR 150000 120000 35000 305000
CA 162000 120000 35000 317000

#applymap
def money(x):
return "${:,.0f}".format(x)

formatted_df = df_sub.applymap(money)
formatted_df

Jan Feb Mar total
abbrev
AR $150,000 $120,000 $35,000 $305,000
CA $162,000 $120,000 $35,000 $317,000

#
sum_row=df_sub[["Jan","Feb","Mar","total"]].sum()
sum_row
Jan 1462000
Feb 1507000
Mar 717000
total 3686000
dtype: int64

df_sub_sum=pd.DataFrame(data=sum_row).T
df_sub_sum=df_sub_sum.applymap(money)
df_sub_sum
Jan Feb Mar total
0 $1,462,000 $1,507,000 $717,000 $3,686,000

#添加
final_table = formatted_df.append(df_sub_sum)
final_table
Jan Feb Mar total
AR $150,000 $120,000 $35,000 $305,000
CA $162,000 $120,000 $35,000 $317,000

#换名
final_table = final_table.rename(index={0:"Total"})
final_table
Jan Feb Mar total
AR $150,000 $120,000 $35,000 $305,000
CA $162,000 $120,000 $35,000 $317,000


本页面的文字允许在知识共享 署名-相同方式共享 3.0协议和GNU自由文档许可证下修改和再使用,仅有一个特殊要求,请用链接方式注明文章引用出处及作者。请协助维护作者合法权益。


系列文章

文章列表

  • Data 用Python Pandas完成类似Excel的数据处理

这篇文章对你有帮助吗,投个票吧?

rating: 0+x

留下你的评论

Add a New Comment