博客首页 » Data 用Python Pandas完成类似Excel的数据处理
发布于 11 Sep 2015 09:58
标签 blog
http://pbpython.com/excel-pandas-comp.html
这个文章里介绍了用Python Pandas完成类似Excel的数据处理
import pandas as pd
import numpy as np
#读取
df = pd.read_excel("excel-comp-data.xlsx")
df.head()
#合计
df["total"] = df["Jan"] + df["Feb"] + df["Mar"]
df.head()
#数据描述
df["Jan"].sum(),df["Jan"].mean(),df["Jan"].min(),df["Jan"].max()
(1462000, 97466.666666666672, 10000, 162000)
#合计
sum_row=df[["Jan","Feb","Mar","total"]].sum()
sum_row
Jan 1462000
Feb 1507000
Mar 717000
total 3686000
dtype: int64
#转置
df_sum=pd.DataFrame(data=sum_row).T
df_sum
Jan Feb Mar total
0 1462000 1507000 717000 3686000
#列补齐
df_sum=df_sum.reindex(columns=df.columns)
df_sum
account name street city state postal-code Jan Feb Mar total
0 NaN NaN NaN NaN NaN NaN 1462000 1507000 717000 3686000
#拼接
df_final=df.append(df_sum,ignore_index=True)
df_final.tail()
14 273274 McDermott PLC 8917 Bergstrom Meadow Kathryneborough Delaware 27933 150000 120000 70000 340000
15 NaN NaN NaN NaN NaN NaN 1462000 1507000 717000 3686000
#复杂的转换
process.extractOne("Minnesotta",choices=state_to_code.keys())
('MINNESOTA', 95)
process.extractOne("AlaBAMMazzz",choices=state_to_code.keys(),score_cutoff=80)
def convert_state(row):
abbrev = process.extractOne(row["state"],choices=state_to_code.keys(),score_cutoff=80)
if abbrev:
return state_to_code[abbrev[0]]
return np.nan
#添加列
df_final.insert(6, "abbrev", np.nan)
df_final.head()
account name street city state postal-code abbrev Jan Feb Mar total
0 211829 Kerluke, Koepp and Hilpert 34456 Sean Highway New Jaycob Texas 28752 NaN 10000 62000 35000 107000
#apply执行
df_final['abbrev'] = df_final.apply(convert_state, axis=1)
df_final.tail()
account name street city state postal-code abbrev Jan Feb Mar total
11 231907 Hahn-Moore 18115 Olivine Throughway Norbertomouth NorthDakota 31415 ND 150000 10000 162000 322000
#分组统计 subtotal groupby
df_sub=df_final[["abbrev","Jan","Feb","Mar","total"]].groupby('abbrev').sum()
df_sub
Jan Feb Mar total
abbrev
AR 150000 120000 35000 305000
CA 162000 120000 35000 317000
#applymap
def money(x):
return "${:,.0f}".format(x)
formatted_df = df_sub.applymap(money)
formatted_df
Jan Feb Mar total
abbrev
AR $150,000 $120,000 $35,000 $305,000
CA $162,000 $120,000 $35,000 $317,000
#
sum_row=df_sub[["Jan","Feb","Mar","total"]].sum()
sum_row
Jan 1462000
Feb 1507000
Mar 717000
total 3686000
dtype: int64
df_sub_sum=pd.DataFrame(data=sum_row).T
df_sub_sum=df_sub_sum.applymap(money)
df_sub_sum
Jan Feb Mar total
0 $1,462,000 $1,507,000 $717,000 $3,686,000
#添加
final_table = formatted_df.append(df_sub_sum)
final_table
Jan Feb Mar total
AR $150,000 $120,000 $35,000 $305,000
CA $162,000 $120,000 $35,000 $317,000
#换名
final_table = final_table.rename(index={0:"Total"})
final_table
Jan Feb Mar total
AR $150,000 $120,000 $35,000 $305,000
CA $162,000 $120,000 $35,000 $317,000
本页面的文字允许在知识共享 署名-相同方式共享 3.0协议和GNU自由文档许可证下修改和再使用,仅有一个特殊要求,请用链接方式注明文章引用出处及作者。请协助维护作者合法权益。
系列文章
文章列表
- Data 用Python Pandas完成类似Excel的数据处理
这篇文章对你有帮助吗,投个票吧?
留下你的评论