Python Data Analysis for Beginners (4): Pandas (Part 3), the DataFrame Data Structure

2019/04/10 10:10

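Introduction

A DataFrame is a two-dimensional labeled data structure whose columns can hold different types.

DataFrame is the most commonly used pandas object. Like Series, it accepts many kinds of input data:

• A dict of 1-D ndarrays, lists, dicts, or Series
• A 2-D numpy.ndarray
• A structured or record ndarray
• A Series
• Another DataFrame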
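
As a rough illustration of the input types listed above, the sketch below builds small DataFrames from a 2-D ndarray, a structured array, and a single Series. The variable names and the sample values are made up for this example and do not come from the original article.

```python
import numpy as np
import pandas as pd

# 2-D ndarray: column labels are passed explicitly here;
# without them pandas would fall back to 0, 1, 2.
from_2d = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['x', 'y', 'z'])

# Structured array: the field names become the column names.
records = np.array([(1, 2.0), (3, 4.0)],
                   dtype=[('id', 'i4'), ('score', 'f8')])
from_records = pd.DataFrame(records)

# A named Series becomes a one-column DataFrame that keeps the Series index.
s = pd.Series([10, 20], index=['a', 'b'], name='value')
from_series = pd.DataFrame(s)

print(from_2d)
print(from_records)
print(from_series)
```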

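Constructing a DataFrame

With Python < 3.6 or pandas < 0.23, if the columns argument is not specified, the DataFrame columns are sorted lexically by dict key. On newer versions the columns follow the dict's insertion order.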
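
A minimal sketch of the ordering rule, assuming a recent Python and pandas: the dict keys are deliberately written out of alphabetical order, so the default column order should mirror insertion order, while passing columns fixes the order on any version. The names data, df_default, and df_explicit are illustrative only.

```python
import pandas as pd

# Keys are intentionally not in alphabetical order.
data = {'b': [1, 2], 'a': [3, 4]}

# Without columns=, the order depends on the Python/pandas version.
df_default = pd.DataFrame(data)

# With columns=, the order is whatever you ask for.
df_explicit = pd.DataFrame(data, columns=['a', 'b'])

print(list(df_default.columns))
print(list(df_explicit.columns))
```

Passing columns explicitly is the safer choice when the code has to behave the same across environments.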

From a dict of Series or a dict of dicts

```python
import pandas as pd

# The two Series have different indexes; the result uses their union.
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)
```

```
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
```

```python
# Passing index selects and reorders the rows.
df1 = pd.DataFrame(d, index=['d', 'b', 'a'])
print(df1)
```

```
   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0
```

```python
# Passing columns selects columns; 'three' is not a key of d, so it is all NaN.
df2 = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
print(df2)
```

```
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN
```

From a dict of ndarrays or lists

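```python
d1 = {'one': [1., 2., 3., 4.],
      'two': [4., 3., 2., 1.]}

# Without an index, rows are labeled 0..n-1.
df3 = pd.DataFrame(d1)
print(df3)

# An explicit index must have the same length as the lists.
df4 = pd.DataFrame(d1, index=['a', 'b', 'c', 'd'])
print(df4)
```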

```
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0
```
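
When building from a dict of ndarrays or lists, all values must have the same length, and an explicit index must match that length too. The sketch below shows the same construction with NumPy arrays and what happens on a length mismatch; arrays, df_arrays, and mismatch_error are names invented for this example.

```python
import numpy as np
import pandas as pd

arrays = {'one': np.array([1., 2., 3., 4.]),
          'two': np.array([4., 3., 2., 1.])}
df_arrays = pd.DataFrame(arrays, index=['a', 'b', 'c', 'd'])
print(df_arrays)

# Arrays of different lengths cannot be combined into one DataFrame.
mismatch_error = None
try:
    pd.DataFrame({'one': np.array([1., 2.]),
                  'two': np.array([1., 2., 3.])})
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```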

From a list of dicts

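```python
# Each dict is one row; keys missing from a row become NaN.
d2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

df5 = pd.DataFrame(d2)
print(df5)

# index labels the rows; columns keeps only 'a' and 'b'.
df6 = pd.DataFrame(d2, index=['first', 'second'], columns=['a', 'b'])
print(df6)
```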

```
   a   b     c
0  1   2   NaN
1  5  10  20.0

        a   b
first   1   2
second  5  10
```

From a dict of tuples

```python
# Tuple keys produce a MultiIndex on both the columns and the rows.
d3 = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
      ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
      ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
      ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
      ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}}

df7 = pd.DataFrame(d3)
print(df7)
```

```
       a              b
       b    a    c    a     b
A B  1.0  4.0  5.0  8.0  10.0
  C  2.0  3.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0
```
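
To make the tuple-keyed result a little more concrete, here is a small sketch of how such a frame can be queried. It uses a cut-down version of the dict (two columns only), and the names small, sub, and cell are hypothetical.

```python
import pandas as pd

small = pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
                      ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4}})

# Tuple keys of the outer dict become a column MultiIndex.
print(isinstance(small.columns, pd.MultiIndex))

# Selecting the outer level 'a' returns the sub-frame with the inner level left.
sub = small['a']
print(sub)

# A full (row tuple, column tuple) pair addresses a single cell.
cell = small.loc[('A', 'B'), ('a', 'b')]
print(cell)
```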

Selecting, adding, and deleting

Selection

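```python
# Print the whole DataFrame
print(df4)
# Select a column by name
print(df4['one'])
# Select a row by label, then by integer position
print(df4.loc['a'])
print(df4.iloc[0])

# Add columns computed from existing ones
df4['three'] = df4['one'] * df4['two']
df4['flag'] = df4['one'] > 2
print(df4)
```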

```
   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0

a    1.0
b    2.0
c    3.0
d    4.0
Name: one, dtype: float64

one    1.0
two    4.0
Name: a, dtype: float64

one    1.0
two    4.0
Name: a, dtype: float64

   one  two  three   flag
a  1.0  4.0    4.0  False
b  2.0  3.0    6.0  False
c  3.0  2.0    6.0   True
d  4.0  1.0    4.0   True
```
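
Beyond single rows and columns, the same accessors also take slices and boolean masks. The following sketch rebuilds a small frame so it runs on its own; the names frame, first_two, and large are assumptions for illustration and are not part of the original walkthrough.

```python
import pandas as pd

frame = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]},
                     index=['a', 'b', 'c', 'd'])

col = frame['one']             # one column, returned as a Series
row_by_label = frame.loc['b']  # one row, looked up by index label
row_by_pos = frame.iloc[1]     # the same row, looked up by position

first_two = frame.iloc[0:2]    # positional slice of rows, returns a DataFrame
large = frame[frame['one'] > 2]  # boolean mask keeps rows where it is True

print(first_two)
print(large)
```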

Deletion

```python
# Delete columns
del df4['two']      # removes the column in place
df4.pop('three')    # removes the column in place and returns it as a Series
print(df4)
```

```
   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  4.0   True
```
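
Both del and pop modify the DataFrame in place. As a point of comparison that the original article does not cover, DataFrame.drop returns a new frame and leaves the original untouched. A minimal sketch with made-up data:

```python
import pandas as pd

frame = pd.DataFrame({'one': [1., 2.], 'two': [3., 4.], 'three': [5., 6.]})

del frame['two']                 # in place, nothing returned
popped = frame.pop('three')      # in place, returns the removed column

# drop() builds a new DataFrame; frame itself still has column 'one'.
trimmed = frame.drop(columns=['one'])

print(list(frame.columns))
print(popped.tolist())
print(list(trimmed.columns))
```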

Addition

```python
# Assigning a scalar broadcasts it to every row of the new column
df4['foo'] = 'bar'
print(df4)
```

```
   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  4.0   True  bar
```

```python
# A Series with a shorter index is aligned to df4's index; missing labels become NaN
df4['one_trunc'] = df4['one'][:2]
print(df4)
```

```
   one   flag  foo  one_trunc
a  1.0  False  bar        1.0
b  2.0  False  bar        2.0
c  3.0   True  bar        NaN
d  4.0   True  bar        NaN
```

```python
# New columns are appended at the end by default; insert() places one at a given position
df4.insert(1, 'bar', df4['one'])
print(df4)
```

```
   one  bar   flag  foo  one_trunc
a  1.0  1.0  False  bar        1.0
b  2.0  2.0  False  bar        2.0
c  3.0  3.0   True  bar        NaN
d  4.0  4.0   True  bar        NaN
```
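
The assignments above all change df4 in place. If you would rather keep the original frame unchanged, DataFrame.assign is one alternative worth knowing; it is not used in the original article, and the sketch below uses its own small frame with hypothetical column names.

```python
import pandas as pd

frame = pd.DataFrame({'one': [1., 2., 3.]}, index=['a', 'b', 'c'])

# assign() returns a copy with the new columns appended; frame is not modified.
# A callable receives the DataFrame being assigned to.
extended = frame.assign(double=frame['one'] * 2,
                        flag=lambda x: x['one'] > 1)

print(extended)
print(list(frame.columns))
```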

Reference

https://www.pypandas.cn/docs/getting_started/dsintro.html
