# Python Data Analysis for Beginners (3): Pandas (2), the Series Data Structure

2019/04/10 10:10

## Introduction

Pandas has two main data structures, Series and DataFrame. This article covers the first one, Series.

## Importing Modules

``````
import numpy as np
import pandas as pd
``````

## Series

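A Series can be thought of as a one-dimensional labeled array. It can hold integers, floating-point numbers, strings, Python objects, and other data types, and each value is paired with an index label.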

``````
s = pd.Series(np.random.rand(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
print(s.index)

s1 = pd.Series(np.random.randn(5))
print(s1)
``````

``````
a    0.218164
b    0.153201
c    0.572437
d    0.142784
e    0.710664
dtype: float64
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
0    0.255452
1    1.354357
2    2.092490
3    0.353899
4    1.692989
dtype: float64
``````
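To illustrate the claim that a Series can hold different kinds of data, here is a minimal sketch. The sample values are hypothetical and were chosen only to show how pandas infers the `dtype` from the data it is given.

```python
import pandas as pd

# Hypothetical sample data, used only to show dtype inference.
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2.5, 3.5])
mixed = pd.Series([1, "two", {"three": 3}])  # arbitrary Python objects

print(ints.dtype)    # an integer dtype
print(floats.dtype)  # float64
print(mixed.dtype)   # object
```

When the values do not share a single numeric type, pandas falls back to the generic `object` dtype, which is flexible but slower for numeric work.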

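When the data is an array, the index must have the same length as the data. Otherwise pandas raises a `ValueError`:

``````
s = pd.Series(np.random.rand(6), index=['a', 'b', 'c', 'd', 'e'])
``````

``````
ValueError: Length of passed values is 6, index implies 5
``````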
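The length check can be handled explicitly. The sketch below is one possible way to react to the error; the fallback of truncating the values is an assumption made purely for illustration, not a recommended policy.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c', 'd', 'e']
values = np.random.rand(6)  # one value too many for five labels

try:
    s_bad = pd.Series(values, index=labels)
except ValueError as exc:
    print(f"Could not build the Series: {exc}")
    # Illustrative fallback only: keep as many values as there are labels.
    s_ok = pd.Series(values[:len(labels)], index=labels)

print(s_ok)
```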

## Instantiating from a Dictionary

A Series can be instantiated from a dictionary. The keys become the index labels:

``````
d = {'b': 1, 'a': 0, 'c': 2}
s2 = pd.Series(d)
print(s2)
``````

``````
b    1
a    0
c    2
dtype: int64
``````

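If an index is passed along with the dictionary, values are looked up by label in the order of that index. Labels that are missing from the dictionary get `NaN`, and the dtype becomes `float64` to accommodate it:

``````
s3 = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(s3)
``````

``````
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
``````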
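Missing labels usually need follow-up handling. The sketch below shows a few standard pandas calls for detecting and replacing `NaN`; the fill value `-1` is an arbitrary placeholder chosen for this example.

```python
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}
s3 = pd.Series(d, index=['b', 'c', 'd', 'a'])

# Boolean mask marking the labels that had no value in the dictionary.
print(s3.isna())

# Drop the missing entries entirely.
print(s3.dropna())

# Or replace them (here with an arbitrary -1) and restore an integer dtype.
filled = s3.fillna(-1).astype('int64')
print(filled)
```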

## Instantiating from a Scalar

The `data` argument can also be a scalar value. In that case an index must be provided, and the Series repeats the scalar to match the length of the index:

``````
s4 = pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
print(s4)
``````

``````
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64
``````
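A scalar-initialized Series is a convenient starting point when every label should begin with the same value. The following sketch assumes that use case and shows that each label can then be updated on its own.

```python
import pandas as pd

index = ['a', 'b', 'c', 'd', 'e']
s4 = pd.Series(5., index=index)

# The scalar was repeated once per index label.
print(len(s4) == len(index))
print((s4 == 5.).all())

# Each label can be updated independently afterwards.
s4['a'] = 1.
print(s4)
```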

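## Index-Based Operations

A Series behaves much like a NumPy `ndarray`: it supports positional access, slicing, and boolean masks. Positional access is written with `.iloc` here, because plain integer keys such as `s[0]` on a label-indexed Series are deprecated in recent pandas versions.

``````
print(s.iloc[0])           # first element by position
print(s.iloc[:3])          # first three elements
print(s[s > s.median()])   # elements greater than the median
print(s.iloc[[4, 3, 1]])   # elements at positions 4, 3, and 1
``````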

``````
0.481205137399224

a    0.481205
b    0.045604
c    0.108321
dtype: float64

d    0.495208
e    0.817171
dtype: float64

e    0.817171
d    0.495208
b    0.045604
dtype: float64
``````
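Label-based and position-based access can be spelled out explicitly with `.loc` and `.iloc`. The sketch below uses fixed, hypothetical values (rounded versions of the numbers above) so that the results are reproducible.

```python
import pandas as pd

# Hypothetical fixed values so the results do not depend on random numbers.
s = pd.Series([0.48, 0.05, 0.11, 0.50, 0.82],
              index=['a', 'b', 'c', 'd', 'e'])

first_by_position = s.iloc[0]     # position 0
first_by_label = s.loc['a']       # label 'a', the same element
above_median = s[s > s.median()]  # boolean mask keeps labels 'd' and 'e'
picked = s.iloc[[4, 3, 1]]        # positions 4, 3, 1 -> labels 'e', 'd', 'b'

print(first_by_position, first_by_label)
print(above_median)
print(picked)
```

Being explicit about `.loc` versus `.iloc` avoids ambiguity, especially when the index itself contains integers.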

A Series also behaves like a fixed-size dictionary: values can be read and set by index label.

``````
print(s['a'])
s['e'] = 12.
print(s)
``````

``````
0.481205137399224

a     0.481205
b     0.045604
c     0.108321
d     0.495208
e    12.000000
dtype: float64
``````

``````
# Accessing a label that is not in the index raises a KeyError
# print(s['f'])
``````

``````
print('e' in s)
print('f' in s)
``````

``````
True
False
``````

The `get` method returns `None`, or a default value you supply, when the label is missing:

``````
print(s.get('f'))
print(s.get('f', np.nan))
``````

``````
None
nan
``````
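The three ways of dealing with a possibly missing label (membership test, `get`, and catching `KeyError`) can be compared side by side. In this sketch the `lookup` helper and the sample values are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.48, 0.05], index=['a', 'b'])  # hypothetical values


def lookup(series, label, default=np.nan):
    """Hypothetical helper: return the value for label, or default if absent."""
    return series[label] if label in series else default


print(lookup(s, 'a'))              # existing label
print(lookup(s, 'f'))              # missing label falls back to the default
print(s.get('f', default=np.nan))  # built-in equivalent of the helper

try:
    s['f']
except KeyError as exc:
    print(f"KeyError: {exc}")
```

In practice `Series.get` already covers what the helper does; the helper is shown only to make the membership test explicit.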

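## Common Methods

``````
# e raised to the power of each element (e is the constant ≈ 2.71828)
print(np.exp(s))
# Square root of each element
print(np.sqrt(s))
print(s.dtype)       # data type of the values
print(s.array)       # the underlying pandas extension array
print(s.to_numpy())  # the values as a NumPy ndarray
``````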

``````
a    1.618023
b    1.046659
c    1.114406
d    1.640840
e    2.264085
dtype: float64

a    0.693690
b    0.213550
c    0.329122
d    0.703711
e    0.903975
dtype: float64

float64

<PandasArray>
[  0.481205137399224, 0.04560362121419126, 0.10832121726528887,
0.49520848929233285,  0.8171705710254773]
Length: 5, dtype: float64

[0.48120514 0.04560362 0.10832122 0.49520849 0.81717057]
``````
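The difference between applying a NumPy function to a Series and converting it with `to_numpy()` is worth seeing directly. The sketch below uses hypothetical values with exact square roots so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical values with exact square roots.
s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

roots = np.sqrt(s)   # NumPy ufuncs return a Series and keep the labels
raw = s.to_numpy()   # a plain ndarray; the labels are dropped

print(type(roots).__name__, roots.index.tolist())
print(type(raw).__name__, raw)
```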

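Operations between two Series align automatically on index labels. A label that appears in only one of the operands produces `NaN` in the result:

``````
print(s[1:] + s[:-1])
``````

``````
a         NaN
b    0.091207
c    0.216642
d    0.990417
e         NaN
dtype: float64
``````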
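When `NaN` for unmatched labels is not wanted, `Series.add` accepts a `fill_value` that stands in for the missing side. A minimal sketch with hypothetical values:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])  # hypothetical values
left = s.iloc[1:]    # labels 'b', 'c'
right = s.iloc[:-1]  # labels 'a', 'b'

# Plain addition: only 'b' exists on both sides, the rest become NaN.
aligned = left + right
print(aligned)

# With fill_value=0, a label missing on one side is treated as 0.
filled = left.add(right, fill_value=0)
print(filled)
```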

## The name Attribute

A Series has a `name` attribute, so you can give any Series you create a name of your choice:

``````
s5 = pd.Series(np.random.randn(5), name='my_series')
print(s5)
print(s5.name)
print(id(s5))
``````

``````
0    0.491450
1    0.939965
2    0.868437
3   -0.099575
4    1.866875
Name: my_series, dtype: float64

my_series

1492397351368
``````

``````
# Rename the Series; rename returns a new object and leaves s5 unchanged
print(id(s5))
s6 = s5.rename("my_series_different")
print(s6)
print(id(s6))
``````

``````
1492397351368
0    0.491450
1    0.939965
2    0.868437
3   -0.099575
4    1.866875
Name: my_series_different, dtype: float64

1492400065800
``````
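The differing `id` values indicate that `rename` produces a new Series object. The sketch below checks this directly and also shows one place where the name is used: when a Series is converted to a DataFrame with `to_frame()`, the name becomes the column label.

```python
import numpy as np
import pandas as pd

s5 = pd.Series(np.random.randn(5), name='my_series')
s6 = s5.rename('my_series_different')

print(s6 is s5)  # False: rename returned a new object
print(s5.name)   # the original keeps its name
print(s6.name)

# The values are the same; only the name differs.
print((s5.to_numpy() == s6.to_numpy()).all())

# The name becomes the column label when converting to a DataFrame.
print(s6.to_frame().columns.tolist())
```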

## References

https://www.pypandas.cn/docs/getting_started/dsintro.html
