Custom Function Methods

leonhu
Published 2017/07/09 15:37

Using the pandas apply() function to run custom operations on data

import pandas as pd
import numpy as np
# read titanic_train.csv
titanic_survival = pd.read_csv('titanic_train.csv')

# print the first 5 rows
print(titanic_survival.head())
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  
# get the Age column
# NaN stands for "not a number" and marks a missing value
age = titanic_survival['Age']
print(age[:10])
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
5     NaN
6    54.0
7     2.0
8    27.0
9    14.0
Name: Age, dtype: float64
# pandas.isnull() takes a pandas Series
# and returns a Series of True and False values
age_is_null = pd.isnull(age)
print(age_is_null[:10])
0    False
1    False
2    False
3    False
4    False
5     True
6    False
7    False
8    False
9    False
Name: Age, dtype: bool
age_null_true = age[age_is_null]
print(age_null_true[:10])
5    NaN
17   NaN
19   NaN
26   NaN
28   NaN
29   NaN
31   NaN
32   NaN
36   NaN
42   NaN
Name: Age, dtype: float64
# count the passengers whose age is missing
age_null_count = len(age_null_true)
print(age_null_count)
177
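The same count can be obtained without building the filtered Series first, because summing a boolean mask counts its True values. A minimal sketch on a small made-up Series (the values are illustrative and are not taken from titanic_train.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Age column; not the real Titanic data
age = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan], name='Age')

# True counts as 1 and False as 0, so summing the mask counts the missing values
age_null_count = int(pd.isnull(age).sum())
print(age_null_count)
```

Either approach should give the same number; the mask sum just skips the intermediate Series.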
# Python's built-in sum() propagates NaN, so mean_age comes out as NaN:
# any arithmetic that involves a null value also yields a null value
mean_age = sum(titanic_survival['Age']) / len(titanic_survival['Age'])
print(mean_age)
nan
good_ages = titanic_survival['Age'][~age_is_null]
correct_mean_age = sum(good_ages) / len(good_ages)
print(correct_mean_age)
29.6991176471
# missing data is so common that many pandas methods automatically filter for it
correct_mean_age = titanic_survival['Age'].mean()
print(correct_mean_age)
29.69911764705882
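To make the difference between the two calculations concrete, here is a small sketch on a made-up Series (an assumption for illustration, not the real Age column). It contrasts the built-in sum() with Series.mean(), whose skipna parameter defaults to True:

```python
import math
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value
age = pd.Series([20.0, 30.0, np.nan, 40.0])

# The built-in sum() adds the NaN like any other float, so the total is NaN
naive_mean = sum(age) / len(age)

# Series.mean() skips missing values by default (skipna=True)
skipna_mean = age.mean()

# Passing skipna=False reproduces the NaN result
strict_mean = age.mean(skipna=False)

print(naive_mean, skipna_mean, strict_mean)
```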
# mean fare for each passenger class
passenger_classes = [1, 2, 3]
fare = {}
for each_class in passenger_classes:
    mean_fee = titanic_survival['Fare'][titanic_survival['Pclass'] == each_class].mean()
    fare[each_class] = mean_fee
print(fare)
{1: 84.15468749999992, 2: 20.66218315217391, 3: 13.675550101832997}
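The loop over classes can probably be replaced by a groupby, which avoids hard-coding the list of class values. The sketch below uses a tiny made-up DataFrame (the column names follow the Titanic data, the numbers are invented) and checks that both approaches agree:

```python
import pandas as pd

# Hypothetical sample; column names follow the Titanic data set
df = pd.DataFrame({
    'Pclass': [1, 1, 2, 3, 3],
    'Fare': [80.0, 60.0, 20.0, 8.0, 12.0],
})

# Loop version: filter the Fare column once per class
fare_by_loop = {}
for each_class in [1, 2, 3]:
    fare_by_loop[each_class] = df['Fare'][df['Pclass'] == each_class].mean()

# groupby version: the classes are discovered from the data
fare_by_groupby = df.groupby('Pclass')['Fare'].mean().to_dict()

print(fare_by_loop)
print(fare_by_groupby)
```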
# aggfunc='mean' is also the default; recent pandas versions prefer the string over np.mean
pclass_survived = titanic_survival.pivot_table(
    index='Pclass', values='Survived', aggfunc='mean')
print(pclass_survived)
Pclass
1    0.629630
2    0.472826
3    0.242363
Name: Survived, dtype: float64
pclass_age = titanic_survival.pivot_table(index='Pclass', values=['Age', 'Survived'])
print(pclass_age)
              Age  Survived
Pclass                     
1       38.233441  0.629630
2       29.877630  0.472826
3       25.140620  0.242363
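pivot_table also accepts a dict for aggfunc, so each value column can use its own aggregation. A minimal sketch on invented data (not the real file), taking the mean of Age but the sum of Survived:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names follow the Titanic data set
df = pd.DataFrame({
    'Pclass': [1, 1, 2, 3, 3],
    'Age': [40.0, np.nan, 30.0, 20.0, 24.0],
    'Survived': [1, 1, 0, 0, 1],
})

# Mean age per class (missing ages are skipped), number of survivors per class
summary = df.pivot_table(
    index='Pclass',
    values=['Age', 'Survived'],
    aggfunc={'Age': 'mean', 'Survived': 'sum'},
)
print(summary)
```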
name_row_19 = titanic_survival.loc[19, 'Name']
print(name_row_19)
Masselmani, Mrs. Fatima
new_titanic_survival = titanic_survival.sort_values('Age',ascending=False)
# print(new_titanic_survival)
# drop=True discards the old row labels instead of keeping them as extra columns
new_titanic_survival = new_titanic_survival.reset_index(drop=True)
print(new_titanic_survival.loc[0:2])
   PassengerId  Survived  Pclass  \
0          631         1       1   
1          852         0       3   
2          494         0       1   

                                   Name   Sex   Age  SibSp  Parch    Ticket  \
0  Barkworth, Mr. Algernon Henry Wilson  male  80.0      0      0     27042   
1                   Svensson, Mr. Johan  male  74.0      0      0    347060   
2               Artagaveytia, Mr. Ramon  male  71.0      0      0  PC 17609   

      Fare Cabin Embarked  
0  30.0000   A23        S  
1   7.7750   NaN        S  
2  49.5042   NaN        C  
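Sorting keeps the original row labels attached to each row, which is why the index is reset afterwards. A small sketch on made-up data (names and ages are invented) showing what .loc[0] returns before and after reset_index(drop=True):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'Age': [22.0, 80.0, 54.0],
})

by_age = df.sort_values('Age', ascending=False)
# The original labels travel with the rows: label 0 is still passenger 'a'
print(by_age.loc[0, 'Name'])

# After the reset, label 0 is the first row of the sorted frame
renumbered = by_age.reset_index(drop=True)
print(renumbered.loc[0, 'Name'])
```

Without drop=True, the old labels would be kept in a new column named index.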
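# DataFrame.apply() calls the function once per column by default
def hundredth_row(column):
    # Extract the hundredth item (position 99, because positions start at 0)
    hundredth_item = column.iloc[99]
    return hundredth_item
# store the result under a new name so the function itself is not overwritten
hundredth_row_values = titanic_survival.apply(hundredth_row)
print(hundredth_row_values)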
PassengerId                  100
Survived                       0
Pclass                         2
Name           Kantor, Mr. Sinai
Sex                         male
Age                           34
SibSp                          1
Parch                          0
Ticket                    244367
Fare                          26
Cabin                        NaN
Embarked                       S
dtype: object
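Any function that takes a Series and returns a single value can be used the same way. As a further sketch, on invented data rather than the real file, the hypothetical helper count_missing below counts the null values in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names follow the Titanic data set
df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0],
    'Cabin': [np.nan, 'C85', np.nan],
    'Fare': [7.25, 71.28, 7.92],
})

def count_missing(column):
    # column is a Series holding one column of the DataFrame
    return pd.isnull(column).sum()

# With the default axis=0, apply() passes each column to the function in turn
missing_per_column = df.apply(count_missing)
print(missing_per_column)
```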
# By passing in the axis = 1 argument, we can use the DataFrame.apply method
# to iterate rows instead of columns
def which_class(row):
    pclass = row['Pclass']
    if pd.isnull(pclass):
        return 'Unknown'
    elif pclass == 1:
        return 'First Class'
    elif pclass == 2:
        return 'Second Class'
    elif pclass == 3:
        return 'Third Class'
classes = titanic_survival.apply(which_class, axis=1)
print(classes[:10])
0     Third Class
1     First Class
2     Third Class
3     First Class
4     Third Class
5     Third Class
6     First Class
7     Third Class
8     Third Class
9    Second Class
dtype: object
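When the function only looks at a single column, a dictionary lookup with Series.map is a possible alternative to a row-wise apply. This is a sketch on a made-up Pclass column, not the approach used in the post:

```python
import numpy as np
import pandas as pd

# Hypothetical Pclass values, including one missing entry
pclass = pd.Series([3, 1, 2, np.nan], name='Pclass')

labels = {1: 'First Class', 2: 'Second Class', 3: 'Third Class'}

# map() looks each value up in the dict; values without a match become NaN,
# which fillna() then replaces with 'Unknown'
classes = pclass.map(labels).fillna('Unknown')
print(classes)
```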
def is_minor(row):
    # NaN < 18 evaluates to False, so a missing age is reported as not a minor
    return row['Age'] < 18
minors = titanic_survival.apply(is_minor, axis=1)
print(minors[:5])
0    False
1    False
2    False
3    False
4    False
dtype: bool
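A simple comparison like this can also be written directly on the column, without apply. The sketch below, on invented ages, checks that the row-wise and the vectorized version agree, including for a missing age:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value
df = pd.DataFrame({'Age': [22.0, 2.0, np.nan, 14.0]})

def is_minor(row):
    return row['Age'] < 18

# Row-wise version: the function is called once per row
by_apply = df.apply(is_minor, axis=1)

# Vectorized version: the comparison runs on the whole column at once
by_vector = df['Age'] < 18

print(by_vector.tolist())
```

On large DataFrames the vectorized form is usually much faster, since it avoids a Python function call per row.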
def generate_age_label(row):
    age = row['Age']
    if pd.isnull(age):
        return 'Unknown'
    elif age < 18:
        return 'minor'
    else:
        return 'adult'
age_label = titanic_survival.apply(generate_age_label, axis=1)
print(age_label[:5])
0    adult
1    adult
2    adult
3    adult
4    adult
dtype: object
titanic_survival['age_label'] = age_label
age_group_survival = titanic_survival.pivot_table(index='age_label', values='Survived')
print(age_group_survival)
age_label
Unknown    0.293785
adult      0.381032
minor      0.539823
Name: Survived, dtype: float64
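The labelling step could also be done without a row-wise apply, for example with numpy.select, which picks the first matching condition for each row. A minimal sketch on invented data (not the real file), followed by the same kind of per-group survival rate:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names follow the Titanic data set
df = pd.DataFrame({
    'Age': [30.0, 5.0, np.nan, 45.0, 16.0, np.nan],
    'Survived': [0, 1, 0, 1, 1, 1],
})

# Conditions are checked in order; rows matching none get the default label
conditions = [df['Age'].isnull(), df['Age'] < 18]
choices = ['Unknown', 'minor']
df['age_label'] = np.select(conditions, choices, default='adult')

# Mean of Survived per label, i.e. the survival rate of each group
age_group_survival = df.groupby('age_label')['Survived'].mean()
print(age_group_survival)
```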

© Copyright belongs to the author