灰大羊

## Getting the data from the Web

``````
if (!file.exists("./db")) {
  dir.create("./db")
}
``````
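
The block above only creates the target directory; the step that actually produces `restData` is not shown. A minimal sketch of the usual pattern follows. The file names and the tiny CSV are hypothetical stand-ins so the example runs offline; with a real dataset, the `file.copy()` call would be replaced by `download.file(url, destfile = dest)`.

```r
# Create the data directory if it does not exist yet (inside tempdir() here).
data_dir <- file.path(tempdir(), "db")
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# Hypothetical stand-in for the remote file: a tiny, made-up CSV written locally.
src <- file.path(tempdir(), "source.csv")
write.csv(
  data.frame(
    name = c("A", "B", "C"),
    zipCode = c(21201L, 21202L, 21212L),
    councilDistrict = c(11L, 1L, 4L)
  ),
  src,
  row.names = FALSE
)

# Stand-in for download.file(url, destfile = dest).
dest <- file.path(data_dir, "restaurants.csv")
file.copy(src, dest, overwrite = TRUE)

# Load the saved file into a data frame.
restData <- read.csv(dest)
str(restData)
```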

## Looking at a bit of the data

``````
head(restData, n = 3)
tail(restData, n = 3)
``````

## Make a summary

``````
summary(restData)
``````

## More in-depth information

``````
str(restData)
``````

## Quantiles of quantitative variables

The generic function `quantile` produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.

``````
> quantile(restData$councilDistrict, na.rm = T)
  0%  25%  50%  75% 100%
   1    2    9   11   14
> quantile(restData$councilDistrict, probs = c(0.5, 0.75, 0.9))
50% 75% 90%
  9  11  12
``````
• `x` - numeric vector whose sample quantiles are wanted, or an object of a class for which a method has been defined. NA and NaN values are not allowed in numeric vectors unless `na.rm` is TRUE.
• `probs` - numeric vector of probabilities with values in [0,1]. (Values up to 2e-14 outside that range are accepted and moved to the nearby endpoint.)
• `na.rm` - logical; if TRUE, any NA and NaN values are removed from `x` before the quantiles are computed.
• `names` - logical; if TRUE, the result has a names attribute. Set to FALSE for a speedup with many `probs`.
• `type` - an integer between 1 and 9 selecting one of the nine quantile algorithms described in `?quantile`; the default is 7.
• `...` - further arguments passed to or from other methods.
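
To make the `na.rm`, `names`, and `type` arguments concrete, here is a small sketch on a made-up vector (not the restaurant data). It shows that a missing value causes an error unless `na.rm = TRUE`, and that different `type` values can give different answers for the same probability.

```r
# Toy vector (made up for illustration), including one missing value.
x <- c(1, 2, 9, 11, 14, NA)

# Without na.rm = TRUE, quantile() stops with an error when x contains NA.
res <- tryCatch(quantile(x), error = function(e) "error")

# Default algorithm (type = 7) interpolates linearly between order statistics.
q7 <- quantile(x, probs = 0.9, na.rm = TRUE, names = FALSE)

# type = 1 inverts the empirical CDF, so it always returns an observed value.
q1 <- quantile(x, probs = 0.9, na.rm = TRUE, names = FALSE, type = 1)

print(res)  # "error"
print(q7)   # 12.8
print(q1)   # 14
```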

## Make a table

``````
> table(restData$zipCode, useNA = "ifany")

-21226  21201  21202  21205  21206  21207  21208  21209  21210  21211  21212  21213  21214  21215  21216  21217  21218  21220
     1    136    201     27     30      4      1      8     23     41     28     31     17     54     10     32     69      1

> table(restData$councilDistrict, restData$zipCode)

    -21226 21201 21202 21205 21206 21207 21208 21209 21210 21211 21212 21213 21214 21215 21216 21217 21218 21220 21222 21223
  1      0     0    37     0     0     0     0     0     0     0     0     2     0     0     0     0     0     0     7     0
  2      0     0     0     3    27     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  3      0     0     0     0     0     0     0     0     0     0     0     2    17     0     0     0     3     0     0     0
  4      0     0     0     0     0     0     0     0     0     0    27     0     0     0     0     0     0     0     0     0
  5      0     0     0     0     0     3     0     6     0     0     0     0     0    31     0     0     0     0     0     0
  6      0     0     0     0     0     0     0     1    19     0     0     0     0    15     1     0     0     0     0     0
``````

## Check for missing values

``````
sum(is.na(restData$councilDistrict))
any(is.na(restData$councilDistrict))
all(restData$zipCode > 0)
``````

## Row and column sums

``````
colSums(is.na(restData))
all(colSums(is.na(restData)) == 0)
all(restData$zipCode > 0)
``````
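
The block above only uses `colSums`. As a complement, the sketch below applies both `colSums` and `rowSums` to a made-up data frame (hypothetical values, not the restaurant data) to count missing values per column and per row.

```r
# Toy data frame (made up for illustration) with two missing values.
df <- data.frame(
  zipCode = c(21201, NA, 21212),
  councilDistrict = c(11, 4, NA)
)

na_per_col <- colSums(is.na(df))  # missing values in each column
na_per_row <- rowSums(is.na(df))  # missing values in each row

print(na_per_col)
print(na_per_row)

# Keep only the rows that have no missing values.
complete_rows <- df[na_per_row == 0, ]
print(complete_rows)
```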

## Values with specific characteristics

``````
> table(restData$zipCode %in% c("21212"))

FALSE  TRUE
 1299    28

> table(restData$zipCode %in% c("21212", "21213"))

FALSE  TRUE
 1268    59

> restData[restData$zipCode %in% c("21212", "21213"), ]
                                     name zipCode                neighborhood councilDistrict policeDistrict
29                      BAY ATLANTIC CLUB   21212                    Downtown              11        CENTRAL
39                            BERMUDA BAR   21213               Broadway East              12        EASTERN
92                              ATWATER'S   21212   Chinquapin Park-Belvedere               4       NORTHERN
111            BALTIMORE ESTONIAN SOCIETY   21213          South Clifton Park              12        EASTERN
187                              CAFE ZEN   21212                    Rosebank               4       NORTHERN
``````

## Cross tabs

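``````
data(UCBAdmissions)
DF <- as.data.frame(UCBAdmissions)   # convert the built-in table to a data frame with a Freq column
summary(DF)

# Freq must be a numeric (integer or double) column, because xtabs sums it within each cell
xt <- xtabs(Freq ~ Gender + Admit, data = DF)
xt
``````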
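
To show what the left-hand side of the formula does, here is a small sketch on a made-up data frame (hypothetical counts, not `UCBAdmissions`). With `Freq` on the left, `xtabs` sums that column within each cell; with an empty left-hand side, it simply counts rows.

```r
# Toy data (made up for illustration): two rows share Gender = Male, Admit = Admitted.
df <- data.frame(
  Gender = c("Male", "Male", "Female", "Female", "Male"),
  Admit  = c("Admitted", "Rejected", "Admitted", "Rejected", "Admitted"),
  Freq   = c(10, 5, 7, 3, 2)
)

# Sum Freq within each Gender x Admit cell.
xt_sum <- xtabs(Freq ~ Gender + Admit, data = df)

# Empty left-hand side: count the rows in each cell instead.
xt_count <- xtabs(~ Gender + Admit, data = df)

print(xt_sum)
print(xt_count)
```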

## Flat tables

``````
> warpbreaks$replicate <- rep(1:9, len = 54)
> xt = xtabs(breaks ~., data = warpbreaks)        ## equivalent to xtabs(breaks ~ wool + tension + replicate, data = warpbreaks)
> xt
, , replicate = 1

    tension
wool  L  M  H
   A 26 18 36
   B 27 42 20

, , replicate = 2

    tension
wool  L  M  H
   A 30 21 21
   B 14 26 21

, , replicate = 3

    tension
wool  L  M  H
   A 54 29 24
   B 29 19 24

> ftable(xt)
             replicate  1  2  3  4  5  6  7  8  9
wool tension
A    L                 26 30 54 25 70 52 51 26 67
     M                 18 21 29 17 12 18 35 30 36
     H                 36 21 24 18 10 43 28 15 26
B    L                 27 14 29 19 29 31 41 20 44
     M                 42 26 19 16 39 28 21 39 29
     H                 20 21 24 17 13 15 15 16 28
``````

## Size of a data set

``````
> fakeData = rnorm(1e5)
> object.size(fakeData)
800040 bytes
> print(object.size(fakeData), units = "Mb")
0.8 Mb
``````
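
The reported size is roughly 8 bytes per double plus a small vector header. A minimal sketch that separates the two parts (the exact header size depends on the R version and platform, so it is computed rather than assumed):

```r
x <- numeric(1e5)                     # 100,000 doubles
bytes <- as.numeric(object.size(x))   # total size in bytes
overhead <- bytes - 8 * length(x)     # vector header; varies by R version/platform

print(bytes)
print(overhead)
print(format(object.size(x), units = "Kb"))
```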
