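As an R beginner, I ran into many problems while using the language. This article summarizes solutions to several common ones, in the hope that they help others who need them.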
Solutions to a Few Common R Problems
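When you first learn R you run into many problems, all of them small and scattered. The hardest part for me is that I never studied Matlab systematically either, so it takes a while to adjust to languages like this, especially to their way of thinking. I am learning as I go and collecting notes along the way, and here I write down some of the problems I have encountered. Each problem is self-contained and unrelated to the others.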
sprintf wraps the C function sprintf and can be used to format strings, for example to zero-pad numbers:
> sprintf("%04d", 1)
[1] "0001"
> sprintf("%04d", 104)
[1] "0104"
> sprintf("%010d", 104)
[1] "0000000104"
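As a further sketch (not part of the original notes): sprintf is vectorised over its arguments, so a single call can pad a whole vector of numbers. Base R's formatC() gives the same zero padding through its width and flag arguments. The ids vector and the file-name pattern below are made-up illustration data.

```r
# Made-up example ids to pad to four digits
ids <- c(1L, 25L, 104L)

# sprintf() is vectorised: one call formats every element
padded <- sprintf("%04d", ids)
print(padded)   # "0001" "0025" "0104"

# formatC() pads integers with zeros via width + flag = "0"
padded2 <- formatC(ids, width = 4, flag = "0")
print(padded2)  # "0001" "0025" "0104"

# The same idea builds numbered file names in one step
files <- sprintf("sample_%03d.txt", 1:3)
print(files)    # "sample_001.txt" "sample_002.txt" "sample_003.txt"
```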
Installing data.table
data.table is a very handy package. For how to install it, see: https://class.coursera.org/getdata-008/forum/thread?thread_id=58
Here is how I installed the data.table package:
Used my browser to download data.table_1.9.4.zip from page http://cran.r-project.org/web/packages/data.table/index.html
Put the downloaded file in my R working directory.
> install.packages("data.table_1.9.4.zip", repos=NULL)
> install.packages("plyr")
> install.packages("Rcpp")
> install.packages("reshape2")
> install.packages("chron")
Once done with that, I could do:
> library(data.table)
and everything else worked.
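Using order
To sort a vector or a data.frame you can use the order function. When using order and rank, pay attention to what each function returns and how to interpret the result: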
You can use the order() function directly, without resorting to add-on tools. This simpler approach uses a trick taken right from the top of the example(order) code:
R> dd[with(dd, order(-z, b)), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:
R> dd[ order(-dd[,4], dd[,1]), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
rather than using the name of the column (and with() for easier/more direct access).
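The snippets above use a data frame dd whose definition is not shown. The sketch below reconstructs one from the printed output, so treat its exact definition as an assumption. Note that b has to be a factor with levels Low < Med < Hi. If it were a plain character column, order() would sort it alphabetically and put Hi before Med.

```r
# Hypothetical reconstruction of dd, matching the printed rows above
dd <- data.frame(
  b = factor(c("Hi", "Med", "Hi", "Low"),
             levels = c("Low", "Med", "Hi"), ordered = TRUE),
  x = c("A", "D", "A", "C"),
  y = c(8, 3, 9, 9),
  z = c(1, 1, 1, 2)
)

# Sort by z descending (negate the numeric column), then by b ascending
by_name <- dd[with(dd, order(-z, b)), ]

# Same sort using column positions instead of names
by_index <- dd[order(-dd[, 4], dd[, 1]), ]

print(by_name)
```

Unary minus is not defined for factors or strings. For a descending sort on such a column, -xtfrm(column) is one option.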
Interpreting the result of order:
The definition of order is that a[order(a)] is in increasing order. This works with your example, where the correct order is the fourth, second, first, then third element.
You may have been looking for rank, which returns the rank of the elements:
R> a <- c(4.1, 3.2, 6.1, 3.1)
R> order(a)
[1] 4 2 1 3
R> rank(a)
[1] 3 2 4 1
so rank tells you what order the numbers are in, order tells you how to get them in ascending order.
plot(a, rank(a)/length(a)) will give a graph of the CDF. To see why order is useful, though, try plot(a, rank(a)/length(a), type="S"), which gives a mess because the data are not in increasing order.
If you did
oo <- order(a)
plot(a[oo], rank(a[oo])/length(a), type="S")
or simply
oo <- order(a)
plot(a[oo], (1:length(a))/length(a), type="S")
you get a line graph of the CDF.
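Here is a small sketch that ties these ideas together, using the same vector a. When there are no ties, rank(a) is the inverse permutation of order(a), so order(order(a)) reproduces the ranks. Base R's ecdf() (from the stats package) builds the empirical CDF directly, so its values at the sorted points should match (1:length(a))/length(a).

```r
a <- c(4.1, 3.2, 6.1, 3.1)

oo <- order(a)
sorted_a <- a[oo]            # 3.1 3.2 4.1 6.1, same as sort(a)

# With no ties, ordering the ordering gives back the ranks
r <- rank(a)                 # 3 2 4 1
inv <- order(oo)             # 3 2 4 1

# Heights of the empirical CDF at the sorted points
cdf_heights <- seq_along(sorted_a) / length(a)   # 0.25 0.50 0.75 1.00

# ecdf() returns a step function; evaluate it at the sorted data
Fn <- ecdf(a)
print(Fn(sorted_a))

# Plotting the sorted points as a step line gives the CDF graph:
# plot(sorted_a, cdf_heights, type = "S")
```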
Checking whether a vector contains a given element
v <- c('a','b','c','e')
'b' %in% v
## returns TRUE
match('b',v)
## returns the first location of 'b', in this case: 2
> x <- sample(1:10)
> x
[1] 4 5 9 3 8 1 6 10 7 2
> match(c(4,8),x)
[1] 1 5
match only returns the first occurrence of a match, as you requested.
For multiple matches, %in% is the way to go:
> x <- sample(1:4, 10, replace=TRUE)
> x
[1] 3 4 3 3 2 3 1 1 2 2
> which(x %in% c(2,4))
[1] 2 5 9 10
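This compact sketch contrasts the three tools on one made-up character vector. %in% gives a yes/no answer for each element on its left, match() returns the first position (or NA), and which() applied to a comparison returns every position.

```r
v <- c("a", "b", "c", "e", "b")   # made-up example with a repeated "b"

# Membership test: one TRUE/FALSE per element on the left-hand side
found <- c("b", "d") %in% v        # TRUE FALSE

# First position of each value; NA when the value is absent
first_pos <- match(c("b", "d"), v) # 2 NA

# Every position of a value
all_pos <- which(v == "b")         # 2 5

# Is at least one of several values present?
any_present <- any(c("d", "e") %in% v)  # TRUE
```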
Appending elements to a vector
Here are several ways to do it. All of them are discouraged. Appending to an object in a for loop causes the entire object to be copied on every iteration, which causes a lot of people to say "R is slow", or "R loops should be avoided".
# one way
for (i in 1:length(values))
  vector[i] <- values[i]
# another way
for (i in 1:length(values))
  vector <- c(vector, values[i])
# yet another way?!?
for (v in values)
  vector <- c(vector, v)
# ... more ways
help("append") would have answered your question and saved the time it took you to write this question (but would have caused you to develop bad habits). ;-)
Note that vector <- c() isn't an empty vector; it's NULL. If you want an empty character vector, use vector <- character().
Also note, as BrodieG pointed out in the comments: if you absolutely must use a for loop, then at least pre-allocate the entire vector before the loop. This will be much faster than appending for larger vectors.
set.seed(21)
values <- sample(letters, 1e4, TRUE)
vector <- character(0)  # slow
system.time( for (i in 1:length(values)) vector[i] <- values[i] )
# user system elapsed
# 0.340 0.000 0.343
vector <- character(length(values))  # fast(er)
system.time( for (i in 1:length(values)) vector[i] <- values[i] )
# user system elapsed
# 0.024 0.000 0.023
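Keep in mind that performance is the real issue here: growing a vector one element at a time is far slower than filling a pre-allocated one.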
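The sketch below shows the alternatives, reusing the made-up letters data from above. The loop fills a pre-allocated vector and uses seq_along(). Unlike 1:length(values), seq_along() yields an empty sequence when values is empty. The vectorised call does the same work with no loop at all. toupper() is only an illustrative per-element operation.

```r
set.seed(21)
values <- sample(letters, 1e4, TRUE)

# Pre-allocate the result with the right type and length, then fill it
upper_loop <- character(length(values))
for (i in seq_along(values)) {
  upper_loop[i] <- toupper(values[i])
}

# Vectorised: toupper() already works on the whole vector
upper_vec <- toupper(values)

# When an insertion really is needed, append() returns a new vector
grown <- append(c("x", "y"), "z", after = 1)   # "x" "z" "y"

# 1:length() on an empty vector is c(1, 0), so a loop over it runs twice
empty <- character(0)
print(1:length(empty))     # 1 0
print(seq_along(empty))    # integer(0)
```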
Removing a column from a data.frame
> head(Data)
chr genome region
1 chr1 hg19_refGene CDS
2 chr1 hg19_refGene exon
3 chr1 hg19_refGene CDS
4 chr1 hg19_refGene exon
5 chr1 hg19_refGene CDS
6 chr1 hg19_refGene exon
You can set it to NULL.
> Data$genome <- NULL
> head(Data)
chr region
1 chr1 CDS
2 chr1 exon
3 chr1 CDS
4 chr1 exon
5 chr1 CDS
6 chr1 exon
As pointed out in the comments, here are some other possibilities:
Data[2] <- NULL # Wojciech Sobala
Data[[2]] <- NULL # same as above
Data <- Data[,-2] # Ian Fellows
Data <- Data[-2] # same as above
You can remove multiple columns via:
Data[1:2] <- list(NULL) # Marek
Data[1:2] <- NULL # does not work!
Be careful with matrix-subsetting though, as you can end up with a vector:
Data <- Data[,-(2:3)] # vector
Data <- Data[,-(2:3),drop=FALSE] # still a data.frame
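Here is a runnable sketch that uses a small stand-in for Data. Its values mimic the head() output shown above, because the full original data are not available. It removes columns by name rather than by position, which keeps working if the column order changes, and it checks the drop = FALSE behaviour.

```r
# Stand-in for Data, built to look like the head() output above
Data <- data.frame(
  chr    = rep("chr1", 6),
  genome = rep("hg19_refGene", 6),
  region = rep(c("CDS", "exon"), 3)
)

# Drop a column by name with $<- NULL
d1 <- Data
d1$genome <- NULL

# Drop columns by name via subsetting; drop = FALSE keeps a data.frame
d2 <- Data[, setdiff(names(Data), "genome"), drop = FALSE]

# Remove several named columns at once with list(NULL)
d3 <- Data
d3[c("genome", "region")] <- list(NULL)

# Matrix-style subsetting down to one column returns a plain vector
v <- Data[, -(2:3)]
# With drop = FALSE the result stays a one-column data.frame
d4 <- Data[, -(2:3), drop = FALSE]
```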
Removing parentheses from a string
string <- "log(M)"
gsub("log", "", string) # Works just fine
gsub("log(", "", string) # breaks
# Error in gsub("log(", "", string) :
# invalid regular expression 'log(', reason 'Missing ')''
Escape the parenthesis with a double-backslash:
gsub("log\\(", "", string)
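A few related variants, sketched on the same string. Besides escaping, you can pass fixed = TRUE so that gsub() treats the pattern as a literal string. Inside a character class such as [()] the parentheses need no escaping. A capture group can pull out the text between the parentheses.

```r
string <- "log(M)"

# Escaped metacharacter: matches the literal "log("
escaped <- gsub("log\\(", "", string)             # "M)"

# fixed = TRUE: the pattern is a plain string, not a regex
literal <- gsub("log(", "", string, fixed = TRUE) # "M)"

# Character class: remove every "(" and ")"
no_parens <- gsub("[()]", "", string)             # "logM"

# Capture group: keep only what is inside the parentheses
inner <- sub(".*\\((.*)\\).*", "\\1", string)     # "M"
```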