A Summary of Solutions to Several R Problems

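As an R beginner, I ran into many problems while using the language. This post summarizes solutions to several common ones, in the hope that it helps others who need them.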

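Beginners in R run into many problems, most of them small details. The hardest part for me was that I had never studied Matlab systematically, so the way of thinking behind this kind of language did not come quickly. I had to learn by doing and build up experience along the way. Here I record the problems I have encountered so far. Each one is independent of the others.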

sprintf wraps the C function sprintf and can be used to format strings

> sprintf("%04d", 1)
[1] "0001"
> sprintf("%04d", 104)
[1] "0104"
> sprintf("%010d", 104)
[1] "0000000104"
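
sprintf is also vectorized over its arguments, which helps when generating a batch of zero-padded names. A small sketch (the file-name pattern and the values below are made up for illustration):

```r
# Zero-padded sequence numbers for a batch of hypothetical file names
ids <- 1:3
file_names <- sprintf("sample_%03d.csv", ids)
print(file_names)

# Several format specifiers in one call: string, integer, fixed-point number.
# "%%" prints a literal percent sign.
label <- sprintf("%s scored %d points (%.1f%%)", "Alice", 42L, 87.456)
print(label)
```

Because the format string is recycled across the vector, no loop is needed to build the names.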

Installing data.table

data.table is a handy package. For installation instructions, see: https://class.coursera.org/getdata-008/forum/thread?thread_id=58

Here is how I installed the data.table package:

Used my browser to download data.table_1.9.4.zip from page http://cran.r-project.org/web/packages/data.table/index.html

Put the downloaded file in my R working directory.

> install.packages("data.table_1.9.4.zip", repos=NULL)
> install.packages("plyr")
> install.packages("Rcpp")
> install.packages("reshape2")
> install.packages("chron")

Once done with that, I could do:
> library(data.table)
and everything else worked.
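
When the machine can reach a CRAN mirror, installing by package name is usually enough and the manual zip download can be skipped. A minimal sketch, assuming a working mirror is configured; install_if_missing is a hypothetical helper name, not part of R:

```r
# Hypothetical helper: install a package from the configured CRAN mirror
# only when it cannot already be loaded.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Intended usage (commented out because it needs network access):
# install_if_missing("data.table")
# library(data.table)
```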

Using order

To sort a vector or a data.frame, use the order function. Some care is needed with how order and rank are used and how their results are interpreted:

You can use the order() function directly without resorting to add-on tools -- see this simpler answer which uses a trick right from the top of the example(order) code:
R> dd[with(dd, order(-z, b)), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:
R> dd[ order(-dd[,4], dd[,1]), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
rather than using the name of the column (and with() for easier/more direct access).
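
The quoted answer does not show how dd was built. The sketch below reconstructs a data frame that matches the printed output (the construction itself is my assumption) so that both calls can be run:

```r
# Assumed reconstruction of dd, inferred from the printed rows above.
# b is an ordered factor so that Low < Med < Hi when sorting.
dd <- data.frame(
  b = factor(c("Hi", "Med", "Hi", "Low"),
             levels = c("Low", "Med", "Hi"), ordered = TRUE),
  x = c("A", "D", "A", "C"),
  y = c(8, 3, 9, 9),
  z = c(1, 1, 1, 2)
)

# Sort by z descending, then by b ascending, using column names
sorted_by_name <- dd[with(dd, order(-z, b)), ]

# The same sort using column positions
sorted_by_index <- dd[order(-dd[, 4], dd[, 1]), ]

print(sorted_by_name)
```

The minus sign only works on numeric columns. For a character column, order(x, decreasing = TRUE) or -xtfrm(x) would be needed instead.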

Interpreting the result of order:

The definition of order is that a[order(a)] is in increasing order. This works with your example, where the correct order is the fourth, second, first, then third element.
You may have been looking for rank, which returns the rank of the elements
R> a <- c(4.1, 3.2, 6.1, 3.1)
R> order(a)
[1] 4 2 1 3
R> rank(a)
[1] 3 2 4 1
so rank tells you what order the numbers are in, order tells you how to get them in ascending order.
plot(a, rank(a)/length(a)) will give a graph of the CDF. To see why order is useful, though, try plot(a, rank(a)/length(a),type="S") which gives a mess, because the data are not in increasing order
If you did
oo<-order(a)
plot(a[oo],rank(a[oo])/length(a),type="S")
or simply
oo<-order(a)
plot(a[oo],(1:length(a))/length(a),type="S")
you get a line graph of the CDF.
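
The relationship between the two functions can be checked directly without plotting. A short sketch using the same vector a; the variable names are mine:

```r
a <- c(4.1, 3.2, 6.1, 3.1)

# order() gives the indices that sort a
oo <- order(a)
sorted <- a[oo]

# rank() gives each element's position in the sorted result.
# When there are no ties, rank is the inverse permutation of order.
inverse <- order(oo)

print(sorted)
print(inverse)
print(rank(a))
```

With ties the two can differ, because rank averages tied positions by default while order breaks ties by original position.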

Checking whether a vector contains a given element

v <- c('a','b','c','e')
'b' %in% v
## returns TRUE

match('b',v)
## returns the first location of 'b', in this case: 2

> x <- sample(1:10)
> x
 [1]  4  5  9  3  8  1  6 10  7  2
> match(c(4,8),x)
 [1] 1 5
match only returns the first encounter of a match, as you requested.
For multiple matching, %in% is the way to go :
> x <- sample(1:4,10,replace=T)
> x
 [1] 3 4 3 3 2 3 1 1 2 2
> which(x %in% c(2,4))
[1]  2  5  9 10
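
To make the difference reproducible, the sketch below hard-codes the sampled vector from the transcript above instead of calling sample(), and adds a value (7) that is not present:

```r
# Fixed copy of the sampled vector shown above
x <- c(3, 4, 3, 3, 2, 3, 1, 1, 2, 2)

# match() returns the first position of each value, or NA when it is absent
first_pos <- match(c(2, 4, 7), x)

# which() combined with %in% returns every matching position
all_pos <- which(x %in% c(2, 4))

# %in% alone answers the yes/no membership question
has_seven <- 7 %in% x

print(first_pos)
print(all_pos)
print(has_seven)
```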

Appending elements to a vector

Here are several ways to do it. All of them are discouraged. Appending to an object in a for loop causes the entire object to be copied on every iteration, which causes a lot of people to say "R is slow", or "R loops should be avoided".

# one way
for (i in 1:length(values))
  vector[i] <- values[i]

# another way
for (i in 1:length(values))
  vector <- c(vector, values[i])

# yet another way?!?
for (v in values)
  vector <- c(vector, v)
# ... more ways

help("append") would have answered your question and saved the time it took you to write this question (but would have caused you to develop bad habits). ;-)

Note that vector <- c() isn't an empty vector; it's NULL. If you want an empty character vector, use vector <- character().

Also note, as BrodieG pointed out in the comments: if you absolutely must use a for loop, then at least pre-allocate the entire vector before the loop. This will be much faster than appending for larger vectors.

set.seed(21)
values <- sample(letters, 1e4, TRUE)
vector <- character(0)# slow
system.time( for (i in 1:length(values)) vector[i] <- values[i] )
#  user  system elapsed
#  0.340  0.000  0.343
vector <- character(length(values))# fast(er)
system.time( for (i in 1:length(values)) vector[i] <- values[i] )
#  user  system elapsed 
#  0.024  0.000  0.023

Note the performance implications here: growing a vector inside a loop is much slower than pre-allocating it.
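
In many cases the loop can be avoided altogether. A sketch of three alternatives, using toupper as a stand-in for whatever per-element work is actually needed (that choice is mine, not from the quoted answer):

```r
set.seed(21)
values <- sample(letters, 1e4, TRUE)

# 1. Vectorized call: most base R functions already work on whole vectors
upper_vectorized <- toupper(values)

# 2. vapply: allocates the result once and checks each result's type
upper_vapply <- vapply(values, toupper, character(1), USE.NAMES = FALSE)

# 3. If a loop is unavoidable: pre-allocate, and iterate with seq_along(),
#    which (unlike 1:length(values)) yields no iterations for an empty input
upper_loop <- character(length(values))
for (i in seq_along(values)) {
  upper_loop[i] <- toupper(values[i])
}
```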

Deleting a column from a data.frame

> head(Data)
   chr       genome region
1 chr1 hg19_refGene    CDS
2 chr1 hg19_refGene   exon
3 chr1 hg19_refGene    CDS
4 chr1 hg19_refGene   exon
5 chr1 hg19_refGene    CDS
6 chr1 hg19_refGene   exon



You can set it to NULL.
> Data$genome <- NULL
> head(Data)
   chr region
1 chr1    CDS
2 chr1   exon
3 chr1    CDS
4 chr1   exon
5 chr1    CDS
6 chr1   exon

As pointed out in the comments, here are some other possibilities:
Data[2] <- NULL    # Wojciech Sobala
Data[[2]] <- NULL  # same as above
Data <- Data[,-2]  # Ian Fellows
Data <- Data[-2]   # same as above

You can remove multiple columns via:    
Data[1:2] <- list(NULL)  # Marek
Data[1:2] <- NULL        # does not work!

Be careful with matrix-subsetting though, as you can end up with a vector:    
Data <- Data[,-(2:3)]             # vector
Data <- Data[,-(2:3),drop=FALSE]  # still a data.frame
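
Dropping by position is fragile if the column order changes, so dropping by name may be safer. A sketch on a small data frame shaped like the one above (the construction is assumed from the printed head):

```r
# Assumed reconstruction of the example data
Data <- data.frame(
  chr = rep("chr1", 6),
  genome = rep("hg19_refGene", 6),
  region = rep(c("CDS", "exon"), 3)
)

# Keep every column except the named one
no_genome <- Data[setdiff(names(Data), "genome")]

# List-style indexing with a single bracket argument always returns a data.frame,
# even when only one column remains
only_chr <- Data[setdiff(names(Data), c("genome", "region"))]

# Matrix-style indexing needs drop = FALSE to avoid collapsing to a vector
also_chr <- Data[, "chr", drop = FALSE]

print(names(no_genome))
```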

Removing parentheses from a string

string <- "log(M)"
gsub("log", "", string) # Works just fine
gsub("log(", "", string) #breaks
# Error in gsub("log(", "", string) : 
#   invalid regular expression 'log(', reason 'Missing ')''

Escape the parenthesis with a double backslash:
gsub("log\\(", "", string)
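
Escaping is not the only option. The sketch below shows a few variants on the same string; which one fits depends on whether the pattern needs to be a regular expression at all:

```r
string <- "log(M)"

# Escaped parenthesis, as above
escaped <- gsub("log\\(", "", string)

# fixed = TRUE treats the pattern as a literal string, so nothing needs escaping
literal <- gsub("log(", "", string, fixed = TRUE)

# A bracket expression removes both parentheses wherever they occur
no_parens <- gsub("[()]", "", string)

# A capture group keeps only what is inside log(...)
inner <- sub("^log\\((.*)\\)$", "\\1", string)

print(c(escaped, literal, no_parens, inner))
```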
    Original author: 浪尖儿
    Original URL: https://www.jianshu.com/p/cec33c98fc39