Coming from various other languages, I find R powerful and intuitive, but I am not thrilled with its performance. So I decided to try to improve some snippet I wrote and learn how to code better in R.
Here's a function I wrote, trying to determine if a vector is binary-valued (two distinct values or just one value) or not:
isBinaryVector <- function(v) {
if (length(v) == 0) {
return (c(0, 1))
}
a <- v[1]
b <- a
lapply(v, function(x) { if (x != a && x != b) {if (a != b) { return (c()) } else { b = x }}})
if (a < b) {
return (c(a, b))
} else {
return (c(b, a))
}
}
EDIT: This function is expected to look through a vector then return c() if it is not binary-valued, and return c(a, b) if it is, a being the small value and b being the larger one (if a == b then just c(a, a). E.g., for
A B C
1 1 1 0
2 2 2 0
3 3 1 0
I will lapply this isBinaryVector and get:
$A
[1] 1 1
$B
[1] 1 1
$C
[1] 0 0
The time it took on a moderate sized dataset (about 1800 * 3500, 2/3 of them are binary-valued) is about 15 seconds. The set contains only floating-point numbers.
Is there anyway I could do this faster?
Thanks for any inputs!
lapplycall isn't assigned to anything. If v is a data frame, a and b are both initially simply the first column of v. Then you test whether each column is identical to a and b (which themselves are identical) incorrectly using vectorized comparisons in an if statement. I could go on. Consider me baffled.