I have a data set and I want to sort it in the following way in R. I hope I can explain clearly.

  1. Sort by the elements seen in the main column. This will give us two chunks, one chunk with all As and one chunk with all Gs.

  2. Then, for the first chunk, move to the -1 column and sort by the elements seen there (there are two elements, C and T). This breaks the first chunk into two smaller chunks: one with A in the main column and C in the -1 column, and one with A in the main column and T in the -1 column.

  3. For the second chunk, move to the -1 column and do the same. I end up with two smaller chunks: one with G in the main column and C in the -1 column, and one with G in the main column and T in the -1 column.

  4. Move to the +1 column and do the same. At each step, I will end up partitioning each of the existing chunks into two new chunks.
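Splitting every chunk again at each step is the same thing as a lexicographic (multi-key) sort: sort by the main column, break ties with the -1 column, then break the remaining ties with the +1 column. A minimal sketch in base R, on a made-up toy data frame (the names `toy`, `m1`, `main` and `p1` are only illustrative and are not part of the data above):

```r
# Toy data; m1 stands for the -1 column and p1 for the +1 column (hypothetical names)
toy <- data.frame(
  m1   = c("T", "C", "T", "C", "T"),
  main = c("G", "A", "A", "G", "A"),
  p1   = c("G", "G", "A", "A", "G"),
  stringsAsFactors = FALSE
)

# order() sorts by the first key and uses each later key only to break ties,
# which reproduces the chunk-by-chunk partitioning described above
idx <- order(toy$main, toy$m1, toy$p1)

# Indexing by row moves whole rows; the columns stay where they are
sorted <- toy[idx, ]
print(sorted)
```

Only the row order changes here, so the pattern inside each row is left intact.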

I do not want to break the row pattern. I want to sort the rows (swap the arrangement of the rows), but I won't re-arrange the columns. How can I do that?

An idea: I did this sorting by hand and the result had the shape of a normal distribution. That is why I gave every column a weight obtained from the normal distribution function. I then computed a weighted covariance matrix (number of rows x number of rows) from the dissimilarity coefficients between rows and those weights. Finally, I ranked the data using the eigenvectors of a correlation matrix that includes a penalty for missing data. However, I could not reproduce the result I got by hand. My data set is very large, so I am sharing only a small part of it.

-7  -6  -5  -4  -3  -2  -1  Main    1   2   3   4
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   T   C   G   C   T   C   G   G   G   T   G
A   C   C   A   C   C   T   A   G   A   T   G
G   C   T   G   C   T   T   G   G   G   T   G
A   C   C   A   C   C   T   G   G   A   T   G
G   C   T   G   C   T   T   G   G   G   T   G
A   C   C   A   C   C   T   G   G   A   T   G
A   C   C   A   C   C   T   G   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   G   G   G   T   G
A   C   C   A   T   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   G   C   T   T   G   A   G   C   T
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G
A   C   C   A   C   C   T   A   G   A   T   G

1 Answer

As I read your query, you wish to sort column Main, then X.1 within groups of Main, then X1 within groups of X.1. The following will do just that:

library(dplyr)
data.sort <- arrange(data, Main, X.1, X1)

   X.7 X.6 X.5 X.4 X.3 X.2 X.1 Main X1 X2 X3 X4
1    A   C   C   A   C   C   T    A  G  A  T  G
2    A   C   C   A   C   C   T    A  G  A  T  G
3    A   C   C   A   C   C   T    A  G  A  T  G
4    A   C   C   A   C   C   T    A  G  A  T  G
5    A   C   C   A   C   C   T    A  G  A  T  G
6    A   C   C   A   C   C   T    A  G  A  T  G
7    A   C   C   A   C   C   T    A  G  A  T  G
8    A   C   C   A   C   C   T    A  G  A  T  G
9    A   C   C   A   C   C   T    A  G  A  T  G
10   A   C   C   A   C   C   T    A  G  A  T  G
11   A   C   C   A   C   C   T    A  G  A  T  G    
12   A   C   C   A   C   C   T    A  G  A  T  G
13   A   C   C   A   C   C   T    A  G  A  T  G
14   A   C   C   A   C   C   T    A  G  A  T  G
15   A   C   C   A   C   C   T    A  G  A  T  G
16   A   C   C   A   T   C   T    A  G  A  T  G
17   A   C   C   A   C   C   T    A  G  A  T  G
18   A   C   C   A   C   C   T    A  G  A  T  G
19   A   C   C   A   C   C   T    A  G  A  T  G
20   A   C   C   A   C   C   T    A  G  A  T  G
21   A   C   C   A   C   C   T    A  G  A  T  G
22   A   C   C   A   C   C   T    A  G  A  T  G
23   A   C   C   A   C   C   T    A  G  A  T  G
24   A   C   C   A   C   C   T    A  G  A  T  G
25   A   C   C   A   C   C   T    A  G  A  T  G
26   A   C   C   A   C   C   T    A  G  A  T  G
27   A   T   C   G   C   T   C    G  G  G  T  G
28   A   C   C   G   C   T   T    G  A  G  C  T
29   G   C   T   G   C   T   T    G  G  G  T  G
30   A   C   C   A   C   C   T    G  G  A  T  G
31   G   C   T   G   C   T   T    G  G  G  T  G
32   A   C   C   A   C   C   T    G  G  A  T  G
33   A   C   C   A   C   C   T    G  G  A  T  G
34   A   C   C   A   C   C   T    G  G  G  T  G

You can reverse the order with desc() as follows:

data.sort <- arrange(data, desc(Main), desc(X.1), desc(X1))

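N.B. The column names need to be syntactically valid R names, so they cannot start with a minus sign or a digit. With its default check.names = TRUE, read.table() converts them for you: -1 becomes X.1 and 1 becomes X1.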
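With hundreds of columns it is not practical to type every sort key by hand. A rough sketch of one way to build the keys programmatically in base R follows. It assumes the priority continues outward from Main as -1, +1, -2, +2 and so on; the question only states the first three keys, so treat that ordering as an assumption. The sample text and the names `txt`, `dat`, `offset`, `prio` and `keys` are made up for illustration.

```r
# Small made-up sample; check.names = FALSE keeps the headers "-2", "-1", "Main", "1", "2",
# and colClasses = "character" stops a column of T values being read as logical TRUE
txt <- "-2 -1 Main 1 2
C T A G A
T C G G G
C T G A G
C T G G A
T T A G A"
dat <- read.table(text = txt, header = TRUE, check.names = FALSE,
                  colClasses = "character")

# Signed distance of every column from the Main column
main_pos <- match("Main", names(dat))
offset <- seq_along(dat) - main_pos

# Assumed priority: nearest columns first, left before right at equal distance
prio <- order(abs(offset), offset)

# Pass every column, in priority order, to order() as a sort key;
# unname() keeps the column names from being treated as argument names
keys <- unname(as.list(dat[prio]))
sorted <- dat[do.call(order, keys), ]
print(sorted)
```

order() accepts any number of keys, so the same lines apply unchanged to a wider table; only `prio` needs to change if a different column priority is wanted.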

3 Comments

Thanks for sharing. As I said, this is only a small part of my data set. I have more than 600 columns. How will this solution work?
The other way to do the multi-column sort is to use the order() function: data[order(data[, 8], data[, 7], data[, 9]), ] is how it would be done. I don't know if either method would scale up to 600 columns, though. That is an extensive branching tree (600 levels deep) if every column is to be used.
Unfortunately, it doesn't give what I want. As far as I understand I should use a clustering method but I do not know which one and how.
