I'm using Python, numpy and scipy to do some hierarchical clustering on the output of a topic model I created for text analysis.

I applied my test corpus to the LDA model, which gave me a bag-of-words representation, and then turned that into a matrix. Now I want to use scipy to build a linkage matrix from it, but the call fails with "ValueError: setting an array element with a sequence." I guess this is because only equally shaped arrays can be clustered, and the lists inside my list of lists have different lengths. I just don't know how to solve this. Here is a small part of the code, in case it is helpful. I really hope someone can help me.

  import numpy as np
  X = np.array(corpus)
  from matplotlib import pyplot as plt  
  from scipy.cluster.hierarchy import dendrogram, linkage
  Z = linkage(X, 'cosine') 
  • When you ask questions like this you need to identify the problem line and tell us something about the inputs, arrays or otherwise, to that line. Look at my recent answer to another question with the same error, stackoverflow.com/questions/41621340/…. A crucial question in your case: is the problem in the first or the last line? Commented Jan 12, 2017 at 22:04
  • Hi, thanks for your comment. I'm pretty new to programming (a linguistics student who had an introduction to Python). This may sound like a stupid question, but could you help me with how to identify the problem line? Commented Jan 13, 2017 at 10:37
  • @hpaulj, oh, and the error occurs at the last line: Z = linkage(X, 'cosine'). I can print matrix X without a problem, but computing Z gives the error. Commented Jan 13, 2017 at 10:48
  • I added an image of the code to my post. Commented Jan 13, 2017 at 13:10

1 Answer

Since you mentioned getting matrix X from an LDA model, it may be a sparse matrix of some sort. You can convert it to a dense matrix with X.todense() and then apply the linkage method. The conversion can also be done inline, e.g. Z = linkage(X.todense(), metric='cosine'); note that the dense copy still has to fit in memory.
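
The sparse-to-dense step can be sketched as follows. This is only an illustration under assumptions: the toy weights are made up, X is assumed to be a scipy.sparse matrix with no all-zero rows, and `method='average'` is an arbitrary choice rather than anything the question specified. It uses `toarray()`, which returns a plain ndarray, where `todense()` would return an `np.matrix`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.cluster.hierarchy import linkage

# Toy document-topic weights (made up for illustration). Every row is
# non-zero, because cosine distance is undefined for an all-zero row.
X = csr_matrix(np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.0, 0.2],
    [0.0, 0.3, 0.7],
    [0.1, 0.0, 0.9],
]))

# Convert the sparse matrix to a plain dense ndarray before clustering.
dense = X.toarray()

# The second positional argument of linkage is the linkage method,
# so the distance metric is passed by keyword.
Z = linkage(dense, method='average', metric='cosine')
print(Z.shape)  # (n_samples - 1, 4)
```

Passing the metric by keyword matters here: in `linkage(X, 'cosine')` the string is interpreted as the linkage method, not as the distance metric.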

In some cases, changing the dtype of the matrix (for example, to float) helps.
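
A dtype change can only help when every row already has the same length. If the rows are ragged, as the question describes, one possible approach is to expand them to a fixed width first. The sketch below assumes each document is a list of (topic_id, weight) pairs, which is a guess about the corpus format that the post does not confirm; `to_dense_rows` and `num_topics` are hypothetical names introduced for illustration.

```python
import numpy as np

def to_dense_rows(docs, num_topics):
    """Expand ragged (topic_id, weight) lists into a fixed-width float array.

    Topics that a document does not mention keep a weight of 0.0.
    """
    dense = np.zeros((len(docs), num_topics), dtype=float)
    for row, pairs in enumerate(docs):
        for topic_id, weight in pairs:
            dense[row, topic_id] = weight
    return dense

# Made-up ragged input: the documents mention 2, 1 and 3 topics.
docs = [
    [(0, 0.9), (2, 0.1)],
    [(1, 1.0)],
    [(0, 0.2), (1, 0.3), (2, 0.5)],
]

X = to_dense_rows(docs, num_topics=3)
print(X.shape, X.dtype)
```

The resulting array has one row per document and one column per topic, which is the rectangular numeric shape that linkage expects.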

P.S.: I faced the same issue, and converting my sparse feature matrix (a scipy.sparse.csr_matrix) to a dense one solved it.
