I am trying to create a DataFrame from the lists below, where the first column, "webpage", holds the index number (col) and the second column, "destination_nodes", holds the corresponding list of dest_nodes.

for col in range(10001):
    print(col)
    # row labels of M where column col equals 1.0
    dest_nodes = M.index[M[col] == 1.0].tolist()
    print(dest_nodes)

A sample of the output of print(col) and print(dest_nodes) is shown below:

0
[2725, 2763, 3575, 4377, 6221, 7798, 7852, 8014, 8753, 9575]
1
[137, 753, 1434, 2182, 3163, 3646, 3684, 3702, 3966, 4353, 4410, 5029, 5610, 5671, 6149, 6505, 6835, 7027, 7030, 7127, 7724, 7876, 8006, 8676, 8821, 9069, 9226, 9321]
2
[473, 1843, 6748]
3
[67, 433, 537, 1068, 1118, 1191, 1236, 1953, 2285, 2848, 3296, 3816, 4155, 4507, 4704, 4773, 5028, 5333, 5341, 5613, 5656, 5858, 6068, 6169, 6239, 7367, 7897, 7909, 8973, 9113, 9576, 9799, 9909]
4
[]

I tried the following, but it does not give me what I need:

dest_node = pd.DataFrame (col, dest_nodes, columns = ["webpage","destination_nodes"])

The output DataFrame I would like looks something like this: [image: desired output table]

Would appreciate any help I can get!

  • do you want your 4th row to be empty in the destination nodes? Commented Oct 28, 2021 at 6:22
  • @LakpaTamang yes, I would like it to be empty Commented Oct 28, 2021 at 6:36
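A note on why the attempt above misbehaves: the positional parameters of pd.DataFrame are (data, index, columns), so pd.DataFrame (col, dest_nodes, columns=[...]) treats the single integer col (presumably its value from the last loop iteration) as scalar data, which pandas broadcasts into every cell, and uses the last dest_nodes list as the row index. A minimal sketch with made-up values (4 and [2725, 2763] are illustrative, not taken from M):

```python
import pandas as pd

# Illustrative stand-ins for the values left over after the loop (not real data)
col = 4
dest_nodes = [2725, 2763]

# Positional call: data=col (a scalar), index=dest_nodes
attempt = pd.DataFrame(col, dest_nodes, columns=["webpage", "destination_nodes"])

# Every cell holds 4 and the row labels are 2725 and 2763,
# so the lists never end up inside the frame
print(attempt)
```

The fix is to collect one entry per column first and only then build the frame once.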

4 Answers

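You can use zip to achieve that, once col and dest_nodes hold the values for every column rather than just the last one:

dest_nodes = [M.index[M[col] == 1.0].tolist() for col in range(10001)]
df = pd.DataFrame(zip(range(10001), dest_nodes), columns=["webpage", "destination_nodes"])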

If you want to drop the brackets and get exactly the representation shown in the image, convert each list to a comma-separated string first and then create the DataFrame:

dest_nodes = [", ".join(map(str, nodes)) for nodes in dest_nodes]
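To check the zip approach end to end, here is a sketch on a small made-up M (an assumption: integer column labels and 1.0 marking a link, as in the question's loop). The destination_nodes column keeps the lists; the extra destination_nodes_str column shows the bracket-free version, where a page with no links (like the 4th row in the question) becomes an empty string.

```python
import pandas as pd

# Toy 0/1 matrix (assumption): M[col] == 1.0 marks the rows that column col links to
M = pd.DataFrame({0: [1.0, 0.0, 1.0],
                  1: [1.0, 1.0, 0.0],
                  2: [0.0, 0.0, 0.0]})

# Collect one list per column first, so zip pairs whole lists rather than single values
webpages = list(M.columns)
dest_nodes = [M.index[M[col] == 1.0].tolist() for col in webpages]
df = pd.DataFrame(zip(webpages, dest_nodes), columns=["webpage", "destination_nodes"])

# Optional bracket-free strings; an empty list becomes an empty string
df["destination_nodes_str"] = [", ".join(map(str, nodes)) for nodes in dest_nodes]
print(df)
```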

Maybe you can use M directly, without an explicit loop:

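df = pd.DataFrame(
    {'webpage': M.columns,
     # one list of row labels per column; 'reduce' keeps the result a Series of lists
     'destination_nodes': M.eq(1).apply(lambda x: M[x].index.tolist(),
                                        result_type='reduce')}
)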
print(df)

# Output
  webpage destination_nodes
0       0            [0, 2]
1       1            [0, 1]
2       2                []
3       3               [1]
4       4            [1, 2]

Setup:

import pandas as pd

data = {'0': [1, 0, 1],
        '1': [1, 1, 0],
        '2': [0, 0, 0],
        '3': [0, 1, 0],
        '4': [0, 1, 1]}
M = pd.DataFrame(data)
print(M)

# Output:
   0  1  2  3  4
0  1  1  0  0  0
1  0  1  0  1  1
2  1  0  0  0  1
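A caveat about the apply call, hedged because it depends on pandas internals: when the applied function returns lists, DataFrame.apply with the default axis=0 may try to assemble them into a DataFrame, falling back to a Series of lists only when the lists differ in length, as they do in the sample above. If every column happened to contain the same number of 1s, the lists could be expanded into columns and the outer pd.DataFrame call would likely fail. Passing result_type="reduce" asks for a Series of lists regardless. A sketch with a hypothetical M2 in which each column has exactly one 1:

```python
import pandas as pd

# Hypothetical matrix in which every column has exactly one 1 (equal-length lists)
M2 = pd.DataFrame({'0': [1, 0], '1': [0, 1]})

# result_type="reduce" keeps one list per column, indexed by column label
nodes = M2.eq(1).apply(lambda x: M2[x].index.tolist(), result_type="reduce")

df2 = pd.DataFrame({'webpage': M2.columns, 'destination_nodes': nodes})
print(df2)
```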

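This works, as long as both lists are built over the same 10001 columns so that they have the same length:

import pandas as pd

# Make lists (both must cover the same columns, or pandas raises a length error)
colLst = list(range(10001))
dest_nodesLst = [M.index[M[col] == 1.0].tolist() for col in colLst]

# Make data frame with the requested column names
dic = {"webpage": colLst, "destination_nodes": dest_nodesLst}
dest_node = pd.DataFrame(data=dic)

# print head of dataframe
print(dest_node.head())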
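An equivalent way to keep the lengths in sync is to append one record per column inside the original loop and build the frame once at the end. A minimal sketch, using a small made-up M as a stand-in (an assumption: integer column labels, 1.0 marking a link) and looping over M.columns rather than range(10001):

```python
import pandas as pd

# Toy stand-in for M (assumption): integer column labels, 0/1 float entries
M = pd.DataFrame({0: [1.0, 0.0, 1.0],
                  1: [0.0, 0.0, 0.0],
                  2: [0.0, 1.0, 0.0]})

# Append one record per column inside the loop, then build the frame once
rows = []
for col in M.columns:  # the question loops over range(10001) instead
    dest_nodes = M.index[M[col] == 1.0].tolist()
    rows.append({"webpage": col, "destination_nodes": dest_nodes})

dest_node = pd.DataFrame(rows)
print(dest_node.head())
```

Building the DataFrame once from a list of dicts avoids growing a frame row by row, which gets slow for 10001 columns.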

I would use a dict comprehension to set up the data, then transpose so that each webpage becomes a row:

df = pd.DataFrame({col: [M.index[M[col] == 1.0].tolist()] for col in range(10001)}, index=["nodes"]).transpose()
df.index.name = "webpage"

print(df)
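The same dict comprehension can also feed a Series directly, which skips building a one-row, 10001-column frame and transposing it. A sketch of that alternative, using a small made-up M as a stand-in (an assumption: integer column labels, 1.0 marking a link):

```python
import pandas as pd

# Toy stand-in for M (assumption)
M = pd.DataFrame({0: [1.0, 0.0, 1.0],
                  1: [0.0, 0.0, 0.0],
                  2: [0.0, 1.0, 0.0]})

# Dict comprehension -> Series of lists, keyed by column label
nodes = pd.Series({col: M.index[M[col] == 1.0].tolist() for col in M.columns},
                  name="destination_nodes")

# Name the index, then turn it into the "webpage" column
df = nodes.rename_axis("webpage").reset_index()
print(df)
```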
