
Python newbie here. I got this code from the internet (I can't remember the source) and I don't understand how it works. I want the output to show the names of the cities instead of their coordinates. Are the two even linked? That is, once the values go into the DBSCAN algorithm, do they lose their identity? Is there a way to keep it so I can display the city names? Any help, suggestion, or edit to the question is appreciated.

Here is a colab link.

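import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from shapely.geometry import MultiPoint
from geopy.distance import great_circle

# df (with 'lat' and 'lon' columns) and coords (an N x 2 array of lat/lon rows)
# come from earlier cells of the notebook.

kms_per_radian = 6371.0088          # Earth's mean radius in km
epsilon = 150 / kms_per_radian      # 150 km neighbourhood radius, in radians
db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(coords))
cluster_labels = db.labels_         # one label per row of coords, in the same order
num_clusters = len(set(cluster_labels))
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))

clustersList = clusters.tolist()

def get_centermost_point(cluster):
    # Pick the actual data point that lies closest to the cluster's centroid.
    centroid = (MultiPoint(cluster).centroid.x, MultiPoint(cluster).centroid.y)
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    return tuple(centermost_point)

# Must run before centermost_points is unpacked below.
centermost_points = clusters.map(get_centermost_point)

lats, lons = zip(*centermost_points)
rep_points = pd.DataFrame({'lon':lons, 'lat':lats})
# Look up the full df row for each representative point by matching coordinates.
rs = rep_points.apply(lambda row: df[(df['lat']==row['lat']) & (df['lon']==row['lon'])].iloc[0], axis=1)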
Do you mean reverse-searching the coordinates to find their labels? Is there a better way to do this? I will have about 200 cities with their coordinates, and the same city may appear with different coordinates, or different cities with the same coordinates. Commented Mar 24, 2019 at 9:52

1 Answer

clusters1 = pd.Series([names[cluster_labels == n] for n in range(num_clusters)])
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print(clusters1)
print(clusters)
print(df)

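I went through your code. In it, clusters groups the coordinates by cluster label. clusters1 does the same thing for the city names: it applies the same label mask to names, so each entry holds the names of the cities that fall in that cluster. This works because cluster_labels has one entry per input row, in the same order as coords and names, so the points never lose their identity. I hope this answers your question.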
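To make the idea concrete, here is a minimal sketch of how the labels stay aligned with the names. The sample rows, the 'city' column name, and the hard-coded cluster_labels (standing in for db.labels_) are all hypothetical, not taken from your notebook. With min_samples=1 every point is a core point, so there is no noise label -1 to worry about.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; in the question these rows come from df.
df = pd.DataFrame({
    'city': ['Delhi', 'Noida', 'Mumbai', 'Thane', 'Chennai'],
    'lat': [28.61, 28.54, 19.08, 19.22, 13.08],
    'lon': [77.21, 77.39, 72.88, 72.98, 80.27],
})
coords = df[['lat', 'lon']].to_numpy()
names = df['city'].to_numpy()

# Hypothetical stand-in for db.labels_: one label per input row, in input order.
cluster_labels = np.array([0, 0, 1, 1, 2])
num_clusters = len(set(cluster_labels))

# The same boolean mask selects the matching rows from both arrays.
coord_clusters = [coords[cluster_labels == n] for n in range(num_clusters)]
name_clusters = [names[cluster_labels == n].tolist() for n in range(num_clusters)]

# Alternative: store the label as a column so each name stays on its own row.
df['cluster'] = cluster_labels
names_by_cluster = df.groupby('cluster')['city'].apply(list).to_dict()
```

Keeping the label as a column of df is often the simpler option, because no reverse lookup from coordinates back to names is needed afterwards.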


2 Comments

Hi, the final print statement gives me the names of the cities, but they are being cut off. Is there a way to get the cities grouped by cluster in list form? Also, can you please explain how you did that? Thanks!
@RohitKumar Google Colab abbreviates long values as "....". You can always print each entry individually, e.g. for i in clusters1: print(i). In clusters1 the cities are grouped together by cluster label. To convert to a list, you can check out this SO post.
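A small sketch of both suggestions, printing each cluster on its own and converting to plain lists. The clusters1 built here is a hypothetical stand-in with made-up city names; in the notebook it would be the Series from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for clusters1: a Series whose entries are arrays of names.
clusters1 = pd.Series([
    np.array(['Delhi', 'Noida']),
    np.array(['Mumbai', 'Thane']),
    np.array(['Chennai']),
])

# Plain nested list: one inner list of city names per cluster.
clusters_as_lists = [cluster.tolist() for cluster in clusters1]

# Print each cluster separately so the notebook does not abbreviate the output.
for label, cities in enumerate(clusters_as_lists):
    print(label, cities)
```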
