I am trying to extract data from OpenStreetMap in R via the osmdata package like so:

library(osmdata)

extract_data <- function(bbx){
  highway_vals <- c("motorway", "trunk", "primary", "secondary")
  data <- bbx %>% 
    opq(timeout = 100) %>% 
    add_osm_feature(
      key = "highway", 
      value = highway_vals
    ) %>% 
    osmdata_sf()  # execute the query and return the result as sf objects
  
  # keep only the lines and polygons
  data_of_interest <- list(osm_lines = data$osm_lines, osm_polygons = data$osm_polygons)
  data_of_interest
}

As you can see from this function, I am only really interested in the lines and polygons. Is there a way to put this restriction into the query? I already tried to look at available_features() but nothing fit the bill. Though, I have seen here that in principle it should be possible to query by geometry such as "multipolygon".

Especially for large bounding boxes, I inevitably download a huge number of points, i.e. there is always data$osm_points in the data variable after the query is executed. I believe that this wastes a lot of resources: the memory allocated to the variable data is far larger than the memory allocated to data_of_interest. It also brings me to my download limit quite fast, i.e. I hit the "Please check /api/status for the quota of your IP address" error.

1 Answer

This is not possible, both in principle and by design, because the underlying data model of OSM is entirely different from the data model of Simple Features. The osmdata package reflects one mapping of the former data model onto the latter (with design decisions explained at some length in a package vignette). The points are always there because they are the only parts of the OSM database that contain the actual coordinates. The osmdata package simply converts them to Simple Features form, but they are always delivered regardless.

OSM only knows three things: nodes, ways, and relations. Both lines and polygons (as well as anything else) can be formed of multiple individual "ways", which have to be stitched together to form Simple Features "lines" or "polygons". Those ways in turn have no coordinates; they are only sequences of point IDs, which is why the points always have to be delivered.

That effectively prevents any simplification of queries along the lines you suggest. There is no "waste of resources", because what you want (lines and polygons) requires all the data to be delivered regardless. The only things you could do would be to reduce your bounding box, refine your query with more specific key-value pairs, or both.
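To make the point about ways and point IDs concrete, here is a minimal sketch in base R. It does not use osmdata or the real OSM format; the node IDs, coordinates, and the helper way_to_coords() are all made up for illustration. It only shows that a way, being a list of node IDs, cannot be turned into a geometry unless the nodes themselves are available.

```r
# Toy OSM-style data (hypothetical values): nodes carry the coordinates,
# ways only list the IDs of the nodes they pass through.
nodes <- data.frame(
  id  = c(101, 102, 103, 104),
  lon = c(13.40, 13.41, 13.42, 13.43),
  lat = c(52.50, 52.51, 52.51, 52.52)
)

ways <- list(
  way_1 = c(101, 102, 103),
  way_2 = c(103, 104)
)

# Hypothetical helper: resolve a way into a coordinate matrix by
# looking up each of its node IDs in the node table.
way_to_coords <- function(node_ids, nodes) {
  idx <- match(node_ids, nodes$id)
  if (anyNA(idx)) {
    stop("way references nodes that were not delivered")
  }
  as.matrix(nodes[idx, c("lon", "lat")])
}

# With the nodes present, the way becomes a sequence of coordinates.
line_1 <- way_to_coords(ways[["way_1"]], nodes)

# With the nodes stripped out, the same way cannot be resolved.
no_nodes <- nodes[0, ]
result <- tryCatch(
  way_to_coords(ways[["way_1"]], no_nodes),
  error = function(e) conditionMessage(e)
)
```

In this toy setup, line_1 is a three-row matrix of lon/lat pairs, while result holds the error message, which mirrors why a query for lines and polygons still has to deliver the points.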