Using elasticsearch-py, I would like to remove all documents from a specific index without removing the index itself. Given that delete_by_query was moved to a separate plugin, what is the best way to go about this?
- You can't just delete and recreate the index? – OneCricketeer, Feb 18, 2016 at 17:17
- @cricket_007 I could, but I'd rather do it by removing the documents. Otherwise, I'd have to check the index settings and mappings and use them when recreating the index. I think it's easier to remove the documents. – zanderle, Feb 18, 2016 at 17:45
- A simple backup of the mappings and such shouldn't be that difficult. A full index scan and a bulk delete doesn't seem "easier", IMO. – OneCricketeer, Feb 18, 2016 at 18:00
2 Answers
Deleting all the docs with delete by query is highly inefficient. A more direct approach is:
- Get the current mappings (assuming you are not using index templates).
- Drop the index with DELETE /indexname.
- Create the new index with those mappings.
This will take a second; deleting the documents one by one will take much, much more time and cause unnecessary disk I/O.
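In elasticsearch-py, the three steps above might look roughly like the sketch below. It is not taken from the answer: the function name recreate_index and the READ_ONLY_SETTINGS set are made up for illustration, it assumes index_name is a concrete index (not an alias), that the client accepts a body argument on indices.create (as the client versions from around the time of this question do), and it does not restore aliases.

```python
# Settings the cluster generates itself; they cannot be passed back in on
# create. This set is an assumption and may need adjusting per ES version.
READ_ONLY_SETTINGS = {"creation_date", "uuid", "version", "provided_name"}


def recreate_index(es, index_name):
    """Drop index_name and recreate it empty with the same mappings/settings.

    `es` is an elasticsearch.Elasticsearch client instance.
    """
    # Step 1: save the current mappings and index-level settings.
    mappings = es.indices.get_mapping(index=index_name)[index_name]["mappings"]
    settings = es.indices.get_settings(index=index_name)[index_name]["settings"]["index"]
    settings = {k: v for k, v in settings.items() if k not in READ_ONLY_SETTINGS}

    # Step 2: drop the index (equivalent to DELETE /indexname).
    es.indices.delete(index=index_name)

    # Step 3: create it again with the saved mappings and settings.
    es.indices.create(
        index=index_name,
        body={"settings": {"index": settings}, "mappings": mappings},
    )
```

Usage would be something like recreate_index(Elasticsearch(), "indexname"). Between the delete and the create the index does not exist, so writers hitting it in that window will fail or auto-create it with default mappings.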
Use a Scroll/Scan API call to gather all Document IDs and then call batch delete on those IDs. This is the recommended replacement for the Delete By Query API based on the official documentation.
EDIT: Added the requested information on doing this specifically in elasticsearch-py. Here is the documentation for the helpers. Use the scan helper to scan through all documents, then use the bulk helper with the delete action to delete all the IDs.
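A minimal sketch of that scan-then-bulk-delete approach follows. The function names delete_actions and delete_all_documents are hypothetical, and the handling of _type is an assumption meant to cover older clusters that still use mapping types. Documents indexed with custom routing or a parent would also need their routing carried over, which this sketch does not do.

```python
def delete_actions(hits):
    """Turn search hits into 'delete' actions for the bulk helper."""
    for hit in hits:
        action = {"_op_type": "delete", "_index": hit["_index"], "_id": hit["_id"]}
        # Older clusters (with mapping types) return _type and need it back.
        if "_type" in hit:
            action["_type"] = hit["_type"]
        yield action


def delete_all_documents(es, index_name):
    """Scroll over every document in index_name and bulk-delete it.

    `es` is an elasticsearch.Elasticsearch client instance.
    """
    # Imported here so delete_actions stays usable without the client library.
    from elasticsearch.helpers import bulk, scan

    # Only the IDs are needed, so skip fetching _source.
    query = {"query": {"match_all": {}}, "_source": False}
    hits = scan(es, index=index_name, query=query)
    # bulk() returns (number of successful actions, list of errors).
    return bulk(es, delete_actions(hits))
```

Both helpers work lazily, so the IDs are streamed from the scroll into bulk requests in chunks rather than collected in memory first.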