I am trying to get all the records from Elasticsearch using the Java API, but I receive the error below:

n[[Wild Thing][localhost:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10101].

My code is as follows:

Client client;
try {
    client = TransportClient.builder().build().
            addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
    int from = 1;
    int to = 100;
    while (from <= 131881) {
        SearchResponse response = client
                .prepareSearch("demo_risk_data")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setFrom(from)
                .setQuery(QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("user_agent", "")))
                .setSize(to).setExplain(true).execute().actionGet();
        if (response.getHits().getHits().length > 0) {
            for (SearchHit searchData : response.getHits().getHits()) {
                JSONObject value = new JSONObject(searchData.getSource());
                System.out.println(value.toString());
            }
        }
        // advance to the next page of results
        from += to;
    }
} catch (UnknownHostException e) {
    // thrown by InetAddress.getByName if the host cannot be resolved
    e.printStackTrace();
}

There are currently 131881 records in total, so I start with from = 1 and to = 100 and fetch 100 records at a time while from <= 131881. Is there a way to fetch records in batches of, say, 100 until there are no further records in Elasticsearch?

1 Answer

Yes, you can do so using the scroll API, which the Java client also supports.

You can do it like this:

Client client;
try {
    client = TransportClient.builder().build().
            addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

    QueryBuilder qb = QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("user_agent", ""));
    // Sort by _doc and keep the scroll context alive for 60000 ms between requests
    SearchResponse scrollResp = client.prepareSearch("demo_risk_data")
        .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet();

    // Scroll until no hits are returned
    while (true) {
        // Break condition: no hits are returned
        if (scrollResp.getHits().getHits().length == 0) {
            break;
        }

        // Otherwise read the results of the current page
        for (SearchHit hit : scrollResp.getHits().getHits()) {
            JSONObject value = new JSONObject(hit.getSource());
            System.out.println(value.toString());
        }

        // Fetch the next page using the scroll id from the previous response
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
    }
} catch (UnknownHostException e) {
    // thrown by InetAddress.getByName if the host cannot be resolved
    e.printStackTrace();
}
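
To make the loop shape easier to see in isolation, here is a minimal sketch in plain Java that does not touch Elasticsearch at all. FakeScroll is a hypothetical in-memory stand-in for the scroll cursor (it is not part of any Elasticsearch client), and ScrollLoopSketch and readAll are illustrative names. The only point is the pattern: keep requesting the next page until an empty page comes back, without needing to know the total count up front.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// In-memory illustration of "read pages until an empty page is returned".
// All names are hypothetical; this is NOT the Elasticsearch API.
class ScrollLoopSketch {

    // Stand-in for a scroll cursor: hands out consecutive pages of at most
    // pageSize items, then empty pages once the data is exhausted.
    static class FakeScroll {
        private final List<String> docs;
        private final int pageSize;
        private int position = 0;

        FakeScroll(List<String> docs, int pageSize) {
            this.docs = docs;
            this.pageSize = pageSize;
        }

        List<String> nextPage() {
            if (position >= docs.size()) {
                return Collections.emptyList();
            }
            int end = Math.min(position + pageSize, docs.size());
            List<String> page = new ArrayList<>(docs.subList(position, end));
            position = end;
            return page;
        }
    }

    // Drains the cursor: the loop ends on the first empty page, so the
    // caller never has to hard-code the total number of records.
    static List<String> readAll(FakeScroll scroll) {
        List<String> all = new ArrayList<>();
        List<String> page = scroll.nextPage();
        while (!page.isEmpty()) {
            all.addAll(page);
            page = scroll.nextPage();
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            docs.add("doc-" + i);
        }
        List<String> all = readAll(new FakeScroll(docs, 100));
        System.out.println("read " + all.size() + " documents");
    }
}
```

With 250 items and a page size of 100, the stand-in returns pages of 100, 100 and 50 items, followed by an empty page that stops the loop. In the real client the empty-hits check on the scroll response plays the same role.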

3 Comments

Instead of setting a constant value for the size (100), is there a way to get all the values stored in Elasticsearch?
@Val, can you update your answer with the latest API? TransportClient and some other classes are now deprecated.
@PaulSteven if you follow the second link, you'll see what it looks like now ;-)
