I have to insert around 1.5 billion records into Cosmos DB using the Java SDK, broken into batches of 7k documents. I have written code that first generates the data in a loop and then writes it into two containers, Document and Document_attr, through a CosmosClient connection. But it is too slow: around 300 items per second, and at that rate it will take far too long to load all the data into the containers. Can someone please suggest the best way to insert items faster? Throughput is set to autoscale with a maximum of 4,000 RU/s. Since I'm new to Cosmos DB, I'm unable to optimize the process.
CosmosClient cosmosClient = new CosmosClientBuilder()
        .endpoint("<>")
        .key("<>")
        .buildClient();
CosmosDatabase db = cosmosClient.getDatabase("<>"); // database handle used below
CosmosContainer documentContainer = db.getContainer("DOCUMENT");
CosmosContainer attributeContainer = db.getContainer("DOCUMENT_ATT");
CosmosBulkExecutionOptions bulkExecutionOptions = new CosmosBulkExecutionOptions();
for (int i = 0; i < 86400; i++) {
    List<Document> docInsert = new ArrayList<>();
    List<DocumentAttribute> docAttr = new ArrayList<>();
    for (int j = 0; j < 50; j++) {
        String docId = UUID.randomUUID().toString();
        Date expiryTime = DateUtils.addYears(date, 10);
        docInsert.add(new Document(docId, docId, Math.floor(Math.random() * this.maxFileSize) + 1, number));
        docInsert.add(new Document(docId, docId, Math.floor(Math.random() * this.maxFileSize) + 1, number + 1));
        number = number + 2 > totalNumber ? 1 : number + 2;
        List<User> users = new ArrayList<>();
        users.add(new User(idPrefix + extUserNumber, "EXT", list1));
        users.add(new User(idPrefix + (extUserNumber + 1), "EXT", list1)); // parentheses: add before concatenating to the string prefix
        extUserNumber = extUserNumber + 2 > totalNumber ? 1 : extUserNumber + 2;
        for (int u = 0; u < 5; u++)
            users.add(new User(idPrefix + (intNumber + u), "INT", list2));
        intNumber = intNumber + 5 > totalIntNumber ? 1 : intNumber + 5;
        docAttr.add(new DocumentAttribute(docId, users, date)); // was missing the comma between users and date
    }
    List<CosmosItemOperation> documentOperations = docInsert.stream()
            .map(doc -> CosmosBulkOperations.getCreateItemOperation(doc, new PartitionKey(doc.getId())))
            .collect(Collectors.toList());
    documentContainer.executeBulkOperations(documentOperations, bulkExecutionOptions);
    List<CosmosItemOperation> attributeOperations = docAttr.stream()
            .map(doc -> CosmosBulkOperations.getCreateItemOperation(doc, new PartitionKey(doc.getDocId())))
            .collect(Collectors.toList());
    attributeContainer.executeBulkOperations(attributeOperations, bulkExecutionOptions);
    date = DateUtils.addSeconds(date, 1);
}
I referred to this document, https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/tutorial-dotnet-bulk-import, but CosmosClientOptions does not seem to be available in the Java SDK.
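That .NET tutorial enables bulk via CosmosClientOptions.AllowBulkExecution; in the Java v4 SDK there is no such flag, because bulk mode is the executeBulkOperations call itself, ideally on the async client so the SDK can pipeline batches instead of blocking once per loop iteration as above. Also worth checking: a ~1 KB insert costs roughly 5-10 RU, so an autoscale ceiling of 4,000 RU/s caps you at a few hundred writes per second no matter how the client is written, and the observed ~300 items/s suggests throttling; raising the maximum throughput is likely needed alongside any code change. Below is a minimal sketch of the async pattern, reusing the Document class from the question; the endpoint, key, and database name are placeholders, not real values.

```java
import com.azure.cosmos.CosmosAsyncClient;
import com.azure.cosmos.CosmosAsyncContainer;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.models.CosmosBulkOperations;
import com.azure.cosmos.models.CosmosItemOperation;
import com.azure.cosmos.models.PartitionKey;
import reactor.core.publisher.Flux;

import java.util.UUID;

public class BulkLoadSketch {
    public static void main(String[] args) {
        CosmosAsyncClient client = new CosmosClientBuilder()
                .endpoint("<endpoint>")
                .key("<key>")
                .directMode()                         // direct TCP transport, usually faster than gateway
                .contentResponseOnWriteEnabled(false) // don't return document bodies on writes, saves bandwidth
                .buildAsyncClient();

        CosmosAsyncContainer container = client
                .getDatabase("<database>")
                .getContainer("DOCUMENT");

        // Feed operations as a reactive stream; the SDK groups them into
        // per-partition batches internally.
        Flux<CosmosItemOperation> ops = Flux.range(0, 7_000)
                .map(i -> {
                    String id = UUID.randomUUID().toString();
                    return CosmosBulkOperations.getCreateItemOperation(
                            new Document(id, id, 1.0, i), new PartitionKey(id));
                });

        container.executeBulkOperations(ops)
                .doOnNext(resp -> {
                    // Surface throttles (HTTP 429): they mean the RU ceiling,
                    // not the client, is the bottleneck.
                    if (resp.getResponse() != null && resp.getResponse().getStatusCode() == 429) {
                        System.out.println("Throttled: " + resp.getOperation().getId());
                    }
                })
                .blockLast(); // block once at the very end, not per batch

        client.close();
    }
}
```

This is a sketch under the assumption that Document's partition key is its id, as in the question's code; it cannot run without a live Cosmos account and credentials.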
Physical partitions are allocated at ceiling(RU/10,000) rather than ceiling(RU/6,000), so 240,000 RU/s will give you just 24 physical partitions, not 40. If you started off with a single physical partition, then 24 physical partitions will mean some partitions are dealing with twice the key range of others, and you should be aiming for 32 or 64. This will take a while though, as it will happen through multiple rounds of binary splits rather than in one go.
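To make that arithmetic concrete, here is a small stand-alone illustration (my own helper names, not SDK code): the partition count follows ceiling(RU/10,000), and even key ranges want the next power of two above that count, since splits are binary.

```java
public class PartitionMath {
    // Physical partitions allocated for a given RU/s ceiling: ceiling(RU / 10,000).
    static int physicalPartitions(int ruPerSecond) {
        return (int) Math.ceil(ruPerSecond / 10_000.0);
    }

    // Smallest power of two >= n: the partition count at which repeated binary
    // splits leave every partition owning an equal share of the key space.
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    public static void main(String[] args) {
        int ru = 240_000;
        System.out.println(physicalPartitions(ru));                  // 24, not 40
        System.out.println(nextPowerOfTwo(physicalPartitions(ru)));  // 32, the next even-split target
    }
}
```

At the question's 4,000 RU/s ceiling this gives a single physical partition, which is consistent with the low insert rate observed.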