I'm trying to load blob names so I can filter them in my program; after applying all the filters I plan to download and process each blob. We currently have around 30k blobs in storage, laid out inside the container like this: year/month/day/hour/file.csv (or file.json for unprocessed files).
The user enters a start and end date dynamically (the window is at most 30 days) to select what to download. Azure.Storage.Blobs.BlobContainerClient and its GetBlobs method let me pass a single string prefix for server-side filtering.
If my dates are 2020/06/01 and 2020/06/02, the program is very fast and takes around 2 seconds to get the blobs and apply the rest of the filters. However, if I have 2020/05/30 and 2020/06/01, I can't include the month in the prefix, because GetBlobs accepts only one string. My prefix ends up being just 2020, and the call takes around 15 seconds to complete. The rest of the filtering is done locally, but the biggest delay is the GetBlobs() call itself.
Is there any other way to apply multiple filters server side from a .NET Core app?
Here are the relevant functions:
BlobContainerClient container = new BlobContainerClient(resourceGroup.Blob, resourceGroup.BlobContainer);
var blobs = container.GetBlobs(prefix: CreateBlobPrefix(start, end))
    .Select(item => item.Name)
    .ToList();
blobs = FilterBlobList(blobs, filter, start, end);

// Builds the longest prefix (year, year/month, or year/month/day) shared by both dates.
// Returns null when the years differ, which makes GetBlobs list the whole container.
private string CreateBlobPrefix(DateTime start, DateTime end)
{
    if (start.Year != end.Year)
        return null;

    string prefix = start.Year.ToString();
    if (start.Month == end.Month)
    {
        // "D2" zero-pads to two digits, e.g. 6 -> "06".
        prefix += "/" + start.Month.ToString("D2");
        if (start.Day == end.Day)
            prefix += "/" + start.Day.ToString("D2");
    }
    return prefix;
}
EDIT: Here's how I did it in the end. Because it's faster to make multiple requests with more specific prefixes, I did the following:
- create a list of prefixes for the different dates in the selected time window (the window comes from a UI application where the user can enter any range)
- for each prefix, send a request to Azure to get the matching blobs
- concatenate all the blob names into one list
- process the list by using a blob client for each blob name
Here's the code:
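// CreateBlobPrefix now returns a collection of prefixes instead of a single string.
var blobs = new List<string>();
foreach (var blobPrefix in CreateBlobPrefix(start, end))
{
    // One listing request per prefix; AddRange appends without rebuilding the list each time.
    blobs.AddRange(container.GetBlobs(prefix: blobPrefix).Select(item => item.Name));
}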
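
The updated prefix-generating function isn't shown above, so here is a minimal sketch of what it might look like, assuming one prefix per calendar day in the window and the year/month/day/hour layout described earlier. The class name BlobPrefixes and the method name CreateDailyPrefixes are hypothetical and used only for illustration; this is not necessarily what the real code does.

```csharp
using System;
using System.Collections.Generic;

public static class BlobPrefixes
{
    // Hypothetical helper: yields one "yyyy/MM/dd" prefix for each calendar day
    // from start to end (inclusive), matching the year/month/day/hour blob layout.
    public static IEnumerable<string> CreateDailyPrefixes(DateTime start, DateTime end)
    {
        for (DateTime day = start.Date; day <= end.Date; day = day.AddDays(1))
        {
            // Build the prefix from the numeric parts so the "/" separator is
            // never swapped for a culture-specific date separator.
            yield return $"{day.Year:D4}/{day.Month:D2}/{day.Day:D2}";
        }
    }
}
```

For 2020/05/30 to 2020/06/01 this yields 2020/05/30, 2020/05/31 and 2020/06/01, so the listing loop would issue three narrow requests instead of one request for everything under 2020. With the 30-day maximum window that stays at roughly 30 listing calls; whether per-day prefixes or coarser per-month prefixes work better probably depends on how many blobs sit under each day.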