I'm trying to load blob names so I can filter them in my program; after applying all filters I plan to download and process each blob. We currently have around 30k blobs in storage, laid out inside the container like this: year/month/day/hour/file.csv (or file.json for unprocessed files).

My program needs to accept a start and end date dynamically (a range of at most 30 days) for downloading. Azure.Storage.Blobs.BlobContainerClient and its GetBlobs method let me pass a single string prefix for server-side filtering.

If my dates are 2020/06/01 and 2020/06/02, the program is very fast: it takes around 2 seconds to get the blobs and apply the rest of the filters. However, if I have 2020/05/30 and 2020/06/01, I can't include the month in the prefix because GetBlobs takes only one string, so my prefix is just 2020, which takes around 15 seconds to complete. The rest of the filtering is done locally, but the biggest delay is the GetBlobs() call.

Is there any other way to apply multiple filters server-side from a .NET Core app?

Here are relevant functions:

        BlobContainerClient container = new BlobContainerClient(resourceGroup.Blob, resourceGroup.BlobContainer);
        var blobs = container.GetBlobs(prefix : CreateBlobPrefix(start, end))
            .Select(item=> item.Name)
            .ToList();
        blobs = FilterBlobList(blobs, filter, start, end);

    // Builds the longest date prefix (yyyy, yyyy/MM or yyyy/MM/dd) shared by start and end.
    // Returns null when the years differ, so GetBlobs lists the whole container.
    private string CreateBlobPrefix(DateTime start, DateTime end)
    {
        string prefix = null;
        bool sameYear = start.Year == end.Year;
        bool sameMonth = start.Month == end.Month;
        bool sameDay = start.Day == end.Day;
        if (sameYear)
        {
            prefix = start.Year.ToString();
            if (sameMonth)
            {
                // "D2" zero-pads to two digits, e.g. 6 -> "06"
                prefix += "/" + start.Month.ToString("D2");
                if (sameDay)
                    prefix += "/" + start.Day.ToString("D2");
            }
        }
        return prefix;
    }

EDIT: here's how I did it in the end. Because it's faster to make multiple requests with more specific prefixes, I did the following:

  • create a list of prefixes for the distinct dates in the selected time window (the window comes from a UI application where the user can enter any range)
  • for each prefix, send a request to Azure to get the matching blobs
  • concatenate all blob names into one list
  • process the list, using a blob client for each blob name

Here's the code:

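        // CreateBlobPrefix now returns one prefix per date in the window
        var blobs = new List<string>();
        foreach (var blobPrefix in CreateBlobPrefix(start, end))
        {
            // List only the blobs under this prefix and collect their names
            var currentList = container.GetBlobs(prefix: blobPrefix)
                .Select(item => item.Name);
            blobs.AddRange(currentList);
        }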
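
The loop above expects CreateBlobPrefix(start, end) to yield several prefixes, but that updated implementation is not shown. A minimal sketch of one way it could look, assuming one prefix per calendar day in the year/month/day layout described earlier; the class and method names here are hypothetical, not the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class BlobPrefixSketch
{
    // Hypothetical helper (not the original implementation): returns one
    // "yyyy/MM/dd" prefix for each calendar day from start to end, inclusive.
    public static List<string> CreateDailyPrefixes(DateTime start, DateTime end)
    {
        var prefixes = new List<string>();
        for (var day = start.Date; day <= end.Date; day = day.AddDays(1))
        {
            // InvariantCulture keeps '/' as a literal slash; some cultures
            // would substitute their own date separator.
            prefixes.Add(day.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture));
        }
        return prefixes;
    }
}
```

For 2020/05/30 to 2020/06/01 this produces three prefixes (2020/05/30, 2020/05/31, 2020/06/01), so a 30-day window means at most 30 small listing requests instead of one listing of the whole year.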
  • How about the issue? Did the answer below resolve your question? If yes, you could accept it as an answer so it can help other community members who run into the same issue, and we can archive this thread. Thanks. Commented Dec 22, 2020 at 8:02
  • I kind of forgot to revisit the page after reading your answer. Yes, it helped me find a good solution similar to what you wrote. Thanks. Commented Dec 23, 2020 at 9:03
  • I have added the code I used, but do share yours! Commented Dec 23, 2020 at 17:52
  • Would you mind adding the implementation for CreateBlobPrefix(start, end)? Commented Jan 2, 2021 at 23:24

1 Answer

You could query more than once, each time using the longest prefix that the dates in that sub-range share:

First filter server-side with the prefix for the start year and month, 2020/05, and then filter locally for the exact dates.

Then step the day/month prefix forward until you reach the end of the range.

The granularity of your stepping really depends on the time it takes to make a call to Azure for a given average number of results. Another advantage is you could run these sub-queries in parallel.

I've used this code:

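    // BlobFileDateTimeFormat is assumed to be a constant such as "yyyy/MM/dd" that matches the blob layout.
    // CultureInfo (System.Globalization) keeps '/' from being replaced by a culture-specific separator.
    var prefixDateFilters = Enumerable.Range(0, 1 + endDateInclusive.Subtract(startDateInclusive).Days)
                                      .Select(offset => startDateInclusive.AddDays(offset))
                                      .Select(date => date.ToString(BlobFileDateTimeFormat, CultureInfo.InvariantCulture))
                                      .ToList();

    // List each day's prefix in parallel and flatten the results into one list of blob names.
    var blobNames = prefixDateFilters.AsParallel()
                                     .SelectMany(filter => containerClient.GetBlobs(prefix: filter))
                                     .Select(blob => blob.Name)
                                     .ToList();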
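
The stepping granularity described above does not have to be fixed at one day. As a rough sketch, assuming the year/month/day layout from the question, the helper below emits a single yyyy/MM prefix whenever the range covers a whole calendar month and falls back to yyyy/MM/dd prefixes otherwise. The class and method names are hypothetical, and whether fewer, larger listings are actually faster would need measuring against your own container:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class PrefixGranularitySketch
{
    // Hypothetical helper: "yyyy/MM" for fully covered months,
    // "yyyy/MM/dd" for the remaining individual days.
    public static List<string> CreateMixedPrefixes(DateTime start, DateTime end)
    {
        var prefixes = new List<string>();
        var day = start.Date;
        var last = end.Date;
        while (day <= last)
        {
            var monthEnd = new DateTime(day.Year, day.Month,
                                        DateTime.DaysInMonth(day.Year, day.Month));
            if (day.Day == 1 && monthEnd <= last)
            {
                // The whole month lies inside the range: one coarse prefix.
                prefixes.Add(day.ToString("yyyy/MM", CultureInfo.InvariantCulture));
                day = monthEnd.AddDays(1);
            }
            else
            {
                // Partial month: fall back to a per-day prefix.
                prefixes.Add(day.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture));
                day = day.AddDays(1);
            }
        }
        return prefixes;
    }
}
```

The resulting list can be fed to the same parallel GetBlobs query; results from a month-level prefix need no extra local date filtering because every day in that month is inside the range.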