0

Running NbLinksExpirationStatus takes about 12 seconds to execute, which is not fast enough for what i am trying to achieve. Could it be Contains() problem and is there any way i can make it execute faster? i am working with EF6 and Sql Server. Please let me know if also in this case it is better to just use sqlquery instead or not?

NOTE :

HashSet<int> oneEntitiesPerimeter = oneEquipment.Select(x => x.ID_ONE).ToHashSet();
HashSet<int> twoEntitiesPerimeter = twoEquipment.Select(x => x.ID_TWO).ToHashSet();

The method where there is the query :

public static Tuple<int, int> NbLinksExpirationStatus(HashSet<int> oneEntitiesPerimeter, HashSet<int> twoEntitiesPerimeter)
{
    using (DbEntities context = new DbEntities())
    {
        int aboutToExpireValue = Constants.LinkStatusAboutToExpire; #1
        int expiredValue = Constants.LinkStatusExpired; #2

        var oneIds = oneEntitiesPerimeter ?? new HashSet<int>();
        var twoIds = twoEntitiesPerimeter ?? new HashSet<int>();

        var joinedQuery = (from t in context.ONE_DISTRIBUTION_STATUS
                            join o in context.TWO_DISTRIBUTION_STATUS on t.ID_ent equals o.ID_ent into joined
                            from o in joined.DefaultIfEmpty()
                            where oneIds.Contains(t.ID_ONE) && (o == null || twoIds.Contains(o.ID_TWO))
                            select new { oneExpirationStatus = t.LINK_STATUS_EXPIRATION, twoExpirationStatus = o.LINK_STATUS_EXPIRATION }).ToList();

        var aboutToExpireCount = joinedQuery.Where(j =>
            j.oneExpirationStatus == aboutToExpireValue || j.twoExpirationStatus == aboutToExpireValue).Count();

        var expiredCount = joinedQuery.Where(j =>
            j.oneExpirationStatus == expiredValue || j.twoExpirationStatus == expiredValue).Count();

        return new Tuple<int, int>(aboutToExpireCount, expiredCount);
    }
}

EDIT: i have tried to use SQL instead but the speed of the execution is always not improving. This is the attempt :

string oneIds = "NULL";            
if(oneEntitiesPerimeter != null && oneEntitiesPerimeter.Count > 0)
{
    oneIds = string.Join(",", oneEntitiesPerimeter);
}
string twoIds = "NULL";
if (twoEntitiesPerimeter != null && twoEntitiesPerimeter.Count > 0)
{
    twoIds = string.Join(",", twoEntitiesPerimeter);
}

string sqlQuery = "SELECT " +
    "COALESCE(SUM(CASE WHEN t.LINK_STATUS_EXPIRATION = " + aboutToExpireValue + " OR COALESCE(o.LINK_STATUS_EXPIRATION, '')= " + aboutToExpireValue + " THEN 1 ELSE 0 END), 0) AS AboutToExpire," +
    "COALESCE(SUM(CASE WHEN t.LINK_STATUS_EXPIRATION = " + expiredValue + " OR COALESCE(o.LINK_STATUS_EXPIRATION, '') = " + expiredValue + " THEN 1 ELSE 0 END), 0) AS Expired "+
    "FROM ONE_DISTRIBUTION_STATUS t LEFT JOIN TWO_DISTRIBUTION_STATUS o ON t.ID_ent = o.ID_ent WHERE t.ID_ONE in (" + oneIds + ") AND (o.ID_TWO in (" + twoIds + ") OR o.ID_TWO IS NULL)";

var counter = context.Database.SqlQuery<LinkExpirationStatusWidget>(sqlQuery).FirstOrDefault();           
result = new Tuple<int, int>(counter.AboutToExpire, counter.Expired);

The class added to store results :

public class LinkExpirationStatusWidget
{
    /// <summary>
    /// Gets or sets count of links about to expire
    /// </summary>
    /// <value>About To Expire count</value>
    public int AboutToExpire { get; set; }

    /// <summary>
    /// Gets or sets count of links expired
    /// </summary>
    /// <value>Expired count</value>
    public int Expired { get; set; }
}
16
  • The problem is the entire attempt to use LINQ as embedded SQL. LINQ is the query language of an ORM, Entity Framework. It's the ORM that generates SQL and executes the query. ORMs deal with entities, not tables. It's the ORM's job to generate the JOINs from the relations between entities. Commented Apr 8 at 12:58
  • 1
    got it, so it is better to use SQL from what i am understanding? because i have attempted to do that but the execution is as slow Commented Apr 8 at 13:02
  • The performance of the query depends on the actual generated SQL, column indexes and the amount of data. The code in NbLinksExpirationStatus execytes at least 2 very complicated queries based on joinedQuery, the part that shouldn't be done by hand. It seems like those 2 queries could be replaced by a GroupBy clause on oneExpirationStatus and twoExpirationStatus. Apart from that, there's no way to say why each of the 2 queries is slow - are there missing indexes? Too much data? Commented Apr 8 at 13:02
  • 1
    @mnl it is better to use SQL no, it's better to understand what EF is actually doing and how to use it. If you have a Customer entity with a Customer.Orders property, you don't need JOINs. If you wanted to count specific order and customer statuses statuses by customer you could write context.Orders.Where(ord=>ordIds.Contains(ord.Status) && custIds.Contains(ord.Customer.Status)).GroupBy(o=>new {OrderStatus=o.Status,CustomerStatus=o.Customer.Status}).Select(g=>new {g.Key,Count=g.Count()); Commented Apr 8 at 13:11
  • 1
    @DStanley this is already happening. I just realized the joinedQuery part has a ToList() way out to the right. Without any GroupBy though that returns all combinations instead of returning a count for each combination Commented Apr 8 at 13:28

2 Answers 2

2

Rather than doing two separate queries, you can aggregate and get the count in one query.

Also

  • Use valuetuples rather than the old-style Tuple.
  • Push the second Contains before the join to make it more efficient.
  • Use async where possible.
  • Don't use ToList, instead do the aggregation on the server.
public static async Task<(int aboutToExpireCount, int expiredCount)> NbLinksExpirationStatus(HashSet<int>? oneEntities, HashSet<int>? twoEntities)
{
    using DbEntities context = new DbEntities();
    int aboutToExpireValue = Constants.LinkStatusAboutToExpire;
    int expiredValue = Constants.LinkStatusExpired;

    oneEntities ??= new HashSet<int>();
    twoEntities ??= new HashSet<int>();

    var counts = await (
        from t in context.ONE_DISTRIBUTION_STATUS
        join o in context.TWO_DISTRIBUTION_STATUS.Where(two => twoEntities.Contains(two.ID_TWO))
          on t.ID_ent equals o.ID_ent into joined
        from o in joined.DefaultIfEmpty()
        where oneIds.Contains(t.ID_ONE)
        group new { t, o } by 1 into grp  // group by a constant value
        select new {
            aboutToExpireCount =  grp.Count(tuple =>
                tuple.t.LINK_STATUS_EXPIRATION == aboutToExpireValue || tuple.o.LINK_STATUS_EXPIRATION == aboutToExpireValue),
            expiredCount = grp.Count(tuple =>
                tuple.t.LINK_STATUS_EXPIRATION == expiredValue || tuple.o.LINK_STATUS_EXPIRATION == expiredValue),
        }
    ).FirstAsync(); 

    return (counts.aboutToExpireCount, counts.expiredCount);
}
Sign up to request clarification or add additional context in comments.

6 Comments

what is the use of grouping please?
EF requires you to use .GroupBy or equivalent to get a grouped result, even if you are grouping by the whole set.
i see, i have tried this without Task option and it works better but i wonder when i tried to add Task and applied async/await i had a problem since this method is nested inside other ones i applied async/await on them too but there are also methods used aside NbLinksExpirationStatus that are not Task type so the home page was just empty as a final result of my webapp
as long as async is there i get an error that tells me to use await
So remove the async and just make it public static (int aboutToExpireCount, int expiredCount) and remove the await and use .First()
|
0

There is a high likelihood that you'll get a vast performance boost simply by removing the ToList() at the end of joinedQuery. That call causes joinedQuery to get executed and (according to your comment) send hundreds of thousands of rows of data back to the server. By removing it, each call to Count() will send a separate request, but each request will allow SQL Server to optimize the query and only send back a single data point, which will almost certainly be orders of magnitude faster.

I would strongly recommend putting chained methods like ToList() and Count() on their own line, because it's easy to miss them when they're at the end of a long line of code like this.

Other optimizations, like grouping the queries into a single round-trip, using async, etc., are probably premature at this point. There are plenty of other options for improving performance, but they all come with drawbacks in terms of code maintainability, and they might not make that much of a difference.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.