1

`

SELECT DISTINCT ECS.UserId,,PFR.EntityId,PFR.CreatedBy
FROM EventConsentStatus ECS 
INNER JOIN @Institutions I ON ECS.InstitutionId=I.InstitutionId             
LEFT JOIN ParentFormResponses PFR ON  PFR.EntityTypeId = 1 
          AND PFR.EntityId=@ActivityId AND ECS.EventId=PFR.EntityId                 
WHERE ECS.EventId = @ActivityId

A table called ParentFormResponses mentioned in the above query is having more than 3lacs records. Table is not having any identity column instead I have created clustered primary index on group of some columns based on uniqueness. But still it is taking more than 16 minutes to execute a simple select statement i.e. select * from ParentFormResponses.

If I am removing column names of ParentFormResponses table from above mentioned select statement then it is showing result in 2 3 seconds but for above query it is taking too much time.

If I create non clustered index on entityid and entitytypeid then also it is not giving optimized result.

Please suggest me how can I improve table structure and query performance.

Detail information: Table Stucture:

`CREATE TABLE [dbo].[ParentFormResponses](
[EntityId] [int] NOT NULL,
[FormId] [int] NOT NULL,
[StudentId] [nvarchar](50) NOT NULL,
[CreatedBy] [nvarchar](50) NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[EntityTypeId] [int] NOT NULL,
[FormVersion] [decimal](18, 1) NOT NULL,
[DigitallySigned] [bit] NULL,
[HasResponseChanged] [bit] NULL,

CONSTRAINT [PK_ParentFormResponses] PRIMARY KEY CLUSTERED ( [EntityId] ASC, [FormId] ASC, [StudentId] ASC, [EntityTypeId] ASC, [FormVersion] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY] ) ON [PRIMARY]`

=> I dropped primary key index and created below non clustered index CREATE NONCLUSTERED INDEX [IX_ParentFormResponses_sakshi_EntityId] ON [dbo].[ParentFormResponses_sakshi] ( [EntityId] ASC, [EntityTypeId] ASC, [FormId] ASC, [FormVersion] ASC ) INCLUDE ( [StudentId], [CreatedBy]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO

=> Result of all three tables based on sp_spaceused are-

  • name- rows, reserved, data, index_size, unused
  • ParentFormResponses - 309961, 64704 KB, 63592 KB, 936 KB, 176 KB
  • ParentFormResponses_sakshi- 309893, 117696 KB, 60472 KB, 56944 KB, 280 KB
  • EventConsentStatus - 673796, 380920 KB, 109240 KB, 271512 KB, 168 KB
  • Note: Number of records are not 30k but more than 3 lacs. By mistake I wrote wrongly.

    7
    • Do you know what a query plan is? If not - read is up. And post it. That said, start by making the Distinct surplus. That could possibly require a lot of work on the tempdb, but then the query plan will tell you. Commented Nov 17, 2014 at 10:42
    • To select data from such a small table should not take long. It must be the fetching proces taking so long. Are you storing big data (images, videos, etc.)? I assume select count(*) from ParentFormResponses to take some seconds only. Correct? As to the join: an index on EntityId and EntityTypeId as mentioned by you should usually make this fast, at least when there are many different EntityIds. Commented Nov 17, 2014 at 10:45
    • Another question: estimated, how many users access table ParentFormResponses when running your query(es)? Commented Nov 17, 2014 at 10:47
    • Post the query plan .. also make sure that non clustered index you have created are actually being used and that statistics are up to date.. Commented Nov 17, 2014 at 10:51
    • try use group by instead of distinct Commented Nov 17, 2014 at 11:08

    1 Answer 1

    1

    The combination of distinct and left join is suspicious from a performance perspective. First, check the following query:

    SELECT ECS.UserId, ECS.EntityId
    FROM EventConsentStatus ECS INNER JOIN
         @Institutions I
         ON ECS.InstitutionId = I.InstitutionId             
    WHERE ECS.EventId = @ActivityId;
    

    You can optimize this with an index on EventConsentStatus(EventId, UserId). Assuming this has good performance, it will produce the right set of rows. If not, then you may be getting duplicates from @Institions. If this is the case, then consider removing it (no columns are used from that table, it is only for filtering). Or:

    SELECT ECS.UserId, ECS.EntityId
    FROM EventConsentStatus ECS 
    WHERE ECS.EventId = @ActivityId AND
          EXISTS (SELECT 1 FROM  @Institutions I WHERE ECS.InstitutionId = I.InstitutionId)
    

    To get the right values, let's get the additional data using outer apply:

    SELECT ECS.UserId, PFR.EntityId, PFR.CreatedBy
    FROM EventConsentStatus ECS INNER JOIN
         @Institutions I
         ON ECS.InstitutionId = I.InstitutionId OUTER APPLY
         (SELECT TOP 1 PFR.EntityId, PFR.CreatedBy
          FROM ParentFormResponses pfr
          WHERE ECS.EventId = PFR.EntityId  AND
                PFR.EntityTypeId = 1 AND
                PFR.EntityId = @ActivityId
         ) pfr
    WHERE ECS.EventId = @ActivityId;
    

    For this, you want an index on ParentFormResponses(EntityId, EntityTypeID, CreatedBy).

    Sign up to request clarification or add additional context in comments.

    2 Comments

    I have tried outer apply with proper indexes but still taking too much time. If I run both queries the one inside the Outer apply and the other one separately then it takes only few seconds. But with Left join or outer apply not giving optimized result.
    @Sakshi . . . I just updated the index definition. I had misread one the correlated join condition -- that affects the index and the performance.

    Your Answer

    By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.