I am experiencing a strange performance issue. I have a view based on a CTE. I wrote it years ago, and it has been running without issue ever since. Suddenly, 4 days ago, a query that normally runs in 1 to 2 minutes ran for hours before we identified it as the long-running query and halted it.
The CTE produces a time-stamped list of the transactions an agent performs. I then select from the CTE and left join it back to itself, matching each row to the subsequent transaction so I can use that transaction's timestamp to determine how long the agent spent on each transaction.
```sql
WITH [CTE_TABLE] (COLUMNS) AS
(
    SELECT [INDEXED COLUMNS]
          ,[WINDOWED FUNCTION] AS ROWNUM
    FROM [DB_TABLE]
    WHERE [EMPLOYEE_ID] = 111213
)
SELECT [T1].[EMPLOYEE_ID]
      ,[T1].[TRANSACTION_NAME]
      ,[T1].[TIMESTAMP] AS [START_TIME]
      ,[T2].[TIMESTAMP] AS [END_TIME]
FROM [CTE_TABLE] [T1]
-- LEFT join so the agent's log-off row, which has no following transaction, is kept
LEFT OUTER JOIN [CTE_TABLE] [T2] ON
(
    [T1].[EMPLOYEE_ID] = [T2].[EMPLOYEE_ID]
    AND [T1].[ROWNUM] = [T2].[ROWNUM] + 1
)
```
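To make the pairing concrete, here is a minimal sketch of the same self-join run against three made-up rows for one agent. The inline [SAMPLE] CTE, its values, and the ROW_NUMBER() ordering are all assumptions for illustration: the real window function is elided above, and numbering rows by timestamp descending is what makes the condition [T1].[ROWNUM] = [T2].[ROWNUM] + 1 match each transaction to the one after it.

```sql
-- Hypothetical sample rows for one agent; the real query reads [DB_TABLE].
WITH [SAMPLE] ([EMPLOYEE_ID], [TRANSACTION_NAME], [TIMESTAMP]) AS
(
    SELECT 111213, 'LOG_ON', '2024-01-01 08:00:00'
    UNION ALL SELECT 111213, 'CALL', '2024-01-01 08:05:00'
    UNION ALL SELECT 111213, 'LOG_OFF', '2024-01-01 08:20:00'
),
[CTE_TABLE] AS
(
    SELECT [EMPLOYEE_ID], [TRANSACTION_NAME], [TIMESTAMP],
           -- Assumed ordering: the newest row gets ROWNUM = 1,
           -- so ROWNUM - 1 is always the next transaction in time.
           ROW_NUMBER() OVER (PARTITION BY [EMPLOYEE_ID]
                              ORDER BY [TIMESTAMP] DESC) AS [ROWNUM]
    FROM [SAMPLE]
)
SELECT [T1].[EMPLOYEE_ID]
      ,[T1].[TRANSACTION_NAME]
      ,[T1].[TIMESTAMP] AS [START_TIME]
      ,[T2].[TIMESTAMP] AS [END_TIME]
FROM [CTE_TABLE] [T1]
-- Same join condition as the view; the newest row has no ROWNUM 0 to match.
LEFT OUTER JOIN [CTE_TABLE] [T2]
    ON [T1].[EMPLOYEE_ID] = [T2].[EMPLOYEE_ID]
   AND [T1].[ROWNUM] = [T2].[ROWNUM] + 1
ORDER BY [START_TIME];
```

With these rows, LOG_ON runs from 08:00 to 08:05, CALL runs from 08:05 to 08:20, and LOG_OFF comes back with a NULL END_TIME, which is exactly the row an INNER JOIN would drop.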
In testing, I filter for a specific agent. If I run the inner portion of the CTE on its own, it produces 500 records in less than a second. (Without the single-agent filter, it produces 95K records in 14 seconds, which is the normal run time.) If I run the CTE with a simple SELECT * FROM [CTE_TABLE], it also finishes in less than a second. When I join the CTE back to itself with an INNER JOIN, it again finishes in less than a second. Only when I use a LEFT OUTER JOIN does it take over a minute and a half, for just the 500 records of a single agent. I need the LEFT OUTER JOIN because the final record of the day is the agent's log-off from the system, and it never has a subsequent record to join to.
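For comparison only, and not the query the view actually uses, the same start/end pairing can be written without a self-join by using the LEAD() window function, assuming SQL Server 2012 or later. LEAD returns NULL when there is no following row, so the log-off record is kept without any outer join. The [SAMPLE] rows are hypothetical.

```sql
-- Hypothetical sample rows for one agent; the real query reads [DB_TABLE].
WITH [SAMPLE] ([EMPLOYEE_ID], [TRANSACTION_NAME], [TIMESTAMP]) AS
(
    SELECT 111213, 'LOG_ON', '2024-01-01 08:00:00'
    UNION ALL SELECT 111213, 'CALL', '2024-01-01 08:05:00'
    UNION ALL SELECT 111213, 'LOG_OFF', '2024-01-01 08:20:00'
)
SELECT [EMPLOYEE_ID]
      ,[TRANSACTION_NAME]
      ,[TIMESTAMP] AS [START_TIME]
      -- Timestamp of the same agent's next transaction; NULL for the last row
      ,LEAD([TIMESTAMP]) OVER (PARTITION BY [EMPLOYEE_ID]
                               ORDER BY [TIMESTAMP]) AS [END_TIME]
FROM [SAMPLE]
ORDER BY [START_TIME];
```

Because SQL Server expands a CTE at each reference rather than materializing it, the self-join form reads the source twice, while this form reads it once; whether that difference matters for the slowdown depends on the plan the optimizer picks.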
The table I pull from is over 22 GB in size and has 500 million rows. Selecting the records from this table for a single day takes 14 seconds, and for a single agent less than a second, so I don't think the bottleneck is the source table. The bottleneck occurs in the LEFT OUTER JOIN back to the CTE, but the query has always used that LEFT OUTER JOIN. Again, the very strange part is that this only began 4 days ago. I have checked the free space on the server, and there is plenty. CPU usage rises to approximately 25% and stays there until the query finishes on its own or is halted by an admin.
I am hoping someone has some ideas as to what could have caused this. It appears to have cropped up from nowhere.