I have a table, filemapping, with over 140 million rows. I commit batch updates (on, say, a million rows at a time) with Spring's JdbcTemplate as follows:
jdbcTemplate.batchUpdate("UPDATE filemapping SET checksum=? WHERE filePath=?", new BatchPreparedStatementSetter() {
    @Override
    public void setValues(PreparedStatement stmt, int issueIndex) throws SQLException {
        stmt.setString(1, batchObjects[issueIndex].getChecksum());
        stmt.setString(2, batchObjects[issueIndex].getFilePath());
    }

    @Override
    public int getBatchSize() {
        return 1000;
    }
});
The table definition looks like:
CREATE TABLE [dbo].[filemapping] (
    [id]         INT IDENTITY (1, 1) NOT NULL,
    [filePath]   VARCHAR (3000) NULL,
    [project_id] INT NOT NULL,
    [checksum]   VARCHAR (255) NULL,
    CONSTRAINT [PK_FM] PRIMARY KEY NONCLUSTERED ([id] ASC),
    CONSTRAINT [ReFileMap] FOREIGN KEY ([project_id]) REFERENCES [dbo].[project] ([id]) ON DELETE CASCADE
);
CREATE NONCLUSTERED INDEX [MapIndexOne]
    ON [dbo].[filemapping]([project_id] ASC, [filePath] ASC);

CREATE NONCLUSTERED INDEX [MapIndexChecksum]
    ON [dbo].[filemapping]([checksum] ASC);
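On the server, each entry in the batch arrives as a parameterized statement roughly like the following (the `@p0`/`@p1` parameter names are illustrative, not what the driver actually sends):

```sql
-- Each row of the JDBC batch becomes one parameterized UPDATE;
-- the WHERE clause matches on filePath, so the plan depends on
-- whether an index seek on filePath is possible.
UPDATE filemapping SET checksum = @p0 WHERE filePath = @p1;
```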
As the table has grown, the execution time for this has gone up orders of magnitude: a series of updates which used to take a minute now takes hours. sp_WhoIsActive shows that we are getting locks on the filemapping table, which, as I understand it, may explain why server resources are under-utilized and the update operations are so slow.
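For reference, the lock information came from running Adam Machanic's sp_WhoIsActive while the batch was executing, along these lines (the exact parameters I used may have differed):

```sql
-- Show active sessions together with the locks they hold
-- and their query plans, while the batch update is running.
EXEC dbo.sp_WhoIsActive
    @get_locks = 1,   -- XML summary of locks held by each session
    @get_plans = 1;   -- query plan, to check how the UPDATE locates rows
```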
My questions are:
- Foremost, what can be done to speed this up?
- Are lower or higher batch sizes per transaction worth exploring?
- Do the indexes matter, presuming the lock is the limiting factor? How can I tell? (My wait statistics show CPU waits being the highest, and normally only 1 CPU is being used.)
- Why would a lock slow the updates down in the first place? Could it be something else?
- Would skipping locks matter at all for a series of updates with no selects?