I am working on the backend of an application. One part of it (like every application nowadays) uses AI for several things. The application's main purpose is building data warehouse layers: the L0 layer, the DWH layer, data marts, and write-back.
In many of the cases where we use AI, I need to pass the whole structure of the existing staging and DWH tables, along with a lot of related metadata for those tables. I have already implemented this, but here is the problem. It's a NestJS application using TypeORM, and I wrote a repository query that looks something like this:
```typescript
const fullTable = await repository.find({
  where: { id: In(tableId) },
  relations: {
    coreColumn: true,
    fromReference: {
      toTable: true,
      fromColumn: true,
      toColumn: true,
    },
    toReference: {
      fromTable: true,
      fromColumn: true,
      toColumn: true,
    },
    dwhTable: {
      columns: {
        fkColumn: {
          coreColumn: {
            coreTable: {
              dwhTable: {
                tableQuery: {
                  sourceSystem: true,
                },
                logicalModels: true,
                sourceSystems: true,
                columns: true,
              },
            },
          },
        },
      },
      logicalModels: true,
      sourceSystems: true,
      tableQuery: {
        sourceSystem: true,
      },
    },
    stagingTable: {
      stagingColumns: true,
    },
  },
});
```
The query looks very bad to begin with, but I can deal with that part. The bigger problem is that it's extremely slow, which I guess was expected. The service that runs this query by tableId will be called quite often, so I can't have a single table taking 10+ seconds. I need to get it down to something really small, maybe 0.5 seconds for one table, and less per table when several tables are requested at once.
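To give a feel for how much that one call asks of the database, here is a rough sketch that flattens a nested relations object into dotted paths. Assuming TypeORM's default join-based relation loading, each path corresponds to roughly one join. `RelationTree` and `flattenRelations` are names made up for this illustration, not TypeORM APIs, and the tree below is a trimmed copy of the real one.

```typescript
// Hypothetical helper type (not part of TypeORM): same shape as the
// nested `relations` option passed to `find`.
type RelationTree = { [relation: string]: true | RelationTree };

// Walk the tree and return one dotted path per relation to load.
function flattenRelations(tree: RelationTree, prefix = ""): string[] {
  const paths: string[] = [];
  for (const [name, child] of Object.entries(tree)) {
    const path = prefix ? `${prefix}.${name}` : name;
    paths.push(path);
    if (child !== true) {
      // Nested object: descend and collect the deeper relations too.
      paths.push(...flattenRelations(child, path));
    }
  }
  return paths;
}

// Trimmed copy of the relations used in the query above.
const relations: RelationTree = {
  coreColumn: true,
  fromReference: { toTable: true, fromColumn: true, toColumn: true },
  stagingTable: { stagingColumns: true },
};

const paths = flattenRelations(relations);
console.log(paths.length); // 7
console.log(paths.join("\n"));
```

Run against the full relations object from the query, the same count comes out at 26 paths, several of them one-to-many, which is consistent with the result set growing very large for a single table.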
From what I have researched, there are several ways to optimize this: adding explicit selects so that not every column is loaded, switching to the query builder instead of this pattern (I don't think that will help much, though), and, as a last option, running raw SQL directly. I know SQL well (I have worked as a DWH developer in the past), and I am sure I can optimize a raw query so that it runs quickly, but I don't know whether I can get it down to milliseconds.
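One shape the raw SQL option could take is to replace the single deep join with a few flat queries, each filtered by the same list of table ids, and then stitch the rows together in memory. Below is a minimal sketch of only the stitching step. The row shapes, the table and column names in the comments, and `stitchColumns` are all assumptions for illustration and do not come from the real schema.

```typescript
// Hypothetical row shapes; the real columns will differ.
interface TableRow { id: number; name: string }
interface ColumnRow { id: number; tableId: number; name: string }
interface TableWithColumns extends TableRow { columns: ColumnRow[] }

// Attach child rows to their parents with one pass over each list,
// so the cost grows linearly with the number of rows fetched.
function stitchColumns(tables: TableRow[], columns: ColumnRow[]): TableWithColumns[] {
  const byId = new Map<number, TableWithColumns>();
  for (const t of tables) {
    byId.set(t.id, { ...t, columns: [] });
  }
  for (const c of columns) {
    // Columns whose parent was not fetched are silently skipped.
    byId.get(c.tableId)?.columns.push(c);
  }
  return Array.from(byId.values());
}

// Stand-ins for the results of two flat queries, for example:
//   SELECT id, name FROM core_table WHERE id IN (...)
//   SELECT id, table_id, name FROM core_column WHERE table_id IN (...)
const tableRows: TableRow[] = [
  { id: 1, name: "customer" },
  { id: 2, name: "order" },
];
const columnRows: ColumnRow[] = [
  { id: 10, tableId: 1, name: "customer_id" },
  { id: 11, tableId: 1, name: "email" },
  { id: 12, tableId: 2, name: "order_id" },
];

const stitched = stitchColumns(tableRows, columnRows);
console.log(JSON.stringify(stitched, null, 2));
```

The point of this design is that each flat query returns only as many rows as the relation actually has, instead of the rows of every one-to-many relation being multiplied together in one joined result. Whether that is enough to reach the 0.5 second target would still have to be measured.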
What are some best practices for handling this type of problem? It seems like something most people would face at some point. One thing that will not work, though, is simple caching: the properties of the tables change constantly, so they will most likely be completely different from one request to the next.