Many columns vs few tables - performance wise

Question

Yes, I am aware that data normalization should be my priority (as it is).

I have a table with 65 columns storing vehicle data: used_vehicle, color, doors, mileage, price and so forth.
I could split it into a Vehicle table plus VehicleInterior, VehicleExterior, VehicleTechnical and VehicleExtra tables (each one-to-one with the main Vehicle table).

Let's assume I'll have about 5 million rows (vehicles).

For a SELECT with a WHERE clause, which will perform better (both cases indexed at least on IDs):

Vehicle table with 65 columns or
Vehicle table with JOINS on four other tables (all with 5 million rows) to return all the data related to Vehicle?

(As for the database engine, consider PostgreSQL and/or MySQL.)

I would truly appreciate any detailed insights from your previous experience.

Updates will be rare, if any. Selects will mostly fetch either all columns (vehicle details page) or the main info (a few columns) for the search results list. So maybe the best solution would be two tables: one with the main info (a few columns) and another with the rest of the columns.

One reason to do this (vertical partitioning) is if you have queries that deal only with the columns from VehicleInterior, other queries that deal only with columns from VehicleTechnical, etc. Another is if many rows/vehicles have absolutely no info about (for example) VehicleExtra: instead of many rows with lots of nulls in the one table, you have rows in the rest of the tables and no rows in VehicleExtra.
– ypercubeᵀᴹ, Commented Apr 29, 2015 at 19:20

Community · Accepted Answer · 2017-04-13 12:42:44Z

Assuming we are talking about 1:1 relationships among all tables.

Overall storage is practically always (substantially) cheaper with a single table instead of multiple tables in 1:1 relationship. Each row has 28 bytes of overhead, plus typically a few more bytes for extra padding. And you need to store the PK column with every table. And have a separate (redundant) index on each of these columns ... Size does matter for performance.

This is even true if many columns are NULL in most rows because NULL storage is very cheap:

Configuring PostgreSQL for read performance
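To put a rough number on the storage argument above, here is a back-of-the-envelope sketch. The 28-byte row overhead and the 5 million rows come from this thread; the four extra tables follow the split proposed in the question, and the 4-byte integer key is an assumption. Padding and the additional PK indexes are left out, so the real difference would be larger.

```python
# Rough estimate of the extra storage when one wide table is split into
# several 1:1 tables. Padding and the extra PK indexes are ignored.
ROWS = 5_000_000          # vehicles, from the question
EXTRA_TABLES = 4          # VehicleInterior, VehicleExterior, VehicleTechnical, VehicleExtra
ROW_OVERHEAD_BYTES = 28   # per-row overhead quoted in the answer
PK_BYTES = 4              # assumption: a 4-byte integer key repeated in each extra table


def extra_bytes(rows: int, extra_tables: int) -> int:
    """Bytes added by per-row overhead plus the repeated key column."""
    per_row = ROW_OVERHEAD_BYTES + PK_BYTES
    return rows * extra_tables * per_row


total = extra_bytes(ROWS, EXTRA_TABLES)
print(f"{total / 1_000_000:.0f} MB extra, before padding and indexes")
```

With these assumed inputs the split costs about 640 MB more than the single table before any index is counted.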

When retrieving all columns, a single table is substantially faster than 5 tables joined together. It's also much simpler. Five tables may be tricky to join if not all rows are present in all tables. With WHERE conditions targeting a single table, it's easy enough to append other tables with LEFT JOIN. It's not as trivial if you have predicates on multiple tables ...
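A minimal sketch of the LEFT JOIN case, assuming hypothetical table and column names (`vehicle`, `vehicle_extra`, `sunroof`) and using Python's built-in `sqlite3` only so the example runs anywhere. The SQL itself is plain enough that it should carry over to PostgreSQL or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle (id INTEGER PRIMARY KEY, price INTEGER, mileage INTEGER);
    CREATE TABLE vehicle_extra (
        vehicle_id INTEGER PRIMARY KEY REFERENCES vehicle(id),
        sunroof INTEGER
    );
    INSERT INTO vehicle VALUES (1, 9000, 120000), (2, 15000, 40000);
    -- Vehicle 2 has no row in vehicle_extra.
    INSERT INTO vehicle_extra VALUES (1, 1);
""")

# The WHERE clause targets only the main table, so the side table can
# simply be appended with LEFT JOIN; missing rows come back as NULL.
left_rows = conn.execute("""
    SELECT v.id, v.price, e.sunroof
    FROM vehicle AS v
    LEFT JOIN vehicle_extra AS e ON e.vehicle_id = v.id
    WHERE v.price < 20000
    ORDER BY v.id
""").fetchall()

# A plain (inner) JOIN would silently drop vehicle 2.
inner_rows = conn.execute("""
    SELECT v.id, v.price, e.sunroof
    FROM vehicle AS v
    JOIN vehicle_extra AS e ON e.vehicle_id = v.id
    WHERE v.price < 20000
    ORDER BY v.id
""").fetchall()

print(left_rows)
print(inner_rows)
```

Once a predicate moves onto `vehicle_extra` (say `e.sunroof = 1`), the LEFT JOIN effectively behaves like an inner join for that query, which is part of why predicates on multiple tables are less trivial.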

Vertical partitioning may still improve performance of certain queries. For example, if 90 % of your queries retrieve the same 5 columns out of the 65 available, this would be faster with a table just holding these 5 columns.

What is retrieved from disk during a query? (my answer)

OTOH, you might be able to cater for such queries on a few selected columns with a "covering" index allowing for index-only scans.
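As an illustration of the covering-index idea, the sketch below uses SQLite (via Python's `sqlite3`) as a stand-in, because its `EXPLAIN QUERY PLAN` output states explicitly when a covering index is used. The table, column and index names are hypothetical. In PostgreSQL the comparable check would be looking for an "Index Only Scan" node in `EXPLAIN` output, and whether the planner picks it depends on more than the index definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle (
        id INTEGER PRIMARY KEY,
        price INTEGER, mileage INTEGER, color TEXT, doors INTEGER
    );
    -- Index holding just the columns the search results list needs.
    CREATE INDEX vehicle_list_idx ON vehicle (price, mileage);
""")


def plan_detail(sql: str, params: tuple = ()) -> str:
    """Return the human-readable part of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


# Every selected column is in the index, so the table is never touched.
covered = plan_detail(
    "SELECT price, mileage FROM vehicle WHERE price < ?", (20000,))

# color is not in the index, so the table row must be fetched as well.
not_covered = plan_detail(
    "SELECT price, color FROM vehicle WHERE price < ?", (20000,))

print(covered)
print(not_covered)
```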

Another candidate for vertical partitioning: If you have lots of updates on just a few columns, while the rest hardly ever changes. It might be considerably cheaper to split rows in such a case, since Postgres writes a new row version for every update. There are exceptions for big values stored out-of-line ("TOASTed"). More details:

What is retrieved from disk during a query? (Daniel's answer)
Update all columns from another table

It really depends on the complete situation. If in doubt, go with the simple solution of having a single table, especially if it portrays reality well: In your example, those are all attributes of a car and make sense together.

A single table with a multicolumn index on the few columns to allow index-only scans for the result list might be the best route. (Be aware that column sequence matters in btree indexes.) Joins are not that expensive, but it will still be faster without join. The added storage size and spread-out of data for multiple tables may be the bigger slow-down (more data pages to read for each query).
– Erwin Brandstetter, Commented Apr 29, 2015 at 20:26
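The remark about column sequence can be seen in a small experiment. This is only a sketch with hypothetical names, again using SQLite through Python's `sqlite3` as a stand-in engine: with an index on `(price, mileage)`, a predicate on the leading column can seek into the index, while a predicate on the trailing column alone cannot (no statistics have been gathered here, so SQLite has no basis for a skip-scan).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle (
        id INTEGER PRIMARY KEY, price INTEGER, mileage INTEGER, color TEXT
    );
    CREATE INDEX vehicle_price_mileage_idx ON vehicle (price, mileage);
""")


def plan_for(sql: str) -> str:
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


# Predicate on the leading index column: the plan is a SEARCH (index seek).
leading = plan_for("SELECT id FROM vehicle WHERE price = 9000")

# Predicate on the second column only: the plan falls back to a SCAN.
trailing = plan_for("SELECT id FROM vehicle WHERE mileage = 40000")

print(leading)
print(trailing)
```

So the column the result list filters on most often would normally go first in the index definition.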
For a 1:N relation you need two separate tables anyway. Except if you cram multiple rows into an array or document type. Then it depends. The principles outlined here apply regardless. Your access patterns and index strategies can make a difference. Ask a new question if you want to be more specific.
– Erwin Brandstetter, Commented Jul 31, 2018 at 11:52

Sir Swears-a-lot · Accepted Answer · 2021-04-30 04:20:56Z

A select on a single table may be faster, but not always. If you had all data in a single flat table, once you have found your vehicle you already have all the details. However, that may involve more I/O and potentially more delay.

You also lose the efficiency of normalization, for example if one car had many models with different options. With normalised data, you could potentially return the entire record set with less I/O than if it were in a fact table. Even though the db engine might have to do more computation, it may still be faster.

Is this a reference db of all cars? Or a list of second hand vehicles? Would there be many examples of the same make/model with the same options?

I should qualify my answer as being generic rdbms rather than Postgres-specific. I defer to Erwin's detailed answer specific to Postgres.

edited Apr 30, 2021 at 4:20

answered Apr 29, 2015 at 19:30

vehiclemake and vehiclemodel are different tables, so the vehicle table has foreign keys to vehiclemake and vehiclemodel. I don't think normalization is a problem here. I understand that a select on a single table would be faster, but our situation is different: how will a row with many columns affect performance compared with tables with fewer columns (but several tables, 5 of them, with joins)?
– Urim Kurtishi, Commented Apr 29, 2015 at 19:45
