I have the following table structure:

EVENT_ID(INT)    EVENT_NAME(VARCHAR)    EVENT_DATE(DATETIME)    EVENT_OWNER(INT)

I need to add the field EVENT_COMMENTS which should be a text field or a very big VARCHAR.

I query this table in two places. One is a page that lists all the events; on that page I do not need to display the event_comments field.

The other is a page that loads all the details for a specific event, where I do need to display the event_comments field.

Should I create an extra table with the event_id and the event_comments for that event? Or should I just add that field on the current table?

In other words: if I have a text field in my table but I don't SELECT it, will it affect the performance of queries on that table?
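For reference, here is a minimal sketch of the single-table option and the two page queries. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example runs anywhere), and the sample rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; TEXT plays the role of the
# "text field or very big VARCHAR" from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        EVENT_ID       INTEGER PRIMARY KEY,
        EVENT_NAME     VARCHAR(255),
        EVENT_DATE     DATETIME,
        EVENT_OWNER    INTEGER,
        EVENT_COMMENTS TEXT          -- the new, potentially large column
    )
    """
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Launch", "2011-05-01 10:00:00", 7, "A long free-text comment"),
        (2, "Review", "2011-05-02 14:00:00", 7, None),  # event without comments
    ],
)

# List page: name only the columns it needs (no SELECT *),
# so the comment text is never sent back to the application.
list_rows = conn.execute(
    "SELECT EVENT_ID, EVENT_NAME, EVENT_DATE, EVENT_OWNER "
    "FROM events ORDER BY EVENT_DATE"
).fetchall()

# Details page: fetch one event, including its comments, by primary key.
detail_row = conn.execute(
    "SELECT EVENT_ID, EVENT_NAME, EVENT_DATE, EVENT_OWNER, EVENT_COMMENTS "
    "FROM events WHERE EVENT_ID = ?",
    (1,),
).fetchone()

print(list_rows)
print(detail_row)
```

Listing only the needed columns keeps the comment text out of the list page's result set, but the engine may still have to read the pages that store it; whether that costs anything noticeable is what the answers below debate.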

6 Answers

Adding a field to your table makes it larger.

This means that:

  • Table scans will take more time
  • Fewer records will fit into a page, and hence into the cache, which increases the risk of cache misses

Selecting this field from a separate table, however, requires a join, which would take more time.

So adding this field into this table will make the queries which don't select it run slower, and those which do select it run faster.

3 Comments

So if I break it down: with the same table, the event list page (not selecting the text field) will load slower and the event details page (selecting the text field) will load faster; with separate tables, the event list page will load faster and the event details page will load slower. Correct?
I don't know anything about the internal representation of records in MySQL, but it's not immediately clear to me why more columns necessarily slow table scans. If each row contains the offsets to the column data and to the next row in the table are you just talking about the seek() times to the offsets, or is there something else going on?
@Larry: what is faster: to read 100 or 1000 pages from a disk or RAM?

Yes, it affects performance, at least according to this article published yesterday.

According to it, if you don't want to suffer performance issues, it's better to put them in a separate table and JOIN them when needed.

This is the relevant section:

Try to limit the number of columns in a table. Too many columns in a table can make the scan time for queries much longer than if there are just a few columns. In addition, if you have a table with many columns that aren't typically used, you are also wasting disk space with NULL value fields. This is also true with variable size fields, such as text or blob, where the table size can grow much larger than needed. In this case, you should consider splitting off the additional columns into a different table, joining them together on the primary key of the records
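A minimal sketch of the split layout the quoted article describes, again with SQLite standing in for MySQL and a hypothetical `event_comments` table keyed on `EVENT_ID`. A LEFT JOIN keeps events that have no comment, and `EXPLAIN QUERY PLAN` is used to check that both lookups go through the primary key rather than a table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        EVENT_ID    INTEGER PRIMARY KEY,
        EVENT_NAME  VARCHAR(255),
        EVENT_DATE  DATETIME,
        EVENT_OWNER INTEGER
    );
    -- Hypothetical side table: at most one row per event (1:1 on the primary
    -- key), and only for events that actually have comments.
    CREATE TABLE event_comments (
        EVENT_ID       INTEGER PRIMARY KEY REFERENCES events(EVENT_ID),
        EVENT_COMMENTS TEXT NOT NULL
    );
    INSERT INTO events VALUES (1, 'Launch', '2011-05-01 10:00:00', 7);
    INSERT INTO events VALUES (2, 'Review', '2011-05-02 14:00:00', 7);
    INSERT INTO event_comments VALUES (1, 'A long free-text comment');
    """
)

# Details page: join on the primary key; LEFT JOIN yields NULL comments
# for events that have no row in event_comments.
DETAILS_SQL = """
    SELECT e.EVENT_ID, e.EVENT_NAME, e.EVENT_DATE, e.EVENT_OWNER, c.EVENT_COMMENTS
    FROM events AS e
    LEFT JOIN event_comments AS c ON c.EVENT_ID = e.EVENT_ID
    WHERE e.EVENT_ID = ?
"""

with_comment = conn.execute(DETAILS_SQL, (1,)).fetchone()
without_comment = conn.execute(DETAILS_SQL, (2,)).fetchone()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + DETAILS_SQL, (1,))]

print(with_comment)
print(without_comment)
print(plan)
```

The list page in this layout simply queries `events` alone and never touches `event_comments`.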

3 Comments

I don't think going from 4 to 5 columns counts as "too many columns". Also, the question doesn't say there would ever be a NULL value in the comments.
I disagree with this argument in this context. Although it's right that the table gets bigger, which would increase scan time, that's only marginal when indexing is done right. When the new column is not included in the index (and I don't see a reason to include it), the index size will remain the same. I don't even think the cache hit ratio is an aspect in this context, because an independent table would need space as well, AND an additional index....
I see. Well, I'll leave the answer as a reference for the OP. I think the best bet, if performance is important, is to profile both approaches and, if there is no clear winner, to prefer the extra column in the table.
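To illustrate the point about indexes, here is a sketch (SQLite again, with a hypothetical index name `idx_events_list`) of an index that contains every column the list page reads but leaves out `EVENT_COMMENTS`. When the planner can answer the list query from such a covering index, it does not need to read the table rows at all, so the size of the comments stops mattering for that page. MySQL reports the equivalent situation as "Using index" in the Extra column of `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        EVENT_ID       INTEGER PRIMARY KEY,
        EVENT_NAME     VARCHAR(255),
        EVENT_DATE     DATETIME,
        EVENT_OWNER    INTEGER,
        EVENT_COMMENTS TEXT
    )
    """
)

# Hypothetical index for the list page. SQLite stores the rowid (here the
# EVENT_ID alias) in every index entry, so the index covers all four listed
# columns; EVENT_COMMENTS is deliberately left out, so long comments do not
# make this index any bigger.
conn.execute(
    "CREATE INDEX idx_events_list ON events (EVENT_DATE, EVENT_NAME, EVENT_OWNER)"
)

LIST_SQL = (
    "SELECT EVENT_ID, EVENT_NAME, EVENT_DATE, EVENT_OWNER "
    "FROM events ORDER BY EVENT_DATE"
)

# Inspect the plan: it should scan the index alone, not the table rows.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + LIST_SQL)]
print(plan)
```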

You should put it in the same table.

3 Comments

I love it when two answers say the exact opposite.
I hate it when two answers say the exact opposite. ;-)
I (date(s) % 2 == 0 ? 'love' : 'hate') it when two comments say the exact opposite

Yes, it probably will affect other queries on the same table, and you should probably do it anyway, as you probably don't care.

Depending on the engine, blobs are either stored inline (MyISAM), partially off-page (InnoDB) or entirely off-page (InnoDB Plugin, in some cases).

These have the potential to decrease the number of rows per page, and therefore increase the number of IO operations to satisfy some query.
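A back-of-the-envelope sketch of the rows-per-page argument, not a measurement. It assumes a 16 KiB page (InnoDB's default page size), made-up average row sizes, densely packed rows, and comments stored inline in the row; it ignores page headers and per-row overhead.

```python
# Back-of-the-envelope only: every number below is an assumption.
PAGE_SIZE = 16 * 1024   # bytes per page (InnoDB's default page size)
ROWS = 1_000_000        # hypothetical number of events


def pages_to_scan(avg_row_bytes: int, rows: int = ROWS, page_size: int = PAGE_SIZE) -> int:
    """Pages a full table scan touches if rows are packed densely."""
    rows_per_page = page_size // avg_row_bytes
    return -(-rows // rows_per_page)  # ceiling division


narrow = pages_to_scan(100)        # hypothetical ~100-byte row without comments
wide = pages_to_scan(100 + 1000)   # same row plus ~1 KB of comments stored inline

print(narrow, wide)  # 6135 71429
```

Under these assumptions a full scan touches about 11.6 times as many pages once the comments are stored inline. With off-page storage only part of the blob (or just a pointer) stays in the row, so the gap shrinks, which is why the storage engine matters.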

However, it is extremely unlikely that you care, so you should just do it anyway. How many rows does this table have? 10^9? How many of them have non-null values for the blob?

3 Comments

The EVENTS table will only grow with time, so yes, let's say it has 10^9 rows. About 75%-80% will have non-null values for the blob.
In which case, you probably ought to think about that when you get nearer to 10^9. How many rows are you expecting in production this year?
I will probably get a few tens of thousands of rows a year. But still, I'm curious which one performs better in a situation where I have hundreds of thousands of rows.

It shouldn't be too much of a hit, but if you're worried about performance, you should always run a few benchmarks and run EXPLAINs on your queries to see the true effect.

How many events are you expecting to have?

Chances are that unless you have a truckload of events, on the order of hundreds of thousands, your performance will be good in any case.

2 Comments

I will probably get a few tens of thousands of rows a year. But still, I'm curious which one performs better in a situation where I have hundreds of thousands of rows. I know my MySQL db can easily handle much more, but that wasn't the question.
I will leave further comments to the other experts here :-). Anyway, if you have time, you could try filling the table(s) with a few hundred rows and timing the queries. For the timing only, I'd use "SELECT SQL_NO_CACHE <rest of your query>" (of course not on the live site).
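Along the lines of that suggestion, a rough timing harness might look like the sketch below. Assumptions: SQLite in memory rather than MySQL, a hypothetical 20,000 rows with a 1 KB comment each, and best-of-five wall-clock timings. Because nothing touches the disk, it will understate any I/O difference; treat the numbers as a way to compare the two layouts on one machine, not as MySQL results.

```python
import sqlite3
import time

N = 20_000             # hypothetical row count; raise it to taste
COMMENT = "x" * 1000   # hypothetical ~1 KB comment per event


def build(split: bool) -> sqlite3.Connection:
    """Create and fill one of the two layouts in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    comments_col = "" if split else ", EVENT_COMMENTS TEXT"
    conn.execute(
        "CREATE TABLE events (EVENT_ID INTEGER PRIMARY KEY, EVENT_NAME TEXT, "
        f"EVENT_DATE TEXT, EVENT_OWNER INTEGER{comments_col})"
    )
    rows = [(i, f"event {i}", "2011-05-01 10:00:00", i % 50) for i in range(1, N + 1)]
    if split:
        # Separate table holding only the comments, keyed on EVENT_ID.
        conn.execute(
            "CREATE TABLE event_comments (EVENT_ID INTEGER PRIMARY KEY, EVENT_COMMENTS TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        conn.executemany(
            "INSERT INTO event_comments VALUES (?, ?)", [(r[0], COMMENT) for r in rows]
        )
    else:
        # Single table: the comment lives in the same row as the event.
        conn.executemany(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?)", [r + (COMMENT,) for r in rows]
        )
    conn.commit()
    return conn


def time_query(conn, sql, params=(), repeats=5):
    """Best-of-N wall-clock time for running sql and fetching all rows."""
    best, result = float("inf"), []
    for _ in range(repeats):
        start = time.perf_counter()
        result = conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, result


LIST_SQL = "SELECT EVENT_ID, EVENT_NAME, EVENT_DATE, EVENT_OWNER FROM events"
DETAIL_SINGLE = "SELECT * FROM events WHERE EVENT_ID = ?"
DETAIL_SPLIT = (
    "SELECT e.*, c.EVENT_COMMENTS FROM events e "
    "LEFT JOIN event_comments c ON c.EVENT_ID = e.EVENT_ID WHERE e.EVENT_ID = ?"
)

single, split = build(split=False), build(split=True)
results = {
    "list/single": time_query(single, LIST_SQL),
    "list/split": time_query(split, LIST_SQL),
    "detail/single": time_query(single, DETAIL_SINGLE, (N // 2,)),
    "detail/split": time_query(split, DETAIL_SPLIT, (N // 2,)),
}
for name, (seconds, _) in results.items():
    print(f"{name:14s} {seconds * 1000:.3f} ms")
```

Taking the best of several runs reduces noise from one-off hiccups; on a real MySQL server you would run the same four queries against realistic data volumes and also check them with `EXPLAIN`.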
