1

I'm creating an online dictionary and I have to use three different dictionaries for this purpose: everyday terms, chemical terms, computer terms. I have tree options:

1) Create three different tables, one table for each dictionary

2) Create one table with extra columns, i.e.:

id    term    dic_1_definition    dic_2_definition    dic_3_definition
----------------------------------------------------------------------
1     term1   definition
----------------------------------------------------------------------
2     term2                       definition
----------------------------------------------------------------------
3     term3                                           definition
----------------------------------------------------------------------
4     term4                       definition
----------------------------------------------------------------------
5     term5   definition                              definition
----------------------------------------------------------------------
etc.

3) Create one table with an extra "tag" column and tag all my terms depending on it's dictionary, i.e.:

id    term     definition    tag
------------------------------------
1     term1    definition    dic_1
2     term2    definition    dic_2
3     term3    definition    dic_3
4     term4    definition    dic_2
5     term1    definition    dic_2
etc.

A term can be related to one or more dictionaries, but have different definitions, let's say a term in everyday use can differ from the same term in IT field. That's why term1 (in my last) table can be assigned two tags - dic_1 (id 1) and dic_2 (id 5).

In future I'll add more dictionaries, so there probably will be more than three dics. I think if I'll use option 2 (with extra columns) I'll get in future one table and many many columns. I don't know if it's bad for performance or not.

Which option is the best approach in my case? Which one is faster? Why? Any suggestions and other options are greatly appreciated.

Thank you.

3
  • How much data is being loaded into this, a full dictionary or a couple of hundred to thousand words? Commented Nov 30, 2009 at 13:07
  • for example, first table has more than 200 000 rows. So I suppose it'll be around 500 000 rows. Commented Nov 30, 2009 at 13:11
  • The third approach is better, from my view. I have done a little modification in my post below. Commented Nov 30, 2009 at 13:53

9 Answers 9

6

2) Create one table with extra column

You definitely shouldn't be using the 2nd approach. What if in the future you decide that you want 10 dictionaries? You would have to create an additional 10 columns which is madness..

What you should do is create a single table for all your dictionaries, and a single table for all your terms and a single table for all your definitions, that way all your data is grouped together in a logical fashion.

Then you can create a unique ID for each of your dictionaries, which is referenced in the terms table. Then all you need is a simple query to obtain the terms for a particular dictionary.

Sign up to request clarification or add additional context in comments.

Comments

5

I think you should have a lookup table for your dictionary types

DictionaryType(DTId, DTName)

Have another Table for you terms

Terms(TermID, TermName)

Then your definitions

Difinitions(DifinitionId, TermID, Definition, DTId)

This should work.

9 Comments

Whats the DictionaryType, your answer is the best but i dont see how that table is needed at all.
The DictionaryType table contain all dictionary names. He said, "I'm creating an online dictionary and I have to use three different dictionaries"
What if I have 3 the same terms with different definitions? Will this term have three id's or one id and 3 definitions?
@Anthony: The term would have one record in the Terms table and thus one identifier, and three records in the Definitions table, one for each dictionary type.
This is the way to go, if you want to be flexible with all three objects in the future. This way you can use ORM easily and add properties to all your three object types independently of the other object types.
|
2

Option 3 sounds like to most appropriate choice for your scenario. It makes the queries a little simpler and is definitely more maintainable in the long run.

Option 2 is definitely not the way to go because you will end up with a lot of null values and writing queries against such a table will be a nightmare.

Option 1 is not bad but before your application could query it has to deceide which table to query against and that could be a problem.

So option 3 would result in simple queries like:

Select term, definition from table where tag = 'dic_1'

You may even create another tag table to keep info about the tags themselves.

1 Comment

Instead of using a tag, he can create a new dictionary table (id, name) and use the id in the table. Takes less memory and is faster to check and join on.
2

I have developed similar project and my design was as follows. Storing words, definitons and dictionaries in different tables is a flexible choice especially where you will add new dictionaries in future.

alt text http://img300.imageshack.us/img300/6550/worddict.png

2 Comments

Could I ask the name of the UML tool you used?
Sure, I use MySQL Workbech for that purpose.
1

Data Normalization .. I would go with 3, then you don't have to do any fancy queries to identify how many definitions are applicable per a given term

Comments

1

There's always an "it depends..."

Having said that, option 2 will usually be a bad choice - both from the purist perspective (Data Normalisation) and the practical perspective - you have to alter the table definition to add a new dictionary (or remove an old one)

If your main access is always going to be looking for a matching term, and the dictionary name ('everyday', 'chemical', 'geek') is an attribute, then option 3 makes sense.

If on the other hand your access is always primarily by dictionary type as well as term, and dictionary 1 is huge but rarely used, while dictionaries 2..n are small but commonly used, then option 1 might make more sense (or option 1a => 1 table for rarely used dictionaries, another for heavily used dictionaries)... this is a very hypothetical case !

1 Comment

+1 I agree with you. The requirements here a far too vague, resulting in the 'accepted answer' being totally over-'solved'. That said, working off the little provided; I'd go with a variation on #3.
1

Your database structure should contain data, the structure itself should not be data. This rules out option 2 immediately, unless you create the different tables in order to build separate applications running on the different dictionaries. If they are being shared, then it is the wrong way to do it.

Option 1 requires a database modification and queries to be rewritten in order to accommodate addition of new dictionaries. It also adds excessive complication to simple queries, such as "what dictionaries are this word in?"

Option 3 Is the most flexible and best choice here. If your data grows too large you can eventually use DB side details like table partitioning to speed up things.

Comments

1

You want to fetch data based on the dictionary type, that means that the dictionary type is data.

Data should be in the fields of the tables, not as table names or field names. If you don't have the data in the fields, you have a data model that needs changes if the data chances, and you need to create queries dynamically to get the data.

The first option uses the dictionary type as table names.

The second option uses the dictionary type as field names.

The third option correctly places the dictionary type as data in a field.

However, the term and the tag should not be strings, they should rather be foreign keys to tables where the terms and dictionary types are defined.

Comments

0

The requirements here are far too vague, resulting in the 'accepted answer' being totally over-'solved'. The requirements need to provide more information about how the dictionaries will be used.

That said, working off the little provided; I'd go with a variation on #3.

  • Number 1 is perfectly viable if the dictionaries will be used entirely independently, and the only reason the concept of shared terms was mentioned is that it just happens to be a coincidental possibility.
  • Ditch 2; it unnecessarily leads to NULL values in columns, and DB designs don't like that.
  • Number 3 is the best, but ditch the artificial key, and key on Term + Tag. Apart from the artificial key creating the possibility of duplicate entries (by Term + Tag). If no other tables reference TermDefinitions, the key is a waste; if something does; then they say (for example) "I'm referencing TermDefinition #3... Uhhm, whatever that is. :S"

In a nutshell, nothing provided so far in the requirement indicates any need for anything more complicated than option 3.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.