
What is the exact point of using a database if I have simple relations (95% of queries depend on ID)?

I am storing users and their stats.

Why would I use an external database if I can have neat constructions like:

db.users[32] = something

An array of 500K users is not that big a burden for RAM.

Pros are:

  • no problematic asynchronicity (instant results)
  • easy export/import
  • dealing with the database literally like a native object

PS. Some considerations:

  • Would it be faster or slower to do collection[3] than db.query("select ...
  • I am going to store it as a file (or files); see the sketch after this list.
  • There is only ONE application/process accessing this data, and the code is executed line by line - please don't elaborate on locking.
  • Please don't answer with database suggestions, but with reasons to use an external DB over a native array/object - I have experience with a few databases, so that's not the issue.
  • What I am building is a client/gateway/server(s) game. The gateway deals with all user data: processing, authenticating, writing statistics, etc. No other part of the software needs direct access to this data/database.
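For illustration, a minimal sketch of that file-backed approach in node.js; the file name, record shape, and flush interval are all made up:

    var fs = require('fs');

    var db = { users: [] };

    // Load the last snapshot on startup, if one exists.
    if (fs.existsSync('users.json')) {
      db.users = JSON.parse(fs.readFileSync('users.json', 'utf8'));
    }

    // Mutate it like a native object, exactly as described above.
    db.users[32] = { name: 'alice', wins: 10 };

    // Flush a snapshot to disk periodically. Note that a crash between
    // flushes loses everything written since the last snapshot.
    setInterval(function () {
      fs.writeFile('users.tmp', JSON.stringify(db.users), function (err) {
        if (!err) fs.rename('users.tmp', 'users.json', function () {});
      });
    }, 30 * 1000);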
  • How do you recover if your application crashes? Databases are often overkill, but their ACID properties come in handy in a lot of scenarios. Commented Oct 10, 2012 at 0:25
  • "no problematic asynchronity (instant results)" - if you're at any point sharing the array between threads, this statement is patently false Commented Oct 10, 2012 at 0:27
  • Look into MongoDB for constructions like that. It's non-relational and you can store any type of data in JSON format without any type of schema restrictions. EDIT: Just realized you have Mongo in your tags. Commented Oct 10, 2012 at 0:28
  • You shouldn't assume 1 process. See: nodejs.org/docs/latest/api/cluster.html Commented Oct 11, 2012 at 2:12

2 Answers


It depends on the requirements for durability, latency, and lifetime of that data. In-memory access to a data structure is almost always significantly faster than hopping the network to an external database, but there are things to consider.

You can keep it solely in memory, but if your process recycles for some reason, it's gone. That may be OK for your scenario ...
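One common mitigation for that (my sketch, not part of the original answer) is an append-only operation log that gets replayed on startup; the file name and record shape here are hypothetical:

    var fs = require('fs');

    var users = {};
    var LOG = 'ops.log';

    // On startup, replay the log to rebuild in-memory state after a crash.
    if (fs.existsSync(LOG)) {
      fs.readFileSync(LOG, 'utf8').split('\n').forEach(function (line) {
        if (!line) return;
        var op = JSON.parse(line);
        users[op.id] = op.data;
      });
    }

    // Every write hits the log before it hits memory, so nothing is lost.
    function setUser(id, data) {
      fs.appendFileSync(LOG, JSON.stringify({ id: id, data: data }) + '\n');
      users[id] = data;
    }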

You can also have issues if you have multiple front ends/processes with load balancing (as opposed to partitioning) or don't have affinity. In scenarios like that, in-memory state can be problematic. There are also options like memcached to address issues like that.

memcached is how Facebook solved problems like these: http://www.facebook.com/note.php?note_id=39391378919

Similar to Facebook, you can also persist data in a database (be it SQL or NoSQL like MongoDB) and cache it in memory for efficiency. If you cache in memory and it's backed by a database, then you have to worry about how stale that data can get and how to refresh it. memcached is a solution for scenarios like that as well. Either that or you write your own mechanism to piggyback data back, use polling (try to avoid that, though), etc. That's essentially what Facebook is doing - using databases but offloading the database load with distributed in-memory caches. From that post:

memcached is a high-performance, distributed memory object caching system. Here at Facebook, we're likely the world's largest user of memcached. We use memcached to alleviate database load.
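As a rough sketch of that read-through pattern (mine, not Facebook's actual code; queryUserFromDb and the TTL are stand-ins):

    var cache = {};       // in-memory cache in front of the database
    var TTL = 60 * 1000;  // how long a cached entry is trusted (arbitrary)

    // Stand-in for a real driver call; replace with your SQL/Mongo query.
    function queryUserFromDb(id, cb) {
      process.nextTick(function () { cb(null, { id: id, name: 'user' + id }); });
    }

    function getUser(id, cb) {
      var hit = cache[id];
      if (hit && Date.now() - hit.at < TTL) {
        return cb(null, hit.user);  // fresh enough: skip the database
      }
      queryUserFromDb(id, function (err, user) {
        if (err) return cb(err);
        cache[id] = { user: user, at: Date.now() };  // refresh the cache
        cb(null, user);
      });
    }

    getUser(42, function (err, user) { console.log(user); });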


3 Comments

A huge con for me is that you must handle all of the data locking yourself unless you are working with immutable data.
@Suroot - agreed - it goes back to the data requirements. If you go with the DB, they often offer locking, and perhaps the in-memory cache is a read-only cache with some staleness?? Depends ... there is no one right answer here.
I agree that there is no one right answer. My comment was aligned more with commenter Michael Stum, who noted that the database will give ACID compliance and will handle concurrent modifications for you (dealing with locking the data during writes).

This is going to be a more considered answer than anything. One thing you also need to take into account here is your language. I am a PHP programmer, and I am glad for databases.

Trying to store a 500K-user array in memory in PHP (and operate on it) would be a living nightmare; in fact, it probably would be in most languages. Databases implement search strategies to overcome such scenarios, using logarithmic-time lookups on pre-defined indexes.
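To make that concrete (my node.js example, not the answer's): finding a user by ID in a plain array means scanning it, while a keyed object behaves like a pre-built index:

    // 500K users in a plain array: finding one by ID when IDs are not
    // the array positions is an O(n) scan.
    var users = [];
    for (var i = 0; i < 500000; i++) users.push({ id: i, wins: 0 });

    function findByScan(id) {
      for (var j = 0; j < users.length; j++) {
        if (users[j].id === id) return users[j];
      }
    }

    // The same data keyed by ID acts like a database index: one lookup.
    var byId = {};
    users.forEach(function (u) { byId[u.id] = u; });

    console.log(findByScan(499999)); // walks the whole array
    console.log(byId[499999]);       // direct lookup, no scan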

There is also the cost factor. Storing it in a MySQL or MongoDB database on the same server is actually cheaper, since you will most likely require less memory to hold your information.

I would seriously test your memory consumption under the load of such an array. I am also guessing this is just one array of many, right?

Would it be faster or slower to do collection[3] than db.query("select ...

Now, that depends. I am unsure how node.js handles arrays and lookups of a specific index within them, but some languages don't do an O(log n) search on the index, which means you would end up with an O(n) scan; that would actually be slower than a straight call on the index of a SQL table. Fair enough, once you take into consideration the time it would take for SQL to create the result set, write it out, and respond for node.js to pick it up, the database would probably be slower.

So node.js would definitely be faster on a small index or object, but on a much larger one... I am unsure.
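A rough way to settle it for your own setup is to time both paths. This sketch is self-contained on the in-memory side; the database side is commented out because the driver call is a stand-in:

    // Time a million lookups against a keyed in-memory object.
    var byId = {};
    for (var i = 0; i < 500000; i++) byId[i] = { id: i };

    var sum = 0;
    var t = Date.now();
    for (var n = 0; n < 1000000; n++) {
      sum += byId[n % 500000].id;  // keep a checksum so nothing is optimized away
    }
    console.log('in-memory:', Date.now() - t, 'ms (checksum ' + sum + ')');

    // For the database, time the full round trip, not just the query:
    // var t2 = Date.now();
    // db.query('SELECT * FROM users WHERE id = 3', function (err, rows) {
    //   console.log('db round trip:', Date.now() - t2, 'ms');
    // });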

There is only ONE application/process accessing this data, and the code is executed line by line - please don't elaborate about locking.

That's surprising. I have, before now, easily had to spin up more than one node.js server. In fact, to maintain an ideal web hosting environment, you should always have another server ready to come into the fray if your primary server fails (which, believe me, it does...). With this in mind, I think it is odd that you are not taking locking and a central store for distributed data into account here.
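To illustrate the point with node's actual cluster module (my sketch): each forked worker gets its own copy of any in-memory variable, so per-process state silently diverges:

    var cluster = require('cluster');

    var counter = 0; // every process gets its OWN copy of this variable

    if (cluster.isMaster) {
      // One worker per core is the usual pattern; two suffice to show it.
      cluster.fork();
      cluster.fork();
    } else {
      counter++; // increments only this worker's copy
      console.log('worker', process.pid, 'sees counter =', counter); // always 1
      process.exit(0);
    }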

2 Comments

On the multiple-process node point, agreed. Note that to even exercise multiple cores in production you need to use something like cluster (one process spun up per core). Processes also recycle with something like forever, so I think the OP's assumption that there is one and only one process is a bit naive for production.
@bryanmac I never knew that about multicore setups, so even if he expects to use the full power of a modern server, he will find that he actually needs a central datastore; that in itself kind of answers the question :) (I must admit I have only ever used node.js as a side language to do some polling etc.)
