In theory, a Java string can contain close to 2^31 characters and a Java array can contain close to 2^31 strings.
In practice (assuming Java 8, 64 bit, no compressed oops; see note 1), the space utilization of String[] and String is as follows:
- a String[] array needs 8 bytes per entry,
- a String needs 2 bytes per character ... plus overheads of about 40 bytes per String.
It is easy to see that a maximal array of maximally sized strings would need on the order of 2^63 bytes. That is far more memory than any real machine can hold, and more than current 64 bit hardware can actually address (x86-64 implementations use 48 or 57 bits of virtual address). However, that is just a theoretical concern ...
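To make the arithmetic concrete, here is a minimal sketch that plugs the rule-of-thumb figures above (8 bytes per array entry, 2 bytes per character, about 40 bytes of overhead per String) into a formula. The class and method names are made up for illustration, and the constants are the approximations from this answer, not values measured on a particular JVM.

```java
import java.math.BigInteger;

// Hypothetical helper: rough heap estimate for a String[] of equally sized strings,
// using the approximate Java 8 / 64 bit / no compressed oops figures quoted above.
final class StringHeapEstimate {
    static final long BYTES_PER_ARRAY_ENTRY = 8;   // one reference per array slot
    static final long BYTES_PER_CHAR = 2;          // UTF-16 code unit
    static final long OVERHEAD_PER_STRING = 40;    // approximate headers and fields

    // BigInteger is used because the worst case does not fit in a signed long.
    static BigInteger estimateBytes(long stringCount, long charsPerString) {
        BigInteger perString = BigInteger.valueOf(BYTES_PER_ARRAY_ENTRY + OVERHEAD_PER_STRING)
                .add(BigInteger.valueOf(BYTES_PER_CHAR).multiply(BigInteger.valueOf(charsPerString)));
        return perString.multiply(BigInteger.valueOf(stringCount));
    }

    public static void main(String[] args) {
        long max = Integer.MAX_VALUE;
        BigInteger worstCase = estimateBytes(max, max);
        System.out.println("Worst case, in bytes: " + worstCase);
        System.out.println("Bits needed to write that number: " + worstCase.bitLength());
    }
}
```

The same method can be called with realistic numbers (a record count and an average string length) to get a ballpark figure before deciding whether everything fits in the heap.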
In your example:
My guess is that the space needed amounts to roughly 500 x 5,000,000 = 2.5GB heap space to represent the array and the strings. If you started by reading the entire record into memory as a String before splitting it, that could be as much as 7.5GB depending on how you read it. (But you can be smarter than that ...)
> Is it a good practice to store all records in string array?
It depends on what you intend to do with the records. Without more information we can't say whether it is a good idea.
Note that there is no such thing as "good practice" or "best practice" in the general sense. Solutions need to be designed for purpose, and judgement about them can only be made in context.
> Will it cause any issue?
As per the above, it could use a lot of heap space.
> If yes how can I perform in efficient way?
We can't tell you that unless you explain clearly what you are actually going to do with the records in memory.
It also depends on what kind of efficiency you are concerned about. CPU utilization? Memory utilization? Software developer time?
> Disk memory is sufficiently available.
That may or may not be relevant. It depends on what you are going to do with the records in memory.
1 - The amount of space used to represent strings is JVM dependent in a number of respects. For example, from Java 9 onwards (compact strings), a string that consists only of Latin-1 characters, which includes all ASCII text, needs 1 byte per character.
So looking at your updated question, it is clear that reading the entire file into memory and splitting it is the wrong approach.
What you need to do is to read characters until you get a record; i.e. until you get a ';'. Then you split the record into fields based on ','. Then you process the fields and output them. Finally you discard that record and start reading the next one.
In other words, you avoid creating a huge array of 5,000,000 String objects in memory.
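The read / split / process / discard loop described above could be sketched roughly as follows. This is only an illustration under some assumptions: the names `RecordStreamer` and `forEachRecord` are invented here, fields are assumed never to contain ';' or ',' (no quoting or escaping), and any text after the last ';' is treated as a final record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical sketch: stream ';'-terminated records and hand each one to a
// handler as an array of ','-separated fields, holding one record at a time.
final class RecordStreamer {
    static long forEachRecord(Reader in, Consumer<String[]> handler) throws IOException {
        BufferedReader reader = (in instanceof BufferedReader)
                ? (BufferedReader) in
                : new BufferedReader(in);
        StringBuilder record = new StringBuilder();
        long count = 0;
        int c;
        while ((c = reader.read()) != -1) {
            if (c == ';') {
                // End of record: split into fields (limit -1 keeps trailing empty fields).
                handler.accept(record.toString().split(",", -1));
                record.setLength(0);   // discard the record, reuse the buffer
                count++;
            } else {
                record.append((char) c);
            }
        }
        if (record.length() > 0) {
            // Assumption: leftover text without a closing ';' is a final record.
            handler.accept(record.toString().split(",", -1));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In real use this would be a FileReader or similar; a StringReader keeps the demo self-contained.
        long n = forEachRecord(new StringReader("a,b,c;d,e,f;"),
                fields -> System.out.println(Arrays.toString(fields)));
        System.out.println(n + " records processed");
    }
}
```

Because each record is handed to the handler and then dropped, peak memory use depends on the size of the largest single record rather than on the size of the file.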