
I have a list of data in a txt file like this:

Date,Lat,Lon,Depth,Mag

20000101,34.6920,-116.3550,12.30,1.21
20000101,34.4420,-116.2280,7.32,1.01
20000101,37.4172,-121.7667,5.88,1.14
20000101,-41.1300,174.7600,27.00,1.90
20000101,37.6392,-119.0482,2.40,1.03
20000101,32.1790,-115.0730,6.00,2.44
20000101,59.7753,-152.2192,86.34,1.48
20000101,34.5230,-116.2410,11.63,1.61
20000101,59.5369,-153.1360,100.15,1.62
20000101,44.7357,-110.7932,4.96,2.20
20000101,34.6320,-116.2950,9.00,1.73
20000101,44.7370,-110.7938,5.32,1.75
20000101,35.7040,-117.6320,4.15,1.45
20000101,41.9270,20.5430,10.00,4.80

My assignment is to sort this data by each criterion, e.g. sort by date, latitude, and longitude.

I tried a bubble sort like this:

if ( Double.parseDouble(a[0].split(",")[1]) <  Double.parseDouble(a[1].split(",")[1]))
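
That comparison sits inside the usual two nested passes over the array of lines, roughly like this (reconstructed, since I didn't post the full loop):

// Rough shape of the bubble sort, comparing latitude (field 1).
for ( int pass = 0 ; pass < a.length - 1 ; pass++ ) {
    for ( int i = 0 ; i < a.length - 1 - pass ; i++ ) {
        double left = Double.parseDouble( a[ i ].split( "," )[ 1 ] );
        double right = Double.parseDouble( a[ i + 1 ].split( "," )[ 1 ] );
        if ( left > right ) {  // Swap adjacent lines that are out of order.
            String tmp = a[ i ];
            a[ i ] = a[ i + 1 ];
            a[ i + 1 ] = tmp;
        }
    }
}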

This works, but it takes too much time.

There are 40,000 records in the txt file.

Is there an alternative way to sort this data?

2 Comments

  • Store each line in a class object and then define different comparators for the list of that class object. Use this as a reference: stackoverflow.com/questions/5245093/… (Oct 23, 2014 at 17:18)
  • How about merge sort? O(n log n) (Oct 23, 2014 at 17:19)

2 Answers


Try a merge sort. Merge sort has a worst-case performance of O(n log n), while bubble sort's worst case is O(n^2).
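
For illustration, here is a minimal top-down merge sort sketch (my example, not code from this answer) that sorts the raw CSV lines by latitude, the same key the asker's bubble sort compared. The latitudeOf helper is hypothetical:

static void mergeSort( String[] a , int lo , int hi )  // hi is exclusive.
{
    if ( hi - lo < 2 ) {
        return;  // Zero or one element is already sorted.
    }
    int mid = ( lo + hi ) / 2;
    mergeSort( a , lo , mid );   // Sort the left half.
    mergeSort( a , mid , hi );   // Sort the right half.
    merge( a , lo , mid , hi );  // Merge the two sorted halves.
}

static void merge( String[] a , int lo , int mid , int hi )
{
    String[] tmp = new String[ hi - lo ];
    int i = lo , j = mid , k = 0;
    while ( i < mid && j < hi ) {  // Take the smaller head element each time.
        tmp[ k++ ] = ( latitudeOf( a[ i ] ) <= latitudeOf( a[ j ] ) ) ? a[ i++ ] : a[ j++ ];
    }
    while ( i < mid ) { tmp[ k++ ] = a[ i++ ]; }  // Copy any leftovers.
    while ( j < hi ) { tmp[ k++ ] = a[ j++ ]; }
    System.arraycopy( tmp , 0 , a , lo , tmp.length );
}

static double latitudeOf( String line )
{
    return Double.parseDouble( line.split( "," )[ 1 ] );  // Field 1 is Lat.
}

You rarely need to hand-roll this in Java, though: Collections.sort and Arrays.sort for objects already use a merge-sort variant (TimSort), which is exactly what the answer below leans on.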


1 Comment

and if you don't know what this notation means, it's Big O notation. O(n log n) is better than O(n^2): for n = 40,000, n^2 is about 1.6 billion comparisons, while n log2 n is only about 600,000, meaning that, in the worst case, merge sort will perform way better than bubble sort. See here for more: en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions (this will be important for you going forward).

I'm probably ruining some students’ homework assignment, but here goes…

As suggested on the Question, the natural way in Java is to create a class to represent your data. Then implement a Comparator to be passed to the utility method Collections.sort.

On my MacBook Pro 2.3 GHz Intel Core i7 running a Parallels virtual machine with Java 8, a data set of 42,000 elements takes between 45 and 90 milliseconds to sort.

I changed your example data to be more interesting, introducing some varied dates and duplicate latitudes.

20000101,34.6920,-116.3550,12.30,1.21
20000101,34.4420,-116.2280,7.32,1.01
20000101,34.6920,-121.7667,5.88,1.14
20000101,-41.1300,174.7600,27.00,1.90
20000101,37.6392,-119.0482,2.40,1.03
20000101,32.1790,-115.0730,6.00,2.44
20000101,34.6920,-152.2192,86.34,1.48
20000102,34.6320,-116.2410,11.63,1.61
20000102,59.5369,-153.1360,100.15,1.62
20000102,44.7357,-110.7932,4.96,2.20
20000102,34.6320,-116.2950,9.00,1.73
20000102,34.6320,-110.7938,5.32,1.75
20000102,34.6320,-117.6320,4.15,1.45
20000102,41.9270,20.5430,10.00,4.80

My GeoReading class to represent the data.

// Uses the Joda-Time library for date parsing.
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import org.joda.time.LocalDate;
import org.joda.time.format.ISODateTimeFormat;

class GeoReading
{

    LocalDate localDate = null;
    BigDecimal latitude = null;
    BigDecimal longitude = null;
    BigDecimal depth = null;
    BigDecimal magnitude = null;

    public GeoReading( String arg )
    {
        // String is comma-separated values of: Date,Lat,Lon,Depth,Mag
        List<String> items = Arrays.asList( arg.split( "\\s*,\\s*" ) ); // Regex explained here: http://stackoverflow.com/a/7488676/642706
        this.localDate = ISODateTimeFormat.basicDate().parseLocalDate( items.get( 0 ) );
        this.latitude = new BigDecimal( items.get( 1 ) );
        this.longitude = new BigDecimal( items.get( 2 ) );
        this.depth = new BigDecimal( items.get( 3 ) );
        this.magnitude = new BigDecimal( items.get( 4 ) );
    }

    @Override
    public String toString()
    {
        return "GeoReading{" + "localDate=" + localDate + ", latitude=" + latitude + ", longitude=" + longitude + ", depth=" + depth + ", magnitude=" + magnitude + '}';
    }

}
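
For example, the first line of the sample data parses like this (a quick sketch; output shown approximately):

GeoReading r = new GeoReading( "20000101,34.6920,-116.3550,12.30,1.21" );
System.out.println( r );
// GeoReading{localDate=2000-01-01, latitude=34.6920, longitude=-116.3550, depth=12.30, magnitude=1.21}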

Here is the Comparator implementation.

class GeoReadingAscendingComparator implements Comparator<GeoReading>
{

    @Override
    public int compare( GeoReading o1 , GeoReading o2 )
    {
        int localDateCompare = o1.localDate.compareTo( o2.localDate );
        if ( localDateCompare != 0 ) { // Not equal on this component, so order by it.
            return localDateCompare;
        }

        int latitudeCompare = o1.latitude.compareTo( o2.latitude );
        if ( latitudeCompare != 0 ) { // Not equal on this component, so order by it.
            return latitudeCompare;
        }

        return o1.longitude.compareTo( o2.longitude );

    }
}
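
As a side note: on Java 8 you can build the same multi-key ordering by chaining comparators instead of writing a class (a sketch, assuming the same GeoReading fields as above):

Comparator<GeoReading> byDateLatLon =
        Comparator.comparing( ( GeoReading g ) -> g.localDate )
                  .thenComparing( g -> g.latitude )
                  .thenComparing( g -> g.longitude );

Pass it the same way: Collections.sort( bigList , byDateLatLon );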

Main code.

// Imports assumed for this snippet: java.io.IOException, java.nio.file.Files,
// java.nio.file.Path, java.nio.file.Paths, java.util.ArrayList, java.util.Collections,
// java.util.List, java.util.concurrent.TimeUnit, java.util.stream.Stream.
Path path = Paths.get( "/Users/basil/lat-lon.txt" );  // Path for Mac OS X.
try {
    List<GeoReading> list = new ArrayList<>();
    try ( Stream<String> lines = Files.lines( path ) ) { // Auto-close the file.
        lines.forEach( line -> list.add( new GeoReading( line ) ) );
    }
    // Take those 14 lines and multiply to simulate large text file. 14 * 3,000 = 42,000.
    int count = 3000;
    List<GeoReading> bigList = new ArrayList<>( list.size() * count ); // Initialize capacity to expected number of elements.
    for ( int i = 0 ; i < count ; i++ ) {
        bigList.addAll( list );
    }
    long start = System.nanoTime();
    Collections.sort( bigList , new GeoReadingAscendingComparator() );
    long elapsed = ( System.nanoTime() - start );
    System.out.println( "Done sorting the GeoReading list. Sorting " + bigList.size() + " took: " + TimeUnit.MILLISECONDS.convert( elapsed , TimeUnit.NANOSECONDS ) + " ms ( " + elapsed + " nanos )." );

    System.out.println( "Dump…" );
    for ( GeoReading g : bigList ) {
        System.out.println( g );
    }
} catch ( IOException ex ) {
    System.out.println( "ERROR - ex: " + ex );
}

In the real world I would add some defensive-programming code to verify the incoming data. Data from external sources is always defective and/or changing.
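
A minimal sketch of such a check (a hypothetical parseOrNull helper, for illustration only) might verify the field count and skip lines that fail to parse:

// Hypothetical helper: returns null for lines that fail validation.
static GeoReading parseOrNull( String line )
{
    String[] items = line.split( "\\s*,\\s*" );
    if ( items.length != 5 ) {
        System.err.println( "Skipping malformed line: " + line );
        return null;
    }
    try {
        return new GeoReading( line );
    } catch ( IllegalArgumentException ex ) {  // Covers NumberFormatException and bad dates.
        System.err.println( "Skipping unparseable line: " + line );
        return null;
    }
}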

5 Comments

  • Why the BigXXX fields? Why not doubles and ints?
  • @user949300 Accuracy > Performance. I have a habit of using BigDecimal in my usual projects working with money and such where accuracy is required. If this scientific data could tolerate the inaccuracy of floating-point drift, then go for it. Might be faster, though for a total execution time of less than a second I would consider that premature optimization.
  • OP has 40K records, 5 fields each, so using BigXXX is 200K Objects. An Object is, very roughly, 10 extra bytes over a primitive, so that's 2MB. Which, I guess, isn't all that much nowadays, but it is something.
  • @user949300 BigDecimal is at least 2-3 times bigger than just the minimal Object overhead you mentioned. But still not a large amount of memory on multi-gig machines. If memory were an issue, I would move the data to a database like Postgres. So, again, it is not a matter of preference but need. If you are certain that floating-point inaccuracy is tolerable then use a float (32-bit) or double (64-bit), better for both memory and speed. If accuracy is needed, always use BigDecimal.
  • For those who are unaware, read about how floating-point number technology trades away accuracy for performance (speed of execution). The floating-point primitive data types in Java are float (32-bit) and double (64-bit).
