
I have a list of data in a txt file like this:

Date,Lat,Lon,Depth,Mag

20000101,34.6920,-116.3550,12.30,1.21
20000101,34.4420,-116.2280,7.32,1.01
20000101,37.4172,-121.7667,5.88,1.14
20000101,-41.1300,174.7600,27.00,1.90
20000101,37.6392,-119.0482,2.40,1.03
20000101,32.1790,-115.0730,6.00,2.44
20000101,59.7753,-152.2192,86.34,1.48
20000101,34.5230,-116.2410,11.63,1.61
20000101,59.5369,-153.1360,100.15,1.62
20000101,44.7357,-110.7932,4.96,2.20
20000101,34.6320,-116.2950,9.00,1.73
20000101,44.7370,-110.7938,5.32,1.75
20000101,35.7040,-117.6320,4.15,1.45
20000101,41.9270,20.5430,10.00,4.80

My assignment is to sort this data by each criterion, e.g. sort by date, latitude, and longitude.

I tried a bubble sort like this:

if ( Double.parseDouble(a[0].split(",")[1]) <  Double.parseDouble(a[1].split(",")[1]))
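
That comparison sits inside the usual two nested passes over the array of lines, roughly like this (reconstructed, since I didn't post the full loop):

// Rough shape of the bubble sort, comparing latitude (field 1).
for ( int pass = 0 ; pass < a.length - 1 ; pass++ ) {
    for ( int i = 0 ; i < a.length - 1 - pass ; i++ ) {
        double left = Double.parseDouble( a[ i ].split( "," )[ 1 ] );
        double right = Double.parseDouble( a[ i + 1 ].split( "," )[ 1 ] );
        if ( left > right ) {  // Swap adjacent lines that are out of order.
            String tmp = a[ i ];
            a[ i ] = a[ i + 1 ];
            a[ i + 1 ] = tmp;
        }
    }
}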

This works, but it takes too much time.

There are 40,000 records in the txt file.

Is there an alternative way to sort this data?

2 Comments

  • Store each line in a class object and then define different comparators for the list of that class object. Use this as a reference: stackoverflow.com/questions/5245093/… (Oct 23, 2014 at 17:18)
  • How about merge sort? O(n log n) (Oct 23, 2014 at 17:19)

2 Answers


Try a merge sort. Merge sort has a worst-case performance of O(n log n), while bubble sort's worst case is O(n^2).
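
For illustration, here is a minimal top-down merge sort sketch (my example, not code from this answer) that sorts the raw CSV lines by latitude, the same key the asker's bubble sort compared. The latitudeOf helper is hypothetical:

static void mergeSort( String[] a , int lo , int hi )  // hi is exclusive.
{
    if ( hi - lo < 2 ) {
        return;  // Zero or one element is already sorted.
    }
    int mid = ( lo + hi ) / 2;
    mergeSort( a , lo , mid );   // Sort the left half.
    mergeSort( a , mid , hi );   // Sort the right half.
    merge( a , lo , mid , hi );  // Merge the two sorted halves.
}

static void merge( String[] a , int lo , int mid , int hi )
{
    String[] tmp = new String[ hi - lo ];
    int i = lo , j = mid , k = 0;
    while ( i < mid && j < hi ) {  // Take the smaller head element each time.
        tmp[ k++ ] = ( latitudeOf( a[ i ] ) <= latitudeOf( a[ j ] ) ) ? a[ i++ ] : a[ j++ ];
    }
    while ( i < mid ) { tmp[ k++ ] = a[ i++ ]; }  // Copy any leftovers.
    while ( j < hi ) { tmp[ k++ ] = a[ j++ ]; }
    System.arraycopy( tmp , 0 , a , lo , tmp.length );
}

static double latitudeOf( String line )
{
    return Double.parseDouble( line.split( "," )[ 1 ] );  // Field 1 is Lat.
}

You rarely need to hand-roll this in Java, though: Collections.sort and Arrays.sort for objects already use a merge-sort variant (TimSort), which is exactly what the answer below leans on.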


1 Comment

and if you don't know what this notation means, it's Big O notation. O(n log n) is better than O(n^2): for n = 40,000, n^2 is about 1.6 billion comparisons, while n log2 n is only about 600,000, meaning that, in the worst case, merge sort will perform way better than bubble sort. See here for more: en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions (this will be important for you going forward).

I'm probably ruining some students’ homework assignment, but here goes…

As suggested on the Question, the natural way in Java is to create a class to represent your data. Then implement a Comparator to be passed to the utility method Collections.sort.

On my MacBook Pro 2.3 GHz Intel Core i7 running a Parallels virtual machine with Java 8, a data set of 42,000 elements takes between 45 and 90 milliseconds to sort.

I changed your example data to be more interesting, introducing some varied dates and duplicate latitudes.

20000101,34.6920,-116.3550,12.30,1.21
20000101,34.4420,-116.2280,7.32,1.01
20000101,34.6920,-121.7667,5.88,1.14
20000101,-41.1300,174.7600,27.00,1.90
20000101,37.6392,-119.0482,2.40,1.03
20000101,32.1790,-115.0730,6.00,2.44
20000101,34.6920,-152.2192,86.34,1.48
20000102,34.6320,-116.2410,11.63,1.61
20000102,59.5369,-153.1360,100.15,1.62
20000102,44.7357,-110.7932,4.96,2.20
20000102,34.6320,-116.2950,9.00,1.73
20000102,34.6320,-110.7938,5.32,1.75
20000102,34.6320,-117.6320,4.15,1.45
20000102,41.9270,20.5430,10.00,4.80

My GeoReading class to represent the data.

// Uses the Joda-Time library for date parsing.
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import org.joda.time.LocalDate;
import org.joda.time.format.ISODateTimeFormat;

class GeoReading
{

    LocalDate localDate = null;
    BigDecimal latitude = null;
    BigDecimal longitude = null;
    BigDecimal depth = null;
    BigDecimal magnitude = null;

    public GeoReading( String arg )
    {
        // String is comma-separated values of: Date,Lat,Lon,Depth,Mag
        List<String> items = Arrays.asList( arg.split( "\\s*,\\s*" ) ); // Regex explained here: http://stackoverflow.com/a/7488676/642706
        this.localDate = ISODateTimeFormat.basicDate().parseLocalDate( items.get( 0 ) );
        this.latitude = new BigDecimal( items.get( 1 ) );
        this.longitude = new BigDecimal( items.get( 2 ) );
        this.depth = new BigDecimal( items.get( 3 ) );
        this.magnitude = new BigDecimal( items.get( 4 ) );
    }

    @Override
    public String toString()
    {
        return "GeoReading{" + "localDate=" + localDate + ", latitude=" + latitude + ", longitude=" + longitude + ", depth=" + depth + ", magnitude=" + magnitude + '}';
    }

}
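
For example, the first line of the sample data parses like this (a quick sketch; output shown approximately):

GeoReading r = new GeoReading( "20000101,34.6920,-116.3550,12.30,1.21" );
System.out.println( r );
// GeoReading{localDate=2000-01-01, latitude=34.6920, longitude=-116.3550, depth=12.30, magnitude=1.21}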

Here is the Comparator implementation.

class GeoReadingAscendingComparator implements Comparator<GeoReading>
{

    @Override
    public int compare( GeoReading o1 , GeoReading o2 )
    {
        int localDateCompare = o1.localDate.compareTo( o2.localDate );
        if ( localDateCompare != 0 ) { // Not equal on this component, so order by it.
            return localDateCompare;
        }

        int latitudeCompare = o1.latitude.compareTo( o2.latitude );
        if ( latitudeCompare != 0 ) { // Not equal on this component, so order by it.
            return latitudeCompare;
        }

        return o1.longitude.compareTo( o2.longitude );

    }
}
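
As a side note: on Java 8 you can build the same multi-key ordering by chaining comparators instead of writing a class (a sketch, assuming the same GeoReading fields as above):

Comparator<GeoReading> byDateLatLon =
        Comparator.comparing( ( GeoReading g ) -> g.localDate )
                  .thenComparing( g -> g.latitude )
                  .thenComparing( g -> g.longitude );

Pass it the same way: Collections.sort( bigList , byDateLatLon );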

Main code.

// Imports assumed for this snippet: java.io.IOException, java.nio.file.Files,
// java.nio.file.Path, java.nio.file.Paths, java.util.ArrayList, java.util.Collections,
// java.util.List, java.util.concurrent.TimeUnit, java.util.stream.Stream.
Path path = Paths.get( "/Users/basil/lat-lon.txt" );  // Path for Mac OS X.
try {
    List<GeoReading> list = new ArrayList<>();
    try ( Stream<String> lines = Files.lines( path ) ) { // Auto-close the file.
        lines.forEach( line -> list.add( new GeoReading( line ) ) );
    }
    // Take those 14 lines and multiply to simulate large text file. 14 * 3,000 = 42,000.
    int count = 3000;
    List<GeoReading> bigList = new ArrayList<>( list.size() * count ); // Initialize capacity to expected number of elements.
    for ( int i = 0 ; i < count ; i++ ) {
        bigList.addAll( list );
    }
    long start = System.nanoTime();
    Collections.sort( bigList , new GeoReadingAscendingComparator() );
    long elapsed = ( System.nanoTime() - start );
    System.out.println( "Done sorting the GeoReading list. Sorting " + bigList.size() + " took: " + TimeUnit.MILLISECONDS.convert( elapsed , TimeUnit.NANOSECONDS ) + " ms ( " + elapsed + " nanos )." );

    System.out.println( "Dump…" );
    for ( GeoReading g : bigList ) {
        System.out.println( g );
    }
} catch ( IOException ex ) {
    System.out.println( "ERROR - ex: " + ex );
}

In the real world I would add some defensive-programming code to verify the incoming data. Data from external sources is always defective and/or changing.
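
A minimal sketch of such a check (a hypothetical parseOrNull helper, for illustration only) might verify the field count and skip lines that fail to parse:

// Hypothetical helper: returns null for lines that fail validation.
static GeoReading parseOrNull( String line )
{
    String[] items = line.split( "\\s*,\\s*" );
    if ( items.length != 5 ) {
        System.err.println( "Skipping malformed line: " + line );
        return null;
    }
    try {
        return new GeoReading( line );
    } catch ( IllegalArgumentException ex ) {  // Covers NumberFormatException and bad dates.
        System.err.println( "Skipping unparseable line: " + line );
        return null;
    }
}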

5 Comments

  • Why the BigXXX fields? Why not doubles and ints?
  • @user949300 Accuracy > Performance. I have a habit of using BigDecimal in my usual projects working with money and such where accuracy is required. If this scientific data could tolerate the inaccuracy of floating-point drift, then go for it. Might be faster, though for a total execution time of less than a second I would consider that premature optimization.
  • OP has 40K records, 5 fields each, so using BigXXX is 200K Objects. An Object is, very roughly, 10 extra bytes over a primitive, so that's 2MB. Which, I guess, isn't all that much nowadays, but it is something.
  • @user949300 BigDecimal is at least 2-3 times bigger than just the minimal Object overhead you mentioned. But still not a large amount of memory on multi-gig machines. If memory were an issue, I would move the data to a database like Postgres. So, again, it is not a matter of preference but need. If you are certain that floating-point inaccuracy is tolerable then use a float (32-bit) or double (64-bit), better for both memory and speed. If accuracy is needed, always use BigDecimal.
  • For those who are unaware, read about how floating-point number technology trades away accuracy for performance (speed of execution). The floating-point primitive data types in Java are float (32-bit) and double (64-bit).
