dart efficient string processing techniques?

Question

I strings in the format of name:key:dataLength:data and these strings can often be chained together. for example "aNum:n:4:9879aBool:b:1:taString:s:2:Hi" this would map to an object something like:

{
  aNum: 9879,
  aBool: true,
  aString: "Hi"
}

I have a method for parsing a string in this format but I'm not sure whether it's use of substring is the most efficient way of pprocessing the string, is there a more efficient way of processing strings in this fashion (repeatedly chopping off the front section):

Map<string, dynamic> fromString(String s){
  Map<String, dynamic> _internal = new Map();
  int start = 0;
  while(start < s.length){
    int end;
    List<String> parts = new List<String>(); //0 is name, 1 is key, 2 is data length, 3 is data
    for(var i = 0; i < 4; i++){
      end = i < 3 ? s.indexOf(':') : num.parse(parts[2]);
      parts[i] = s.substring(start, end);
      start = i < 3 ? end + 1 : end;
    }
    var tranType = _tranTypesByKey[parts[1]]; //this is just a map to an object which has a function that can convert the data section of the string into an object
    _internal[parts[0]] = tranType._fromStr(parts[3]);
  }
  return _internal;
}

Can you please provide information about the expected size of such strings or how many such strings you want to process at once. It depends a lot of such information what strategy makes sense or if it makes sense at all to invest time for optimizations. — Günter Zöchbauer
– Günter Zöchbauer, Commented Feb 27, 2014 at 12:29
the data part of the string could be absolutely anything in terms of size and content, and the chaining of these objects could also be enormous, I wouldnt expect them to be my personal reason for using them will only use smal simple objects, but the classes should support objects with any number of properties. — Daniel Robinson
– Daniel Robinson, Commented Feb 27, 2014 at 12:32
Then you also might consider to do async processing using streams and do the parsing with a Finite State Machine, so you don't need to have more than one copy in memory at the same time. You could also already start processing the data while the data is still beeing recieved. Probably slower for small chunks but might be faster for huge data. — Günter Zöchbauer
– Günter Zöchbauer, Commented Feb 27, 2014 at 12:38
Why don't you just use json or ProtocolBuffers and existing parsers? (of course only if you can decide which format to use for encoded data) — Günter Zöchbauer
– Günter Zöchbauer, Commented Feb 27, 2014 at 12:52

Günter Zöchbauer · Accepted Answer · 2014-02-27 12:27:42Z

1

I would try s.split(':') and process the resulting list. If you do a lot of such operations you should consider creating benchmarks tests, try different techniques and compare them.

If you would still need this line

 s = i < 3 ? s.substring(idx + 1) : s.substring(idx);

I would avoid creating a new substring in each iteration but instead just keep track of the next position.

answered Feb 27, 2014 at 12:27

Günter Zöchbauer

661k235 gold badges2.1k silver badges1.6k bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Daniel Robinson Over a year ago

the only reason I didnt go with split(':') is because I can't guarantee that the data part of the string won't have : in it so I would end up haivng to join those bits back up again. nice tip on the keeping track of the next position rather than repeatedly overwriting the original string though.

Günter Zöchbauer Over a year ago

I see. You delimiter is then :x:y: where x is some type information and y is the field identifier. You could try regex for this. I heard Dart regex isn't very fast yet but I would consider and benchmar it.

lrn · Accepted Answer · 2014-02-28 09:44:40Z

You have to decide how important performance is relative to readability and maintainability of the code.

That said, you should not be cutting off the head of the string repeatedly. That is guaranteed to be inefficient - it'll take time that is quadratic in the number of records in your string, just creating those tail strings.

For parsing each field, you can avoid doing substrings on the length and type fields. For the length field, you can build the number yourself:

int index = ...;
// index points to first digit of length.
int length = 0;
int charCode = source.codeUnitAt(index++);
while (charCode != CHAR_COLON) {
   length = 10 * length + charCode - 0x30;
   charCode = source.codeUnitAt(index++);
}
// index points to the first character of content.

Since lengths are usually small integers (less than 2<<31), this is likely to be more efficient than creating a substring and calling int.parse.

The type field is a single ASCII character, so you could use codeUnitAt to get its ASCII value instead of creating a single-character string (and then your content interpretation lookup will need to switch on character code instead of character string).

For parsing content, you could pass the source string, start index and length instead of creating a substring. Then the boolean parser can also just read the code unit instead of the singleton character string, the string parser can just make the substring, and the number parser will likely have to make a substring too and call double.parse.

It would be convenient if Dart had a double.parseSubstring(source, [int from = 0, int to]) that could parse a substring as a double without creating the substring.

thanks this was just the sort of alternative ideas I was looking for

Collectives™ on Stack Overflow

dart efficient string processing techniques?

2 Answers 2

2 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related