1

I strings in the format of name:key:dataLength:data and these strings can often be chained together. for example "aNum:n:4:9879aBool:b:1:taString:s:2:Hi" this would map to an object something like:

{
  aNum: 9879,
  aBool: true,
  aString: "Hi"
}

I have a method for parsing a string in this format but I'm not sure whether it's use of substring is the most efficient way of pprocessing the string, is there a more efficient way of processing strings in this fashion (repeatedly chopping off the front section):

Map<string, dynamic> fromString(String s){
  Map<String, dynamic> _internal = new Map();
  int start = 0;
  while(start < s.length){
    int end;
    List<String> parts = new List<String>(); //0 is name, 1 is key, 2 is data length, 3 is data
    for(var i = 0; i < 4; i++){
      end = i < 3 ? s.indexOf(':') : num.parse(parts[2]);
      parts[i] = s.substring(start, end);
      start = i < 3 ? end + 1 : end;
    }
    var tranType = _tranTypesByKey[parts[1]]; //this is just a map to an object which has a function that can convert the data section of the string into an object
    _internal[parts[0]] = tranType._fromStr(parts[3]);
  }
  return _internal;
}
11
  • Can you please provide information about the expected size of such strings or how many such strings you want to process at once. It depends a lot of such information what strategy makes sense or if it makes sense at all to invest time for optimizations. Commented Feb 27, 2014 at 12:29
  • the data part of the string could be absolutely anything in terms of size and content, and the chaining of these objects could also be enormous, I wouldnt expect them to be my personal reason for using them will only use smal simple objects, but the classes should support objects with any number of properties. Commented Feb 27, 2014 at 12:32
  • Then you also might consider to do async processing using streams and do the parsing with a Finite State Machine, so you don't need to have more than one copy in memory at the same time. You could also already start processing the data while the data is still beeing recieved. Probably slower for small chunks but might be faster for huge data. Commented Feb 27, 2014 at 12:38
  • see also stackoverflow.com/questions/13221168 Commented Feb 27, 2014 at 12:44
  • Why don't you just use json or ProtocolBuffers and existing parsers? (of course only if you can decide which format to use for encoded data) Commented Feb 27, 2014 at 12:52

2 Answers 2

1

I would try s.split(':') and process the resulting list. If you do a lot of such operations you should consider creating benchmarks tests, try different techniques and compare them.

If you would still need this line

 s = i < 3 ? s.substring(idx + 1) : s.substring(idx);

I would avoid creating a new substring in each iteration but instead just keep track of the next position.

Sign up to request clarification or add additional context in comments.

2 Comments

the only reason I didnt go with split(':') is because I can't guarantee that the data part of the string won't have : in it so I would end up haivng to join those bits back up again. nice tip on the keeping track of the next position rather than repeatedly overwriting the original string though.
I see. You delimiter is then :x:y: where x is some type information and y is the field identifier. You could try regex for this. I heard Dart regex isn't very fast yet but I would consider and benchmar it.
1

You have to decide how important performance is relative to readability and maintainability of the code.

That said, you should not be cutting off the head of the string repeatedly. That is guaranteed to be inefficient - it'll take time that is quadratic in the number of records in your string, just creating those tail strings.

For parsing each field, you can avoid doing substrings on the length and type fields. For the length field, you can build the number yourself:

int index = ...;
// index points to first digit of length.
int length = 0;
int charCode = source.codeUnitAt(index++);
while (charCode != CHAR_COLON) {
   length = 10 * length + charCode - 0x30;
   charCode = source.codeUnitAt(index++);
}
// index points to the first character of content.

Since lengths are usually small integers (less than 2<<31), this is likely to be more efficient than creating a substring and calling int.parse.

The type field is a single ASCII character, so you could use codeUnitAt to get its ASCII value instead of creating a single-character string (and then your content interpretation lookup will need to switch on character code instead of character string).

For parsing content, you could pass the source string, start index and length instead of creating a substring. Then the boolean parser can also just read the code unit instead of the singleton character string, the string parser can just make the substring, and the number parser will likely have to make a substring too and call double.parse.

It would be convenient if Dart had a double.parseSubstring(source, [int from = 0, int to]) that could parse a substring as a double without creating the substring.

1 Comment

thanks this was just the sort of alternative ideas I was looking for

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.