
Hi, I'm not sure if this is the right place to ask this question. Anyway, I have written this code to parse a molecular formula and split it into atoms and the amount of each atom.

For instance, if I input "H2O", I get {"H", "O"} in the atom array and {2, 1} in the amount array. I haven't accounted for amounts larger than 9, since I don't think there are molecules that contain more than 8 of any one atom.

Anyway, I'm quite a newbie, so I wonder whether this piece of code can be made better.

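   #include <cctype>
   #include <cstdlib>
   #include <iostream>
   #include <sstream>
   #include <string>
   using namespace std;

   int main()
   {
      string formula = "H2O";
      int no, k = 0, a = 0;
      string atom[10];   // element symbols, e.g. "H", "O"
      int amount[10];    // amount[n] is the count of atom[n]
      bool flag = true;  // true while atom[k] is a fresh, empty slot
      stringstream ss(formula);

      for(int i = 0; i < formula.size(); ++i)
      {
         // atoi parses the digits starting at formula[i]; it returns 0 when formula[i] is not a digit
         no = atoi(&formula[i]);
         if(no == 0 && (flag || islower(formula[i]) )  )
         {
            // first letter of a new atom, or a lowercase letter continuing one
            cout << "k = " << k << endl;
            atom[k] += formula[i];
            flag = false;
            cout << "FOO1 " << atom[k] << endl;
            amount[a] = 1;
         }
         else if(no != 0)
         {
            // a count: store it and move on to the next atom slot
            amount[a] = no;
            cout << "FOO2 " << amount[a] << endl;
            a++;
            flag = true;
            k++;
         }
         else
         {
            // uppercase letter right after an atom without a count: start a new atom
            k++;
            a++;
            atom[k] = formula[i];
            cout << "FOO3 " << atom[k] << endl;
            amount[a] = 1;

            flag = false;
         }

         cout << no << endl;
      }
      return 0;
   }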
  • "I don't think there are molecule which can bind to something that is larger than 8". Long-chain hydrocarbons can be expressed as CnHm (n carbons, m hydrogens), with n and m large. Commented Jan 14, 2011 at 13:15
  • "I wonder if this piece of code can be made better?" Can you be more specific? Is there anything in particular you are unhappy about? Commented Jan 14, 2011 at 13:16
  • This is definitely the right place to ask your question :-) Btw shouldn't the amount array be {2, 1} for input "H2O"? Commented Jan 14, 2011 at 13:16
  • @Peter: Yes, I made a typo there; I'm going to correct that now. Commented Jan 14, 2011 at 13:17
  • It is a very dangerous assumption that bigger molecules are not possible. Caffeine, for example, has the molecular formula C8H10N4O2. I am more of a Python guy, and I would create a regex to find all atoms and their occurrences and add them to the list. Commented Jan 14, 2011 at 13:20

3 Answers


Have you considered an approach with regular expressions? Do you have access to Boost or TR1 regular expressions? An individual atom and its count can easily be represented as:

(after edits based on comments)

([A-Z][a-z]{0,2})([0-9]*)

Then you just need to repeatedly find this pattern in your input string and extract the different parts.


4 Comments

I have no access to those things, and I'm not so good with regular expressions :(
I think I would take this approach too. As usual, the regex is never quite as simple as one thinks. I suggest [A-Z][a-z]{0,2}[0-9]{1,2} to cover the initial capital letter, up to two optional lowercase letters, and one or two digits. I am sure this is not perfect either, because you don't want numbers beginning with zero. I find it easier to perfect a regex than many lines of parsing code.
@T33C: Good call on the modifications. I have edited my answer to include your suggestions.
There is still a small mistake in the regex: it is also possible that there is no number at all. I would just use [0-9]*, even though that allows a number starting with a zero, or ([1-9][0-9]*)? to be strictly correct.

There are many potential improvements that could be made, of course. But as a newbie, I guess you only want the immediate ones. The first improvement is to change this from a program with a hard-coded formula to a program that reads a formula from the user. Then test your program by inputting different formulae and checking that the output is correct.

3 Comments

Ah, the hard-coded formula is just for testing purposes; I didn't want to type in a molecule formula every time ^^
But you could use a text file with some test cases.
For such purposes, I usually make the program accept these values as command-line parameters and set the debugger to run it with the desired test input. In my case, these are typically paths to files to process, which would be even more cumbersome to enter manually every time. ;)

What if you modified it to follow this algorithm? It might be less code, but it would definitely be clearer:

// while not at end of input
     // gather an uppercase letter
     // gather any lowercase letters
     // gather any numbers
     // set the element in your array

This could be implemented with three very simple loops inside your main loop, and it would make your intentions much more obvious to future maintainers.
