0

Suppose I want to read a sequence of inputs, where each input is a tuple of the form <string> , <integer>, <string>. Additionally, there can be an arbitrary amount of whitespace around the commas. An easy way to do this in C/C++ is to use scanf with the format string "%s , %d , %s". What is the equivalent function in Python?

If we knew that each input was on a separate line, we could easily parse this in Python using split and strip. But without that guarantee, newlines complicate things. Furthermore, we could even have weird inputs such as

<s11>, <i1> , <s12> <s21>, <i2> , <s22>

where s11, i1, s12 is the first input and s21, i2, s22 is the second. scanf would still be able to handle this. How does one do it in Python? I also don't want to read the entire input at once and parse it, since I know that other inputs that don't fit this format will follow later, and I don't want to do the parsing manually.

3
  • so there is no comma in between <s12> and <s21>? Commented Nov 1, 2018 at 14:25
  • @onno That's correct. But there can be arbitrary amount of whitespace there too (including newline(s)). Commented Nov 1, 2018 at 14:26
  • You are attributing way too much power to scanf. The problems you list also appear when (mis)using scanf. As there is no such function in Python, you indeed have to write one yourself. Commented Nov 1, 2018 at 14:28

3 Answers

1

You should be able to first strip the whitespace, then split on commas, then handle the resulting strings and integers however you want. The regular expression \s+ matches one or more whitespace characters:

import re

input_string = "    hello     \n  \t   ,    10  ,   world   \n     "
# Remove every run of whitespace, then split the remainder on commas.
stripped_string = re.sub(r'\s+', '', input_string)
substrings = stripped_string.split(',')
string1 = substrings[0]
integer1 = int(substrings[1])
string2 = substrings[2]

You'd just have to put those last three lines inside a loop if you need to handle multiple s,i,s tuples in a row.

EDIT: I realize now that you want to interpret any whitespace as a comma. I'm not sure how wise that is, but a hacky way to do it is to replace all the commas with whitespace, split on whitespace, and call it a day:

import re

input_string = "    hello     \n  \t   ,    10     world   \n     "
# Turn commas into spaces so that split() sees a single kind of separator.
stripped_string = re.sub(',', ' ', input_string)
substrings = stripped_string.split()
string1 = substrings[0]
integer1 = int(substrings[1])
string2 = substrings[2]
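
To handle several tuples in a row with this second approach, the tokens can be grouped in threes. The following is only a minimal sketch: the helper name parse_triples and the sample data are made up for illustration, and it assumes the strings themselves contain no commas or whitespace.

```python
def parse_triples(text):
    """Yield (str, int, str) tuples from comma- and whitespace-separated text."""
    # Commas become spaces, so split() sees a single kind of separator.
    tokens = text.replace(",", " ").split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected a multiple of three fields")
    for i in range(0, len(tokens), 3):
        s1, num, s2 = tokens[i:i + 3]
        yield s1, int(num), s2


data = "hello , 10 , world\n  foo,20 ,bar   baz , 30,qux"
for triple in parse_triples(data):
    print(triple)
```

Note that this still reads the whole text up front, so it only fits the part of the input that is known to follow the tuple format.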

0

For a delimited format this is pretty easy with the csv module. You can plug any kind of file-like input into it.

You then handle whitespace stripping and type casting downstream. Here's a sample to get you going:

In [25]: import fileinput

In [26]: import csv

In [28]: reader = csv.reader(fileinput.input())

In [29]: for l in reader:
    ...:     print(l)
    ...:
stdin input -> a,b, c, d
print output -> ['a', 'b', ' c', ' d   ']
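
The downstream stripping and casting could look roughly like the sketch below. It assumes one tuple per line (which the question does not guarantee), uses io.StringIO as a stand-in for stdin, and the function name read_rows is made up for illustration.

```python
import csv
import io


def read_rows(stream):
    """Yield (str, int, str) tuples from a file-like object, one tuple per line."""
    # skipinitialspace drops whitespace that directly follows a delimiter.
    reader = csv.reader(stream, skipinitialspace=True)
    for row in reader:
        if not row:  # csv.reader returns an empty list for blank lines
            continue
        s1, num, s2 = (field.strip() for field in row)
        yield s1, int(num), s2


sample = io.StringIO("hello , 10 , world\n\nfoo,20 ,bar\n")
for row in read_rows(sample):
    print(row)
```

Because csv.reader pulls lines lazily from the stream, you can stop iterating once the tuple section ends and hand the same stream to other parsing code.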


0

A simple equivalent could be as follows (results are returned as strings):

def scan(s, fmt) :
  result = []

  ind = 0  # s up to ind has been consumed
  slen = len(s)

  i = 0
  while i < len(fmt) :  
    c = fmt[i] 
    if c == "%" and i < len(fmt) - 1 : 
      d = fmt[i+1] 
      if d == "s" : 
        endstring = s[ind:slen].find(" ") 
        if endstring == -1 : 
          result.append(s[ind:slen])
          return result 
        else : 
          result.append(s[ind:(ind+endstring)])
          ind = ind + endstring
        i = i + 1  
      else : 
        if d == "d" : 
          inchars = "" 
          for j in range(ind, slen) : 
            x = s[j] 
            if x.isdecimal() :
              inchars = inchars + x
            else : 
              break
          result.append(inchars)
          ind = ind + len(inchars)
          i = i + 1
        else : 
          if d == "f" : 
            incharsf = "" 
            for j in range(ind, slen) : 
              y = s[j] 
              if y.isdecimal() or y == "." :
                incharsf = incharsf + y
              else : 
                break
            result.append(incharsf)
            ind = ind + len(incharsf)
            i = i + 1
    else :
      if ind < slen and s[ind] == c :  # guard against running past the end of s
        ind = ind+1 
      else :  
        return result
    i = i + 1
  return result 


print(scan("30=100.5#45", "%d=%f#%d"))
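
An alternative sketch translates the format string into a regular expression, loosely following the "Simulating scanf()" table in the documentation of the re module. The name scan_re and the conversion table are assumptions made for this example, only %d, %f and %s are handled, and values are converted to int, float and str. Unlike C's %s, the regex can backtrack, so a string field may end right before a comma.

```python
import re

# Conversion character -> (regex fragment, cast). Hypothetical table for this sketch.
_CONVERSIONS = {
    "d": (r"[-+]?\d+", int),
    "f": (r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?", float),
    "s": (r"\S+", str),
}


def scan_re(text, fmt):
    """Match text against a scanf-like fmt; return converted values or None."""
    pattern = []
    casts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt) and fmt[i + 1] in _CONVERSIONS:
            regex, cast = _CONVERSIONS[fmt[i + 1]]
            # Like scanf, skip leading whitespace before each conversion.
            pattern.append(r"\s*(" + regex + ")")
            casts.append(cast)
            i += 2
        elif ch.isspace():
            # Whitespace in the format matches any amount of whitespace.
            pattern.append(r"\s*")
            i += 1
        else:
            # Any other character must match literally.
            pattern.append(re.escape(ch))
            i += 1
    match = re.match("".join(pattern), text)
    if match is None:
        return None
    return [cast(group) for cast, group in zip(casts, match.groups())]


print(scan_re("30=100.5#45", "%d=%f#%d"))
print(scan_re("  hello \n , 10 ,\t world", "%s , %d , %s"))
```

The two calls should print [30, 100.5, 45] and ['hello', 10, 'world'] respectively. To consume input incrementally, one could compile the pattern once and keep track of match.end() between calls.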
