Good morning. I have looked around thoroughly trying to find a way to create a MATLAB-like struct array in Python. My input .csv file has no header row.

My MATLAB code:

  dumpdata = csvread('dumpdata.csv');
  N_dumpdata_samples = size(dumpdata, 1);   % number of rows (length() returns the largest dimension)

  rec_sample_1second = struct('UTC_time', {}, 'sv_id_set', {}, ...
                              'pseudorange', {}, 'state', {});

  for s = 1:N_dumpdata_samples
      % column 1: UTC time, rounded to 0.1 s
      rec_sample_1second(s).UTC_time = round(dumpdata(s,1) * 10) / 10;

      for t = 1:15
          rec_sample_1second(s).sv_id_set(t)   = dumpdata(s, t+1);    % columns 2-16
          rec_sample_1second(s).pseudorange(t) = dumpdata(s, t+16);   % columns 17-31
          rec_sample_1second(s).state(t)       = dumpdata(s, t+31);   % columns 32-46
      end
  end

My attempt in Python:

   import numpy as np
   import pandas as pd

   df = pd.read_csv('path/Dumpdata.csv', header=None)
   N_dumpdata_samples = len(df)
   structure = {}
   structure["parent1"] = {}
   UTC_time = []
   for s in range(N_dumpdata_samples):
       # structure['parent1']['UTC_time'] = df[s,0]  -> this line gives an error
       UTC_time = df['s',0]
   .......

My question: how can I implement the same logic and structure in Python?

Thanks

2 Answers

In Octave:

>> data = struct('A',{}, 'B', {});
>> for s=1:1:5
       data(s).A = s
       for t=1:1:3
           data(s).B(t) = s+t
       end;
   end;

producing

>> data.A
ans =  1
ans =  2
ans =  3
ans =  4
ans =  5
>> data.B
ans =
   2   3   4
ans =
   3   4   5
ans =
   4   5   6
ans =
   5   6   7
ans =
   6   7   8
>> save -7 stack47277436.mat data

Loading that in Python with scipy.io.loadmat:

In [17]: res = loadmat('stack47277436.mat')
In [18]: res
Out[18]: 
{'__globals__': [],
 '__header__': b'MATLAB 5.0 MAT-file, written by Octave 4.0.0, 2017-11-14 04:48:21 UTC',
 '__version__': '1.0',
 'data': array([[(array([[ 1.]]), array([[ 2.,  3.,  4.]])),
         (array([[ 2.]]), array([[ 3.,  4.,  5.]])),
         (array([[ 3.]]), array([[ 4.,  5.,  6.]])),
         (array([[ 4.]]), array([[ 5.,  6.,  7.]])),
         (array([[ 5.]]), array([[ 6.,  7.,  8.]]))]],
       dtype=[('A', 'O'), ('B', 'O')])}

Or load with squeeze_me=True to remove the singleton dimensions:

In [22]: res = loadmat('stack47277436.mat',squeeze_me=True)
In [24]: res['data']
Out[24]: 
array([(1.0, array([ 2.,  3.,  4.])), (2.0, array([ 3.,  4.,  5.])),
       (3.0, array([ 4.,  5.,  6.])), (4.0, array([ 5.,  6.,  7.])),
       (5.0, array([ 6.,  7.,  8.]))],
      dtype=[('A', 'O'), ('B', 'O')])
In [25]: _.shape
Out[25]: (5,)

The struct array has been translated into a NumPy structured array with 2 fields, matching the struct's fields (MATLAB uses the same term).
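To make the mapping concrete, here is a small sketch that builds the same data directly in NumPy, without going through a .mat file. It assumes B always has exactly 3 elements, so it can be a fixed-size sub-array field instead of an object field; indexing a record first mirrors data(s).A, while indexing a field first mirrors [data.A].

```python
import numpy as np

# Structured array shaped like the Octave example: each record has a scalar
# field A and a length-3 field B (a fixed-size sub-array instead of an object).
data = np.zeros(5, dtype=[('A', float), ('B', float, (3,))])
for s in range(5):
    # MATLAB: data(s).A = s; data(s).B(t) = s + t  (with 1-based s and t)
    data[s] = (s + 1, [s + 2, s + 3, s + 4])

print(data[0]['A'])     # one record, one field: like data(1).A
print(data['A'])        # one field across all records: like [data.A]
print(data['B'].shape)  # (5, 3): B stacked into a 2-D array
```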

In [26]: res['data']['A']
Out[26]: array([1.0, 2.0, 3.0, 4.0, 5.0], dtype=object)
In [27]: res['data']['B']
Out[27]: 
array([array([ 2.,  3.,  4.]), array([ 3.,  4.,  5.]),
       array([ 4.,  5.,  6.]), array([ 5.,  6.,  7.]),
       array([ 6.,  7.,  8.])], dtype=object)

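A is an object-dtype array holding floats. B is also object dtype, but each element is itself an array. loadmat stores struct field values in object arrays, much as it does for MATLAB cells, because each field can hold a value of any type or size.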

A MATLAB struct could also be represented as a custom class with attributes A and B, or as a dictionary with those keys.
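A minimal sketch of the custom-class option, assuming the CSV layout from the question's MATLAB loop (column 1 is the UTC time, followed by three groups of 15 columns). Sample and to_samples are hypothetical names chosen for illustration; a list of Sample objects plays the role of rec_sample_1second, and np.loadtxt is one way to read a header-less numeric CSV.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Sample:
    """Hypothetical class standing in for one element of rec_sample_1second."""
    UTC_time: float
    sv_id_set: np.ndarray    # 15 values, MATLAB columns 2-16
    pseudorange: np.ndarray  # 15 values, MATLAB columns 17-31
    state: np.ndarray        # 15 values, MATLAB columns 32-46


def to_samples(dumpdata):
    """Convert an (N, 46) array into a list of Sample objects, one per row."""
    samples = []
    for row in np.asarray(dumpdata, dtype=float):
        samples.append(Sample(
            # np.round rounds exact halves to even; MATLAB's round rounds them away from zero
            UTC_time=float(np.round(row[0] * 10) / 10),
            sv_id_set=row[1:16],
            pseudorange=row[16:31],
            state=row[31:46],
        ))
    return samples


# With the real file (ndmin=2 keeps a one-row CSV two-dimensional):
# dumpdata = np.loadtxt('dumpdata.csv', delimiter=',', ndmin=2)
dumpdata = np.arange(2 * 46, dtype=float).reshape(2, 46)  # synthetic stand-in data
samples = to_samples(dumpdata)
print(samples[1].UTC_time)            # like rec_sample_1second(2).UTC_time
print([s.UTC_time for s in samples])  # like [rec_sample_1second.UTC_time]
```

If attribute access is not needed, a plain dict per row with the same keys works the same way.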

I know NumPy better than pandas, but let's try to put this array into a DataFrame:

In [28]: import pandas as pd
In [29]: df = pd.DataFrame(res['data'])
In [30]: df
Out[30]: 
   A                B
0  1  [2.0, 3.0, 4.0]
1  2  [3.0, 4.0, 5.0]
2  3  [4.0, 5.0, 6.0]
3  4  [5.0, 6.0, 7.0]
4  5  [6.0, 7.0, 8.0]
In [31]: df.dtypes
Out[31]: 
A    object
B    object
dtype: object

In NumPy the fields can be cleaned up and assigned to variables:

In [37]: A = res['data']['A'].astype(int)
In [38]: B = np.stack(res['data']['B'])
In [39]: A
Out[39]: array([1, 2, 3, 4, 5])
In [40]: B
Out[40]: 
array([[ 2.,  3.,  4.],
       [ 3.,  4.,  5.],
       [ 4.,  5.,  6.],
       [ 5.,  6.,  7.],
       [ 6.,  7.,  8.]])

One is a (5,) shape array, the other (5,3).

I could pack those back into a structured array with a prettier dtype:

In [48]: C = np.empty((5,), [('A',int), ('B', int, (3,))])
In [49]: C['A'] = A
In [50]: C['B'] = B
In [51]: C
Out[51]: 
array([(1, [2, 3, 4]), (2, [3, 4, 5]), (3, [4, 5, 6]), (4, [5, 6, 7]),
       (5, [6, 7, 8])],
      dtype=[('A', '<i4'), ('B', '<i4', (3,))])
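The same idea applied to the question's data, as a minimal sketch: it assumes every row of the header-less CSV has 46 numeric columns laid out as in the MATLAB loop. sample_dtype and to_structured are hypothetical names; instead of a per-row loop, each field is filled from a column slice of the whole table.

```python
import numpy as np

# Hypothetical dtype mirroring rec_sample_1second: one scalar field and three
# 15-element sub-array fields per record.
sample_dtype = np.dtype([
    ('UTC_time', float),
    ('sv_id_set', float, (15,)),
    ('pseudorange', float, (15,)),
    ('state', float, (15,)),
])


def to_structured(dumpdata):
    """Fill a structured array block by block from an (N, 46) array."""
    dumpdata = np.asarray(dumpdata, dtype=float)
    rec = np.empty(len(dumpdata), dtype=sample_dtype)
    rec['UTC_time'] = np.round(dumpdata[:, 0] * 10) / 10  # round to 0.1 s
    rec['sv_id_set'] = dumpdata[:, 1:16]
    rec['pseudorange'] = dumpdata[:, 16:31]
    rec['state'] = dumpdata[:, 31:46]
    return rec


# Real file: dumpdata = np.loadtxt('dumpdata.csv', delimiter=',', ndmin=2)
dumpdata = np.arange(3 * 46, dtype=float).reshape(3, 46)  # synthetic stand-in data
rec = to_structured(dumpdata)
print(rec[2]['UTC_time'])  # like rec_sample_1second(3).UTC_time
print(rec['state'].shape)  # (3, 15)
```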

When accessing a DataFrame by integer position, you need to use df.iloc[...].

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html

For example, to access the value in the first row and first column, use df.iloc[0, 0].
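Applied to the loop in the question, a sketch using iloc might look like this. It assumes the 46-column layout from the MATLAB code; build_records is a hypothetical helper that returns one dict per row, standing in for rec_sample_1second(s).

```python
import numpy as np
import pandas as pd


def build_records(df):
    """Hypothetical helper: one dict per CSV row, like rec_sample_1second(s)."""
    records = []
    for s in range(len(df)):
        records.append({
            # df.iloc[row, col] selects by 0-based integer position;
            # Python's round rounds exact halves to even, unlike MATLAB's round
            'UTC_time': round(float(df.iloc[s, 0]) * 10) / 10,
            'sv_id_set': df.iloc[s, 1:16].to_numpy(),
            'pseudorange': df.iloc[s, 16:31].to_numpy(),
            'state': df.iloc[s, 31:46].to_numpy(),
        })
    return records


# Real file: df = pd.read_csv('path/Dumpdata.csv', header=None)
df = pd.DataFrame(np.arange(2 * 46).reshape(2, 46))  # synthetic stand-in data
records = build_records(df)
print(records[1]['UTC_time'])
```

For large files, slicing the whole frame once, e.g. df.iloc[:, 1:16].to_numpy() for an (N, 15) array, avoids the per-row loop.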
