I have a dictionary of dataframes where each key is a sample name, and each dataframe has a "time" column plus some measurement columns (temperature, concentration, etc.). The time column is not consistent among samples: both the start and end times differ between samples, although I think every time point between start and end is measured with the same dT.
I want to merge all of the data into a single xarray DataArray where one axis is the time, another axis is the measurement type, and the third axis is the sample name. Since not all times are measured for all samples, the missing data should be filled with NaN.
I have little experience with xarray. With a simple merge (constructing an xarray object from a dict of DataArrays), I couldn't figure out how to make "time" one of the axes; instead it just stacked all of the samples, with time remaining one of the data columns.
Thank you for your help!
Edit:
Here is the code I have, with dummy data:
import pandas as pd
import xarray as xr
#make fake data
dfs = {'sample1':pd.DataFrame([[1,0,0],[2,0,0],[3,0,0]],columns = ["Time","ColA","ColB"]),
'sample2':pd.DataFrame([[2,1,1],[3,1,1],[4,1,1]],columns = ["Time","ColA","ColB"])}
#code I use for real data
xrs = {k: xr.DataArray(v) for k, v in dfs.items()}
merged = xr.Dataset(xrs).to_array(dim="samples")
print(merged)
Output is:
<xarray.DataArray (samples: 2, dim_0: 3, dim_1: 3)>
array([[[1, 0, 0],
[2, 0, 0],
[3, 0, 0]],
[[2, 1, 1],
[3, 1, 1],
[4, 1, 1]]], dtype=int64)
Coordinates:
* dim_0 (dim_0) int64 0 1 2
* dim_1 (dim_1) object 'Time' 'ColA' 'ColB'
* samples (samples) <U7 'sample1' 'sample2'
Desired output:
<xarray.DataArray (samples: 2, Time: 4, dim_1: 2)>
array([[[0, 0],
[0, 0],
[0, 0],
[nan, nan]],
[[nan, nan],
[1, 1],
[1, 1],
[1, 1]]], dtype=float64)
Coordinates:
* Time (Time) int64 1 2 3 4
* dim_1 (dim_1) object 'ColA' 'ColB'
* samples (samples) <U7 'sample1' 'sample2'
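For reference, here is a minimal sketch of one way this layout might be produced. It assumes every dataframe has the same measurement columns and a "Time" column with unique values. The dimension name `measurement` is a made-up label (not from the code above), and `xr.concat` with an outer join is just one possible approach, not necessarily the best one.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same dummy data as in the question
dfs = {
    "sample1": pd.DataFrame([[1, 0, 0], [2, 0, 0], [3, 0, 0]],
                            columns=["Time", "ColA", "ColB"]),
    "sample2": pd.DataFrame([[2, 1, 1], [3, 1, 1], [4, 1, 1]],
                            columns=["Time", "ColA", "ColB"]),
}

# Move "Time" into the index so it becomes a labelled coordinate
# instead of a data column. "measurement" is a hypothetical dimension name.
arrays = {
    name: xr.DataArray(df.set_index("Time"), dims=("Time", "measurement"))
    for name, df in dfs.items()
}

# Concatenate along a new "samples" dimension. The outer join takes the
# union of all Time values and fills the gaps with NaN.
merged = xr.concat(
    list(arrays.values()),
    dim=pd.Index(list(arrays), name="samples"),
    join="outer",
)

print(merged)
```

Because NaN cannot be stored in an integer array, the merged values end up as floats even though the inputs are integers.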