# Handling data arrays with MODOS
Any count-like data, e.g. protein abundances, RNA counts or metabolomic measurements, can be stored as arrays in the `MODO`.
The underlying zarr storage supports array creation and provides an interface to NumPy arrays.
## Load data
(pandas)=
### Using pandas DataFrames
Count-like data can usually be loaded into a pandas DataFrame.
To keep the column names (__observations__) and row names (__variables__), both need to be stored in separate NumPy arrays first:
::::{tab-set}
:::{tab-item} python
:sync: python
```{code-block} python
import pandas as pd
import numpy as np
# Example of RNA-seq count data
rna_count = pd.read_csv('/path/to/rna/counts.csv', index_col="gene")
rna_count
# rna_count
# time1 time2 time3 ...
# gene
# Xkr4 1891 2410 2159 ...
# Rp1 2 2 0 ...
# ... ... ... ... ...
# TrnP 334 202 218 ...
obs = rna_count.columns.to_numpy()
var = rna_count.index.to_numpy()
rna_array = rna_count.to_numpy()
obs
# array(['time1', 'time2', 'time3', ...], dtype=object)
var
# array(['Xkr4', 'Rp1', ..., 'TrnP'], dtype=object)
rna_array
# array([[1891, 2410, 2159, ...],
# [ 2, 2, 0, ...],
# ...,
# [ 334, 202, 218, ...]])
```
:::
::::
:::{warning}
`to_numpy()` returns only the values and drops the row and column names of a pandas DataFrame.
Store them separately if they carry information you need later.
:::
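The round trip can be tried on a small in-memory table before working with real data. The sketch below uses a made-up three-gene DataFrame (values borrowed from the example output above, variable names purely illustrative) and shows that the labels survive only because they were extracted separately.

```python
import numpy as np
import pandas as pd

# Small stand-in for the RNA-seq table (values taken from the example above)
rna_count = pd.DataFrame(
    [[1891, 2410, 2159], [2, 2, 0], [334, 202, 218]],
    index=pd.Index(["Xkr4", "Rp1", "TrnP"], name="gene"),
    columns=["time1", "time2", "time3"],
)

# Extract labels and values separately
obs = rna_count.columns.to_numpy()
var = rna_count.index.to_numpy()
rna_array = rna_count.to_numpy()

# The plain array only knows its shape and dtype, not the labels
print(rna_array.shape)
# (3, 3)

# Rebuilding the DataFrame from the three arrays restores the labels
restored = pd.DataFrame(rna_array, index=var, columns=obs)
labels_match = (
    list(restored.index) == list(rna_count.index)
    and list(restored.columns) == list(rna_count.columns)
)
values_match = np.array_equal(restored.to_numpy(), rna_count.to_numpy())
```

Both `labels_match` and `values_match` should be `True` here; without `obs` and `var`, only the bare values could be recovered.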
:::{note}
Skip this section if your data is already in a NumPy array.
:::
## Add array element to a MODO
Next, an element with the metadata describing the array can be added to the `MODO`:
::::{tab-set}
:::{tab-item} python
:sync: python
```{code-block} python
from modos.api import MODO
import modos_schema.datamodel as model
# Load the MODO (example at "data/ex")
modo = MODO("data/ex")
# Generate an Array element
array_element = model.Array(
    id="rna1",
    name="RNA raw counts",
    description="RNA counts from multiple timepoints",
    has_sample="sample/sample1",
    data_format="Zarr",
    data_path="data/ex/data/rna1",
)
# Add the element to the MODO
modo.add_element(element=array_element)
# Check the modo structure
modo.list_arrays()
#/
# ├── assay
# ├── data
# │ └── rna1
# ├── reference
# └── sample
# └── sample1
```
:::
::::
:::{note}
Skip this step if you want to add the count data to an element that already exists in the `MODO`.
A helper function that adds the metadata element and the NumPy array in one step is planned for a future release.
:::
## Add array to a MODO
Finally, all arrays can be added to the `MODO` element:
::::{tab-set}
:::{tab-item} python
:sync: python
```{code-block} python
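import numcodecs
import zarr
# Write the count matrix and its labels into the element's group
modo.archive["data/rna1"].create_dataset("data", data=rna_array)
# obs and var both have dtype=object, so each needs an object codec
modo.archive["data/rna1"].create_dataset("obs", data=obs, object_codec=numcodecs.JSON())
modo.archive["data/rna1"].create_dataset("var", data=var, object_codec=numcodecs.JSON())
# Update the consolidated zarr metadata
zarr.consolidate_metadata(modo.store)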
# check the new structure
modo.list_arrays()
#/
# ├── assay
# ├── data
# │ └── rna1
# │ └── data (1473,3) float64
# │ └── obs (3,) object
# │ └── var (1473,) object
# ├── reference
# └── sample
# └── sample1
```
:::
::::
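Because the three arrays are written independently, nothing ties the labels to the dimensions of the data. A minimal consistency check, run before the `create_dataset` calls, could look like the sketch below. The helper name `check_labels` is hypothetical and not part of MODOS, and the small stand-in arrays replace the real `rna_array`, `obs` and `var`.

```python
import numpy as np


def check_labels(data, obs, var):
    """Raise ValueError if the label arrays do not match the data shape.

    Hypothetical helper, not part of the MODOS API.
    Rows are variables (e.g. genes), columns are observations.
    """
    if data.ndim != 2:
        raise ValueError(f"expected a 2D array, got {data.ndim} dimension(s)")
    n_var, n_obs = data.shape
    if len(var) != n_var:
        raise ValueError(f"{len(var)} variable names for {n_var} rows")
    if len(obs) != n_obs:
        raise ValueError(f"{len(obs)} observation names for {n_obs} columns")


# Stand-in arrays (values taken from the example table)
rna_array = np.array([[1891, 2410, 2159], [2, 2, 0], [334, 202, 218]])
obs = np.array(["time1", "time2", "time3"], dtype=object)
var = np.array(["Xkr4", "Rp1", "TrnP"], dtype=object)

# Passes silently when the labels fit the data
check_labels(rna_array, obs, var)
```

A mismatch, for example passing `obs[:2]`, raises a `ValueError` before anything is written to the archive.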
(access)=
## Access Array data
### Load array as pandas DataFrame
To access and analyse the array data, the separately stored arrays can be loaded back into a pandas DataFrame:
::::{tab-set}
:::{tab-item} python
:sync: python
```{code-block} python
import pandas as pd
rna_array = modo.archive["data/rna1/data"][:]
obs = modo.archive["data/rna1/obs"][:]
var = modo.archive["data/rna1/var"][:]
rna_counts = pd.DataFrame(rna_array, index=var, columns=obs)
```
:::
::::
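Once the labels are reattached, rows and columns can be addressed by name rather than by position. The sketch below uses small in-memory stand-ins for the three arrays read from the archive, so it runs without a `MODO`; the values are taken from the example table at the top of this page.

```python
import numpy as np
import pandas as pd

# Stand-ins for the arrays read from modo.archive
rna_array = np.array([[1891, 2410, 2159], [2, 2, 0], [334, 202, 218]])
obs = np.array(["time1", "time2", "time3"], dtype=object)
var = np.array(["Xkr4", "Rp1", "TrnP"], dtype=object)

rna_counts = pd.DataFrame(rna_array, index=var, columns=obs)

# Counts of one gene across all timepoints
xkr4 = rna_counts.loc["Xkr4"]

# Single value selected by gene and timepoint
value = rna_counts.loc["Xkr4", "time2"]

# Total counts per timepoint (column sums)
totals = rna_counts.sum(axis=0)
```

Label-based selection like this depends on `var` and `obs` being stored in the same order as the rows and columns of the data array.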