Dataframes

Containers of the dataframe type represent tables. A table has typed columns and stores data in tabular form, similar to a pandas DataFrame.

Creating dataframes

First, declare a table schema. In some cases the schema can be inferred from the data with the Dataframe.create_schema_from_data() static method. This method is not recommended, however, because detection of non-POD (plain old data) types is unreliable and may fall back to broad object types such as RawObject.

A table schema is a list of TableColumn objects. Each TableColumn defines a column name and a type. The type is the Python type for POD values, or a wrapper class for objects. The example tables below show the difference.

Here is an example of a simple table definition:

Python
from research_sdk import TableColumn

table_schema = [
    TableColumn("count", float),
    TableColumn("weight", int),
    TableColumn("length", int),
    TableColumn("code", str),
    TableColumn("id", str),
    TableColumn("valid", bool),
]

You can also define columns with custom object types. In the SDK, PyMatGenObject is used for pymatgen library data and RawObject serves as generic untyped storage.

Information: For details on how to define your own object types, see Custom object format.

Here is an example of a table definition with objects:

Python
from research_sdk import RawObject, TableColumn
from research_sdk.structures.object.pymatgen import PyMatGenObject

table_schema = [
    TableColumn(name="Index", type=int),
    TableColumn(name="Structure", type=PyMatGenObject),
    TableColumn(name="Raw experiment data", type=RawObject),
]

After a table is defined, you can fill it using the insert call. This call takes a list of rows whose values match the column types declared in the schema.

Python
from research_sdk import Dataframe, DataStorageInterface, DataStorageType

with DataStorageInterface.create(DataStorageType.Datacat) as storage:
    # The table schema is described in the table_schema variable
    with Dataframe(name="Test frame", schema=table_schema, storage=storage) as frame:
        # The list of rows is prepared in the table_data variable
        frame.insert(table_data)

If a value's type does not match the schema, the call raises the TableNotConformToSchema exception.
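To find mismatched values before calling insert, the rows can be checked locally against the schema. The following is a minimal sketch, not the SDK's own validation: ColumnSpec is a hypothetical stand-in for TableColumn so that the example runs without the SDK, and the exact-type comparison is an assumption about what counts as a "proper type".

```python
from typing import Any, List, NamedTuple


class ColumnSpec(NamedTuple):
    """Hypothetical stand-in for research_sdk.TableColumn (name and type only)."""
    name: str
    type: type


def find_type_mismatches(schema: List[ColumnSpec], rows: List[List[Any]]) -> List[str]:
    """Return readable descriptions of cells that do not match the schema."""
    problems = []
    for row_index, row in enumerate(rows):
        if len(row) != len(schema):
            problems.append(
                f"row {row_index}: expected {len(schema)} values, got {len(row)}"
            )
            continue
        for column, value in zip(schema, row):
            # Assumption: exact type match. bool is a subclass of int, so
            # isinstance() would let True slip into an int column.
            if type(value) is not column.type:
                problems.append(
                    f"row {row_index}, column {column.name!r}: "
                    f"expected {column.type.__name__}, got {type(value).__name__}"
                )
    return problems


schema = [
    ColumnSpec("count", float),
    ColumnSpec("weight", int),
    ColumnSpec("code", str),
    ColumnSpec("valid", bool),
]

table_data = [
    [1.5, 10, "A1", True],
    [2.0, "12", "B2", False],  # "12" is a str in an int column
]

problems = find_type_mismatches(schema, table_data)
for problem in problems:
    print(problem)
```

A check like this only reports where the data looks wrong. The SDK's own check on insert remains the authority on what conforms to the schema.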

For the full code examples, see Examples of using data catalog SDK.

Defining table schemas

Python
@property
def schema(self) -> List["TableColumn"]:

This property returns a list of column definitions, each containing a name and a type.
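One use for the column list is pairing column names with row values. The sketch below assumes that each column definition exposes name and type attributes (the constructor keywords suggest this, but the attribute names are not confirmed here). ColumnSpec is a hypothetical stand-in for TableColumn so that the example runs without the SDK.

```python
from typing import Any, Dict, List, NamedTuple


class ColumnSpec(NamedTuple):
    """Hypothetical stand-in for a column definition; attribute names are assumed."""
    name: str
    type: type


def rows_to_records(schema: List[ColumnSpec], rows: List[List[Any]]) -> List[Dict[str, Any]]:
    """Convert positional rows into dictionaries keyed by column name."""
    names = [column.name for column in schema]
    return [dict(zip(names, row)) for row in rows]


schema = [ColumnSpec("Index", int), ColumnSpec("code", str)]
rows = [[1, "A1"], [2, "B2"]]

records = rows_to_records(schema, rows)
print(records)
```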

Inserting data

Python
def insert(self, rows: List[List]) -> None:

This call inserts rows of data into the table.

Selecting data

Python
def select(self) -> List[List]:

This call selects the data as is, without filtering or aggregation.
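Since select does not filter, any filtering has to happen on the returned list of rows. The following is a minimal sketch in plain Python. The filter_rows helper and the sample data are hypothetical, and selected_rows stands in for the value returned by select.

```python
from typing import Any, Callable, List


def filter_rows(
    rows: List[List[Any]],
    column_names: List[str],
    column: str,
    predicate: Callable[[Any], bool],
) -> List[List[Any]]:
    """Keep the rows whose value in the named column satisfies the predicate."""
    index = column_names.index(column)  # raises ValueError for an unknown column
    return [row for row in rows if predicate(row[index])]


# Column names in schema order, and rows as a select() call might return them.
column_names = ["count", "weight", "code", "valid"]
selected_rows = [
    [1.5, 10, "A1", True],
    [2.0, 12, "B2", False],
    [0.5, 7, "C3", True],
]

valid_rows = filter_rows(selected_rows, column_names, "valid", lambda value: value)
heavy_rows = filter_rows(selected_rows, column_names, "weight", lambda value: value >= 10)
```

This approach loads every row before filtering, so it suits small tables better than large ones.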

Converting data from pandas dataframe

Python
def from_pandas(cls, data: PandasDataframe, schema: Optional[List["Dataframe.Column"]] = None) -> Dataframe:

Not implemented.