Dataframes
Containers of the Dataframe type represent tables. A table has typed columns and stores data in tabular form, similar to a pandas DataFrame.
Creating dataframes
First, declare a table schema. In some cases the schema can be inferred from the data with the static method Dataframe.create_schema_from_data(). This is not recommended, however: detection of non-POD types is unreliable and may fall back to overly generic object types, such as RawObject.
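To show why inference from data is fragile, here is a minimal sketch of naive type inference. It is not the SDK's create_schema_from_data(). It assumes rows are lists of values in column order, and it uses a hypothetical `Column` named tuple and a hypothetical `GenericObject` class as stand-ins for TableColumn and RawObject. POD values map cleanly to Python types, but any other value can only be recorded as a generic object.

```python
from typing import Any, List, NamedTuple


class GenericObject:
    """Hypothetical stand-in for a generic wrapper such as RawObject."""


class Column(NamedTuple):
    """Hypothetical stand-in for TableColumn: a column name and its type."""
    name: str
    type: type


POD_TYPES = (bool, int, float, str)


def infer_schema(names: List[str], rows: List[List[Any]]) -> List[Column]:
    """Infer each column's type from its first non-None value."""
    schema = []
    for index, name in enumerate(names):
        sample = next((row[index] for row in rows if row[index] is not None), None)
        if type(sample) in POD_TYPES:
            # type() gives the exact class, so a bool value yields bool, not int
            schema.append(Column(name, type(sample)))
        else:
            # Any other value (or an all-None column) falls back to a generic type
            schema.append(Column(name, GenericObject))
    return schema
```

For example, `infer_schema(["count", "label", "payload"], [[3, "a", {"x": 1}]])` types the first two columns as `int` and `str`, while the dictionary column can only become `GenericObject`: the real structure of the object is lost, which is why an explicit schema is preferable.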
A table schema is a list of TableColumn objects. Each TableColumn defines a column name and a type: a Python type for POD values, or a wrapper class for objects. To see the difference, compare the two example schemas below.
Here is an example of a simple table definition:
```python
from research_sdk import TableColumn

table_schema = [
    TableColumn("count", float),
    TableColumn("weight", int),
    TableColumn("length", int),
    TableColumn("code", str),
    TableColumn("id", str),
    TableColumn("valid", bool),
]
```
Columns can also hold custom object types. The SDK provides PyMatGenObject for data from the pymatgen library and RawObject as generic, untyped storage.
Note: For information about how to define your own object types, see Custom object format.
Here is an example of a table definition with object columns:
```python
from research_sdk import RawObject, TableColumn
from research_sdk.structures.object.pymatgen import PyMatGenObject

table_schema = [
    TableColumn(name="Index", type=int),
    TableColumn(name="Structure", type=PyMatGenObject),
    TableColumn(name="Raw experiment data", type=RawObject),
]
```
Once the schema is defined, fill the table with the insert() method. It takes a list of table rows, where each row is a list of values whose types match the schema.
```python
from research_sdk import Dataframe, DataStorageInterface, DataStorageType

with DataStorageInterface.create(DataStorageType.Datacat) as storage:
    # The table schema is described in the table_schema variable
    with Dataframe(name="Test frame", schema=table_schema, storage=storage) as frame:
        # The list of rows is prepared in the table_data variable
        frame.insert(table_data)
```
If a value's type does not match the schema, you will receive the TableNotConformToSchema exception.
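Conceptually, schema conformance is a per-value type test. The sketch below is a hypothetical, self-contained illustration, not the SDK's implementation: `Column` stands in for TableColumn, a local exception class borrows the TableNotConformToSchema name, and rows are assumed to hold one value per column in schema order. It also assumes an exact type match, which may be stricter than what the SDK actually enforces.

```python
from typing import Any, List, NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for TableColumn."""
    name: str
    type: type


class TableNotConformToSchema(Exception):
    """Local stand-in for the SDK exception of the same name."""


def check_rows(schema: List[Column], rows: List[List[Any]]) -> None:
    """Raise if a row has the wrong number of values or a value of the wrong type."""
    for row_number, row in enumerate(rows):
        if len(row) != len(schema):
            raise TableNotConformToSchema(
                f"row {row_number}: expected {len(schema)} values, got {len(row)}")
        for column, value in zip(schema, row):
            # Exact match (assumption): a bool is not accepted for an int column
            if type(value) is not column.type:
                raise TableNotConformToSchema(
                    f"row {row_number}, column {column.name!r}: "
                    f"expected {column.type.__name__}, got {type(value).__name__}")
```

With `schema = [Column("id", str), Column("length", int), Column("valid", bool)]`, the row `["a1", 10, True]` passes, while `["a1", "10", True]` raises because `"10"` is a string rather than an int.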
For complete code examples, see Examples of using data catalog SDK.
Defining table schemas
```python
@property
def schema(self) -> List["TableColumn"]:
    ...
```
This property returns the list of column definitions (TableColumn objects), each containing a name and a type.
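Because the schema is a plain list of column definitions, ordinary Python is enough to look up a column's position and declared type, or to label the values of a row. In this sketch `Column` is a hypothetical stand-in for TableColumn, and the helpers `column_lookup` and `row_as_dict` are illustrative names, not SDK functions. It assumes each definition exposes `name` and `type` attributes, matching the keyword arguments used in the object-column schema example.

```python
from typing import Any, Dict, List, NamedTuple, Tuple


class Column(NamedTuple):
    """Hypothetical stand-in for TableColumn."""
    name: str
    type: type


def column_lookup(schema: List[Column]) -> Dict[str, Tuple[int, type]]:
    """Map each column name to its (position, declared type)."""
    return {column.name: (index, column.type) for index, column in enumerate(schema)}


def row_as_dict(schema: List[Column], row: List[Any]) -> Dict[str, Any]:
    """Pair each value of a row with its column name, in schema order."""
    return {column.name: value for column, value in zip(schema, row)}
```

For example, with `schema = [Column("Index", int), Column("code", str)]`, `column_lookup(schema)["code"]` gives `(1, str)`, and `row_as_dict(schema, [7, "A1"])` gives `{"Index": 7, "code": "A1"}`.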
Inserting data
```python
def insert(self, rows: List[List]) -> None:
    ...
```
This method inserts the given rows of data into the table.
Selecting data
```python
def select(self) -> List[List]:
    ...
```
This method returns the table data as is; it does not support filtering or aggregation.
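Since select() does no filtering or aggregation, that work has to happen in Python on the returned list of rows. This sketch assumes the rows come back as lists in schema order; the column names and data are made up for illustration and stand in for a real `frame.select()` result.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical result of frame.select() for the columns below
names = ["code", "length", "valid"]
rows: List[List[Any]] = [
    ["A", 10, True],
    ["B", 4, False],
    ["A", 6, True],
    ["B", 8, True],
]

# Resolve column positions by name instead of hard-coding indices
code_i = names.index("code")
length_i = names.index("length")
valid_i = names.index("valid")

# Filtering: keep only the rows marked as valid
valid_rows = [row for row in rows if row[valid_i]]

# Aggregation: total length per code over the valid rows
total_length: Dict[str, int] = defaultdict(int)
for row in valid_rows:
    total_length[row[code_i]] += row[length_i]
```

Here `dict(total_length)` is `{"A": 16, "B": 8}`. For heavier analysis, the same rows can be loaded into pandas with `pd.DataFrame(rows, columns=names)`.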
Converting data from a pandas DataFrame
```python
@classmethod
def from_pandas(cls, data: PandasDataframe, schema: Optional[List["Dataframe.Column"]] = None) -> "Dataframe":
    ...
```
Not implemented.
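Until from_pandas() is available, one possible workaround is to convert a pandas DataFrame into the `List[List]` shape that insert() accepts and pass the result to `frame.insert(...)` yourself. This sketch assumes pandas is installed and that you still supply the table schema separately; `dataframe_to_rows` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, List

import pandas as pd


def dataframe_to_rows(df: pd.DataFrame, columns: List[str]) -> List[List[Any]]:
    """Return the rows of df as lists, with values ordered like `columns`."""
    # Selecting by a list reorders the columns (and raises KeyError if one is missing)
    ordered = df[columns]
    # itertuples(index=False, name=None) yields one plain tuple per row, without the index
    return [list(row) for row in ordered.itertuples(index=False, name=None)]
```

For example, `dataframe_to_rows(pd.DataFrame({"length": [10, 4], "code": ["A", "B"]}), ["code", "length"])` returns `[["A", 10], ["B", 4]]`, ready to be passed to insert() for a table whose schema lists `code` before `length`.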