Saving and loading data in the Data Catalog
The example below shows how to use the Constructor Platform Data Catalog SDK to save a dataframe to the Data Catalog and load it back. The example involves:
- Creating and populating a dataframe with structured data.
- Saving the data to the catalog.
- Loading the data back for further processing.
Prerequisites
Before running this example, ensure that you have:
- Access to the Constructor Platform Data Catalog SDK.
- Authorization to interact with the Data Catalog. If you run your code inside Constructor Platform, you are already authorized. Otherwise, authorize using the research_sdk/authorize.py script included with the SDK.
Importing required modules
```python
from random import randbytes  # Python 3.9+

from pymatgen.core import Lattice, Structure
from research_sdk import Dataframe, DataStorageInterface, DataStorageType, RawObject, TableColumn
from research_sdk.structures.object.pymatgen import PyMatGenObject
```
Defining helper functions to generate sample data
```python
def create_raw() -> bytes:
    """Return 1024 random bytes to store in the "Raw data" column."""
    return randbytes(1024)


def create_material() -> Structure:
    """Return a small orthorhombic pymatgen Structure with one Ag and two Si sites."""
    return Structure(
        Lattice.orthorhombic(1, 2, 3),
        ["Ag", "Si", "Si"],
        [[0.7, 0.4, 0.5], [0, 0, 0.1], [0, 0, 0.2]],
    )
```
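`create_raw` draws from the module-level random generator, so every run produces different bytes. If you want reproducible payloads (for example, to compare the bytes you load back with what you saved), you could seed a dedicated generator instead. The following is a minimal sketch; `create_raw_seeded` and its `seed` parameter are hypothetical names, not part of the SDK.

```python
import random


def create_raw_seeded(seed: int, size: int = 1024) -> bytes:
    """Return `size` pseudo-random bytes; the same seed always gives the same bytes.

    Hypothetical helper (not part of the SDK): a reproducible variant of create_raw().
    """
    rng = random.Random(seed)  # dedicated generator; leaves the global random state untouched
    return rng.randbytes(size)  # Random.randbytes requires Python 3.9+


payload = create_raw_seeded(42)
```

Using a row's index as the seed, for example `create_raw_seeded(1)` for row 1, would let you regenerate the expected bytes later and compare them with what you load.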
Defining the table schema
```python
table_schema = [
    TableColumn(name="Index", type=int),
    TableColumn(name="Structure", type=PyMatGenObject),
    TableColumn(name="Raw data", type=RawObject),
]
```
Each column is described by a TableColumn dataclass with two parameters: name, the column name, and type, the type of the values stored in that column.
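Each row you insert supplies one value per schema column, in schema order, as the sample data below does. This page does not say how the SDK reacts to a row of the wrong width, so a cheap check before inserting can catch such mistakes early. The following is a minimal sketch; `ColumnSpec` is a hypothetical stand-in that mirrors the `name` and `type` fields of `TableColumn` so the snippet runs without the SDK, and `check_row_widths` is likewise hypothetical. Because the check only uses the schema's length and each column's `name`, you could presumably pass `table_schema` to it directly.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class ColumnSpec:
    """Hypothetical stand-in mirroring TableColumn's name/type fields."""
    name: str
    type: type


def check_row_widths(schema: Sequence[Any], rows: Sequence[Sequence[Any]]) -> None:
    """Raise ValueError if any row does not have exactly one value per schema column."""
    expected = len(schema)
    names = ", ".join(column.name for column in schema)
    for position, row in enumerate(rows):
        if len(row) != expected:
            raise ValueError(
                f"row {position} has {len(row)} values, expected {expected} ({names})"
            )


schema = [ColumnSpec("Index", int), ColumnSpec("Raw data", bytes)]
check_row_widths(schema, [[1, b"\x00"], [2, b"\x01"]])  # passes silently
```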
Generating sample data
```python
# Each row supplies one value per schema column, in schema order.
table_data = [
    [1, create_material(), create_raw()],
    [2, create_material(), create_raw()],
    [3, create_material(), create_raw()],
    [4, create_material(), create_raw()],
]
```
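Writing each row out by hand is fine for four rows; for more, a comprehension keeps the index and the generated values in step. The following is a minimal sketch; `make_rows` is a hypothetical helper, and the lambdas stand in for `create_material` and `create_raw` so the snippet runs on its own.

```python
from typing import Any, Callable, List


def make_rows(count: int, *factories: Callable[[], Any]) -> List[List[Any]]:
    """Build `count` rows of the form [index, factory1(), factory2(), ...].

    Indices start at 1, matching the sample table; each factory is called once per row.
    """
    return [[index, *(factory() for factory in factories)] for index in range(1, count + 1)]


rows = make_rows(3, lambda: "material", lambda: b"raw")
# rows == [[1, "material", b"raw"], [2, "material", b"raw"], [3, "material", b"raw"]]
```

With the helper functions defined earlier, `make_rows(4, create_material, create_raw)` would produce rows of the same shape as `table_data`.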
Saving the table to the Data Catalog
```python
with DataStorageInterface.create(DataStorageType.Datacat) as storage:
    with Dataframe(name="Results of experiment #42", schema=table_schema, storage=storage) as frame:
        # Insert data into the table
        frame.insert(table_data)
        # Add one more row
        frame.insert([[5, create_material(), create_raw()]])
        # Keep the object ID so the frame can be loaded later
        frame_id = frame.object_id
    # Upon exiting the Dataframe context manager, the frame is committed
    # and its data becomes available in the Data Catalog.
```
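The comment at the end of the saving example says the Dataframe is committed when its context manager exits. The sketch below imitates that pattern with an in-memory store, to show why `frame_id` can be captured inside the block while the rows only become visible after it. It is a hypothetical stand-in (`InMemoryStore` and `BufferedFrame` are invented names), not the SDK's implementation. In particular, this sketch discards pending rows if an exception escapes the block; this page does not say whether the real Dataframe does the same.

```python
import uuid
from typing import Any, Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a catalog: maps object IDs to committed rows."""

    def __init__(self) -> None:
        self.committed: Dict[str, List[List[Any]]] = {}


class BufferedFrame:
    """Hypothetical frame that buffers inserts and commits them on a clean exit."""

    def __init__(self, store: InMemoryStore) -> None:
        self.store = store
        self.object_id = str(uuid.uuid4())  # the ID is known before the commit happens
        self._pending: List[List[Any]] = []

    def insert(self, rows: List[List[Any]]) -> None:
        self._pending.extend(rows)  # buffered only; not yet visible in the store

    def __enter__(self) -> "BufferedFrame":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            # Clean exit: publish the buffered rows under this frame's ID.
            self.store.committed[self.object_id] = list(self._pending)
        self._pending.clear()
        return False  # never suppress exceptions


store = InMemoryStore()
with BufferedFrame(store) as frame:
    frame.insert([[1, "a"], [2, "b"]])
    frame.insert([[3, "c"]])
    frame_id = frame.object_id
    assert frame_id not in store.committed  # nothing is visible before exit
assert store.committed[frame_id] == [[1, "a"], [2, "b"], [3, "c"]]
```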
Loading data
```python
with DataStorageInterface.create(DataStorageType.Datacat) as storage:
    # Load the Dataframe container by the object ID obtained when saving
    with Dataframe.load(storage=storage, object_id=frame_id) as frame:
        # Retrieve data from the table
        rows = frame.select()
        # Perform any further processing with the retrieved rows here.
```
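This page does not show what `select()` returns. If it returns each row as a sequence of values in schema order (an assumption, mirroring how rows are passed to `insert()`), you can pair the values with the schema's column names to access them by name. The following is a minimal sketch; `rows_as_records` is a hypothetical helper, and the literal names and rows stand in for `[column.name for column in table_schema]` and the result of `frame.select()`.

```python
from typing import Any, Dict, List, Sequence


def rows_as_records(
    column_names: Sequence[str], rows: Sequence[Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Map each positional row to a {column name: value} dict.

    Assumes every row lists its values in the same order as column_names.
    zip() would silently truncate a mismatched row, so widths are checked explicitly.
    """
    records = []
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError(f"expected {len(column_names)} values, got {len(row)}")
        records.append(dict(zip(column_names, row)))
    return records


names = ["Index", "Structure", "Raw data"]
records = rows_as_records(names, [[1, "s1", b"\x00"], [2, "s2", b"\x01"]])
# records[1]["Index"] == 2
```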
The Dataframe object handles the interaction with the Data Catalog: rows passed to insert() are committed when the frame's context manager exits, and Dataframe.load() followed by select() retrieves them again by object ID.
Note: Data is stored in its original form, without any additional wrapping or identifiers, so it can be manipulated directly.