Saving and loading data in the data catalog

The example below shows how to use the Constructor Platform Data Catalog SDK to save dataframes to the data catalog and load them back. The example involves:

  1. Creating and populating a dataframe with structured data.
  2. Saving the data to the catalog.
  3. Loading the data back for further processing.

Prerequisites

Before running this example, ensure that you have:

  • Access to the Constructor Platform Data Catalog SDK.
  • Authorization to interact with the data catalog. If you run code inside Constructor Platform, you are already authorized. Otherwise, run the research_sdk/authorize.py script from the SDK to get authorized.

Importing required modules

```python
from random import randbytes

from pymatgen.core import Lattice, Structure
from research_sdk import Dataframe, DataStorageInterface, DataStorageType, RawObject, TableColumn
from research_sdk.structures.object.pymatgen import PyMatGenObject
```

Defining helper functions to generate sample data

```python
def create_raw() -> bytes:
    return randbytes(1024)


def create_material() -> Structure:
    return Structure(
        Lattice.orthorhombic(1, 2, 3),
        ["Ag", "Si", "Si"],
        [[0.7, 0.4, 0.5], [0, 0, 0.1], [0, 0, 0.2]],
    )
```
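Because create_raw draws from Python's module-level random generator, its payload differs on every run. If reproducible sample data is useful, for instance when comparing what was saved against what was loaded later, seeding the generator first is one option. The sketch below uses only the standard library; the seed value is arbitrary, and random.randbytes requires Python 3.9 or later.

```python
import random
from random import randbytes


def create_raw() -> bytes:
    # Same helper as above: 1024 random bytes standing in for a raw payload.
    return randbytes(1024)


# Seeding the module-level generator makes the "random" payload repeatable.
random.seed(42)  # arbitrary seed chosen for illustration
first = create_raw()

random.seed(42)
second = create_raw()

print(len(first))       # 1024
print(first == second)  # True
```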

Defining table schema

```python
table_schema = [
    TableColumn(name="Index", type=int),
    TableColumn(name="Structure", type=PyMatGenObject),
    TableColumn(name="Raw data", type=RawObject),
]
```

Each column is described by a TableColumn dataclass instance, which takes a name and a type.
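A schema of this shape can also be used to sanity-check rows before they are inserted. The sketch below does not use the SDK: Column is a hypothetical stand-in for TableColumn with the same two fields, and validate_row is an illustrative helper, not an SDK function. Plain Python types take the place of PyMatGenObject and RawObject, since how the SDK maps those wrapper types to Python values is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical stand-in for research_sdk.TableColumn (name + type only).
    name: str
    type: type


def validate_row(schema: list[Column], row: list) -> list[str]:
    """Return a list of problems found in `row`; an empty list means it fits."""
    if len(row) != len(schema):
        return [f"expected {len(schema)} values, got {len(row)}"]
    return [
        f"column {col.name!r}: expected {col.type.__name__}, got {type(value).__name__}"
        for col, value in zip(schema, row)
        if not isinstance(value, col.type)
    ]


schema = [Column("Index", int), Column("Raw data", bytes)]

print(validate_row(schema, [1, b"\x00\x01"]))  # no problems
print(validate_row(schema, ["1", b"\x00"]))    # type problem in 'Index'
print(validate_row(schema, [1]))               # length mismatch
```

Returning a list of messages rather than raising on the first mismatch lets a caller report every problem in a row at once.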

Generating sample data

```python
table_data = [
    [1, create_material(), create_raw()],
    [2, create_material(), create_raw()],
    [3, create_material(), create_raw()],
    [4, create_material(), create_raw()],
]
```

Saving table to Data Catalog

```python
with DataStorageInterface.create(DataStorageType.Datacat) as storage:
    with Dataframe(name="Results of experiment #42", schema=table_schema, storage=storage) as frame:
        # Insert data into the table
        frame.insert(table_data)
        # Add one more row
        frame.insert([[5, create_material(), create_raw()]])
        # Obtain the frame_id
        frame_id = frame.object_id
    # Upon exiting the context manager, the Dataframe will be committed,
    # and data will be available in the data catalog.
```
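The commit-on-exit behavior described in the last comment follows the standard Python context-manager pattern. The toy class below illustrates that pattern in isolation; ToyFrame is hypothetical and does not reflect how the SDK is implemented. In particular, discarding buffered rows when the block raises is an assumption of this sketch, not documented Dataframe behavior.

```python
class ToyFrame:
    """Hypothetical stand-in that buffers rows and commits them on exit."""

    def __init__(self) -> None:
        self.pending: list[list] = []
        self.committed: list[list] = []

    def insert(self, rows: list[list]) -> None:
        # Rows are only buffered here; nothing is committed yet.
        self.pending.extend(rows)

    def __enter__(self) -> "ToyFrame":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            # Clean exit: make the buffered rows visible.
            self.committed.extend(self.pending)
        # On error the buffer is dropped (an assumption of this sketch).
        self.pending.clear()
        return False  # never suppress exceptions


with ToyFrame() as frame:
    frame.insert([[1, "a"], [2, "b"]])
    frame.insert([[3, "c"]])
    print(len(frame.committed))  # 0: still inside the block

print(len(frame.committed))  # 3: committed when the block exited
```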

Loading data

```python
with DataStorageInterface.create(DataStorageType.Datacat) as storage:
    # Load a Dataframe container
    with Dataframe.load(storage=storage, object_id=frame_id) as frame:
        # Retrieve data from the table
        rows = frame.select()
        # Perform necessary operations with the retrieved data.
```
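What select() returns is not specified above. Assuming it yields rows as sequences whose values follow the schema's column order (an assumption, not confirmed SDK behavior), pairing each value with its column name can make later processing less error-prone than indexing by position. The sample rows below are hand-written placeholders, not real output from the data catalog.

```python
column_names = ["Index", "Structure", "Raw data"]

# Placeholder rows in schema order; real rows would come from frame.select().
rows = [
    [1, "<structure 1>", b"\x00" * 4],
    [2, "<structure 2>", b"\x01" * 4],
]

# Pair every value with its column name.
records = [dict(zip(column_names, row)) for row in rows]

for record in records:
    print(record["Index"], len(record["Raw data"]))
# 1 4
# 2 4
```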

The Dataframe object handles the interaction with the data catalog: it commits the data when its context manager exits, and it provides insert and select for writing and reading structured rows.

Information: Data is stored in its original form, without any additional wrapping or identifiers, to ensure straightforward data manipulation.