The constructor cannot cover every object format that you may want to store in the data catalog. You can implement your own format, but note that the web viewer may not be able to display it properly.
If you need to store the format in DataFrame, implement a class that inherits from DataObject and provides two mandatory methods.
An example that implements a numpy matrix as a data catalog object is provided below:
| Python |
from typing import TYPE_CHECKING, Optional, ClassVar

import numpy

from research_sdk.structures import DataObject
from research_sdk.structures.manager import object_manager

if TYPE_CHECKING:
    from research_sdk.storage.base import DataStorageInterface, DataStorageMeta


@object_manager.add_object
class NumpyMatrixObject(DataObject):
    object_type: ClassVar[str] = "numpy_matrix/binary"
    _native: ClassVar[Optional["numpy.matrix"]]

    @classmethod
    def load(cls, storage: "DataStorageInterface", meta: "DataStorageMeta") -> "NumpyMatrixObject":
        raise NotImplementedError

    def serialize(self) -> bytes:
        raise NotImplementedError

    @classmethod
    def from_native(cls, native: numpy.matrix) -> "NumpyMatrixObject":
        raise NotImplementedError
| Python |
object_type: ClassVar[str] = "numpy_matrix/binary" |
| Information | You cannot check whether such a type already exists in the data catalog. |
The class has one more property, which returns the _native class variable. Redefining it in your class is not recommended, as there is no reason to do so.
Next, the functions need to be implemented; what each of them does is described below.
| Python |
from dataclasses import dataclass

from research_sdk.serializers import bytes_to_object, object_to_bytes
from research_sdk.serializers.serializer import DataclassSerializerBase, custom_encoder_manager


@dataclass
class NumpyMatrixStorage:
    data_type: str
    shape: list
    data: bytes


@custom_encoder_manager.add_serializer
class NumpyMatrixStorageSerializer(DataclassSerializerBase):
    data_class = NumpyMatrixStorage
NumpyMatrixStorage defines the structure of an object in the Data Catalog, where data_type is the name of the matrix dtype, shape is the shape of the matrix, and data is the matrix content as raw bytes.
| Information | Custom serialization for non-primitive types is possible, but it is outside the scope of this documentation. Use only simple Python types in such dataclasses: int, float, bool, str and bytes are supported. |
NumpyMatrixStorageSerializer is the serializer class for this dataclass. It is registered with the @custom_encoder_manager.add_serializer decorator to enable proper serialization and deserialization of the data.
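To make the role of such a storage dataclass more concrete, the sketch below round-trips a dataclass of simple types through bytes using only the standard library. The helpers `to_bytes` and `from_bytes` are hypothetical stand-ins written for illustration; encoding to JSON with a base64 bytes field is an assumption and is not a description of how `object_to_bytes` and `bytes_to_object` work internally.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class MatrixStorage:
    # Same three fields as NumpyMatrixStorage in the example above.
    data_type: str
    shape: list
    data: bytes


def to_bytes(obj: MatrixStorage) -> bytes:
    """Hypothetical stand-in for object_to_bytes: dataclass -> dict -> JSON bytes."""
    payload = asdict(obj)
    # JSON has no bytes type, so the raw buffer is base64-encoded.
    payload["data"] = base64.b64encode(obj.data).decode("ascii")
    return json.dumps(payload).encode("utf-8")


def from_bytes(raw: bytes) -> dict:
    """Hypothetical stand-in for bytes_to_object: JSON bytes -> dict of fields."""
    payload = json.loads(raw.decode("utf-8"))
    payload["data"] = base64.b64decode(payload["data"])
    return payload


original = MatrixStorage(data_type="float64", shape=[2, 2], data=b"\x00" * 32)
# The dict returned by from_bytes is unpacked into the dataclass fields.
restored = MatrixStorage(**from_bytes(to_bytes(original)))
```

Keeping the fields to simple types is what makes this kind of generic conversion possible: every field maps directly to something the encoder already understands.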
| Python |
@classmethod
def load(cls, storage: "DataStorageInterface", meta: "DataStorageMeta") -> "NumpyMatrixObject":
    serialized = storage.read_object_raw(meta.id)
    obj = NumpyMatrixStorage(**bytes_to_object(serialized))
    return cls(
        name=meta.name,
        description=meta.description,
        native=numpy.frombuffer(obj.data, dtype=obj.data_type).reshape(obj.shape),
        object_id=meta.id,
    )
The code above calls the read_object_raw function of the storage, which returns the object as bytes. To deserialize the bytes into a dict and assign the values to the proper fields, you can use the bytes_to_object function.
After that, instantiate the NumpyMatrixObject class with the loaded data, filling the native parameter properly.
| Information | IMPORTANT: Pass object_id to the constructor at this step. This object_id is the unique object identifier in the data catalog. |
The registration is required so that data with the mime type you defined is handled properly when it is loaded from the data catalog.
| Python |
@classmethod
def from_native(cls, native: numpy.matrix) -> "NumpyMatrixObject":
    return cls(
        name=f"Matrix {native.shape}",
        native=native,
    )
| Python |
def serialize(self) -> bytes:
    serialized = NumpyMatrixStorage(
        data_type=self._native.dtype.name,
        shape=self._native.shape,
        data=self._native.tobytes(),
    )
    return object_to_bytes(serialized)
Fill in the NumpyMatrixStorage dataclass and then call the object_to_bytes function to convert the dataclass to bytes. The SDK calls serialize when it saves an object.
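The numpy part of serialize and load can be tried in isolation, without the SDK. The sketch below uses a plain numpy array and only the numpy calls that appear in the methods above (`dtype.name`, `shape`, `tobytes`, `frombuffer`, `reshape`); the variable names are chosen for illustration.

```python
import numpy

# Capture what serialize stores: dtype name, shape and the raw bytes.
matrix = numpy.arange(6, dtype="float64").reshape(2, 3)
data_type = matrix.dtype.name      # name of the element type
shape = list(matrix.shape)         # dimensions as a plain list
data = matrix.tobytes()            # raw buffer in row-major (C) order

# Rebuild the array the way load does.
restored = numpy.frombuffer(data, dtype=data_type).reshape(shape)

# frombuffer over an immutable bytes object yields a read-only array,
# so take a copy if the values need to be modified afterwards.
writable = restored.copy()
writable[0, 0] = 42.0
```

Two details are worth keeping in mind when adapting this. The array returned by `numpy.frombuffer` is a plain `numpy.ndarray`, so wrap it with `numpy.asmatrix` if `numpy.matrix` semantics are required. Also, `tobytes` writes the values in row-major order by default, which matches how `reshape` reads them back.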