Technical Architecture

OpenHEXA is a data integration platform composed of a series of components:

The OpenHEXA application, usually called openhexa-app for historical reasons, a Python/Django application providing a GraphQL API, a data pipelines' orchestration engine, user management capabilities and a NextJS frontend
The OpenHEXA notebooks environment (see openhexa-notebooks), a heavily customized JupyterHub/JupyterLab setup running the same image as the pipelines environment

In terms of data storage, we have to make a distinction between:

Application data storage, which resides in a PostgreSQL database
Workspace storage or user storage (see User manual for more information about workspaces), which is stored either in PosgtreSQL databases or in Object Storage buckets (Google Cloud Storage, AWS S3 or Minio)

When running code using Jupyter notebooks or OpenHEXA data pipelines, technical users can leverage the OpenHEXA Python SDK to interact with the OpenHEXA backend (see openhexa-sdk-python).

Notebooks and data pipelines typically run in containers using one of our Docker images (see openhexa-docker-images) or a custom one set by workspace.

The whole OpenHEXA stack is meant to be deployed in a Kubernetes cluster, so that notebooks and pipelines run in isolated environments and leverage the auto-scaling capabilities offered by Kubernetes.