Frequently asked questions

The ONE DATA platform is all about taking your data project from prototype to production. Because this can be a multi-step process involving different stakeholders, questions are bound to arise. This FAQ answers the most common ones.

 

 

General

We have observed that, for various reasons, many ideas and projects do not make it beyond a PoC (proof of concept) into production. Until then, they usually incur high costs but yield few value-creating insights that can be used sustainably and profitably in the company.
"From prototype to production" means, from our perspective, that ideas from the prototype phase - i.e. from the think tank in which you tinker and build - are put into production, that is, into daily, regular use, and are made profitable in a meaningful and efficient way within the company. ONE DATA considers the production-readiness of projects right from the beginning, not just at the end of the process.

  • When you like to work efficiently in a team and have the opportunity to do so cross-functionally across different departments.
  • When transparency is important to you and the results of your work should be easy to understand with the help of interactive dashboards even for stakeholders of your project with less IT knowledge.
  • When you are a decision maker without in-depth IT knowledge and would like to be able to derive recommendations for actions independently from the analyses of your colleagues with the help of individualized interfaces.
  • When you’d like to go one step further and change parameters in workflows and analyses that are tailored to your needs.
  • When your project needs the integration and harmonization of heterogeneous data sources.
  • When you are aware that data science allows you to implement your successful ideas faster, more easily and more efficiently, bringing projects "from prototype to production".

Within the ONE DATA platform, every team member in your enterprise, as well as external users or clients, can work together on projects. Whether you are a decision-maker with limited IT knowledge, a data scientist, a data engineer, a department assistant or an external user, each user has individual access to predefined areas within one or more projects.

ONE DATA is a data-driven application builder that combines heterogeneous data sources, lets you build and visualize data products transparently and efficiently and, after a short prototyping phase, quickly establishes them in a productive environment according to our principle "from prototype to production". ONE DATA functions as an independent self-service platform, links different tools by connecting their interfaces and can be used as a Data Hub. With interactive reporting charts, even business users without in-depth IT knowledge can track and understand data science projects and easily derive recommendations for action from them. The user interface of the platform can be completely customized to your requirements. The platform fits seamlessly into the company's own IT infrastructure and is easily scalable at enterprise level. Data scientists and other technically savvy users can develop their own analysis approaches and uncover further optimization potential in an uncomplicated manner.

ONE DATA offers unified rights and roles management. Users with appropriate rights can easily create subprojects and define roles within a project. A group of users can be assigned to each role, which is equipped with a set of access and execution rights. Analysis authorization restricts access to data at row level, enabling project owners to scale to a growing user base with various responsibility levels. Only groups or users named in explicit authorization dimensions can access the specified data. A keyring, like its real-life counterpart, lets you keep a set of keys together for external ETL data sources. The credentials on a keyring store usernames and passwords to facilitate access to different data sources.
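To make the idea of row-level analysis authorization more concrete, here is a minimal sketch in plain Python. It is not the ONE DATA API: the function name `visible_rows`, the group names and the `region` authorization dimension are all hypothetical and chosen only for illustration.

```python
def visible_rows(rows, dimension, allowed_values_by_group, user_groups):
    """Return only the rows whose value in `dimension` is explicitly
    authorized for at least one of the user's groups."""
    allowed = set()
    for group in user_groups:
        # Groups without an explicit entry contribute nothing.
        allowed |= allowed_values_by_group.get(group, set())
    return [row for row in rows if row[dimension] in allowed]


rows = [
    {"region": "EMEA", "revenue": 120},
    {"region": "APAC", "revenue": 80},
    {"region": "AMER", "revenue": 150},
]

# Hypothetical authorization dimension: which regions each group may see.
authorization = {
    "sales_emea": {"EMEA"},
    "management": {"EMEA", "APAC", "AMER"},
}

emea_view = visible_rows(rows, "region", authorization, {"sales_emea"})
management_view = visible_rows(rows, "region", authorization, {"management"})
unlisted_view = visible_rows(rows, "region", authorization, {"interns"})
```

In this sketch, a group that is not listed in the authorization mapping sees no rows at all, which mirrors the statement that only explicitly authorized groups or users can access the data.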

Born out of the drive to stop wasting resources, ONE DATA as a data-driven application builder transforms data into added value for your business quickly and efficiently. True to the principle “from prototype to production”, ONE DATA paves the way for successful ideas to be rapidly implemented after only a short prototyping phase. The main components of ONE DATA are datasets, workflows, models and dashboards - the four main layers of a typical data science process. Our goal within ONE DATA is not to make data science easy - our goal is to simplify the process of getting data science production-ready.

Functional

We use a variety of tools to make teamwork meaningful. ONE DATA is divided into projects whose participants are assigned different roles and rights. In addition, we offer Analysis Authorization, which can be used to make different areas visible and/or editable for different groups of people at global company level. Keyrings allow previously defined persons to view, edit or use analyses of critical data without sensitive access data being passed on.

To let users consume results as visualizations, the ONE DATA platform provides comprehensive reporting functionality. You can visualize the analysis results from your workflows in interactive apps. In addition to the visualizations, users can embed containers to change parameters of models or algorithms. This gives subject matter experts the opportunity to change variables or input parameters without modifying the entire underlying workflow and analysis. ONE DATA comes with about 25 visualization types, such as bar charts, gauge charts, boxplots, heat maps and KPI visualizations. They can be optimized for different devices using a descriptive language that gives data scientists all the options they need.

 

The ONE DATA platform lets you integrate and execute models with zero code in your analysis workflows, upload externally developed models and manage your trained models. The integration of Python and R code is supported. Machine learning algorithms are available as non-coding elements (based on Spark ML). ONE DATA provides a modular setup in which you can select coding or non-coding tools for every step (Spark, R, Python and SQL are supported), and you can run Python (incl. scikit-learn) and R models using Docker containers for execution as well as for model serving. TensorFlow is currently supported within Python. The ONE DATA platform can train, maintain and serve models from different sources (Spark, MLeap for R and Python).

ONE DATA supports various ways to access data: file uploads, model uploads, relational databases, Web APIs, streaming data, NoSQL databases and specific connectors. It can also be extended to support additional data sources.

 

In ONE DATA, every analysis process is defined as a separate workflow. As a result, users can draw on a comprehensive library of predefined processes and methods which they can use to transform data, apply statistical methods or conveniently create entire analysis sequences. All existing processes can be individually adjusted, and your own algorithms (e.g. in R or Python) can additionally be integrated.
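The notion of a workflow as a sequence of reusable, adjustable processing steps can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how ONE DATA implements workflows: `drop_missing`, `scale` and `run_workflow` are hypothetical names invented for this example.

```python
from statistics import mean


def drop_missing(records):
    """Processor: remove records that have no value."""
    return [r for r in records if r["value"] is not None]


def scale(factor):
    """Processor factory: `factor` is a parameter that can be changed
    without touching the rest of the workflow."""
    def processor(records):
        return [{**r, "value": r["value"] * factor} for r in records]
    return processor


def run_workflow(processors, records):
    """Apply each processor to the output of the previous one."""
    for processor in processors:
        records = processor(records)
    return records


data = [
    {"id": 1, "value": 2.0},
    {"id": 2, "value": None},
    {"id": 3, "value": 4.0},
]

workflow = [drop_missing, scale(10)]
result = run_workflow(workflow, data)
average = mean(r["value"] for r in result)
```

Swapping `scale(10)` for `scale(100)` changes the outcome without editing any other step, which is the kind of individual adjustment the paragraph above describes.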

Thanks to the precise roles and rights management system, only users who genuinely require access can open analysis workflows. The tool takes care of data management and of archiving previous analyses by itself. This ensures audit compliance and means that every change can be analyzed and remains transparent and traceable at all times.

  • Our software operates independently from platforms and databases (on-premise and in the cloud)
  • Open source-based architecture for cluster computing based on Apache Spark
  • Distributed file storage and data transfer architecture based on Apache Hadoop
  • Modularly expandable
  • User interface completely customisable as required (corporate design)
  • Flexible graphics engine
  • Extendable library (Java, R, Scala)

Technical

The ONE DATA platform is implemented on a client / server architecture. Our central Apache Spark component manages the parallelization and execution logic, including the available physical and virtual infrastructure components. The client is based on an HTML5 / JavaScript frontend. The server component is subdivided into modules and is implemented in Java using Spring.io as the main application framework. The ONE DATA platform uses Spark, Python and R computation depending on the required context, which helps to achieve scalable and efficient workflow executions. HDFS and Apache Parquet are used to save intermediate results and datasets. User management data and meta information are stored in a DBMS (database management system). The ONE DATA platform can scale up and scale out by design. Provided the minimum hardware requirements are met, a nearly unlimited amount of data can be processed using ONE DATA.

For a basic installation and running environment in a single-node-setup, the following minimum hardware requirements should be met:

  • 8 physical/dedicated CPU-Cores
  • 4 GB RAM per CPU-Core (32 GB RAM in total)
  • 100 GB system volume for operating system and temporary data (SSD)
  • 2 TB Data volume (HDD/Network)
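The single-node minimums above can be turned into a small pre-installation check. The following is a sketch only: the function `unmet_requirements` and its parameters are hypothetical, and the host values have to be supplied by hand rather than being detected automatically.

```python
# Minimums taken from the single-node list above.
MIN_CORES = 8
RAM_GB_PER_CORE = 4
MIN_RAM_GB = MIN_CORES * RAM_GB_PER_CORE  # 8 cores x 4 GB = 32 GB
MIN_SYSTEM_VOLUME_GB = 100
MIN_DATA_VOLUME_TB = 2


def unmet_requirements(cores, ram_gb, system_volume_gb, data_volume_tb):
    """Return a list of problems; an empty list means the host qualifies."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"need at least {MIN_CORES} CPU cores, found {cores}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"need at least {MIN_RAM_GB} GB RAM, found {ram_gb}")
    if system_volume_gb < MIN_SYSTEM_VOLUME_GB:
        problems.append(
            f"need at least {MIN_SYSTEM_VOLUME_GB} GB system volume, "
            f"found {system_volume_gb}"
        )
    if data_volume_tb < MIN_DATA_VOLUME_TB:
        problems.append(
            f"need at least {MIN_DATA_VOLUME_TB} TB data volume, "
            f"found {data_volume_tb}"
        )
    return problems


ok_host = unmet_requirements(cores=8, ram_gb=32, system_volume_gb=100, data_volume_tb=2)
small_host = unmet_requirements(cores=4, ram_gb=16, system_volume_gb=100, data_volume_tb=2)
```

The check treats the RAM figure as a fixed 32 GB minimum (8 cores times 4 GB). Whether hosts with more cores should also provide proportionally more RAM is not stated in the list, so the sketch does not assume it.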

If the amount of data is likely to exceed 2 TB, a cluster setup with a minimum of three nodes is the best way to run a ONE DATA platform installation, so that data can be stored and operations executed in a distributed manner:

  • 32+ CPU-Cores
  • 8+ GB RAM per CPU-Core
  • 250 GB System-Volume for operating system and temporary data
  • 4 TB Data volume for HDFS
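The choice between the two setups can be expressed as a simple rule of thumb based on the 2 TB threshold and the two requirement lists. The sketch below is illustrative: `recommend_setup` is a hypothetical helper, and because the text does not say whether the cluster figures apply per node or to the cluster as a whole, the code simply reports them as listed.

```python
SINGLE_NODE_LIMIT_TB = 2      # above this, a cluster setup is recommended
MIN_CLUSTER_NODES = 3
CLUSTER_MIN_CORES = 32
CLUSTER_RAM_GB_PER_CORE = 8
SINGLE_MIN_CORES = 8
SINGLE_RAM_GB_PER_CORE = 4


def recommend_setup(expected_data_tb):
    """Map the expected data volume to the minimum setup from the lists above."""
    if expected_data_tb > SINGLE_NODE_LIMIT_TB:
        return {
            "setup": "cluster",
            "min_nodes": MIN_CLUSTER_NODES,
            "min_cores": CLUSTER_MIN_CORES,
            # 32 cores x 8 GB per core = 256 GB
            "min_ram_gb": CLUSTER_MIN_CORES * CLUSTER_RAM_GB_PER_CORE,
        }
    return {
        "setup": "single-node",
        "min_nodes": 1,
        "min_cores": SINGLE_MIN_CORES,
        # 8 cores x 4 GB per core = 32 GB
        "min_ram_gb": SINGLE_MIN_CORES * SINGLE_RAM_GB_PER_CORE,
    }


small = recommend_setup(1.5)
large = recommend_setup(5)
```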

OS and environment for installation:

  • Preferably a Linux operating system (Red Hat or Debian); other operating systems are also supported
  • PostgreSQL (version >= 9.6 and < 10) for saving metadata
  • Java 1.8
  • Tomcat 8.5
  • PostgreSQL JDBC driver
  • JavaMail
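The PostgreSQL constraint in the list above (version >= 9.6 and < 10) is easy to get wrong when versions are compared as strings. A minimal sketch of a correct check in plain Python follows. The helper names are hypothetical, and the version string is assumed to consist of dot-separated integers such as "9.6.24".

```python
def parse_version(text):
    """Turn a version string such as '9.6.24' into the tuple (9, 6, 24)."""
    return tuple(int(part) for part in text.split("."))


def postgres_supported(version_text):
    """True if the version satisfies >= 9.6 and < 10."""
    version = parse_version(version_text)
    # Tuples compare element by element, so (9, 6) <= (9, 6, 24) < (10,).
    return (9, 6) <= version < (10,)


checks = {v: postgres_supported(v) for v in ["9.5.3", "9.6", "9.6.24", "10.0", "11.2"]}
```

Comparing integer tuples avoids the pitfall of string comparison, where "10.0" would sort before "9.6".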

The ONE DATA platform offers full transparency for entitled user groups and reproducibility of all analysis workflows and results created within the platform. ONE DATA stores the complete history of analysis workflows on the platform in an efficient and encrypted way. Therefore, you can maintain the results and the quality of the implemented functionalities at any time. Within the ONE DATA platform, all resource types can be individually named, tagged with user-defined keywords and searched for. Resources can also be added to specific projects and can then be additionally documented on project level. Processors within an analysis workflow can be renamed, color coded, grouped and more. Additionally, you are able to share your resources within a project.

Unified rights and roles management is natively integrated in the ONE DATA platform. Our user management offers user-based authentication, resource restrictions, analysis authorization, comprehensive group and role assignments, access to key administration via an included keyring system, and an open registration and / or invite process. The ONE DATA platform offers state-of-the-art security and backup technologies to provide a reliable service: transparent data encryption, secure data provisioning via token-based authentication between peers, secure transmission using HTTPS, and Kerberos support.

Yes. For comprehensive analytics, ONE DATA can connect to and ingest data from a wide variety of data sources. ONE DATA can also connect to NoSQL databases, as it transforms the data accordingly before it is processed. For Apache Cassandra, for example, ONE DATA provides a native connector that can access the data directly.

Any further questions regarding the ONE DATA platform?