What is going on? The Federal Chancellor is talking about data spaces. Instead of “cash for clunkers” as aid for the automotive industry, there will now be a data space (Delhaes 2020, Benrath & Löhr 2021). So, what exactly is a data space? We’re all familiar with data storage. Old hands still remember punch cards and magnetic tapes, followed by hard disks and USB sticks. In companies, there are databases and data warehouses. And since the well-known article on the Internet of Things (IoT) by Porter & Heppelmann in the Harvard Business Review (2015), there has been a trend towards data lakes.
From data to information to insight
So today, it is easy for a data scientist to drown in such a data lake while searching for relevant data. Figure 1 highlights the importance of finding the right data: not just any data, but the data that contains the information (the ore, so to speak) required to generate insight with business impact. As a result, searching for data and preparing it to extract the information often account for more than 80% of the time budget of a data analytics project (read more in our empirical study “Data is broken: The data productivity crisis,” link). And the big avalanche of data is still to come (see Figure 2). So, the time seems ripe for new approaches to data storage, such as data spaces.
What is a data space?
The German federal government’s data strategy describes a data space as “a shared, trusted space for transactions with data. A data space is based on shared standards (or values, technologies, interfaces), for example, that permit or promote transactions with data.” (Federal government 2021; see also Figure 3 for further definitions). The OpenDEI project describes a data space as follows: “A data space is defined as a decentralised infrastructure for trustworthy data sharing and exchange in data ecosystems based on commonly agreed principles” (OpenDEI project: Design principles for Data Spaces, p. 23). Essentially, data spaces reverse the traditional logic of data storage. It is no longer so important to store all data centrally. Instead, it is crucial to ensure that an application, such as a correlation analysis or a deep learning algorithm, receives the right data in the right quantity: just-in-time data sharing, so to speak, instead of central data storage. However, the problem to date has been that the parties to a data transaction often do not trust each other enough to actually share data. There are various reasons for this, including worries about competitive advantage and data protection. In short, data sovereignty, the right to retain control over your own data, is often lost (see Figure 4): as soon as a file is sent, anything can happen to it. New technology, such as the International Data Spaces (IDS) standard, can help here. Even if two parties do not trust each other, because they are competitors, for example, they can still trust a data transaction that benefits both an end customer and the two parties themselves.
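To make the idea of “the right data in the right quantity” under the provider’s control more tangible, here is a minimal Python sketch. It is not the IDS standard or any connector API: the classes `UsagePolicy` and `DataRequest`, the functions `evaluate` and `share`, and the sample vehicle records are all hypothetical names chosen for illustration. The sketch only shows the basic logic of a provider checking a request against its own conditions before any data leaves its premises.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsagePolicy:
    """Conditions a data provider attaches to an offer (hypothetical model)."""
    allowed_consumers: frozenset
    allowed_purposes: frozenset
    allowed_fields: frozenset
    valid_until: date
    max_rows: int


@dataclass(frozen=True)
class DataRequest:
    """What a consumer asks for: which fields, how many rows, and why."""
    consumer: str
    purpose: str
    fields: tuple
    rows: int
    requested_on: date


def evaluate(policy, request):
    """Return (granted, reason); the provider decides before anything is sent."""
    if request.consumer not in policy.allowed_consumers:
        return False, "consumer not authorised"
    if request.purpose not in policy.allowed_purposes:
        return False, "purpose not permitted"
    if not set(request.fields) <= policy.allowed_fields:
        return False, "field not offered"
    if request.requested_on > policy.valid_until:
        return False, "policy expired"
    if request.rows > policy.max_rows:
        return False, "quantity exceeds limit"
    return True, "granted"


def share(records, policy, request):
    """Release only the requested fields and rows, and only if the policy allows."""
    granted, reason = evaluate(policy, request)
    if not granted:
        raise PermissionError(reason)
    return [{f: r[f] for f in request.fields} for r in records[: request.rows]]


# Illustrative data: the provider never offers the vehicle ID ("vin").
records = [
    {"vin": "A1", "speed_kmh": 50, "lat": 52.5, "lon": 13.4},
    {"vin": "B2", "speed_kmh": 30, "lat": 48.1, "lon": 11.6},
    {"vin": "C3", "speed_kmh": 80, "lat": 50.1, "lon": 8.7},
]
policy = UsagePolicy(
    allowed_consumers=frozenset({"traffic-service"}),
    allowed_purposes=frozenset({"congestion-analysis"}),
    allowed_fields=frozenset({"speed_kmh", "lat", "lon"}),
    valid_until=date(2021, 12, 31),
    max_rows=2,
)
request = DataRequest(
    "traffic-service", "congestion-analysis", ("speed_kmh",), 2, date(2021, 3, 1)
)
print(share(records, policy, request))
```

Note the limit of this sketch: it enforces the policy only at the moment of release. Keeping control over data after it has been transferred is the harder part of data sovereignty and is not covered here.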
From data lakes to decentralized and federated data storage such as data spaces
“Creating complete freedom to share data” was the headline article in the “IT Director” trade journal in August 2020 (link). But how can this succeed when uncertainty about data processing is still part of daily practice? In interviews with data science & analytics experts, Daniela Hoffmann from “IT Director” identified clear possible solutions to these challenges. “These days, our attention is clearly turned to the subject of data. In politics too, the right steps are being taken and a great deal of money is being spent on the issue of data, such as for GAIA-X,” said Christoph Schlueter Langdon, responsible for Mobility Data Spaces at the Telekom Data Intelligence Hub and Professor for Data Science & Analytics at the Peter Drucker School of Management of Claremont Graduate University. People have reflected on the quality and correctness of data because they are now affected by it themselves: consider the R number, doubling time, and cases per 100,000 inhabitants. The coronavirus crisis has not only increased interest in valid data but also prompted a rethink of central data storage and data lakes, with a new preference for decentralized structures. The best example of this is the coronavirus app, which, after some wrangling about the architecture, ultimately ended up with a fully decentralized design for data privacy reasons (more on that here: “Corona warning app: Answers to frequently asked questions,” link).
Advantage of data spaces: Data products just like in a supermarket
To help with decision-making for business intelligence (BI), it was necessary to consolidate as much of the correct data as possible, said Schlueter Langdon: “But today, the following analogy is increasingly frequently applied to data storage and analysis: We all slaughter our own cattle and grow our own vegetables instead of shopping at the supermarket.” So, there is demand for data supermarkets with data products on the shelves (see Figure 5 and “Data is a Product,” Crosby & Schlueter Langdon 2019, link) and for data factories that convert raw data into data products (more on this in our article “Data factories for data products,” link; Schlueter Langdon & Sikora 2020). Approaches like that of the Telekom Data Intelligence Hub provide the corresponding functionality on a cloud-based platform with open-source tools. Companies can also obtain additional context data there, such as weather or location data, to complement their own data.
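As a rough illustration of what a data factory does, the following Python sketch turns raw readings into a small data product: it cleans the input, aggregates it per day, enriches it with weather as context data, and attaches descriptive metadata so the result can sit “on the shelf”. The function `build_data_product`, the field names, and the sample values are assumptions made for this example; they do not describe the Telekom Data Intelligence Hub or any specific tool.

```python
from statistics import mean


def build_data_product(raw_readings, weather_by_day, title):
    """Clean raw readings, aggregate per day, and enrich with weather context."""
    values_by_day = {}
    for reading in raw_readings:
        value = reading.get("value")
        if value is None:  # cleaning step: drop incomplete readings
            continue
        values_by_day.setdefault(reading["day"], []).append(value)

    rows = [
        {
            "day": day,
            "mean_value": mean(values),
            "n_readings": len(values),
            # context data; None if no weather is known for that day
            "weather": weather_by_day.get(day),
        }
        for day, values in sorted(values_by_day.items())
    ]

    # The metadata is what makes the output a product rather than a raw file:
    # a consumer can see what is inside without opening it.
    return {
        "metadata": {
            "title": title,
            "schema": ["day", "mean_value", "n_readings", "weather"],
            "row_count": len(rows),
        },
        "rows": rows,
    }


# Illustrative raw data with one incomplete reading.
raw = [
    {"day": "2021-03-01", "value": 10.0},
    {"day": "2021-03-01", "value": 20.0},
    {"day": "2021-03-02", "value": None},
    {"day": "2021-03-02", "value": 7.0},
]
weather = {"2021-03-01": "rain"}

product = build_data_product(raw, weather, "Daily sensor averages")
print(product["metadata"])
```

A real data factory would add steps this sketch leaves out, such as quality checks, versioning, and licensing terms.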
Figure 5: Data spaces for a rich selection of data products, just like in a supermarket
Advantage of data spaces: The right data in the right quantity
AI disciplines such as deep learning for text, image, and voice recognition increasingly deliver important results but depend entirely on data quantity and quality, according to the experts (more on that in our article “Data: Quantity or quality?,” link). Where better analysis results can be achieved only with large quantities of data (Big Data), companies often have trouble accessing adequate data volumes. “There are two contributing factors to this: On the one hand, self-interest – we think that keeping the data for ourselves brings a competitive advantage. On the other hand, GDPR requirements must be met, which many companies still find difficult,” said Schlueter Langdon. “Concepts and standards such as Industrial Data Spaces (IDS) help to generate this data, simply because they reduce the barriers to sharing data that we previously did not want to pass on due to a lack of trust,” he added. And timing is everything: the emergence of this technology coincides with the Data Governance Act (DGA), a regulation on data sharing and governance proposed by the European Union (EU DGA 2020).
This article is based on a longer article “Creating complete freedom to share data” in IT Director, August 2020 (Link)
For additional insights, please check out:
Benrath, B., and J. Löhr. GAIA-X-Initiative: Die Staats-Cloud kommt. Frankfurter Allgemeine Zeitung (2021-02-13), p. 2
Federal Government of the Federal Republic of Germany. 2021. Data strategy of the federal government – An innovation strategy for social progress and sustainable growth. Cabinet version 2021-01-27, Federal Chancellery, Berlin, www.bundesregierung.de/publikationen
Crosby, L., and C. Schlueter Langdon. 2019. Data is a Product. American Marketing Association Marketing News (April), link
Delhaes, D. 2020. Merkel drängt Autokonzerne: BMW, Daimler und VW sollen Datenschatz teilen. Handelsblatt (2020-10-28), link
Drucker, P. 1992. Be Data – Know What to Know. The Wall Street Journal (December 3)
Drucker, P. 1967. The Manager and the Moron. McKinsey Quarterly (December), link
European Union Data Governance Act (DGA). 2020. Regulation on European data governance (Data Governance Act). Proposal (November 20), link
Fraunhofer. International Data Spaces. Retrieved from https://www.dataspaces.fraunhofer.de/de/InternationalDataSpaces.html (accessed 2021-01-26)
Handelsblatt. 2019. Grenzen des Speichers. Grafik des Tages (2019-05-14): 24-25
IDC report, Worldwide Global DataSphere Forecast, 2020–2024: The COVID-19 Data Bump and the Future of Data Growth (Doc #US44797920)
Otto, B., A. Rubina, A. Eitel et al. 2021. GAIA-X and IDS – Position Paper. International Data Spaces Association, Version 1.0 (January), Dortmund, Germany, link
Porter, M. E., and J. E. Heppelmann. 2015. How Smart, Connected Products Are Transforming Companies. Harvard Business Review (October), link
Schlueter Langdon, C., and R. Sikora. 2020. Creating a Data Factory for Data Products. In: Lang, K. R., J. J. Xu et al. (eds). Smart Business: Technology and Data Enabled Innovative Business Models and Practices. Springer Nature, Switzerland
International Data Spaces Association, OpenDEI project. 2021. Design principles for Data Spaces. Position paper, link