Welcome! #
“Hatlas” is a personal project to build a data lakehouse for Fedora, leveraging Apache Iceberg and Polaris as foundation layers. It is a proof of concept and a development environment for what will hopefully become an official Fedora project.
Hatlas currently only re-publishes data that Fedora has already made publicly available. For details on what is included, see the Datasets page. To begin exploring the data, see the Getting Started guide; for a rough roadmap and ways to contribute, see Contributing.
Goals #
Community Health / Decision Support #
The primary goal of this project is to enable Fedora community leaders to understand the health of our community and to drive improvements. It should help answer questions such as: Are we successfully recruiting? Are we retaining those who join? Which efforts are most effective?
Gaining these insights will require us to blend data from multiple existing systems and to augment it with new data sources. Hatlas provides the technical foundation for these goals.
See also: the CHAOSS community, a Linux Foundation project dedicated to performing this sort of work across many open source communities.
Democratized access #
We also hope to democratize access to this data so that anyone with a question or idea can perform research and contribute to our understanding. We aim to provide all the tooling needed for this work, so that it can be done with minimal demands on the user's hardware and bandwidth.
Data usage guidelines #
By accessing this data, you agree to abide by the Fedora Data Usage Guidelines:
We endeavor to ensure that analytics are conducted transparently, ethically, and in a way that supports the overall health of the Fedora community without focusing on individuals.
Our mission with data is to empower community members to understand community health, identify opportunities for collaboration, and support informed decision-making through responsible use of community data.
The following principles guide our work when developing data systems, reports, dashboards, and other analytics outputs:
- Community, Not Individuals: Metrics focus on understanding community collaboration and sustainability, not evaluating or ranking individual contributors.
- Transparency and Openness: All data sources, transformations, and methods are openly documented, reproducible, and available for community review.
- Privacy and Respect: Personally identifiable information (PII) is used only when required for aggregation or deduplication, and is never displayed or analyzed directly.
- Ethical Use of Data: Data must never be used to measure the personal productivity of others, enforce policy, or make access or participation decisions.
- Community Governance: Data and metrics initiatives follow open governance processes and are reviewed publicly before adoption.
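As one illustration of the Privacy and Respect principle above, the sketch below shows how an identifier might be used for deduplication without ever being displayed. It is a minimal example under stated assumptions, not a description of how Hatlas processes data: the `events` records, the `SECRET_KEY` value, and both function names are hypothetical and exist only for this illustration.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key (an assumption for this sketch); a real key would
# be stored privately and never published alongside the data.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash of an identifier, ignoring case and padding."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def active_contributors_per_month(events):
    """Count distinct pseudonymous contributors per month."""
    # Deduplicate on (month, pseudonym) so repeated activity counts once.
    seen = {(month, pseudonymize(email)) for email, month in events}
    counts = Counter(month for month, _ in seen)
    return dict(sorted(counts.items()))


# Made-up sample events: (contributor email, month of activity).
events = [
    ("alice@example.org", "2024-01"),
    ("Alice@example.org", "2024-01"),  # same person, different casing
    ("bob@example.org", "2024-01"),
    ("alice@example.org", "2024-02"),
]

monthly_counts = active_contributors_per_month(events)
print(monthly_counts)
```

Only the aggregate counts leave the function. The raw addresses are used solely to recognize that two events belong to the same contributor, and the keyed hash keeps them from being recovered by simply hashing a list of known addresses.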
Unofficial #
It’s worth repeating that Hatlas is not an official Fedora project yet, and the name “Hatlas” is temporary.