HATLAS - a fedora data project

Docs: Contributing

Join us! #

There is an incredible amount of work ahead, in all areas of engineering!

The best way to get involved is to join the Fedora Matrix channel for our Data Working Group: #data:fedoraproject.org. If you’ve never used Matrix before, this link will walk you through creating a Fedora Accounts login and joining us there.

You’d also be welcome to join us on Fedora Discussions (Discourse). Just be sure to tag your post with #commops, since the “Community Operations” group is the primary driver of community health analytics.

Note: Hatlas is not an official Fedora project (yet!), but many interested parties are coordinating efforts in Matrix.

5-second overview #

Last updated: 2025-11-09

Target Architecture: #

[Diagram: target architecture]

Current Architecture: #

[Diagram: current architecture]


Contribution areas #

Infrastructure #

It all starts with Infrastructure! Here is a set of high-level TODOs in approximate delivery order.

TODO: Observability #

Our current deployment has very little observability. This is my top priority, but my first pass will likely be minimalistic and focused on just keeping the thing alive, which will leave plenty of opportunity for improvement.

As with all other things Hatlas, it would be neat to make the observability data public too.
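To give a feel for the "just keep the thing alive" level of observability, here is a minimal liveness-check sketch in Python. It assumes the deployment exposes some HTTP endpoint that answers with a 2xx status when healthy. The function names (`http_status`, `check_alive`) and any URL passed to them are hypothetical and not part of Hatlas or Polaris.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; we still want the status code.
        return err.code


def check_alive(url: str, fetch: Callable[[str], int] = http_status) -> bool:
    """Report whether the service at `url` answered with a 2xx status.

    Connection-level failures (DNS errors, refused connections, timeouts)
    are all subclasses of OSError and count as "down".
    """
    try:
        return 200 <= fetch(url) < 300
    except OSError:
        return False
```

Passing the fetcher as a parameter keeps the check testable without network access. A cron job or systemd timer could call `check_alive` against a placeholder URL such as `https://hatlas.example/healthz` and send an alert when it returns `False`.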

TODO: Deploy Polaris to Fedora Infra #

This is a top priority, but it’s also a huge amount of work! I plan to start this ASAP but it will take some time before it’s ready. We may also need to defer this until after Hatlas has helped us solidify our data formats / understand our infra requirements (e.g. expected load).

Polaris is currently running in a container on my personal VPS and backed by my Cloudflare R2 storage account.

See also: Libera, Stripe, Patreon – I’m unemployed at the moment and will shamelessly take all the help I can get. My wife thanks you for supporting open data!

Challenges #

Moving the container should be fairly easy, but:

TODO: More QuickStarts #

I’d like to have QuickStarts for at least:

TODO: OAuth #

Fedora Accounts Service (“FAS”) is Fedora’s OAuth / OIDC provider.
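As a starting point for anyone picking this up: OIDC providers publish a discovery document at a well-known path under their issuer URL (defined by OpenID Connect Discovery 1.0). The sketch below, in Python, builds that URL and picks out the endpoints a client usually needs. The helper names are hypothetical, and the issuer in the usage note is a placeholder, not the real FAS issuer.

```python
from typing import Dict, Mapping

# Standard metadata fields from the OIDC discovery document.
WANTED_FIELDS = ("authorization_endpoint", "token_endpoint", "jwks_uri")


def discovery_url(issuer: str) -> str:
    """Build the discovery document URL for an OIDC issuer.

    OpenID Connect Discovery 1.0 places the document at
    `<issuer>/.well-known/openid-configuration`.
    """
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def pick_endpoints(config: Mapping[str, str]) -> Dict[str, str]:
    """Extract the endpoints a client needs from a parsed discovery document.

    Fields missing from `config` are simply left out of the result.
    """
    return {name: config[name] for name in WANTED_FIELDS if name in config}
```

For example, `discovery_url("https://idp.example/openidc/")` yields `https://idp.example/openidc/.well-known/openid-configuration`. Fetching that URL and passing the parsed JSON to `pick_endpoints` gives the values most OAuth client libraries ask for.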

TODO: Data Orchestration Engine #

All data engineering is currently triggered on an ad-hoc basis, using scripts that are still quite minimal. Ideally, we would deploy something like Apache Airflow or Dagster to automate these runs.
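For readers new to orchestration, the core idea is "run each step only after the steps it depends on". The sketch below shows that idea using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow or Dagster code, and the task names are invented for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
) -> List[str]:
    """Run every task after its dependencies and return the execution order.

    `depends_on` maps a task name to the names that must finish first.
    Every dependency must itself be a key in `tasks`. A dependency cycle
    raises graphlib.CycleError before any task runs.
    """
    graph = {name: depends_on.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order


# Hypothetical three-step pipeline; each step just records that it ran.
log: List[str] = []
demo_tasks = {
    "publish": lambda: log.append("publish"),
    "convert": lambda: log.append("convert"),
    "download": lambda: log.append("download"),
}
demo_order = run_pipeline(
    demo_tasks,
    {"convert": {"download"}, "publish": {"convert"}},
)
```

A real orchestrator adds scheduling, retries, and logging on top of this ordering, which is why a dedicated tool is preferable to growing the scripts by hand.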

TODO: Public Trino #

I’m uncertain whether this is feasible, but it would be nice to build a proof of concept (POC) of a fully hosted public query interface to further lower the barrier to entry. This would presumably require OAuth even for the POC launch, to avoid abuse.


Programming #

TODO: Integrate with Fedora’s “Personal Data Removal” process #

Fedora has a “personal data removal” process (“PDR”) for compliance with GDPR. However:


Data Engineering & Architecture #

TODO: Datanommer Quality: Silver #

Datanommer is our most important dataset. The dataset page outlines the general status and steps ahead. There may be some overlaps with infra, since it’s uncertain whether our current software stack sufficiently supports the features we need.

TODO: Countme: Bronze #

Countme is “low-hanging fruit” for a bronze conversion. (Read: it should be pretty easy! And I’m saving it for you!)

This dataset gives us statistics on how prevalent each Fedora and CentOS release is in the wild.

We already have this data available in other formats, but bringing it into Hatlas would allow us to unify where we are performing our analytics.
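To illustrate the kind of question this dataset answers, here is a minimal Python sketch that turns per-release hit counts into each release's share of the total. The column names (`os_name`, `os_version`, `hits`) are an assumption about the input layout, and the sample rows are made-up numbers, not real Countme data.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

Release = Tuple[str, str]


def release_share(rows: Iterable[Mapping[str, str]]) -> Dict[Release, float]:
    """Return each (os_name, os_version) pair's fraction of total hits."""
    hits: Counter = Counter()
    for row in rows:
        hits[(row["os_name"], row["os_version"])] += int(row["hits"])
    total = sum(hits.values())
    if total == 0:
        return {}
    return {release: count / total for release, count in hits.items()}


# Invented sample data in the assumed column layout.
SAMPLE_CSV = """os_name,os_version,hits
Fedora Linux,42,600
Fedora Linux,41,300
CentOS Stream,9,100
"""

shares = release_share(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

Once the data lives in Hatlas, the same calculation would more naturally be a SQL `GROUP BY` next to the other datasets, which is the point of unifying where the analytics happen.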


Data Analysis #

Yes!