HATLAS - a fedora data project

Hatlas News: 2026-05-01

News

I’m taking a stab at more frequent updates, with some structure around them. I personally hate to “announce” things that are highly fluid or tentative because I try hard not to propagate noise, but I think I can do better than once every 6 months.

If you’re short on time, jump to the status brief below for the 30-second version. If you have any suggestions for how to improve this space, hit me up in Matrix or email.

And PSA: if you haven’t seen https://copy.fail/ yet, spit out your coffee now and run.

Community Updates

I’m thrilled to announce that we’ve had two new FDWG contributors join us, and they’re both already making huge contributions:

@evelynrp

Evelyn Park is a Master of Library and Information Science (MLIS) student by day and FDWG contributor by night. I absolutely love that she has joined FDWG because “Information Science” is exactly what we’re trying to do here, and it’s an often-invisible and under-appreciated prerequisite to the sort of number crunching we want to do. (It’s stunningly easy to accidentally crunch the wrong numbers, or the right numbers with the wrong assumptions!) We are very lucky to be able to put Evelyn’s academic training in taxonomy, information architecture, etc. to use here!

She’s also a perfect example of how “open source” work doesn’t just mean “code”. That said, she’s already contributed her first PR to the Datanommer data dictionary! This sort of “boring” and “abstract” foundational work will underpin all of our data pipelines and eventually our analysis of Datanommer data. We’re very grateful for her contributions. Welcome to open source, Evelyn!

Check out her blog at https://evelynpark.com/ .

@smoliicek

Vít Smolík is a student from Prague and a trusted member of the Fedora Infrastructure team. He’s been contributing his talents to Fedora’s core infrastructure for nearly a year now, and he’s recently offered to help run the Hatlas infra too. FDWG caught his eye with our HTTP logs POC, and he’s excited about the operational value this could provide.

Ever since I launched Hatlas there have been too many Infra yaks to shave for me to get my hands dirty with any analytics work. I’m very hopeful that with a little help from Vít we can change that soon, and start producing some actionable insights.

Check out his blog at https://smoliicek.cz/ .

Technical Updates

Multiplayer Ops

The big focus this past week has been on moving Hatlas towards shared ownership of the infrastructure. This was always the plan, but it became a priority a little over a week ago when @smoliicek offered to help.

Hatlas has always been my “quick and dirty personal dev environment”, but now it’s quickly becoming prod. (Or perhaps, “staging” in a long-term roadmap view.) Time to take a step up the maturity curve!

I’ve been slowly working towards “separation of concerns” in the infra code for some time and applying opportunistic refactors as I go, but it’s time to finish the job. This will position the “dev” portion for a (hopefully) clean lift into Fedora Infra at some point.

Status:

I intend for any trusted FDWG member to have at least read-only access to all of our infra tooling, so let me know if you’re interested.

Privacy Updates

I’ve been reviewing both the Hatlas and Fedora GDPR stance with the help of Claude Opus because my infosec / compliance / governance background has been setting off alarms in my head ever since I started working on this data. This has caused me to hold off on being as public as I’d like with many aspects of Hatlas, partially because I don’t want to stir up community distrust.

On a side note: I’d much prefer to work with a real human lawyer here, but the friendly ones at Red Hat seem to be occupied in their queue to join the Borg (IBM legal), and let’s just say I’m not in any rush to get a response from them. (Ask me about my scar tissue!)

Suffice it to say I’m working on both policy items and technical items.

Parquet Downloads

Now that Hatlas has SSO capabilities, I’ve decided to change the Datanommer canonical parquet downloads to require a login. The general principle here is that all data analysis activity must be directly tied to Fedora, and we can’t assert that this is true if we’re making the data available to the general public.

Working through the OIDC specifics was interesting, but I think the final result is almost as easy to use as the unauthenticated version, so I plan to poke at getting something similar implemented upstream.
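For anyone wondering what an authenticated download might look like from a script, here is a minimal sketch. It assumes the download endpoint accepts an OIDC access token as a standard bearer token, which may not match the actual Hatlas flow. The URL and the function names are hypothetical placeholders, and obtaining the token (the SSO login itself) is out of scope here.

```python
import urllib.request

# Hypothetical URL: the real download location is not given in this post.
PARQUET_URL = "https://hatlas.example/downloads/datanommer.parquet"


def build_parquet_request(url: str, access_token: str) -> urllib.request.Request:
    """Build a GET request that carries an access token as a bearer token.

    Assumption: the server accepts `Authorization: Bearer <token>`.
    """
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


def download_parquet(url: str, access_token: str, dest_path: str) -> None:
    """Stream the response body to disk in 1 MiB chunks."""
    request = build_parquet_request(url, access_token)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        while chunk := response.read(1 << 20):
            out.write(chunk)
```

Usage would be something like `download_parquet(PARQUET_URL, token, "datanommer.parquet")`, where `token` comes from whatever login flow the service ends up exposing.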

Next (?)

My standard disclaimer: my priorities shift constantly depending on opportunities and obstacles.

Flock

Flock is coming! And I think FDWG will have a slot for a presentation + workshop!

FDWG still has sooo many things on the TODO list to get done before I’ll feel ready to announce our work to the world, but the flywheel is starting to accelerate here with the help of our new contributors. I’m hopeful I’ll be able to prepare something interesting and worthy of other people’s time and attention in the next 6 weeks.

Automation

The data dictionary is coming along nicely. After the current round of Infra enablement work I’d like to finish the POC of generating our Datanommer SQLMesh pipelines from the dictionary via Argo Workflows.
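To give a rough idea of what "generating pipelines from the dictionary" could mean, here is a small sketch that renders a SQLMesh SQL model definition from a list of column entries. Everything about the input is an assumption: the `Column` shape, the model name, and the source table are hypothetical and do not reflect the actual data dictionary format. The Argo Workflows orchestration is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical dictionary entry: a column name and its SQL type."""
    name: str
    sql_type: str


def render_model(model_name: str, source_table: str, columns: list[Column]) -> str:
    """Render a SQLMesh SQL model (MODEL block plus SELECT) as text."""
    if not columns:
        raise ValueError("a model needs at least one column")
    # One explicit cast per dictionary column, so types follow the dictionary.
    select_list = ",\n".join(
        f"  CAST({col.name} AS {col.sql_type}) AS {col.name}" for col in columns
    )
    return (
        f"MODEL (\n  name {model_name},\n  kind FULL\n);\n\n"
        f"SELECT\n{select_list}\nFROM {source_table}\n"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(render_model(
        "staging.messages",
        "raw.messages",
        [Column("id", "TEXT"), Column("sent_at", "TIMESTAMP")],
    ))
```

The appeal of this approach is that the dictionary stays the single source of truth: a type or name change there flows into the generated model text rather than being edited by hand in two places.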

Tooling

We are almost in position to launch general shared tooling such as Superset.

Docs Updates

The super-secret Fedoran docs and this news feed have been kept up to date, but the public side of Hatlas is still way overdue for some updates. We also have a goal to create official FDWG docs. I’m not sure what order this will happen in.

Employment?

I’m still unemployed. I’m trying not to get neurotic about that, but it does add up after a while.

My family and I would be incredibly grateful if you’re able to contribute to any of the hosting costs for Hatlas (roughly $100/mo). ❤️

Status brief

Done

WIP

On-deck