Data engineers are data librarians, or how to upgrade your data lake to 2500 BCE technology.

How do you imagine a library before we started using computers everywhere?

You’re reading a blog for programmers, so I have to clarify something. A library is a public institution that lends you books for free ;)

What did a library look like without computers? How would you find a book in the building?

You could walk between the shelves and read the titles, right? Yes, you could do a full scan. That would work, at least in small buildings. In some libraries, though, they would kick you out at closing time before you found the book you were looking for.

So there must be a better way.


Librarians sort the books, right? They sort them by genre and author, so you could find the part of the building where the book is most likely to be. Then you could go there and check what they have on the shelves. It would take some time, but you would find the book. Eventually.

What if you didn’t know what you were looking for? Did they have a search engine? What could you do in a pre-computer library besides asking a friendly librarian?

You could use the card catalog!

A card catalog keeps a record of everything in the library. It’s organized by title, author, and subject. You can go to the author catalog to find all of Stephen King’s books, or to the title catalog to look for every book titled “Cujo.” You would find it in both places. Of course, you would also find it in the subject catalog.
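In programming terms, a card catalog is a set of secondary indexes: one lookup table per drawer (title, author, subject), each pointing to a shelf location. Walking the shelves is a full scan. Below is a minimal Python sketch, assuming an in-memory list of books. The `Book` and `CardCatalog` classes and the shelf labels are hypothetical illustrations, not a real library system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    subject: str
    shelf: str  # hypothetical shelf label, e.g. "H-12"


def full_scan(shelves: list[Book], title: str) -> list[Book]:
    """Walk past every shelf and read every title: the work grows with the library."""
    return [book for book in shelves if book.title == title]


class CardCatalog:
    """Three separate card drawers (title, author, subject), built once up front."""

    def __init__(self, books: list[Book]) -> None:
        self._drawers = {
            "title": defaultdict(list),
            "author": defaultdict(list),
            "subject": defaultdict(list),
        }
        for book in books:
            # The same book gets a card in every drawer.
            self._drawers["title"][book.title].append(book)
            self._drawers["author"][book.author].append(book)
            self._drawers["subject"][book.subject].append(book)

    def lookup(self, drawer: str, key: str) -> list[Book]:
        """Go straight to one drawer and read the matching cards; no shelf walking."""
        return list(self._drawers[drawer].get(key, []))


books = [
    Book("Cujo", "Stephen King", "Horror", "H-12"),
    Book("It", "Stephen King", "Horror", "H-13"),
    Book("Dune", "Frank Herbert", "Science fiction", "S-03"),
]
catalog = CardCatalog(books)

print([book.title for book in catalog.lookup("author", "Stephen King")])  # ['Cujo', 'It']
print([book.shelf for book in catalog.lookup("title", "Cujo")])  # ['H-12']
```

Building the drawers costs one pass over the collection. After that, every question is answered by opening the right drawer instead of walking the building.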

Do you know how long librarians have been using a card catalog? Sumerian librarians used clay tablets as a card catalog system in 2500 BCE! (https://en.wikipedia.org/wiki/Cataloging_(library_science)#History)

What’s the point? What does it have to do with data engineering?

Do you have a metadata catalog? If I ask you for some data, can you tell me the file location without retrieving many files and checking whether you found the correct one?
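A metadata catalog is the card catalog of a data lake. It stores descriptions and file locations, not the data itself. Production tools such as the Hive Metastore or the AWS Glue Data Catalog do this at scale. The sketch below is only a toy in-memory version to show the idea. `MetadataCatalog`, `DatasetEntry`, the dataset names, and the `s3://example-lake/...` paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One 'catalog card' describing a dataset stored in the lake (hypothetical schema)."""
    name: str
    description: str
    file_format: str
    # Maps a partition value (here: a date string) to the storage prefix holding its files.
    partitions: dict[str, str] = field(default_factory=dict)


class MetadataCatalog:
    """A toy in-memory catalog: it records where data lives but never opens the files."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def locate(self, name: str, partition: str) -> str:
        """Answer 'where is this data?' with a lookup instead of retrieving files."""
        return self._entries[name].partitions[partition]

    def search(self, keyword: str) -> list[str]:
        """For when you don't know the dataset name: match names and descriptions."""
        keyword = keyword.lower()
        return sorted(
            entry.name
            for entry in self._entries.values()
            if keyword in entry.name.lower() or keyword in entry.description.lower()
        )


catalog = MetadataCatalog()
catalog.register(
    DatasetEntry(
        name="orders",
        description="Customer orders exported daily from the shop database",
        file_format="parquet",
        partitions={"2022-03-01": "s3://example-lake/orders/date=2022-03-01/"},
    )
)
catalog.register(
    DatasetEntry(
        name="page_views",
        description="Clickstream events from the website",
        file_format="json",
        partitions={"2022-03-01": "s3://example-lake/page_views/date=2022-03-01/"},
    )
)

print(catalog.locate("orders", "2022-03-01"))  # s3://example-lake/orders/date=2022-03-01/
print(catalog.search("customer"))  # ['orders']
```

The `search` method plays the role of the subject drawer. It lets you find data even when you don't know its exact name.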

If you don’t have a data catalog, your data lake uses pre-2500 BCE technology!

Seriously! People have known how to catalog information for about 4500 years.

In 2022, it’s finally time to upgrade your data lake to 2500 BCE technology!



If you want to contact me, send me a message on LinkedIn or Twitter.


Bartosz Mikulski * MLOps Engineer / data engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group
