Do you struggle with figuring out what went wrong when a data pipeline fails?
Is it difficult to even spot a failure?
Are you afraid to rerun a failed pipeline because something else may break?

Does your finance or marketing team trust the reports produced by the data engineering team?
Have you spent hundreds of thousands of dollars building data pipelines, and nobody wants to use the results?

Can your team prove that the code works correctly?

When do you see errors?
When they occur, or only in the final report?
How long does it take to debug the cause of incorrect data in the report?
Do you need a few days to figure it out?

How do you find errors?
Does your alerting work well, or do you need to check statuses and application logs manually?
Is your team spending the first hour of every day looking around to see what failed?

Are you afraid to change the code?
Do you have a service running right now that nobody knows how to restart if it fails?
Are you testing in production, or by running the whole workflow on your local machine?

Is your data pipeline extremely slow?
Does everything break apart if one ETL runs 15 minutes longer than usual?

I can help you fix that!

Some of the areas I consult in:

  • Helping people who don't specialize in testing to test their data pipelines.
  • Using tools to aid monitoring, observability, and testing of your data pipelines and machine learning models.
  • Automating data infrastructure on AWS.
  • Optimizing Apache Spark-based data pipelines.
  • Creating a culture of technical excellence by nurturing software craft skills.

