Building trustworthy data pipelines because AI cannot learn from dirty data
Archive of posts with
Check-Engine - data quality validation for PySpark 3.0.0
Last week, I was testing whether we could use AWS Deequ for data quality validation. I ran into a few problems. First of all, it was using an outdated version...
06 Jul 2020