1 vote · 1 answer · 45 views

I'm working with a time-series dataset where each record is supposed to be logged at 1-minute intervals. However, due to data quality issues, the dataset contains: duplicated timestamps, missing ...
asked by Kinjal Radadiya
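The time-series check described above (finding duplicated timestamps and missing 1-minute slots) can be sketched with the standard library alone; `audit_minutely` is a hypothetical helper name, and real data would more likely live in a pandas DataFrame (`drop_duplicates` on the index plus `asfreq('1min')`).

```python
from datetime import datetime, timedelta

def audit_minutely(timestamps):
    """Report duplicated timestamps and missing 1-minute slots
    in a sequence of datetimes that should be strictly minutely."""
    seen = set()
    dupes = []
    for ts in timestamps:
        if ts in seen:
            dupes.append(ts)
        seen.add(ts)
    # Build the expected 1-minute grid from first to last observed minute
    start, end = min(seen), max(seen)
    expected = set()
    t = start
    while t <= end:
        expected.add(t)
        t += timedelta(minutes=1)
    missing = sorted(expected - seen)
    return dupes, missing

# 00:01 is duplicated, 00:02 is missing
ts = [datetime(2024, 1, 1, 0, m) for m in (0, 1, 1, 3)]
dupes, missing = audit_minutely(ts)
```

The same two outputs (duplicate list, gap list) are what one would feed into a dedupe-then-reindex repair step.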
0 votes · 0 answers · 34 views

I'm currently trying to implement PyDeequ to identify anomalies in volumes for specific time periods; the problem is that PyDeequ is picking up the latest entry from the metrics repository instead ...
asked by polo1211
0 votes · 0 answers · 47 views

I'm working with the Yahoo! Webscope dataset ydata-frontpage-todaymodule-clicks-v1_0 (specifically, the click logs for the first ten days in May 2009). The dataset description states that each user ...
asked by amarchin
0 votes · 1 answer · 367 views

According to this documentation page, AWS Glue can now detect rows that failed a CustomSql data quality check. I tried it, but I am not seeing the rows that failed, only a percentage of failed data. Here is ...
asked by Haha
-1 votes · 1 answer · 51 views

Rows 4 & 5 have the same value in Col C and also the same value in Col D (correct). Rows 6 & 7 have the same value in Col C but different values in Col D (incorrect). So all unique combinations in Cols A & B ...
asked by user3585510
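The duplicate-combination rule above (rows sharing a Col C value must also share a Col D value) amounts to checking a functional dependency C → D. A minimal pure-Python sketch, with the hypothetical helper name `inconsistent_keys`:

```python
def inconsistent_keys(rows, key="C", val="D"):
    """Return key values that map to more than one distinct value,
    i.e. violations of the expected key -> value dependency."""
    seen = {}
    bad = set()
    for r in rows:
        k, v = r[key], r[val]
        if k in seen and seen[k] != v:
            bad.add(k)
        seen.setdefault(k, v)
    return sorted(bad)

rows = [
    {"C": "x", "D": 1},  # like rows 4 & 5: same C, same D -> correct
    {"C": "x", "D": 1},
    {"C": "y", "D": 2},  # like rows 6 & 7: same C, different D -> incorrect
    {"C": "y", "D": 3},
]
```

In SQL the equivalent would be a `GROUP BY C HAVING COUNT(DISTINCT D) > 1`.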
0 votes · 1 answer · 97 views

Currently I have a table that saves data quality results (using Dataplex); this table gives me a query to see the data that does not meet the quality rule. Example: in order to know which ...
asked by Bastián SN
1 vote · 1 answer · 110 views

I am working on a DWH, doing an incremental load into staging from the application database, then doing quality checks and loading the data into the reporting schema with rows flagged 0/1 (for errors) using ...
asked by Pritesh singh
0 votes · 1 answer · 643 views

I'm developing a solution to run a data quality check on one column, and have already used the rule expect_column_values_to_be_unique on many other columns, like the following: df....
asked by Lucas Mengual
1 vote · 2 answers · 319 views

I have some data in SAS that I am performing QA on. I know I can output data to different tables using IF statements etc. What I want to do is output data to a table called 'error_data' if it fails a ...
asked by Sproodle
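The SAS question above wants failing rows routed to an 'error_data' table (in SAS this would typically be an `OUTPUT error_data;` inside the IF block of a DATA step). A hedged Python sketch of the same routing logic, with `route` and the example checks as hypothetical names:

```python
def route(records, checks):
    """Split records into clean and error sets; a record failing any
    check goes to error_data along with the names of the failed rules."""
    clean, error_data = [], []
    for rec in records:
        failed = [name for name, ok in checks.items() if not ok(rec)]
        if failed:
            error_data.append({**rec, "failed_checks": failed})
        else:
            clean.append(rec)
    return clean, error_data

checks = {
    "age_positive": lambda r: r["age"] > 0,
    "name_present": lambda r: bool(r["name"]),
}
records = [{"name": "Ann", "age": 34}, {"name": "", "age": -1}]
clean, errors = route(records, checks)
```

Recording *which* rule failed, not just a pass/fail flag, makes the error table far more useful for triage.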
0 votes · 1 answer · 115 views

REDCap returns two associated discrepancies for the same rule. One shows that the values involved have no complete data ([no_data]) and the second one returns the case with the discrepancy that matches ...
asked by user_pir
0 votes · 1 answer · 825 views

I need to come up with data quality metrics for a project and decide how to measure them. I've been googling and reading, and I understood that you can 'measure' the quality of data using the 6 dimensions (...
asked by Alex
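For the six-dimensions question above, each dimension ultimately reduces to a measurable ratio. As one illustration, completeness can be computed as the share of non-missing values; `completeness` is a hypothetical helper, and what counts as "missing" (here `None` or the empty string) is an assumption:

```python
def completeness(rows, column):
    """Completeness: fraction of records with a non-missing value
    in the given column (one of the six classic DQ dimensions)."""
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)

rows = [{"email": "a@x.com"}, {"email": ""}, {"email": "b@x.com"}, {}]
```

Uniqueness, validity, and the other dimensions can be expressed as similar ratios (distinct count over total, regex matches over total, and so on).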
2 votes · 1 answer · 2k views

Regarding Great Expectations: I want to create a custom expectation to validate whether there are multiple unique observations of id_client for a given id_product key in a DataFrame. After setting up my ...
asked by PeCaDe
3 votes · 3 answers · 862 views

I have a CSV file with 8 columns. Within the columns I purposely deleted some cells. When I tried to run a Glue Data Quality job, the IsComplete result passed (which it is not supposed to) for one ...
asked by khorjle
1 vote · 1 answer · 262 views

I have various .csv files. Each file has multiple columns. I am using the given code in R to run a quality check that, for a particular column, counts how many rows have valid values and how many are null. ...
asked by Michael_Brun
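The R question above (per-column counts of valid vs. null values across CSV files) translates directly to other languages. A stdlib Python sketch using the csv module; the set of null tokens and the helper name `column_fill_counts` are assumptions:

```python
import csv
import io

def column_fill_counts(csv_text, null_tokens=("", "NA", "NULL")):
    """For each column, count how many rows hold a valid value
    and how many hold a null token."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: {"valid": 0, "null": 0} for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            bucket = "null" if value in null_tokens else "valid"
            counts[name][bucket] += 1
    return counts

data = "a,b\n1,NA\n2,3\n,4\n"
counts = column_fill_counts(data)
```

For many files, the same function can be mapped over a directory listing and the per-file results collected into one report.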
1 vote · 0 answers · 636 views

I have a regular expression which works perfectly fine in the Sheet view in Abinitio ExpressIT, but I am trying to do the same in the Rules Grid / Grid view, and I am not sure which function I can ...
asked by JKC
1 vote · 1 answer · 1k views

I have implemented a data pipeline using Auto Loader, bronze --> silver --> gold. Now, while doing this, I want to perform some data quality checks, and for that I'm using the Great Expectations library. ...
asked by Chhaya Vishwakarma
0 votes · 2 answers · 417 views

I'm trying to create a sheet to check the data quality from a survey in Google Sheets. The document has this format: So basically I was using the formula =COUNTIF(B2:F2,"Don't know") to ...
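The COUNTIF formula above tallies "Don't know" answers across each response row. The same tally in Python, useful for cross-checking the Sheets output; `count_per_row` is a hypothetical name:

```python
def count_per_row(rows, target="Don't know"):
    """Equivalent of =COUNTIF(B2:F2, "Don't know") applied to each row."""
    return [sum(1 for cell in row if cell == target) for row in rows]

survey = [
    ["Yes", "Don't know", "No", "Don't know", "Yes"],
    ["No", "No", "Yes", "Yes", "No"],
]
```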
0 votes · 0 answers · 75 views

Looking for the most efficient way to check for nulls and produce a desired output for a report. This is done in a Hadoop environment. For example, the database contains: FirstName LastName State John {null} ...
asked by Supernova
1 vote · 1 answer · 918 views

I am trying to run a Great Expectations suite on a Delta table in Databricks, but I want to run this on part of the table via a query. Though the validation is running fine, it's running on the full ...
asked by S.Dasgupta
2 votes · 1 answer · 418 views

I'm trying to use PyDeequ in a Jupyter Notebook; when I try to use ConstraintSuggestionRunner it shows this error: Py4JJavaError: An error occurred while calling o70.run. : java.lang.NoSuchMethodError: '...
asked by LuisRicardo
1 vote · 0 answers · 562 views

Great Expectations creates temporary tables. I tried profiling data in my Snowflake lab. It worked because the role I was using could create tables in the schema that contained the tables I was ...
asked by Alex Woolford
1 vote · 1 answer · 924 views

I am implementing data quality checks using the Great Expectations library. Is this library compatible with PySpark, and does it run on multiple cores?
asked by code_bug
0 votes · 1 answer · 883 views

I have a table with 60+ columns that I would like to UNPIVOT so that each column becomes a row, and then find the fill rate, min value and max value of each entry. For example: ID START_DATE ...
asked by user18623003
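The unpivot-and-profile idea above (one summary row per original column, with fill rate, min and max) can be sketched in pure Python; in SQL this would be an UNPIVOT followed by grouped aggregates, and `column_profile` is a hypothetical name:

```python
def column_profile(rows):
    """Unpivot a wide table: return one summary entry per column with
    fill rate, min and max computed over the non-null values."""
    columns = rows[0].keys()
    profile = {}
    for col in columns:
        values = [r[col] for r in rows if r[col] is not None]
        profile[col] = {
            "fill_rate": len(values) / len(rows),
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return profile

rows = [
    {"ID": 1, "SCORE": 10},
    {"ID": 2, "SCORE": None},
    {"ID": 3, "SCORE": 30},
]
profile = column_profile(rows)
```

With 60+ columns the loop shape stays the same; only the column list grows.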
0 votes · 1 answer · 97 views

In Talend Data Quality, I have configured a JDBC connection to an OpenEdge database and it's working fine. I can pull the list of tables and select columns to analyse, but when executing the analysis, I ...
asked by Sergei K.
0 votes · 0 answers · 65 views

I've not had to do any heavy lifting with Pandas until now, and now I've got a bit of a situation and could use some guidance. I've got some code that generates the following dataframe: ID_x HOST_NM ...
asked by Magneto Optical