137 questions
1 vote · 1 answer · 45 views
How to detect and remove inconsistent timestamps in a time-series dataset?
I’m working with a time-series dataset where each record is supposed to be logged at 1-minute intervals.
However, due to data quality issues, the dataset contains:
duplicated timestamps
missing ...
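A minimal pandas sketch of one common approach (the column name and data here are invented for illustration): drop duplicated timestamps first, then reindex onto the expected 1-minute grid so missing timestamps surface as NaN rows.

```python
import pandas as pd

# Hypothetical 1-minute series with one duplicated timestamp and one gap
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:01",
        "2024-01-01 00:01", "2024-01-01 00:03",
    ]),
    "value": [1.0, 2.0, 2.5, 4.0],
})

# Drop duplicated timestamps, keeping the first observation
df = df.drop_duplicates(subset="ts", keep="first").set_index("ts")

# Reindex onto the expected 1-minute grid to expose missing timestamps
full_index = pd.date_range(df.index.min(), df.index.max(), freq="1min")
df = df.reindex(full_index)
missing = df[df["value"].isna()].index
```

From here the NaN rows can be interpolated or dropped, depending on how gaps should be treated.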
0 votes · 0 answers · 34 views
Pydeequ - Volume checks based on custom result key values
I'm currently trying to use Pydeequ to identify volume anomalies for specific time periods; the problem is that Pydeequ is picking up the latest entry from the metrics repository instead ...
0 votes · 0 answers · 47 views
Unexpected Feature ID in Yahoo! Webscope ydata-frontpage-todaymodule-clicks-v1_0 Dataset
I'm working with the Yahoo! Webscope dataset ydata-frontpage-todaymodule-clicks-v1_0 (specifically, the click logs for the first ten days in May 2009). The dataset description states that each user ...
0 votes · 1 answer · 367 views
How to get rows that failed CustomSql data quality check in AWS Glue
According to this documentation page, AWS Glue can now detect rows that failed a CustomSql data quality check.
I tried it, but I am not seeing the rows that failed, only a percentage of failed data.
Here is ...
-1 votes · 1 answer · 51 views
Verify that all codes under one system have one name
Rows 4 & 5 have the same value in Col C and also the same value in Col D (correct).
Rows 6 & 7 have the same value in Col C but different values in Col D (incorrect).
So every unique combination in Col A & B ...
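Assuming the check is "each code must map to exactly one name", a small pandas sketch (hypothetical `code`/`name` columns standing in for Col C/Col D):

```python
import pandas as pd

# Hypothetical code/name pairs; "B2" maps to two different names
df = pd.DataFrame({
    "code": ["A1", "A1", "B2", "B2"],
    "name": ["Alpha", "Alpha", "Beta", "Gamma"],
})

# A code is consistent only if it maps to exactly one distinct name
name_counts = df.groupby("code")["name"].nunique()
bad_codes = name_counts[name_counts > 1].index.tolist()
```

`bad_codes` then lists every code that violates the one-code-one-name rule.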
0 votes · 1 answer · 97 views
Execute a query stored in a field of a table
Currently I have a table (using Dataplex) that saves data quality results; in this table each row gives me a query to see the data that does not meet the quality rule.
Example:
In order to know which ...
1 vote · 1 answer · 110 views
Quality control and quality control table in SSIS using DQS
I am working on a DWH, doing an incremental load into staging from the application database, then doing quality checking and loading the data into the reporting schema with rows flagged 0/1 (for errors) using ...
0 votes · 1 answer · 643 views
Great Expectations expect_column_values_to_be_unique with nulls in columns
I'm developing a solution to make a data quality check in one column, and already used the rule expect_column_values_to_be_unique in many other columns like the following:
df....
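As a plain-pandas approximation of a null-tolerant uniqueness check (data invented for illustration): verify uniqueness among non-null values only, so repeated nulls do not count as duplicates.

```python
import pandas as pd

# Hypothetical column with repeated nulls and one real duplicate
s = pd.Series(["a", "b", "b", None, None])

# Ignore nulls, then flag any value that appears more than once
non_null = s.dropna()
duplicated_values = non_null[non_null.duplicated()].unique().tolist()
```

Only `"b"` is reported; the two nulls are excluded before the duplicate check.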
1 vote · 2 answers · 319 views
SAS create a table only if there is data available
I have some data in SAS that I am performing QA on. I know I can output data to different tables using IF statements etc. What I want to do is output data to a table called 'error_data' if it fails a ...
0 votes · 1 answer · 115 views
Data quality rule on REDCap returns two associated discrepancies
REDCap returns two associated discrepancies to the same rule.
One shows that the values involved have no complete data ([no_data]) and the second one returns the case with the discrepancy that matches ...
0 votes · 1 answer · 825 views
How to actually measure/compute data quality
I need to come up with data quality metrics for a project and how to measure them. I've been googling and reading and I understood that you can 'measure' the quality of data using the 6 dimensions (...
2 votes · 1 answer · 2k views
create a custom expectation in Great Expectations to validate multiple unique observations based on a given key in a DataFrame
Regarding Great Expectations, I want to create a custom expectation to validate whether there are multiple unique observations of id_client for a given id_product key in a DataFrame.
After setting up my ...
3 votes · 3 answers · 862 views
AWS DataQuality Rules should fail but passed for null value
I have a CSV file with 8 columns. Within the columns I purposely deleted some cells.
When I tried to run a Glue Data Quality job, for IsComplete the result passed (which it is not supposed to) for one ...
1 vote · 1 answer · 262 views
How to process multiple csv files for identifying null values in R?
I have various .csv files, each with multiple columns. I am using the given R code for a quality check that reports, for a particular column, how many rows have valid values and how many are null. ...
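The question is about R, but the per-file null tally can be sketched in Python/pandas; the in-memory file contents below stand in for real CSVs (in practice you would loop over `glob.glob("*.csv")`).

```python
import io
import pandas as pd

# Hypothetical stand-ins for CSV files on disk
files = {
    "a.csv": io.StringIO("id,col\n1,10\n2,\n3,30\n"),
    "b.csv": io.StringIO("id,col\n1,\n2,\n3,5\n"),
}

# For each file, count valid (non-null) and null rows in a target column
report = {}
for name, f in files.items():
    df = pd.read_csv(f)
    nulls = int(df["col"].isna().sum())
    report[name] = {"valid": len(df) - nulls, "null": nulls}
```

The resulting `report` maps each file name to its valid/null counts for the chosen column.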
1 vote · 0 answers · 636 views
How to write regular expressions in the Rules Grid in Abinitio
I have a regular expression that works perfectly fine in the Sheet view in Abinitio ExpressIT, but when I try to do the same in the Rules Grid / Grid view,
I am not sure which function I can ...
1 vote · 1 answer · 1k views
Using Great Expectations with Databricks Autoloader
I have implemented a data pipeline using Autoloader: bronze --> silver --> gold.
Now, while doing this, I want to perform some data quality checks, and for that I'm using the Great Expectations library.
...
0 votes · 2 answers · 417 views
Check the data quality in Google Sheets (asking for suggestions)
I'm trying to create a sheet to check the data quality of a survey in Google Sheets; the document has this format:
Basically, I was using the formula =COUNTIF(B2:F2,"Don't know") to ...
0 votes · 0 answers · 75 views
How to Null check multiple columns, with casting reporting elements
Looking for the most efficient way to check for nulls and have a desired output for a report. This is done in a Hadoop environment.
For example,
Database contains columns FirstName | LastName | State, with rows like:
John | {null} | ...
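One hedged sketch of a possible report shape, using pandas rather than a Hive/Hadoop query (the table mirrors the example in the question; the report layout is an assumption): one output row per column with its null count and percentage.

```python
import pandas as pd

# Hypothetical table resembling the example in the question
df = pd.DataFrame({
    "FirstName": ["John", "Jane", None],
    "LastName": [None, "Doe", "Roe"],
    "State": ["NY", None, None],
})

# One report row per column: column name, null count, percent null
report = (
    df.isna().sum()
      .rename("null_count")
      .reset_index()
      .rename(columns={"index": "column"})
)
report["pct_null"] = 100 * report["null_count"] / len(df)
```

The same shape in SQL would be a UNION of one `COUNT(*) - COUNT(col)` row per column.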
1 vote · 1 answer · 918 views
Great Expectations with a Delta table
I am trying to run a great expectation suite on a delta table in Databricks. But I would want to run this on part of the table with a query. Though the validation is running fine, it's running on full ...
2 votes · 1 answer · 418 views
Using Pydeequ on Jupyter Notebook and getting "An error occurred while calling o70.run"
I'm trying to use Pydeequ on Jupyter Notebook; when I try to use ConstraintSuggestionRunner, it shows this error:
Py4JJavaError: An error occurred while calling o70.run.
: java.lang.NoSuchMethodError: '...
1 vote · 0 answers · 562 views
How can I specify a different database and schema for the temporary tables Great Expectations creates?
Great Expectations creates temporary tables. I tried profiling data in my Snowflake lab. It worked because the role I was using could create tables in the schema that contained the tables I was ...
1 vote · 1 answer · 924 views
Is the Python Great Expectations library compatible with PySpark?
I am implementing data quality checks using the Great Expectations library. Is this library compatible with PySpark, and does it run on multiple cores?
0 votes · 1 answer · 883 views
How to UNPIVOT all columns in a table and aggregate into data quality/validation metrics in Snowflake SQL?
I have a table with 60+ columns in it that I would like to UNPIVOT so that each column becomes a row and then find the fill rate, min value and max value of each entry.
For example: ID, START_DATE, ...
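A pandas sketch of the intended result shape (the table and columns are made up; in Snowflake the analogue would be UNPIVOT followed by GROUP BY): melt every column into (column_name, value) pairs, then aggregate fill rate, min, and max per column.

```python
import pandas as pd

# Hypothetical wide table standing in for the 60+ column Snowflake table
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "SCORE": [10.0, None, 30.0],
    "AMOUNT": [None, None, 5.0],
})

# Melt every column into (column_name, value) rows, then aggregate
long = df.melt(var_name="column_name", value_name="value")
metrics = long.groupby("column_name")["value"].agg(
    fill_rate=lambda s: s.notna().mean(),
    min_value="min",
    max_value="max",
)
```

Each row of `metrics` is one original column with its fill rate, minimum, and maximum.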
0 votes · 1 answer · 97 views
How to change the way Talend formulates SQL queries in a JDBC connection?
In Talend Data Quality, I have configured a JDBC connection to an OpenEdge database and it's working fine.
I can pull the list of tables and select columns to analyse, but when executing analysis, I ...
0 votes · 0 answers · 65 views
Repairing data in a Pandas dataframe when duplicate data exists
I haven't had to do any heavy lifting with Pandas until now, but I've run into a situation where I could use some guidance.
I've got some code that generates the following dataframe:
ID_x HOST_NM ...
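Since the dataframe in the excerpt is truncated, here is a hedged sketch of one common repair strategy, with column names partly guessed from the excerpt (`HOST_NM`, `ID_x`) and an invented `OWNER` column: collapse duplicate keys, keeping the first non-null value per column.

```python
import pandas as pd

# Hypothetical frame with duplicate HOST_NM rows carrying partial data
df = pd.DataFrame({
    "HOST_NM": ["web01", "web01", "db01"],
    "ID_x": [101, None, 202],
    "OWNER": [None, "ops", "dba"],
})

# Collapse duplicates: per host, keep the first non-null value in each column
repaired = df.groupby("HOST_NM", as_index=False).first()
```

`GroupBy.first()` skips nulls, so the two partial `web01` rows merge into one complete row.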