Cause:
Problem:
WHERE (condition)
HAVING (condition)
Filters NULL but leaves blank
WHERE field IS NOT NULL
Filters NULL and filters blank
WHERE field IS NOT NULL AND field < > " "
Cause:
Problem:
A difference means there are more duplicates
COUNT (DISTINCT field)
COUNT (field)
smaller 1 indicates duplicates
COUNT (field)
GROUP BY (field)
Use String Functions to clean data:
PARSE_DATE()
SUBSTR()
REPLACE()
Store one fact in one location and use IDs for lookup
Cause:
Problem:
Create test cases or calculated fileds to check values
SQL:(quantity_ordered * item_price) AS sub_total
Lookup values against on objective reference dataset
SQL:IN() with a subquery or JOIN
Cause:
Problem:
Thoroughly explore the existing dataset shape and skew and look for missing data
NULLIF(), IFNULL(), COALESCE()
Enrich the existing dataset with others using UNIONs and JOINs
UNION, JOIN
Verify file integrity with checksum values (hash, MD5)
Cause:
Problem:
Document and comment your approach
Use FORMAT() to clearly indicate units
FORMAT()
CAST() data types to the same format and digits
CAST()
CAST()
Label all visualizations appropriately