SQL Filter Advices

SQL statements for filter to identify and isolate valid data

Cause:

Problem:

WHERE (condition)

HAVING (condition)

Filters NULL but leaves blank

WHERE field IS NOT NULL

Filters NULL and filters blank

WHERE field IS NOT NULL AND field < > " "


SQL statements for detection, enforce uniqueness for consistency

Cause:

Problem:

A difference means there are more duplicates

COUNT (DISTINCT field)

COUNT (field)

smaller 1 indicates duplicates

COUNT (field)

GROUP BY (field)

Use String Functions to clean data:

PARSE_DATE()

SUBSTR()

REPLACE()

Store one fact in one location and use IDs for lookup


SQL statements to test data against known good values for accuracy

Cause:

Problem:

Create test cases or calculated fileds to check values

SQL:(quantity_ordered * item_price) AS sub_total

Lookup values against on objective reference dataset

SQL:IN() with a subquery or JOIN


SQL statements to identify and fill in missing values for completeness

Cause:

Problem:

Thoroughly explore the existing dataset shape and skew and look for missing data

NULLIF(), IFNULL(), COALESCE()

Enrich the existing dataset with others using UNIONs and JOINs

UNION, JOIN

Verify file integrity with checksum values (hash, MD5)


SQL statements to make data types and formats explicit for uniformity

Cause:

Problem:

Document and comment your approach

Use FORMAT() to clearly indicate units

FORMAT()

CAST() data types to the same format and digits

CAST()

CAST()

Label all visualizations appropriately