Full join issue: When use full join, the below join condition should not occur:
1, OR statement
2, an obvious false or true condition, like 1 = 0 or 1=1
3, the datatype of the join column should not be timestamp, this case is very common in pcp join, when use the report date as the join key.
ERROR: 42846: could not convert type “unknown” to numeric because of modifier
When the datatype of a column is supposed to be numeric in a UNION ALL context(basically there are more than three sub query parts),
If there exists NULL value for some sub query, it is needed to cast the NULL value to the numeric datatype.
Error info: [Assert error]
This means the Redshift performance is bad, usually multiple large table join, more CTAS is needed.
Error info: query is automatically killed saying cancelled on user’s request
Error info: could not devise a query plan for the given query
The issue is prabably due to there is an join condition in the full join clause. The join column is report_date, which is a date datatype, but there is function applied to report_date, which convert the datatype to timestamp. Unluckily redshift seems not supported this datatype in the join condition.
To solve this issue, we need to cast the join column to date again, like below:
DATEADD(MONTH,-12,CUR.REPORT_DATE) :: DATE = RESULT_PCP.REPORT_DATE ::DATE
Tips: Avoid to use the string concatenation when doing a join, create a temporary table for that particular column.
Redshift related documents
http://docs.aws.amazon.com/redshift/latest/dg/c_Byte_dictionary_encoding.html
http://docs.aws.amazon.com/redshift/latest/dg/welcome.html
https://aws.amazon.com/redshift/pricing/
Sort key testing
Currently test on a table about 10millon rows, no significant difference, Need a larger data set to do the testing.
Previously, the table with sort key is not applied column compression, so this could be an potential reason why there is no significant improvement.