SCD2Loader
loadx.scd2.loader.SCD2Loader
Slowly Changing Dimension Type 2 loader for PySpark DataFrames.
Tracks historical changes using valid_from/valid_until date ranges, hash-based change detection, and active/delete flags.
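The bookkeeping described above can be illustrated with a minimal plain-Python sketch. It is not the loader's implementation, which operates on PySpark DataFrames. The output column names (`row_hash`, `valid_from`, `valid_until`, `is_active`, `is_deleted`), the helper names, and the handling of keys missing from a full snapshot are all assumptions made for illustration.

```python
import hashlib
from datetime import date

OPEN_END = date(9999, 12, 31)  # mirrors the documented default open end date


def row_hash(row: dict, keys: list[str]) -> str:
    """Hash the non-key attributes so a change shows up as a different digest."""
    payload = "|".join(f"{k}={row[k]!r}" for k in sorted(row) if k not in keys)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_snapshot(history: list[dict], snapshot: list[dict],
                   keys: list[str], snap_date: date) -> list[dict]:
    """Return a new history after applying one full snapshot (hypothetical helper)."""
    result = [dict(r) for r in history]  # copy so the input is not mutated
    active = {tuple(r[k] for k in keys): r for r in result if r["is_active"]}
    seen = set()
    for src in snapshot:
        key = tuple(src[k] for k in keys)
        seen.add(key)
        digest = row_hash(src, keys)
        current = active.get(key)
        if current is not None and current["row_hash"] == digest:
            continue  # unchanged: the open version stays open
        if current is not None:
            # changed: close the previous version at the snapshot date
            current["valid_until"] = snap_date
            current["is_active"] = False
        result.append({**src, "row_hash": digest, "valid_from": snap_date,
                       "valid_until": OPEN_END, "is_active": True,
                       "is_deleted": False})
    for key, current in active.items():
        if key not in seen:
            # assumption: a key absent from a full snapshot is closed and flagged deleted
            current["valid_until"] = snap_date
            current["is_active"] = False
            current["is_deleted"] = True
    return result


history = apply_snapshot([], [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}],
                         ["id"], date(2024, 1, 1))
history = apply_snapshot(history, [{"id": 1, "city": "Bergen"}],
                         ["id"], date(2024, 2, 1))
```

After the second snapshot the sketch holds three rows: the closed Oslo version, the Rome row closed and flagged as deleted, and a new open Bergen version.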
Source code in loadx/scd2/loader.py
slowly_changing_dimension(df_src: DataFrame, business_keys: list[str] | str, date_column: str = DEFAULT_DATE_COLUMN, df_tgt: DataFrame | None = None, ignore_columns: list[str] | None = None, non_copy_fields: list[str] | None = None, open_end_date: datetime | None = OPEN_END_DATE, scd_columns: SCD2ColumnNames | dict[str, str] | None = None, enable_latest_record_flag: bool = False, source_type: SourceType = SourceType.FULL) -> DataFrame
Apply a Slowly Changing Dimension Type 2 transformation to the source DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df_src` | `DataFrame` | Source DataFrame containing the data to process | required |
| `business_keys` | `list[str] \| str` | List of columns that constitute the business key | required |
| `date_column` | `str` | Column name containing snapshot dates | `DEFAULT_DATE_COLUMN` |
| `df_tgt` | `DataFrame \| None` | Optional target DataFrame for incremental loads | `None` |
| `ignore_columns` | `list[str] \| None` | Columns to ignore when calculating row hashes | `None` |
| `non_copy_fields` | `list[str] \| None` | Fields to exclude from source to target | `None` |
| `open_end_date` | `datetime \| None` | Date to use for active records (default: 9999-12-31) | `OPEN_END_DATE` |
| `scd_columns` | `SCD2ColumnNames \| dict[str, str] \| None` | Override default SCD2 output column names. Accepts an | `None` |
| `enable_latest_record_flag` | `bool` | When | `False` |
| `source_type` | `SourceType` | | `FULL` |
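To make the `business_keys` and `ignore_columns` parameters more concrete, here is a small plain-Python sketch of the two ideas: accepting either a single column name or a list, and leaving some columns out of the row hash. The helper names, the sample column `loaded_at`, and the choice of SHA-256 are illustrative assumptions; the documentation does not state how the loader computes its hashes.

```python
import hashlib


def normalize_keys(business_keys):
    """Accept one column name or a list of names and always return a list."""
    return [business_keys] if isinstance(business_keys, str) else list(business_keys)


def row_hash(row, ignore_columns=None):
    """Digest of all columns except the ignored ones (hypothetical helper)."""
    ignored = set(ignore_columns or [])
    parts = [f"{name}={row[name]!r}" for name in sorted(row) if name not in ignored]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Two versions of a record that differ only in a technical load timestamp.
a = {"id": 1, "city": "Oslo", "loaded_at": "2024-01-01T00:00:00"}
b = {"id": 1, "city": "Oslo", "loaded_at": "2024-02-01T00:00:00"}

changed_with_timestamp = row_hash(a) != row_hash(b)
changed_without_timestamp = row_hash(a, ["loaded_at"]) != row_hash(b, ["loaded_at"])
```

With the timestamp included, every load would look like a change. Ignoring it means only a real attribute change, such as a new `city`, would open a new version.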
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with SCD2 columns and transformations applied |
Raises:
| Type | Description |
|---|---|
| `EmptyDataExceptionError` | When source DataFrame is empty |
| `OldDataExceptionError` | When source data is older than target data |
| `ValueError` | When invalid parameters are provided |
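The three error conditions above can be pictured as pre-flight checks. The sketch below uses plain Python lists in place of DataFrames and defines local stand-in exception classes with the documented names. The `validate_inputs` helper, the `valid_from` column name, and the exact comparison used for "older than target" are assumptions, not the library's actual logic.

```python
from datetime import date


class EmptyDataExceptionError(Exception):
    """Local stand-in for the documented exception of the same name."""


class OldDataExceptionError(Exception):
    """Local stand-in for the documented exception of the same name."""


def validate_inputs(source_rows, target_rows, date_column, business_keys):
    """Hypothetical pre-flight checks mirroring the documented error cases."""
    if not business_keys:
        raise ValueError("business_keys must name at least one column")
    if not source_rows:
        raise EmptyDataExceptionError("source contains no rows")
    if target_rows:
        newest_source = max(r[date_column] for r in source_rows)
        newest_target = max(r["valid_from"] for r in target_rows)
        # assumption: "older" means the newest snapshot predates the newest target version
        if newest_source < newest_target:
            raise OldDataExceptionError(
                f"source snapshot {newest_source} is older than target {newest_target}"
            )


source = [{"id": 1, "snapshot_date": date(2024, 1, 1)}]
target = [{"id": 1, "valid_from": date(2024, 2, 1)}]

try:
    validate_inputs(source, target, "snapshot_date", ["id"])
    error_name = None
except OldDataExceptionError as exc:
    error_name = type(exc).__name__
```

Callers of the real method would presumably catch these exceptions around the `slowly_changing_dimension` call and decide whether to skip, retry, or fail the load.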
Source code in loadx/scd2/loader.py