- Data Ingestion
- The process of importing raw data from one or more sources into a system where it can be stored and processed.
- Data Validation
- A check applied to incoming data to confirm it meets defined format, completeness, and accuracy requirements before further processing.
- Data Transformation
- Converting data from its source format or structure into the format required by a target system, analysis tool, or reporting layer.
- ETL (Extract, Transform, Load)
- A three-step data integration process: extracting data from a source, transforming it to fit the target schema, and loading it into a destination system.
- Data Lineage
- A traceable record of where data originated, how it has moved, and what transformations were applied to it at each stage.
- Data Quality
- A measure of data's fitness for use, assessed across dimensions such as accuracy, completeness, consistency, timeliness, and uniqueness.
- Data Steward
- A person responsible for managing a defined dataset β including quality, documentation, and access β within an organization's data governance structure.
- PII (Personally Identifiable Information)
- Any data that can be used, alone or combined with other data, to identify a specific individual β such as name, email address, or national ID number.
- Data Retention Policy
- A documented rule specifying how long a particular type of data must be kept and the procedure for securely deleting or archiving it afterward.
- Audit Trail
- A chronological record of who accessed, modified, or processed data, used to support accountability and compliance verification.
- Normalization
- Restructuring data to remove redundancy and ensure consistent formatting across records β for example, standardizing date formats or casing in text fields.
- Data Masking
- Replacing sensitive data values with anonymized or fictitious equivalents to protect PII during testing, reporting, or sharing with third parties.