- Data Pipeline
- An automated sequence of processes that moves, transforms, and loads data from source systems to a destination such as a data warehouse or data lake.
- ETL / ELT
- Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) — two approaches to ingesting raw data and preparing it for analysis: ETL transforms data before loading it into the destination, while ELT loads raw data first and transforms it inside the warehouse.
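The ETL flow can be sketched in a few lines. This is a minimal illustration, not a production pattern; the source rows, field names, and in-memory "warehouse" are all hypothetical stand-ins.

```python
# Minimal ETL sketch: extract raw rows, transform them, load the result.
# Source data, field names, and the destination list are hypothetical.

def extract():
    # Stand-in for reading from a source system (API, file, database).
    return [
        {"user_id": "1", "amount": "19.99"},
        {"user_id": "2", "amount": "5.00"},
    ]

def transform(rows):
    # Cast types and derive fields before loading (the "T" before the "L").
    return [
        {"user_id": int(r["user_id"]),
         "amount_cents": round(float(r["amount"]) * 100)}
        for r in rows
    ]

def load(rows, destination):
    # Stand-in for writing to a warehouse or lake.
    destination.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

In an ELT pipeline, the `transform` step would instead run inside the destination system (typically as SQL) after the raw rows are loaded.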
- Data Warehouse
- A centralized repository that stores structured, processed data optimized for querying and reporting, such as Snowflake, BigQuery, or Redshift.
- Data Lake
- A storage system that holds large volumes of raw, unstructured, or semi-structured data until it is needed for processing or analysis.
- Orchestration
- The scheduling and coordination of data pipeline tasks and dependencies, commonly managed using tools like Apache Airflow or Prefect.
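At its core, orchestration means running tasks in dependency order. A toy version using only the standard library is sketched below; real orchestrators such as Airflow or Prefect add scheduling, retries, and monitoring on top of this idea. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Toy orchestrator: declare task dependencies, then run in a valid order.
# Task names are hypothetical; values are the sets of upstream tasks.
tasks = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load_warehouse": {"transform"},
}

log = []
for task in TopologicalSorter(tasks).static_order():
    # Stand-in for actually executing the task.
    log.append(task)
```

`static_order()` guarantees every task runs only after all of its upstream dependencies, which is exactly the contract an orchestrator enforces.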
- Data Modeling
- The practice of designing how data is structured, related, and stored within a database or warehouse to support efficient querying and reporting.
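A common warehouse data model is the star schema: a central fact table joined to dimension tables. The sketch below builds a tiny one in SQLite; the table and column names are hypothetical.

```python
import sqlite3

# Toy star schema: one fact table referencing one dimension table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL)""")

con.execute("INSERT INTO dim_product VALUES (1, 'widget')")
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 1, 9.5), (2, 1, 4.5)])

# Typical analytical query: aggregate the facts, labelled by the dimension.
row = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name
""").fetchone()
```

Splitting measures (the fact table) from descriptive attributes (the dimensions) is what makes queries like this efficient and easy to write.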
- Streaming Data
- Continuous, real-time data flows processed as they arrive rather than in scheduled batch jobs — commonly handled using Kafka, Flink, or Spark Streaming.
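The defining trait is that each record is processed as it arrives rather than accumulated into a batch. The generator below is a deliberately simplified, in-memory stand-in for consuming from a broker topic; the event values are hypothetical.

```python
# Streaming sketch: process each event on arrival instead of in a batch.
# event_stream() is a hypothetical stand-in for a Kafka/Flink consumer.

def event_stream():
    for value in [3, 7, 2]:
        yield {"value": value}

running_total = 0
totals = []
for event in event_stream():
    running_total += event["value"]   # handled immediately, record by record
    totals.append(running_total)
```

A batch job would instead wait for all three events and compute one total on a schedule; the streaming version keeps an up-to-date result after every event.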
- IC (Individual Contributor)
- An employee who delivers technical work directly without managing other staff — many senior data engineers remain ICs unless the role explicitly includes people management.
- Data Governance
- The policies, standards, and processes that ensure data quality, security, lineage, and compliance across an organization's data assets.
- SLA (Service Level Agreement)
- A defined standard for pipeline reliability or data freshness that the data engineering team commits to delivering — for example, daily pipeline completion by 6:00 AM.
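A freshness SLA like the 6:00 AM example reduces to a timestamp comparison. The check below is a minimal sketch; the function name and completion timestamps are hypothetical.

```python
from datetime import datetime, time

# Toy SLA check: did the daily pipeline finish by the 6:00 AM deadline?
DEADLINE = time(6, 0)

def met_sla(completed_at: datetime) -> bool:
    # Compare only the time of day against the daily deadline.
    return completed_at.time() <= DEADLINE

on_time = met_sla(datetime(2024, 1, 15, 5, 42))   # finished 5:42 AM
late = met_sla(datetime(2024, 1, 15, 6, 30))      # finished 6:30 AM
```

In practice such checks run as monitoring alerts, and missed SLAs are tracked over time as a reliability metric for the team.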
- Compensation Band
- The minimum-to-maximum salary range established for a role level, used to ensure consistent pay equity across equivalent positions.