# data engineering

> building data systems to collect and make data usable

**Wikidata**: [Q104659521](https://www.wikidata.org/wiki/Q104659521)  
**Wikipedia**: [English](https://en.wikipedia.org/wiki/Data_engineering)  
**Source**: https://4ort.xyz/entity/data-engineering

## Summary
Data engineering is the practice of designing and building infrastructure to collect, organize, and process raw data into a usable format for analysis. It focuses on creating reliable data systems and pipelines, serving as a foundational component of data science and analytics. This discipline ensures data is accessible, consistent, and actionable for businesses and researchers.

## Key Facts
- Data engineering is a subclass of data science, specializing in infrastructure rather than analysis.
- Core processes include extract, transform, load (ETL) to prepare data for use.
- Practiced by roles such as data engineers, data architects, data warehouse engineers, and business intelligence engineers.
- Distinct from information engineering, which focuses on information systems rather than data pipelines.
- Companies active in the field include Aristek Systems (founded 2001 in Lithuania), which works in AI and data engineering.
- Both an academic discipline and an industry, with courses and bootcamps available (e.g., via kanger.dev).
- Practitioners require skills in programming, database management, and distributed systems.

## FAQs
### Q: How does data engineering differ from data science?
A: Data engineering focuses on building systems to make data usable, while data science emphasizes analyzing data to extract insights.

### Q: What key processes are involved in data engineering?
A: The primary processes are extract, transform, and load (ETL), which prepare raw data for analysis.

### Q: What roles are central to data engineering?
A: Key roles include data engineers, data architects, data warehouse engineers, and business intelligence engineers.

## Why It Matters
Data engineering is critical in enabling organizations to leverage data for decision-making. Without robust data pipelines, raw data remains siloed and unusable, hindering insights in fields like artificial intelligence, business analytics, and scientific research. It bridges the gap between data collection and application, ensuring scalability, reliability, and accessibility. As data volumes grow, effective engineering practices prevent bottlenecks, reduce costs, and support real-time analytics, driving innovation in industries ranging from healthcare to finance.

## Notable For
- **Infrastructure Focus**: Prioritizes building scalable data architectures over analysis.
- **ETL Foundation**: Relies on extract, transform, load workflows to structure data.
- **Interdisciplinary Role**: Serves as a bridge between software engineering and data science.
- **Industry Applications**: Underpins AI systems and business intelligence tools; companies such as Aristek Systems work in this area.

## Body
### Definition and Scope
Data engineering is an academic discipline and industry centered on constructing systems to collect, process, and store data. It ensures data is refined into formats suitable for analysis, distinguishing it from data science, which interprets the processed data.

### Relationship to Data Science
As a subclass of data science, data engineering provides the infrastructure necessary for data scientists to perform analyses. While data science focuses on extracting insights, data engineering ensures the data is accessible and reliable.
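
As a rough illustration of what "accessible and reliable" can mean in practice, the sketch below runs two basic checks on a batch of records before it is handed to analysts. The `CheckResult` type, the `check_batch` helper, and the sample records are all hypothetical names chosen for this example, not part of any specific tool.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool


def check_batch(rows, required_fields):
    """Run two basic reliability checks on a batch of dict records."""
    # Check 1: the batch actually contains data.
    non_empty = len(rows) > 0
    # Check 2: every required field is present and not blank in every row.
    complete = all(
        row.get(field) not in (None, "")
        for row in rows
        for field in required_fields
    )
    return [
        CheckResult("non_empty", non_empty),
        CheckResult("required_fields_present", complete),
    ]


# Hypothetical sample batch: the second record has a blank email.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
]
results = check_batch(batch, ["id", "email"])
failed = [r.name for r in results if not r.passed]
```

Here `failed` lists the checks that did not pass, which a pipeline could use to halt a load or raise an alert before unreliable data reaches downstream users.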

### Practices and Roles
- **Key Processes**: ETL (extract, transform, load) pipelines are central to data engineering.
- **Practitioners**: Roles include data engineers, data architects, data warehouse engineers, and business intelligence engineers, each requiring distinct technical skills (e.g., programming, database design).
- **Tools and Systems**: Involves working with databases, cloud platforms, and distributed systems to manage large-scale data workflows.
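
The ETL process named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, under stated assumptions: the inline CSV data, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this might come from a file, API, or queue.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,bob,5.00
3,Carol,not_a_number
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, cast types, and drop malformed rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip records whose amount cannot be parsed
        customer = row["customer"].strip().title()
        clean.append((int(row["order_id"]), customer, amount))
    return clean


def load(records, conn):
    """Load: write cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Chain the three stages: extract, then transform, then load."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
run_pipeline(RAW_CSV, conn)
rows = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
```

After the run, `rows` holds the two valid orders with cleaned customer names; the third record is dropped in the transform step because its amount is not numeric. Keeping the three stages as separate functions makes each one easier to test and replace independently.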

### Related Entities
- **Aristek Systems**: A Lithuanian software company (founded 2001) specializing in AI, data science, and data engineering.
- **Guillaume Pelletier**: Canadian biochemist and software engineer contributing to data engineering practices.
- **Bradford Tuckfield**: U.S. data scientist and programmer involved in data pipeline development.

## Schema Markup
```json
{
  "@context": "https://schema.org",
  "@type": "Thing",
  "name": "data engineering",
  "description": "building data systems to collect and make data usable",
  "sameAs": ["https://en.wikipedia.org/wiki/Data_engineering"],
  "additionalType": ["academic discipline", "industry", "academic major"]
}
```

## References

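1. [kanger.dev, stack page: data engineering courses and bootcamps](https://kanger.dev/stack/data-engineering-courses-bootcamp)
2. [kanger.dev, career page: data engineer](https://kanger.dev/career/data-engineer)
3. [kanger.dev, career page: data architect](https://kanger.dev/career/data-architect)
4. [kanger.dev, career page: data warehouse engineer](https://kanger.dev/career/data-warehouse-engineer)
5. [kanger.dev, career page: business intelligence engineer](https://kanger.dev/career/business-intelligence-engineer)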