Data Processing Pipeline Template

This template helps you design and implement scalable data processing pipelines for ETL operations, real-time streaming, and batch processing. It covers data ingestion, transformation, and output strategies.

Sample Prompt

Design a data processing pipeline for customer analytics:
- Real-time event ingestion from web applications
- Data validation and cleansing procedures
- Transformation logic for aggregations and enrichment
- Storage in data warehouse for analytics queries
- Error handling and data quality monitoring
- Scalable architecture for high-volume data processing

What This Template Covers

  • Data ingestion strategies for batch and real-time data sources
  • Transformation logic including validation, cleansing, and enrichment
  • Pipeline orchestration with workflow management and scheduling
  • Error handling and data quality monitoring procedures
  • Scalability design for high-volume and high-velocity data processing
  • Monitoring and observability for pipeline health and performance

Best Practices

  1. Design for idempotency so reprocessing and retries are handled gracefully (see the upsert sketch below)
  2. Implement comprehensive error handling with proper logging and alerting (see the dead-letter sketch below)
  3. Use schema validation to catch data quality issues early (see the validation sketch below)
  4. Plan for scalability with horizontal scaling and parallel processing (see the worker-pool sketch below)
  5. Monitor data quality with automated checks and quality metrics (covered by the validation sketch below)
  6. Implement proper backpressure handling for streaming pipelines (see the bounded-queue sketch below)
  7. Use storage formats optimized for your access patterns (see the columnar-format sketch below)
  8. Document data lineage for compliance and debugging purposes
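
The sketches below illustrate several of these practices in Python. All table names, field names, and helper functions are illustrative assumptions, not part of the template itself.

For idempotency (practice 1), one minimal approach is to key every write on a deterministic record id and upsert rather than append, so replaying a batch cannot double-count:

    import sqlite3

    # Hypothetical sink: writes are keyed on a deterministic event_id,
    # so reprocessing the same batch leaves the table unchanged.
    conn = sqlite3.connect("analytics.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               event_id TEXT PRIMARY KEY,
               user_id  TEXT,
               amount   REAL
           )"""
    )

    def upsert_events(records):
        """Write a batch idempotently: duplicates overwrite, never double-count."""
        conn.executemany(
            """INSERT INTO events (event_id, user_id, amount)
               VALUES (:event_id, :user_id, :amount)
               ON CONFLICT(event_id) DO UPDATE SET
                   user_id = excluded.user_id,
                   amount  = excluded.amount""",
            records,
        )
        conn.commit()

    batch = [{"event_id": "evt-001", "user_id": "u-42", "amount": 19.99}]
    upsert_events(batch)
    upsert_events(batch)  # safe retry: second call is a no-op in effect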
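
For error handling (practice 2), a common shape is to isolate failures per record, log them with full context, and divert bad records to a dead-letter store for later replay instead of aborting the whole batch. The `transform` and `dead_letter` names here are hypothetical stand-ins:

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("pipeline")

    def process_batch(records, transform, dead_letter):
        """Process records one by one; failures are logged and dead-lettered,
        so a single bad record never aborts the whole batch."""
        succeeded = []
        for record in records:
            try:
                succeeded.append(transform(record))
            except Exception:
                # Log with full traceback, then divert the record for replay.
                log.exception("transform failed for record %r", record.get("event_id"))
                dead_letter.append(record)
        if dead_letter:
            # Alerting hook: a real pipeline might page on-call or emit a metric here.
            log.warning("%d records dead-lettered this batch", len(dead_letter))
        return succeeded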
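
Practices 3 and 5 can share machinery: a schema check per record, aggregated into quality metrics that automated thresholds can alert on. The required fields are assumptions for illustration:

    REQUIRED_FIELDS = {"event_id": str, "user_id": str, "amount": float}

    def validate(record):
        """Return a list of schema problems; an empty list means the record passes."""
        problems = []
        for field, expected in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append(f"missing field: {field}")
            elif not isinstance(record[field], expected):
                problems.append(f"{field}: expected {expected.__name__}")
        return problems

    def quality_report(records):
        """Aggregate simple quality metrics suitable for automated thresholds."""
        invalid = sum(1 for r in records if validate(r))
        total = len(records)
        return {"total": total, "invalid": invalid,
                "invalid_rate": invalid / total if total else 0.0}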
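
For parallel processing (practice 4), CPU-bound transformations written as pure functions parallelize directly with a worker pool; the enrichment step shown is a placeholder:

    from multiprocessing import Pool

    def transform(record):
        # CPU-bound enrichment step; pure function, so workers share no state.
        record["amount_cents"] = int(record["amount"] * 100)
        return record

    if __name__ == "__main__":
        records = [{"event_id": f"evt-{i}", "amount": i * 1.5} for i in range(100_000)]
        with Pool() as pool:
            # chunksize amortizes inter-process overhead across many small records.
            enriched = pool.map(transform, records, chunksize=1_000)
        print(len(enriched))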
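
For backpressure (practice 6), a bounded queue between producer and consumer is the simplest form: when the consumer lags, the producer blocks instead of buffering without limit:

    import queue
    import threading

    # A bounded queue gives natural backpressure: q.put(...) blocks the
    # producer whenever the consumer falls behind by more than maxsize items.
    q = queue.Queue(maxsize=1000)
    SENTINEL = object()

    def producer(events):
        for event in events:
            q.put(event)       # blocks while the queue is full
        q.put(SENTINEL)        # signal end of stream

    def consumer(sink):
        while True:
            event = q.get()
            if event is SENTINEL:
                break
            sink(event)
            q.task_done()

    events = ({"event_id": f"evt-{i}"} for i in range(10_000))
    t = threading.Thread(target=consumer, args=(lambda e: None,))
    t.start()
    producer(events)
    t.join()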
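
For storage formats (practice 7), columnar formats such as Parquet suit analytics queries that scan a few columns over many rows, while row-oriented formats suit record-at-a-time access. This snippet assumes the pyarrow package is installed:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write a columnar Parquet file; column values here are illustrative.
    table = pa.table({
        "event_id": ["evt-001", "evt-002"],
        "user_id":  ["u-42", "u-7"],
        "amount":   [19.99, 5.00],
    })
    pq.write_table(table, "events.parquet", compression="snappy")

    # Readers can project only the columns a query actually needs.
    subset = pq.read_table("events.parquet", columns=["user_id", "amount"])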

When to Use This Template

  • Building ETL pipelines for data warehousing
  • Implementing real-time data streaming and processing
  • Creating data integration solutions between systems
  • Processing large-scale batch data operations
  • Setting up analytics and reporting data workflows

The Script

peep data-processing-pipeline