Data Processing Pipeline Template

This template helps you design and implement scalable data processing pipelines for ETL operations, real-time streaming, and batch processing. It covers data ingestion, transformation, and output strategies.

Sample Prompt

Design a data processing pipeline for customer analytics:
- Real-time event ingestion from web applications
- Data validation and cleansing procedures
- Transformation logic for aggregations and enrichment
- Storage in data warehouse for analytics queries
- Error handling and data quality monitoring
- Scalable architecture for high-volume data processing

What This Template Covers

  • Data ingestion strategies for batch and real-time data sources
  • Transformation logic including validation, cleansing, and enrichment
  • Pipeline orchestration with workflow management and scheduling
  • Error handling and data quality monitoring procedures
  • Scalability design for high-volume and high-velocity data processing
  • Monitoring and observability for pipeline health and performance

Best Practices

  1. Design for idempotency so reprocessing and retries are handled gracefully (see the upsert sketch below)
  2. Implement comprehensive error handling with proper logging and alerting (see the dead-letter sketch below)
  3. Use schema validation to catch data quality issues early (see the validation sketch below)
  4. Plan for scalability with horizontal scaling and parallel processing (see the worker-pool sketch below)
  5. Monitor data quality with automated checks and quality metrics (covered by the validation sketch below)
  6. Implement proper backpressure handling for streaming pipelines (see the bounded-queue sketch below)
  7. Use storage formats optimized for your access patterns (see the columnar-format sketch below)
  8. Document data lineage for compliance and debugging purposes
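
The sketches below illustrate several of these practices in Python. All table names, field names, and helper functions are illustrative assumptions, not part of the template itself.

For idempotency (practice 1), one minimal approach is to key every write on a deterministic record id and upsert rather than append, so replaying a batch cannot double-count:

    import sqlite3

    # Hypothetical sink: writes are keyed on a deterministic event_id,
    # so reprocessing the same batch leaves the table unchanged.
    conn = sqlite3.connect("analytics.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               event_id TEXT PRIMARY KEY,
               user_id  TEXT,
               amount   REAL
           )"""
    )

    def upsert_events(records):
        """Write a batch idempotently: duplicates overwrite, never double-count."""
        conn.executemany(
            """INSERT INTO events (event_id, user_id, amount)
               VALUES (:event_id, :user_id, :amount)
               ON CONFLICT(event_id) DO UPDATE SET
                   user_id = excluded.user_id,
                   amount  = excluded.amount""",
            records,
        )
        conn.commit()

    batch = [{"event_id": "evt-001", "user_id": "u-42", "amount": 19.99}]
    upsert_events(batch)
    upsert_events(batch)  # safe retry: second call is a no-op in effect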
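
For error handling (practice 2), a common shape is to isolate failures per record, log them with full context, and divert bad records to a dead-letter store for later replay instead of aborting the whole batch. The `transform` and `dead_letter` names here are hypothetical stand-ins:

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("pipeline")

    def process_batch(records, transform, dead_letter):
        """Process records one by one; failures are logged and dead-lettered,
        so a single bad record never aborts the whole batch."""
        succeeded = []
        for record in records:
            try:
                succeeded.append(transform(record))
            except Exception:
                # Log with full traceback, then divert the record for replay.
                log.exception("transform failed for record %r", record.get("event_id"))
                dead_letter.append(record)
        if dead_letter:
            # Alerting hook: a real pipeline might page on-call or emit a metric here.
            log.warning("%d records dead-lettered this batch", len(dead_letter))
        return succeeded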
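
Practices 3 and 5 can share machinery: a schema check per record, aggregated into quality metrics that automated thresholds can alert on. The required fields are assumptions for illustration:

    REQUIRED_FIELDS = {"event_id": str, "user_id": str, "amount": float}

    def validate(record):
        """Return a list of schema problems; an empty list means the record passes."""
        problems = []
        for field, expected in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append(f"missing field: {field}")
            elif not isinstance(record[field], expected):
                problems.append(f"{field}: expected {expected.__name__}")
        return problems

    def quality_report(records):
        """Aggregate simple quality metrics suitable for automated thresholds."""
        invalid = sum(1 for r in records if validate(r))
        total = len(records)
        return {"total": total, "invalid": invalid,
                "invalid_rate": invalid / total if total else 0.0}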
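
For parallel processing (practice 4), CPU-bound transformations written as pure functions parallelize directly with a worker pool; the enrichment step shown is a placeholder:

    from multiprocessing import Pool

    def transform(record):
        # CPU-bound enrichment step; pure function, so workers share no state.
        record["amount_cents"] = int(record["amount"] * 100)
        return record

    if __name__ == "__main__":
        records = [{"event_id": f"evt-{i}", "amount": i * 1.5} for i in range(100_000)]
        with Pool() as pool:
            # chunksize amortizes inter-process overhead across many small records.
            enriched = pool.map(transform, records, chunksize=1_000)
        print(len(enriched))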
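
For backpressure (practice 6), a bounded queue between producer and consumer is the simplest form: when the consumer lags, the producer blocks instead of buffering without limit:

    import queue
    import threading

    # A bounded queue gives natural backpressure: q.put(...) blocks the
    # producer whenever the consumer falls behind by more than maxsize items.
    q = queue.Queue(maxsize=1000)
    SENTINEL = object()

    def producer(events):
        for event in events:
            q.put(event)       # blocks while the queue is full
        q.put(SENTINEL)        # signal end of stream

    def consumer(sink):
        while True:
            event = q.get()
            if event is SENTINEL:
                break
            sink(event)
            q.task_done()

    events = ({"event_id": f"evt-{i}"} for i in range(10_000))
    t = threading.Thread(target=consumer, args=(lambda e: None,))
    t.start()
    producer(events)
    t.join()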
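
For storage formats (practice 7), columnar formats such as Parquet suit analytics queries that scan a few columns over many rows, while row-oriented formats suit record-at-a-time access. This snippet assumes the pyarrow package is installed:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write a columnar Parquet file; column values here are illustrative.
    table = pa.table({
        "event_id": ["evt-001", "evt-002"],
        "user_id":  ["u-42", "u-7"],
        "amount":   [19.99, 5.00],
    })
    pq.write_table(table, "events.parquet", compression="snappy")

    # Readers can project only the columns a query actually needs.
    subset = pq.read_table("events.parquet", columns=["user_id", "amount"])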

When to Use This Template

  • Building ETL pipelines for data warehousing
  • Implementing real-time data streaming and processing
  • Creating data integration solutions between systems
  • Processing large-scale batch data operations
  • Setting up analytics and reporting data workflows

The Script

peep data-processing-pipeline