Data Engineering

Enhancing Product Performance Analysis with Data Engineering and Machine Learning

Company Overview
  • US-based consumer e-commerce firm
  • $16B in revenue
Tech Overview
  • 100M+ Customers
  • 5M+ Products
  • 30k+ Global suppliers

Business & Technical Challenges

  • To identify high-performing products for targeted promotion
  • To assist suppliers in enhancing their product quality based on data-driven feedback
  • How to build an integrated ecosystem that enables:
      • Streamlined data processing and transformation
      • Real-time analysis and visualization to support effective data-driven decision-making
      • Automated workflow orchestration
      • High system performance across 30K+ external data sources

Canterr's Solution

  • Designed a GCP-based system that captures, processes, and integrates data from multiple sources for analysis
  • Developed asynchronous data streaming from Google Cloud Pub/Sub into Dataproc for data cleansing and transformation, with analysis in BigQuery
  • Orchestrated the entire data collection workflow using Cloud Composer to trigger downstream AI models for prediction of product performance
  • Built intuitive dashboards and insights in Looker to aid business decision-making
  • Facilitated the extraction and transformation of data from RDBMS sources using Debezium, enriching the dataset used in the analysis
  • Designed for fault tolerance and high availability using GCP best practices
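
To make the change-data-capture step more concrete, here is a minimal sketch of how Debezium-style change events might be unwrapped and cleansed before analysis. It is an illustration under stated assumptions, not Canterr's implementation: the `op`, `before`, and `after` fields follow Debezium's documented event envelope, but the product fields (`id`, `name`, `price`), the cleansing rules, and the in-memory dictionary standing in for a warehouse table are all hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Optional

# Hypothetical in-memory stand-in for a warehouse table, keyed by product id.
ProductTable = Dict[int, Dict[str, Any]]


def unwrap_change_event(raw: str) -> Optional[Dict[str, Any]]:
    """Extract op/before/after from a Debezium-style change event."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        return None  # e.g. a tombstone message whose value is null
    # With schemas enabled the envelope sits under "payload";
    # otherwise the envelope is the top-level object.
    payload = event.get("payload", event)
    if not isinstance(payload, dict) or "op" not in payload:
        return None
    return {
        "op": payload["op"],
        "before": payload.get("before"),
        "after": payload.get("after"),
    }


def cleanse(row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Illustrative cleansing: trim the name, coerce types, drop bad rows."""
    try:
        product_id = int(row["id"])
        price = float(row["price"])
    except (KeyError, TypeError, ValueError):
        return None
    name = str(row.get("name", "")).strip()
    if not name or price < 0:
        return None
    return {"id": product_id, "name": name, "price": round(price, 2)}


def apply_events(table: ProductTable, raw_events: Iterable[str]) -> ProductTable:
    """Apply create/read/update/delete events to the table in order."""
    for raw in raw_events:
        change = unwrap_change_event(raw)
        if change is None:
            continue
        if change["op"] in ("c", "r", "u"):  # create, snapshot read, update
            row = cleanse(change["after"] or {})
            if row is not None:
                table[row["id"]] = row
        elif change["op"] == "d" and change["before"]:  # delete
            table.pop(int(change["before"]["id"]), None)
    return table


# Hypothetical sample events, for illustration only.
events = [
    json.dumps({"payload": {"op": "c", "before": None,
                            "after": {"id": 1, "name": " Kettle ", "price": "24.50"}}}),
    json.dumps({"payload": {"op": "u",
                            "before": {"id": 1, "name": " Kettle ", "price": "24.50"},
                            "after": {"id": 1, "name": "Kettle", "price": "19.99"}}}),
    json.dumps({"payload": {"op": "c", "before": None,
                            "after": {"id": 2, "name": "Mug", "price": "n/a"}}}),
    json.dumps({"payload": {"op": "c", "before": None,
                            "after": {"id": 3, "name": "Lamp", "price": 12}}}),
    json.dumps({"payload": {"op": "d",
                            "before": {"id": 3, "name": "Lamp", "price": 12},
                            "after": None}}),
    "null",
]
table = apply_events({}, events)
print(table)
```

In a deployed pipeline the events would typically arrive through a message stream rather than a Python list, and the cleansed rows would be written to a warehouse table; the sketch only shows the per-event logic.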

Results

  • Successfully predicting sales performance across a large assortment drives inventory, advertising, and many other systems
  • Providing contextual feedback to suppliers helps them improve their offerings, in turn improving revenue and margin
  • 15+% improvement in ads ROI
  • 4% incremental revenue
  • Reduced time on security operations
  • Automated data integration and CI/CD processes
  • Significantly scaled up analytics from previous manual efforts
  • Reduced onboarding time for suppliers and time to market for products