Simplifying Large Dataset Processing for Data Scientists
Prometheus, a revolutionary service design project developed under Insaite, aimed to simplify large dataset processing for data scientists using machine learning techniques. Combining techniques from design, engineering and data science, we built a platform that eases the pain of processing and manipulating data for teams training machine learning models across the globe, and released an early beta version of it.
Background
In the ever-evolving landscape of data analysis, there is a growing need for efficient tools to process large datasets. Recognizing this need, we initiated the Prometheus project to streamline data processing for data scientists through the integration of machine learning techniques.
As the design lead for Prometheus, my responsibilities encompassed guiding the design team in processing insights from user interviews, collaborating with data engineering and data science to develop a sound process, and aligning the product with the company's business objectives.
Key Features
Prometheus offered several key features designed to simplify large dataset processing:
- Ease of Processing: Users could set up data processing jobs in five minutes or less, ensuring efficiency and ease of use.
- Data Engineering Made Simple: The platform offered customisable data cleanup and preparation options, allowing users to tailor the process to their specific needs.
- Data Ingestion: Users could choose from a variety of data sources, including datasets stored in documents and API integrations for big data ingestion.
- Model Selection: State-of-the-art models were available for users to train on their data, with options for customisation or recommendations based on the problem type and available data.
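To make these features concrete, here is a minimal sketch of how a job definition covering ingestion, cleanup and model selection might look. It is an illustration only: `JobConfig`, `DEFAULT_MODELS`, `resolve_model` and every value in them are hypothetical names chosen for this example, not the actual Prometheus interface, and the lookup table stands in for a recommender that would also take the available data into account.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical job definition; field names are assumptions for illustration.
@dataclass
class JobConfig:
    source: str                                   # e.g. "csv" or "api"
    cleanup_steps: List[str] = field(default_factory=list)
    problem_type: str = "classification"
    model: Optional[str] = None                   # None means "recommend one for me"

# Illustrative defaults only; a real recommender would also inspect the data.
DEFAULT_MODELS = {
    "classification": "gradient_boosted_trees",
    "regression": "linear_regression",
    "clustering": "k_means",
}

def resolve_model(config: JobConfig) -> str:
    """Return the user's chosen model, or a default for the problem type."""
    if config.model is not None:
        return config.model
    if config.problem_type not in DEFAULT_MODELS:
        raise ValueError(f"unknown problem type: {config.problem_type}")
    return DEFAULT_MODELS[config.problem_type]

job = JobConfig(
    source="csv",
    cleanup_steps=["drop_duplicates", "fill_missing"],
    problem_type="regression",
)
print(resolve_model(job))
```

Leaving `model` unset triggers the recommendation path, while setting it explicitly mirrors the customisation option described above.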
Design Process
The design process for Prometheus followed a weekly sprint model, with feedback-derived changes implemented every two weeks. Regular meetings within a fixed multidisciplinary team facilitated collaboration and alignment, while progress presentations to the board ensured continued support and investment.
User research was conducted through one-on-one interviews with data teams, and insights were analyzed using a clustering method. These insights informed iterative design improvements aimed at creating a seamless and easy-to-understand product.
Collaboration with data science and engineering teams was central to the success of Prometheus. Regular meetings allowed for open communication and problem-solving, while innovative design solutions were employed to address user needs and industry challenges.
Outcomes and Impact
Although Prometheus launched in a pre-alpha stage during my tenure, initial feedback from industry leaders was positive. The tool was praised for its ease of use and seamless integration into existing workflows. We also received valuable feedback on customizability and infrastructure stability, which informed plans for further iterations.
Prometheus represents a significant step forward in simplifying large dataset processing for data teams. Through collaborative design efforts and iterative improvements, Prometheus aims to revolutionise the way data teams work with large datasets, driving efficiency and innovation in the industry.
