To address the central role of data in artificial intelligence (AI), a consortium of major companies, led by the Data & Trust Alliance, has unveiled standards for describing the origin, history, and legal rights associated with data. The initiative aims to ease concerns among big businesses and increase their confidence in adopting AI technologies by providing transparency about the underlying data.
The newly introduced data provenance standards function as a labeling system, offering insights into where, when, and how data was collected and generated, along with details about its intended use and any associated restrictions. The Data & Trust Alliance comprises influential entities such as American Express, Humana, IBM, Pfizer, UPS, Walmart, and select startups.
Executives within the alliance liken the data-labeling system to fundamental standards for food safety, emphasizing the need for basic information on data origin and handling. They assert that clearer information about the data used in AI models will bolster corporate trust in the technology.
Ken Finnerty, President for Information Technology and Data Analytics at UPS, highlights the significance of managing data as an asset. He emphasizes the need to know how data was created, what it was intended for, and what legal parameters apply to it, in line with industry-wide efforts.
Surveys underscore the demand for greater confidence in data and more efficient data handling. In polls of corporate CEOs, concerns about data lineage or provenance were identified as a key barrier to AI adoption. In a separate survey, data scientists reported spending nearly 40% of their time on data preparation tasks.
The data initiative primarily targets business data that is used in AI programs or selectively fed into AI systems from companies like Google, OpenAI, Microsoft, and Anthropic. The accuracy and trustworthiness of that data play a central role in generating reliable AI-driven solutions.
The rise of generative AI, which powers chatbots like OpenAI's ChatGPT, has heightened the focus on the responsible use of data. Generative AI systems are capable of human-like fluency, but depending on the data they access and assemble, they have sparked concerns about data misuse and hallucination.
The newly introduced data standards consist of eight basic principles, encompassing lineage, source, legal rights, data type, and generation method, supplemented by detailed descriptions. Companies within the consortium have been testing and refining the standards, with plans to make them publicly available early next year.
Individual organizations have already labeled data by type, date, and source, but the consortium asserts that these are the first detailed standards designed for universal application across all industries.
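As a rough illustration of what such a label might look like in practice, the sketch below models a provenance record in Python. It is not the alliance's format: the class name `ProvenanceLabel`, the helper `missing_fields`, and the field types are hypothetical, and only the five principles named above are represented, since the remaining ones are not listed here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical field names, taken from the five principles named in the article.
REQUIRED_FIELDS = ("source", "lineage", "legal_rights", "data_type", "generation_method")


@dataclass(frozen=True)
class ProvenanceLabel:
    """Illustrative provenance label attached to a dataset (assumed layout)."""
    source: str                # who or what originally supplied the data
    lineage: Tuple[str, ...]   # earlier datasets or steps this data derives from
    legal_rights: str          # license or usage terms attached to the data
    data_type: str             # e.g. "tabular", "text", "image"
    generation_method: str     # e.g. "collected", "derived", "synthetic"


def missing_fields(label: ProvenanceLabel) -> List[str]:
    """Return the names of required fields that were left empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(label, name)]


label = ProvenanceLabel(
    source="internal shipping records",
    lineage=("raw_scans_2023",),
    legal_rights="",           # deliberately left blank for the example
    data_type="tabular",
    generation_method="collected",
)
print(missing_fields(label))
```

A check like `missing_fields` is one way a company could refuse to feed a dataset into an AI system until its label is complete; how the actual standards handle validation is not described here.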
Thi Montalvo, a data scientist and Vice President of Reporting and Analytics at Transcarent, a startup within the consortium, emphasizes the benefits of greater transparency throughout the data supply chain. Montalvo anticipates that it will eliminate repetitive work and could reduce the time spent on data projects by 15% to 20%.
The Data & Trust Alliance believes that the clarity provided by its data-labeling standards can address critical issues in the current AI market. Chris Hazard, Co-founder and Chief Technology Officer of Howso, a startup specializing in data-analysis tools and AI software, underscores the potential of these standards to resolve challenges in the AI landscape.