ABSTRACT

In the last decade, we have seen an extraordinary explosion in the volume of data that we, as a species, generate. Dramatic improvements in technology and computing infrastructure have been critical to this data explosion. Generative AI capabilities are fuelled by large language models (LLMs), which are trained on vast datasets using high-performance computing platforms. These models are now being explored for a diverse range of tasks, from text completion to image generation and scene understanding. The potential of such AI models to help solve societal problems is significant.

Because these models rely on enormous amounts of data for their training, privacy issues have come to the fore. This has led to the enactment of key laws such as the European Union’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the European Union’s AI Act. India has also recently enacted the Digital Personal Data Protection Act, 2023 (DPDP Act). These laws emphasize individuals’ right to privacy and the need for real-time, granular, and specific consent when sharing personal data. However, such privacy laws need to be complemented by a techno-legal approach to data sharing.

To realize these benefits while safeguarding individuals’ privacy, we propose the Data Empowerment and Protection Architecture (DEPA) for machine learning (ML) training, also known as DPI for AI (DEPA Training Framework | DEPA World, 2023).