A universal and accessible library of all the code ever produced, to enter a new digital era

Big Code Lab is the research project funded by IFAB that seeks to drive the software world into the future

The digital revolution is all around us and constantly evolving: technology and web infrastructure are part of daily life for most people on the planet. All digital devices, from the simplest to the most complex, from the smartphone in our pocket to supercomputers, run on a common language: code. Code instructs machines, makes them work and shapes the impact they have on us and on the society we live in. Code lies behind the social interactions triggered by a “like” on social media and behind the Spotify algorithm that suggests songs we might enjoy; code runs the programs on our computers, lets a robot vacuum find its way around the house, helps doctors diagnose illnesses accurately, enables cars to drive themselves and sends people into space. Though it may seem relatively unimportant, code has a continuous, concrete and tangible impact on our lives.

With this in mind, and given the constantly changing nature of code, the “Big Code Lab” project, funded by IFAB, has been launched through a partnership between the ENEA Research Centre of Bologna and INRIA (Institut National de Recherche en Informatique et en Automatique) of Le Chesnay-Rocquencourt, France. The project aims to create a replica, or “mirror” to use the correct jargon, of “Software Heritage”, the world code archive created by INRIA. This will give Italy, the Emilia-Romagna Region, and academics and professionals in the sector a continuously updated copy of all the code libraries stored in the archive so far: a vast body of data and information which, in purely numerical terms, consists of over 11 billion files belonging to approximately 168 million projects.

The project has two main goals. The first is to contribute to building a global archive of all the code ever developed, by cataloguing and preserving this vast wealth of human knowledge, which keeps growing rapidly every day (UNESCO's participation in the project is particularly significant in this respect).

The second is to make the archive widely available and useful, providing code libraries to the software industry and anyone else interested, in support of the improvement, evolution and innovation of the entire digital sector and all its applications in society. Software Heritage and the possibility of sharing its contents open up countless opportunities for the whole software engineering sector: starting from the code libraries already in the archive, it will be possible to speed up the creation of new code. This will make a particular difference in the field of Big Code, which studies source code and programming languages with the valuable help of neural networks.

In other words, the “Big Code Lab” project aims to turn a shared global code archive, already enormously valuable in itself as a means of preserving human knowledge, into a key factor in the evolution and innovation of everything “digital”. Last but not least, it aims to seize the huge opportunity offered by Software Heritage to improve the countless practical applications of software in people’s lives and in society, and to make them more efficient.

Goals of the project

  • Expanding and consolidating expertise on “Big Code” through an open infrastructure accessible to the research, industry and citizen communities;
  • Involving local institutions (the Emilia-Romagna Region, businesses, museums, schools) and citizens;
  • Increasing the level of digital awareness.

Possible applications

The Bologna “mirror”, with its unique source code dataset, will make it possible to start research and explore different areas, such as:
  • Mining source code from large databases using AI and ML to improve software production (automatic programming, correctness, security…);
  • Creating a suitable infrastructure to model, implement and manage large repositories, such as multi-cloud systems built on a set of locally available architectures;
  • Training activities linked to software development and computational thinking, including multidisciplinary topics, to broaden and consolidate knowledge of “Big Code” and of advanced methodologies for implementing Big Data archives;
  • Involving young researchers in a new field of investigation (Big Code) that offers strong growth opportunities and plays a strategic role for the regional economy and for qualified employment. The idea is to give them access to computing infrastructures and data archives at a European level;
  • Addressing the problem of “digital heritage” by trialling new approaches to organise, store and index digital material and make it “resilient”;
  • Developing Open Science (Open Source, Open Data, Open Access, Open Methodology, Open Peer Review, Open Educational Resources);
  • Promoting the Software Heritage mirror at Bologna’s Technopole, creating an environment of high-quality research around it, fostering prestigious collaborations (SH, INRIA, UNESCO) and building connections with the business fabric, involving all the technology transfer organisations and companies.

Research partners

  • IFAB
  • University of Bologna, Department of IT, Science and Engineering (DISI)
  • ENEA – National Agency for New Technologies, Energy and Sustainable Economic Development

For further information, please contact: barbara.vecchi@ifabfoundation.org
