License-Aware Web Crawling for Open Search AI (LAW4OSAI)

The goal of the LAW4OSAI project is to enable license-aware crawling of web content by automatically identifying and retrieving content licenses. The project aims to enable open web search that can be filtered by license and, more importantly, the development of open large language models for next-generation search technology, such as conversational search or image generation, that respect authors' rights and copyright.

The project was a collaboration with the Liquid Legal Institute and fingolex. LAW4OSAI is part of the OpenWebSearch.EU community.

Results

  • LAW4OSAI Content License Extractor Library: This Python library extracts open content licenses (such as Creative Commons or the GNU Free Documentation License) and the elements they apply to from websites.
  • LAW4OSAI Content License Annotation Browser Plugin: This Chromium browser plugin is designed to annotate content licenses on websites.
  • Braun, D., Cevc, B., & Waltl, B. (2025). Berücksichtigung von Lizenzen beim Crawlen von KI-Trainingsdaten. Legal Tech – Zeitschrift für die digitale Anwendung (LTZ), 2.
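
As a rough illustration of the kind of signal a content license extractor can look for, the sketch below collects links that a page marks with the standard HTML `rel="license"` link type. It is not the LAW4OSAI library's API: the class `LicenseLinkParser`, the function `find_license_links`, and the sample page are hypothetical names chosen for this example, and the sketch does not attempt to determine which elements a license applies to.

```python
from html.parser import HTMLParser


# Hypothetical helper for illustration only; not part of the LAW4OSAI library.
class LicenseLinkParser(HTMLParser):
    """Collects href values of <a> and <link> tags marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, e.g. rel="license noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_tokens and href:
            self.licenses.append(href)


def find_license_links(html):
    """Return the license URLs declared in an HTML string, in document order."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


# Assumed sample page: one declared license link and one unrelated link.
sample = """
<html><body>
<p>Some text.</p>
<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>
<a href="https://example.org/about">About</a>
</body></html>
"""

print(find_license_links(sample))
```

Relying on `rel="license"` only finds licenses that a page declares in machine-readable form. Licenses stated only in running text or in images would need additional heuristics.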


The project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070014 (OpenWebSearch.EU), through that project's cascading funding.
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the granting authority. Neither the European Union nor the granting authority can be held responsible for them.


References

2025

  1. Berücksichtigung von Lizenzen beim Crawlen von KI-Trainingsdaten
    Daniel Braun, Baltasar Cevc, and Bernhard Waltl
    Legal Tech – Zeitschrift für die digitale Anwendung (LTZ) (2). Nomos, 2025