Data Licensing: Best Practices for Licensing Your Business Data

Written By: Liz Gray (IP Lawyer) & Charlotte Tyhurst (Articling Student)

In today’s rapidly evolving digital landscape, data has emerged as one of the most valuable assets for businesses, driving innovation in artificial intelligence (AI) and data analytics.  The ability to collect, analyze, and use data is fueling advancements across industries, from healthcare to finance to retail.  

Yet, with great value comes great responsibility.  Protecting and monetizing your business data is of paramount importance.  This is why data licensing has become a critical tool for companies looking to leverage their data while maintaining control over its use.

The Importance of Data Licensing in the Digital Age

As organizations generate and collect vast amounts of data, protecting that data becomes essential.  Without proper safeguards, companies risk losing control over their data, exposing themselves to legal challenges, and missing out on potential revenue streams.

Data licensing agreements provide formal frameworks that dictate how data can be used, shared, and monetized.  These contracts not only create essential safeguards for businesses but also drive business value and increase profit.

Using a standardized data licensing framework will help provide much-needed consistency and predictability in this evolving legal landscape.  In 2019, a group of AI researchers and legal professionals coauthored the publication “Towards Standardization of Data Licenses: The Montreal Data License”, which offers valuable guidance to data licensors navigating the complexities of data licensing.

Top 5 Tips and Best Practices for Data Licensing

1. Understand the Specific Terminology in Data Licenses

Conceptual ambiguities can lead to disputes and misunderstandings, which makes it essential to define key terms precisely.

For instance, the Montreal Data License publication recommends providing precise definitions for terms such as “data”, “raw data”, and “processed data”.  By explicitly differentiating these terms, licensors can minimize the risk of ambiguity and ensure that their data is used in ways that align with their expectations.

2. Clearly Define the Dataset Covered by the License

It’s essential to clearly outline the scope of the data covered by the license.  This includes specifying the type of data and its format, as well as confirming that its collection, storage, and use comply with all applicable legal requirements.

In Canada, the Personal Information Protection and Electronic Documents Act (PIPEDA) sets out rules for how private-sector organizations collect, use, and disclose personal information in the course of commercial activities.  Datasets that include sensitive information such as names, ID numbers, employee files, credit records, and medical records may also be subject to further regulation, such as the Economic and Fiscal Update Act in Ontario.

Further, legal requirements are structured differently across jurisdictions.  For instance, the General Data Protection Regulation (GDPR) provides the regulatory framework for the protection of personal data in the European Union, whereas in the United States, data protection and privacy may be governed at the federal, state, and/or local level.

Data licensors should also identify any trademarks, trade names, logos, and other intellectual property whose use the license should expressly prohibit.

3. Explicitly State Permitted Uses of the Data

Many data licensing agreements grant the right to “use” data without clearly defining what this use entails.  This can lead to confusion and even misuse.  To avoid this, it’s crucial for the license to explicitly stipulate the permitted uses of the data.

The Montreal Data License publication provides further guidance on the different types of use rights to be clarified in a data licensing agreement, differentiating between rights to use the data on its own and rights to use the data in conjunction with AI models:

Rights to Data (stand-alone use)

  • Access
  • Labelling/Tagging
  • Distribution
  • Re-Representation

Rights to Data (use with AI models)

  • Benchmarking
  • Research
  • Publishing
  • Internal Use
  • Output Commercialization
  • Model Commercialization

By clearly defining permitted uses, licensors can ensure that their data is used in ways that are consistent with their business objectives and align with legal requirements.

4. Address Derivative Works and Patent Ownership

Defining Derived Data

As AI technologies advance, the ownership of derivative works becomes an increasingly crucial aspect of data licensing.  As such, it is important that data licensing agreements differentiate between “original data” and “derived data”.  Original data refers to the raw data provided by the licensor, while derived data is new information generated through analysis, processing, or modification of the original data.

To safeguard the licensor’s interests, derived data should be defined in a way that ensures it cannot be reverse-engineered to reveal the original data or used as a commercial substitute.  This distinction helps maintain the value of the original data and ensures that derived data is treated appropriately within the agreement.  

Ownership of Derivative Works

Ownership of derivative works, which are modified versions of the original data, must also be explicitly addressed.  Under copyright law, the creator of a new compilation or modification of data may hold rights in these derivative works.  Without a clear agreement, this can lead to complex joint ownership issues, particularly if the licensee significantly alters the original data.  The licensing agreement should specify who owns these derivative works to avoid disputes over authorship and copyright.

Patent Ownership

Additionally, the agreement should address whether the licensee has the right to claim patent rights over AI models trained with the licensor’s data, and whether these models can be used to create patentable inventions.  This is especially important in industries where AI-driven innovation is a key competitive advantage.

5. Specify Authorized Users

Sublicensing

It’s essential to precisely define who is authorized to use the data under the licensing agreement.  This includes addressing issues of exclusivity and sublicensing.  For instance, a licensor may want to limit the licensee’s ability to make the data available to third parties (i.e., to grant a sublicense).  Such restrictions confine data access to the licensee itself, which can prevent even the licensee’s third-party contractors from accessing or working with the licensed data.

When sublicensing is authorized, ensure that the licensing agreement explicitly allocates responsibilities between the licensee and the sublicensee.

Geographical Restrictions

Geographical and territorial restrictions are an additional consideration.  For example, a common data license provision restricts storage and processing of the data to Canada, the US, and/or Europe, i.e., jurisdictions where the licensor enjoys robust legal protections.

Conclusion

In the age of AI and big data, safeguarding and optimizing your business data through licensing is more important than ever.  By following these best practices, businesses can protect their data, minimize legal risks, and unlock its full value.

If you have any questions or need assistance with drafting or reviewing data licensing agreements, contact one of our experts today.