ChatGPT and privacy – How seriously do OpenAI and Microsoft take it?

ChatGPT is an AI technology that understands and generates natural language. Learn how you can use ChatGPT in your organization and which OpenAI and Microsoft privacy policies apply.

The ChatGPT models (GPT-3.5-turbo, GPT-4) are a family of Artificial Intelligence (AI) models that understand and generate natural language. ChatGPT was originally developed by OpenAI, a U.S. artificial intelligence research company. Microsoft is a technology company that offers various products and services, including ChatGPT-based solutions for businesses.


If you want to use ChatGPT in your business, you need to familiarize yourself with OpenAI's and Microsoft's privacy policies to understand how your data is collected, processed, and protected. These two vendors are currently the only ones that provide direct access to the ChatGPT models. In this blog post, we use the following four questions to compare the key differences and similarities between the two privacy policies and offer a guide to making the best choice for your needs.
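
To make these two access paths concrete, here is a minimal sketch in Python using the openai package (the 0.x-series API current at the time of writing). The API keys, resource name, and deployment name are placeholders, not real values.

import openai

# Path 1: the OpenAI API. Models are hosted by OpenAI in the United States.
openai.api_key = "sk-..."  # placeholder OpenAI API key
openai_reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short product description."}],
)

# Path 2: the Azure OpenAI Service. The same models, hosted in Microsoft's
# Azure environment, without any interaction with services operated by OpenAI.
openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # placeholder resource endpoint
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-AZURE-KEY"  # placeholder Azure key
azure_reply = openai.ChatCompletion.create(
    engine="YOUR-DEPLOYMENT",  # Azure uses a deployment name instead of a model name
    messages=[{"role": "user", "content": "Write a short product description."}],
)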


Comparison and guide

  • Are the ChatGPT models trained with the data I send?
  • Will the data I send be stored?
  • Is it possible to disable data storage?
  • In which regions are the models hosted?

Are the ChatGPT models trained with the data I send?

This question addresses the concern that enterprise data sent to the service might be used to optimize the model, which could embed that data in the model and expose it to third parties. Both privacy policies clearly say “no.”

It is important to emphasize that the Azure OpenAI Service is entirely under Microsoft's control. For many companies that already use and trust the Azure cloud, this can make all the difference.

Corresponding excerpt from the OpenAI API data usage policy: https://openai.com/policies/api-data-usage-policies

“OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.”

Corresponding excerpt from Microsoft Azure OpenAI Service’s Data Use Policy:

“Important. Your prompts (inputs) and completions (outputs), your embeddings, and your training data:

  • are NOT available to other customers.
  • are NOT available to OpenAI.
  • are NOT used to improve OpenAI models.
  • are NOT used to improve any Microsoft or 3rd party products or services.
  • are NOT used for automatically improving Azure OpenAI models for your use in your resource (The models are stateless, unless you explicitly fine-tune models with your training data).
  • Your fine-tuned Azure OpenAI models are available exclusively for your use.

The Azure OpenAI Service is fully controlled by Microsoft; Microsoft hosts the OpenAI models in Microsoft’s Azure environment and the Service does NOT interact with any services operated by OpenAI (e.g. ChatGPT, or the OpenAI API).”


Will the data I send be stored?


Unlike the first question, this one focuses on whether the data sent to the models is stored in any way and, if so, who has access to it. The answer is clearly a “yes.” Both companies carry out so-called abuse monitoring, during which the data is stored for up to 30 days. If abuse monitoring is triggered, selected employees of the companies can view certain texts. OpenAI uses both internal employees and external contractors for this purpose; Microsoft uses only Microsoft employees located in the same region as the deployment.

Corresponding excerpt from the OpenAI API data usage policy:

“OpenAI retains API data for 30 days for abuse and misuse monitoring purposes. A limited number of authorized OpenAI employees, as well as specialized third-party contractors that are subject to confidentiality and security obligations, can access this data solely to investigate and verify suspected abuse. OpenAI may still have content classifiers flag when data is suspected to contain platform abuse. Data submitted by the user through the Files endpoint, for instance to fine-tune a model, is retained until the user deletes the file.”
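
The last sentence is worth acting on: files uploaded via the Files endpoint, for example for fine-tuning, are retained until you delete them yourself. A minimal sketch with the openai Python package (0.x-series API; the key and file ID are placeholders):

import openai

openai.api_key = "sk-..."  # placeholder OpenAI API key

# List files previously uploaded via the Files endpoint.
print(openai.File.list())

# Delete a file that is no longer needed; only then does the retention end.
openai.File.delete("file-abc123")  # placeholder file ID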


Corresponding excerpt from the Microsoft Azure OpenAI Service data usage policy:

“Azure OpenAI abuse monitoring detects and mitigates instances of recurring content and/or behaviors that suggest use of the service in a manner that may violate the code of conduct or other applicable product terms. To detect and mitigate abuse, Azure OpenAI stores all prompts and generated content securely for up to thirty (30) days. (No prompts or completions are stored if the customer is approved for and elects to configure abuse monitoring off, as described below).


The data store where prompts and completions are stored is logically separated by customer resource (each request includes the resource ID of the customer’s Azure OpenAI resource). A separate data store is located in each region in which the Azure OpenAI Service is available, and a customer’s prompts and generated content are stored in the Azure region where the customer’s Azure OpenAI service resource is deployed, within the Azure OpenAI service boundary. Human reviewers assessing potential abuse can access prompts and completions data only when that data has been flagged by the abuse monitoring system. The human reviewers are authorized Microsoft employees who access the data via point wise queries using request IDs, Secure Access Workstations (SAWs), and Just-In-Time (JIT) request approval granted by team managers. For Azure OpenAI Service deployed in the European Economic Area, the authorized Microsoft employees are located in the European Economic Area.”


Is it possible to disable data storage?


This question concerns whether the storage of particularly sensitive data, and third-party access to it, can be avoided altogether. With OpenAI, it is not possible to disable data storage. Microsoft offers the option to deactivate it, but this requires a separate request that Microsoft must approve.

Corresponding excerpt from the OpenAI API data usage policy:

The policy does not mention any such option.

Corresponding excerpt from the Microsoft Azure OpenAI Service data usage policy:

“Some customers may want to use the Azure OpenAI Service for a use case that involves the processing of sensitive, highly confidential, or legally-regulated input data but where the likelihood of harmful outputs and/or misuse is low. These customers may conclude that they do not want or do not have the right to permit Microsoft to process such data for abuse detection, as described above, due to their internal policies or applicable legal regulations. To address these concerns, Microsoft allows customers who meet additional Limited Access eligibility criteria and attest to specific use cases to apply to modify the Azure OpenAI content management features by completing this form.


If Microsoft approves a customer’s request to modify abuse monitoring, then Microsoft does not store any prompts and completions associated with the approved Azure subscription for which abuse monitoring is configured off. In this case, because no prompts and completions are stored at rest in the Service Results Store, the human review process is not possible and is not performed. See Abuse monitoring for more information.”


In which regions are the models hosted?


For particularly sensitive data, legal considerations may require that the data be stored exclusively in certain geographic regions.

OpenAI API:

Models are hosted in the United States.


Microsoft Azure OpenAI Service:

Models are hosted in either the U.S. or Europe.
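
With Azure, the hosting region is fixed when the Azure OpenAI resource is created, and, as quoted above, prompts and completions are stored in that same region. A minimal sketch, assuming a (placeholder) resource deployed in an EU region:

import openai

# This endpoint belongs to a resource created in an EU region (e.g. West Europe),
# so request data is processed and stored within that region.
openai.api_type = "azure"
openai.api_base = "https://my-eu-resource.openai.azure.com/"  # placeholder EU resource
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-AZURE-KEY"  # placeholder

reply = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # placeholder deployment name
    messages=[{"role": "user", "content": "Hello"}],
)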


Conclusion

Microsoft stands out for its particularly strict data protection policies and enjoys a high level of trust among many companies. Especially when abuse monitoring has been turned off as described above, the data sent to the ChatGPT models is not stored and is not accessible to anyone. A major practical disadvantage of Microsoft Azure, however, is that the wait for GPT-4 access can be quite long, depending on the size of the enterprise and its relationship with Microsoft. Until GPT-4 access has been granted, only the cheaper but lower-quality GPT-3.5-turbo can be used.

OpenAI offers faster access to GPT-4, but its data protection rules are less strict. For non-sensitive data, such as advertising copy, OpenAI is an excellent alternative. For enterprise-wide use, however, users first need an introduction that makes them aware of the specifics of the data protection situation.

The unique selling point of this technology is that it can be used throughout the company, tailored to each individual and without any prior technical knowledge, and quickly provides support in everyday work, for example when creating marketing copy, user stories, or code. We recommend starting this integration process as early as possible: the earlier you start, the sooner you can exploit the technology's potential and adopt its next development steps.

Our expert for all topics around Artificial Intelligence

Holger Schlaps, Research and Development