Responsible use of AI

Make conscious and well-considered choices

Should you be afraid to work with AI? Absolutely not. But it is important to be aware of the risks so that you can make an informed decision for yourself.

What is the risk of entering confidential information into ChatGPT or other LLMs? In short: if you enter sensitive data into such a system, you run the risk of losing control over it.

Treat a public LLM as if you were in a bar: everything you say may be heard, remembered, or repeated by others.

Bart Heesink

Training a Large Language Model (LLM)

A Large Language Model (LLM) such as ChatGPT or Mistral generates text based on patterns it has learned from enormous amounts of text. The model itself does not “remember” individual conversations the way a database does, but it can temporarily hold data in the context of a session to keep your current conversation coherent.

There are roughly two important phases in the training of AI models:

  1. Initial training: the model is pre-trained on gigantic datasets (internet, books, articles, code, etc.).
  2. Instruction or fine-tuning: additional training of the model with human input/feedback or specific datasets.
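
To make phase 2 concrete: a fine-tuning dataset is essentially a collection of example conversations. The sketch below is a minimal illustration in Python, assuming the common JSON-Lines chat format; the record content is made up. Whatever text you put into such a dataset literally becomes training material, and only the resulting model reduces it to statistical weights.

```python
# Minimal sketch of a fine-tuning dataset record (assumed JSON-Lines chat
# format; the content is a made-up example). Everything placed in such a
# record becomes literal training material for the fine-tuned model.
import json

record = {
    "messages": [
        {"role": "user", "content": "Summarize our internal onboarding procedure."},
        {"role": "assistant", "content": "New employees first receive ..."},
    ]
}

# Fine-tuning datasets are usually stored as one JSON object per line (JSONL).
with open("finetune.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```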

During training, text is not stored literally, as it would be in a Word document. The model converts the patterns in that text into mathematical values (weights) that are used to calculate the probability that certain words or phrases belong together.

If you enter “Leukeleu BV develops application for Company X,” it is not stored one-to-one as text. It is converted into tokens (numbers), and the model adjusts its weights based on the probability of which words tend to follow these tokens.

This allows the model to create similar sentences without remembering your exact input. However, if you enter something unique and striking, the pattern may reappear in the output.
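
A deliberately simplified sketch in Python of this idea (a toy word-level model, not how a real LLM is implemented; the mini corpus is made up): it stores no literal sentences, only statistics about which word tends to follow which, yet a distinctive phrase in the “training data” still shows up in those statistics.

```python
# Toy illustration (not a real LLM): "training" stores no literal sentences,
# only counts of which word follows which, turned into probabilities.
from collections import Counter, defaultdict

corpus = [  # made-up training snippets
    "leukeleu bv develops application for company x",
    "leukeleu bv develops software for clients",
    "the company develops an application for the web",
]

follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1

def next_word_probabilities(word: str) -> dict[str, float]:
    """P(next word | word), derived from counts rather than stored text."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {w: round(c / total, 2) for w, c in counts.items()}

print(next_word_probabilities("develops"))  # {'application': 0.33, 'software': 0.33, 'an': 0.33}
print(next_word_probabilities("leukeleu"))  # {'bv': 1.0} -> a distinctive pattern can resurface
```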

Where is the entered data stored?

Where the entered data is stored depends on the provider and the settings. With public AI services, your input may be temporarily stored on the provider's servers (often in the US or in globally distributed data centers), and it may be used for model improvement if their privacy policy allows it. Business or enterprise versions (such as ChatGPT Enterprise, Azure OpenAI, Mistral Private) promise that this input will not be used for training and that the data will only be kept within that session or within an agreed retention period.

Please note: this is the theory. As with many such claims and promises, they are regularly violated in practice.

Is my data used to train the model?

With most free or standard AI services, your data is used to train the model unless you disable this option or the policy states otherwise. With enterprise or API products, your input is usually not used for training unless you explicitly consent to it.

If data is used for training, it can theoretically reappear in other outputs. Not literally as “copy-paste,” but in summarized or differently formulated forms. This risk is even greater with unique or distinctive data, such as secret product names, legal documents, or internal strategies.

What are the practical risks?

For companies

  • Loss of IP: unique ideas or code can “leak” to an LLM
  • Data theft or leak: sensitive business information can end up on servers beyond your control
  • Compliance issues: possible violation of GDPR, contracts, or NDAs

For individuals

  • Privacy violation: personal data may end up in datasets
  • Identity fraud: sensitive PII may be misused
  • Unintentional sharing: private conversations or confidential documents may become accessible externally

Basic rules for responsible use of AI

  • Do not enter confidential or identifiable data into public LLMs
  • Anonymize data before sharing it (see the sketch after this list)
  • Check the provider's privacy policy
  • Use enterprise versions with clear contracts regarding data storage and processing
  • Turn off “model improvement” if possible
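
For the “anonymize” rule, a minimal sketch in Python (illustrative only, certainly not a complete PII filter; the names, patterns, and example prompt are assumptions): replace obviously identifiable values with placeholders before a prompt leaves your organization.

```python
# Minimal anonymization sketch (illustrative, not a complete PII filter):
# replace known identifiers, e-mail addresses, and phone numbers with
# placeholders before sending a prompt to a public LLM.
import re

# Hypothetical identifiers that should never leave the organization.
KNOWN_NAMES = {"Leukeleu BV": "[CLIENT]", "Company X": "[CUSTOMER]"}

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\+?\d[\d \-]{7,}\d")

def anonymize(text: str) -> str:
    """Replace known names, e-mail addresses, and phone numbers with placeholders."""
    for name, placeholder in KNOWN_NAMES.items():
        text = text.replace(name, placeholder)
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

prompt = ("Leukeleu BV develops an application for Company X; "
          "contact jan@example.com or +31 6 12345678.")
print(anonymize(prompt))
# [CLIENT] develops an application for [CUSTOMER]; contact [EMAIL] or [PHONE].
```

Keep in mind that simple pattern-based redaction misses a lot; for genuinely sensitive material, the first rule (do not enter it at all) remains the safest.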