Microsoft AI researchers inadvertently disclosed 38TB of data.

Cybersecurity company Wiz has uncovered an incident in which Microsoft AI researchers inadvertently disclosed 38TB of data.

The incident, which resulted from a misconfigured SAS token and was publicly reported on September 18, 2023, exposed over 30,000 internal Microsoft Teams messages through Microsoft’s AI GitHub repository.


Over 30,000 internal Microsoft Teams messages were exposed. Image source: Wiz


The Wiz research team reported that while publishing a bucket of open-source training data on GitHub, the Microsoft AI team unintentionally exposed 38TB of private data, including two disk backups of employees’ workstations.

Two disk backups of employees’ workstations were exposed. Image source: Wiz

Sensitive information such as secrets, passwords, private keys, and more than 30,000 internal Microsoft Teams messages was leaked. The Microsoft AI researchers shared these files using an Azure feature known as SAS tokens, which allows users to share data from Azure Storage accounts.

This feature can be configured to allow access to specific files only. The mistake occurred when the Microsoft AI research team configured the access token to share the entire storage account, which also included 38TB of private files.

In a statement on the Wiz blog:

“As part of the Wiz Research Team’s ongoing work on accidental exposure of cloud-hosted data, the team scanned the internet for misconfigured storage containers. In this process we found a GitHub repository under the Microsoft Organization named robust-models-transfer.”

In the repository provided by the Microsoft AI research team, users were instructed to download the models from an Azure Storage URL. Unknown to the Microsoft AI team, that URL granted access not just to the open-source models but to the entire storage account, including sensitive data.


Access was granted to not just the open-source models, but the entire storage account and sensitive data as well.


Furthermore, Wiz stated:

“Our scan shows that this account contained 38TB of additional data – including Microsoft employees’ personal computer backups. The backups contained sensitive personal data, including passwords to Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees.”

In addition to the overly broad access scope, the token was misconfigured to allow “full control” permissions. This would enable an attacker to perform CRUD operations (create, read, update, and delete) on the files, not just view them.

The repository was originally intended to provide AI models for use in training code. The SAS (Shared Access Signature) link was given to users so they could download a model data file and feed it into a script. The file is in ckpt format, which is serialized with Python’s pickle and is therefore prone to arbitrary code execution. Because the token allowed writes, a threat actor could have injected arbitrary code into the models, which would have affected every user who trusts Microsoft’s GitHub repository.
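To make the pickle risk concrete, here is a minimal sketch of how loading a pickled file can run code chosen by whoever wrote the file. The Payload class, the NoGlobalsUnpickler class, and the load_untrusted helper are hypothetical names made up for this illustration, and the payload only calls print(); nothing here is taken from the actual incident.

```python
import io
import pickle


class Payload:
    """Stand-in for a tampered checkpoint (hypothetical, for illustration)."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments" and replays
        # it at load time. A harmless print() is used here; an attacker could
        # name any importable callable instead.
        return (print, ("code ran during unpickling",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global, so no callable can run."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def load_untrusted(blob):
    """Load a pickle while rejecting every global lookup (plain data only)."""
    return NoGlobalsUnpickler(io.BytesIO(blob)).load()


blob = pickle.dumps(Payload())

# Plain loading calls print() as a side effect and returns its result.
result = pickle.loads(blob)

# The restricted loader rejects the same bytes before anything is called.
try:
    load_untrusted(blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

The restricted loader above accepts only plain data such as dicts, lists, and numbers. Real model checkpoints usually reference classes, so in practice an allow-list of trusted globals would probably be needed instead of a blanket block.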

Wiz further emphasized:

“However, it’s important to note this storage account wasn’t directly exposed to the public; in fact, it was a private storage account.”

This unintended disclosure resulted from a few poor SAS security practices: permissions (the token configuration granted high-level access to the entire storage account) and hygiene (the SAS token had a very long, effectively unlimited lifetime; in this case Microsoft set the SAS token expiry to 2051).

Wiz suggested a few SAS (Shared Access Signature) security recommendations:
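Both problems named above, broad permissions and a distant expiry, are visible in the query string of a SAS URL, so they can be checked before a link is shared. The sketch below is one possible check, not a Wiz or Microsoft tool. It relies on the standard SAS query parameters sp (permissions), se (expiry), and srt (resource types, present on account-level tokens). The seven-day limit, the audit_sas_url function name, and both example URLs are assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Hypothetical policy thresholds, chosen only for illustration.
MAX_LIFETIME = timedelta(days=7)
RISKY_PERMISSIONS = set("wdac")  # write, delete, add, create


def audit_sas_url(url, now):
    """Return a list of human-readable findings for one SAS URL."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    findings = []

    # Account SAS tokens carry `srt`; service SAS tokens carry `sr` instead.
    if "srt" in params:
        findings.append("account-level SAS (not limited to one container or blob)")

    risky = RISKY_PERMISSIONS & set(params.get("sp", ""))
    if risky:
        findings.append("grants more than read/list: " + "".join(sorted(risky)))

    expiry = params.get("se")
    if expiry is None:
        findings.append("no expiry in the token itself")
    else:
        # Accept both a trailing "Z" and a date-only value.
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at - now > MAX_LIFETIME:
            findings.append(f"long-lived: expires {expires_at.date()}")
    return findings


# Made-up URLs; the real token's parameters are not reproduced here.
now = datetime(2023, 6, 22, tzinfo=timezone.utc)
broad = ("https://exampleaccount.blob.core.windows.net/"
         "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&se=2051-10-06T00:00:00Z&sig=REDACTED")
scoped = ("https://exampleaccount.blob.core.windows.net/models/model.ckpt"
          "?sv=2020-08-04&sr=b&sp=r&se=2023-06-23T00:00:00Z&sig=REDACTED")

broad_findings = audit_sas_url(broad, now)
scoped_findings = audit_sas_url(scoped, now)
```

With these inputs the broad token produces three findings (account-level scope, write-type permissions, and the 2051 expiry), while the read-only, one-day token for a single blob produces none.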

  • SAS Token Management:

    An Account SAS token should be considered as sensitive as the account key itself, so Account SAS should not be used for external sharing. If external sharing is required, use a Service SAS with a Stored Access Policy; for time-limited sharing, use a User Delegation SAS. To avoid SAS tokens altogether, disable SAS access on the storage account and block access to the operation that lists storage account keys.

  • SAS Token Monitoring:

    Storage Analytics logs should be enabled to track SAS token usage. The logs record SAS token access, including the signing key and the assigned permissions, although this incurs additional cost for accounts with extensive activity. Azure Metrics can also be used to monitor SAS token usage in storage accounts.

  • SAS Token Secret Scanning:

    Secret scanning tools should be used to detect leaked or over-privileged SAS tokens in publicly exposed assets, such as mobile apps, websites, and GitHub repos.

Wiz also noted that its customers can use its secret scanning tools to detect SAS tokens in internal and external assets and explore their permissions. In addition, they can use Wiz CSPM to track storage accounts with SAS support.
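As a rough idea of what the secret scanning recommendation involves, the sketch below searches text (for example a README or a source file) for URLs that look like Azure Storage links carrying a SAS signature. The regular expression is a heuristic assumed for this example rather than an official grammar, find_sas_urls is a hypothetical helper name, and the sample text is made up. Dedicated scanners cover far more cases.

```python
import re

# Characters allowed inside a URL for this heuristic: stop at whitespace,
# quotes, angle brackets, and parentheses (common delimiters in docs and code).
_URL_CHARS = r"[^\s\"'<>()]"

# Assumed shape: an Azure Storage host, then a query string that contains a
# `sig=` parameter, which holds the SAS signature.
SAS_URL_RE = re.compile(
    r"https://[a-z0-9]{3,24}\.(?:blob|file|queue|table|dfs)\.core\.windows\.net"
    + _URL_CHARS + r"*\?" + _URL_CHARS + r"*\bsig=" + _URL_CHARS + r"+",
    re.IGNORECASE,
)


def find_sas_urls(text):
    """Return every substring of `text` that looks like a SAS URL."""
    return [match.group(0) for match in SAS_URL_RE.finditer(text)]


sample = (
    'wget "https://exampleaccount.blob.core.windows.net/models/a.ckpt'
    '?sv=2020-08-04&sp=r&sig=abc%3D"\n'
    "docs: https://exampleaccount.blob.core.windows.net/public/readme.txt\n"
    "other: https://example.com/?sig=1\n"
)
hits = find_sas_urls(sample)
```

Only the first URL in the sample is reported: the second has no signature and the third is not an Azure Storage host. Each hit could then be checked for its permissions and expiry before deciding how urgent the revocation is.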


Microsoft has since disabled the AI model repo. Here is the timeline of the disclosure:

  • Jul 20, 2020 – SAS token first committed to GitHub, with expiry set to Oct 5, 2021.
  • Oct 6, 2021 – SAS token expiry updated by Microsoft to Oct 6, 2051.
  • Jun 22, 2023 – Wiz Research discovers the issue and notifies MSRC.
  • Jun 24, 2023 – Microsoft invalidates the SAS token.
  • Jul 7, 2023 – SAS token updated on GitHub.
  • Aug 16, 2023 – Microsoft concludes its internal investigation of the impact.
  • Sep 18, 2023 – Public disclosure of the incident.

Summarizing the Disclosure:

The primary cause of the exposure was the use of SAS tokens as a means of sharing data. This approach lacks adequate monitoring and governance, and here it resulted in the inadvertent disclosure of 38TB of data. The security implications strongly suggest limiting the use of SAS tokens.

Furthermore, Azure offers no centralized way to track issued SAS tokens, which makes it difficult to identify over-permissive or compromised tokens. Consequently, SAS tokens with very long lifetimes should be avoided, and Account SAS tokens should not be used for external file sharing. For more information, see the Wiz write-up of the SAS token disclosure and Microsoft’s documentation.



