In our latest research, we at Team Nautilus found that tens of thousands of user tokens are exposed via the Travis CI API, which allows anyone to access historical clear-text logs. More than 770 million logs of free tier users are available, from which you can easily extract tokens, secrets, and other credentials associated with popular cloud service providers such as GitHub, AWS, and Docker Hub. Attackers can use this sensitive data to launch massive cyberattacks and to move laterally in the cloud.
We disclosed our findings to Travis CI, which responded that this issue is “by design”, so all the secrets are currently available. All Travis CI free-tier users are potentially exposed, so we recommend rotating your keys immediately.
We also reported our findings to the respective service providers. Almost all of them were alarmed and quickly responded. Some initiated a wide key rotation, while others verified that at least 50% of our findings were still valid. A few of the vendors even offered us a bounty reward for disclosing the findings.
In this blog post, we present our research findings, which shed light on lateral movement flows in the cloud.
Travis CI API: Past and current research
This issue was reported to Travis CI in the past and was covered in the media in 2015 and 2019, but it has never been fully fixed. In 2015, Travis CI published an incident report notification stating:
“Currently undergoing a distributed attack on our public API that we believe is aimed at revealing GitHub authentication tokens. Countermeasures are holding, and we will update accordingly.”
When Travis CI resolved this issue, they published:
“Recently introduced adaptive rate limiting and blocking has been performing correctly.”
In 2019, a team of researchers (Justin Gardner, Corben Leo, and EdOverflow) wrote a blog post about the Travis CI API and how it can be exploited to extract tokens, secrets, and credentials. Their research focused on bug bounty hunting and how hackers can use secrets in Travis CI logs.
In our research, we replicated some of their steps but also added a few other steps and in-depth analysis of the raw data. Our goal was to understand whether data on CI platforms can be used to launch a cyberattack and if we can observe a plausible lateral movement in the cloud.
Continuous integration, continuous deployment, and access tokens
These days, continuous integration (CI) and continuous delivery/deployment (CD) are a major part of modern development and cloud native application pipelines. For every change to an application, the code is regularly built, tested, and merged to a shared repository. To do this automatically, these environments usually store many secrets, such as access tokens, that let them reach other parts of the cloud or pipeline. In some cases, these access tokens are granted high privileges to read, write, administer, and more. If compromised, such access tokens can lead to data leaks, account takeovers, or even lateral movement across several cloud accounts.
Travis CI research: Main steps
Now that we know what CI/CD and access tokens are, let’s dive into the details of our research.
Our goal was to analyze CI pipelines to better understand the level of risk that could arise from using CI services. We focused on Travis CI, building on the existing research about how its API might render historic logs—up to seven years back—in clear text to an unauthenticated anonymous user.
Fetching the logs
We wanted to examine potential attack vectors in the cloud. In our research, we used the following API:
https://api.travis-ci.org/v3/job/[4280000-774807924]/log.txt
For instance,
https://api.travis-ci.org/v3/job/5248126/log.txt
Based on the Travis CI API manual, we discovered that a valid API call to fetch a clear-text log requires only a log number. As a result, a simple enumeration script can iterate over log numbers, starting from zero, and fetch every available log. This is harder with other vendors because their URLs require an application ID or customer ID (or both), which makes it difficult to enumerate the logs.
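To illustrate why a URL keyed only by a sequential number is easy to enumerate, here is a minimal sketch that builds the log URLs for a range of job numbers. The URL pattern comes from the examples above; the function names are ours and purely illustrative, and the sketch only builds strings without contacting the API.

```python
from typing import Iterator

# URL pattern taken from the examples in this post.
API_BASE = "https://api.travis-ci.org/v3/job"


def job_log_url(job_number: int) -> str:
    """Return the clear-text log URL for a single job number."""
    if job_number < 0:
        raise ValueError("job numbers are non-negative")
    return f"{API_BASE}/{job_number}/log.txt"


def candidate_urls(first: int, last: int) -> Iterator[str]:
    """Yield the log URL for every job number in [first, last]."""
    for job_number in range(first, last + 1):
        yield job_log_url(job_number)
```

Because no application ID or customer ID appears in the path, the only unknown is an integer, which is what makes enumeration straightforward.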
During our research, we examined various API calls to fetch logs. We used the one from the research mentioned above. However, we also found another API call, available via the documented API at https://api.travis-ci.org/logs/XXX, which allowed us to access new logs that weren’t accessible before (possibly deleted historic logs).
The “XXX” marks a specific log number that ranges between one and hundreds of millions. This API call is designed to fetch archived logs by pointing to a Travis CI S3 bucket while generating a one-time token to that bucket. After running some API calls, we speculate that this method allows an unauthenticated anonymous user to fetch all the logs.
We found that when sending this API call, you get access to a sporadic log:
https://api.travis-ci.org/logs/1
which redirects to:
https://s3.amazonaws.com/archive.travis-ci.org/jobs/4670478/log.txt?X-Amz-Expires=30&X-Amz-Date=202206...
This same log can be fetched from this API with the corresponding log number:
https://api.travis-ci.org/v3/job/4670478/log.txt
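The relationship between the redirect target and the job-based API call can be sketched as follows. This is a minimal illustration that assumes the archive URL always has the path layout shown in the example above; the function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Path layout assumed from the redirect example in this post.
_ARCHIVE_PATH = re.compile(r"^/archive\.travis-ci\.org/jobs/(\d+)/log\.txt$")


def job_number_from_archive_url(url: str) -> Optional[int]:
    """Extract the job number from an S3 archive URL, or None if it does not match."""
    match = _ARCHIVE_PATH.match(urlsplit(url).path)
    return int(match.group(1)) if match else None


def v3_log_url(job_number: int) -> str:
    """Build the corresponding job-based log URL."""
    return f"https://api.travis-ci.org/v3/job/{job_number}/log.txt"
```

The signed query string (X-Amz-Expires, X-Amz-Date, and so on) is ignored here, since only the path carries the job number.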
The interesting finding is that we can fetch the logs that were previously unavailable via the API https://api.travis-ci.org/v3/job/<log_number>/log. Now, we can access and analyze these logs as well and find secrets in them.
Example:
https://api.travis-ci.org/logs/6976822
which redirects to:
https://s3.amazonaws.com/archive.travis-ci.org/jobs/13575703/log.txt?X-Amz-Expires=30&X-Amz-Date=202206…
This same log can’t be fetched via this API with the corresponding log number:
https://api.travis-ci.org/v3/job/13575703/log.txt
We ran some experiments and found that the oldest logs were from January 2013 and the latest from May 2022 and that the valid range of logs is between about 4,280,000 and 774,807,924. This means that potentially about 770 million logs are exposed.
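To make the size of that range concrete, the sketch below computes it and draws a uniform random sample of job numbers from it. The bounds are the figures quoted above; the function names and the seed parameter are ours, and nothing here contacts the API.

```python
import random
from typing import List, Optional

# Bounds of the valid range as reported in this post.
FIRST_JOB = 4_280_000
LAST_JOB = 774_807_924


def range_size(first: int = FIRST_JOB, last: int = LAST_JOB) -> int:
    """Number of job numbers in the inclusive range [first, last]."""
    return last - first + 1


def sample_job_numbers(k: int, seed: Optional[int] = None) -> List[int]:
    """Draw k distinct job numbers uniformly at random from the valid range."""
    rng = random.Random(seed)
    # Sampling from a range object avoids materializing ~770 million integers.
    return rng.sample(range(FIRST_JOB, LAST_JOB + 1), k)
```

Here range_size() returns 770,527,925, which matches the "about 770 million" figure.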
Extracting credentials from the logs
As the next step, we took two random samples. First, we randomly sampled 20,000 logs and found that while not all logs in this range are available, some are returned in clear text, and a few reveal access tokens and credentials. Then, we took another sample of about 8 million requests (about 1%) and, after cleaning the data, ended up with about 73,000 tokens, secrets, and various credentials in these log files. These secrets were associated with various cloud services, including GitHub, AWS, and Docker Hub. Below are some examples:
Travis CI throttles the rate of API calls, which hinders the ability to query the API. In this case, however, this was not enough. A skilled threat actor can find a workaround to bypass this. In addition, some of the data in the historic logs is masked; you can occasionally see a string “[secure]” where once there was a clear-text secret.
There’s no doubt that Travis CI makes an effort to obfuscate secrets and tokens that are displayed in the logs. Indeed, during our research most of the tokens in the logs were censored. Furthermore, Travis CI provides recommendations on how to avoid leaking secrets such as deleting build logs manually and rotating secrets periodically.
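To show why masking of this kind can be incomplete, here is a minimal sketch that assumes masking works by exact string replacement of known secret values. That is an assumption for illustration only, not a description of Travis CI's implementation, and the function name and sample values are hypothetical.

```python
import base64
from typing import Iterable

MASK = "[secure]"


def mask_secrets(line: str, secrets: Iterable[str]) -> str:
    """Replace every exact occurrence of a known secret value with the mask."""
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:
            line = line.replace(secret, MASK)
    return line


# Hypothetical secret value, used only for this demonstration.
secret = "s3cr3t-token-value"
plain = mask_secrets(f"export TOKEN={secret}", [secret])
encoded_line = "auth: " + base64.b64encode(secret.encode()).decode()
still_exposed = mask_secrets(encoded_line, [secret])
```

In this sketch, plain comes out masked, while still_exposed is unchanged because the Base64-encoded form no longer contains the literal secret. Any transformation of a secret before it is printed (encoding, quoting, embedding in a URL) can defeat exact-match masking in the same way.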
However, combining the ease of accessing the logs via the API, incomplete censoring, accessing “restricted” logs, and a weak process for rate limiting and blocking access to the API, coupled with a large number of potentially exposed logs, results in a critical situation.
Nevertheless, there are many conventions to print secrets, passwords, and tokens in logs, and most of them remained in clear text. For instance, we found that in many cases “github_token” was masked and didn’t disclose any secrets. However, we found about 20 variations of this token that weren’t masked by Travis CI:
Using Aqua Trivy to scan logs for secrets
Even if developers follow coding best practices, they can sometimes unknowingly publish hard-coded secrets such as passwords or use third-party code and container images that might introduce them. With Trivy Open Source, you can easily scan targets for hard-coded secrets. Trivy scans any container image, filesystem, or Git repository for exposed passwords, API keys, or tokens. We used Trivy to scan the logs we obtained.
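As a rough sketch of how such a scan could be automated, the snippet below builds a Trivy filesystem scan restricted to secret detection and summarizes the JSON report. It assumes a recent Trivy release in which the secret scanner is selected with --scanners secret (older releases used a different flag), and the JSON field names it reads (Results, Target, Secrets, RuleID, StartLine) are assumptions about the report layout that should be checked against your Trivy version. The function names are ours.

```python
import json
import subprocess
from typing import Any, Dict, List


def trivy_secret_scan_command(log_dir: str) -> List[str]:
    """Build a Trivy filesystem scan command limited to secret detection."""
    return ["trivy", "fs", "--scanners", "secret", "--format", "json", log_dir]


def summarize_secrets(report: Dict[str, Any]) -> List[str]:
    """Return one 'target:line rule' string per finding in a parsed report."""
    findings = []
    for result in report.get("Results") or []:
        for secret in result.get("Secrets") or []:
            findings.append(
                f"{result.get('Target')}:{secret.get('StartLine')} {secret.get('RuleID')}"
            )
    return findings


def scan_logs(log_dir: str) -> List[str]:
    """Run Trivy on a directory of logs; requires the trivy binary on PATH."""
    completed = subprocess.run(
        trivy_secret_scan_command(log_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_secrets(json.loads(completed.stdout))
```

Calling scan_logs("./logs") on a directory of downloaded log files would return a list of findings, or raise if Trivy is missing or exits with an error.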
Our data shows that in 42% of the cases, we can get a valid log following a random API call, meaning that for every 100 API calls, we can end up with 42 valid logs. Out of these logs, we can extract secrets.
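The arithmetic behind these figures can be worked through in a few lines. This sketch assumes the 42% rate also applies to the sample of about 8 million requests mentioned earlier, which the text does not state explicitly; the function names are ours.

```python
def expected_valid_logs(requests: int, valid_rate: float) -> int:
    """Expected number of valid logs for a given number of random API calls."""
    return round(requests * valid_rate)


def secrets_per_thousand_requests(secrets: int, requests: int) -> float:
    """Average number of secrets recovered per 1,000 API calls."""
    return secrets * 1000 / requests


# Figures quoted in this post.
VALID_RATE = 0.42
SAMPLE_REQUESTS = 8_000_000
SECRETS_FOUND = 73_000

valid_in_sample = expected_valid_logs(SAMPLE_REQUESTS, VALID_RATE)
yield_per_thousand = secrets_per_thousand_requests(SECRETS_FOUND, SAMPLE_REQUESTS)
```

Under that assumption, the 8 million requests would correspond to roughly 3.36 million valid logs, and the 73,000 secrets work out to about nine secrets per 1,000 requests.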
After sorting out these secrets, we found a lot of access tokens to various environments. Here are a few examples:
- Access tokens to GitHub that may allow privileged access to code repositories
- AWS access keys
- Sets of credentials, typically an email or username and password, that allow access to databases such as MySQL and PostgreSQL
- Docker Hub passwords, which may lead to account takeover if multi-factor authentication isn’t activated
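A minimal sketch of pattern-based detection for some of these credential types is shown below. The patterns are simplified illustrations, not the rules used in our research: they cover AWS access key IDs with the AKIA or ASIA prefix, GitHub tokens in the newer prefixed format (older tokens are plain 40-character hexadecimal strings and are not matched here), and database URLs with embedded credentials. Real scanners use far larger and more precise rule sets.

```python
import re
from typing import List, Tuple

# Simplified, illustrative patterns; not an exhaustive rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "database_url": re.compile(
        r"\b(?:mysql|postgres(?:ql)?)://[^\s:@/]+:[^\s@/]+@[^\s/]+"
    ),
}


def find_secrets(text: str) -> List[Tuple[str, str]]:
    """Return (kind, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits
```

For example, a log line containing the AWS documentation placeholder AKIAIOSFODNN7EXAMPLE would be reported as an aws_access_key_id hit.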
Attacks in the cloud: Several use cases
We started this research by asking ourselves, “How can we detect lateral movement in the cloud?” That question led us to analyze CI/CD environments and to ask how secrets can be found in them. We were surprised to see previous publications about clear-text logs exposed via the Travis CI API and amazed to find that this is still possible after the issue was disclosed to Travis CI.
With the tokens obtained during our research, we simulated some attack scenarios for lateral movement in the cloud. Below are some key scenarios.
Use case: Lateral movement in the cloud
We found thousands of GitHub OAuth tokens. It’s safe to assume that at least 10% to 20% of them are live, especially those that were found in recent logs. In our cloud lab, we simulated a lateral movement scenario based on the following flow, starting with initial access:
- Extraction of a GitHub OAuth token via exposed Travis CI logs
- Discovery of sensitive data (e.g., AWS access keys) in private code repositories using the exposed token
- Lateral movement attempts with the AWS access keys in the AWS S3 bucket service
- Cloud storage object discovery via bucket enumeration
- Data exfiltration from the target’s S3 to attacker’s S3
Use case: Discovery and intelligence collection
Logs often contain restricted sensitive information, such as names of code packages and their dependencies. An attacker can collect the names of code packages and look for missing or problematic code packages or their dependencies. Then, an attacker can claim these code packages and plant them in the package manager, causing users to download a malicious code package.
In some Travis CI logs, we found IP ranges that could help an attacker to discover further assets. In addition, we’ve seen internal Domain Name System (DNS) indications, which can also be exploited by a skilled attacker to expand the attack surface.
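To audit your own build logs for package names that leak this way, a sketch like the following could be a starting point. It assumes Python builds whose logs contain pip's "Collecting <name>" lines; other package managers print different output and would need their own patterns. The function name is hypothetical.

```python
import re
from typing import List

# Matches pip install output lines such as "Collecting requests==2.27.1".
_COLLECTING = re.compile(
    r"^\s*Collecting\s+([A-Za-z0-9][A-Za-z0-9._-]*)", re.MULTILINE
)


def package_names(log_text: str) -> List[str]:
    """Return the distinct package names pip reported, in first-seen order."""
    seen = {}
    for match in _COLLECTING.finditer(log_text):
        # Lowercase only; this is a simplification of full name normalization.
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)
```

Names from this list that are meant to be internal, yet are not registered on the public index, are the ones worth reviewing first.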
Furthermore, we also found some URLs to S3 buckets with access tokens. These may lead to data leaks from those specific buckets if network rules don’t restrict inbound traffic to the buckets.
Use case: Software supply chain attack with code repositories and registries
We found dozens of pairs of credentials to a specific container image registry. Some of them may belong to a large organization or a popular open-source project. On their own, these credentials looked aligned with security best practices (long password, uppercase characters, special characters, and digits). For an average threat actor, it would be challenging to guess these passwords. But if these credentials had fallen into the wrong hands, attackers would be able to get privileged access to public and private registries.
One possible scenario is to steal proprietary data from private registries. A more sinister goal could be to poison these container images with malicious code such as backdoors or malware, to allow access to the environments where they’re deployed. In the case of a popular open-source project, this could lead to a larger attack aimed at infecting the broader community.
Use case: Source code theft
On April 15, 2022, GitHub issued a severe warning, stating that an attacker was able to gain access to some repositories using stolen OAuth tokens issued to Travis CI and Heroku. In most cases, they found that the attacker only listed all the user’s organizations. Then, the attacker selectively chose targets and listed the private repositories for user accounts of interest. Finally, the attacker proceeded to clone some of those private repositories.
Mapping the attacks to the MITRE ATT&CK framework
Here we map the components in the attacks described above to the corresponding techniques of the MITRE ATT&CK framework:
Mitigation
There are a few recommendations you can follow to mitigate these risks and protect your CI environments:
- Establish a rotation policy for keys, tokens, and other secrets.
- Apply the least-privilege principle to keys and tokens when applicable.
- Don’t print secrets, tokens, or credentials in logs.
- Regularly scan your artifacts for secrets.
- Use a cloud security posture management (CSPM) solution that indicates optimal time to rotate keys. You can scan your account, check the rotation cadence, and see if you applied the least-privilege policy.
- Scan your CI/CD environment with a supply chain security solution such as Argon to find exposed secrets, tokens, and credentials and make sure that your account configuration is aligned with best practices.
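As a small illustration of the first recommendation, the sketch below flags keys whose age exceeds a rotation policy. The key names, the 90-day threshold in the usage note, and the function name are all hypothetical; in practice the creation dates would come from your cloud provider or secrets manager.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def keys_due_for_rotation(
    created_at: Dict[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the names of keys older than max_age, sorted alphabetically.

    All datetimes are expected to be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name for name, created in created_at.items() if now - created > max_age
    )
```

For example, with a 90-day policy, a key created on January 1, 2021 and checked on June 1, 2022 would be flagged, while one created on May 1, 2022 would not.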