urlscan.io is a free and paid tool used to scan and analyse URLs. It is often used by Security Analysts and employees working in a SOC, and is also available as an integration add-on in several popular security tools such as Splunk SOAR and Cortex XSOAR. This post focuses on the Search functionality in urlscan.io and how it can be abused to extract sensitive content due to tooling misconfigurations or accidental information leakage.
First, a short explanation of how search works on urlscan.io. On the home page, the default scan option is a ‘Public Scan’. This means that any URL submitted to the platform will be searchable by anyone on the Internet. Just like Google Dorking, the search functionality has filters that can be used to drill down to specific types of results. Fairly simple so far, right? Who’s gonna put a sensitive link on a public scan? Well, that’s where the issue comes in; specifically, there are two use cases to consider.
Use case one: Manual submissions. Consider the screenshot below; there are two main properties to note, the submitted URL and the effective URL. A user has submitted a short link from their Vodafone Rewards site (submitted URL), which then redirected to a unique URL (effective URL) containing a free movie ticket. This information is now available on the Internet for anyone to use unless the user gets it removed from urlscan.io (and it is unlikely the user was aware the scan was public in the first place).
The result above (confirmed to be expired at the time of publishing) was discovered using the following search filter:
page.domain:"nz" AND page.url:"voucher"
Additionally, consider this scenario: a SOC Analyst is in a hurry, trying to analyse a URL found in a potential phish sent to an executive. Unbeknownst to the analyst, the link is legitimate and redirects to a unique site with sensitive information on it. The analyst Googles for a URL scanning tool and lands on the urlscan.io home page, which has an inviting wide search bar and a big green button. They quickly enter the URL in the search bar and press enter, which has now resulted in unintended information leakage because the analyst was not paying close attention. While the analyst is at fault here, it does not help that there is no warning or confirmation before a public scan is submitted.
Here is another example, discovered with an adapted version of the filter above (page.domain:"nz" AND page.url:"reset"), where a password reset link has been made available for anyone to use.
This technique can be adapted to search for password reset links, PDF invoices with personal details and endless other use cases that rely on unique URLs. Sensitive public submissions have been observed from various large corporations and government agencies, so this is not something limited to small shops. Sometimes it is the end user submitting these links themselves, as opposed to a Security Analyst or someone reviewing a phish.
You can also expand this technique further with file hashes. For example, a phishing kit might use the same CSS file across its deployments. You can take the SHA256 hash of that CSS file and search urlscan.io for it in order to see submissions involving that phishing kit. The same approach works against off-the-shelf software deployments: for example, if WordPress password reset pages shared the same core CSS file, the hash of that file could be used to find submissions that have exposed their password reset links. A sketch of this pivot is shown below.
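As a rough illustration, assuming the hash: search field matches SHA256 hashes of page resources (as described in urlscan.io's search documentation) and using a placeholder CSS URL:

```python
import hashlib
import requests

# Hypothetical example: fetch a CSS file from a known phishing-kit deployment,
# hash it, then pivot on that hash across public urlscan.io submissions.
# The CSS URL below is a placeholder, not a real kit.
CSS_URL = "https://example.com/static/kit.css"

css_bytes = requests.get(CSS_URL, timeout=30).content
sha256 = hashlib.sha256(css_bytes).hexdigest()

resp = requests.get(
    "https://urlscan.io/api/v1/search/",
    params={"q": f"hash:{sha256}", "size": 100},
    timeout=30,
)
resp.raise_for_status()

# Every hit is a public submission that loaded the same resource.
for result in resp.json().get("results", []):
    print(result.get("page", {}).get("url"))
```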
Use case two: Misconfigured tooling, whether due to lack of knowledge or lack of budget. Using the Cortex XSOAR urlscan.io integration as an example, the scan type configuration field defaults to ‘Public’, and unless an engineer specifically changes that (or an analyst runs the command with the private parameter), the results will be searchable using the methods above. Note that this behaviour may have changed in newer versions or may no longer apply where commands have been deprecated. Additionally, some smaller shops do not have the budget for the Pro version of urlscan.io and default to free-tier public scans for operational needs, in order to avoid the limits the free tier imposes on other scan types. When submitting scans programmatically, the safest habit is to set the visibility explicitly, as sketched below.
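A minimal sketch, assuming the /api/v1/scan/ submission endpoint and its visibility field ("public", "unlisted" or "private") as documented by urlscan.io at the time of writing; the environment variable name is an assumption:

```python
import os
import requests

# Set the visibility explicitly rather than relying on a tool's default.
API_KEY = os.environ["URLSCAN_API_KEY"]  # assumption: key stored in an env var

resp = requests.post(
    "https://urlscan.io/api/v1/scan/",
    headers={"API-Key": API_KEY, "Content-Type": "application/json"},
    json={
        "url": "https://example.com/suspicious-link",  # placeholder URL
        "visibility": "private",  # never leave this to a default
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("uuid"))  # scan ID for retrieving the result later
```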
This post is intended for educational purposes, and the techniques mentioned here can be utilised by analysts or TI team members to search for information leakage against their own organisation's domains. A final sketch of such a self-monitoring search follows.
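A minimal defensive sketch, assuming illustrative placeholder domains and keywords; swap in your own organisation's domains:

```python
import requests

# Periodically search public urlscan.io submissions for your own domains
# combined with sensitive URL keywords. Both lists below are placeholders.
DOMAINS = ["example.com", "example.co.nz"]
KEYWORDS = ["reset", "voucher", "invoice", "token"]

for domain in DOMAINS:
    for keyword in KEYWORDS:
        query = f'page.domain:"{domain}" AND page.url:"{keyword}"'
        resp = requests.get(
            "https://urlscan.io/api/v1/search/",
            params={"q": query, "size": 100},
            timeout=30,
        )
        resp.raise_for_status()
        for result in resp.json().get("results", []):
            # Anything returned here is publicly searchable -- triage and
            # request removal from urlscan.io if it is sensitive.
            print(query, "->", result.get("page", {}).get("url"))
```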