
<!DOCTYPE html>
<html lang="en">

    <head>
        <title>Accused of Hacking | Paul&#x27;s Site of Stuff</title>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">
    <meta name="robots" content="noodp"/>

    <link rel="stylesheet" href="https://paulwilde.uk/style.css">
    <link rel="stylesheet" href="https://paulwilde.uk/color/orange.css">

        <link rel="stylesheet" href="https://paulwilde.uk/color/background_blue.css">

    <link rel="stylesheet" href="https://paulwilde.uk/font-hack-subset.css">

    <meta name="description" content="">

    <meta property="og:description" content="">
    <meta property="og:title" content="Accused of Hacking | Paul's Site of Stuff">
    <meta property="og:type" content="article">
    <meta property="og:url" content="https://paulwilde.uk/ponderings/robots-aint-hacking/">

    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:description" content="">
    <meta name="twitter:title" content="Accused of Hacking | Paul's Site of Stuff">
    <meta property="twitter:domain" content="paulwilde.uk">
    <meta property="twitter:url" content="https://paulwilde.uk/ponderings/robots-aint-hacking/">


        <link rel="alternate" type="application/atom+xml" title="RSS" href="https://paulwilde.uk/atom.xml">


        <link rel="shortcut icon" type="image/png" href="/favicon.png">

        <script defer data-domain="paulwilde.uk" src="https://plausible.io/js/script.js"></script>

    </head>
    <body class="">
        <div class="container">

            <header class="header">
                <div class="header__inner">
                    <div class="header__logo">

                        <a href="https://paulwilde.uk" style="text-decoration: none;">
                            <div class="logo">

                                Hello, I&#x27;m Paul

                            </div>
                        </a>
                    </div>
                </div>


                <nav class="menu">
            <ul class="menu__inner">
                <li><a href="/">home</a></li>

                <li><a href="/aboutme">about me</a></li>

                <li><a href="/ponderings">ponderings</a></li>

                <li><a href="/iuse">i use …</a></li>

                <li><a href="/tags">tags</a></li>

                <li><a href="/atom.xml">rss</a></li>
            </ul>
        </nav>


            </header>


    <div class="post">

<h1 class="post-title"><a href="https://paulwilde.uk/ponderings/robots-aint-hacking/">Accused of Hacking</a></h1>
<div class="post-meta-inline">

<span class="post-date">
    2025-01-06
    </span>

</div>


<span class="post-tags-inline">
    :: tags:&nbsp;
    <a class="post-tag" href="https://paulwilde.uk/tags/cyber-security/">#cyber security</a>&nbsp;
    <a class="post-tag" href="https://paulwilde.uk/tags/privacy/">#privacy</a>&nbsp;
    <a class="post-tag" href="https://paulwilde.uk/tags/rant/">#rant</a>&nbsp;
    <a class="post-tag" href="https://paulwilde.uk/tags/robots-txt/">#robots.txt</a>&nbsp;
    <a class="post-tag" href="https://paulwilde.uk/tags/tech/">#tech</a></span>


        <div class="post-content">
    <p>About 5 years ago, during a security sweep of a new "app" a
client had started using, I discovered the hosting company's website
had a robots.txt giving the paths of many pages on their website containing
sensitive information.<br />
The whole thing is starting to snowball, so this is my statement.</p>
<span id="continue-reading"></span><h3 id="the-app">The "App"</h3>
<p>For those that don't know me, I am an IT Professional, providing IT consultancy
services to businesses around my local area of Devon, UK.<br />
I'm generally met with people who are happy to see me and trust my advice.
Sometimes though that is not the case.</p>
<p>My client in question is a medium-sized business in Devon. They employ
approximately 70-100 staff members and have been in operation for over 60 years.<br />
They are a well known local business.<br />
As with most businesses of this size, they have naturally outsourced some
workloads to third parties; in this case, Health and Safety. They
use the services of a Chartered Health and Safety Consultancy used by many other local businesses.<br />
For whatever reason, this H&amp;S company had decided they could make "apps" for their
clients to assist with entering H&amp;S information amongst other things. Seems like a good idea, sure... but...</p>
<p>...These "apps" are not apps at all, they are in fact simple html index pages linking to various Jotform Forms, Google Sheets, etc. These html index pages are, for the most part, <em>unauthenticated</em>. This means anybody who knows the link to that page can access those forms and spreadsheets.</p>
<p>As an example, if a random passer-by on the internet were to go to a URL like <code>https://thecompanyswebsite.com/someothercompany_portal.html</code> they would be shown a list of links
to various Health and Safety, Vehicle checks, holiday request forms, etc. All of which could be filled out without having to prove who they are.</p>
<h3 id="why-is-this-bad">Why is this bad?</h3>
<p>OK, well maybe that doesn't look so bad. I mean, you'd have to know what the path of the page is (the <code>someothercompany_portal.html</code> bit) right? Well, yes, you would but that
should never be a defense, ever. Even if a URL is difficult to guess (it's not), it does not mean it's impossible.</p>
<h3 id="robots-txt">Robots.txt</h3>
<p>Yep, the portal URL may be difficult to guess (it's not), and if it were just that, things might be OK.<br />
But...<br />
This Health and Safety Company also had the good sense not to let search engines index
those portal pages. We don't want people's sensitive information being indexed by Google, do we! Very Good.</p>
<p>Ah...</p>
<p>The robots.txt looks something like this:
<img src="/images/robotstxt.png" alt="A robots.txt file, showing many lines of portal URL addresses" /><br />
There are about 510 lines like that.<br />
(You'll notice I have taken the care to pixelate the paths. I appreciate
pixelation can be undone, but because you'd need to know the base URL
for this to work, and if you had the base URL you'd have the full robots.txt file
anyway, I feel no further action on my part needs to be taken here.)</p>
<p>For the uninitiated, this robots.txt asks crawlers not to visit those paths, so well-behaved ones such as Google, Bing, DuckDuckGo, etc. will not try to index any content on them. However, compliance is entirely voluntary, so this does <em>absolutely nothing</em> to stop
a badly behaved or malicious web crawler or bot. In fact, this file acts as a lovely index telling that crawler exactly which pages you <em>don't</em> want it to see.</p>
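<p>To make that concrete, here is a minimal Python sketch of the point above. It is not taken from the real site: the robots.txt content, the paths and the <code>example.com</code> base URL are all invented for illustration, and the helper names are my own. It shows that a rule-following crawler is told to stay away, while anyone else can simply read the same lines as a list of targets.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; these paths are invented for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /clienta_portal.html
Disallow: /clientb_portal.html
Disallow: /internal_app.html
"""


def disallowed_paths(robots_txt):
    """Return every path named on a Disallow line, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def polite_crawler_may_fetch(robots_txt, url):
    """Ask the standard-library parser what a rule-following crawler would do."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    base = "https://example.com"  # placeholder base URL, not a real target
    for path in disallowed_paths(SAMPLE_ROBOTS_TXT):
        url = base + path
        allowed = polite_crawler_may_fetch(SAMPLE_ROBOTS_TXT, url)
        print(url, "polite crawler allowed:", allowed)
```

<p>The only thing standing between a visitor and those pages is the crawler's decision to honour the file. Nothing here involves authentication, because robots.txt does not provide any.</p>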
<h3 id="privacy">Privacy</h3>
<p>Once again, at face value this just looks like someone could gain access to fill out forms and be a nuisance to the client business. However, on further investigation I
have discovered that some of these URLs contain plaintext information such as
names, email addresses, and telephone numbers of staff members.
In one particularly negligent case (which I only found today), the page exposes the names of parents and their children, including dates of birth, telephone numbers and email addresses of the parents <em>plus</em> their signatures, <em>and</em> the names, email addresses, telephone numbers and signatures of the child's sporting coach.</p>
<p>That last case is extreme, but it's there, in plain sight, for anyone who happens
to find the right base URL and path of that particular case.</p>
<p>In short, This. Is. Negligent.</p>
<h3 id="what-i-have-done-so-far">What I have done so far</h3>
<p>As I said, I first noticed this about 5 years ago, and immediately informed both my client and
the Health and Safety company involved. I ended up being invited to a meeting with them all, where I explained my findings and told them these pages should, at the very least, be behind some kind of authentication. From this they decided they would put in
some authentication, but only as groups, i.e. office@clientem.ail, warehouse@clientem.ail etc. and not per user.<br />
Shocking. But at least it was something.<br />
Although... it isn't something... they only did it for my client, so perhaps they just
wanted to shut me up. I don't know. But other paths remained without authentication.<br />
As time went on, new "apps" for my client appeared (once again without authentication), and so I warned my client about it again, and again, and again.<br />
Over the last week this has reared its ugly head again. I once again warned my client, with reasons why.<br />
Humorously, my client forwarded my email to the Health and Safety company, and their response read as follows:</p>
<pre data-lang="txt" style="background-color:#212121;color:#eeffff;" class="language-txt "><code class="language-txt" data-lang="txt"><span>We are in the process of modifying our own internal app and it is not normally
</span><span>publicly available, and has now been returned to password protection. To of
</span><span>found the URL for our own internal app would of taken more than bad acting,
</span><span>and probably more aligned with IT guru / junior hacker.
</span></code></pre>
<p>"Not normally publicly available"?! Well, I've seen it publicly available for 5 years.
"To have found the URL for our own internal app ... hacker"!? Erm... no. I've already proved this can be found by anyone just looking around, no hacking required (well, no authentication required, so nothing to hack).</p>
<h3 id="my-actions">My Actions</h3>
<p>OK, they haven't "accused" me of hacking, but it's close.</p>
<p>I have not dignified that email with a response. But, I have reported the Health and Safety company to the <a href="https://ico.org.uk/">ICO</a>, providing links to the robots.txt page
and the direct URL of the page containing parents and child information mentioned above. I'll let them deal with it from here.<br />
However, the term "hacker" has been used, and I am <em>very</em> aware of <a href="https://arstechnica.com/tech-policy/2021/10/viewing-website-html-code-is-not-illegal-or-hacking-prof-tells-missouri-gov/">court cases that have happened</a> where a good Samaritan trying to help protect people has instead been victimised and made out to be the "malicious hacker"
because people simply do not understand how the internet works.</p>
<p>So, for fear that something like that could happen to me, this article is my statement.
I ask anyone actively involved in System Administration, Cyber Security, Data Protection, etc. to please get involved and share this article.<br />
If you like, you can join in on <a href="https://notnull.space/@paul/statuses/01JGYD9TM2QMDKEP55495DK0GH">this Fediverse/Mastodon thread</a> too.</p>
<p>Thank you for reading. I'm a bad writer and I don't write much, but I really needed to vent this, and put it somewhere for all to see.</p>

</div>


<div class="pagination">
    <div class="pagination__title">
        <span class="pagination__title-h">Thanks for reading! Read other posts?</span>
        <hr />
    </div>
    <div class="pagination__buttons">
        <span class="button previous">
            <a href="https://paulwilde.uk/ponderings/norg/">
                <span class="button__icon">←</span>&nbsp;
                <span class="button__text">Norg Backup Utility</span>
            </a>
        </span>


        <span class="button next">
            <a href="https://paulwilde.uk/ponderings/joining-debian-to-ad-domain/">
                <span class="button__text">Joining Debian Linux (Desktop) to an Active Directory Domain</span>&nbsp;
                <span class="button__icon">→</span>
            </a>
        </span>
        </div>
</div>

    </div>


            <footer class="footer">
                <div class="footer__inner">

                    <a href="https:&#x2F;&#x2F;notnull.space&#x2F;@paul" rel="me">fediverse (gts)</a>

                    <a href="https:&#x2F;&#x2F;snac.notnull.space&#x2F;paul" rel="me">fediverse (snac)</a>

                    <a href="https:&#x2F;&#x2F;codeberg.org&#x2F;pswilde" rel="me">codeberg</a>

                    <a href="https:&#x2F;&#x2F;keyoxide.org&#x2F;85633E30514CC1932E4268460ED12CF710BC42CA" rel="me">keyoxide</a>

                </div>
                <div class="footer__inner">
                    <div class="copyright">
                        <span>©
    2025
 Paul Wilde</span>
                        <span class="copyright-theme">
                            <span class="copyright-theme-sep">:: </span>
                            Theme: <a href="https://github.com/pawroman/zola-theme-terminimal/">Terminimal</a>
                        </span>
                    </div>
                    </div>
            </footer>

        </div>
        </body>

</html>