soft 404 issue

TuranMirza

New Member
I have a custom 404 page, as I felt that was the right thing to do. A few days ago I realised it should maybe be disallowed in the robots.txt file so that search engines would not crawl it.

Now I have just got an email from Google saying:

--
Page indexing issues detected on feel-good.today
To the owner of feel-good.today:
Search Console has identified that your site is affected by 1 Page indexing issue(s):
Top critical issues
Critical issues prevent your page or feature from appearing in Search results. The following critical issues were found on your site:
Submitted URL seems to be a Soft 404
We recommend that you fix these issues when possible to enable the best experience and coverage in Google Search.
--

The 404 page has been there for some time, so it seems that my asking for it to be disallowed in robots.txt has in fact caused the page to be crawled. The simple answer is to take that line out of the robots.txt file, but I'm wondering why this is happening.

Can anyone help clarify?

I've also asked for the thanks.html page to be disallowed, as it is just the page someone sees after submitting a webform, i.e. a "Thanks for submitting the form, I'll be in touch soon" message. This page has not been flagged with an error (yet!).
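
To double-check which paths a set of robots.txt rules actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules shown are an assumption about what the file might look like (the real file is not reproduced here), and `/404.html` is a hypothetical filename for the custom 404 page; only `thanks.html` is the actual name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The real file is not shown in this post,
# and /404.html is an assumed filename for the custom 404 page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /404.html
Disallow: /thanks.html
"""

def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]

result = blocked_paths(ROBOTS_TXT, ["/404.html", "/thanks.html", "/index.html"])
print(result)
```

This only reports what the rules say about crawling a path. It does not show whether a search engine already knows about the URL from somewhere else.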


Their help says "If the rendered page is blank, nearly blank, or the content has an error message, it could be that your page references many resources that can't be loaded (images, scripts, and other non-textual elements), which can be interpreted as a soft 404. Reasons that resources can't be loaded include blocked resources (blocked by robots.txt), having too many resources on a page, various server errors, or slow loading or very large resources."

I read that as "We found the page, but can't load it, maybe because it has been disallowed by the robots file".
Which is exactly right: it has been blocked by the robots.txt file! This page is only pointed to by the control panel in my web hosting, i.e. no page links to it, so my understanding was that no search engine would crawl it. If Google looks at all HTM/HTML files in a directory, it might actually find it even if it's not linked from any other page, hence I disallowed any bot from looking at that file.
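
The help text quoted above describes a soft 404 in terms of what the rendered page looks like (blank, nearly blank, or showing an error message). As a rough sketch of that idea, the function below classifies a response from its HTTP status code and body text. The phrase list, the 50-character "nearly blank" threshold and the function name are all assumptions made for illustration; this is not how Google actually decides.

```python
def classify_response(status, body):
    """Rough, illustrative classification of a page response.

    The phrases and the length threshold are assumptions for this sketch,
    not rules taken from any search engine.
    """
    error_phrases = ("not found", "404", "page does not exist")
    text = body.strip().lower()

    # A real "not found" status code is a hard 404 (or 410 Gone).
    if status in (404, 410):
        return "hard 404"

    # A success status with an empty-looking or error-looking body
    # is the pattern usually described as a soft 404.
    if status == 200 and (len(text) < 50 or any(p in text for p in error_phrases)):
        return "possible soft 404"

    return "ok"

print(classify_response(404, "Page not found"))
print(classify_response(200, "Sorry, page not found"))
```

If the custom 404 page is requested directly by its own URL and the server answers with a 200 status, a check like this would put it in the "possible soft 404" group, which may be worth ruling out alongside the robots.txt change.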

Anyone?
 
