TYPO3 site crawler not indexing


I am trying to get a page indexed with indexed search and sitecrawler on an old TYPO3 4.5 website, but I have tried almost everything to no avail.

I am running the site crawler, and it gets a full list of URLs that can be crawled. I am running through the entire queue.


I have set up an "Index Configuration"


and a site crawler


But it will not index


The "cache_pages" table also seems to be empty, but caching is enabled for all pages.

What could I be missing?


There are 2 answers

2
Tymoteusz Motylewski On

The screenshot of the crawler queue looks good. The crawler seems to be configured correctly, but indexed search is not indexing the pages. The empty page cache suggests that caching is disabled somehow.

Indexed search indexes a page only when a few conditions are met:

  1. The page is cacheable (no page.config.no_cache = 1 in TypoScript, and the cache is not disabled in the page properties or from PHP code).
  2. The page source contains the special markers <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->.
  3. TypoScript page.config.index_enable = 1 is set.
  4. The page is requested by a user who is not logged in to the backend, or by the crawler.
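To check condition 2 on a page that is not being indexed, you could save the rendered HTML (fetched while not logged in to the backend) and look for the markers. Below is a minimal Python sketch, not the actual indexed search parser. It assumes the markers must match the exact strings `<!--TYPO3SEARCH_begin-->` and `<!--TYPO3SEARCH_end-->`, and the function names `indexable_sections` and `marker_report` are made up for this illustration.

```python
import re
from typing import Dict, List

# Assumed exact marker strings (no extra whitespace inside the comment).
BEGIN = "<!--TYPO3SEARCH_begin-->"
END = "<!--TYPO3SEARCH_end-->"

# Non-greedy match of everything between one begin marker and the next end marker.
_SECTION = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)


def indexable_sections(html: str) -> List[str]:
    """Return the content found between each begin/end marker pair."""
    return [section.strip() for section in _SECTION.findall(html)]


def marker_report(html: str) -> Dict[str, object]:
    """Count the markers and list the sections they enclose."""
    return {
        "begin_count": html.count(BEGIN),
        "end_count": html.count(END),
        "sections": indexable_sections(html),
    }


if __name__ == "__main__":
    sample = (
        "<body><nav>menu</nav>"
        "<!--TYPO3SEARCH_begin--><p>Hello</p><!--TYPO3SEARCH_end-->"
        "<footer>imprint</footer></body>"
    )
    print(marker_report(sample))
```

If the begin and end counts differ, or a marker contains a stray space, this sketch reports no section for that pair, which is a hint that the template is worth a closer look.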

What you can check is:

  • In the TypoScript Object Browser, verify that index_enable and no_cache have the correct values for the pages that are not indexed.
  • Enable debug mode in the Extension Manager for both Crawler and Indexed Search.
  • Click the number in the "queue id" column for a page that was not indexed and check the data shown there.
  • Double-check that the "session id" field in the indexed search configuration record is empty before you start indexing.
  • Clear the date in the "Next indexing date" field of the indexed search configuration record.


0
gringo On

If none of Tymoteusz's suggestions work, check whether you are running your website over HTTPS with a self-signed certificate (for example, if you are developing on your local machine). If that is the case, run the website without HTTPS and retest the crawler. I recently tested a TYPO3 6.2 website with crawler and indexed_search properly configured, and the pages were never indexed. The crawler log showed an empty error message, and in the database the tx_crawler_queue table had the value b:0; in the "result_data" column. Once I switched to HTTP, everything worked fine.

In production it should work just fine using https as long as you're using a valid certificate.
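To confirm this diagnosis before switching protocols, the following Python sketch may help. It relies on two assumptions. First, `b:0;` is how PHP serializes the boolean `false`, so a "result_data" value of exactly `b:0;` is treated here as a failed request. Second, a certificate that Python's default trust store rejects will probably be rejected by the crawler's HTTP client too. The function names are hypothetical and nothing here is part of the crawler extension.

```python
import socket
import ssl
from typing import Optional


def is_failed_result(result_data: Optional[str]) -> bool:
    """True if a raw result_data value is PHP's serialized false ("b:0;")."""
    return result_data is not None and result_data.strip() == "b:0;"


def certificate_verifies(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Try a TLS handshake using the system's default trust store.

    Returns False when certificate verification fails (for example with a
    self-signed certificate). Other network errors are not caught.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False


if __name__ == "__main__":
    # Value copied by hand from the result_data column of tx_crawler_queue.
    print(is_failed_result("b:0;"))
```

If `is_failed_result` is true for the queue entries and `certificate_verifies` returns False for your host, the certificate is a likely culprit.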