Sitemaps structure for large App Engine site


I'm trying to work out the best way to structure the sitemaps for a large App Engine site (1M+ URLs).

I need a sitemaps.xml file in the root path of the domain that links to the individual sitemap[n].xml files.

The sitemaps.xml file can link to up to 1000 sitemap[n].xml files, and each of these sitemap[n].xml files can contain up to 50K URLs.

Is there a way to dynamically generate the files with the 50K URLs?

Is there any way to do it without fetching 50K entities at a time?

Thanks!

PS: The files cannot be static because they have to be placed in the root path of the domain :(


There are 2 answers

2
Amir On

Your best bet is to generate them ahead of time. You could run a map-reduce over your data and store each sitemap[n].xml as a blob in a separate datastore entity. The handler (mapped in app.yaml from - url: /sitemap(.*) ) then simply returns the blob from the corresponding entity.
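A minimal sketch of the generate-ahead-of-time idea, in plain Python. The function names and the dict standing in for the datastore entities are assumptions made for illustration, not App Engine APIs; in a real app each value would be written to its own entity by the offline job, and the handler would look it up by path.

```python
from itertools import islice
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000  # per-file limit mentioned in the question
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunked(iterable, size):
    """Yield lists of at most `size` items without materializing the whole input."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def render_sitemap(urls):
    """Render one sitemap[n].xml document for a group of URLs."""
    entries = "".join("  <url><loc>%s</loc></url>\n" % escape(u) for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="%s">\n%s</urlset>\n' % (NS, entries))


def render_index(base_url, count):
    """Render the index file that links to sitemap1.xml .. sitemap<count>.xml."""
    entries = "".join(
        "  <sitemap><loc>%s/sitemap%d.xml</loc></sitemap>\n" % (escape(base_url), n)
        for n in range(1, count + 1))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="%s">\n%s</sitemapindex>\n' % (NS, entries))


def build_sitemaps(urls, base_url, per_file=MAX_URLS_PER_SITEMAP):
    """Return {path: xml}. The dict is a hypothetical stand-in for one entity per file."""
    store = {}
    for n, chunk in enumerate(chunked(urls, per_file), start=1):
        store["/sitemap%d.xml" % n] = render_sitemap(chunk)
    # At this point the store holds only the numbered files, so len() is their count.
    store["/sitemaps.xml"] = render_index(base_url, len(store))
    return store
```

With this layout the request handler never touches the URL entities themselves; it only returns the pre-rendered document stored under the requested path.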

All of this really depends on how your urls are stored and/or generated.

You could also generate all the URLs offline and put them in one huge file. Upload that file to the blobstore, along with a second file that holds the offset of each group of 50K URLs within it. In the handler, simply read the corresponding group of 50K URLs from the blobstore.
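The offsets idea could look roughly like the sketch below. It uses an in-memory io.BytesIO as a stand-in for the uploaded blob, and build_url_file and read_group are hypothetical helper names; the real handler would read the same byte range from the blobstore instead.

```python
import io


def build_url_file(urls, group_size=50000):
    """Return (data, offsets).

    data is the newline-delimited URL list as bytes; offsets[n] is the byte
    position where group n starts, with one extra entry marking end of file.
    """
    buf = io.BytesIO()
    offsets = []
    for i, url in enumerate(urls):
        if i % group_size == 0:
            offsets.append(buf.tell())  # a new group starts here
        buf.write(url.encode("utf-8") + b"\n")
    offsets.append(buf.tell())  # end-of-file sentinel
    return buf.getvalue(), offsets


def read_group(blob, offsets, n):
    """Return the URLs of group n (0-based), reading only that byte range."""
    start, end = offsets[n], offsets[n + 1]
    blob.seek(start)
    return blob.read(end - start).decode("utf-8").splitlines()
```

Because each request reads a single contiguous byte range, serving sitemap[n].xml does not require fetching 50K entities.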

Also realize that it's probably not that useful (with respect to SEO) to have such huge sitemaps.

0
mcotton On

Why can't you add an entry in your app.yaml that maps the root URL to wherever the files actually live? robots.txt has to be served from the root level, but I keep the file itself in /img:

- url: /robots.txt  
  static_files: img/robots.txt
  upload: img/robots.txt

It looks exactly the same to any crawler.