Find Sitemap is a tool for locating the sitemaps of any website, even when they are buried deep in the site's directory structure. It can also detect multiple sitemaps, so you can view and analyze all the pages the site lists in them.
>>> from Find_Sitemap import FindSitemap
>>> main = FindSitemap('google.com')
>>> main.crawl()
...
...
check 13801/13804: https://google.com/sitemap.xml
check 13802/13804: https://google.com/feed.xml
check 13803/13804: https://google.com/sitemap_index.xml
check 13804/13804: https://google.com/sitemapindex.xml
--------------------
Find sitemap urls len: 1
Find sitemap urls list: ['https://www.google.com/sitemap.xml']
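Once a sitemap URL has been found, the pages it lists can be read with the standard library. The following is a minimal sketch, not part of Find-Sitemap: the helper name `extract_locs` and the inline `SAMPLE` document are hypothetical, and the sample stands in for a sitemap you would normally download yourself.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return (kind, urls): kind is the root tag, e.g. 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text)
    kind = root.tag.split("}")[-1]  # strip the '{namespace}' prefix
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    return kind, locs


# Hypothetical sample; a real sitemap would be fetched from the URL found above.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

kind, locs = extract_locs(SAMPLE)
print(kind, locs)
```

A root tag of `sitemapindex` means the `loc` entries point to further sitemaps rather than to pages.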
Install Find-Sitemap from PyPI:
$ pip install Find-Sitemap
- Show the subdomains, slugs_L1, slugs_L2, filetypes parameters.
>>> from Find_Sitemap import FindSitemap
>>> main = FindSitemap('google.com')
>>> main.subdomains
{'www.'}
>>> main.slugs_L1
{'/default', '/sitemap', '/feeds', '/api', '/contents' ...}
>>> main.slugs_L2
{'/sitemap', '/stock', '/sitemap1', '/sitemap0', ...}
>>> main.filetypes
{'txt', 'xml', 'xml.gz', 'jsp', 'html', ...}
- Add values to the subdomains, slugs_L1, slugs_L2, filetypes parameters.
>>> from Find_Sitemap import FindSitemap
>>> main = FindSitemap('google.com')
>>> main.subdomains.add("shop.")
>>> main.slugs_L1.add("/node")
>>> main.slugs_L2.add("/site")
>>> main.filetypes.add("xml")
- Remove values from the subdomains, slugs_L1, slugs_L2, filetypes parameters.
>>> from Find_Sitemap import FindSitemap
>>> main = FindSitemap('google.com')
>>> main.subdomains.remove("shop.")
>>> main.slugs_L1.remove("/node")
>>> main.slugs_L2.remove("/site")
>>> main.filetypes.remove("xml")
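The `add` and `remove` calls above behave like Python set methods, assuming these attributes are plain sets as their displayed values suggest. The sketch below uses an ordinary set named `subdomains` as a stand-in (it does not import Find-Sitemap) to show one practical difference: `remove` raises `KeyError` for a missing value, while `discard` does not.

```python
# Stand-in for main.subdomains; assumed to be a plain Python set.
subdomains = {"www."}

subdomains.add("shop.")
subdomains.add("shop.")      # adding the same value twice is a no-op
subdomains.discard("blog.")  # absent value: discard() silently does nothing

try:
    subdomains.remove("blog.")  # absent value: remove() raises KeyError
    removed = True
except KeyError:
    removed = False

print(sorted(subdomains), removed)
```

If you are unsure whether a value is present, `discard` avoids the need for a `try`/`except` around the call.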
- Run the crawler.
>>> from Find_Sitemap import FindSitemap
>>> main = FindSitemap('google.com')
>>> main.crawl()
...
...
check 13801/13804: https://google.com/sitemap.xml
check 13802/13804: https://google.com/feed.xml
check 13803/13804: https://google.com/sitemap_index.xml
check 13804/13804: https://google.com/sitemapindex.xml
--------------------
Find sitemap urls len: 1
Find sitemap urls list: ['https://www.google.com/sitemap.xml']
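The progress counter in the output shows the crawler checking thousands of candidate URLs. This README does not document how those candidates are built, so the following is only a sketch of one plausible scheme: a Cartesian product over the four parameter sets. The function `candidate_urls`, the URL layout, and the use of an empty string to mean "omit this part" are all assumptions for illustration, not the library's confirmed behavior.

```python
from itertools import product


def candidate_urls(domain, subdomains, slugs_l1, slugs_l2, filetypes):
    """Yield candidate sitemap URLs (hypothetical combination scheme)."""
    # Assumption: an empty string in a set means "omit this part".
    parts = product(sorted(subdomains), sorted(slugs_l1),
                    sorted(slugs_l2), sorted(filetypes))
    for sub, s1, s2, ext in parts:
        yield f"https://{sub}{domain}{s1}{s2}.{ext}"


urls = list(candidate_urls(
    "example.com",
    subdomains={"", "www."},
    slugs_l1={"", "/feeds"},
    slugs_l2={"/sitemap"},
    filetypes={"xml", "txt"},
))
print(len(urls))
for url in urls:
    print(url)
```

Under this scheme the number of checks is the product of the set sizes, so each value added to a parameter set multiplies the work for the other three.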
- See Contributing
- Author: [email protected]
- Website: How to find the Sitemap of any website?