Crawling websites and dynamic URLs - PHP

Do search engine robots crawl my dynamically generated URLs? By this I mean HTML pages generated by PHP based upon GET variables in the URL. The links would look like this:
http://www.mywebsite.com/view.php?name=something
http://www.mywebsite.com/view.php?name=somethingelse
I have tried crawling my website with a test crawler found here: http://robhammond.co/tools/seo-crawler, but it only visits my view page once, with just one variable in the header.
Most of the content on my website is generated from the database based on these GET variables, so I would really like the search engines to crawl those pages.
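For reference, such a page might look roughly like the following (a minimal sketch; the database table and column names are only placeholders for illustration):

    <?php
    // view.php - minimal sketch of a page driven entirely by a GET variable.
    // The 'items' table and its columns are assumptions, not the asker's schema.
    $pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
    $name = isset($_GET['name']) ? $_GET['name'] : '';

    $stmt = $pdo->prepare('SELECT title, body FROM items WHERE name = ?');
    $stmt->execute(array($name));
    $item = $stmt->fetch(PDO::FETCH_ASSOC);

    if (!$item) {
        http_response_code(404);   // a real 404 helps crawlers drop bad URLs
        exit('Not found');
    }
    ?>
    <!DOCTYPE html>
    <html>
    <head><title><?php echo htmlspecialchars($item['title']); ?></title></head>
    <body><?php echo $item['body']; ?></body>
    </html>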

Some search engines do, and some don't. Google, for one, does include dynamically generated pages: https://support.google.com/webmasters/answer/35769?hl=en
Be sure to check your robots.txt file to ensure files you do not want the crawlers to see are blocked, and that files you do want indexed are not blocked.
Also, ensure that all pages you want indexed are linked from other pages, that you have a sitemap, or that you submit individual URLs to the search engine(s) you want to index your site.

Yes, search engines will crawl those pages, assuming they can find them. The best thing to do is simply to create links to those pages on your website, ideally accessible, or at least traversable, from the home page.
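For example (a minimal sketch, assuming an 'items' table like the one hypothesised above), a plain page that prints an ordinary anchor for every row is enough for a crawler that reaches the home page to discover every view.php URL:

    <?php
    // links.php - prints a crawlable link for every item in the database.
    // Table and column names are assumptions; adjust to your own schema.
    $pdo   = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
    $items = $pdo->query('SELECT name, title FROM items');

    foreach ($items as $row) {
        $url = '/view.php?name=' . urlencode($row['name']);
        echo '<a href="' . htmlspecialchars($url) . '">'
           . htmlspecialchars($row['title']) . "</a><br>\n";
    }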

Related

SEO for dynamic webpages

I have created an advertising website in PHP and MySQL. I have nearly 200 files for each location. These 200 files are, for example, for selling cars, bikes, etc. In all the titles, heads, and keywords I used a variable x, which is the location. Then I used a scripting language to open each of the 200 files, replace x with the location name, and save it under a different name, for example location1_websitename_cars.php. There are more than a million locations, so I created 200 × a million files like these. But I cannot host my website economically due to the file-number limitation on shared hosting servers.
My intention in replicating the 200 files for each location was that the Google search engine could find my pages when a user searches for the location name as a keyword. As per my understanding, Google crawls through the existing pages on the server, finds the location name as a keyword, and this results in the inclusion of the webpage in search results. Since this approach won't work with shared hosting, I changed strategy.
I am able to generate the files required for a location dynamically, according to the location the user selects from the home page of my website. In this case I just need to store 200 files on my server. All pages would be accessible from the home page of the website. But I don't know whether those pages would be accessible from Google search. For example, if a user types "location1 www.mywebsite.com cars", that PHP page won't be displayed, as the page doesn't exist on the server; it is to be dynamically created.
To put it simply: is there a way of including my website's pages in Google search results if those pages don't exist on the server? They would be dynamically created once the user selects some input and submits it from the home page.
Search engines won't have any problem following your dynamically created pages, but you will need to first create links to those pages from another page that the search engines do know about (e.g. your home page). Once you link to your dynamic pages, they can be indexed.
The more pages and sites (especially high-value sites) that link to your pages, the higher up in the search results your pages will be (of course there are other factors that affect this as well). Also, if you want to test any of this without wading through pages upon pages of search results, google: "site:www.yourwebsite.com yoursearchterms"
Google uses the URL as the identifier for a page, not the files on the server.
To detect URLs, Google uses robots that follow links on the web (<a>, <link>, etc.).
If you want your pages to get found and indexed by Google, do not worry about the files on your server; focus on your URLs and internal linking. You need to create navigation to all the possible pages so that robots can access them.
NB: URLs with parameters work, but it is preferable to rewrite them.
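For instance (a sketch only, reusing the hypothetical view.php from the first question), linking with rewritten URLs instead of raw parameters is just a matter of printing the pretty form and letting a rewrite rule map it back:

    <?php
    // Sketch: print navigation links as /view/something rather than
    // /view.php?name=something. Assumes a rewrite rule along the lines of:
    //   RewriteRule ^view/([^/]+)$ view.php?name=$1 [L,QSA]
    // The list below is a stand-in for whatever your database holds.
    $names = array('something', 'somethingelse');

    foreach ($names as $name) {
        echo '<a href="/view/' . rawurlencode($name) . '">'
           . htmlspecialchars($name) . "</a><br>\n";
    }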

How would a completely dynamic (every page dynamically generated) web site remain search engine-friendly?

Most completely dynamic web sites allow nearly every page to be found, crawled and indexed by search engines. How would this be properly implemented to allow a completely dynamic web site to be search engine-friendly? Note that there is no directory structure; users can type in complex URLs (www.example.com/news/recent), but the folder structure doesn't actually exist. It is all handled by .htaccess, which passes the URL entered to the main web application for page generation.
Search engines access websites in nearly the same way as a visitor does. If the search engine's web crawler gets to www.example.com/news/recent, it will index the results, which will then be searchable.
Most websites have static links to point to content, so the top news article might be on www.example.com/news/recent, but it could also be on www.example.com/news/9234. That gives search engines somewhere permanent to link to. The search engine doesn't care if www.example.com/news/9234 really loads www.example.com/pages/newsitems.php?item=9234, that's all hidden.
Another handy way is through sitemaps, which give the search engine a direct list/map of pages on the website, including ones whose URLs are more complicated/less pretty.
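One common way to implement that kind of routing (a sketch, not necessarily how the site in the question does it) is to send every request that does not match a real file to a single PHP front controller and branch on the request path:

    <?php
    // index.php - front controller sketch. Assumes an .htaccess rule such as:
    //   RewriteEngine On
    //   RewriteCond %{REQUEST_FILENAME} !-f
    //   RewriteRule ^ index.php [L]
    // so that /news/recent reaches this script even though no such folder exists.
    $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    $parts = array_values(array_filter(explode('/', $path)));

    switch (isset($parts[0]) ? $parts[0] : '') {
        case 'news':
            // /news/recent or /news/9234 - look the item up however the app does
            echo 'News item: ' . htmlspecialchars(isset($parts[1]) ? $parts[1] : 'recent');
            break;
        case '':
            echo 'Home page';
            break;
        default:
            http_response_code(404);
            echo 'Not found';
    }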

For Google crawling purposes: Single PHP pull-page, or individual pages for each different item?

I am creating a site and want to have individual pages for each row in a database table. The information on each page is fairly useful and comprehensive, and it would be really nice if Google could index them.
My initial thought was to just create a single PHP template page and pull the correct information for whatever the user is looking at, but my fear is that search engines won't be able to index all of the pages.
My second thought was to batch-create/automate the process of creating the individual pages as html files (for the 2000+ rows in the table), because then I would be guaranteed that they'd be crawled. However, if I ever needed to make a change to the design, I'd have to re-process them all. Kind of a pain...
My final consideration was to just pick a page on my site and list all of the possible PHP pages in a hidden div, but I wasn't sure if search engines can index from that. I assume they just pull from the HTML, so they'd be able to find them, right?
Any suggestions? I would love it if I could just create a single page that populates based on what the user clicks, but I want the pages to be indexed.
Search engines can index dynamic pages so using one PHP file to create thousands of unique product pages will be fine for SEO. After all, each page/product will have a unique URL and will be seen as a unique page as a result. All you need to do is link to your product pages within your website and/or submit an XML sitemap so you can be sure they are found and indexed.
By linking your pages, I literally mean link to your product pages. Search engines find new content primarily through following links, so if you want your product pages to be found you need to link to them. Using form-based search is not a good way to do it, as search engines generally don't play too well with forms. But there are lots of ways to make links to your pages, including HTML sitemaps and product category pages which then link to products in that category. Really, any way you can get a link to your product pages is a good way to help ensure they are found by the search engines.
You don't have to put links in an invisible DIV!
Just create the page and have parameterized content fetching.
You can include the pages in the XML sitemap and submit it to Google, or you can include your page URLs in an HTML sitemap too.
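For the XML sitemap part, a sketch along these lines (the table, columns and domain are placeholders) generates sitemap.xml straight from the product table, so new rows show up without any batch job:

    <?php
    // sitemap.php - emits an XML sitemap built from the product table.
    // Table/column names and the domain are assumptions for illustration.
    $pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

    header('Content-Type: application/xml; charset=utf-8');
    echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($pdo->query('SELECT id FROM products') as $row) {
        $url = 'http://www.example.com/product.php?id=' . (int)$row['id'];
        echo '  <url><loc>' . htmlspecialchars($url) . "</loc></url>\n";
    }

    echo "</urlset>\n";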

Generating a sitemap.xml file for dynamic sites in PHP

How can one crawl a site for all unique links and make/write an XML file to the root of that respective domain? I want something like this: when I call mydomain.com/generatesitemap.php, the script crawls all the links in the domain and writes them to the file sitemap.xml. Is this possible in PHP with cURL?
It depends on your site. If it is a simple site, then the task is simple: grab your site's root page via curl or file_get_contents, preg_match all the links (see here for reference: http://www.mkyong.com/regular-expressions/how-to-extract-html-links-with-regular-expression/), then recursively grab all the links that are inside your site, and do not process links that have already been processed.
The task becomes more complicated when JavaScript comes into play. If navigation relies on JavaScript, it will be difficult to obtain the links. There could be other navigation tricks, like a select combobox used as a dropdown menu.
The task could be even more complicated if you have pages with query strings. Say you have a catalogue section, and the URLs are like this:
/catalogue
/catalogue?section=books
/catalogue?section=papers
/catalogue?section=magazines
Is it one page or not?
And what about this one?
/feedback
/feedback?mode=sent
So you should take care of these cases.
There are many examples of such crawlers in a Google search. Look at this one, for instance:
http://phpcrawl.cuab.de/
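Putting the simple case described above into code, a rough sketch (no JavaScript handling, no special treatment of query strings, and a hard crawl limit for safety) could look like this:

    <?php
    // generatesitemap.php - rough sketch of the approach described above:
    // fetch a page, pull out its links with a regex, recurse over same-site
    // links, skip anything already seen, then write the result to sitemap.xml.
    function crawl($url, $host, &$seen) {
        if (isset($seen[$url]) || count($seen) > 500) return;   // crude limit
        $seen[$url] = true;

        $html = @file_get_contents($url);
        if ($html === false) return;

        preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $m);
        foreach ($m[1] as $link) {
            $abs = (parse_url($link, PHP_URL_HOST) === null)
                 ? 'http://' . $host . '/' . ltrim($link, '/')
                 : $link;
            if (parse_url($abs, PHP_URL_HOST) === $host) {
                crawl($abs, $host, $seen);
            }
        }
    }

    $seen = array();
    crawl('http://www.example.com/', 'www.example.com', $seen);

    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach (array_keys($seen) as $url) {
        $xml .= '  <url><loc>' . htmlspecialchars($url) . "</loc></url>\n";
    }
    $xml .= "</urlset>\n";
    file_put_contents(__DIR__ . '/sitemap.xml', $xml);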

How to make search engines index search results on my website?

I have a classifieds website.
It has an index.html, which consists of a form. This form is the one users use to search for classifieds. The results of the search are displayed in an iframe in index.html, so the page won't reload or anything. However, the action of the form is a PHP page, which does the work of fetching the classifieds, etc.
Very simple.
My problem is that Google hasn't indexed any of the search results yet.
Must the links be on the same page as index.html for Google to index the search results? (They are currently displayed in an iframe.)
Or is it because the content is dynamic?
I have a sitemap which works, with all URLs to the classifieds in it, but they are still not indexed.
I also have this robots.txt:
Disallow: /bincgi/
The PHP code is inside the /bincgi/ folder; could this be the reason why it isn't being indexed?
I have used URL rewriting to rewrite the URLs of the classifieds to
/annons/classified_title_here
and that is how the sitemap is made up, using the rewritten URLs.
Any ideas why this isn't working?
Thanks
If you need more input let me know.
If the content is entirely dynamic and there is no other way to get to that content except by submitting the form, then Google is likely not indexing the results because of that. Like I mentioned in a comment elsewhere, Google did some experimental form submission on large sites in 2008, but I really have no idea if they expanded on that.
However, if you have a valid and accessible Google Sitemap, Google should index your classifieds fine. I suggest using the Google Webmaster Tools to find out how Google treats your site and to diagnose any potential problems with crawling.
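On the robots.txt question specifically: robots.txt blocks the URL paths that crawlers request, not the internal files that end up serving them, so blocking /bincgi/ should not block the rewritten /annons/ URLs themselves. A sketch (the sitemap location is an assumption) would be:

    User-agent: *
    Disallow: /bincgi/
    Sitemap: http://www.example.com/sitemap.xml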
Using eBay is probably a bad example, as it's not impossible that Google uses custom rules for such a popular site.
Although it is worth noting that eBay has text links to categories and sub-categories of auction types, so it is possible to find auction items without actually filling in a form.
Personally, I'd get rid of the iframe, it's not unreasonable when submitting a form to load a new page.
That question is not answerable with the information given; there are too many open detail questions. It would help if you posted your site domain and the URLs that you want to get indexed.
Based on how you use GWT, it can produce unindexable content.
Switch every parameter to GET,
make HTML links to those search queries on webpages already known by Googlebot,
and they'll be indexed.
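A minimal sketch of those two points (assuming a search script called search.php and a few example query terms): use method="get" on the form, and also print plain links to the searches you want indexed, so Googlebot can reach them without submitting anything:

    <?php
    // search-links.php - sketch combining the two suggestions above:
    // a GET form plus ordinary links to the same queries for crawlers to follow.
    $popular = array('cars', 'bikes', 'furniture');   // assumed example terms
    ?>
    <form action="/search.php" method="get">
        <input type="text" name="q">
        <button type="submit">Search</button>
    </form>
    <ul>
    <?php foreach ($popular as $term): ?>
        <li><a href="/search.php?q=<?php echo urlencode($term); ?>"><?php echo htmlspecialchars($term); ?></a></li>
    <?php endforeach; ?>
    </ul>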
