I have created an advertising website in PHP/MySQL. I have nearly 200 files for each location; these 200 files are, for example, for selling cars, bikes, etc. In all the titles, heads and keywords I used a variable x, which is the location. Then I used a script to open each of the 200 files, replace x with the location name, and save the result under a different name, e.g. location1_websitename_cars.php. There are more than a million locations, so I created 200 × a million files like these. But I cannot host my website economically because of the file-count limits on shared hosting servers.
My intention in replicating the 200 files for each location was that the Google search engine could find my pages when a user searches for the location name as a keyword. As I understand it, Google crawls through the existing pages on the server, finds the location name as a keyword, and this results in the webpage being included in the search results. Since this approach won't work with shared hosting, I changed strategy.
I am able to generate the files required for a location dynamically, according to the location the user selects on the home page of my website. In this case I only need to store 200 files on my server, and all pages would be reachable from the home page of the website. But I don't know whether those pages would be accessible from Google search. For example, if a user types "location1 www.mywebsite.com cars", that PHP page won't be displayed, because the page doesn't exist on the server; it is created dynamically.
To put it simply: is there a way of including my website's pages in Google search results if those pages don't exist on the server, but are dynamically created once the user selects some input and submits it from the home page?
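For reference, the dynamic approach described above typically means one script per category that reads the location from the URL. A minimal sketch, assuming a hypothetical cars.php script, a locations table and placeholder database credentials:

    <?php
    // cars.php -- one script serves the "cars" page for every location.
    // Hypothetical sketch: table name, columns and DB credentials are assumptions.
    $pdo = new PDO('mysql:host=localhost;dbname=ads', 'user', 'pass');

    $location = isset($_GET['location']) ? $_GET['location'] : '';
    $stmt = $pdo->prepare('SELECT name FROM locations WHERE slug = ?');
    $stmt->execute([$location]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if (!$row) {
        http_response_code(404);   // unknown location -> real 404, not a soft error page
        exit('Location not found');
    }

    $name = htmlspecialchars($row['name']);
    ?>
    <title>Cars for sale in <?= $name ?></title>
    <meta name="description" content="Buy and sell cars in <?= $name ?>">
    <h1>Cars in <?= $name ?></h1>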
Search engines won't have any problem following your dynamically created pages, but you will first need to create links to those pages from another page that the search engines already know about (e.g. your home page). Once you link to your dynamic pages, they can be indexed.
The more pages and sites (especially high-value sites) that link to your pages, the higher your pages will appear in the search results (of course there are other factors that affect this as well). Also, if you want to test any of this without wading through pages upon pages of search results, google: "site:www.yourwebsite.com yoursearchterms"
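To make that concrete, the home page (or an index page linked from it) can emit plain <a> links to every location/category combination so crawlers can follow them. A rough sketch, assuming a locations table and the per-category scripts mentioned in the question:

    <?php
    // locations.php -- hypothetical index page that links to every dynamic page.
    // Crawlers discover the dynamic URLs simply by following these plain <a> links.
    $pdo = new PDO('mysql:host=localhost;dbname=ads', 'user', 'pass');
    $categories = ['cars', 'bikes'];   // ... up to the ~200 categories

    foreach ($pdo->query('SELECT slug, name FROM locations ORDER BY name') as $loc) {
        foreach ($categories as $cat) {
            printf(
                '<a href="/%s.php?location=%s">%s in %s</a><br>',
                $cat,
                urlencode($loc['slug']),
                htmlspecialchars(ucfirst($cat)),
                htmlspecialchars($loc['name'])
            );
        }
    }

With over a million locations you would paginate such an index (e.g. by region or alphabetically) rather than print every link on one page, but the principle is the same: every dynamic URL is reachable by following ordinary links.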
Google uses the URL as the identifier for a page, not the files on the server.
To discover URLs, Google uses robots that follow links on the web (<a>, <link>, etc.).
If you want your pages to be found and indexed by Google, do not worry about the files on your server; worry about your URLs and internal linking. You need to create navigation to all the possible pages so that robots can reach them.
NB: URLs with parameters work, but it is preferable to rewrite them.
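For instance, a couple of mod_rewrite rules in .htaccess can map clean, keyword-rich URLs onto the parameterised scripts (file and parameter names here are placeholders):

    # .htaccess -- illustrative rules only; adjust paths and parameter names
    RewriteEngine On
    # /location1/cars  ->  /cars.php?location=location1
    RewriteRule ^([a-z0-9-]+)/cars/?$   cars.php?location=$1   [L,QSA]
    RewriteRule ^([a-z0-9-]+)/bikes/?$  bikes.php?location=$1  [L,QSA]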
Related
I need to get a limited list of all the pages that belong to some website, in PHP. What would the code look like? "Limited" means something like function(some url, limit of pages).
There is no standard way to do this. Some websites publish an XML sitemap and link to it from robots.txt, but most do not.
You may be able to assemble a partial list of pages on a site by crawling it, e.g. requesting one page on the site, searching it for links to other pages, and requesting those pages as well. However, this is not guaranteed to find all pages on a site -- some may not be reachable from the home page! -- and it is a complex process.
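A rough sketch of such a crawler in PHP -- breadth-first, restricted to one host, capped at a page limit, with only naive link resolution and no robots.txt or rate-limit handling -- might look like this:

    <?php
    // Hypothetical helper: crawl $startUrl and return up to $limit URLs found on the same host.
    function crawl_site($startUrl, $limit) {
        $host  = parse_url($startUrl, PHP_URL_HOST);
        $queue = [$startUrl];
        $seen  = [$startUrl => true];
        $pages = [];

        while ($queue && count($pages) < $limit) {
            $url  = array_shift($queue);
            $html = @file_get_contents($url);
            if ($html === false) {
                continue;
            }
            $pages[] = $url;

            $doc = new DOMDocument();
            @$doc->loadHTML($html);
            foreach ($doc->getElementsByTagName('a') as $a) {
                $link = $a->getAttribute('href');
                // Naive resolution of root-relative links only; a real crawler
                // needs proper relative-URL handling.
                if ($link && $link[0] === '/') {
                    $link = 'http://' . $host . $link;
                }
                if (parse_url($link, PHP_URL_HOST) === $host && !isset($seen[$link])) {
                    $seen[$link] = true;
                    $queue[]     = $link;
                }
            }
        }
        return $pages;
    }

    print_r(crawl_site('http://www.example.com/', 50));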
Manually, you can create PHP pages in your directory.
e.g.
Index.php
About.php
Contact.php
But with PHP frameworks like Laravel, the pages do not exist as files; they are stored in the database and fetched when the user visits the page.
e.g.
If a person visits http://mywebsite.com/contact, the framework will look in the database for a page named 'contact' and then output it to the user.
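As a bare-bones illustration of that lookup (this is plain PHP, not Laravel itself; the pages table and its columns are assumptions):

    <?php
    // index.php -- every request is routed here (e.g. via .htaccess),
    // and the page body is fetched from the database by its slug.
    $pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
    $slug = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
    $slug = $slug === '' ? 'index' : $slug;

    $stmt = $pdo->prepare('SELECT title, body FROM pages WHERE slug = ?');
    $stmt->execute([$slug]);
    $page = $stmt->fetch(PDO::FETCH_ASSOC);

    if (!$page) {
        http_response_code(404);
        exit('Page not found');
    }

    echo '<title>' . htmlspecialchars($page['title']) . '</title>';
    echo $page['body'];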
But how do Google (and other search engines) find those pages if they only exist in the database?
Google can index these fine, because they are generated server-side. Files do not need to exist on disk for Google to be able to index them; the pages just need to be served by the server at a URL.
Where Google has issues indexing is when your site is client-side based and uses something like AJAX to pull the content into the browser. A search engine spider can't execute JavaScript, so it never finds the content. However, Google has defined some guidelines in its Webmasters Guide for getting this kind of content indexed.
You have a static website address, www.domain.com, and that address is real. Once Google learns that there is a website named www.domain.com, it will visit the site. Once the Google crawler is on your website, it will look for the links available on the home page of www.domain.com, and those pages will be crawled in turn. That's simple.
In Laravel, pages DO NOT exist in the database, although they might be dynamically generated.
As pointed out by #expodax,
Google will index LINKS for your web app, and links (URIs) are generated in accordance with your routes.php file (found in app/Http/routes.php).
In essence, Google will index the links / URIs available to the end user; it DOES NOT depend on how you've organized the files in your web app.
For detailed documentation about routes in Laravel (how they can be generated or used), please check this:
http://laravel.com/docs/5.0/routing
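For illustration, a routes file might look like this (the controller names are hypothetical):

    <?php
    // app/Http/routes.php (Laravel 5.0) -- illustrative routes only
    // A fixed page served by a controller action:
    Route::get('contact', 'PageController@contact');

    // A route whose {slug} segment is resolved at request time, e.g. by a
    // controller that loads the matching record or view:
    Route::get('{slug}', 'PageController@show');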
A sitemap is a file where you can list the web pages of your site to tell Google and other search engines about the organization of your site's content. Search engine web crawlers like Googlebot read this file to crawl your site more intelligently.
If you want to generate a sitemap for your Laravel application, you can do it manually or you can use a package like this: https://github.com/RoumenDamianoff/laravel-sitemap
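If you would rather not pull in a package, a plain-PHP sketch that writes a minimal sitemap.xml is enough; here the hard-coded $urls array is a stand-in for whatever your application uses to enumerate its pages:

    <?php
    // generate_sitemap.php -- minimal sitemap writer; in practice the URL list
    // would come from your database / routes rather than a hard-coded array.
    $urls = [
        'http://www.mywebsite.com/',
        'http://www.mywebsite.com/contact',
    ];

    $xml = new XMLWriter();
    $xml->openURI('sitemap.xml');
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($urls as $url) {
        $xml->startElement('url');
        $xml->writeElement('loc', $url);
        $xml->writeElement('lastmod', date('Y-m-d'));
        $xml->endElement();
    }

    $xml->endElement();
    $xml->endDocument();
    $xml->flush();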
Do search engine robots crawl my dynamically generated URLs? By this I mean HTML pages generated by PHP based upon GET variables in the URL. The links look like this:
http://www.mywebsite.com/view.php?name=something
http://www.mywebsite.com/view.php?name=somethingelse
I have tried crawling my website with a test crawler found here: http://robhammond.co/tools/seo-crawler, but it only visits my view page once, with just one variable in the header.
Most of the content on my website is generated via these GET variables from the database, so I would really like the search engines to crawl those pages.
Some search engines do, and some don't. Google, for one, does include dynamically generated pages: https://support.google.com/webmasters/answer/35769?hl=en
Be sure to check your robots.txt file to ensure that files you do not want the crawlers to see are blocked, and that files you do want indexed are not blocked.
Also, ensure that all pages you want indexed are linked from other pages, that you have a sitemap, or submit individual URLs to the search engine(s) you want to index your site.
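For example, a simple robots.txt that blocks a private area, leaves everything else crawlable and advertises the sitemap (the paths are placeholders):

    # robots.txt -- illustrative only; paths are placeholders
    User-agent: *
    Disallow: /admin/

    Sitemap: http://www.mywebsite.com/sitemap.xml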
Yes, search engines will crawl those pages, assuming they can find them. The best thing to do is simply to create links to those pages on your website, so that they are accessible, or at least traversable, from the home page.
I have been building a tool from scratch to generate a visual graph of the webpages in a particular domain. If a page links to another page, that is denoted by an edge in the graph. My project is to investigate how web developers link their pages inside a particular website. My aim is to run this tool on around 100 non-profit websites and analyse the results.
There's a catch:
Some pages are not linked to by any other page on the internet (they are standalone pages). Is there any way I can get a list of such webpages in a particular domain, or in a particular directory of a domain?
Example : Say we have www.example.com/abc/xyz.asp
xyz.asp is not linked to by any other page on the internet, and directory listing in the parent directory (www.example.com/abc/) is disabled. How do I get to know that a webpage exists at that particular location?
I'm particularly interested in ASP and PHP domains. My assumption is that linked pages will form a cluster and standalone pages will be left alone, like stars in the sky. After generating the graph I need to calculate some coefficients.
Most completely dynamic websites allow nearly every page to be found, crawled and indexed by search engines. How would this be properly implemented so that a completely dynamic website is search-engine-friendly? Note that there is no directory structure; users can type in complex URLs (www.example.com/news/recent) but the folder structure doesn't actually exist. It is all handled by .htaccess, which passes the URL entered to the main web application for page generation.
Search engines access websites in much the same way a visitor does. If the search engine's web crawler gets to www.example.com/news/recent, it will index the result, which will then be searchable.
Most websites have static links pointing to content, so the top news article might be at www.example.com/news/recent, but it could also be at www.example.com/news/9234. That gives search engines somewhere permanent to link to. The search engine doesn't care that www.example.com/news/9234 really loads www.example.com/pages/newsitems.php?item=9234; that's all hidden.
Another handy way is through sitemaps, which give the search engine a direct list/map of the pages on the website, including ones whose URLs are more complicated or less pretty.