Does a crawler like Googlebot 'see' my rewritten URLs? - php

I'm using .htaccess to rewrite and redirect www.mysite.com/index.php?id=# to friendly URLs like www.mysite.com/news. So all news articles will be written as www.mysite.com/news/article1, etc.
Now I'm using robots.txt to block all the directories on my server that crawlers don't need to index. Since I'm using a CMS, these are directories like /core, /managers, /connectors, etc. But since the www.mysite.com/news directory doesn't actually exist and is only rewritten with .htaccess, will blocking directories like /core still allow a crawler to index my website?
So basically what I want to know is: does a crawler see my website URLs as they are after they're rewritten? Or does it still need access to the other directories of my CMS, like /core, to be able to index my pages?

No. URL rewriting is purely an internal mapping: your web server uses it only to decide how to handle the user-friendly URL it receives. A crawler only ever requests and sees the public URL (e.g. /news/article1), and robots.txt rules are checked against that URL, so blocking /core and similar directories does not prevent the crawler from requesting those pages.
Just as the URL stays unchanged in the browser's address bar, the process is invisible to the client, whether it is a web browser or a bot.
URL rewriting is not to be confused with redirection. With a redirect, the client's request receives a "301 Moved Permanently" response containing the URL where the actual resource resides, which makes the client send a second request to that URL. By definition, then, the client is aware of a redirect.
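To illustrate why the rewrite doesn't matter for robots.txt, here is a minimal PHP sketch of plain prefix matching of Disallow rules against the path the crawler requests. The function name isDisallowed and the rule list are hypothetical; real crawlers such as Googlebot also support wildcard patterns (* and $) that this sketch ignores, and str_starts_with requires PHP 8.

```php
<?php
declare(strict_types=1);

// Minimal sketch: robots.txt "Disallow" rules are compared with the URL path
// the crawler requests, as a simple prefix. Wildcards are ignored here.
function isDisallowed(string $requestPath, array $disallowRules): bool
{
    foreach ($disallowRules as $rule) {
        // An empty Disallow value means "allow everything".
        if ($rule !== '' && str_starts_with($requestPath, $rule)) {
            return true;
        }
    }
    return false;
}

// Hypothetical rules for the CMS directories mentioned in the question.
$rules = ['/core/', '/managers/', '/connectors/'];

// The crawler only ever requests the public, user-friendly URL...
var_dump(isDisallowed('/news/article1', $rules));
// ...while a direct request into a CMS directory would be blocked.
var_dump(isDisallowed('/core/config.php', $rules));
```

The internal rewrite of /news/article1 to index.php, and any files the CMS loads from /core on the server side, never show up as requested paths, so the rules never get a chance to match them.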

Related

Are external links from other sites also redirected?

I have a pretty straightforward question about 301 redirects.
I've written a 301 redirect for one folder on my site to another site:
RewriteRule ^example/folder https://example-new-site.com/ [L,R=301]
Now I'm wondering: will visitors who follow links to that folder/pages from other sites (which are not mine) also be redirected?
The directives in .htaccess apply to all incoming HTTP requests.
When a user (or bot) follows a hyperlink on a website, this instructs the user's browser (user-agent) to make a request to the destination URL. When that URL points to your website, the user's browser makes an HTTP request to your server, just as if the user had typed the URL into the address bar.
So, yes, a user following a link on another site is also redirected as it's simply an HTTP request.
To be honest, if the external link (or rather, the user/bot following that external link) did bypass your redirect, then what would be the point of implementing the redirect in the first place? Ordinarily, redirects of this nature are in place solely to redirect inbound requests for old URLs.
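A minimal PHP sketch of the decision the RewriteRule above makes may help. It only mirrors the rule's pattern for illustration (the function name redirectTarget is hypothetical): in .htaccess the pattern is tested against the URL path without the leading slash, and nothing about which site the link was on enters into it. Note also that, as written, the pattern has no trailing anchor, so every page inside the folder (and e.g. example/folder-old) goes to the new site's home page.

```php
<?php
declare(strict_types=1);

// Sketch of the RewriteRule's decision. In a root .htaccess the pattern is
// matched against the URL path without its leading slash. The Referer
// (which site the visitor came from) is not part of the check.
function redirectTarget(string $urlPath): ?string
{
    $pathInHtaccess = ltrim($urlPath, '/');
    if (preg_match('#^example/folder#', $pathInHtaccess) === 1) {
        return 'https://example-new-site.com/'; // sent with a 301 status
    }
    return null; // no redirect; the request is served normally
}

// A click on someone else's site still arrives as "GET /example/folder/page",
// so it is redirected exactly like any other request.
var_dump(redirectTarget('/example/folder/page'));
var_dump(redirectTarget('/about'));
```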

Tracking redirects Apache/PHP

Is there a way in PHP to know if the current script (page) came from an apache 301 redirect?
Additional details to set the stage for this question:
We just completed a website merge, retiring one website and sending its traffic to a similar sister website, which included hundreds of individual 301 redirects for products and categories. We would like to track transactions that originated from the retired website's URLs vs those originating from the current URLs. Basically, when an order is completed, we want to know which URL it originated from: redirected from the old site vs direct traffic to the new one.
Currently the 301 redirects are performed via a separate file included in the httpd.conf file. This separate file contains 100s of "Redirect 301 /source /destination" directives.
My plan was to have PHP sniff out these redirects, store the original URL domain as a session variable, then access that session variable at order completion in order to connect it to that original domain.
For this to work, the landing page script would need to know if it is the result of a redirect. Which leads to the question: Is there a way in PHP to know if the current script (page) came from an Apache 301 redirect?
Thanks in advance for your help.
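One way to sketch the plan described in the question, under a loudly stated assumption: a plain 301 generally gives the landing script no reliable indication of which URL it was redirected from, so this assumes each redirect destination is given a marker query parameter (hypothetically /destination?ref=olddomain). The parameter name, the allowed values and the function names below are all hypothetical.

```php
<?php
declare(strict_types=1);

// ASSUMPTION: the redirect destinations carry a marker such as ?ref=olddomain.
const ORIGIN_PARAM = 'ref';            // hypothetical parameter name
const KNOWN_ORIGINS = ['olddomain'];   // hypothetical accepted values

// Record the origin once per session; later page views without the marker
// must not overwrite it.
function recordOrigin(array $query, array &$session): void
{
    $ref = $query[ORIGIN_PARAM] ?? null;
    if (is_string($ref) && in_array($ref, KNOWN_ORIGINS, true) && !isset($session['origin'])) {
        $session['origin'] = $ref;
    }
}

// At order completion: report "direct" when no marker was ever seen.
function orderOrigin(array $session): string
{
    return $session['origin'] ?? 'direct';
}

// On a real landing page this would be:
//   session_start();
//   recordOrigin($_GET, $_SESSION);
$session = [];
recordOrigin(['ref' => 'olddomain'], $session);
recordOrigin([], $session); // a later page view without the marker
echo orderOrigin($session), PHP_EOL;
```

Only whitelisted values are stored, so a visitor cannot put arbitrary text into the session by editing the URL.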

Can Apache (or Nginx) allow 2 apps to selectively serve the same URLs?

Say I have a PHP app (WordPress) at http://example.com/.
It is not a simple blog. It is a large app with thousands of pages, multiple custom post types and "themes within a theme" to handle all of the custom content.
I want to integrate this with another PHP app (for this example, let's say SilverStripe, but it could be anything) in a way that lets me replace things in stages, because I don't want to wait months until a totally new app is finished before deploying anything.
The problem is that this is a 10+ year old site with many legacy URLs that must be maintained. That means redirects are not allowed (for certain URLs), and I need either app to be able to respond to the URLs that currently exist. I can't add /wp/ or anything like that; the URLs need to be identical.
So, for instance, we currently have a page at http://example.com/page.html that is generated by WordPress. I would like to replace it with a page generated by SilverStripe.
Is there a way to configure Apache or Nginx so that if the SilverStripe app understands the request for http://example.com/page.html (has a route defined for it), the page is generated by SilverStripe, and if it doesn't, the request "falls back" to WordPress and is served by WordPress from the same URL, http://example.com/page.html, not http://example.com/wordpress/page.html?
Thanks!
Apache mod_rewrite's "passthrough" flag ([PT]) can handle this. It's frequently used for URL shortening for SEO purposes (e.g. to rewrite /some/crazily/long/path to /short), and you could also use it to serve /wordpress/page.html at /page.html, e.g.:
RewriteRule ^/page\.html$ /wordpress/page.html [PT]
So go ahead and set up the separate directory for your WordPress instance and use RewriteRule. The URL will still show up as '/page.html' in the browser's location bar, but Apache will serve '/wordpress/page.html'. (The leading slash in the pattern assumes the rule is in the server or virtual-host config; in .htaccess the path is matched without it, i.e. ^page\.html$.)
More on PT: http://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_pt
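This takes care of your 'served from the same URL' requirement. As for serving a page from WordPress only when the first app can't serve it, take a look at this existing thread: Redirect requests only if the file is not found? Note, though, that Apache can only check whether a file exists on disk; it has no way of knowing whether SilverStripe has a route for a URL, so that decision has to be made in application code.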
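Because the "does the new app have a route for this?" check can't be done by Apache, one option is a single front controller that makes that choice in PHP. The sketch below only shows the dispatch decision; the route list, the function name chooseApp and the directory layout in the final comment are hypothetical, and actually bootstrapping either app from a shared entry point is an assumption you would need to verify for both WordPress and SilverStripe.

```php
<?php
declare(strict_types=1);

// Hypothetical list of URLs that have already been migrated to the new app.
$newAppRoutes = [
    '/page.html',
    '/about/team.html',
];

// Decide which app should answer a request. Anything the new app does not
// know falls back to WordPress, so the public URL never changes.
function chooseApp(string $requestUri, array $newAppRoutes): string
{
    // Match on the path only, ignoring any query string such as ?utm=1.
    $path = parse_url($requestUri, PHP_URL_PATH);
    $path = is_string($path) ? $path : '/';
    return in_array($path, $newAppRoutes, true) ? 'silverstripe' : 'wordpress';
}

echo chooseApp('/page.html?utm=1', $newAppRoutes), PHP_EOL;
echo chooseApp('/old-post.html', $newAppRoutes), PHP_EOL;

// In a real setup the result would be used to load the chosen app, e.g.
//   require __DIR__ . '/wordpress/index.php';   (hypothetical layout)
// while the address bar keeps showing the original URL.
```

A static list keeps the sketch simple; in practice the list could instead come from the new app's own router, so that pages move over simply by adding routes there.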

PHP how to detect user coming from sister website if referrer disabled

E.g. I have two domains:
http://www.computer.com
http://www.computers.com
http://www.computer.com is the main website and http://www.computers.com is just a dummy that redirects to the main website (http://www.computer.com).
So my goal is to detect, with pure PHP, that a user comes from http://www.computers.com rather than from Google, another website, or by typing the URL directly, without relying on the referrer, since it can be disabled.
Both sites are on the same hosting, but I cannot access the file system of one site from the other. And $_SESSION and $_COOKIE are domain-specific too.
You could set your redirect script to push to www.computer.com?ref=sister or some similar URL, then log accordingly and, if desired, redirect back to home to keep it transparent to the user.
Are you asking for an htaccess example though? Or just ideas on how to do it?
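Since you want pure PHP, put this in the index file of computers.com:
<?php
// Send a permanent (301) redirect to the main site, tagging the visit
header("Location: http://www.computer.com/?ref=sister", true, 301);
exit; // stop the script so nothing else is output after the redirect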
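On the www.computer.com side, the tag can be checked, logged, and then dropped with one more redirect back to the clean URL, as the answer suggests. This is a minimal sketch; the function names cameFromSister and cleanUrl are hypothetical, and the header-sending part is left as a comment because it only makes sense inside a real request.

```php
<?php
declare(strict_types=1);

// True when the request carries the "ref=sister" tag added by computers.com.
function cameFromSister(array $query): bool
{
    return ($query['ref'] ?? null) === 'sister';
}

// The same path without any query string, e.g. "/?ref=sister" -> "/".
function cleanUrl(string $requestUri): string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    return is_string($path) ? $path : '/';
}

// Usage at the top of computer.com's index.php (not run here):
//   if (cameFromSister($_GET)) {
//       error_log('Visitor arrived via computers.com');
//       header('Location: ' . cleanUrl($_SERVER['REQUEST_URI']), true, 302);
//       exit;
//   }

var_dump(cameFromSister(['ref' => 'sister']));
var_dump(cameFromSister([]));
echo cleanUrl('/?ref=sister'), PHP_EOL;
```

A temporary (302) status is used for this clean-up hop so the tagged URL itself is not recorded as permanently moved.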
Your best bet is to address your issue on multiple levels. First, point the DNS for www.computers.com at the IP address of www.computer.com. Now requests for both domains are served by the same server (or cluster of servers, if behind a load balancer).
Second, if you actually want to consolidate on a single domain (to share cookies, sessions, etc.), you could use web server redirection (i.e. mod_rewrite for Apache) to redirect all the requests to the final domain.
Finally, that redirect should send a 301 status code so that spiders and other clients know the move is permanent.
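Once both domains point at the same server, the Host header already tells PHP which domain the visitor requested, with no Referer involved. Here is a minimal sketch of that check plus a path-preserving 301 to the main domain, written in PHP rather than mod_rewrite; the constant and function names are hypothetical, and it assumes both hostnames reach the same PHP entry point.

```php
<?php
declare(strict_types=1);

const MAIN_HOST = 'www.computer.com';
const SISTER_HOSTS = ['www.computers.com', 'computers.com'];

// Returns the URL to 301-redirect to, or null if already on the main host.
function canonicalRedirect(string $host, string $requestUri): ?string
{
    // Normalise the Host header: drop any ":port" suffix and lowercase it.
    $host = strtolower((string) preg_replace('/:\d+$/', '', $host));
    if (in_array($host, SISTER_HOSTS, true)) {
        return 'http://' . MAIN_HOST . $requestUri; // keep path and query
    }
    return null;
}

// Usage at the top of the shared index.php (not run here):
//   $target = canonicalRedirect($_SERVER['HTTP_HOST'] ?? '', $_SERVER['REQUEST_URI']);
//   if ($target !== null) {
//       error_log('Visitor arrived via ' . $_SERVER['HTTP_HOST']);
//       header('Location: ' . $target, true, 301);
//       exit;
//   }

var_dump(canonicalRedirect('www.computers.com', '/shop?id=5'));
var_dump(canonicalRedirect('www.computer.com', '/shop'));
```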

Ghost directories/files with php?

I was wondering how WordPress and other forums and sites, like Facebook, create "phantom" directories. For example, a blog might have http://www.joesblog.com/2010/11/12/this-is-my-post.php
Does that file and directory actually exist? Also, how does Facebook, for example, have URLs like http://www.facebook.com/-usernamehere- ? Is that a physical directory, or is it simply a scripting trick? How can I do this with PHP?
That kind of functionality is normally achieved by instructing the web server to "link" certain URL patterns to a specific controller.
See .htaccess, for example.
EDIT: This article on the rewrite engine might also help.
So no, those directories don't actually exist. The web server receives a request for a specific URI and internally routes (rewrites) it to a delegated controller (a PHP script, for example), which in turn returns a result based on the URI and the action requested by the user.
PHP can certainly handle this, but it's the web server that needs to be instructed on how to handle those types of request.
If you're using apache you might want to take a look at some mod_rewrite tutorials.
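A minimal sketch of such a controller in PHP: the URL path is treated as data and matched against patterns, so no matching directories need to exist. The .htaccess rules in the comment are the common "send everything that isn't a real file or directory to index.php" pattern (the one WordPress uses); the route patterns and the returned handler names are hypothetical.

```php
<?php
declare(strict_types=1);

// Front controller sketch. Assumes the server sends every request that is not
// a real file or directory to index.php, e.g. in .htaccess:
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteCond %{REQUEST_FILENAME} !-d
//   RewriteRule . /index.php [L]
// The routes and handler names below are hypothetical.
function route(string $requestUri): array
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    $path = is_string($path) ? $path : '/';

    // /2010/11/12/this-is-my-post.php -> a blog post looked up by date + slug
    if (preg_match('#^/(\d{4})/(\d{2})/(\d{2})/([a-z0-9-]+)\.php$#', $path, $m)) {
        return ['post', ['date' => "$m[1]-$m[2]-$m[3]", 'slug' => $m[4]]];
    }
    // /someusername -> a profile page looked up by username
    if (preg_match('#^/([A-Za-z0-9.]+)$#', $path, $m)) {
        return ['profile', ['username' => $m[1]]];
    }
    return ['not_found', []];
}

// No /2010/11/12/ directory exists on disk; the path is just parsed.
print_r(route('/2010/11/12/this-is-my-post.php'));
print_r(route('/joe.smith'));
```

The controller would then fetch the post or profile (typically from a database) and render it, which is why the same script can answer for any number of "directories".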
