I have a website that is entirely in French. For a few months now I have been planning an English version. Everything works fine: visitors can switch their language and the session keeps track of it.
The problem is that I have now bought a domain that is different from the French one, and I assume it will need to point to the same subdomain on the host.
I expect I will need some code that checks the domain name: if the user reaches the server via the English domain, switch the session to English; otherwise use French. Am I wrong?
I think I will proceed that way, but many other parts of the site, such as ads and images, may be completely different. Is the best way to handle a multilanguage website to compare the language everywhere for these ads and images, or to duplicate the site in another subdomain and point the new domain at the new folder? (I really think duplicating will be worse in the long run.)
Any advice would be appreciated.
If what you mean is a new domain name, point it to the same server as your first domain, and do the language checking (or whatever is required) in the PHP script:
if ($_SERVER["HTTP_HOST"] == "my_first_domain_name.fr") {
    // use the French site
} elseif ($_SERVER["HTTP_HOST"] == "my_second_domain_name.fr") {
    // use the English site
}
You could also think about a solution that splits the French content into a directory named /fr and the English content into /en.
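As a language-agnostic sketch of the host check above (Python here; the domain names and the default are placeholders, not real configuration):

```python
# Map the requesting hostname to a site language.
HOST_LANGUAGES = {
    "my_first_domain_name.fr": "fr",   # placeholder domains
    "my_second_domain_name.fr": "en",
}

def language_for_host(host, default="fr"):
    """Return the language for a request's Host header, falling back to a default."""
    hostname = host.split(":")[0].lower()  # drop an optional port, normalise case
    return HOST_LANGUAGES.get(hostname, default)
```

Keeping the mapping in one table means adding a third domain/language later is a one-line change instead of another elseif branch.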
Every site I have built to support multiple languages detected the user's language and then stored it in their session information. How you detect their language is up to you (from their IP, defaulting to a language, etc.), but make sure you give the user an easy way to change languages. Then, based on the session information, we would update the site copy (i.e., serve a different translation), the experience (e.g., only show products or news stories from that locale), and so on.
Having multiple copies of the site on different [sub-]domains is a viable option, though one I don't like: you will have to support and release to all those different domains.
You could also set the session variable if the user comes from your new domain. Just have both domains point to the same place.
You should aim to duplicate as little as possible; duplicating your site will lead to maintainability problems in the future.
You can point both domain names at the same server IP, and have conditional server-side code to determine which content is served to the user.
In PHP, the variable $_SERVER['SERVER_NAME'] is populated with the server name from the client's HTTP request (e.g. 'google.com').
If people access the same php script via different domain names, you could use the value of this to decide which content to present (e.g. have an html template, with the relevant content populated from the database according to site).
In terms of advertisements, you could do the same; something like Google Ads will likely take care of this for you.
More generally, we're talking about virtual hosts here. There are lots of different ways to achieve what you're after, and methods vary according to the specifics of the problem, platform, hosting constraints, etc.
A lot of sites base the default language choice (and the advertisements, currencies used, etc.) on GeoIP, falling back to some default.
There are a lot of ways to cut this cookie. Note that since sessions are controlled by cookies (by default at least), your users will get different sessions depending on which domain they request.
I'm not talking about extracting text, or downloading a single web page.
But I see people downloading whole websites. For example, there is a directory called "example" that isn't even linked on the website; how do I know it's there? How do I download ALL pages of a website, and how do I protect against this?
For example, there is "directory listing" in Apache; how do I get the list of directories under the root if there is already an index file?
This question is not language-specific. I would be happy with just a link explaining the techniques involved, or with a detailed answer.
OK, to answer your questions one by one: how do you know that a 'hidden' (unlinked) directory is on the site? You don't, but you can check the most common directory names and see whether they return HTTP 200 or 404. With a couple of threads you will be able to check even thousands per minute. That said, you should always consider the number of requests you are making relative to the specific website and the amount of traffic it handles, because for small to mid-sized websites this could cause connectivity issues or even a short DoS, which is of course undesirable. You can also use search engines to find unlinked content: it may have been discovered by the search engine by accident, there might have been a link to it from another site, etc. (for instance, google site:targetsite.com will list all the indexed pages).
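The directory probing described above can be sketched roughly as follows (Python; the name list is illustrative, and the HTTP check is injected as a function so the sketch stays self-contained — a real run would plug in urllib or similar and add rate limiting, for the reasons given above):

```python
# Probe a list of common directory names, recording which ones respond
# with HTTP 200. status_of(path) -> int is supplied by the caller.
from concurrent.futures import ThreadPoolExecutor

COMMON_NAMES = ["admin", "backup", "images", "old", "test"]  # illustrative only

def probe_directories(status_of, names=COMMON_NAMES, workers=8):
    """Return the names for which status_of('/name/') reports HTTP 200."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(lambda n: (n, status_of("/" + n + "/")), names))
    return sorted(n for n, code in statuses if code == 200)
```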
How to download all pages of a website has already been answered: essentially you go to the base link, parse the HTML for links, images and other content that points to on-site content, and follow them. You then deconstruct links into their directories and check for indexes. You would also brute-force common directory and file names.
You can't really protect against bots without limiting the user experience. For instance, you could limit the number of requests per minute, but if you have an AJAX site, a normal user will also produce a large number of requests, so that really isn't the way to go. You can check the user agent and whitelist only 'regular' browsers, but most scraping scripts identify themselves as regular browsers, so that won't help you much either. Lastly, you can blacklist IPs, but that is not very effective: there are plenty of proxies, onion routing and other ways to change your IP.
You will get a directory listing only if (a) it is not forbidden in the server config and (b) there is no default index file (on Apache, index.html or index.php by default).
In practical terms, it is a good idea not to make things easier for the scraper, so make sure your website's search function is properly sanitized (it doesn't return all records on an empty query, it filters the % sign if you are using MySQL's LIKE syntax, and so on). And of course use a CAPTCHA where appropriate, but it must be properly implemented: not a simple "what is 2 + 2" or a couple of letters in a common font on a plain background.
Another protection against scraping is using referer checks to allow access to certain parts of the website; however, it is better to simply forbid access on the server side to any parts of the website you don't want public (using .htaccess, for example).
Lastly, in my experience scrapers have only basic JS parsing capabilities, so implementing some kind of check in JavaScript could work. However, you would also exclude all visitors with JS switched off (or with NoScript or a similar browser plugin) or with an outdated browser.
To fully "download" a site you need a web crawler that, in addition to following the URLs, also saves their content. The application should be able to:
Parse the "root" URL
Identify all the links to other pages in the same domain
Access and download those pages, and all the ones linked from these child pages
Remember which links have already been parsed, in order to avoid loops
A search for "web crawler" should provide you with plenty of examples.
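The four steps above can be sketched as a toy crawler (Python; fetching is injected as a url -> HTML function so the sketch stays self-contained, and a real crawler would use urllib or similar and should also honour robots.txt):

```python
# Toy same-domain crawler: parse the root, collect in-domain links,
# download each page once, and remember visited URLs to avoid loops.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(fetch, root):
    """Return {url: html} for every reachable page on root's domain."""
    domain = urlparse(root).netloc
    pages, queue = {}, [root]
    while queue:
        url = queue.pop(0)
        if url in pages:          # already parsed: avoid loops
            continue
        html = fetch(url)
        pages[url] = html
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == domain and absolute not in pages:
                queue.append(absolute)
    return pages
```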
I don't know what countermeasures you could adopt to avoid this: in most cases you WANT bots to crawl your website, since that is how search engines will learn about your site.
I suppose you could look at your traffic logs, and if you identify (by IP address) some repeat offenders, you could blacklist them to prevent access to the server.
I have a small search-engine site and I was wondering if there is any way of displaying my site in the user's language. I am looking for an inventive and quick way that can also live at just one URL.
I hope you can understand my question.
You could use the HTTP header "Accept-Language" to detect which languages the user has chosen as their preferred ones in their browser.
In PHP, this will be available (if sent by the browser) in $_SERVER, an array that contains (among other things) the HTTP headers sent by the client.
This specific header should be available as $_SERVER['HTTP_ACCEPT_LANGUAGE'].
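A rough sketch of turning that header value into a language choice (Python; the parsing is deliberately simplified compared to the full RFC grammar, and all names are illustrative):

```python
# Pick a site language from an Accept-Language header value, e.g. the
# string found in $_SERVER['HTTP_ACCEPT_LANGUAGE'].
def pick_language(accept_language, supported, default="en"):
    """Return the highest-q supported language, or the default."""
    ranked = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, params = piece.partition(";")
        q = 1.0
        if params.strip().startswith("q="):
            try:
                q = float(params.strip()[2:])
            except ValueError:
                q = 0.0
        # Compare on the primary subtag only ("fr-CA" -> "fr").
        ranked.append((q, lang.strip().split("-")[0].lower()))
    for _, lang in sorted(ranked, key=lambda item: -item[0]):
        if lang in supported:
            return lang
    return default
```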
I am assuming you already have versions of the site in various languages. Most sites seem to just ask the user what their language is and then save it in a cookie. You can probably guess a user's language using an IP-to-location tool.
You are probably more interested in this, though: http://techpatterns.com/downloads/php_language_detection.php. This PHP script lets you detect the user's language based on info sent by their browser. It may not be completely accurate, though, so you should always offer an option to switch languages.
If you don't have translations of your page, you can redirect users to a Google Translate page.
There is a really easy solution for this: use Google's Translate Elements JS add-on. You drop the JS on the page and Google takes care of the rest.
http://translate.google.com/translate_tools
The only downside is that users cannot fully interact with the site this way. By that I mean they cannot input something in their own language and have you receive the input in yours. Searches will also have to be done in the site's native language. So it really depends on what you are trying to accomplish here.
You could use a script which checks for a language cookie.
If the language cookie is set, you can use its value to pick the right language variables;
if not, you determine the user's current language however you prefer. There are lots of ways; I don't know which is best.
Additionally, you would place a form somewhere on the site where the user can click a language, and you submit that via POST to a script which then sets a cookie (or overwrites the current cookie, if there is already one).
This method obviously works with one URL for all your languages, which I think is quite nice about it...
My main site (hostsite) has an IFRAME with a registration site (regsite) hosted on a different domain.
I want to host the registration on a different domain, because I feel storing the DB login information on hostsite is not safe as many people have access to the backend.
All browsers accept the login session cookie coming from regsite; Internet Explorer 8 does not. The only way to make IE accept this cookie is by adding both sites to "Trusted Sites", which is not what I want.
Is there any way I can work around the cross-domain issue besides a local browser setting, or is my only option to move the registration to hostsite (cURL is not an option, as what I'm displaying on the registration site is not static HTML but PHP files)?
I think this can be solved without moving anything, and with little programming: just some DNS rules.
For example, you can create a new subdomain called register.hostsite(.com) pointing to the IP where regsite is.
Then re-direct the IFRAME to that new subdomain.
It will get the same bits from the same server, but now it will be inside hostsite's domain.
That should (at least in theory) be enough to satisfy IE. I'm not 100% sure though, I haven't used IFRAMES in a long time.
If that doesn't work, I'd suggest asking on serverfault, too.
EDIT: I was looking into another issue and found this 'micro-proxy' PHP implementation by Yahoo. It's their recommended way of resolving this kind of issue:
http://developer.yahoo.com/javascript/samples/proxy/php_proxy_simple.txt
The problem with the iframe and IE is that IE considers the iframe's content third-party (as in advertisements, etc.).
To have IE actually store the cookies set by that domain's documents, you need the other domain to emit a P3P header stating its intentions. This is easy to do and requires only a single HTTP header to be added.
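As a minimal illustration of the idea (Python standing in for the server-side code; the compact-policy tokens below are placeholders, and a real deployment must state the site's actual privacy policy):

```python
# Add a P3P compact-policy header to a response so IE8 will keep the
# iframe'd domain's cookies instead of treating them as third-party.
def with_p3p(headers):
    """Return a copy of the response headers with a P3P header added."""
    updated = dict(headers)
    updated["P3P"] = 'CP="CAO PSA OUR"'  # placeholder compact policy
    return updated
```

In PHP the equivalent would be a single header() call emitting the same P3P line before any output.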
I'm not sure what you mean by cross-domain issues, though, as there are none: you simply have two different documents from two different domains. You have not stated whether you are trying to have one domain set cookies for the other, or one page access the other.
There are a couple of approaches that come to mind:
GET, i.e. something like www.domain.com, www.domain.com/lang/de
Session.
Database.
I am curious what is an industry standard in this area.
The cookie method can become really annoying on a public machine (Internet café, etc.).
For the best SEO results you should set up separate URLs for the different languages anyway, but I normally also store the selected language in the session. That of course expires when the browser closes, but when a user comes to the site I always check the Accept-Language header and decide the starting language based on it.
If you have something like an auto-login feature, you can save the selected language in the database too.
You can't store information in GET, and you also need to rewrite the URLs to look like www.domain.com/lang/de, but it's really good for making your application search-engine friendly.
As for your answer, I would use what you call GET + cookies.
Database: you will need a table for that, or if you already have a table for your users, at least one more field.
Session: the user would have to set their language every time, which is a bit annoying!
The GET method would require the user to go to the start page every single time, even if he has a direct link to something inside; otherwise you can't use GET, or you'd have to include it in every single link. The session method will expire and forget about the user. The database is nice, but the user has to log in before seeing the website in his own language. I think the best thing to do is use a cookie.
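Combining the options discussed above into one precedence chain (URL first, then cookie, then Accept-Language, then a default) might look like this sketch (Python; all names are illustrative, and the Accept-Language handling ignores q-values for brevity):

```python
# Resolve the language for a request: explicit URL wins, then a remembered
# cookie, then the browser's Accept-Language, then a site default.
def resolve_language(path, cookie_lang, accept_language, supported, default="en"):
    # 1. Explicit /lang/xx prefix in the URL (good for SEO and direct links).
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "lang" and parts[1] in supported:
        return parts[1]
    # 2. Previously chosen language remembered in a cookie.
    if cookie_lang in supported:
        return cookie_lang
    # 3. First supported primary subtag from Accept-Language.
    for piece in (accept_language or "").split(","):
        primary = piece.split(";")[0].strip().split("-")[0].lower()
        if primary in supported:
            return primary
    return default
```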
You could change the default application locale in the bootstrap based on the value stored in a session. This way you get more flexibility if you want to add a new language: you just have to create a new translation file, provided all of your code is built around the user's locale. This will also ease date/time and money display, since those are also based on the locale.
French locale, e.g. fr_FR or fr_CA (French for France or French for Canada)
English locale, e.g. en_US or en_GB (English for the United States or the United Kingdom)
And then if you have to display money, the locale will provide the correct currency symbol (but won't do the price conversion), e.g. fr_FR -> € and fr_CA -> $
You could base the default language selection on the geolocation of the user's IP address.
Well...
with a custom route set up in ZF; you can probably even omit the 'lang' portion.
It's also really easy to configure and good for SEO.
Sessions are a good way to handle #1 when a user returns to your site.
Database storage is a good way to handle #2, and #1 when a user logs in with their own preferences set.
Another good way to do the initial setup is to use Zend_Locale to detect the user's language via their OS/browser settings. But if you do this, you should always let users override the detected value (and save it in a session or DB setting).
This is something that is entirely up to you. You'll be hard pressed to find a "standard."
I have seen a few answers to this on SOF, but most of them are concerned with the use of subdomains, and none have worked for me. The common one is the use of session.cookie_domain, which, from my understanding, only works with subdomains.
I am interested in a solution that deals with entirely different domains (and includes the possibility of subdomains). Unfortunately, project deadlines being what they are, time is not on my side, so I turn to SOF's expertise and experience.
The current project brief is to be able to log into one site, which currently only stores the user_id in the session, and then retrieve this value while on a different domain within the same server environment. Session data is stored in and retrieved from a database where the session ID is the primary key.
I am hoping to find a lightweight and easy-to-implement solution.
The system uses an in-house Model-View-Controller design pattern, so all requests (including those for different domains) run through a single bootstrap script. Using the domain name as a variable, this determines what context to display to the user.
One option that did look promising is the use of a hidden image, using its alt attribute to carry the user ID. My first impression is that this seems "too easy" (if it's even possible) and riddled with security flaws. Discuss?
Another option I considered is using the IP and user agent for authentication, but again I feel this is not going to be reliable due to shared networks and changing IP addresses.
My third (and preferred) option, which I have not yet seen discussed, is using .htaccess to fool the user into thinking they are on a different domain when in fact Apache is redirecting; something like
www.foo.com/index.php?domain=bar.com&controller=news/categoires/1
but displays to the user as
www.bar.com/news/categories/1
foo.com represents the "main site domain" which all requests run through, and bar.com is what the user thinks they are accessing. The controller request dictates the page and view being requested. Is this possible?
Are there other options? Pros/Cons?
Thanks in advance!!!
Have you thought about using session_set_save_handler? You can store your sessions in a database and access them from any domain.
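A sketch of the idea behind a database-backed session handler (Python with sqlite3 standing in for the real database; note that the session cookie itself still won't cross domains, so you would still need some way to pass the session ID between sites):

```python
# Keep session data in a database keyed by session ID, so any domain
# served from the same backend can read the same session record.
import json
import sqlite3

class DbSessionStore:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT)"
        )

    def write(self, session_id, data):
        """Serialize and upsert the session data."""
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )

    def read(self, session_id):
        """Return the stored session data, or None if the ID is unknown."""
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```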
Define a main session server (I do this in combination with memcached)
use Ajax / JSON(P) to request a session from this server; this allows you to share sessions across multiple domains
Reuse this session
For the benefit of anyone else interested in this functionality: there is no simple answer, I am afraid. Google "Single Sign-On" and it will come back with the technology and the solutions available.
As for using .htaccess to hide the domain name, this is not possible, as it could be used for malicious activity.
I have now successfully implemented a system to achieve my requirements.