Fetch contents from a Unicode URL - PHP

I would like to fetch the contents (HTML) from this URL: http://www.tvsporedi.si/spored.php?id=Vaš kanal. I've tried with file_get_contents and cURL.
No matter how I construct the URL in my code, the page always comes back blank (a page with the header and menu but no content). I tried URL-encoding the id parameter and leaving it as it is, without any luck. The only change I can make to the URL, it seems, is encoding the space (as %20); encoding the š does not work.
So I guess what I'm asking is: why does PHP "eat" the š? The PHP file is saved with UTF-8 encoding ...

Try fetching:
http://www.tvsporedi.si/spored.php?id=Va%C5%A1%20kanal
This works for me in a browser, and I would guess it works with whatever method you use.
I used Firebug to see how the browser was encoding the request...
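As a quick check, this minimal PHP sketch shows that rawurlencode() produces exactly the encoding above, assuming (as the question states) that the PHP file is saved as UTF-8 so the literal 'š' is the two bytes C5 A1:

```php
<?php
// The source file is assumed to be UTF-8, so 'š' (U+0161) is the bytes C5 A1.
// rawurlencode() percent-encodes those bytes and turns the space into %20.
$id = rawurlencode('Vaš kanal');

$url = 'http://www.tvsporedi.si/spored.php?id=' . $id;
echo $url, "\n"; // prints http://www.tvsporedi.si/spored.php?id=Va%C5%A1%20kanal

// $html = file_get_contents($url);
```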

Works for me with urlencode:
readfile('http://www.tvsporedi.si/spored.php?id=' . urlencode('Vaš kanal'));
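Both answers produce a working URL; they differ only in how the space is encoded (urlencode() gives +, rawurlencode() gives %20), and a PHP script such as spored.php decodes + in a query string as a space. As a sketch, the same query string can also be built with PHP's built-in http_build_query(), whose last argument selects either style:

```php
<?php
$params = ['id' => 'Vaš kanal'];

// PHP_QUERY_RFC1738 (the default): spaces become '+', like urlencode().
$form = http_build_query($params, '', '&', PHP_QUERY_RFC1738);

// PHP_QUERY_RFC3986: spaces become '%20', like rawurlencode().
$rfc3986 = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $form, "\n";    // prints id=Va%C5%A1+kanal
echo $rfc3986, "\n"; // prints id=Va%C5%A1%20kanal

// readfile('http://www.tvsporedi.si/spored.php?' . $form);
```

http_build_query() also encodes every value in the array, which helps once a request needs more than one parameter.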

Related

PHP file_get_contents(): Chinese characters in the URL become garbled

I use file_get_contents() to download a JSON document. There are some Chinese characters in the URL. When I print the URL it looks fine, but when I run the program, the URL passed to the function becomes garbled. I can tell because the URL points to a JSON service that runs a MySQL query, and in the MySQL console I saw that the URL had become garbled. I tried many ways to convert the URL string to UTF-8, GB2312, etc., but none of them worked. I hope someone can help, thanks.
It's very difficult to understand your question. I think I understood the first part of it:
I use file_get_contents() to download a JSON. There're some Chinese characters in the URL, I tried to print the URL out, it's OK. But when I ran the program, the URL I put in the function became error code.
You are trying to access a URL containing Chinese characters using file_get_contents().
The answer to this is:
You need to encode the part of the url containing chinese characters using urlencode() or rawurlencode().
The main difference between urlencode() and rawurlencode() is that urlencode() converts spaces to +, while rawurlencode() converts spaces to %20.
Use urlencode() for query parameters, for example ?q=my+search+key; in every other case (such as path segments) use rawurlencode().
Example:
$test = 'http://www.example.com/' . rawurlencode('以怎么下载') . '.html';
echo $test, "\n";
// $html = file_get_contents($test);
// Output:
// http://www.example.com/%E4%BB%A5%E6%80%8E%E4%B9%88%E4%B8%8B%E8%BD%BD.html
I hope it solves your problem.
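The urlencode()/rawurlencode() difference described above can be seen side by side in this small sketch. Both functions work on bytes, so the string must already be in the character encoding the server expects; UTF-8 is assumed here, and whether a given server wants UTF-8 or GB2312 instead is not something the functions can decide for you.

```php
<?php
$key = 'my search key';

// Query-string style (application/x-www-form-urlencoded): spaces -> '+'.
$query = '?q=' . urlencode($key);

// RFC 3986 style, suitable for path segments: spaces -> '%20'.
$path = '/' . rawurlencode($key) . '.html';

echo $query, "\n"; // prints ?q=my+search+key
echo $path, "\n";  // prints /my%20search%20key.html
```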

How to decode UTF-8 characters in CodeIgniter?

I'm developing a multilingual site with CodeIgniter. When a user searches in their native language, the first page of results is correct, but when I paginate, the search term is no longer decoded.
This is the URL used for pagination.
When I print the URI segment, I get %E0%B4%AE.
I tried urlencode and urldecode, but then I got different characters, like à´®.
Can anyone tell me how to decode this kind of character set?
While urldecode is what you should be using, the reason you are getting the wrong output is probably that the output page's encoding hasn't been set to UTF-8, so it defaults to ISO-8859-1. PHP has decoded the characters correctly, but the browser then interprets the bytes in the wrong encoding, which produces the incorrect display.
To fix the problem, send a charset in the Content-type header before any output like so:
header('Content-type: <type>; charset=utf-8');
If your output page is HTML, you could alternatively use this tag in the head:
<meta charset="utf-8">
If you take the second option, be sure to place the tag as early as possible in the head, as browsers do not scan past the first 1024 bytes of the page for this declaration.
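Putting both pieces together, a minimal sketch in plain PHP (not CodeIgniter's URI class; the $segment value is simply the string from the question) decodes the segment and declares UTF-8 both in the header and in the meta tag:

```php
<?php
// Declare the charset before any output. (Under the CLI, header() is a no-op.)
header('Content-Type: text/html; charset=utf-8');

// Raw URI segment as printed in the question.
$segment = '%E0%B4%AE';

// urldecode() turns each %XX back into one byte; E0 B4 AE is the UTF-8
// encoding of U+0D2E (Malayalam letter MA).
$term = urldecode($segment);

// Escape for HTML, telling htmlspecialchars() the input is UTF-8.
echo '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>',
     htmlspecialchars($term, ENT_QUOTES, 'UTF-8'),
     '</body></html>', "\n";
```

If the same three bytes are shown with an ISO-8859-1 or Windows-1252 page encoding, each byte becomes its own character, which is exactly the à´® seen in the question.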

Turn a URL-encoded string into UTF-8

Yesterday I ran into a problem getting some Unicode strings from the URL. I use CodeIgniter, and the URL segments are passed into controller/function(parameters). I don't know whether CI changes the encoding or it's something else. The encoding is correct in my HTML page, both in the content and in the address bar, until I read those Unicode segments.
For example:
localhost/df-gamez/news/افتتاح-جدید-سایت-تیم-دریم-فکتوری
The last segment is Persian text encoded as UTF-8. It is displayed correctly everywhere, but when I read it in my code it turns into something like this:
%d8%a7%d9%81%d8%aa%d8%aa%d8%a7%d8%ad-%d8%b3%d8%a7%db%8c%d8%aa-%d8%ac%d8%af%db%8c%d8%af-%d8%aa%db%8c%d9%85-%d8%af%d8%b1%db%8c%d9%85-%d9%81%da%a9%d8%aa%d9%88%d8%b1%db%8c
I tried to convert it to UTF-8 with mb_encode, but that didn't work. Both my HTML page and my controller file are saved as UTF-8 without BOM.
Peace Out!
Use the urldecode() function to decode it:
echo urldecode("%d8%a7%d9%81%d8%aa%d8%aa%d8%a7%d8%ad-%d8%b3%d8%a7%db%8c%d8%aa-%d8%ac%d8%af%db%8c%d8%af-%d8%aa%db%8c%d9%85-%d8%af%d8%b1%db%8c%d9%85-%d9%81%da%a9%d8%aa%d9%88%d8%b1%db%8c");
will give you افتتاح-سایت-جدید-تیم-دریم-فکتوری
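The same fix can be applied to a whole request path by decoding each segment separately. This is a sketch in plain PHP, not CodeIgniter's own API; the helper name decode_path_segments and the sample path are hypothetical. It uses rawurldecode() rather than urldecode(), because in a path a + is a literal character, while urldecode() would turn it into a space.

```php
<?php
// Hypothetical helper: split a request path and percent-decode each segment.
function decode_path_segments(string $path): array
{
    $segments = explode('/', trim($path, '/'));
    // rawurldecode() decodes %XX sequences but leaves '+' untouched.
    return array_map('rawurldecode', $segments);
}

// Sample path built from part of the encoded string in the question.
$path = '/df-gamez/news/%d8%aa%db%8c%d9%85-%d8%af%d8%b1%db%8c%d9%85';
$segments = decode_path_segments($path);

echo $segments[2], "\n"; // the decoded Persian text (UTF-8 bytes)
```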

URL as another URL's GET parameter - problem with "&"

There is a script that receives another URL as a GET parameter:
script.php?file=http://www.google.com&id=123
The problem is:
when the URL has parameters of its own, they are treated as the script's parameters, not as that URL's parameters:
script.php?file=http://www.google.com?q=adsf&lang=en&id=123
The URL is http://www.google.com?q=adsf&lang=en, but it is cut off at the &, because everything after it is treated as belonging to script.php itself.
What can I do about this?
I tried replacing & with %26, but then the URL gets broken.
You need to encode the value with the percent-encoding.
If you’re using PHP, use rawurlencode (or urlencode if application/x-www-form-urlencoded is expected):
$url = 'http://www.google.com?q=adsf&lang=en';
echo 'script.php?file='.rawurlencode($url);
You need to URL-encode the entire URL that you are passing as a parameter to another URL (your script). %26 is the correct encoding for an &. Just make sure you decode it server-side before using it (in PHP, values in $_GET are already decoded for you). You don't say what language(s) you're using, but most, including JavaScript and PHP, have native URL-encoding functions.
Try to encode every special character like this:
script.php?file=http%3a%2f%2fwww.google.com%3fq%3dadsf%26lang%3den&id=123
although it might be better and easier to use rawurlencode().
Also, read this about URL encoding.
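A minimal round-trip sketch: http_build_query() encodes each value on the sending side, and parse_str() (which applies essentially the same decoding PHP uses to fill $_GET) stands in for the receiving script, so the example runs without a web server.

```php
<?php
$target = 'http://www.google.com?q=adsf&lang=en';

// Sending side: every value is percent-encoded, so the '?', '=' and '&'
// inside $target cannot leak into the outer query string.
$query = http_build_query(['file' => $target, 'id' => 123], '', '&');
$link  = 'script.php?' . $query;
echo $link, "\n";

// Receiving side: PHP would fill $_GET from the query string already decoded;
// parse_str() is used here to simulate that.
parse_str($query, $get);
echo $get['file'], "\n"; // prints http://www.google.com?q=adsf&lang=en
echo $get['id'], "\n";   // prints 123
```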

PHP: get a URL with special characters without urlencoding them!

I would like file_get_contents to get a url that looks like this: http://wapedia.mobi/sv/Gröt
The problem is that it actually requests ...wapedia.mobi/sv/Gr%C3%B6t (I can't post the entire link, sorry), which, as you can see, has been URL-encoded, and that page does not give me any results.
How can I do this?
According to the PHP manual, you must encode a URL yourself if it contains special characters. This means the function itself does no encoding of its own. Most likely your URL is being encoded before it is passed to the function, so pass it through urldecode() first and see what happens.
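Edit: You're saying the encoding is being messed up. Again, the PHP manual specifically states that you need to encode URLs containing special characters before passing them to file_get_contents. Encode only the part that contains the special characters (here the last path segment), then pass the URL to the function:
$url = 'http://wapedia.mobi/sv/' . rawurlencode('Gröt'); // encoding the whole URL would also turn "://" and "/" into %3A%2F%2F and %2F
$html = file_get_contents($url);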
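When the full URL arrives as one string, the path has to be split before encoding so the scheme, host and slashes survive. The helper below is a hypothetical sketch built on parse_url(); it assumes the path is not already percent-encoded (otherwise it would double-encode) and simply ignores user info and fragments.

```php
<?php
// Hypothetical helper: percent-encode each path segment of an absolute URL.
// Assumes the path is raw (not yet encoded); user info and fragment are dropped.
function encode_url_path(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // Encode segment by segment so the '/' separators are kept.
    $path = $parts['path'] ?? '';
    $encodedPath = implode('/', array_map('rawurlencode', explode('/', $path)));

    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $encodedPath;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query']; // query assumed to be encoded already
    }
    return $result;
}

$url = encode_url_path('http://wapedia.mobi/sv/Gröt');
echo $url, "\n"; // prints http://wapedia.mobi/sv/Gr%C3%B6t
// $html = file_get_contents($url);
```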
