UTF-8 URL parsing in PHP: file_get_contents and browser

I want to fetch the content of a URL using file_get_contents($url). When I copy the URL from the browser address bar, it looks like this:
$url="http://www.mashadhome.com/fa-estate-39855-tags-%D9%81%D8%B1%D9%88%D8%B4-%D8%A2%D9%BE%D8%A7%D8%B1%D8%AA%D9%85%D8%A7%D9%86-%D8%A8%D9%84%D9%88%D8%A7%D8%B1%20%D8%B5%DB%8C%D8%A7%D8%AF%20%D8%B4%DB%8C%D8%B1%D8%A7%D8%B2%DB%8C";
But when I extract the URL automatically using
$homepage1 = file_get_contents($url_value);
$doc1 = new DOMDocument;
$doc1->preserveWhiteSpace = false;
$doc1->loadHTML($homepage1);
$xpath1 = new DOMXPath($doc1);
$nodes1 = $xpath1->query("//html/body/section/div/div/section/figure/a");
// Read the href attribute of each matched <a> element.
foreach ($nodes1 as $node1) {
    $href = $node1->getAttribute('href');
}
it is something like this:
$href="http://www.mashadhome.com/fa-estate-39855-tags-فروش-آپارتمان-بلوار صیاد شیرازی";
I use the code above to fetch the content behind this link, but file_get_contents($href) doesn't work for the second URL, even though the second address works fine when I paste it into the browser.
So my question is: why doesn't the second address work, and how can I convert one form into the other?

A URL may contain only a restricted set of ASCII characters: letters, digits, and a small number of punctuation marks such as the hyphen. Any other character must be percent-encoded before the URL is sent to the server, as in your first example. Have a look at the urlencode() function.
Of course, you should apply urlencode() only to the parts that are not URL delimiters (like : and /). In this instance, you would use it on the fa-estate-39855-tags-فروش-آپارتمان-بلوار صیاد شیرازی part only. Note that urlencode() turns spaces into +, whereas rawurlencode() produces %20, as in your first URL.
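As a rough illustration of this advice, the sketch below percent-encodes only the path segments and leaves the scheme, host, and slashes alone. The helper name encode_path_segments is made up for this example, and rawurlencode() is chosen on the assumption that spaces should come out as %20, as in the URL copied from the browser.

```php
<?php
// Hypothetical helper: percent-encode each path segment separately,
// so the "/" separators between segments are preserved.
function encode_path_segments(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

$base = 'http://www.mashadhome.com/';
$path = 'fa-estate-39855-tags-فروش-آپارتمان-بلوار صیاد شیرازی';

// Only the path is encoded; the scheme and host stay untouched.
$url = $base . encode_path_segments($path);
echo $url, PHP_EOL;
```

The resulting string contains only ASCII characters, so it should be safe to hand to file_get_contents().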

Related

How to convert an url into a hyperlink with php urlencode()?

Trying to convert a plain URL text into a valid link.
The problem I have is that my link might contain both English (A-Z/a-z) and Hebrew (אבגדהוזחטיכךלמםנןסעפףצץקרשת) letters.
Using PHP's urlencode() function I was able to get the correct format for Hebrew, yet I cannot find the right way in which I convert it into a link.
My code so far (does not work with Hebrew letters):
$replyText = preg_replace('#(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)#', '<a href="$1">$1</a>', $replyText);
An example for a URL I need to convert into a link:
google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html
Will become:
google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html

Despite what I believe you have posted to represent the desired output, if this was my task, I think I would have a urlencoded href value in the <a> tag and human-readable link text.
Code:
$replyText = "google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html";
echo '<a href="', $replyText, '">', urldecode($replyText), '</a>';
Source Code Output:
<a href="google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html">google.co.il/שלום_Hello.html</a>
Effective Output:
google.co.il/שלום_Hello.html
Notice that when you mouseover the link, your browser's status bar will show the un-encoded url anyhow.

You just need to replace %2F => /, so your link will be: google.co.il/%D7%A9%D7%9C%D7%95%D7%9D_Hello.html
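A minimal sketch of that replacement, combined with readable link text, might look like the following. The http:// prefix is an assumption added here, because an href without a scheme is resolved relative to the current page.

```php
<?php
$encoded = 'google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html';

// Restore the path separator but keep the Hebrew letters percent-encoded.
// The "http://" scheme is assumed; the question does not include one.
$href = 'http://' . str_replace('%2F', '/', $encoded);

// Human-readable link text, escaped for safe output in HTML.
$text = htmlspecialchars(urldecode($encoded), ENT_QUOTES, 'UTF-8');

$link = '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">' . $text . '</a>';
echo $link, PHP_EOL;
```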

Why doesn't GET decode the parameters when a URL parameter contains spaces?

I have two links (shown below).
When I use them in another page, one of them is decoded automatically by GET and the other is not.
The problem is that if there is a space in any parameter, GET does not decode the URL automatically; if there are no spaces, GET decodes the URL automatically, which is the correct behaviour.
Tip: the only encoded parameter is BodyStr, which is encoded with PHP's urlencode() function.
Another tip: the only difference between the two links is the space in the Subjectstr parameter.
I want to know why spaces in the URL prevent the GET superglobal from automatically decoding all the parameters.
$message=urlencode($message);
http://localhost/test4.php?me=ahmed&y=1&clientid=55&default=1&Subjectstr=**Email From Contactuspage**&BodyStr=$message
http://localhost/test4.php?me=ahmed&y=1&clientid=55&default=1&Subjectstr=**EmailFromContactuspage**&BodyStr=$message

Space isn't allowed in URL query strings. If you put an unencoded space in SubjectStr, the URL ends at that point, so the server never sees the BodyStr parameter.
You need to URL-encode SubjectStr. Replace the spaces with + or %20.
$message=urlencode($message);
$url = "http://localhost/test4.php?me=ahmed&y=1&clientid=55&default=1&Subjectstr=Email+From+Contactuspage&BodyStr=$message";
The URL stops at the space because of how the HTTP protocol works. The client sends:
GET <url> HTTP/1.1
This request line is parsed by looking for the space between the URL and the HTTP version token. If there's a space in the URL, that will be treated as the end of the URL.
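One way to avoid hand-encoding each parameter is to let PHP build the query string. The sketch below uses http_build_query() for that and also shows how an unencoded space splits the request line into too many tokens. The $message value is a made-up sample.

```php
<?php
$message = "Hello there,\nplease call me back & thanks!"; // sample body text

$params = [
    'me'         => 'ahmed',
    'y'          => 1,
    'clientid'   => 55,
    'default'    => 1,
    'Subjectstr' => 'Email From Contactuspage',
    'BodyStr'    => $message,
];

// http_build_query() URL-encodes every value (spaces become "+" by default).
$url = 'http://localhost/test4.php?' . http_build_query($params);
echo $url, PHP_EOL;

// Decoding the query string gives back the original values.
parse_str(parse_url($url, PHP_URL_QUERY), $decoded);

// With raw spaces, the request line no longer has exactly three tokens.
$bad   = 'http://localhost/test4.php?Subjectstr=Email From Contactuspage&BodyStr=x';
$parts = explode(' ', "GET $bad HTTP/1.1");
echo count($parts), PHP_EOL; // 5 instead of 3
```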

Laravel: replace %20 in URL

I have a simple problem: I need to get rid of %20 and other crap in my URLs. At the moment one looks like this: http://exmaple/profile/about/Eddies%20Plumbing. As you can see, it's a profile link.
Yes, I could add str_replace calls before every hyperlink, but I have about 10 of them and I think that's bad practice. Maybe there is a better solution? What would you use? Thanks.

That is not crap, that is the valid percent-encoded representation of a space character. And it's encoded because it's one of the characters that are deemed unsafe by RFC 1738:
All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.
So in order to have pretty URLs, you should avoid using reserved and unsafe characters which need encoding to be valid as part of a URL:
Reserved characters: $ & + , / : ; = ? @
Unsafe characters: Blank/empty space and < > # % { } | \ ^ ~ [ ] `
Instead replace spaces with dashes, which serve the same purpose visually while being a safe character, for example look at the Stack Overflow URL for this question. The URL below looks just fine and readable without spaces in it:
http://exmaple/profile/about/eddies-plumbing
You can use Laravel's str_slug helper function to do the hard work for you:
str_slug('Eddies Plumbing', '-'); // returns eddies-plumbing
str_slug does more than replace spaces with dashes: it replaces multiple spaces with a single dash and also strips all non-alphanumeric characters, so there's no reliable way to decode it.
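To see why a slug cannot be decoded reliably, here is a plain-PHP approximation of the behaviour described above. The simple_slug function is a hypothetical stand-in written for this example, not Laravel's actual implementation, and it only handles ASCII input.

```php
<?php
// Hypothetical stand-in for a slug helper (not Laravel's implementation).
function simple_slug(string $title, string $separator = '-'): string
{
    $sep  = preg_quote($separator, '/');
    $slug = strtolower($title);
    // Drop everything except ASCII letters, digits, whitespace and the separator.
    $slug = preg_replace('/[^a-z0-9\s' . $sep . ']+/', '', $slug);
    // Collapse runs of whitespace and separators into a single separator.
    $slug = preg_replace('/[\s' . $sep . ']+/', $separator, $slug);
    return trim($slug, $separator);
}

echo simple_slug('Eddies Plumbing'), PHP_EOL;     // eddies-plumbing
echo simple_slug("Eddie's   Plumbing!"), PHP_EOL; // eddies-plumbing
```

Two different titles produce the same slug, which is why the original text cannot be recovered from the slug alone.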
That being said, I wouldn't use that approach in the first place. There are two main ways I generally use to identify a database entry:
1. Via an ID
The route path definition would look like this in your case:
/profiles/about/{id}/{slug?} // real path "/profiles/about/1/eddies-plumbing"
The code used to identify the user would look like this User::find($id) (the slug parameter is not needed, it's just there to make the URL more readable, that's why I used the ? to make it optional).
2. Via a slug
The route path definition would look like this in your case:
/profiles/about/{slug} // real path "/profiles/about/eddies-plumbing"
In this case I always store the slug as a column in the users table because it's a property relevant to that user. So the retrieval process is very easy: User::where('slug', $slug)->first(). Of course, str_slug is used to generate a valid slug when saving the user to the database. I usually like this approach better because it has the added benefit of allowing the slug to be whatever you want (it does not need to be generated from the user name). This can also allow users to choose their custom URL, and can also help with search engine optimisation.

The links are urlencoded. Use urldecode($profileLink); to decode them.

I am parsing the URL that I get in this way:
$replacingTitle = str_replace('-',' ',$title);
<a href="example.com/category/{{ str_slug($article->title) }}/" />

In your view ...
{{ $comm->title }}
and in the controller, parse the slug from the URL like this:
public function showBySlug($slug) {
    // Turn the dashes back into spaces to recover the title.
    $title = str_replace('-', ' ', $slug);
    $post = Community::where('title', '=', $title)->first();
    return view('show')->with(array(
        'post' => $post,
    ));
}

JSON decode of a URL request that contains special characters

I have a url request like this:
http://localhost/pro/api/index/update_profile?data={"id":"51","name":"abc","address":"stffu fsagu asfhgui fsahgiu3#$#^^##%^3 6\"\"wkgforqf\";rqgjrg..,,,rqwgtr''qwrgtrw'trwqt'rqwtqwr trqt\n"}
I am trying to JSON-decode the data parameter of this URL, using the following code. It works perfectly if the URL does not contain special characters, but how do I decode it if it does?
$string = htmlspecialchars($_REQUEST['data'], ENT_QUOTES);
$jsonFix = urldecode($string);
$string = htmlentities($jsonFix, ENT_QUOTES | ENT_IGNORE, "UTF-8");
$json = json_decode($string, true);
print_r($json);exit;
I tried this code, but it is not working. When I try the following:
print_r($_REQUEST['data']);exit;
output is:
{"id":"51","name":"ds"","address":"stffu fsagu asfhgui fsahgiu3
which means it breaks at the # character.
(Side note: I am working on an API for iPhone, so the request comes from an iPhone. Framework: CI.)
So how do I receive a URL that contains special characters, and how do I decode it?

The # character marks the beginning of the fragment part of the URL.
You need to properly URL-encode the URL for this to work.
For example, your JSON, when correctly URL-encoded, becomes:
%7B%22id%22%3A%2251%22%2C%22name%22%3A%22abc%22%2C%22address%22%3A%22stffu%20fsagu%20asfhgui%20fsahgiu3%23%24%40%5E%5E%40%23%25%5E3%206%5C%22%5C%22wkgforqf%5C%22%3Brqgjrg..%2C%2C%2Crqwgtr%27%27qwrgtrw%27trwqt%27rqwtqwr%20trqt%5Cn%22%7D
The entire URL becomes:
http://localhost/pro/api/index/update_profile?data=%7B%22id%22%3A%2251%22%2C%22name%22%3A%22abc%22%2C%22address%22%3A%22stffu%20fsagu%20asfhgui%20fsahgiu3%23%24%40%5E%5E%40%23%25%5E3%206%5C%22%5C%22wkgforqf%5C%22%3Brqgjrg..%2C%2C%2Crqwgtr%27%27qwrgtrw%27trwqt%27rqwtqwr%20trqt%5Cn%22%7D
Check the documentation of your language of choice to find the correct method for URL-encoding characters.
For example, in PHP, this is rawurlencode and in JavaScript this is encodeURIComponent.
If necessary, there are also plenty of URL encoders available online.
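The round trip can be sketched in PHP as follows. The payload is a shortened sample modelled on the question, and the server side is simulated with parse_str(), which decodes a query string the same way PHP fills $_GET.

```php
<?php
$payload = [
    'id'      => '51',
    'name'    => 'abc',
    // Sample value with characters that are special in URLs (#, %, spaces, quotes).
    'address' => 'fsahgiu3#$@^^@#%^3 6""wkgforqf";rqgjrg..,,,' . "\n",
];

// Client side: serialise to JSON, then percent-encode the whole value.
$url = 'http://localhost/pro/api/index/update_profile?data='
     . rawurlencode(json_encode($payload));
echo $url, PHP_EOL;

// Server side (simulated): parse_str() already URL-decodes the value,
// so json_decode() can be applied directly.
parse_str(parse_url($url, PHP_URL_QUERY), $query);
$decoded = json_decode($query['data'], true);

var_dump($decoded === $payload); // bool(true)
```

Because the # is sent as %23, it is no longer treated as the start of the fragment and the full JSON reaches the server.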

You are manipulating $data in ways that aren't really necessary. htmlspecialchars() and htmlentities() make sense when applied to specific values, not to the whole JSON string, because they can corrupt the JSON. The only step that matters here is urldecode():
$jsonFix = urldecode($data);
$json = json_decode($jsonFix, true);
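This already works and doesn't drop any characters. Note that values read from $_GET or $_REQUEST have already been URL-decoded by PHP, so the extra urldecode() is only needed for a raw, still-encoded string.
If you plan to output one of the values in HTML and want to escape it, you can do it like so: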
htmlspecialchars($json['address'], ENT_QUOTES)

Can't you just replace the "#" character with something like "&hashtagChar;" before you process, and put it back afterwards?

Using a regex to convert a URL to a link

I need to convert the URLs in an article to the 3g domain.
For example, I need to convert
here is the link:http://www.mydomain.com/index thanks
to
here is the link:<a href='http://3g.mydomain.com/index' target='_self'>http://3g.mydomain.com/index</a> thanks
Other domains should not be converted, only mydomain. Here is the code:
$c = "/([^'\"=])?http:\/\/([^ ]+?)(mydomain)\.com([A-Za-z0-9&%\?=\/\-\._#]*)/";
$b=preg_replace($c, "$1<a href='http://3g.$3.com$4' target='_self'>http://3g.$3.com$4</a>",$b);
It works very well, but if the text looks like this:
a link
it returns a wrong result like this:
a link
but I need this result:
a link
What should I do?

You should do the following:
Strip target attributes from existing hyperlinks
Rewrite hyperlinks in href attributes
Rewrite any other hyperlinks
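// Groups in $plain: 1 = subdomain part, 2 = mydomain, 3 = path.
$plain = "http://([^ ]+?)(mydomain)\.com(/?[^'\"\s]*(?=['\"\s]))";
$plain_replace = "http://3g.$2.com$3";
// Inside $in_href the opening quote is group 1, so the groups of $plain shift by one.
$in_href = "href=(['\"])" . $plain . "(['\"])";
$in_href_replace = "href='http://3g.$3.com$4' target='_self'";
$strip_target = "target=['\"][^'\"]*['\"]";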
...
So:
Replace $strip_target with ""
Replace $in_href with $in_href_replace
Replace $plain with $plain_replace
(The regexes were tested in C#; for PHP you will need to add pattern delimiters and may have to adjust the \ escaping to suit the PCRE rules.)
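For PHP, the three steps could be sketched roughly as below. The patterns are adapted for PCRE rather than copied from the ones above, the function name rewrite_to_3g is hypothetical, and the sample markup in the usage lines is an assumption about what the article text looks like.

```php
<?php
// Hypothetical helper following the three steps; patterns adapted for PCRE.
function rewrite_to_3g(string $html): string
{
    // 1. Strip existing target attributes.
    $html = preg_replace('/\s+target=([\'"])[^\'"]*\1/', '', $html);

    // 2. Rewrite mydomain links inside href attributes and add target='_self'.
    $html = preg_replace(
        '~href=([\'"])http://(?:[^\s\'"/]+\.)?mydomain\.com(/[^\'"\s]*)?\1~',
        "href='http://3g.mydomain.com\$2' target='_self'",
        $html
    );

    // 3. Rewrite any remaining mydomain URLs that are not already on 3g.
    return preg_replace(
        '~http://(?!3g\.)(?:[^\s\'"/]+\.)?mydomain\.com~',
        'http://3g.mydomain.com',
        $html
    );
}

$in  = '<a href="http://www.mydomain.com/index" target="_blank">a link</a>'
     . ' and http://www.mydomain.com/page here';
$out = rewrite_to_3g($in);
echo $out, PHP_EOL;
```

Note that step 3 only swaps the host name of plain-text URLs; wrapping them in an anchor tag would need an additional replacement.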

Get rid of the first ? in your regular expression. That allows for the absence of a preceding character.
Or, perhaps more to your intention, if you want to allow URLs at the beginning, you can replace:
([^'\"=])?
with:
(^|[^'\"=])
...which will allow a link if at the very beginning, or if not preceded by a quote, etc., but not otherwise.
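A quick check of the second variant, using the pattern and replacement from the question with only the leading group changed. The anchor markup in $linked is an assumed sample of an already-linked URL.

```php
<?php
// Pattern from the question, with ([^'\"=])? replaced by (^|[^'\"=]).
$pattern     = "/(^|[^'\"=])http:\/\/([^ ]+?)(mydomain)\.com([A-Za-z0-9&%\?=\/\-\._#]*)/";
$replacement = "$1<a href='http://3g.$3.com$4' target='_self'>http://3g.$3.com$4</a>";

$plain  = "here is the link:http://www.mydomain.com/index thanks";
$linked = '<a href="http://www.mydomain.com/index">a link</a>'; // assumed sample markup

// A plain-text URL is wrapped in a link to the 3g domain.
$plainResult = preg_replace($pattern, $replacement, $plain);
echo $plainResult, PHP_EOL;

// A URL preceded by a quote (inside href="...") is left untouched.
$linkedResult = preg_replace($pattern, $replacement, $linked);
echo $linkedResult, PHP_EOL;
```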
