I am stuck on this problem, and Google has been of no help so far. I am trying to find a way to preserve whitespace in a URL, with little to no luck.
I have a form that needs to gather POST data, mail it, append the POST data to the URL as comma-separated values, and then redirect the user to a page where they download a product.
Once the user presses download, that page reads the data in the URL and applies it to a billing invoice (the program is billed on time usage).
A simplified example:
$addressOne = $_POST['addressOne'];
$newURL = "http://subdomain.domain.com/connectnow=on?" . ", Address1=" . $addressOne;
if ($mailSent) {
    header("Location: $newURL");
    exit;
}
There are a lot more values, obviously, but the address is one of the fields where I am having this issue.
I have tried doing something like:
$newURL = str_replace(" ", " ", $newURL);
That worked as far as preserving the whitespace in the URL visually, but when the downloaded program reads the URL, it sees the replaced spaces as %C2%.
I have also tried:
$newURL = str_replace(" ", " \40", $newURL);
That made the spaces in the URL convert back to %20.
Any guidance would be appreciated.
URL:
www.site.com/my spaces preserved/
urlencode()
www.site.com%2Fmy+spaces+preserved%2F
urldecode()
www.site.com/my spaces preserved/
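For the redirect case in the question, one option is to percent-encode each value before appending it and to decode it again on the reading side. The following is only a sketch: `buildDownloadUrl()`, the base URL, and the field names are assumptions made for illustration, and it presumes the program that reads the URL can apply the equivalent of `rawurldecode()`.

```php
<?php
// Hypothetical helper: joins name=value pairs with commas, as in the
// question, and percent-encodes each part so spaces become %20.
function buildDownloadUrl(string $base, array $fields): string
{
    $pairs = [];
    foreach ($fields as $name => $value) {
        $pairs[] = rawurlencode($name) . '=' . rawurlencode($value);
    }
    return $base . '?' . implode(',', $pairs);
}

// Example values are made up for illustration.
$newURL = buildDownloadUrl('http://subdomain.domain.com/connectnow', [
    'Address1' => '12 Main Street',
    'City'     => 'New York',
]);

// The reader gets the original text back by decoding each value.
$decoded = rawurldecode('12%20Main%20Street');
```

A space cannot travel literally in a URL, so the whitespace is "preserved" by decoding it at the receiving end rather than by avoiding the encoding.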
I want to make something like a proxy page (not for actual proxying). As far as I know, I need to rewrite all URLs (src, link, and so on): styles and images should be fetched from the right place, and links should go through my page via $_GET["url"], which then serves the next page.
I have tried to preg_replace() each element, but I am not very good with it; it works on one website, while on another I cannot see the CSS, for example.
My first question: are there any PHP classes or scripts that make this easy? (I have been googling for hours.)
If not, please help me with the following code:
<?php
$url = $_GET["url"];
$text = file_get_contents($url);
$data = parse_url($url);
$url=$data['scheme'].'://'.$data['host'];
$text = preg_replace('|<iframe [^>]*[^>]*|', '', $text);
$text = preg_replace('/<a(.*?)href="([^"]*)"(.*?)>/','<a $1 href="http://my.site/?url='.$url.'$2" $3>',$text);
$text = preg_replace('/<link(.*?)href="(?!http:\/\/)([^"]+)"(.*?)/', "<link $1 href=\"".$url."/\\2\"$3", $text);
$text = preg_replace('/src="(?!http:\/\/)([^"]+)"/', "src=\"".$url."/\\1\"", $text);
$text = preg_replace('/background:url\(([^"]*)\)/',"background:url(".$url."$1)", $text);
echo $text;
?>
In replacement no. 4 (src), I need to skip values that start with a double slash, such as src="//somethingdomain", because they do not need to be replaced.
I also need to skip replacement no. 2 when the href already points to the same domain; otherwise it ends up looking like need.site/news.need.site/324244.
Is it possible to pass a form's action through my script as well? For example, a Google search query.
One more small problem: one website used to open correctly, but after I opened it hundreds of times through this script I now get unknown symbols (without any div, body, etc.), such as ��S�n�#��. I tried converting between UTF-8 and ANSI, but the symbols just change.
Maybe they banned me?
function link_replace($url,$myurl) {
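    $content = file_get_contents($url);
    // Absolute links: pass the full target through the proxy page.
    $content = preg_replace('#href="(http)(.*?)"#is', 'href="'.$myurl.'?url=$1$2"', $content);
    // Relative links: a negative lookahead skips anything starting with "http"
    // ([^http] would only exclude the single characters h, t and p).
    $content = preg_replace('#href="(?!http)(.*?)"#is', 'href="'.$myurl.'?url='.$url.'$1"', $content);
    return $content;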
}
echo link_replace($url,$myurl);
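For the src replacement described in the question (leave alone anything that starts with `//` or a scheme), a negative lookahead is one way to do it. This is a rough sketch, not a full HTML rewriter: `absolutizeSrc()` is a made-up name, it only handles double-quoted attributes, and it resolves every relative path against the site root rather than against the current page's directory.

```php
<?php
// Hypothetical helper: prefixes relative src values with $base, but skips
// absolute (http://, https://), protocol-relative (//) and data: values.
function absolutizeSrc(string $html, string $base): string
{
    return preg_replace_callback(
        '~src="(?!https?://|//|data:)([^"]+)"~i',
        function (array $m) use ($base): string {
            // Avoid a doubled slash between the base and the path.
            return 'src="' . rtrim($base, '/') . '/' . ltrim($m[1], '/') . '"';
        },
        $html
    );
}

$html = '<img src="/a.png"><img src="//cdn.example/b.png"><img src="http://x.example/c.png">';
$out  = absolutizeSrc($html, 'http://site.example');
```

A real DOM parser is usually more robust than regular expressions for this kind of rewriting, but the lookahead idea carries over to the href and link patterns as well.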
I'm not absolutely sure, but I guess the result is just compressed, e.g. with gzip. Try removing the Accept-Encoding headers when proxying the request.
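If compression is indeed the cause, a sketch along these lines might help. It assumes the zlib extension is available; `maybeGunzip()` is a hypothetical helper name, and the server is free to ignore the `Accept-Encoding: identity` request, which is why the body is also checked for the gzip magic bytes.

```php
<?php
// Hypothetical helper: returns the body unchanged unless it starts with
// the gzip magic bytes (0x1f 0x8b), in which case it is decompressed.
function maybeGunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body;
    }
    $decoded = gzdecode($body);
    // Fall back to the raw body if decompression fails.
    return $decoded === false ? $body : $decoded;
}

// Ask the server not to compress the response.
$context = stream_context_create([
    'http' => ['header' => "Accept-Encoding: identity\r\n"],
]);

// Usage inside the proxy script (not executed here, since it needs a URL):
// $text = maybeGunzip(file_get_contents($url, false, $context));
```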
Something I have noticed on the StackOverflow website:
If you visit the URL of a question on StackOverflow.com:
"https://stackoverflow.com/questions/10721603"
The website adds the name of the question to the end of the URL, so it turns into:
"https://stackoverflow.com/questions/10721603/grid-background-image-using-imagebrush"
This is great, I understand that this makes the URL more meaningful and is probably good as a technique for SEO.
What I wanted to Achieve after seeing this Implementation on StackOverflow
I wish to implement the same thing with my website. I am happy using a header() 301 redirect in order to achieve this, but I am attempting to come up with a tight script that will do the trick.
My Code so Far
// Set the title of the page article (This could be from the database). Trimming any spaces either side
$original_name = trim(' How to get file creation & modification date/times in Python with-dash?');
// Replace any characters that are not A-Za-z0-9 or a dash with a space
$replace_strange_characters = preg_replace('/[^\da-z-]/i', " ", $original_name);
// Replace any spaces (or multiple spaces) with a single dash to make it URL friendly
$replace_spaces = preg_replace("/([ ]{1,})/", "-", $replace_strange_characters);
// Remove any leading or trailing dashes
$removed_dashes = preg_replace("/^([\-]{0,})|([\-]{2,})|([\-]{0,})$/", "", $replace_spaces);
// Show the finished name on the screen
print_r($removed_dashes);
The Problem
I have created this code and it works fine by the looks of things: it makes the string URL friendly and readable to the human eye. However, I would like to see if it is possible to simplify it or "tighten it up" a bit, as I feel my code is probably overcomplicated.
It is not so much that I want it put onto one line, because I could do that by nesting the functions into one another, but I feel that there might be an overall simpler way of achieving it - I am looking for ideas.
In summary, the code achieves the following:
Removes any "strange" characters and replaces them with a space
Replaces any spaces with a dash to make it URL friendly
Returns a string without any spaces, with words separated with dashes and has no trailing spaces or dashes
String is readable (doesn't contain percent signs and + symbols, as it would after simply using urlencode())
Thanks for your help!
Potential Solutions
I found out whilst writing this that what I am looking for is known as a URL 'slug', and slugs are indeed useful for SEO.
I found this library on Google Code, which appears to work well at first glance.
There is also a notable question on this on SO which can be found here, which has other examples.
I tried to play with preg like you did. However it gets more and more complicated when you start looking at foreign languages.
What I ended up doing was simply trimming the title, and using urlencode
$url_slug = urlencode($title);
Also I had to add those:
$title = str_replace('/','',$title); //Apache doesn't like this character even encoded
$title = str_replace('\\','',$title); //Apache doesn't like this character even encoded
There are also 3rd party libraries such as: http://cubiq.org/the-perfect-php-clean-url-generator
Indeed, you can do that:
$original_name = ' How to get file creation & modification date/times in Python with-dash?';
$result = preg_replace('~[^a-z0-9]++~i', '-', $original_name);
$result = trim($result, '-');
To deal with other alphabets you can use this pattern instead:
~\P{Xan}++~u
or
~[^\pL\pN]++~u
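Put together, the two steps above fit in a small function. This is a minimal sketch using the `[^\pL\pN]` pattern; `slugify()` is a name chosen here for illustration, and the input is assumed to be valid UTF-8 (otherwise `preg_replace()` returns null, which the sketch turns into an empty string).

```php
<?php
// Hypothetical helper: collapses every run of characters that are neither
// letters nor digits (in any alphabet) into one dash, then trims the ends.
function slugify(string $title): string
{
    $slug = preg_replace('~[^\pL\pN]++~u', '-', $title);
    return trim($slug ?? '', '-');
}

$slug = slugify(' How to get file creation & modification date/times in Python with-dash?');
// $slug is "How-to-get-file-creation-modification-date-times-in-Python-with-dash"
```

Lowercasing is left out on purpose; for non-ASCII titles it would need a multibyte-aware function such as `mb_strtolower()`.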
I was facing a problem while developing a dictionary that shows a Google image alongside each meaning; it otherwise works fine.
The API call produced this warning whenever I submitted more than one word in the URL in PHP:
Warning: file_get_contents(https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=Pakistani Flag)
In the above example, the API finds a picture for "Pakistani" without trouble, but adding "Flag" triggers the warning shown above.
$encoded = urlencode('Pakistani Flag');
We can resolve the problem by replacing the spaces with %20 in PHP. For example, suppose your words are stored in a variable $word:
$word = "Pakistani Flag";
Convert the words with:
$word_con = str_replace(" ", "%20", $word);
Finally we have
https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=$word_con
which works absolutely perfect!
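Replacing only spaces leaves other reserved characters (such as & or #) unencoded. As a hedged alternative, the whole query string can be built with `http_build_query()`; the sketch below only constructs the URL and does not call the API, and the parameter names are the ones from the URL above.

```php
<?php
$word = 'Pakistani Flag';

// PHP_QUERY_RFC3986 encodes spaces as %20 instead of +.
$query = http_build_query(['v' => '1.0', 'q' => $word], '', '&', PHP_QUERY_RFC3986);
$url   = 'https://ajax.googleapis.com/ajax/services/search/images?' . $query;
// $url ends with "?v=1.0&q=Pakistani%20Flag"
```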
My code is:
$rawhtml = file_get_contents( "site url" );
$pat= '/((http|ftp|https):\/\/[\w#$&+,\/:;=?#.-]+)[^\w#$&+,\/:;=?#.-]/i';
preg_match_all($pat,$rawhtml,$matches1);
foreach($matches1[1] as $plinks)
{
$links_array[]=$plinks;
}
After testing several situations, I noticed that the function has some "leaks": the link gets broken if it contains whitespace.
For example I have this text URL in a variable:
$rawhtml = " http://www.filesonic.com/file/2185085531/TEST Voice 640-461 Test Cert Guide.epub
"
The result should be one link by line:
http://www.filesonic.com/file/2185085481/TEST Voice (640)+461 Test Cert Guide.pdf
but the result is
http://www.filesonic.com/file/2185085531/TEST
Sometimes the extracted links also contain , or ' or " at the end. How can I get rid of those trailing commas and quotes?
One could use (?<![,'"]) to exclude something at the end. But your problem is that you simply shouldn't use the trailing character class:
[^\w#$&+,\/:;=?#.-]
That's what matches " and '.
As a hackish workaround to the other problem, the first character class could be augmented with a space.
[\w#$&+,\/:;=?#. -]+
As said, that's probably not a good solution and might lead to other mismatches.
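If the input really has one link per line, as in the example, a line-based match sidesteps the whitespace problem without widening the character class. This is a sketch under that assumption only; `extractLineLinks()` is a made-up name, and it would not work for links embedded in running text or HTML.

```php
<?php
// Hypothetical helper: treats everything from the scheme to the end of the
// line as the link, then strips trailing commas, quotes and whitespace.
function extractLineLinks(string $text): array
{
    preg_match_all('~^[ \t]*((?:https?|ftp)://.*?)[ \t\r]*$~mi', $text, $m);
    return array_map(fn(string $link): string => rtrim($link, ",'\" \t"), $m[1]);
}

$rawhtml = " http://www.filesonic.com/file/2185085531/TEST Voice 640-461 Test Cert Guide.epub\n"
         . "http://a.example/some file.pdf\",\n";
$links_array = extractLineLinks($rawhtml);
```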
I wrote a php site (it's still a prototype) and I placed a Digg button on it. It was easy but...
The official manual says: "the URL has to be encoded". I did that with urlencode(). After urlencode, my URL looks like this:
http%3A%2F%2Fwww.mysite.com%2Fen%2Fredirect.php%3Fl%3Dhttp%3A%2F%2Fwww.othersite.rs%2FNews%2FWorld%2F227040%2FRusia-Airplane-crashed%26N%3DRusia%3A+Airplane+crashed
So far it's good, but when I want to submit that URL to Digg, it is recognized as an invalid URL:
http://www.mysite.com/en/redirect.php?l=http://www.othersite.rs/News/World/227040/Rusia-Airplane-crashed&N=Rusia:+Airplane crashed
If I place a "+" between "Airplane" and "crashed" (at the end of the link), then Digg recognizes it without any problems!
Please help, this bizarre problem is killing my brain cells!
P.S. For the purpose of this question, the URLs have been changed (to nonexistent ones) because the originals involve non-English sites.
After you've urlencode()ed it, encode the resulting plus signs as well:
$encoded_url = urlencode($original_url);
$final_url = str_replace('+', '%2B', $encoded_url);
Or alternatively, you could replace spaces in your URL with + first, and then urlencode() the result:
$spaceless_url = str_replace(' ', '+', $original_url);
$final_url = urlencode($spaceless_url);
If your own site required the parameters in the query string to be encoded in the first place, you wouldn't have the issue (since there wouldn't be an unencoded space in the original URL).
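That last point can be sketched as follows: encode the parameters of your own redirect URL first, then encode the finished URL once more for Digg. The URLs are the placeholder ones from the question, and the parameter names `l` and `N` are taken from it; how Digg decodes its input is not something this sketch can confirm.

```php
<?php
$target = 'http://www.othersite.rs/News/World/227040/Rusia-Airplane-crashed';
$title  = 'Rusia: Airplane crashed';

// Inner layer: encode the redirect parameters so the URL contains no raw spaces.
$redirect = 'http://www.mysite.com/en/redirect.php?'
          . http_build_query(['l' => $target, 'N' => $title], '', '&', PHP_QUERY_RFC3986);

// Outer layer: encode the whole redirect URL for use as a single parameter value.
$forDigg = rawurlencode($redirect);
```

Decoding `$forDigg` once yields `$redirect`, which still has its own parameters encoded, so no unencoded space appears at either level.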