Replace spaces in all URLs with %20 using Regex - php

I have a large block of HTML that contains multiple URLs with spaces in them. How do I use regex to replace any space that occurs in a URL with '%20'? The good thing is that all of the URLs end with '.pdf'.
Looking for something I could run in BBEdit/TextWrangler, or even PHP.
Example: http://www.site-name.com/dir/file name here.pdf
Need to return: http://www.site-name.com/dir/file%20name%20here.pdf

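Instead of regex, you could use PHP's URL-encoding functions, which escape the string for you, similar to encodeURI in JavaScript. Note that urlencode encodes a space as +, while rawurlencode produces %20. Apply either one to the file name rather than to the whole URL, since both also encode : and /.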

I was faced with exactly the same problem. I solved it with this:
$text = preg_replace("/http(.*) (.*)\.pdf/U", "http$1%20$2.pdf", $text);
This looks for a space between http and pdf and then replaces the space with %20.
If your URLs have multiple spaces, then simply run the code over and over until all the spaces are gone:
while (preg_match("/http(.*) (.*)\.pdf/U", $text))
{
    // Each pass replaces one more space per matched URL.
    $text = preg_replace("/http(.*) (.*)\.pdf/U", "http$1%20$2.pdf", $text);
}
However, I've found this will overwrite text if there are two or more URLs on the same line. I haven't found a solution for this yet.
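One way around the multiple-URLs-per-line problem is to match each URL as a whole and replace the spaces inside the match only. The following is a minimal sketch, not a definitive fix: the function name encodePdfUrlSpaces is hypothetical, and it assumes URLs start with http:// or https://, never contain quotes or angle brackets, and end at the first ".pdf".

```php
<?php
// Hypothetical helper: percent-encode spaces inside every URL that ends in .pdf.
function encodePdfUrlSpaces(string $text): string
{
    // Lazily match from the scheme to the first ".pdf". Quotes and angle
    // brackets are excluded so a match cannot run past an attribute or tag.
    $pattern = '~https?://[^"\'<>]*?\.pdf~i';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Only the matched URL is rewritten; surrounding text is left alone.
            return str_replace(' ', '%20', $m[0]);
        },
        $text
    );
}

echo encodePdfUrlSpaces('http://www.site-name.com/dir/file name here.pdf');
```

Because every URL is handled in a single pass, no loop is needed. A URL that does not end in .pdf but is followed later by ordinary text ending in ".pdf" would still be merged into one match, so the assumption that URLs sit inside quoted attributes matters.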

Related

PHP Remove all non letters

I have several strings that look like this:
Lasklé
Jones & Jon
I am trying to send them via the foursquare API to be matched, however it is failing with these characters. Is there a way to sanitise these so they only include English letters i.e. the results would be:
Lasklé
Jones Jon
As it appears using file_get_contents requests both with the 'é' and the '&' in the URL is causing issues.
I checked how the request was sent and realised that the '&' is unneeded and is causing the issues. Is it possible to remove all non-letters/numbers from the name?
What do the strings look like before you pass them? If your string looks like 'Lasklé' then I think you are using the wrong character set when reading the string, try using UTF-8.
If the string looks correct before you pass it on, you should try to urlencode the string first.
You can use the preg_replace() function to replace parts of a string using a regex.
To keep only letters, use the following. It also removes spaces (add \s to the character class to keep them):
preg_replace('/[^a-zA-Z]/','',$string);
To keep spaces, or any other character, add it inside the []:
preg_replace('/[^a-zA-Z\s]/','',$string);
Use this to also keep digits, whitespace, '/' and '-'. Good for making a custom URL:
$string=preg_replace("/[^A-Za-z0-9\s\/\-]/", '', $string);
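Removing a character that sat between two spaces leaves a double space behind, so a second pass to collapse whitespace is usually wanted. A minimal sketch, assuming only ASCII letters, digits and spaces should survive; the function name sanitizeName is hypothetical:

```php
<?php
// Hypothetical helper: keep ASCII letters, digits and spaces, then tidy whitespace.
function sanitizeName(string $name): string
{
    // Drop every byte that is not an ASCII letter, digit or space.
    $clean = preg_replace('/[^A-Za-z0-9 ]/', '', $name);
    // Collapse the runs of spaces left behind by removed characters.
    $clean = preg_replace('/ +/', ' ', $clean);
    return trim($clean);
}

echo sanitizeName('Jones & Jon'); // Jones Jon
```

Accented letters are removed outright rather than transliterated, which may or may not be what the matching API expects.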

Preg_replace for url and links

Right now I'm using
$content = preg_replace('#(https?://([-\w\.]+)+(:\d+)?((/[\w/_\.%\-+~]*)?(\?\S+)?)?)#', '<a href="$1">$1</a>', $content);
to replace URLs with links, but it doesn't work with some symbols such as # and many others.
I also want preg_replace to skip content that is already a link, like this:
<a href="http://www.abc.com/">http://www.abc.com/</a>
Otherwise it will wrap the URL a second time and produce the wrong result.
The text helper class from Kohana has a function for this that would probably be a good starting point: https://github.com/kohana/core/blob/3.2/master/classes/kohana/text.php#L362
Why not just look for anything starting with http:// or https:// up until any whitespace character?
https?://[^\s]+
That is obviously pretty forgiving, the only problem is that you might get some false positives.
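The forgiving pattern can be combined with two fixed-length lookbehinds so that URLs already inside an anchor are skipped. This is a rough sketch under stated assumptions: the function name linkify is hypothetical, existing links are assumed to be written exactly as href="..." followed by ">" and the URL text, and the false positives mentioned above (such as trailing punctuation) still apply.

```php
<?php
// Hypothetical helper: wrap bare URLs in anchors, leaving existing links alone.
function linkify(string $content): string
{
    // (?<!href=") skips a URL that is already an href value;
    // (?<!">) skips a URL that is already the text of an anchor.
    // The match stops at whitespace, "<" or a double quote.
    $pattern = '~(?<!href=")(?<!">)https?://[^\s<"]+~i';

    return preg_replace($pattern, '<a href="$0">$0</a>', $content);
}

echo linkify('Visit http://www.abc.com/page#top now');
```

Running the function twice on the same text gives the same result as running it once, which is the duplicate-wrapping case the question asks about.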

preg_replace image src with full url

I have seen lots of similar queries to this, but am struggling to get them to work in my application because I still don't fully understand regular expressions!
I'm using the old FCKEditor WYSIWYG to upload an image, but need to store the src as the full URL rather than the relative path.
At the time I need to do the replace, I've already replaced quotes with \" so the pattern I'm looking for needs to be:
src=\"/userfiles/
This needs to be replaced with
src=\"http://mydomain.com/userfiles/
Thanks for your suggestions!!
You can actually do this with str_replace, and it would be simpler, but here's a preg version:
$html = preg_replace('!src="/userfiles/!', 'src="http://mydomain.com/userfiles/', $html);
Here's the str_replace:
$html = str_replace('src="/userfiles/', 'src="http://mydomain.com/userfiles/', $html);
If there are spaces here and there, you'll need the preg version, and you'll want to add \s* in the places that may have spaces.
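A sketch of the \s* variant, which also accepts the backslash-escaped quote described in the question. The function name absolutizeUserfiles and its $base parameter are hypothetical; http://mydomain.com is the placeholder domain from the question.

```php
<?php
// Hypothetical helper: turn relative /userfiles/ image paths into absolute URLs.
function absolutizeUserfiles(string $html, string $base = 'http://mydomain.com'): string
{
    // \s* tolerates spaces around "="; (\\?") captures a plain or
    // backslash-escaped double quote so it can be written back unchanged.
    $pattern = '!src\s*=\s*(\\\\?")/userfiles/!';

    // ${1} keeps the group reference unambiguous if $base starts with a digit.
    return preg_replace($pattern, 'src=${1}' . $base . '/userfiles/', $html);
}

echo absolutizeUserfiles('<img src="/userfiles/a.png">');
```

Any spaces around the equals sign are dropped in the output, and a $base containing $ or a backslash would need escaping first.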

Link converting pregmatch working with markdown

function makeLinks($text) {
    $text = preg_replace('%(?<!href=")(((f|ht){1}(tp://|tps://))[-a-zA-^Z0-9#:\%_\+.~#?&//=]+)%i',
        '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('%([:space:]()[{}])(www.[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
        '\\1<a href="http://\\2">\\2</a>', $text);
    return $text;
}
It misses if I have something like this: - www.website.org (a hyphen then a space) at the beginning of a line. If I have - www.website.org - www.website.org it catches the second one.
Shouldn't that be covered by the space in the second preg_replace?
I also tried %(\s\n\r(){})
I am running it through markdown, but not until afterwards (markdown(makeLinks($foo))), so I thought that shouldn't interfere. But when I take the markdown off and everything just echoes out on one line, it does make links out of them. If I put makeLinks(markdown($foo)) it behaves the same as initially: it doesn't make links out of the ones that begin with www at the start of list items.
That's some pretty dodgy regex work there. Here is a regex I would recommend instead for URL detection:
%(?<!href="?)(((f|ht)(tp://|tps://))?[a-zA-Z0-9-].[-a-zA-^Z0-9#:\%_\+.~#?&//=]+)%i
Should be a lot more reliable than the two you have now.

Regex to change spaces in images into entities

I'm having a lot of difficulty matching an image url with spaces.
I need to make this
http://site.com/site.com/files/images/img 2 (5).jpg
into a div like this:
.replace(/(http:\/\/([^\s]+\.(jpg|png|gif)))/ig, "<div style=\"background: url($1)\"></div>")
Here's the thread about that:
regex matching image url with spaces
Now I've decided to first make the spaces into entities so that the above regex will work.
But I'm really having a lot of difficulty doing so.
Something like this:
.replace(/http:\/\/(.*)\/([^\<\>?:;]*?) ([^\<\>?:;]*)(\.(jpe?g|png|gif))/ig, "http://$1/$2%20$3$4")
Replaces one space, but all the rest are still spaces.
I need to write a regex that says, make all spaces between http:// and an image extension (png|jpg|gif) into %20.
At this point, frankly not sure if it's even possible. Any help is appreciated, thanks.
Trying Paolo's escape:
.escape(/http:\/\/(.*)\/([^\<\>?:;]*?) ([^\<\>?:;]*)(\.(jpe?g|png|gif))/)
Another way I can do this is to escape serverside in PHP, and in PHP I can directly mess with the file name without having to match it in regex.
But as far as I know something like htmlentities do not apply to spaces. Any hints in this direction would be great as well.
Try the escape function:
>>> escape("test you");
test%20you
If you want to control the replacement character but don't want to use a regular expression, a simple...
$destName = str_replace(' ', '-', $sourceName);
..would probably be the more efficient solution.
Let's say you have the string variable urlWithSpaces, which is set to a URL that contains spaces.
Simply go:
urlWithoutSpaces = escape(urlWithSpaces);
What about urlencode()? That may do what you want, although it encodes a space as +; rawurlencode() produces %20.
On the JS side you should use encodeURI(), with escape() only as a fallback. The reason is that encodeURI() encodes using UTF-8, while escape() uses ISO Latin. The same problem applies to decoding.
encodeURI = encodeURI || escape;
alert(encodeURI('image name.png'));
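For the server-side route mentioned in the question, the file name can be encoded directly in PHP without any regex. A minimal sketch, assuming the file name is everything after the last "/" and the URL has no query string; the function name encodeFileName is hypothetical:

```php
<?php
// Hypothetical helper: percent-encode only the file name (last path segment) of a URL.
function encodeFileName(string $url): string
{
    $pos = strrpos($url, '/');
    if ($pos === false) {
        // No slash at all: treat the whole string as a file name.
        return rawurlencode($url);
    }
    // Keep scheme, host and directories as they are; encode the last segment only.
    return substr($url, 0, $pos + 1) . rawurlencode(substr($url, $pos + 1));
}

echo encodeFileName('http://site.com/site.com/files/images/img 2 (5).jpg');
// http://site.com/site.com/files/images/img%202%20%285%29.jpg
```

rawurlencode() also encodes the parentheses, which avoids a stray ")" closing the CSS url(...) early when the result is used as a background image.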
