How to replace http:// or www with <a href.. in PHP - php

I've created this regex
(www|http://)[^ ]+
that matches every http://... or www.... but I don't know how to write a preg_replace call that works. I've tried
preg_replace('/((www|http://)[^ ]+)/', '\1', $str);
but it doesn't work; the result is an empty string.

You need to escape the slashes in the regex because you are using slashes as the delimiter. You could also use another symbol as the delimiter.
// escaped
preg_replace('/((www|http:\/\/)[^ ]+)/', '\1', $str);
// another delimiter, '#'
preg_replace('#((www|http://)[^ ]+)#', '\1', $str);
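A minimal sketch of the delimiter fix applied to the goal in the title. The sample string and the anchor markup in the replacement are assumptions for illustration; the replacement strings shown in this thread only echo the match.

```php
<?php
// Sample input (an assumption for illustration).
$str = 'Visit http://example.com or www.example.org today';

// '#' is the delimiter, so the slashes in http:// need no escaping.
// The inner group is non-capturing, so \1 is the whole matched URL.
$linked = preg_replace(
    '#((?:www|http://)[^ ]+)#',
    '<a href="\1">\1</a>',
    $str
);

echo $linked, PHP_EOL;
```

Note that a bare www. match ends up in href without a scheme, so a browser would treat it as a relative link; prefixing http:// for that case is left out of this sketch.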

When using the regexes provided by the other users, be sure to add the "i" flag to enable case-insensitivity, so it works with both HTTP:// and http://. For example, using chaos's code:
preg_replace('!((?:www|http://)[^ ]+)!i', '\1', $str);
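To see what the "i" flag changes, here is a small sketch. The input string and the square-bracket replacement are assumptions chosen only to make the matches visible.

```php
<?php
// Mixed-case input (an assumption for illustration).
$str = 'Docs at HTTP://EXAMPLE.COM/a and WWW.example.org';

// Without the i flag nothing matches, so the string comes back unchanged.
$caseSensitive = preg_replace('!((?:www|http://)[^ ]+)!', '[\1]', $str);

// With the i flag both the upper-case scheme and WWW are matched.
$caseInsensitive = preg_replace('!((?:www|http://)[^ ]+)!i', '[\1]', $str);

echo $caseSensitive, PHP_EOL;
echo $caseInsensitive, PHP_EOL;
```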

First of all, you need to escape (or, even better, replace) the delimiters, as explained in the other answers.
preg_replace('~((www|http://)[^ ]+)~', '\1', $str);
Secondly, to further improve the regex, the $n replacement reference syntax is preferred over \\n, as stated in the manual.
preg_replace('~((www|http://)[^ ]+)~', '$1', $str);
Thirdly, you are needlessly using capturing parentheses, which only slows things down. Get rid of them. Don't forget to update $1 to $0. In case you are wondering, these are non-capturing parentheses: (?: ).
preg_replace('~(?:www|http://)[^ ]+~', '$0', $str);
Finally, I would replace [^ ]+ with the shorter and more accurate \S+, where \S is the opposite of \s. Note that [^ ]+ does not allow spaces, but accepts newlines and tabs! \S+ does not.
preg_replace('~(?:www|http://)\S+~', '$0', $str);
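The difference between [^ ]+ and \S+ shows up as soon as URLs are separated by a newline rather than a space. A short sketch, with an assumed two-line input and angle brackets as a stand-in replacement:

```php
<?php
// Two URLs separated by a newline (an assumption for illustration).
$str = "http://a.example\nhttp://b.example";

// [^ ]+ only stops at a space, so it runs across the newline
// and swallows both URLs as a single match.
$spaceOnly = preg_replace('~(?:www|http://)[^ ]+~', '<$0>', $str);

// \S+ stops at any whitespace, so each URL is matched on its own.
$nonWhitespace = preg_replace('~(?:www|http://)\S+~', '<$0>', $str);

echo $spaceOnly, PHP_EOL, '---', PHP_EOL, $nonWhitespace, PHP_EOL;
```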

Your main problem seems to be that you are putting everything in parentheses, so it doesn't know what "\1" is. Also, you need to escape the "/". So try this:
preg_replace('/((?:www|http:\/\/)[^ ]+)/', '\1', $str);
Edit: It actually seems the parentheses were not an issue, I misread it. The escaping was still an issue as others also pointed out. Either solution should work.

preg_replace('!((?:www|http://)[^ ]+)!', '\1', $str);
When you use / as your pattern delimiter, having / inside your pattern will not work out well. I solved this by using ! as the pattern delimiter, but you could escape your slashes with backslashes instead.
I also didn't see any reason why you were doing two paren captures, so I removed one of them.
Part of the trouble in your situation is that you're running with warnings suppressed; if you had error_reporting(E_ALL) on, you'd have seen the messages PHP is trying to generate about your delimiter problem in your regex.
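A sketch of what the original call actually returns. The sample input is an assumption, and the @ operator is used here only to keep the warning out of the output so the return value can be inspected.

```php
<?php
// Sample input (an assumption for illustration).
$str = 'see www.example.com';

// With '/' as the delimiter the pattern ends at the first unescaped '/',
// right after "http:". What follows is parsed as modifiers and rejected,
// so preg_replace() emits a warning and returns NULL.
$broken = @preg_replace('/((www|http://)[^ ]+)/', '\1', $str);

// The same pattern with the slashes escaped compiles and matches.
$fixed = preg_replace('/((www|http:\/\/)[^ ]+)/', '[\1]', $str);

var_dump($broken);
echo $fixed, PHP_EOL;
```

Echoing NULL prints nothing, which is consistent with the "empty string" result described in the question.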

If a string contains multiple URLs separated by line breaks instead of spaces, you have to use \S:
preg_replace('/((www|http:\/\/)\S+)/', '$1', $val);

Related

Explode and/or regex text to HTML link in PHP

I have a database of texts that contains this kind of syntax in the middle of English sentences that I need to turn into HTML links using PHP
"text1(text1)":http://www.example.com/mypage
Notes:
text1 is always identical to the text in parentheses
The whole string always has the quotation marks, parentheses, and colon, so the syntax is the same for each.
Sometimes there is a space at the end of the string, but other times there is a question mark, comma, or other punctuation mark.
I need to turn these into basic links, like
<a href="http://www.example.com/mypage">text1</a>
How do I do this? Do I need explode or regex or both?
"(.*?)\(\1\)":(.*\/[a-zA-Z0-9]+)(?=\?|\,|\.|$)
You can use this.
See Demo.
http://regex101.com/r/zF6xM2/2
You can use this replacement:
$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';
$replacement = '<a href="\2">\1</a>';
$result = preg_replace($pattern, $replacement, $text);
pattern details:
([^("]+) captures text1 in group 1. Using a negated character class (one that excludes the double quote and the opening parenthesis) has several advantages:
it allows a greedy quantifier, which is faster
since the class excludes the opening parenthesis and is immediately followed by a literal parenthesis in the pattern, if another part of the text has content between double quotes but no parenthesis inside, the regex engine will not go backward to test other possibilities; it skips that substring without backtracking. (This is because the PCRE regex engine automatically converts [^a]+a into [^a]++a before processing the string.)
\S+ means all that is not a whitespace one or more times
(?=[\s\pP]|\z) is a lookahead assertion that checks that the url is followed by a whitespace, a punctuation character (\pP) or the end of the string.
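The pattern described above can be exercised as follows. This is a sketch: the surrounding sentence and the anchor markup in the replacement are assumptions, with group 1 used as the link text and group 2 as the URL.

```php
<?php
// Sample sentence around the documented syntax (an assumption for illustration).
$text = 'See "text1(text1)":http://www.example.com/mypage for details.';

$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';

// Group 1 is the label, group 2 is the URL.
$replacement = '<a href="\2">\1</a>';

$result = preg_replace($pattern, $replacement, $text);

echo $result, PHP_EOL;
```

Because \S+ is greedy, a punctuation mark directly attached to the URL and followed by whitespace would still be consumed as part of group 2, so inputs like that may need an extra trimming step.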
You can use this regex:
"(.*?)\(.*?:(.*)
An appropriate Regular Expression could be:
$str = '"text1(text1)":http://www.example.com/mypage';
preg_match('#^"([^\(]+)' .
'\(([^\)]+)\)[^"]*":(.+)#', $str, $m);
print '<a href="'.$m[3].'">'.$m[2].'</a>' . PHP_EOL;

RegEx preg_match_all() and white spaces in PHP

I have a problem with regex and whitespace.
I want to split a text into an array of the parts that are marked with (....)
preg_match_all("/\([a-z0-9\s]+\)/i", $str,$a);
To catch whitespace I tried [\040] and [\s], but nothing worked for me!
Is there a possibility to say [ANY character, DIGIT, WHITESPACE and special character]?
greetz
fluxa
You can also say "any character except a )".
With your example :
preg_match_all("/\([^\)]+\)/i", $str, $a);
You could use a lazy search with .*? instead (which stops matching as soon as it can, whereas the greedy .* matches as much as possible).
Regex: \((.*?)\)
In code:
preg_match_all("#\((.*?)\)#", $str, $a);
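Both suggestions (the negated class and the lazy quantifier) can be compared side by side. A small sketch with an assumed input; a capture group is added to the negated-class version so the text inside the parentheses is available without the parentheses themselves.

```php
<?php
// Sample input with spaces, digits and punctuation inside the parentheses
// (an assumption for illustration).
$str = 'keep (alpha 1) and (beta, gamma!) here';

// "Anything except a closing parenthesis", captured in group 1.
preg_match_all('/\(([^)]+)\)/', $str, $negated);

// Lazy quantifier: stop at the first closing parenthesis.
preg_match_all('#\((.*?)\)#', $str, $lazy);

print_r($negated[1]);
print_r($lazy[1]);
```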

Replacing multiple slashes with exception in regex

There are quite a few questions on removing multiple slashes using regex in PHP. However, I have a special case I would like to exclude.
I have a full URL as my input: http://localhost/path/to/whatever
I have written a regex to convert backslashes to forward slashes, and then remove multiple consecutive slashes:
$cleaned = preg_replace('/(\\\+)|(\/+)/', "/", trim($input));
This works fine for the most part; however, I need to exclude the :// case, because otherwise the expression produces this, which is not the intended result:
http:/localhost/path/to/whatever
I have tried using /(\\\+)|^[:](\/+)/, but this doesn't seem to work.
How can I exclude the :// case in my expression?
$cleaned = preg_replace('~(?<!https:|http:)[/\\\\]+~', "/", trim($input));
The subexpression inside the lookbehind can't use quantifiers, so the obvious approach - (?<!https?:) - won't work. But it can be made up of two or more fixed-length alternatives with different lengths. For example:
(?<!https:|http:) # OK
Be aware that the alternation has to be at the top level of the lookbehind, so this won't work:
(?<!(https:|http:)) # error
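A sketch of the lookbehind version in action. The messy input URL is an assumption for illustration.

```php
<?php
// Input with doubled slashes and a backslash (an assumption for illustration).
// In a single-quoted PHP string, \\ is one literal backslash.
$input = ' http://localhost//path\\to///whatever ';

// Collapse any run of / or \ into a single /, except a run that starts
// directly after "http:" or "https:".
$cleaned = preg_replace('~(?<!https:|http:)[/\\\\]+~', '/', trim($input));

echo $cleaned, PHP_EOL;
```

The first slash after the colon is protected by the lookbehind; the second one is matched on its own and replaced by itself, which is why the :// survives.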
There is something called "negative look behind" (also available in positive or look ahead)
http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html
With this you could add an exception with something like
(?<!http:)
Then your expression will only match in places NOT preceded by "http:"
Simply a negative look-behind for a colon, preceding two or more forward or backward slashes:
$cleaned = preg_replace('/(?<!:)(?:\\/|\\\\){2,}/', "/", trim($input));

regexp solution for php

I have tried to work this out myself (even bought a Kindle book!), but I am struggling with backreferences in php.
What I want is like the following example:
$html = "hello %world|/worldlink/% again";
output:
hello world again
I tried stuff like:
preg_replace('/%([a-z]+)|([a-z]+)%/', '\1', $html);
but with no joy.
Any ideas please? I am sure someone will post the exact answer but I would like an explanation as well please - so that I don't have to keep asking these questions :)
The slashes "/" are not included in your allowed range [a-z]. Instead use
preg_replace('/%([a-z]+)\|([a-z\/]+)%/', '\1', $html);
Your expression:
'/%([a-z]+)|([a-z]+)%/'
Is only capturing one thing. The | in the middle means "OR". You're trying to capture both, so you don't need an OR in there. You want a literal | symbol so you need to escape it:
'/%([a-z]+)\|([a-z\/]+)%/'
The / character also needs to be included in your char set, and escaped as above.
Your regex (/%([a-z]+)|([a-z]+)%/) reads this way:
Match % followed by + (= one or more) a-z characters (and store this into backreference #1).
Or (the |):
Match + (= one or more) a-z characters (and store this into backreference #2) followed by a %.
What you are looking for is:
preg_replace('~%([a-z]+)[|]([a-z/]+)%~', '$1', $html);
Basically I just escaped the | regex meta character (you can do this by either surrounding it with [] like I did or just prepending a backwards slash \, personally I find the former easier to read), and added a / to the second capture group.
I also changed your delimiters from / to ~ because tildes are much more unlikely to appear in strings, if you want to keep using / as your delimiter you also have to escape their occurrences in your regex.
It's also recommended that you use the $ syntax instead of \ in your replacement backreferences:
$replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one.
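To make the alternation problem concrete, here is a sketch that runs the original pattern and the corrected one on the question's input. The final call, which uses both groups to build an anchor tag, is an assumption about the intended result rather than something stated in the answers.

```php
<?php
$html = 'hello %world|/worldlink/% again';

// Original pattern: the bare | splits it into "%word" OR "word%",
// so only "%world" is matched and the rest is left in place.
$alternation = preg_replace('/%([a-z]+)|([a-z]+)%/', '\1', $html);

// Escaped pipe and a slash added to the second class: the whole
// %label|path% token is matched and replaced by the label.
$label = preg_replace('~%([a-z]+)\|([a-z/]+)%~', '$1', $html);

// Hypothetical follow-up: use both groups to build a link.
$link = preg_replace('~%([a-z]+)\|([a-z/]+)%~', '<a href="$2">$1</a>', $html);

echo $alternation, PHP_EOL, $label, PHP_EOL, $link, PHP_EOL;
```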
Here is a version that works with the data the OP provided (using a non-slash delimiter to avoid escaping slashes):
preg_replace('#%([a-z]+)\|([a-z/]+)%#', '\1', $html);
Using a non-slash delimiter removes the need to escape slashes.
Outputs:
hello world again
The Explanation
Why yours did not work: First, the | is an OR operator and, in your example, should be escaped. Second, since you are using or expecting slashes, it is better to use a non-slash delimiter, such as #. Third, the slash needed to be added to the list of allowed characters. You may also want to allow more characters, since any word containing numbers, underscores, periods, or hyphens will fail to match. Hopefully that is the explanation you were looking for.
Here's what works for me:
preg_replace('/%([a-z]+)\|([a-z\/]+)%/', '\1', $html);
Your regular expression doesn't escape the |, and doesn't include the proper characters for the URL.
Here's a basic live example supporting only a-z and slashes:
preg_replace('/%([a-z]+)\|([a-z\/]+)%/', '\1', $html);
In reality, you're going to want to change those [a-z]+ blocks to something more expressive. Do some searches for URL-matching regular expressions, and pick one that fits what you want.
$html = "hello %world|/worldlink/% again";
echo preg_replace('/([A-Za-z_ ]*)%(.+)\|(.+)%([A-Za-z_ ]*)/', '$1$2$4', $html);
output:
hello world again
Here is working code: http://www.ideone.com/0qhZ8

Can I use preg_replace to strip the trailing \n in multiline mode?

I just answered a question here, which has left me asking my own question.
Basically, the OP wanted any line to be removed if it contained a string. Here is a regex I came up with...
$str = preg_replace('/^.*\b["(]?hello["(]?\b.*$/m', '', $str);
It works great, but because $ matches before the trailing \n, they remain when replaced and there are blank lines, e.g.
string(90) "
What it shouldhello do is:
could be words in between brackets and inverted commas also."
I can't use \z because I'm in multiline mode (at least that is what I think).
If I use the s modifier, the . becomes too greedy because it also matches across newlines.
I have tried a few things (such as [^\n] and [\s\S]), and now I am stumped.
How can I match that trailing \n here so it is removed with the replace?
Use \n instead of $.
$str = preg_replace('/^.*\b["(]?hello["(]?\b.*\n/m', '', $str);
This misses the last line if it had hello in it, so here's a step further.
$str = preg_replace('/^.*\b["(]?hello["(]?\b.*(\n|$)/m', '', $str);
The issue now is that the last line is removed, but there is still a \n character (similar issue you're already having).
Note: I'm not a regular expression expert at all, just can usually do enough for my needs.
I feel pretty silly for not figuring this one out, but I used Jacob's answer as a basis...
$str = preg_replace('/^.*\b["(]?hello["(]?\b.*\n?/m', '', $str);
Looks like I was just too keen to use the end of line anchor.
This one allows for the optional \n at the end. It also leaves a \n at the end of the last non matched line, but I could always then trim($str).
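The difference between ending the pattern with $ and with an optional \n can be seen in this sketch. The three-line input is an assumption for illustration.

```php
<?php
// Three lines, the middle one contains the word to filter on
// (an assumption for illustration).
$str = "first line\nsay hello there\nlast line";

// Ending with $: the line's text is removed but its newline stays,
// which leaves a blank line behind.
$withAnchor = preg_replace('/^.*\b["(]?hello["(]?\b.*$/m', '', $str);

// Ending with \n?: the trailing newline is consumed when there is one,
// and the pattern still matches on a final line that has none.
$withNewline = preg_replace('/^.*\b["(]?hello["(]?\b.*\n?/m', '', $str);

var_dump($withAnchor, $withNewline);
```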
Please try this:
$str = preg_replace('/^.*?\b["(]?hello["(]?\b.*?$/s', '', $str);
.*? makes it non-greedy.
