Pull out second url from string via regex - php

I am looking for a way within PHP to pull the second link http://secure.hello.com out of the following string via regex.
http://hello.com/http://secure.hello.com
The string may or may not contain https in either of the spots.

Try this regular expression:
://\S*/(\w+://\S*)
It finds the first ://, skips ahead to a slash, then captures one or more word characters (the scheme), a second ://, and everything up to the next whitespace. The text you want is in the first capturing group.
In a PHP string literal it can be written as:
'#://\S*/(\w+://\S*)#'
See it working online: ideone
If you want to restrict to http or https, change the \w+ to https?.
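A minimal sketch of how this could be used with preg_match, assuming the https? variant of the pattern. The variable names ($input, $pattern, $second) are only illustrative:

```php
<?php
// Sample input taken from the question.
$input = 'http://hello.com/http://secure.hello.com';

// https? restricts the scheme of the second URL to http or https.
$pattern = '#://\S*/(https?://\S*)#';

$second = null;
if (preg_match($pattern, $input, $matches)) {
    // $matches[1] holds the first capturing group, i.e. the second URL.
    $second = $matches[1];
}

echo $second, PHP_EOL;
```

Because \S* is greedy, the engine backtracks from the end of the string until it finds a slash that is directly followed by a scheme, which is why the second URL is the one captured.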

Related

Match string not followed by another string

I'm trying to match a string. For example, of the following strings, the first should match and the second should not.
/users/akinuri/
/users/akinuri/asd/
I've tried a negative lookahead, but it didn't work. Obviously I'm doing something wrong.
if (preg_match("/\/users\/.*?\/(?!.*\/)/", "/users/akinuri/asd/")) {
    echo "match";
}
I'm experimenting (trying to create a route system). Right now, I'm just trying to determine if the requested uri is valid. If a visitor requests the second string, it should return false and I'll return a not found page. For example, test this on SO. The second url returns a not found page.
https://stackoverflow.com/users/2202732/akinuri/
https://stackoverflow.com/users/2202732/akinuri/asdasd
So, can I do this check using regex? If so, how? What am I doing wrong?
Or should I split the text and then do further checks? This seems a bit redundant.
You don't need a lookahead for this. You can use a negated character class, [^/]+, to match one or more of any character except /. You also need anchors in the regex to make sure you match the complete input.
For PHP code, you can use this regex:
'~^/users/[^/]+/?$~'
Note that /?$ makes the trailing slash optional.
RegEx Demo
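As a rough sketch, the check could be wrapped in a small helper for the route system. The function name isUserProfileUri is hypothetical; only the pattern comes from the answer above:

```php
<?php
// Hypothetical helper: returns true only for /users/<name> with an
// optional trailing slash and no further path segments.
function isUserProfileUri(string $uri): bool
{
    // ^ and $ anchor the match to the whole input;
    // [^/]+ matches exactly one path segment.
    return preg_match('~^/users/[^/]+/?$~', $uri) === 1;
}

var_dump(isUserProfileUri('/users/akinuri/'));     // first string from the question
var_dump(isUserProfileUri('/users/akinuri/asd/')); // second string from the question
```

Using ~ as the delimiter avoids having to escape every / in the pattern.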

PHP Regex that matches URLs without invalid characters

I'm looking for a Regex that will find URLs in a string but ignore pre/following characters which are not part of the URL.
for example, from the string:
example.co.uk (main site: example.com),
The Regex will find:
example.co.uk and example.com.
In order to find URLs within a given string, I use the Regex '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i'.
The problem is that if I use this regex with the given string above, it finds example.co.uk and example.com) with the closing bracket at the end.
Is there any Regex that can find URLs in a string, no matter what characters surround them on either side?
Thanks!
You may have to use a word boundary (\b) at the end of the pattern:
(?:www\.|https?:\/\/)?[a-z0-9]+\.[a-z0-9]{2,4}\S*\b
regex demo
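Here is a small sketch that runs the word-boundary version against the string from the question. It assumes # as the delimiter and the i flag from the original pattern; the variable names are illustrative:

```php
<?php
// Sample text from the question.
$text = 'example.co.uk (main site: example.com),';

// Same pattern as above, with a trailing \b so the match must end
// on a word character instead of on ")" or ",".
$pattern = '#(?:www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*\b#i';

preg_match_all($pattern, $text, $matches);

// $matches[0] is the list of full matches.
print_r($matches[0]);
```

The greedy \S* first swallows the trailing ")," and then backtracks until the match ends at a word boundary, which drops the punctuation.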

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.
You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, each of which matches exactly that text. The two can be combined into /(?:u|r)/, which matches either one. Next you want to match everything from that point up to a space. Use a negated character class, [^ ], which matches any character that is not a space, and apply the quantifier * to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take the pattern further and add a lookbehind, (?<= ), to ensure the match is preceded by a space, so you are not matching part of a longer path. That gives (?<= )/(?:u|r)/[^ ]*. Note that this lookbehind also rejects a match at the very start of the string, since nothing precedes it. Wrap the whole thing in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the matched text for use in the replacement. You can then write the replacement with the \1 reference to the first captured group: [\1](http://reddit.com\1).
In PHP you would pass the matching pattern, the replacement pattern, and the subject string to the preg_replace function.
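Put together, the lookbehind version might look like the sketch below. The sample string is the one from the question; the # delimiter and the variable names are my own choices:

```php
<?php
// Sample string from the question.
$string = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// Capture /u/... or /r/... up to the next space, but only when
// the match is preceded by a space (the lookbehind).
$pattern = '#((?<= )/(?:u|r)/[^ ]*)#';

// \1 refers to the captured group, used for both link text and URL path.
$replacement = '[\1](http://reddit.com\1)';

$result = preg_replace($pattern, $replacement, $string);

echo $result, PHP_EOL;
```

Since [^ ]* stops only at a space, punctuation directly after a name (for example a trailing comma) would end up inside the link. A stricter character class such as [a-zA-Z0-9_-]+ avoids that.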
In my opinion, regex is overkill for such a simple operation. If you just want to replace literal instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you can use str_replace, although preg_replace will need less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

Adding any pattern to match in regex

I have a regex that works pretty well except in one particular situation;
$message = preg_replace("#(^(http(s)?://)(?!img.youtube.com/vi/)([-a-zA-Z?-??-?()0-9#:%_+.~\#?&;//=,]+(\.jpg|\.jpeg|\.gif|\.bmp|\.png)))#i",
"<p><a href='/viewpost.php?messageid=$message_id'><img src='$1' width=100%></a>", $message);
This pattern does several things: 1) it matches exactly http or https, 2) it ignores any string that includes img.youtube.com/vi/, and 3) it looks for popular image file types in the links. It works the way it should only if there are no characters before the string (a string like http://exampleaddress/exampleimage.jpeg). If the string is in the middle of a paragraph, it fails.
I need to keep the ^(http(s)?://) as an exact match (removing ^ fixes my problem but causes a conflict with a subsequent regex rule). So it looks like the problem is that this exact match does not allow any carriage returns, spaces, or anything else to precede ^(http(s)?://). How can I make the regex work so that nothing before or after the string is relevant, but the rule is applied whenever exactly http or https appears?
As you know, the ^ anchor requires the match to begin exactly at the start of the input string. You can get a similar restriction anywhere inside the input with a \b word boundary. It is a zero-length assertion that matches at the start or end of a word, for example right after whitespace, without consuming the whitespace itself.
I'll note also that you do not need to surround the s in a () group, since ? applies only to the single preceding character.
\bhttps?://...
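To illustrate, here is a simplified sketch. It is not your full pattern: I have assumed \S+? in place of your long character class, and the helper name findImageUrls is hypothetical. It only shows \b letting the match start in the middle of a paragraph:

```php
<?php
// Hypothetical helper: returns image URLs found anywhere in $message,
// skipping anything under img.youtube.com/vi/.
function findImageUrls(string $message): array
{
    // \b replaces ^ so the URL may appear mid-text.
    // \S+? is a simplified stand-in for the original character class.
    $pattern = '#\bhttps?://(?!img\.youtube\.com/vi/)\S+?\.(?:jpe?g|gif|bmp|png)\b#i';

    preg_match_all($pattern, $message, $matches);

    return $matches[0];
}

$message = 'Look: http://exampleaddress/exampleimage.jpeg and '
         . 'http://img.youtube.com/vi/abc/0.jpg end';

print_r(findImageUrls($message));
```

The same \bhttps?:// prefix should drop into your preg_replace call unchanged; only the start of the pattern needs to differ from the ^ version.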

How to exclude a word or string from an URL - Regex

I'm using the following Regex to match all types of URL in PHP (It works very well):
$reg_exUrl = "%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
But now I want to exclude YouTube, youtu.be and Vimeo URLs.
I'm doing something like this after researching, but it is not working:
$reg_exUrl = "%\b(([\w-]+://?|www[.])(?!youtube|youtu|vimeo)[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
I want to do this because I have another regex that matches YouTube URLs and returns an iframe, and this regex is causing the two to conflict.
Any help would be gratefully appreciated, thanks.
socodLib, to exclude something from a string, place yourself at the beginning of the string by anchoring with a ^ (or use another anchor) and use a negative lookahead to assert that the string doesn't contain a word, like so:
^(?!.*?(?:youtube|some other bad word|some\.string\.with\.dots))
Before we make the regex look too complex by concatenating it with yours, let's see what we would do in a simpler case. If you wanted to match some word characters \w+ but not youtube or google, you would write:
^(?!.*?(?:youtube|google))\w+
As you can see, after the assertion (where we say what we don't want), we say what we do want by using the \w+.
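A quick sketch of that simple case in PHP, filtering a list of made-up words (the $candidates values and the ~ delimiter are my own choices for illustration):

```php
<?php
// Anchored negative lookahead: reject the subject if "youtube" or
// "google" appears anywhere in it, otherwise match word characters.
$pattern = '~^(?!.*?(?:youtube|google))\w+~';

// Made-up test words.
$candidates = ['vimeo_clip', 'my_youtube_channel', 'googleplex', 'example'];

$accepted = array_values(array_filter(
    $candidates,
    static function (string $word) use ($pattern): bool {
        return preg_match($pattern, $word) === 1;
    }
));

print_r($accepted);
```

Only the words that contain neither forbidden substring survive the filter.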
In your case, let's add a negative lookahead to your initial regex (which I have not tuned):
$reg_exUrl = "%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
I took the liberty of making the regex case insensitive with (?i). You could also have added i to your s modifier at the end. The youtu\.?be expression allows for an optional dot.
I am certain you can apply this recipe to your expression and other regexes in the future.
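For what it's worth, here is a sketch of the combined regex applied to one URL at a time. The helper name matchNonVideoUrl and the sample URLs are made up for illustration; the pattern is the one above, written in a single-quoted string:

```php
<?php
// Hypothetical helper: returns the matched URL, or null when the subject
// contains youtube / youtu.be / vimeo or holds no URL at all.
function matchNonVideoUrl(string $subject): ?string
{
    $pattern = '%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';

    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(matchNonVideoUrl('http://example.com/page'));
var_dump(matchNonVideoUrl('https://www.youtube.com/watch?v=abc'));
var_dump(matchNonVideoUrl('https://vimeo.com/12345'));
```

One thing to keep in mind: the lookahead scans the rest of the subject from the current position, so in a longer text an ordinary URL that appears before a YouTube or Vimeo link would be rejected too. Testing each candidate URL separately, as in this sketch, sidesteps that.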
