Regex trouble: capture until the first space or end of line (PHP)

I'm trying to capture the following match:
"url: https://www.anysite/anything"
But sometimes the string comes as:
"url: https://www.anysite/anything another word"
I only want to match
"url: https://www.anysite/anything"
whether or not "another word" is present.
So my logic is: capture until the first space after the URL, or until the end of the string.
My regex in PHP is:
preg_match("/(Url|url)(\:|\b)(\s\b|\b).+(\s|$)/",$linestring,$url_string);
But it always brings back "another word" too, instead of stopping at the space.

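The .+ is greedy unless the quantifier is made ungreedy with a ? (or the U modifier is used), so it runs all the way to the end of the string and (\s|$) then simply matches at $. With a lazy quantifier: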
(Url|url)(\:|\b)(\s\b|\b).+?(\s|$)
You can actually simplify it a bit further:
[Uu]rl(?::|\b)\s?\b.+?(?:\s|$)
If you want to capture just the URL part, wrap the .+? in parentheses:
[Uu]rl(?::|\b)\s?\b(.+?)(?:\s|$)
https://regex101.com/r/urq2fM/2/
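A minimal PHP sketch of how the last pattern might be used with preg_match to pull out only the URL; the helper name extractUrl is hypothetical, and it returns null when the line has no url/Url prefix at all.

```php
<?php
// Hypothetical helper name; the pattern is the capturing one from the answer above.
function extractUrl(string $line): ?string
{
    // (.+?) is lazy, so group 1 stops at the first whitespace or at the end of the string.
    if (preg_match('/[Uu]rl(?::|\b)\s?\b(.+?)(?:\s|$)/', $line, $m)) {
        return $m[1];
    }
    return null; // no "url"/"Url" prefix found
}

$lines = [
    "url: https://www.anysite/anything",
    "url: https://www.anysite/anything another word",
];
foreach ($lines as $line) {
    echo extractUrl($line), "\n"; // https://www.anysite/anything (both times)
}
```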

One way to capture until the first space is to use \S+, which matches any sequence of one or more non-space characters:
url:?\s*(\S+)
By using the i flag we can avoid having to test for Url or url or URL etc. We can use preg_replace to simplify usage, replacing the string with just the captured group:
$url = preg_replace('/url:?\s*(\S+).*/i', '$1', $string);
e.g.
$strings = array("url: https://www.anysite/anything",
"url: https://www.anysite/anything another word");
foreach ($strings as $string) {
$url = preg_replace('/url:?\s*(\S+).*/i', '$1', $string);
echo "$url\n";
}
Output:
https://www.anysite/anything
https://www.anysite/anything
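One caveat with the preg_replace version: when a string contains no "url" at all, preg_replace returns it unchanged, so "no URL found" looks the same as a successful extraction. A sketch using preg_match with the same pattern instead, so the no-match case can be detected (the function name findUrl is made up for illustration):

```php
<?php
// Hypothetical helper: case-insensitive "url", optional colon, optional spaces,
// then the first run of non-whitespace characters. Returns null when nothing matches.
function findUrl(string $string): ?string
{
    return preg_match('/url:?\s*(\S+)/i', $string, $m) ? $m[1] : null;
}

var_dump(findUrl("URL: https://www.anysite/anything another word"));
var_dump(findUrl("no address here"));
```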

Related

I used preg_replace to replace . with (dot), but the result is empty

I tried to use preg_replace, but it does not work.
I wrote the code below, but it returns an empty result.
$str = 'amin.m is 1.2 ^.j ';
echo $str.'<br>';
echo preg_replace('/(\D|\d)\.(\D|\d)', '\1\(dot\)\2', $str);
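A couple of notes...
The empty output comes from the pattern itself: '/(\D|\d)\.(\D|\d)' opens with the / delimiter but never closes it, so preg_replace() emits a "No ending delimiter" warning and returns null, which echo prints as an empty string.
Your pattern uses a pipe | between \D and \d. These two classes are exact opposites, so (\D|\d) matches any character at all, including spaces; I think what you want is \S, which matches any non-whitespace character. Removing the alternation also reduces the pattern's step count (improves efficiency).
You don't need to escape the parentheses in your replacement string, unless you want to see those backslashes in the output.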
New pattern: /(\S)\.(\S)/
New PHP:
$str = 'amin.m is 1.2 ^.j ';
echo $str.'<br>';
echo preg_replace('/(\S)\.(\S)/','\1(dot)\2',$str);
Output:
amin.m is 1.2 ^.j
amin(dot)m is 1(dot)2 ^(dot)j
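A possible refinement, not part of the original answer: because (\S)\.(\S) consumes the characters on both sides of the dot, back-to-back dots such as a.b.c only get the first one replaced. Zero-width lookarounds check the neighbours without consuming them. In this sketch " a.b.c" is appended to the question's string as an extra test case.

```php
<?php
$str = 'amin.m is 1.2 ^.j a.b.c';

// Capturing version: the "b" is consumed by the first match, so the second dot is skipped.
echo preg_replace('/(\S)\.(\S)/', '\1(dot)\2', $str), "\n";
// amin(dot)m is 1(dot)2 ^(dot)j a(dot)b.c

// Lookaround version: (?<=\S) and (?=\S) only assert a non-space neighbour.
echo preg_replace('/(?<=\S)\.(?=\S)/', '(dot)', $str), "\n";
// amin(dot)m is 1(dot)2 ^(dot)j a(dot)b(dot)c
```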

PHP regex: find the first occurrence of a given pattern in a string

My string can be
new-york-10036
or
chicago-55036
the desired result is
new-york
chicago
and I basically want to remove everything that comes after the first dash (-) followed by a number.
It seems easy, but I don't know how.
You can use a positive lookahead, like so:
(.+)(?=\-\d)
The regex reads: "get me everything that is followed by a dash and then a digit". The lookahead (?=\-\d) only checks that -\d comes next; it is not part of the match.
Given the input new-york-10036, the regex captures only new-york. In PHP you can get the matched string with:
$string = 'new-york-10036';
$regex = '/(.+)(?=\-\d)/';
preg_match($regex, $string, $return);
echo $return[0] . "\n";
It outputs new-york.
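Because (.+) is greedy, the match actually ends at the last dash-digit pair; for the inputs in the question there is only one, so the result is the same. If a string could contain several, a lazy (.+?) stops at the first one, which is what the title asks for. A sketch, where the helper name cityPart and the extra test string a-b-1-c-2 are illustrative only:

```php
<?php
// Hypothetical helper: keep everything before the FIRST "-<digit>",
// or return the string unchanged if there is no such pair.
function cityPart(string $s): string
{
    // (.+?) is lazy, so it stops at the earliest position followed by "-" and a digit.
    return preg_match('/^(.+?)(?=-\d)/', $s, $m) ? $m[1] : $s;
}

echo cityPart('new-york-10036'), "\n"; // new-york
echo cityPart('chicago-55036'), "\n";  // chicago
echo cityPart('a-b-1-c-2'), "\n";      // a-b (the greedy (.+) would give a-b-1-c)
```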

Is it possible to combine separate pattern groups into one group in regex?

For example:
"I am living in Germany." - (I\sam)\sliving\s(in\sGermany)
This gives
1 - I am
2 - in Germany
Is it possible to get the string "I am in Germany" from preg_match?
You cannot match non-continuous text within one match operation.
Instead, you can use preg_replace with capturing groups around those subpatterns that you want to keep, and restore them in the replacement pattern with backreferences.
So, use
Regex: '~(I\sam\s)living\s(in\sGermany)~'
Replacement: '\1\2'
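A short sketch of the regex and replacement above in PHP, plus the preg_match route for comparison: preg_match can only hand back the two groups separately, so you concatenate them yourself.

```php
<?php
$str = "I am living in Germany.";

// preg_replace: keep groups 1 and 2, drop "living " between them.
// The trailing "." is outside the match, so it survives.
$joined = preg_replace('~(I\sam\s)living\s(in\sGermany)~', '\1\2', $str);
echo $joined, "\n"; // I am in Germany.

// preg_match: the two groups come back separately; join them yourself.
if (preg_match('~(I\sam\s)living\s(in\sGermany)~', $str, $m)) {
    echo $m[1] . $m[2], "\n"; // I am in Germany
}
```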
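You can also get the output without a regex:
$mystr = "I am living in Germany.";
// explode() splits on "living ", implode() glues the pieces back together,
// and str_replace() drops the period: prints "I am in Germany".
$res = str_replace(".", "", implode("", explode("living ", $mystr)));
print $res;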

PHP preg_replace - in case of match remove the beginning and end of the string partly matched by regex with one call?

In PHP, I am trying to achieve the following (if possible, using only the preg_replace function):
Examples:
$example1 = "\\\\\\\\\\GLS\\\\\\\\\\lorem ipsum dolor: T12////GLS////";
$example2 = "\\\\\\GLS\\\\\\hakunamatata ::: T11////GLS//";
$result = preg_replace("/(\\)*GLS(\\)*(.)*(\/)*GLS(\/)*/", "REPLACEMENT", $example1);
// current $result: REPLACEMENT (that means the regex works, but how to replace this?)
// desired $result
// for $example1: lorem ipsum dolor: T12
// for $example2: hakunamatata ::: T11
I have consulted http://php.net/manual/en/function.preg-replace.php, of course, but my experiments with the replacement have not been successful yet.
Is this possible with a single preg_replace, or do I have to split the regular expression and replace the front match and the back match separately?
If the regex does not match at all, I would like to get an error, but I can cover that with preg_match first.
The main point is to match and capture what you need with a capturing group and then replace with the back-reference to that group. In your regex, you applied a quantifier to the group ((.)*) and thus you lost access to the whole substring, only the last character is saved in that group.
Note that (.)* matches the same string as (.*), but in the former case the capture group holds only one character: the regex engine grabs a character and saves it in the group, then grabs the next one and overwrites the previous one, and so on. With the (.*) expression, all the characters are grabbed together and saved into the group as one whole substring.
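A tiny sketch of that difference: both patterns match the whole string, but only the second one keeps it in group 1.

```php
<?php
// A quantified group keeps only its last iteration...
preg_match('/(.)*/', 'abc', $m1);
// ...while a quantifier inside the group captures the whole run.
preg_match('/(.*)/', 'abc', $m2);

var_dump($m1[0], $m1[1]); // string(3) "abc", string(1) "c"
var_dump($m2[1]);         // string(3) "abc"
```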
Here is a possible way:
$re = "/\\\\*GLS\\\\*([^\\/]+)\\/+GLS\\/+/";
// Or to use fewer escapes, use other delimiters
// $re = "~\\\\*GLS\\\\*([^/]+)/+GLS/+~";
$str = "\\\\\\GLS\\\\\\hakunamatata ::: T11////GLS//";
$result = preg_replace($re, "$1", $str);
echo $result;
Result of the IDEONE demo: hakunamatata ::: T11.
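Since the question also wants an error when nothing matches, and preg_replace returns its input unchanged when there is no match, here is a sketch that extracts the middle part with preg_match in one call and returns null otherwise. The helper name stripGls is hypothetical, and the ^/$ anchors are an assumption that the markers always sit at the very start and end of the string.

```php
<?php
// Hypothetical helper: returns the text between the \GLS\ ... /GLS/ markers,
// or null when the string does not have that shape.
function stripGls(string $s): ?string
{
    // '~' delimiter avoids escaping "/"; '\\\\' in a single-quoted PHP string
    // becomes \\ in the pattern, i.e. one literal backslash.
    if (!preg_match('~^\\\\*GLS\\\\*([^/]+)/+GLS/+$~', $s, $m)) {
        return null;
    }
    return $m[1];
}

$example1 = "\\\\\\\\\\GLS\\\\\\\\\\lorem ipsum dolor: T12////GLS////";
$example2 = "\\\\\\GLS\\\\\\hakunamatata ::: T11////GLS//";
var_dump(stripGls($example1), stripGls($example2), stripGls("no markers"));
```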

Regular Expression - php - getting spaces not preceded and not followed by a word

Given something like this:
'This or is or some or information or stuff or attention here or testing'
I want to capture all the spaces that are neither preceded nor followed by the word or.
I came up with this, and I think I'm on the right track:
/\s(?<!(\bor\b))\s(?!(\bor\b))/
or this:
/(?=\s(?<!(\bor\b))(?=\s(?!(\bor\b))))/
I'm not getting all the spaces, though. What is wrong with this? (The second one was an attempt to get the "and" logic working.)
Try this:
<?php
$str = 'This or is or some or information or stuff or attention is not here or testing';
$matches = null;
preg_match_all('/(?<!\bor\b)[\s]+(?!\bor\b)/', $str, $matches);
var_dump($matches);
?>
How about (?<!or)\s(?!or):
$str='This or is or some or information or stuff or attention here or testing';
echo preg_replace('/(?<!or)\s(?!or)/','+',$str);
>>> This or is or some or information or stuff or attention+here or testing
This uses negative lookbehind and lookahead. Note that it also skips the space in "Tor operator", because "Tor" ends in "or". To treat only the standalone word or specially, include the surrounding space in the lookarounds:
$str='Tor operator';
echo preg_replace('/(?<!\sor)\s(?!or\s)/','+',$str);
>>> Tor+operator
Code:
$string = "You may organize to find or seek a neighbor or a pastor in a harbor or orchard.";
echo preg_replace('~(?<!\bor) (?!or\b)~', '_', $string);
Output:
You_may_organize_to_find or seek_a_neighbor or a_pastor_in_a_harbor or orchard.
Effectively the pattern says:
Match every space IF:
the space is not preceded by the full word "or" (a word that ends in "or" doesn't count), and
the space is not followed by the full word "or" (a word that begins with "or" doesn't count)
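Since the question asks to capture the spaces rather than replace them, the same pattern can be passed to preg_match_all with the PREG_OFFSET_CAPTURE flag to get each matching space's position. A sketch on the question's sample string:

```php
<?php
$string = 'This or is or some or information or stuff or attention here or testing';

// PREG_OFFSET_CAPTURE makes each match a [matchedText, byteOffset] pair,
// so every qualifying space can be located in the original string.
preg_match_all('~(?<!\bor) (?!or\b)~', $string, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1);

// Only the space between "attention" and "here" qualifies in this sample.
print_r($offsets);
```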
