php preg_match partial matching

I'm trying to parse a string into components. My solution works for full strings, but I want to be able to account for strings with potentially fewer components. For instance, I want to be able to match G02F 1/1335, G02F 1, G02F, etc. With preg_match, if not all the capturing groups match, the entire output is invalid.
$string = 'G02F 1/1335';
$string = strtoupper(preg_replace('/\s+/', '', $string));
preg_match('%^([A-H])([0-9]{1,2})([A-Z])([0-9]{1,4})/([0-9]{1,6})$%', $string, $parsed);

As @mario suggested in a comment, make the subpatterns optional with ?:
preg_match( '%^([A-H])(\d{1,2})([A-Z])\s*(\d{1,4})?/?(\d{1,6})?$%', $string, $parsed );
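A stricter variant, as a rough sketch: nesting the optional parts means the number after the slash is only accepted when the number before it is present. The function name parseClassification and the padding with null are assumptions made for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: returns the five components, with null for missing ones,
// or null if the string does not look like a classification code at all.
function parseClassification(string $code): ?array
{
    // Same normalisation as in the question: drop whitespace, upper-case.
    $code = strtoupper(preg_replace('/\s+/', '', $code));

    // The main group and the subgroup are wrapped in nested optional
    // non-capturing groups, so "/1335" cannot appear without a main group.
    $pattern = '%^([A-H])(\d{1,2})([A-Z])(?:(\d{1,4})(?:/(\d{1,6}))?)?$%';

    if (!preg_match($pattern, $code, $m)) {
        return null;
    }

    // Trailing unmatched groups are absent from $m, so pad to five entries.
    return array_pad(array_slice($m, 1), 5, null);
}

var_dump(parseClassification('G02F 1/1335'));
var_dump(parseClassification('G02F 1'));
var_dump(parseClassification('G02F'));
```

Padding the result keeps the array shape the same for every input, which makes it easier to destructure with list() afterwards.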

Related

How to not perform preg_replace if subject starts with quote

I'm trying to convert plain links to HTML links using preg_replace. However it's replacing links that are already converted.
To combat this I'd like it to ignore the replacement if the link starts with a quote.
I think a positive lookahead may be needed but everything I've tried hasn't worked.
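$string = '<a href="http://www.example.com">test</a> http://www.example.com';
$string = preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", "<a href=\"$1\">$1</a>", $string);
var_dump($string);
The above outputs:
<a href="<a href="http://www.example.com">http://www.example.com</a>">test</a> <a href="http://www.example.com">http://www.example.com</a>
When it should output:
<a href="http://www.example.com">test</a> <a href="http://www.example.com">http://www.example.com</a>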
You might get along with lookarounds.
Lookarounds are zero-width assertions that make sure to match/not to match anything immediately around the string in question. They do not consume any characters.
That being said, a negative lookbehind might be what you need in your situation:
(?<![">])\bhttps?://\S+\b
In PHP this would be:
<?php
$string = 'I want to be transformed to a proper link: http://www.google.com ';
$string .= 'But please leave me alone ';
$string .= '(<a href="https://www.google.com">https://www.google.com</a>).';
$regex = '~ # delimiter
(?<![">]) # a neg. lookbehind
https?://\S+ # http:// or https:// followed by not a whitespace
\b # a word boundary
~x'; # verbose to enable this explanation.
$string = preg_replace($regex, "<a href='$0'>$0</a>", $string);
echo $string;
?>
See a demo on ideone.com. However, maybe a parser is more appropriate.
Since you can use arrays in preg_replace, this might be convenient depending on what you want to achieve:
<?php
$string = '<a href="http://www.example.com">test</a> http://www.example.com';
$rx = array("&(<a.+https?:\/\/[\w]+[^ \,\"\n\r\t<]*>)(.*)(<\/a\>)&si", "&(\s){1,}(https?:\/\/[\w]+[^ \,\"\n\r\t<]*)&");
$rp = array("$1$2$3", "<a href=\"$2\">$2</a>");
$string = preg_replace($rx,$rp, $string);
var_dump($string);
// DUMPS:
// '<a href="http://www.example.com">test</a><a href="http://www.example.com">http://www.example.com</a>'
The Idea
You can split your string at the already existing anchors, and only parse the pieces in between.
The Code
$input = '<a href="http://www.example.com">test</a> http://www.example.com';
// Split the string at existing anchors
// PREG_SPLIT_DELIM_CAPTURE flag includes the delimiters in the results set
// (flags are the fourth argument; a limit of -1 means "no limit")
$parts = preg_split('/(<a.*?>.*?<\/a>)/is', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
// Use array_map to parse each piece, and then join all pieces together
$output = join(array_map(function ($key, $part) {
    // Because we return the delimiter in the results set,
    // every $part with an uneven key is an anchor and is left untouched.
    return $key % 2
        ? $part
        : preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", "<a href=\"$1\">$1</a>", $part);
}, array_keys($parts), $parts));

PHP exploding url from text, possible?

I need to extract the YouTube URL from this line:
[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]
Is it possible? I need to remove [embed] and [/embed].
preg_match is what you need.
<?php
$str = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
preg_match("/\[embed\](.*)\[\/embed\]/", $str, $matches);
echo $matches[1]; //https://www.youtube.com/watch?v=L3HQMbQAWRc
$string = '[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]';
$string = str_replace(['[embed]', '[/embed]'], '', $string);
See str_replace
why not use str_replace? :) Quick & Easy
http://php.net/manual/de/function.str-replace.php
Just for good measure, you can also use positive lookbehinds and lookaheads in your regular expressions:
(?<=\[embed\])(.*)(?=\[\/embed\])
You'd use it like this:
$string = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
$pattern = '/(?<=\[embed\])(.*)(?=\[\/embed\])/';
preg_match($pattern, $string, $matches);
echo $matches[1];
Here is an explanation of the regex:
(?<=\[embed\]) is a Positive Lookbehind - it asserts that the match is immediately preceded by [embed], without including it in the match.
(.*) is a Capturing Group - . matches any character (except a newline) with the Quantifier: * which provides matches between zero and unlimited times, as many times as possible. This is what is matched between the groups prior to and after. These are the droids you're looking for.
(?=\[\/embed\]) is a Positive Lookahead - it asserts that the match is immediately followed by [/embed], without including it in the match.
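One thing to watch with the greedy (.*): if a line contains more than one [embed] pair, it will run from the first opening tag to the last closing tag. A minimal sketch, assuming there may be several embeds in one string (the second URL and the variable names are made up for illustration), uses a lazy quantifier with preg_match_all. Because the lookarounds consume nothing, the whole match in $matches[0] is already the bare URL:

```php
<?php
$text = '[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed] and '
      . '[embed]https://example.com/second[/embed]';

// .*? is lazy: it stops at the first position where [/embed] follows.
preg_match_all('/(?<=\[embed\]).*?(?=\[\/embed\])/', $text, $matches);

// $matches[0] holds the full matches, i.e. the URLs without the tags.
$urls = $matches[0];
print_r($urls);
```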

preg_replace everything but # sign

I've searched for an example of this, but can't seem to find it.
I'm looking to replace everything in a string except the #texthere
$Input = this is #cool isn't it?
$Output = #cool
I can remove the #cool using preg_replace("/#(\w+)/", "", $Input); but can't figure out how to do the opposite
You could match #\w+ and then replace the original string. Or, if you need to use preg_replace, you should be able to replace everything with the first capture group:
$output = preg_replace('/.*(#\w+).*/', '\1', $input);
Solution using preg_match (I assume this will perform better):
$matches = array();
preg_match('/#\w+/', $input, $matches);
$output = $matches[0];
Neither pattern above addresses how to handle inputs that match multiple times, such as "this is #cool and #awesome, right?"
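If inputs with several tags need to be handled, one possible approach is preg_match_all, which collects every match instead of only the first. This is a sketch; joining the tags with a single space is an assumption about the desired output, and the variable names are illustrative.

```php
<?php
$input = 'this is #cool and #awesome, right?';

// Collect every occurrence of "#" followed by word characters.
preg_match_all('/#\w+/', $input, $matches);

$tags = $matches[0];           // all full matches, in order of appearance
$output = implode(' ', $tags); // assumed output format: tags separated by spaces

echo $output;
```

If there are no tags at all, $tags is an empty array and $output is an empty string, so no extra check is needed before the implode.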

Get integer value from malformed query string

I'm looking for a way to parse a substring using PHP, and have come across preg_match; however, I can't seem to work out the rule that I need.
I am parsing a web page and need to grab a numeric value from the string, the string is like this
producturl.php?id=736375493?=tm
I need to be able to obtain this part of the string:
736375493
$matches = array();
preg_match('/id=([0-9]+)\?/', $url, $matches);
This stays safe if the format changes. slandau's answer won't work if you ever have any other numbers in the URL.
php.net/preg-match
<?php
$string = "producturl.php?id=736375493?=tm";
preg_match('~id=(\d+)~', $string, $m );
var_dump($m[1]); // $m[1] is your string
?>
$string = "producturl.php?id=736375493?=tm";
$number = preg_replace("/[^0-9]/", '', $string);
Unfortunately, you have a malformed url query string, so a regex technique is most appropriate. See what I mean.
There is no need for capture groups. Just match id=, then forget those characters with \K, then isolate the one or more digit characters that follow.
Code (Demo)
$str = 'producturl.php?id=736375493?=tm';
echo preg_match('~id=\K\d+~', $str, $out) ? $out[0] : 'no match';
Output:
736375493
For completeness, there is another way to scan the formatted string and explicitly return an int-typed value. (Demo)
var_dump(
sscanf($str, '%*[^?]?id=%d')[0]
);
The %*[^?] means: greedily match one or more non-question-mark characters, but do not capture the substring. The remainder of the format parameter matches the literal sequence ?id=, then captures the run of digits that follows. The returned value is an integer because of the %d placeholder.
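To illustrate why the query string counts as malformed, here is a small sketch comparing a regular query-string parser with the \K pattern from above. The helper name extractId is hypothetical, and cutting the string at the first ? is a simplification for this one input.

```php
<?php
$str = 'producturl.php?id=736375493?=tm';

// A regular query-string parser keeps the stray "?=tm" as part of the value.
parse_str(substr($str, strpos($str, '?') + 1), $params);
var_dump($params['id']); // the string '736375493?=tm', not a clean number

// Hypothetical helper: match the digits after "id=" and return them as an int,
// or null when no id is present.
function extractId(string $url): ?int
{
    return preg_match('~id=\K\d+~', $url, $m) ? (int) $m[0] : null;
}

$id = extractId($str);
var_dump($id);
```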

split email from string with PHP

I need to be able to split a string that contains email's From information. From the string I need to extract $NAME and $EMAIL or whatever is available.
The string can be in the following formats:
"Santa Clause" <santa@example.com>
Santa Clause <santa@example.com>
<santa@example.com>
preg_match('#(?|"(?<name>[^"]+)"|(?<name>.+))?<(?<email>.+)>#U', $string, $matches);
var_dump($matches);
preg_match('#(?|"(?<name>[^"]+)"|(?<name>.+))?<(?<email>[^>]+)>#U', $string, $matches);
var_dump($matches);
Try one of the above. The former will allow more valid emails, whereas the latter is faster.
$string_to_check = '"Santa Clause" <santa@npole.com>';
$matches = array();
preg_match('/"?([^<"]*)"?\s*<(\S*)>/',$string_to_check,$matches);
echo $matches[1]; //=> Santa Clause
echo $matches[2]; //=> santa@npole.com
If the separator is always the same character (e.g. the semicolon):
$items = explode($separator, $from);
Otherwise, browse around in the preg_XXX functions for regex-based string splitting.
For the mail address, have a look at http://php.net/manual/en/function.preg-match.php. This is a function that matches a string against a regular expression. Here's a short intro into how to use regular expressions with PHP.
If you want to match the name also, it will be some effort, so I suggest you first develop a regular expression that can extract an email address out of your string and then augment it to find the name also.
Found this and it works great!
$parts = preg_split('/[\'"<>]( *[\'"<>])*/', $text, -1, PREG_SPLIT_NO_EMPTY);
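For reference, here is a sketch that handles the three formats from the question with one anchored pattern and returns whatever is available. The function name parseFrom and the array keys are assumptions for illustration, and the pattern is deliberately loose about what counts as an email address (anything without spaces or angle brackets).

```php
<?php
// Hypothetical helper: returns ['name' => ?string, 'email' => string],
// or null when no <address> part is found.
function parseFrom(string $from): ?array
{
    // Optional quotes around an optional name, then the address in <...>.
    if (!preg_match('/^\s*"?([^"<]*?)"?\s*<([^<>\s]+)>\s*$/', $from, $m)) {
        return null;
    }

    $name = trim($m[1]);

    return [
        'name'  => $name === '' ? null : $name,
        'email' => $m[2],
    ];
}

var_dump(parseFrom('"Santa Clause" <santa@example.com>'));
var_dump(parseFrom('Santa Clause <santa@example.com>'));
var_dump(parseFrom('<santa@example.com>'));
```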
