How to do a Regular Expression exclude comma or periods - php

I'm having issues with this line of PHP. I need it to select all letters and numbers following the #, but ignore "," or "." (commas or periods). Currently it includes them, and I can't seem to exclude them.
Ex: #3431A or #4561AB (but ignore any , or . behind them)
preg_match_all( apply_filters( "wpht_regex_pattern", '/#(\S+)/u' ), strip_tags($content), $hashtags );

You can try "/#[0-9A-Za-z]+/", if you want to select hashtags only having letters and digits.
You may try "/#[^\s,\.]+/", if you want to grab hashtags starting with # and ending just before a whitespace character (including a tab), a comma, or a period.
Below is sample PHP code and result:
$content="I need to have it select all letters and numbers following the #, but ignore ',' or '.' (commas or periods). Ex: #3431A D, #3431AB or #4561AB.";
echo "<h2>Regex-1:</h2>";
preg_match_all( "/#[0-9A-Za-z]+/", $content, $hashtags );
print_r($hashtags);
echo "<h2>Regex-2:</h2>";
preg_match_all( "/#[^\s,\.]+/", $content, $hashtags );
print_r($hashtags);
Result:
Regex-1:
Array ( [0] => Array ( [0] => #3431A [1] => #3431AB [2] => #4561AB ) )
Regex-2:
Array ( [0] => Array ( [0] => #3431A [1] => #3431AB [2] => #4561AB ) )

You are matching \S+, which is 1 or more of any non-whitespace character. In your question, you said you wanted sequences of numbers and letters. To get only letters and numbers, you need a different pattern.
function testFilter($test) {
$content = $test['test'];
echo "Testing {$content}\n";
preg_match_all( apply_filters( "wpht_regex_pattern", '/#([A-Za-z0-9]+)/u' ), strip_tags($content), $hashtags );
$expect = $test['expect'];
echo " ";
if ( ! empty($expect) ) {
$tmp = implode(',', $hashtags[1]);
if ( $tmp != $expect ) echo "FAIL ";
else echo "PASS ";
}
else {
echo " ";
}
echo 'Hashtags: '. implode(',', $hashtags[1]);
echo PHP_EOL;
}
$contentTest = [
['test' => '#shoes, #friends, #beach', 'expect' => 'shoes,friends,beach'],
['test' => '#shoes, #friends6, #2beach', 'expect' => 'shoes,friends6,2beach'],
['test' => '#shoes, #frie_nds, #be^ach', 'expect' => 'shoes,frie,be'],
['test' => 'blah blah #shoes, #friends, #beach', 'expect' => 'shoes,friends,beach'],
['test' => '#shoes, #friends, #beach,', 'expect' => 'shoes,friends,beach'],
['test' => '#shoes, #friends, #beach,#', 'expect' => 'shoes,friends,beach'],
['test' => '#shoes, #friends, #beach som trailing text', 'expect' => 'shoes,friends,beach'],
['test' => '#3431A, #345ADF', 'expect' => '3431A,345ADF'],
['test' => 'The quick brown #fox gave the #99dogs codes #A00BZ90A #45678blah #0569509 #09XX09', 'expect' => 'fox,99dogs,A00BZ90A,45678blah,0569509,09XX09'],
];
foreach ($contentTest as $t) {
testFilter($t);
}
Output:
Testing #shoes, #friends, #beach
PASS Hashtags: shoes,friends,beach
Testing #shoes, #friends6, #2beach
PASS Hashtags: shoes,friends6,2beach
Testing #shoes, #frie_nds, #be^ach
PASS Hashtags: shoes,frie,be
Testing blah blah #shoes, #friends, #beach
PASS Hashtags: shoes,friends,beach
Testing #shoes, #friends, #beach,
PASS Hashtags: shoes,friends,beach
Testing #shoes, #friends, #beach,#
PASS Hashtags: shoes,friends,beach
Testing #shoes, #friends, #beach som trailing text
PASS Hashtags: shoes,friends,beach
Testing #3431A, #345ADF
PASS Hashtags: 3431A,345ADF
Testing The quick brown #fox gave the #99dogs codes #A00BZ90A #45678blah #0569509 #09XX09
PASS Hashtags: fox,99dogs,A00BZ90A,45678blah,0569509,09XX09
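As a side-by-side illustration of the point above, here is a minimal sketch that runs the original \S+ pattern and the letters-and-digits pattern against the same made-up sentence. The sample text and the variable names $greedy and $strict are assumptions for illustration only.

```php
<?php
// Hypothetical sample text with punctuation directly after each hashtag.
$text = 'Codes: #3431A, #4561AB.';

// \S+ keeps consuming until whitespace, so trailing punctuation is captured.
preg_match_all('/#(\S+)/u', $text, $greedy);

// [A-Za-z0-9]+ stops at the first character that is not a letter or digit.
preg_match_all('/#([A-Za-z0-9]+)/u', $text, $strict);

print_r($greedy[1]); // includes the trailing "," and "."
print_r($strict[1]); // letters and digits only
```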

Related

PHP regular expression, match the last occurence

I have a php function that splits product names from their color name in woocommerce.
The full string is generally of this form "product name - product color", like for example:
"Boxer Welbar - ligth grey" splits into "Boxer Welbar" and "light grey"
"Longjohn Gari - marine stripe" splits into "Longjohn Gari" and "marine stripe"
But in some cases it can be "Tee-shirt - product color"...and in this case the split doesn't work as I want, because the "-" in Tee-shirt is detected.
How to circumvent this problem? Should I use a "lookahead" statement in the regexp?
function product_name_split($prod_name) {
$currenttitle = strip_tags($prod_name);
$splitted = preg_split("/–|[\p{Pd}\xAD]|(–)/", $currenttitle);
return $splitted;
}
I'd go for a negative lookahead.
Something like this:
-(?!.*-)
that means to search for a - not followed by any other -
This works if in the color name there will never be a -
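A minimal sketch of how that lookahead might be used with preg_split(), assuming the color name never contains a hyphen. The function name split_on_last_hyphen is made up for this example, and trim() is added because the pattern itself does not consume the surrounding spaces.

```php
<?php
// Split on the last hyphen only: a "-" that is not followed by any other "-".
function split_on_last_hyphen(string $name): array
{
    $parts = preg_split('/-(?!.*-)/', $name);
    // The spaces around the delimiter remain in the pieces, so trim them.
    return array_map('trim', $parts);
}

print_r(split_on_last_hyphen('Tee-shirt - product color'));
print_r(split_on_last_hyphen('Boxer Welbar - ligth grey'));
```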
What about counting space characters that surround a dash?
For example:
function product_name_split($prod_name) {
$currenttitle = strip_tags($prod_name);
$splitted = preg_split("/\s(–|[\p{Pd}\xAD]|(–))\s/", $currenttitle);
return $splitted;
}
This automatically trims spaces from split parts as well.
If you have " - " as delimiter (note the spaces around the dash), you may simply use explode(...). If not, use
\s*-(?=[^-]+$)\s*
or
\w+-\w+(*SKIP)(*FAIL)|-
with preg_split(), see the demos on regex101.com (#2)
In PHP this could be:
<?php
$strings = ["Tee-shirt - product color", "Boxer Welbar - ligth grey", "Longjohn Gari - marine stripe"];
foreach ($strings as $string) {
print_r(explode(" - ", $string));
}
foreach ($strings as $string) {
print_r(preg_split("~\s*-(?=[^-]+$)\s*~", $string));
}
?>
Both approaches will yield
Array
(
[0] => Tee-shirt
[1] => product color
)
Array
(
[0] => Boxer Welbar
[1] => ligth grey
)
Array
(
[0] => Longjohn Gari
[1] => marine stripe
)
To collect the split items, use array_map(...):
$splitted = array_map( function($item) {return preg_split("~\s*-(?=[^-]+$)\s*~", $item); }, $strings);
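The (*SKIP)(*FAIL) variant is not exercised in the PHP above, so here is a rough sketch of it. It assumes a hyphen flanked by word characters on both sides belongs to the product name. The function name split_skip_hyphenated is hypothetical, and \s* has been added around the delimiter hyphen so the pieces come back trimmed.

```php
<?php
// Words like "Tee-shirt" are matched first and then discarded via (*SKIP)(*FAIL),
// so only a hyphen that is not inside such a word can act as the delimiter.
function split_skip_hyphenated(string $name): array
{
    return preg_split('~\w+-\w+(*SKIP)(*FAIL)|\s*-\s*~', $name);
}

$strings = ["Tee-shirt - product color", "Boxer Welbar - ligth grey", "Longjohn Gari - marine stripe"];
foreach ($strings as $string) {
    print_r(split_skip_hyphenated($string));
}
```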
Your sample inputs convey that the neighboring whitespace around the delimiting hyphen/dash is just as critical as the hyphen/dash itself.
I recommend doing all of the html and special entity decoding before executing your regex -- that's what these other functions are built for and it will make your regex pattern much simpler to read and maintain.
\p{Pd} will match any hyphen/dash. Reinforce the business logic in the code by declaring a maximum of 2 elements to be generated by the split.
As a general rule, I discourage declaring single-use variables.
Code: (Demo)
function product_name_split($prod_name) {
return preg_split(
"/ \p{Pd} /u",
strip_tags(
html_entity_decode(
$prod_name
)
),
2
);
}
$tests = [
'Tee-shirt - product color',
'Boxer Welbar - ligth grey',
'Longjohn Gari - marine stripe',
'En dash – green',
'Entity &ndash; blue',
];
foreach ($tests as $test) {
echo var_export(product_name_split($test), true) . "\n";
}
Output:
array (
0 => 'Tee-shirt',
1 => 'product color',
)
array (
0 => 'Boxer Welbar',
1 => 'ligth grey',
)
array (
0 => 'Longjohn Gari',
1 => 'marine stripe',
)
array (
0 => 'En dash',
1 => 'green',
)
array (
0 => 'Entity',
1 => 'blue',
)
As usual, there are several options for this; here is one of them:
explode — Split a string by a string
end — Set the internal pointer of an array to its last element
$currenttitle = 'Tee-shirt - product color';
$array = explode( '-', $currenttitle );
echo end( $array );
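Note that explode() on a bare '-' leaves a leading space on the color and only end() is printed. If both halves are needed, one possible sketch without regex is to cut at the last hyphen with strrpos(). The function name split_at_last_dash is an assumption for illustration, and it only handles the ASCII hyphen.

```php
<?php
// Cut the title at the position of the last "-" and trim both halves.
function split_at_last_dash(string $title): array
{
    $pos = strrpos($title, '-');
    if ($pos === false) {
        return [$title]; // no hyphen at all: nothing to split
    }
    return [
        trim(substr($title, 0, $pos)),
        trim(substr($title, $pos + 1)),
    ];
}

print_r(split_at_last_dash('Tee-shirt - product color'));
```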

Capture multiple repetitive group in regex

I'm using /{(\w+)\s+((\w+="\w+")\s*)+/ pattern to capture all attributes.
The problem is that it matches the input but can't group attribute one by one and just groups the last attribute.
[person name="Jackson" family="Smith"]
or
[car brand="Benz" type="SUV"]
The \G (continue) metacharacter is the hero to call upon here.
Code: (PHP Demo) (Regex101 Demo)
$tag = '[person name="Jackson" family="Smith"]';
var_export(preg_match_all('~(?:\G|\[\w+) (\w+)="(\w+)"~', $tag, $out) ? array_combine($out[1], $out[2]) : []);
Output:
array (
'name' => 'Jackson',
'family' => 'Smith',
)
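To see what \G contributes in isolation, here is a minimal sketch on a made-up comma-separated string (the input is illustrative, not from the question). \G only matches where the previous match ended, so the scan stops at the first item that breaks the chain.

```php
<?php
// \G anchors each match to the end of the previous one (or to the start of the subject).
$input = '1,2,3,x,4';
preg_match_all('~\G(\d+),?~', $input, $out);

// The chain breaks at "x", so the trailing "4" is never reached.
print_r($out[1]);
```

In the attribute pattern above, this is what keeps each name="value" pair tied to the tag that precedes it.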
If you need to pool the attributes and values with the tag name, only one loop is necessary for this too.
Code: (Demo)
$text = 'some text [person name="Jackson" family="Smith"] text [vehicle brand="Benz" type="SUV" doors="4" seats="7"]';
foreach (preg_match_all('~(?:\G(?!^)|\[(\w+)) (\w+)="(\w+)"~', $text, $out, PREG_SET_ORDER) ? $out : [] as $matches) {
if ($matches[1]) {
$tag = $matches[1]; // cache the tag name for reuse with subsequent attr/val pairs
}
$result[$tag][$matches[2]] = $matches[3];
}
var_export($result);
Output:
array (
'person' =>
array (
'name' => 'Jackson',
'family' => 'Smith',
),
'vehicle' =>
array (
'brand' => 'Benz',
'type' => 'SUV',
'doors' => '4',
'seats' => '7',
),
)
Due to the concerns of @Thefourthbird and @Jan, I have included a lookahead to match the closing square brace. I have also built in accommodation for the possibility of zero attributes in the tag. If given more time (sorry, don't have more), I could probably refine the following snippet to be slightly cleaner, but I believe I am accurately validating and extracting.
Code: (Demo)
$text = 'some text [person name="Jackson" family="Smith"] text [vehicle brand="Benz" type="SUV" doors="4" seats="7"] and [invalid closed="false" monkeywrench [lonetag] text [single gender="female"]';
foreach (preg_match_all('~\[(\w+)(?=(?: \w+="\w+")*])(]?)|(?:\G(?!^) (\w+)="(\w+)")~', $text, $out, PREG_SET_ORDER) ? $out : [] as $matches) {
if ($matches[2]) {
$result[$matches[1]] = [];
} elseif (!isset($matches[3])) {
$tag = $matches[1];
} else {
$result[$tag][$matches[3]] = $matches[4];
}
}
var_export($result);
Output:
array (
'person' =>
array (
'name' => 'Jackson',
'family' => 'Smith',
),
'vehicle' =>
array (
'brand' => 'Benz',
'type' => 'SUV',
'doors' => '4',
'seats' => '7',
),
'lonetag' =>
array (
),
'single' =>
array (
'gender' => 'female',
),
)
You can try \[\S+ ((?:[^"]+"){2}) ((?:[^"]+"){2})\]
Explanation:
\[ - match [ literally
\S+ - match one or more non-whitespace characters
(?:...) - non-capturing group
[^"]+" - match one or more characters other than ", followed by a "; {2} repeats this pattern two times
\] - match ] literally
The first capturing group will hold your first attribute, and the second will hold the second attribute.
Demo
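A short sketch of that pattern applied with preg_match() to one of the sample tags from the question. Note that, as written, it only fits tags with exactly two attributes.

```php
<?php
$tag = '[person name="Jackson" family="Smith"]';
$pattern = '~\[\S+ ((?:[^"]+"){2}) ((?:[^"]+"){2})\]~';

if (preg_match($pattern, $tag, $m)) {
    echo $m[1], PHP_EOL; // first attribute, including its quotes
    echo $m[2], PHP_EOL; // second attribute, including its quotes
}
```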
Better use two expressions (or a parser altogether) instead. Consider the following:
<?php
$junk = <<<END
lorem ipsum lorem ipsum
[person name="Jackson" family="Smith"]
lorem ipsum
[car brand="Benz" type="SUV"]
lorem ipsum lorem ipsum
END;
$tag = "~\[(?P<tag>\w+)[^][]*\]~";
$key_values = '~(?P<key>\w+)="(?P<value>[^"]*)"~';
preg_match_all($tag, $junk, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
echo "Name: {$match["tag"]}\n";
preg_match_all($key_values, $match[0], $attributes, PREG_SET_ORDER);
print_r($attributes);
}
?>
Here we have
\[(?P<tag>\w+)[^][]*\]
for likely tags and
(?P<key>\w+)="(?P<value>[^"]*)"
for key/value pairs. The rest is a foreach loop.

Parsing parameters from command line with RegEx and PHP

I have this as an input to my command line interface as parameters to the executable:
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
What I want is to get all of the parameters in a key-value / associative array with PHP like this:
$result = [
'Parameter1' => '1234',
'Parameter2' => '38518',
'param3' => 'Test \"escaped\"',
'param4' => '10',
'param5' => '0',
'param6' => 'TT',
'param7' => 'Seven',
'param8' => 'secret',
'SuperParam9' => '4857',
'SuperParam10' => '123',
];
The problems are the following:
a parameter's prefix can be - or --
a parameter's glue (value assignment operator) can be either an = sign or a whitespace ' '
some parameters may be inside a quote block and can use different separators, glues, and prefixes, e.g. a ? mark as the separator.
So far, since I'm still learning RegEx, all I have is this:
/(-[a-zA-Z]+)/gui
With which I can get all the parameters starting with an -...
I can go to manually explode the entire thing and parse it manually, but there are way too many contingencies to think about.
You can try this, which uses the branch reset feature (?|...|...) to deal with the different possible formats of the values:
$str = '-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"';
$pattern = '~ --?(?<key> [^= ]+ ) [ =]
(?|
" (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) "
|
([^ ?"]*)
)~x';
preg_match_all ($pattern, $str, $matches);
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
demo
In a branch reset group, the capture groups have the same number or the same name in each branch of the alternation.
This means that (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) is (obviously) the value named capture, but that ([^ ?"]*) is also the value named capture.
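To make the numbering behaviour concrete, here is a stripped-down sketch of a branch reset group on a made-up string (the pattern and input are illustrative only). Both alternatives write into group 1, so the results land in a single column whichever branch matched.

```php
<?php
// Branch 1 captures the text between double quotes, branch 2 captures a bare word.
// Inside (?|...) both captures are numbered 1.
$pattern = '~(?|"([^"]*)"|(\S+))~';
preg_match_all($pattern, 'plain "quoted text" last', $m);

print_r($m[1]); // quoted and unquoted values share the same group
```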
You could use
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\)"
|
\h+(?P<value>\H+)
)
See a demo on regex101.com.
Which in PHP would be:
<?php
$data = <<<DATA
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
DATA;
$regex = '~
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\\\)"
|
\h+(?P<value>\H+)
)~x';
if (preg_match_all($regex, $data, $matches)) {
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
}
?>
This yields
Array
(
[Parameter1] => 1234
[Parameter2] => 38518
[param3] => Test \"escaped\"
[param4] => 10
[param5] => 0
[param6] => TT
[param7] => Seven
[param8] => secret
[SuperParam9] => 4857
[SuperParam10] => 123
)

Make unique array from different preg_match_all applied to array of string

I have a logic problem that I don't know how to solve, and I'm quite confused about it.
I have an array composed in this way:
titoli[
1 => 'NFL'
2 => 'Johnny Depp'
3 => 'Institute of Technology'
4 => 'Another text'
]
Now, I need to apply different regexes to that array. How can I do that and end up with a single final array?
So far I've written this:
for($i=0;$i<sizeof($titoli);$i++)
{
if(str_word_count($titoli[$i]) > preg_match_all('/([A-Z][a-zA-Z0-9-]*)([\s][A-Z][a-zA-Z0-9-]*)+/', $titoli[$i]))
{
preg_match('/([A-Z][a-zA-Z0-9-]*)([\s][A-Z][a-zA-Z0-9-]*)+/', $titoli[$i], $result[$i]);
$i++;
}
if(str_word_count($my_array[$i]) > preg_match_all('/^[A-Z][a-z]* [a-z]+ [A-Z][a-z]*$/', $titoli[$x]) && preg_match_all('/^[A-Z][a-z]* [a-z]+ [A-Z][a-z]*$/', $titoli[$i]) > 0) // check that not every word in the title was written with an initial capital
{
preg_match('/^[A-Z][a-z]* [a-z]+ [A-Z][a-z]*$/', $titoli[$x], $result_b[$y], PREG_PATTERN_ORDER);
$y++;
}
}
Well, you could merge the arrays then extract the unique values:
$merged = array_merge($array1, $array2, $array3);
$unique = array_unique($merged);
Where $array1, $array2 and $array3 are the results of the preg_match_all functions.
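A small sketch of that merge-then-deduplicate idea. It swaps in preg_grep() to filter whole titles instead of preg_match_all(), and both the two patterns and the duplicated title are assumptions made up for the example.

```php
<?php
$titles = ['NFL', 'Johnny Depp', 'Institute of Technology', 'Johnny Depp'];

// Hypothetical patterns: all-caps acronyms, and two capitalized words.
$patterns = ['/^[A-Z]{2,}$/', '/^[A-Z][a-z]+ [A-Z][a-z]+$/'];

$merged = [];
foreach ($patterns as $pattern) {
    // preg_grep() returns the array entries that match the pattern.
    $merged = array_merge($merged, preg_grep($pattern, $titles));
}

// Drop duplicates and reindex from 0.
$unique = array_values(array_unique($merged));
print_r($unique);
```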
I am going to post an answer, but I have very little confidence that it will give you what you are looking for. I tried to construct a method that mirrors your intent -- but I could be dead wrong.
Input:
$titoli=[
1 => 'NFL',
2 => 'Johnny Depp',
3 => 'Institute of Technology',
4 => 'Another text'
];
Method:
foreach($titoli as $t){
if($t==strtoupper($t)){ // every letter is uppercase
$result['acronyms'][]=$t;
}elseif($t==ucwords(strtolower($t))){ // every word starts with an uppercase letter
$result['names'][]=$t;
}else{ // at least one word begins with a lowercase letter
$result['other'][]=$t;
}
}
var_export($result);
Output:
array (
'acronyms' =>
array (
0 => 'NFL',
),
'names' =>
array (
0 => 'Johnny Depp',
),
'other' =>
array (
0 => 'Institute of Technology',
1 => 'Another text',
),
)

PHP string convert to array

$paypal_details = "array (
'last_name' => 'Savani',
'item_name' => 'Description and pricing details here.',
'item_number' => '101',
'custom' => 'localhost',
'period' => '1',
'amount' => '10.01'
)";
Here is a sample string that contains a full array.
Is it possible to convert this string to an array as it is?
You really should try to get the information in JSON or XML format instead, which both can be parsed natively by PHP. If that is not possible you can use the code snippet below to get a PHP array from the string. It uses regular expressions to turn the string into JSON format and then parses it using json_decode.
Improvements should of course be made to handle escaped single quotes within values etc., but it is a start.
$paypal_details = "array (
'last_name' => 'Savani',
'item_name' => 'Description and pricing details here.',
'item_number' => '101',
'custom' => 'localhost',
'period' => '1',
'amount' => '10.01'
)";
# Transform into JSON and parse into array
$array = json_decode(
preg_replace(
"/^\s*'(.*)' => '(.*)'/m", # Turn 'foo' => 'bar'
'"$1": "$2"', # into "foo": "bar"
preg_replace(
"/array \((.*)\)/s", # Turn array (...)
'{$1}', # into { ... }
$paypal_details
)
),
true
);
echo "Last name: " . $array["last_name"] . PHP_EOL;
Output:
Last name: Savani
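As an alternative sketch that skips the JSON round trip, the key/value pairs could be pulled out directly with preg_match_all() and zipped with array_combine(). This assumes the same flat format as the sample and, like the snippet above, does not handle escaped single quotes inside values.

```php
<?php
$paypal_details = "array (
'last_name' => 'Savani',
'item_name' => 'Description and pricing details here.',
'item_number' => '101',
'custom' => 'localhost',
'period' => '1',
'amount' => '10.01'
)";

// Group 1 collects the keys, group 2 the values.
preg_match_all("/'([^']*)'\s*=>\s*'([^']*)'/", $paypal_details, $m);
$array = array_combine($m[1], $m[2]);

echo "Amount: " . $array["amount"] . PHP_EOL;
```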
You can use the explode() function.
<?php
$str = "Hello world. It's a beautiful day.";
print_r(explode(" ", $str));
http://www.w3schools.com/php/func_string_explode.asp
