I'm looking for a function, class, or collection of functions that will help with pattern matching on strings. My project requires a fair amount of pattern matching, and I'd like something easier to read and maintain than raw preg_replace (or regex in general).
I've provided a pseudo example in hopes that it will help you understand what I'm asking.
$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';
pattern_match($subject, $pattern, 0);
would return "$2,500".
$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';
pattern_match($subject, $pattern, 1);
would return an array with the values: [$2,500], [$1,250], [$1,250]
The function, as I'm trying to write it, uses 'n' for digits, 'c' for lower-case letters and 'C' for upper-case letters; any non-alphanumeric character represents itself.
Any help would be appreciated.
<?php
// $match_all = false: returns string with first match
// $match_all = true: returns array of strings with all matches
function pattern_match($subject, $pattern, $match_all = false)
{
$pattern = preg_quote($pattern, '|');
$ar_pattern_replaces = array(
'n' => '[0-9]',
'c' => '[a-z]',
'C' => '[A-Z]',
);
$pattern = strtr($pattern, $ar_pattern_replaces);
$pattern = "|".$pattern."|";
if ($match_all)
{
preg_match_all($pattern, $subject, $matches);
}
else
{
preg_match($pattern, $subject, $matches);
}
// $matches[0] is not set when preg_match() finds nothing
return $matches[0] ?? false;
}
$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';
$result = pattern_match($subject, $pattern, 0);
var_dump($result);
$result = pattern_match($subject, $pattern, 1);
var_dump($result);
Here is a function without regexp that should work ('C' and 'c' recognize only ASCII chars). Enjoy:
function pattern_match($subject, $pattern, $result_as_array) {
$pattern_len = strlen($pattern);
if ($pattern_len==0) return false; // error: empty pattern
// translate $subject with the symbols of the rule ('n', 'c' or 'C')
$translate = '';
$subject_len = strlen($subject);
for ($i=0 ; $i<$subject_len ; $i++) {
$x = $subject[$i];
$ord = ord($x);
if ( ($ord>=48) && ($ord<=57) ) { // between 0 and 9
$translate .= 'n';
} elseif ( ($ord>=65) && ($ord<=90) ) { // between A and Z
$translate .= 'C';
} elseif ( ($ord>=97) && ($ord<=122) ) { // between a and z
$translate .= 'c';
} else {
$translate .= $x; // other characters are not translated
}
}
// now search all positions in the translated string
// single result mode
if (!$result_as_array) {
$p = strpos($translate, $pattern);
if ($p===false) {
return false;
} else {
return substr($subject, $p, $pattern_len);
}
}
// array result mode
$result = array();
$p = 0;
while ( ($p<$subject_len) && (($p=strpos($translate,$pattern,$p))!==false) ) {
$result[] = substr($subject, $p, $pattern_len);
$p = $p + $pattern_len;
}
return $result;
}
Update: This is an incomplete answer that doesn't hold up against several test patterns. See @Frosty Z's answer for a better solution.
<?php
function pattern_match($s, $p, $c=0) {
$tokens = array(
'$' => '\$',
'n' => '\d{1}',
'c' => '[a-z]{1}',
'C' => '[A-Z]{1}'
);
$reg = '/' . str_replace(array_keys($tokens), array_values($tokens), $p) . '/';
if ($c == 0) {
preg_match($reg, $s, $matches);
} else {
preg_match_all($reg, $s, $matches);
}
return $matches[0];
}
$subject = "$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).";
$pattern = '$n,nnn';
print_r(pattern_match($subject, $pattern, 0));
print_r(pattern_match($subject, $pattern, 1));
$pattern = 'cc-cccc';
print_r(pattern_match($subject, $pattern));
print_r(pattern_match($subject, $pattern, 1));
?>
Output:
$2,500
Array
(
[0] => $2,500
[1] => $1,250
[2] => $1,250
)
on-time
Array
(
[0] => on-time
[1] => on-time
)
Note: Make sure to use single-quotes for your $pattern when it contains $, or PHP will try to parse it as a $variable.
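One way to make the translation more robust is to walk the pattern one character at a time and escape everything that is not a placeholder. The sketch below assumes the 'n', 'c' and 'C' placeholders from the question; pattern_to_regex() is a hypothetical helper name, not a PHP built-in, and it works byte by byte, so it is only meant for ASCII patterns.

```php
<?php
// Hypothetical helper: turns a pattern such as '$n,nnn' into a PCRE regex.
function pattern_to_regex(string $pattern): string
{
    $map = array('n' => '[0-9]', 'c' => '[a-z]', 'C' => '[A-Z]');
    $regex = '';
    $length = strlen($pattern);
    for ($i = 0; $i < $length; $i++) {
        $ch = $pattern[$i];
        // Placeholders become character classes; everything else is escaped
        // so that characters like '.', '$' or '(' only match themselves.
        $regex .= isset($map[$ch]) ? $map[$ch] : preg_quote($ch, '/');
    }
    return '/' . $regex . '/';
}

preg_match_all(pattern_to_regex('$n,nnn'), 'paid $2,500 then $1,250', $matches);
print_r($matches[0]); // the two dollar amounts
```

Because each literal character goes through preg_quote() individually, a pattern such as 'n.n' should only match a digit, a literal dot and a digit.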
The function you're looking for is preg_match_all, although you'll need to use REGEX patterns for your pattern matching.
Sorry, but this is a problem for regex. I understand your objections, but there's just no other way as efficient or simple in this case. This is an extremely simple matching problem. You could write a custom wrapper as jnpcl demonstrated, but that would only involve more code and more potential pitfalls. Not to mention extra overhead.
Related
There are quite a few questions on SO asking how to parse a regex pattern and output all possible matches to that pattern. For some reason, though, every single one of them I can find (1, 2, 3, 4, 5, 6, 7, probably more) is either for Java or some variety of C (and just one for JavaScript), and I currently need to do this in PHP.
I’ve Googled to my heart’s (dis)content, but whatever I do, pretty much the only thing that Google gives me is links to the docs for preg_match() and pages about how to use regex, which is the opposite of what I want here.
My regex patterns are all very simple and guaranteed to be finite; the only syntax used is:
[] for character classes
() for subgroups (capturing not required)
| (pipe) for alternative matches within subgroups
? for zero-or-one matches
So an example might be [ct]hun(k|der)(s|ed|ing)? to match all possible forms of the verbs chunk, thunk, chunder and thunder, for a total of sixteen permutations.
Ideally, there’d be a library or tool for PHP which will iterate through (finite) regex patterns and output all possible matches, all ready to go. Does anyone know if such a library/tool already exists?
If not, what is an optimised way to approach making one? This answer for JavaScript is the closest I’ve been able to find to something I should be able to adapt, but unfortunately I just can’t wrap my head around how it actually works, which makes adapting it more tricky. Plus there may well be better ways of doing it in PHP anyway. Some logical pointers as to how the task would best be broken down would be greatly appreciated.
Edit: Since apparently it wasn’t clear how this would look in practice, I am looking for something that will allow this type of input:
$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
– and printing $possibleMatches should then give something like this (the order of the elements is not important in my case):
Array
(
[0] => chunk
[1] => thunk
[2] => chunks
[3] => thunks
[4] => chunked
[5] => thunked
[6] => chunking
[7] => thunking
[8] => chunder
[9] => thunder
[10] => chunders
[11] => thunders
[12] => chundered
[13] => thundered
[14] => chundering
[15] => thundering
)
Method
You need to strip out the variable patterns; you can use preg_match_all to do this
preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);
/* Regex:
/(\[\w+\]|\([\w|]+\))/
/ : Pattern delimiter
( : Start of capture group
\[\w+\] : Character class pattern
| : OR operator
\([\w|]+\) : Capture group pattern
) : End of capture group
/ : Pattern delimiter
*/
You can then expand the capture groups to letters or words (depending on type)
$array = str_split($cleanString, 1); // For a character class
$array = explode("|", $cleanString); // For a capture group
Recursively work your way through each $array
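As a quick illustration of the first two steps, here is a minimal sketch that uses the sample pattern from the question. The variable names $options, $token and $inner are only illustrative.

```php
<?php
$regex = '[ct]hun(k|der)(s|ed|ing)?';

// Step 1: collect the variable parts of the pattern.
preg_match_all('/(\[\w+\]|\([\w|]+\))/', $regex, $matches);
// $matches[0] now holds "[ct]", "(k|der)" and "(s|ed|ing)"

// Step 2: expand each part into its list of options.
$options = array();
foreach ($matches[0] as $token) {
    $inner = substr($token, 1, -1); // drop the surrounding brackets
    $options[] = $token[0] === '['
        ? str_split($inner)         // character class: one option per character
        : explode('|', $inner);     // group: one option per alternative
}
print_r($options);
```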
Code
function printMatches($pattern, $array, $matchPattern)
{
$currentArray = array_shift($array);
foreach ($currentArray as $option) {
$patternModified = preg_replace($matchPattern, $option, $pattern, 1);
if (!count($array)) {
echo $patternModified, PHP_EOL;
} else {
printMatches($patternModified, $array, $matchPattern);
}
}
}
function prepOptions($matches)
{
    $possibilities = [];
    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);
        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        }
        // negative string offsets need PHP 7.1+
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}
$regex = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";
preg_match_all($matchPattern, $regex, $matches);
printMatches(
$regex,
prepOptions($matches[0]),
$matchPattern
);
Additional functionality
Expanding nested groups
In use you would put this before the "preg_match_all".
$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
$output = explode("|", $array[3]);
if ($array[0][-1] === "?") {
$output[] = "";
}
foreach ($output as &$option) {
$option = $array[2] . $option;
}
return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;
Output:
This happen(s|ed) to (become|be|have|having) test case 1?
Matching single letters
The bones of this would be to update the regex:
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";
and add an else to the prepOptions function:
} else {
$array = [$cleanString];
}
Full working example
function printMatches($pattern, $array, $matchPattern)
{
$currentArray = array_shift($array);
foreach ($currentArray as $option) {
$patternModified = preg_replace($matchPattern, $option, $pattern, 1);
if (!count($array)) {
echo $patternModified, PHP_EOL;
} else {
printMatches($patternModified, $array, $matchPattern);
}
}
}
function prepOptions($matches)
{
    $possibilities = [];
    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);
        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        } else {
            $array = [$cleanString];
        }
        // negative string offsets need PHP 7.1+
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}
$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";
$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
$output = explode("|", $array[3]);
if ($array[0][-1] === "?") {
$output[] = "";
}
foreach ($output as &$option) {
$option = $array[2] . $option;
}
return $array[1] . implode("|", $output);
}, $regex);
preg_match_all($matchPattern, $regex, $matches);
printMatches(
$regex,
prepOptions($matches[0]),
$matchPattern
);
Output:
This happens to become test case 1
This happens to become test case
This happens to be test case 1
This happens to be test case
This happens to have test case 1
This happens to have test case
This happens to having test case 1
This happens to having test case
This happened to become test case 1
This happened to become test case
This happened to be test case 1
This happened to be test case
This happened to have test case 1
This happened to have test case
This happened to having test case 1
This happened to having test case
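An alternative to rewriting the pattern with regexes is a small recursive-descent expander that reads the pattern once and handles nesting directly. The following is only a sketch under the question's assumptions: the pattern is well-formed and uses nothing beyond [], (), | and ?. The function names expandPattern(), parseAlternation() and parseSequence() are hypothetical.

```php
<?php
// Returns every string the (finite) pattern can match.
function expandPattern(string $pattern): array
{
    $pos = 0;
    return parseAlternation($pattern, $pos);
}

// alternation := sequence ('|' sequence)*
function parseAlternation(string $p, int &$pos): array
{
    $results = array();
    do {
        $results = array_merge($results, parseSequence($p, $pos));
        $more = $pos < strlen($p) && $p[$pos] === '|';
        if ($more) {
            $pos++; // skip '|'
        }
    } while ($more);
    return array_values(array_unique($results));
}

// sequence := (class | group | literal) with an optional '?' after each item
function parseSequence(string $p, int &$pos): array
{
    $prefixes = array('');
    while ($pos < strlen($p) && $p[$pos] !== '|' && $p[$pos] !== ')') {
        $ch = $p[$pos];
        if ($ch === '[') {
            // character class: one option per character (assumes a closing ']')
            $end = strpos($p, ']', $pos);
            $options = str_split(substr($p, $pos + 1, $end - $pos - 1));
            $pos = $end + 1;
        } elseif ($ch === '(') {
            $pos++; // skip '('
            $options = parseAlternation($p, $pos);
            $pos++; // skip ')'
        } else {
            $options = array($ch);
            $pos++;
        }
        if ($pos < strlen($p) && $p[$pos] === '?') {
            $options[] = ''; // zero-or-one: the item may be absent
            $pos++;
        }
        // append every option to every prefix built so far
        $next = array();
        foreach ($prefixes as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $prefixes = $next;
    }
    return $prefixes;
}

print_r(expandPattern('[ct]hun(k|der)(s|ed|ing)?'));
```

The order of the results follows the order of the options in the pattern, which the question states is not important.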
function extractConnect($str,$connect_type){
$connect_array = array();
$connect_counter = 0;
$str = trim($str).' ';
for($i =0; $i<strlen($str);$i++) {
$chr = $str[$i];
if($chr==$connect_type){ //$connect_type = '@' or '#' etc
$connectword = getConnect($i,$str);
$connect_array[$connect_counter] = $connectword;
$connect_counter++;
}
}
if(!empty($connect_array)){
return $connect_array;
}
}
function getConnect($tag_index,$str){
$str = trim($str).' ';
for($j = $tag_index; $j<strlen($str);$j++) {
$chr = $str[$j];
if($chr==' '){
$hashword = substr($str,$tag_index+1,$j-$tag_index);
return trim($hashword);
}
}
}
$at = extractConnect("@stackoverflow is great. #google.com is the best search engine","@");
$hash = extractConnect("@stackoverflow is great. #google.com is the best search engine","#");
print_r($at);
print_r($hash);
What this method does is extract the words prefixed with @ or # from a string and return them as an array.
E.g. for the input @stackoverflow is great. #google.com is the best search engine it outputs this:
Array ( [0] => google.com ) Array ( [0] => stackoverflow )
But this method seems to be too slow. Is there any alternative?
You could use a regex to achieve this:
/<char>(\S+)\b/i
Explanation:
/ - starting delimiter
<char> - the character you're searching for (passed as a function argument)
(\S+) - any non-whitespace character, one or more times
\b - word boundary
/ - ending delimiter
i - case insensitivity modifier
Function:
function extractConnect($string, $char) {
$search = preg_quote($char, '/');
if (preg_match('/'.$search.'(\S+)\b/i', $string, $matches)) {
return [$matches[1]]; // Return an array containing the match
}
return false;
}
With your strings, this would produce the following output:
array(1) {
[0]=>
string(10) "google.com"
}
array(1) {
[0]=>
string(13) "stackoverflow"
}
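The function above stops at the first match. If every tagged word is needed, as in the original loop, preg_match_all() can be used with the same pattern. This is a minimal sketch; extractAllConnect() is a hypothetical name and the sample string is made up for illustration.

```php
<?php
// Hypothetical variant: returns every word that follows $char, not just the first.
function extractAllConnect(string $string, string $char): array
{
    $search = preg_quote($char, '/');
    // Same pattern as above, applied to the whole string
    preg_match_all('/' . $search . '(\S+)\b/', $string, $matches);
    return $matches[1]; // empty array when nothing matches
}

$text = '@stackoverflow is great. #google.com is the best #search engine';
print_r(extractAllConnect($text, '#'));
print_r(extractAllConnect($text, '@'));
```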
You can do it like this:
<?php
function extractConnect($strSource, $tags) {
$matches = array();
$tags = str_split($tags);
$strSource = explode(' ', $strSource);
array_walk_recursive($strSource, function(&$item) {
$item = trim($item);
});
foreach ($strSource as $strPart) {
if (in_array($strPart[0], $tags)) {
$matches[$strPart[0]][] = substr($strPart, 1);
}
}
return $matches;
}
var_dump(extractConnect(
"#stackoverflow is great. #twitter is good. #google.com is the best search engine",
"##"
));
Outputs:
This seemed to work for me. Provide it with the symbol you want.
function get_stuff($str) {
    $result = array();
    $words = explode(' ', $str);
    $symbols = array('@', '#');
    foreach ($words as $word) {
        // skip empty chunks produced by consecutive spaces
        if ($word !== '' && in_array($word[0], $symbols)) {
            $result[$word[0]][] = substr($word, 1);
        }
    }
    return $result;
}
$str = '@stackoverflow is great. #google.com is the best search engine';
print_r(get_stuff($str));
This outputs Array ( [@] => Array ( [0] => stackoverflow ) [#] => Array ( [0] => google.com ) )
I have the following possible strings:
NL DE
NL,DE
nl DE
nl de
NL/DE
NL,mismatch,DE
I'm looking for the preg_match that produces the following output given the inputs above.
array(
[0]=>"NL",
[1]=>"DE"
);
I've tried the following code:
preg_match_all('/(\w{2,2})/ui', $info["country"], $m);
but that seems to also cut up the word mismatch, which is undesired.
The regex should only match two letter country codes, everything else should be ignored.
How can I do this using preg_match in PHP?
Here's how you explode the string:
$string = 'NL DE
NL,DE
nl DE
nl de
NL/DE
NL,mismatch,DE';
Using explode and filter:
$string = explode("\n",str_replace(array(",","/"," ","\r"), "\n", strtoupper($string)));
$string = array_unique(array_filter($string,function($v){$v = trim($v); return strlen($v) === 2;}));
var_dump($string);
If you want to play around with the string, try this:
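$s = ",/\n\t \r";
$t = strtok(strtoupper($string), $s);
$l = array();
while ( $t !== false ) {
    // keep each two-letter token once
    if (strlen($t) == 2 && !in_array($t, $l)) {
        $l[] = $t;
    }
    // always advance, otherwise a duplicate or a longer token loops forever
    $t = strtok($s);
}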
var_dump($l);
Output:
array
0 => string 'NL' (length=2)
1 => string 'DE' (length=2)
// @claudrian Variant
function SplitCountries($string){
// Sanity check
if(!is_string($string)) return false;
// Split string by non-letters (case insensitive)
$slices = preg_split('~[^a-z]+~i', trim($string));
// Keep only 2-letter words
$slices = preg_grep('~^[a-z]{2}$~i', $slices);
// Keep uniques
$slices = array_unique($slices);
// Done
return $slices;
}
// @Wiseguy Variant
function SplitCountries($string){
// Sanity check
if(!is_string($string)) return false;
// Capture only two letter packs
if(!preg_match_all('~\\b[a-z]{2}\\b~i', trim($string), $slices)){
return false;
}
// Keep uniques
$slices = array_unique($slices[0]);
// Done
return $slices;
}
Hope it helps.
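Neither variant normalizes case or re-indexes the result, so "nl DE" yields "nl" rather than "NL". A minimal sketch that combines the word-boundary match with upper-casing follows; countryCodes() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns unique, upper-cased two-letter codes in order of appearance.
function countryCodes(string $input): array
{
    // \b on both sides keeps longer words such as "mismatch" from being cut up
    preg_match_all('/\b[a-z]{2}\b/i', $input, $m);
    return array_values(array_unique(array_map('strtoupper', $m[0])));
}

foreach (array('NL DE', 'NL,DE', 'nl DE', 'nl de', 'NL/DE', 'NL,mismatch,DE') as $line) {
    print_r(countryCodes($line));
}
```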
This should work
$result = array();
preg_match_all('/([a-z]{2})(?:.*)?([a-z]{2})/i',$text, $matches);
$result = array( strtoupper( $matches[1][0] ), strtoupper( $matches[2][0] ) );
You'll have the results in the $result array
I'd like to convert a string containing a price to a float value. The price comes from different languages and countries and can look like this:
1,00 €
€ 1.00
1'000,00 EUR
1 000.00$
1,000.00$
1.000,00 EURO
or whatever you can think of...
I'm not sure my examples cover the full range of possibilities. I'm also not sure whether an international conversion can be done blindly; maybe I have to use a language code? For a start, Euro and Dollar would be enough.
floatval() is kind of stupid, so I need something more here. I think I should first remove all characters besides digits, "," and ".", then fix the "," / "." and finally use floatval().
Has someone done this before and can help me a little?
I would prefer a solution without regexp ;)
Ok, I tried it myself. What do you think of this?
function priceToFloat($s){
// is negative number
$neg = strpos((string)$s, '-') !== false;
// convert "," to "."
$s = str_replace(',', '.', $s);
// remove everything except numbers and dot "."
$s = preg_replace("/[^0-9\.]/", "", $s);
// remove all separators from first part and keep the end
$s = str_replace('.', '',substr($s, 0, -3)) . substr($s, -3);
// Set negative number
if( $neg ) {
$s = '-' . $s;
}
// return float
return (float) $s;
}
Here are some tests: http://codepad.org/YtiHqsgz
Sorry. I couldn't include the other functions because codepad did not like them. But I compared them and there was trouble with strings like "22 000,76" or "22.000"
Update: As Limitless isa pointed out, you might have a look at the built-in function money_format() (note that it was deprecated in PHP 7.4 and removed in PHP 8.0).
Removing all the non-numeric characters should give you the price in cents, provided every input has exactly two decimal places. You can then divide that by 100 to get the 'human readable' price. You could do this with something like filter_var and FILTER_SANITIZE_NUMBER_INT. For example:
$cents = filter_var($input, FILTER_SANITIZE_NUMBER_INT);
$price = floatval($cents / 100);
Above is untested, but something like that is probably what you're looking for.
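To show where this approach works and where it breaks, here is a small sketch. centsToPrice() is a hypothetical name, and it assumes the input always ends in exactly two decimal digits.

```php
<?php
// Hypothetical helper: treats all digits in the input as a price in cents.
function centsToPrice(string $input): float
{
    // FILTER_SANITIZE_NUMBER_INT keeps only digits and the '+' / '-' signs
    $cents = (int) filter_var($input, FILTER_SANITIZE_NUMBER_INT);
    return $cents / 100;
}

var_dump(centsToPrice('1.000,10 EURO')); // two decimals present: works as intended
var_dump(centsToPrice('1,000 EUR'));     // no decimals: the result is off by a factor of 100
```

The second call illustrates the limitation: without the two trailing decimals, the thousands separator is indistinguishable from a decimal mark.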
This function will fix your problem:
function priceToSQL($price)
{
$price = preg_replace('/[^0-9\.,]*/i', '', $price);
$price = str_replace(',', '.', $price);
if(substr($price, -3, 1) == '.')
{
$price = explode('.', $price);
$last = array_pop($price);
// implode() with the separator first; the reversed argument order was removed in PHP 8.0
$price = implode('', $price).'.'.$last;
}
else
{
$price = str_replace('.', '', $price);
}
return $price;
}
price to number
number to price examples
<?php
$number="1.050,50";
$result=str_replace(',','.',str_replace('.','',$number));
echo $result. "<br/>";
// 1050.50
// money_format() was removed in PHP 8.0; number_format() produces the same layout here
echo number_format((float) $result, 2, ',', '.');
// 1.050,50
?>
To remove all but numbers, commas and full stops:
<?php
$prices = array( "1,00 €",
"€ 1.00",
"1'000,00 EUR",
"1 000.99$",
"1,000.01$",
"1.000,10 EURO");
$new_prices = array();
foreach ($prices as $price) {
$new_prices[] = preg_replace("/[^0-9,\.]/", "", $price);
}
print_r($new_prices);
Output:
Array ( [0] => 1,00 [1] => 1.00 [2] => 1000,00 [3] => 1000.99 [4] => 1,000.01 [5] => 1.000,10 )
Now let's utilize the parseFloat function from Michiel - php.net (I won't paste it here since it's a pretty big function):
<?php
$prices = array( "1,00 €",
"€ 1.00",
"1'000,00 EUR",
"1 000.99$",
"1,000.01$",
"1.000,10 EURO");
$new_prices = array();
foreach ($prices as $price) {
$new_prices[] = parseFloat(preg_replace("/[^0-9,\.]/", "", $price));
}
print_r($new_prices);
Output will be:
Array ( [0] => 1 [1] => 1 [2] => 1000 [3] => 1000.99 [4] => 1000.01 [5] => 1000.1 )
Not perfect, but it works:
function priceToFloat($s){
// clear whitespace
$s = trim($s);
$s = str_replace(' ', '', $s);
// is it minus value
$is_minus = false;
if(strpos($s, '(') !== false)
$is_minus = true;
if(strpos($s, '-') !== false)
$is_minus = true;
// check case where string has "," and "."
$dot = strpos($s, '.');
$semi = strpos($s, ',');
if($dot !== false && $semi !== false){
// change fraction sign to #, we change it again later
$s = str_replace('#', '', $s);
if($dot < $semi) $s = str_replace(',','#', $s);
else $s = str_replace('.','#', $s);
// remove another ",", "." and change "#" to "."
$s = str_replace([',','.', '#'], ['','', '.'], $s);
}
$s = str_replace(',', '.', $s);
// clear useless elements
$s = preg_replace("/[^0-9\.]/", "", $s);
// cast first, then apply the sign (negating a non-numeric string is an error in PHP 8)
$s = (float) $s;
return $is_minus ? -$s : $s;
}
working cases
$prices = [
'123.456,789',
'123,456.789',
'123 456,789',
'123 456.789',
'-123,456.789',
'(123,456.789)',
];
foreach($prices as $price)
echo priceToFloat($price).'<br />';
return
123456.789
123456.789
123456.789
123456.789
-123456.789
-123456.789
Is there a better method than loop with strpos()?
Note: I'm looking for partial matches, not an in_array()-type method.
example needle and haystack and desired return:
$needles[0] = 'naan bread';
$needles[1] = 'cheesestrings';
$needles[2] = 'risotto';
$needles[3] = 'cake';
$haystack[0] = 'bread';
$haystack[1] = 'wine';
$haystack[2] = 'soup';
$haystack[3] = 'cheese';
//desired output - but what's the best method of getting this array?
$matches[0] = 'bread';
$matches[1] = 'cheese';
ie:
magic_function($haystack, %$needles%) !
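$matches = array();
foreach($haystack as $pattern) {
    // preg_quote() keeps regex metacharacters in the search term literal
    if (preg_grep('/'.preg_quote($pattern, '/').'/', $needles)) {
        $matches[] = $pattern;
    }
}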
I think you are confusing $haystack and $needles in your question, because naan bread is not in the haystack, and neither is cheesestrings. Your desired output suggests you are looking for cheese in cheesestrings instead. For that, the following would work:
function in_array_multi($haystack, $needles)
{
$matches = array();
$haystack = implode('|', $haystack);
foreach($needles as $needle) {
if(strpos($haystack, $needle) !== FALSE) {
$matches[] = $needle;
}
}
return $matches;
}
For your given haystack and needles this performs twice as fast as a regex solution. Might change for different number of params though.
I think you'll have to roll your own. The User Contributed Comments to array_intersect() provide a number of alternative implementations (like this one). You would just have to replace the == comparison with a strstr() check.
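A minimal sketch of such a hand-rolled version, using strpos() for the substring test, could look like this. partial_matches() is a hypothetical name, and the two arrays are the ones from the question.

```php
<?php
// Hypothetical helper: returns the terms that occur inside at least one of the texts.
function partial_matches(array $terms, array $texts): array
{
    $found = array_filter($terms, function ($term) use ($texts) {
        foreach ($texts as $text) {
            if (strpos($text, $term) !== false) {
                return true; // stop at the first text that contains the term
            }
        }
        return false;
    });
    return array_values($found); // re-index so the keys start at 0
}

$texts = array('naan bread', 'cheesestrings', 'risotto', 'cake');
$terms = array('bread', 'wine', 'soup', 'cheese');
print_r(partial_matches($terms, $texts));
```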
$data[0] = 'naan bread';
$data[1] = 'cheesestrings';
$data[2] = 'risotto';
$data[3] = 'cake';
$search[0] = 'bread';
$search[1] = 'wine';
$search[2] = 'soup';
$search[3] = 'cheese';
preg_match_all(
'~' . implode('|', $search) . '~',
implode("\x00", $data),
$matches
);
print_r($matches[0]);
// [0] => bread
// [1] => cheese
You'll get better answers if you tell us more about the real problem.