PHP - Extract Custom Operators from String - php

I have a MySQL database back-end which is searched via PHP. I'm trying to implement custom operators for detailed searching.
Here is an example:
$string = "o:draw t:spell card";
Here, o: searches by description and t: searches by card type. This is similar to the custom syntax used by a site like Scryfall: https://scryfall.com/docs/syntax
Here is my work-in-progress example for o:
$string = "o:draw t:spell card";
if (strpos($string, 'o:') !== false) {
$text = explode('o:', $string, 2)[1];
echo $text;
}
The output is: draw t:spell card
I'm trying to get the output to respond with just: draw
However, it would also need to be able to accomplish this "smartly". It may not always be in a specific order and may show up alongside other operators.
Any suggestions on how to build around a system like this in PHP?
Example user searches:
Search Description + Type (new Syntax):
o:draw a card t:spell card
Search Card Name/Description (what it currently does):
Time Wizard
Search Card Name/Description (what it currently does):
Draw a Card
Combination of new and old syntax:
Time Wizard t:effect monster

I am unable to take your input all the way through to SQL strings, because I don't know whether you have more columns to access than the two that you have listed in your question.
To begin to script a parser for your users' input, you will need to understand all of the possible inputs that you wish to accommodate and assign specific handlers for each occurrence. This task has the ability to turn into a rabbit hole -- in other words, it may become an ever-extending and diverging task. My answer is only to get you started; I'm not in it for the long haul.
Declare a whitelist of commands that you intend to respect. In my snippet, this is $commands. The whitelist of strings with special meaning must not be confused with any card name strings -- I have added the best fringe case that I can find in the YuGiOh database (a card name containing a colon).
Build a regex pattern using the whitelist to find qualifying delimiting spaces in input strings.
Iterate the array generated by the regex-split input and assess each element to determine whether and how it will be integrated into the WHERE clause of your SQL.
At this early stage, it is fair to only allow one card name per input string, but certainly possible that users will wish to enter multiple card types to filter their expected result set.
I am electing to sanitize the user input by trimming leading and trailing whitespace and converting all letters to lowercase. I assume you will be running a modern database which will make case-insensitive comparisons.
Code: (Demo)
$tests = [
'o:draw a card t:spell card',
'Time Wizard',
'Draw a Card',
'Time Wizard t:effect monster',
'A-Team: Trap Disposal Unit',
't: effect o: draw a card atk: 1400 def: 600 l: 4 t: fairy',
];
$commands = [
'o' => '', // seems to be useless to me
't' => 'type', // card type
'atk' => 'attack', // card's attack value
'def' => 'defense', // card's defense value
'l' => 'level', // card's level value
];
$pattern = '~ *(?=(?:' . implode('|', array_keys($commands)) . '):)~'; // create regex pattern using commands lookup
$result = []; // initialize the collection of parsed searches
foreach ($tests as $test) {
foreach (preg_split($pattern, strtolower(trim($test)), 0, PREG_SPLIT_NO_EMPTY) as $component) {
$colectomy = explode(':', $component, 2);
if (count($colectomy) < 2) {
if ($colectomy[0] !== 'draw a card') { // "draw a card" seems worthless to the query
$result[$test]['cardName (old syntax)'] = $component;
}
} elseif ($colectomy[0] !== 'o') { // o command seems worthless to the query
if (isset($commands[$colectomy[0]])) {
$result[$test][$commands[$colectomy[0]]][] = $colectomy[1]; // enable capturing of multiple values of same command
} else {
$result[$test]['cardName (new syntax)'] = $component;
}
}
}
}
var_export($result);
Output:
array (
'o:draw a card t:spell card' =>
array (
'type' =>
array (
0 => 'spell card',
),
),
'Time Wizard' =>
array (
'cardName (old syntax)' => 'time wizard',
),
'Time Wizard t:effect monster' =>
array (
'cardName (old syntax)' => 'time wizard',
'type' =>
array (
0 => 'effect monster',
),
),
'A-Team: Trap Disposal Unit' =>
array (
'cardName (new syntax)' => 'a-team: trap disposal unit',
),
't: effect o: draw a card atk: 1400 def: 600 l: 4 t: fairy' =>
array (
'type' =>
array (
0 => ' effect',
1 => ' fairy',
),
'attack' =>
array (
0 => ' 1400',
),
'defense' =>
array (
0 => ' 600',
),
'level' =>
array (
0 => ' 4',
),
),
)
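The parsed array still has to be turned into SQL. The sketch below assumes a hypothetical `cards` table with columns named `name`, `description`, `type`, `attack`, `defense` and `level` (none of these names come from the question). Unlike the snippet above, it maps `o:` to the description column (as the question describes), adds `\b` so that a word merely ending in a command letter (e.g. `cat:`) does not trigger a split, trims the values, and emits a WHERE clause with `?` placeholders so that user input is never concatenated into the query. `parseSearch()` and `buildWhere()` are made-up names.

```php
<?php
// Hypothetical command => column map; the column names are assumptions.
const COLUMNS = [
    'o'   => 'description',
    't'   => 'type',
    'atk' => 'attack',
    'def' => 'defense',
    'l'   => 'level',
];
// Columns searched with LIKE; all other columns are compared with =.
const TEXT_COLUMNS = ['description', 'type'];

function parseSearch(string $input): array
{
    $commands = implode('|', array_map('preg_quote', array_keys(COLUMNS)));
    // Split before "cmd:" only at a word boundary, so "cat:" is not read as "t:".
    $pattern = '~\s*\b(?=(?:' . $commands . '):)~';
    $parsed = [];
    foreach (preg_split($pattern, strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY) as $part) {
        $pair = explode(':', $part, 2);
        if (count($pair) === 2 && array_key_exists($pair[0], COLUMNS)) {
            // Collect every value, so "t:effect t:fairy" keeps both types.
            $parsed[COLUMNS[$pair[0]]][] = trim($pair[1]);
        } else {
            // Old syntax: free text (may itself contain a colon, e.g. "a-team: ...").
            $parsed['text'] = trim($part);
        }
    }
    return $parsed;
}

function buildWhere(array $parsed): array
{
    $clauses = [];
    $params = [];
    foreach ($parsed as $column => $values) {
        if ($column === 'text') {
            // Old syntax searches card name OR description.
            $clauses[] = '(name LIKE ? OR description LIKE ?)';
            array_push($params, "%$values%", "%$values%");
            continue;
        }
        foreach ($values as $value) {
            if (in_array($column, TEXT_COLUMNS, true)) {
                $clauses[] = "$column LIKE ?";
                $params[] = "%$value%";
            } else {
                $clauses[] = "$column = ?";
                $params[] = $value;
            }
        }
    }
    // $column only ever comes from COLUMNS, so interpolating it is safe.
    return [$clauses ? implode(' AND ', $clauses) : '1 = 1', $params];
}
```

With PDO this might be used as `[$where, $params] = buildWhere(parseSearch($input)); $stmt = $pdo->prepare("SELECT * FROM cards WHERE $where"); $stmt->execute($params);` (`$pdo` and `cards` are again placeholders). Note that `%` or `_` typed by a user would still act as LIKE wildcards unless you escape them.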

You can do something like the following to extract key/value pairs from the string.
$tags = ['o:', 't:'];
$str = "t:spell card o:draw a card";
$dataList = [];
foreach($tags as $tag){
if (strpos($str, $tag) === false) {
continue; // tag not present in this search string
}
$tagDataLevel1 = explode($tag, $str, 2)[1];
$expNew = explode(':', $tagDataLevel1,2);
if(count($expNew)==2){
$tagData= strrev(explode(' ',strrev($expNew[0]),2)[1]);
}else{
$tagData = $tagDataLevel1;
}
$dataList[$tag] = $tagData;
//overwrite string
$str = str_replace($tag . $tagData,"",$str);
}
$oldKeyWord = trim($str); // whatever remains is the old-syntax search text
var_dump ($dataList);
echo $oldKeyWord;

Related

Match Country in String, split based on result

I have a CSV file with one of the fields holding state/country info, formatted like:
"Florida United States" or "Alberta Canada" or "Wellington New Zealand" - not comma or tab delimited between them, simply space delimited.
I have an array of all the potential countries as well.
What I am looking for, is a solution that, in a loop, I can split the State and Country to different variables, based on matching the country in the $countryarray that I have something like:
$countryarray=array("United States","Canada","New Zealand");
$userfield="Wellington New Zealand";
$somefunction=(match "New Zealand", extract into $country, the rest into $state)
Split won't do it straight up - because many of the countries AND states have spaces, but the original data set concatenated the state and country together with just a space...
TIA!
I'm a fan of the RegEx method that @Mike Morton mentioned. You can take an array of countries, implode them using |, which is the RegEx OR (alternation) operator, and use that as an "ends with one of these" pattern.
Below I've come up with two ways to do this, a simple way and an arguably overly complicated way that does some extra escaping. To illustrate what that escaping does I've added a fake country called Country XYZ (formally ABC).
Here's the sample data that works with both methods, as well as a helper function that actually does the matching and echoing. The RegEx does named-capturing, too, which makes things really easy to deal with.
// Sample data
$data = [
'Wellington New Zealand',
'Florida United States of America',
'Quebec Canada',
'Something Country XYZ (formally ABC)',
];
// Array of all possible countries
$countries = [
'United States of America',
'Canada',
'New Zealand',
'Country XYZ (formally ABC)',
];
// The beginning and ending pattern delimiter for the RegEx
$delim = '/';
function matchAndShowData(array $data, array $countries, string $delim, string $countryParts): void
{
$pattern = "^(?<region>.*?) (?<country>$countryParts)$";
foreach($data as $d) {
if(preg_match($delim . $pattern . $delim, $d, $matches)){
echo sprintf('%1$s, %2$s', $matches['region'], $matches['country']), PHP_EOL;
} else {
echo 'NO MATCH: ' . $d, PHP_EOL;
}
}
}
Option 1
The first option is a naïve implode. This method, however, will not find the country that includes parentheses.
matchAndShowData($data, $countries, $delim, implode('|', $countries));
Output
Wellington, New Zealand
Florida, United States of America
Quebec, Canada
NO MATCH: Something Country XYZ (formally ABC)
Option 2
The second option applies proper RegEx quoting of the countries, just in case they have special characters. If you are 100% certain you don't have any, this is overkill, but I personally have learned, after way too many hours of debugging, to just always quote, just in case.
$patternParts = array_map(fn(string $country) => preg_quote($country, $delim), $countries);
// Implode the cleaned countries using the RegEx pipe operator which means "OR"
matchAndShowData($data, $countries, $delim, implode('|', $patternParts));
Output
Wellington, New Zealand
Florida, United States of America
Quebec, Canada
Something, Country XYZ (formally ABC)
Note
If you don't expect your list of countries to change often you can echo the pattern out and then just bake that into your code which will probably shave a couple of milliseconds of execution, which in a tight loop might be worth it.
Demo
You can see a demo of this here: https://3v4l.org/CaNRZ
Prepare the array of countries for use in a regular expression with preg_quote().
Build a regex pattern that will match a space followed by one of the country values and then the end of the string. A lookahead ((?= ... )) is used to ensure that the country characters are not consumed/destroyed while splitting.
Save the 2-element returned array from preg_split() to the output array.
Code: (Demo)
$branches = array_map(fn($country) => preg_quote($country, '/'), $countries);
$result = [];
foreach ($data as $string) {
$result[] = preg_split('/ (?=(?:' . implode('|', $branches) . ')$)/', $string);
}
var_export($result);
Output:
array (
0 =>
array (
0 => 'Wellington',
1 => 'New Zealand',
),
1 =>
array (
0 => 'Florida',
1 => 'United States of America',
),
2 =>
array (
0 => 'Quebec',
1 => 'Canada',
),
3 =>
array (
0 => 'Something',
1 => 'Country XYZ (formally ABC)',
),
)
Note that if an item/row in the result array has only one element, then you know that the attempted split failed to match the country substring.
I use this same technique when splitting a street name from its street type, which can span several words (e.g. "First Street North").
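Building on the note about one-element rows, here is a sketch that wraps the lookahead split in a helper, passes a limit of 2 to preg_split() and returns null when no listed suffix ends the string. The name splitBySuffix() is made up for illustration. The limit matters once suffixes overlap: without it, "First Street North" with the suffixes "North" and "Street North" would be split at both spaces.

```php
<?php
/**
 * Split "$head $suffix" where $suffix is one of $suffixes.
 * Returns [head, suffix], or null when no listed suffix ends the string.
 */
function splitBySuffix(string $subject, array $suffixes): ?array
{
    if ($suffixes === []) {
        return null; // an empty alternation would match any trailing space
    }
    $branches = array_map(fn(string $s) => preg_quote($s, '/'), $suffixes);
    $pattern = '/ (?=(?:' . implode('|', $branches) . ')$)/';
    // Limit 2: only the leftmost qualifying space is used, i.e. the longest suffix wins.
    $parts = preg_split($pattern, $subject, 2);
    return (is_array($parts) && count($parts) === 2) ? $parts : null;
}
```

For example, `splitBySuffix('Wellington New Zealand', $countries)` returns `['Wellington', 'New Zealand']`, while a string with no known country returns null instead of a one-element array.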

PHP regular expression, match the last occurence

I have a PHP function that splits product names from their color name in WooCommerce.
The full string is generally of this form "product name - product color", like for example:
"Boxer Welbar - ligth grey" splits into "Boxer Welbar" and "light grey"
"Longjohn Gari - marine stripe" splits into "Longjohn Gari" and "marine stripe"
But in some cases it can be "Tee-shirt - product color"...and in this case the split doesn't work as I want, because the "-" in Tee-shirt is detected.
How to circumvent this problem? Should I use a "lookahead" statement in the regexp?
function product_name_split($prod_name) {
$currenttitle = strip_tags($prod_name);
$splitted = preg_split("/–|[\p{Pd}\xAD]|(–)/", $currenttitle);
return $splitted;
}
I'd go for a negative lookahead.
Something like this:
-(?!.*-)
This matches a - that is not followed by any other -, i.e. the last one in the string.
It works only if the color name never contains a -.
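As a sketch (the function name is made up), the negative lookahead can be passed straight to preg_split(); adding \s* on both sides also removes the spaces around the delimiter.

```php
<?php
// Split on the last hyphen only, absorbing any whitespace around it.
function splitOnLastHyphen(string $title): array
{
    return preg_split('/\s*-(?!.*-)\s*/', $title);
}

print_r(splitOnLastHyphen('Tee-shirt - product color'));
```

With a hyphenated color such as "Tee-shirt - navy-blue" it returns ['Tee-shirt - navy', 'blue'], which is exactly the limitation mentioned above.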
What about requiring whitespace on both sides of the dash?
For example:
function product_name_split($prod_name) {
$currenttitle = strip_tags($prod_name);
$splitted = preg_split("/\s(–|[\p{Pd}\xAD]|(–))\s/", $currenttitle);
return $splitted;
}
This automatically trims spaces from split parts as well.
If you have " - " as the delimiter (note the spaces around the dash), you may simply use explode(...). If not, use
\s*-(?=[^-]+$)\s*
or
\w+-\w+(*SKIP)(*FAIL)|-
with preg_split(); see the demos on regex101.com.
In PHP this could be:
<?php
$strings = ["Tee-shirt - product color", "Boxer Welbar - ligth grey", "Longjohn Gari - marine stripe"];
foreach ($strings as $string) {
print_r(explode(" - ", $string));
}
foreach ($strings as $string) {
print_r(preg_split("~\s*-(?=[^-]+$)\s*~", $string));
}
?>
Both approaches will yield
Array
(
[0] => Tee-shirt
[1] => product color
)
Array
(
[0] => Boxer Welbar
[1] => ligth grey
)
Array
(
[0] => Longjohn Gari
[1] => marine stripe
)
To collect the split items, use array_map(...):
$splitted = array_map( function($item) {return preg_split("~\s*-(?=[^-]+$)\s*~", $item); }, $strings);
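The (*SKIP)(*FAIL) alternative listed earlier is not exercised in the snippet above. A sketch of it: any hyphen between two word characters (as in "Tee-shirt") is skipped, every other hyphen splits, and trim() removes the spaces left on each piece.

```php
<?php
$strings = ["Tee-shirt - product color", "Boxer Welbar - ligth grey", "Longjohn Gari - marine stripe"];
// \w+-\w+(*SKIP)(*FAIL) consumes and then rejects word-hyphen-word runs,
// so only the remaining hyphens act as split points.
$splitted = array_map(
    fn(string $item) => array_map('trim', preg_split('~\w+-\w+(*SKIP)(*FAIL)|-~', $item)),
    $strings
);
print_r($splitted);
```

Unlike the lookahead version, this one also keeps a hyphenated color such as "navy-blue" intact, but it would split a name like "Tee - shirt" whose hyphen is surrounded by spaces.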
Your sample inputs convey that the neighboring whitespace around the delimiting hyphen/dash is just as critical as the hyphen/dash itself.
I recommend doing all of the HTML and special entity decoding before executing your regex -- that's what these other functions are built for and it will make your regex pattern much simpler to read and maintain.
\p{Pd} will match any hyphen/dash. Reinforce the business logic in the code by declaring a maximum of 2 elements to be generated by the split.
As a general rule, I discourage declaring single-use variables.
Code: (Demo)
function product_name_split($prod_name) {
return preg_split(
"/ \p{Pd} /u",
strip_tags(
html_entity_decode(
$prod_name
)
),
2
);
}
$tests = [
'Tee-shirt - product color',
'Boxer Welbar - ligth grey',
'Longjohn Gari - marine stripe',
'En dash – green',
'Entity – blue',
];
foreach ($tests as $test) {
echo var_export(product_name_split($test), true) . "\n";
}
Output:
array (
0 => 'Tee-shirt',
1 => 'product color',
)
array (
0 => 'Boxer Welbar',
1 => 'ligth grey',
)
array (
0 => 'Longjohn Gari',
1 => 'marine stripe',
)
array (
0 => 'En dash',
1 => 'green',
)
array (
0 => 'Entity',
1 => 'blue',
)
As usual, there are several options for this; here is one of them. Note that it only returns the part after the last dash (the color):
explode — Split a string by a string
end — Set the internal pointer of an array to its last element
$currenttitle = 'Tee-shirt - product color';
$array = explode( '-', $currenttitle );
echo trim( end( $array ) ); // "product color" (trim removes the leading space)
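If you need both halves, a non-regex option is to find the last " - " with strrpos() and cut there. This is a sketch; splitAtLastDelimiter() is a made-up name, and it assumes the delimiter always has a space on each side.

```php
<?php
// Split at the LAST occurrence of $delimiter (default: space, hyphen, space).
function splitAtLastDelimiter(string $title, string $delimiter = ' - '): array
{
    $pos = strrpos($title, $delimiter);
    if ($pos === false) {
        return [$title]; // no delimiter: return the whole title
    }
    return [substr($title, 0, $pos), substr($title, $pos + strlen($delimiter))];
}

print_r(splitAtLastDelimiter('Tee-shirt - product color'));
```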

Simplify PHP array with same items

I have this PHP array:
$this->user_list = array( 0 => 'Not paid',1 => 'Not paid', 2 => 'Not paid', 7 => 'Waiting', 15 => 'Waiting', 10 => 'Cancelled' );
How can I simplify this array, given that the ID numbers are all different but some of them share the same status?
I tried it like this:
$this->user_list = array( [0,1,2 => 'Not paid'],[7,15 => 'Waiting'],10 => 'Cancelled' );
but it doesn't work as expected.
Basically I want to achieve this:
echo $this->user_list[15] should give me Waiting, echo $this->user_list[10] should give me Cancelled, etc. This works very well with my first array; I am just looking for a way to group the duplicate statuses.
As mentioned by other contributors, there is no native support in the PHP grammar for your intended use case. As clearly stated in the PHP: Arrays documentation:
An array can be created using the array() language construct. It takes any number of comma-separated key => value pairs as arguments.
So basically each element in an array is a key => value pair, which means you cannot associate multiple keys to a single element.
This also explains why your first attempt didn't work:
$this->user_list = array( [0,1,2 => 'Not paid'],[7,15 => 'Waiting'],10 => 'Cancelled' );
If you don't specify a key for an element, PHP assigns the next integer index (0, 1, ...). So in the example above, the first zero is not actually a key but a value, and PHP binds it to the key 0. It may be easier to see how this works if you var_dump() or print_r() $this->user_list. You would get something similar to the following structure (NOTE: I have simplified it to make it clearer):
[
0 => [
0 => 0,
1 => 1,
2 => "Not paid"
],
1 => [
0 => 7,
15 => "Waiting"
],
10 => "Cancelled"
]
So how do we resolve this problem? Well... actually there is no need to contort the structure by swapping keys with values as other contributors seem to suggest. Changing the structure might simplify your "data entry" work but might also create big issues in other parts of the program because who knows, maybe accessing the invoice data by "ID" is simply more efficient than by "status" ... or something.
Since PHP does not provide such a feature out of the box, I believe a better solution would be to develop our own function; a good starting point could be the one in the example below.
function explode_array($config, $sep = ',') {
$res = [];
foreach($config as $configKey => $value) {
// split key values
$keys = explode($sep, $configKey);
foreach($keys as $key) {
$res[$key] = $value;
}
}
return $res;
}
$config = [
'0,1,2' => 'Not paid',
'7,15' => 'Waiting',
'10' => 'Cancelled'
];
$myArr = explode_array($config);
print_r($myArr);
The idea is quite simple: since we cannot use an array as a key, we leverage the next best data type, a CSV string. Please note there is no error handling in the above code, so the first thing you may want to do is add some validation to explode_array (or whatever you choose to name it).
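One possible shape for that validation, as a sketch only: explode_array_checked() is a hypothetical name, and the rules it enforces (each ID must be a non-negative integer and no ID may appear twice) are assumptions about what you would want to reject.

```php
<?php
// A validating variant of explode_array(): keys must be comma-separated
// non-negative integers, and no ID may appear twice.
function explode_array_checked(array $config, string $sep = ','): array
{
    $res = [];
    foreach ($config as $configKey => $value) {
        // PHP stores a key such as '10' as int 10, so cast back to string first.
        foreach (explode($sep, (string) $configKey) as $key) {
            $key = trim($key);
            if (!preg_match('/^\d+$/', $key)) {
                throw new InvalidArgumentException("Invalid ID '$key' in key '$configKey'");
            }
            if (array_key_exists((int) $key, $res)) {
                throw new InvalidArgumentException("Duplicate ID $key");
            }
            $res[(int) $key] = $value;
        }
    }
    ksort($res); // keep IDs in ascending order
    return $res;
}

print_r(explode_array_checked(['0,1,2' => 'Not paid', '7, 15' => 'Waiting', '10' => 'Cancelled']));
```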
You could use a structure like this instead, if the ID number is an invoice ID (or similar) and the other value is its status:
$arr = array(
'Not paid' => [0,1,2] ,
'Waiting' => [5,6],
'Cancelled' =>[8]
);
foreach($arr as $key => $val){
foreach($val as $keys => $vals){
echo "invoiceid ".$vals ." status ".$key;
echo"<br>";
}
}
// for only one status you can use like this
foreach($arr['Not paid'] as $key => $val){
echo $val;
echo"<br>";
}
Just run this and check the output.
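If you keep the status => IDs layout for data entry but still need the original `$this->user_list[15]`-style lookup, you can rebuild an ID => status array from it. A sketch using array_fill_keys(), with the IDs taken from the question:

```php
<?php
// Status => IDs, easy to maintain by hand.
$arr = [
    'Not paid'  => [0, 1, 2],
    'Waiting'   => [7, 15],
    'Cancelled' => [10],
];
// Rebuild ID => status so lookups by ID work like the original array.
$byId = [];
foreach ($arr as $status => $ids) {
    $byId += array_fill_keys($ids, $status);
}
ksort($byId);
echo $byId[15], PHP_EOL; // Waiting
```

Because `+=` keeps existing keys, an ID accidentally listed under two statuses keeps the first one; you may prefer to detect that case explicitly.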
PHP has no built-in function or structure for handling cases like this. I'd use a simple array value-cloning function to map your duplicates. Simply have one instance of each status, then map the aliases, and then run a function that clones them in. As follows:
// Status list:
$ulist = [ 0 => 'Not paid', 7 => 'Waiting', 10 => 'Cancelled' ];
// Alternative IDs list, mapped to above source IDs:
$aliases = [ 0 => [1,2], 7 => [15] ];
// Function to clone array values:
function clone_values(array &$arr, array $aliases)
{
foreach($aliases as $src => $tgts) {
foreach($tgts as $tgt) {
$arr[$tgt] = $arr[$src];
}
}
ksort($arr); // If the order matters
}
// Let's clone:
clone_values($ulist, $aliases);
This results in the following array:
array(6) {
  [0]=>
  string(8) "Not paid"
  [1]=>
  string(8) "Not paid"
  [2]=>
  string(8) "Not paid"
  [7]=>
  string(7) "Waiting"
  [10]=>
  string(9) "Cancelled"
  [15]=>
  string(7) "Waiting"
}
...which can be accessed as you expect, e.g. $ulist[2] gives "Not paid". If the use case is as simple as illustrated in the OP, I'd personally just spell it out as is; there's no dramatic complexity to it. However, if you have dozens of aliases, mapping and cloning begins to make sense.
As said in the comments, a single array entry can't have multiple keys. The best way is to use a keyword => [ number, number, number...] construction.
//set a result array
$result = [];
//loop the original array
foreach ( $this->user_list as $number => $keyword ){
//if the keyword doesn't exist in the result, create one
if(!isset ( $result [ $keyword ] ) ) $result[ $keyword ] = [];
//add the number to the keyword-array
$result[ $keyword ] [] = $number;
}

Strange PHP behaviour for FOR within array

I get very strange PHP behavior from my function that searches an array of associative arrays for the subarray that has some key with the searched value.
My PHP code:
<?php
// Test array - results from database
$test_array[0] = array('period' => '2018', 'payment_type' => 'Direct Debit', 'value' => 0);
$test_array[1] = array('period' => '2018', 'payment_type' => 'Manual', 'value' => 20.85);
$test_array[2] = array('period' => '2018', 'payment_type' => 'Credit Debit Card', 'value' => 0);
// Function to find subarrays by payment type
function searchReportArrayForKeyValue($array, $searchvalue) {
$result_array = Array();
for($i = 0; $i < count($array); ++$i) {
$array_inside = $array[$i];
$keys = array_keys($array_inside);
for($j = 0; $j < count($array_inside); ++$j) {
$value = $array_inside[$keys[$j]];
if($value == $searchvalue) {
$result_array = $array_inside;
}
}
}
return $result_array;
}
var_dump(searchReportArrayForKeyValue($test_array, 'Direct Debit'));
var_dump(searchReportArrayForKeyValue($test_array, 'Manual'));
var_dump(searchReportArrayForKeyValue($test_array, 'Credit Debit Card'));
?>
If I run this code I should get 3 different arrays returned (keys 0, 1 and 2 of the test array); however, all three function calls return the 'Credit Debit Card' subarray: http://take.ms/hZfec (screenshot).
BUT if I change 'value' => 0 to some nonzero float/integer, everything works as expected. For example, if I change the test array to this:
$test_array[0] = array('period' => '2018', 'payment_type' => 'Direct Debit', 'value' => 11.2);
$test_array[1] = array('period' => '2018', 'payment_type' => 'Manual', 'value' => 20.85);
$test_array[2] = array('period' => '2018', 'payment_type' => 'Credit Debit Card', 'value' => 10.5);
I get 3 correct different subarrays returned by my three function calls: http://take.ms/SSTu1 (screenshot)
Why does this happen? How does this zero in 'value' break the array iteration? What is wrong with my function?
P.S.: I previously used foreach here and changed it to for, to make sure the issue was not related to the pointer reset of a foreach nested inside another foreach, but this did not help.
Finally, I fixed this with the following change:
I changed this:
if($value == $searchvalue)
To this:
if(strval($value) == strval($searchvalue))
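But I still don't understand how and why this produces such strange behavior.
You are using a loose comparison (==).
For each row, the last comparison is between 'value' and the search string. For rows 0 and 2 that is 0 == 'Direct Debit', which is true in PHP 7 and earlier (the non-numeric string is converted to the number 0), so $result_array = $array_inside; runs and the last such row, 'Credit Debit Card', is what gets returned. Since PHP 8, comparing a number with a non-numeric string compares them as strings, so this particular comparison is false.
You can see it by running, for example: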
echo "$value == $searchvalue = " . ($value == $searchvalue ? "true" : "false") . PHP_EOL;
if($value == $searchvalue) {
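A sketch of a fix based on strict comparison: findRowByColumn() is a made-up name, and unlike the original function it only looks at one named column and returns the first match rather than the last. Both of those are design choices, not requirements.

```php
<?php
$test_array = [
    ['period' => '2018', 'payment_type' => 'Direct Debit', 'value' => 0],
    ['period' => '2018', 'payment_type' => 'Manual', 'value' => 20.85],
    ['period' => '2018', 'payment_type' => 'Credit Debit Card', 'value' => 0],
];

// Return the first row whose $column strictly equals $search, or [] if none.
function findRowByColumn(array $rows, string $column, $search): array
{
    foreach ($rows as $row) {
        // === also compares types, so the int 0 can never equal 'Direct Debit'.
        if (array_key_exists($column, $row) && $row[$column] === $search) {
            return $row;
        }
    }
    return [];
}

var_dump(findRowByColumn($test_array, 'payment_type', 'Direct Debit'));
```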
You can't reliably compare floating-point values with == because many decimal fractions (such as 0.1) have no exact binary floating-point representation.
See: https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems
To compare floating-point numbers properly, you need to do something like:
$e = 0.00001;
if (abs($float_a - $float_b) < $e) {
// treat $float_a and $float_b as equal
}
Where $e is a sufficiently small margin of error for a particular comparison.
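Wrapped in a small helper (floatsEqual() is a made-up name, and the default tolerance is only an example value), the classic 0.1 + 0.2 case shows the difference:

```php
<?php
// Treat two floats as equal when they differ by less than $epsilon.
// The default tolerance is arbitrary; pick one that suits your data.
function floatsEqual(float $a, float $b, float $epsilon = 0.00001): bool
{
    return abs($a - $b) < $epsilon;
}

var_dump(0.1 + 0.2 == 0.3);            // bool(false)
var_dump(floatsEqual(0.1 + 0.2, 0.3)); // bool(true)
```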
Your solution of casting to string only works because PHP's default float format precision is 14 digits and the imprecision is less than that.
That said, no one should ever use floating point to record amounts of money for these exact reasons, and more. Generally you store the value as an integer of the base unit of currency, e.g. $1 == 100 cents, 1 BTC == 100,000,000 satoshis.
However there are further concerns, such as safely splitting $4 across 3 recipients, safely rounding amounts without losing pennies, etc. For this reason there is Fowler's Money Pattern, and libraries that implement it.
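To illustrate the "$4 across 3 recipients" problem with integer cents, here is a sketch of the allocation idea (distribute the remainder one cent at a time). allocateCents() is a hypothetical helper, not the API of any Money library, and it assumes a non-negative amount.

```php
<?php
/**
 * Split an amount in integer cents into $parts shares that sum exactly
 * to the original, handing leftover cents to the first shares.
 */
function allocateCents(int $cents, int $parts): array
{
    if ($cents < 0 || $parts < 1) {
        throw new InvalidArgumentException('cents must be >= 0 and parts >= 1');
    }
    $base = intdiv($cents, $parts);
    $remainder = $cents % $parts;
    $shares = [];
    for ($i = 0; $i < $parts; $i++) {
        $shares[] = $base + ($i < $remainder ? 1 : 0);
    }
    return $shares;
}

print_r(allocateCents(400, 3)); // 134, 133, 133
```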

Count Similar Array Keys

I have a POST request coming to one of my pages, here is a small segment:
[shipCountry] => United States
[status] => Accepted
[sku1] => test
[product1] => Test Product
[quantity1] => 1
[price1] => 0.00
This request can be any size, and each product's name and quantity keys come across as "productN" and "quantityN", where N is an integer starting from 1.
I would like to be able to count how many unique keys match the format above, which would give me a count of how many products were ordered (a number which is not explicitly given in the request).
What's the best way to do this in PHP?
Well, if you know that every product will have a corresponding array key matching "productN", you could do this:
$productKeyCount = count(preg_grep("/^product\d+$/",array_keys($_POST)));
preg_grep() works well on arrays for that kind of thing.
What Gumbo meant with his "use array instead" comment is the following:
In your HTML-form use this:
<input type="text" name="quantity[]" />
and $_POST['quantity'] will then be an array containing all of your quantities.
If you need to supply an id you can also do this:
<input type="text" name="quantity[0]" />
$_POST['quantity'][0] will then hold the corresponding quantity.
As mentioned by Gumbo, you could group all parameters describing one item into their own array, which usually makes them easier to iterate. You may not have control over the POST parameters, but you can restructure them, e.g. with
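With array-style field names the product count comes for free. A sketch, using a hypothetical `$post` array as a stand-in for `$_POST` from a form with `name="product[]"` and `name="quantity[]"` fields:

```php
<?php
// Stand-in for $_POST when the form uses name="product[]" and name="quantity[]".
$post = [
    'shipCountry' => 'United States',
    'product'     => ['Test Product1', 'Test Product2'],
    'quantity'    => ['1', '2'],
];
$productCount = count($post['product'] ?? []);
// Pair each product with its quantity by their shared index.
$lines = array_map(
    fn($name, $qty) => ['product' => $name, 'quantity' => (int) $qty],
    $post['product'] ?? [],
    $post['quantity'] ?? []
);
echo $productCount, PHP_EOL; // 2
```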
<?php
$testdata = array(
'shipCountry' => 'United States',
'status' => 'Accepted',
'sku1' => 'test1',
'product1' => 'Test Product1',
'quantity1' => '1',
'price1' => '0.01',
'sku2' => 'test2',
'product2' => 'Test Product2',
'quantity2' => '2',
'price2' => '0.02'
);
$pattern = '/^(.*\D)(\d+)$/';
$foo = array('items'=>array());
foreach($testdata as $k=>$v) {
if ( preg_match($pattern, $k, $m) ) {
$foo['items'][$m[2]][$m[1]] = $v;
}
else {
$foo[$k] = $v;
}
}
print_r($foo);
There are plenty of examples, but if you're guaranteed that the numbers are contiguous, I usually take this approach:
<?php
$i = 1;
while( isset($_POST['product'.$i]) )
{
// do something
$i++;
}
