split on several chars and include them in the parts

I want to split a string on several chars (namely +, ~, > and #), but I want those chars to be part of the returned parts.
I tried:
$parts = preg_split('/\+|>|~|#/', $input, PREG_SPLIT_DELIM_CAPTURE);
The result is only 2 parts where there should be 5 and the split-char isn't part of part [1].
I also tried:
$parts = preg_split('/\+|>|~|#/', $input, PREG_SPLIT_OFFSET_CAPTURE);
The result is then 1 part too few (4 instead of 5) and the last part contains a split-char.
Without flags in preg_split, the result is almost perfect (as many parts as there should be) but all the split-chars are gone.
Example:
$input = 'oele>boele#4 + key:type:id + *~the end'; // spaces should be ignored
$output /* should be: */
array( 'oele', '>boele', ' #4 ', '+ key:type:id ', '+ *', '~the end' );
Is there a spl function or flag to do this or do I have to make one myself =(

$parts = preg_split('/(?=[+>~#])/', $input);
See it
Since you want to have the delimiters to be part of the next split piece, your split point is right before the delimiter and this can be easily done using positive look ahead.
(?= : Start of positive lookahead
[+>~#] : character class to match any of your delimiters.
) : End of look ahead assertion.
Effectively you are asking preg_split to split the input string at points just before delimiters.
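As a quick check, here is the lookahead split run against the input from the question. This is only a sketch: the parts listed in the comments follow from where the four delimiters occur in that string, and the final trim step is an assumption based on the "spaces should be ignored" remark, not something the answer itself does.

```php
<?php
$input = 'oele>boele#4 + key:type:id + *~the end';

// Split at every position that is immediately followed by +, >, ~ or #.
// The lookahead consumes nothing, so each delimiter stays at the start of its part.
$parts = preg_split('/(?=[+>~#])/', $input);
// ['oele', '>boele', '#4 ', '+ key:type:id ', '+ *', '~the end']

// Assumption: "spaces should be ignored" means trimming each part afterwards.
$trimmed = array_map('trim', $parts);
// ['oele', '>boele', '#4', '+ key:type:id', '+ *', '~the end']

print_r($trimmed);
```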

You're not passing a value for the limit parameter, so the flag is being read as the limit, which is why it's returning fewer parts than you expected. Try:
$parts = preg_split('/\+|>|~|#/', $input, -1, PREG_SPLIT_OFFSET_CAPTURE);

I had the same problem in the past. You have to wrap your regex in parentheses (a capturing group) so that the delimiters are returned, and pass the flag as the fourth argument:
$parts = preg_split('/(\+|>|~|#)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
It is explained here: http://www.php.net/manual/en/function.preg-split.php#94238

Ben is correct.
Just to add to his answer: the third parameter of preg_split is the limit. PREG_SPLIT_DELIM_CAPTURE is a constant with a value of 2, so you get 2 parts; similarly, PREG_SPLIT_OFFSET_CAPTURE has a value of 4, so you get 4 parts.
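The effect of passing a flag in the limit position can be reproduced with the question's input. The sketch below only uses the constants and the input already mentioned in this thread; the variable names are made up for illustration.

```php
<?php
$input = 'oele>boele#4 + key:type:id + *~the end';

// The third parameter is $limit, so the flag constants are read as plain integers.
$asLimit2 = preg_split('/\+|>|~|#/', $input, PREG_SPLIT_DELIM_CAPTURE);  // limit = 2
$asLimit4 = preg_split('/\+|>|~|#/', $input, PREG_SPLIT_OFFSET_CAPTURE); // limit = 4

// Flags belong in the fourth parameter; -1 means "no limit".
// The capturing group is what makes PREG_SPLIT_DELIM_CAPTURE return the delimiters.
$withDelims = preg_split('/([+>~#])/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

var_dump(count($asLimit2), count($asLimit4), count($withDelims));
```

With the flag in the right place the delimiters come back as separate elements between the parts, so they still have to be glued onto their neighbours if you want them inside the parts.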

Related

Change 10.28 by 1028

I have a problem converting a string to a number. I am not good with patterns like !\d+!
I used the code below, but the approach is not correct.
Thank you.
preg_match_all('!\d+!', $product_price[$i], $matches);
$price_extracted = (float)implode('.', $matches[0]);
$item['normal_price'] = $price_extracted;
if ($item['normal_price'] > 800) ......
I get these results:
1 299,99 $ (original) is converted to 1.2999 and must be 1299.99
549,99 $ (original) is converted to 549.99 and must be 549.99
44,99 $ (original) is converted to 44.99 and must be 44.99

The problem with your approach is that every run of digits that is not interrupted by another character ends up as a separate array element.
In the first string you provided, the thousands are separated by a whitespace character, so the "1" is registered as a match of its own.
preg_match_all('!\d+!', '1 299,99 $', $matches) fills $matches[0] as follows:
$matches[0][0] = '1'
$matches[0][1] = '299'
$matches[0][2] = '99'
If you take my approach, though, and first replace all whitespace with nothing before collecting the numbers:
preg_match_all('!\d+!', preg_replace('/\s/', '', '1 299,99 $'), $matches) fills $matches[0] as follows:
$matches[0][0] = '1299'
$matches[0][1] = '99'
After that you can still implode them:
$price_extracted = (float)implode('.', $matches[0]);
EDIT
A little explanation about preg_replace, preg_match_all and regex:
The regex '!\d+!' searches for digits (\d). The '!' is just the pattern delimiter; PHP accepts any non-alphanumeric, non-backslash, non-whitespace character there, so it works the same as '/'. The "+" means "one or more". So the line
preg_match_all('!\d+!', 'someString', $myArray)
could be translated into English as follows:
Find all occurrences of one or more consecutive digits,
and put each occurrence into its own element of $myArray[0].
The second regex used in my solution, '/\s/', searches for whitespace characters. The preg_replace function is a simple "find and replace" function, so:
preg_replace('/\s/', '', 'someString')
translated into English:
Find all occurrences of whitespace in 'someString' and replace them with nothing
For reference:
preg_match_all
preg_replace
regex cheat sheet
Conditions can be checked on:
PHP Live Regex
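Putting the two steps together, a minimal sketch could look like the following. The function name extractPrice is hypothetical, and the code assumes the format shown in the question: whitespace as the thousands separator and a single comma before the decimals. Other formats (for example a dot as thousands separator) would need different handling.

```php
<?php
// Hypothetical helper; the name and signature are not from the question.
function extractPrice(string $raw): float
{
    // Drop every whitespace character, e.g. the thousands separator in "1 299,99 $".
    $compact = preg_replace('/\s+/', '', $raw);

    // Collect the remaining digit runs: the integer part first, then the decimals.
    preg_match_all('!\d+!', $compact, $matches);

    // $matches[0] holds the full matches; join them with a decimal point.
    return (float) implode('.', $matches[0]);
}

foreach (['1 299,99 $', '549,99 $', '44,99 $'] as $raw) {
    echo $raw, ' -> ', extractPrice($raw), "\n";
}
```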

Regex rules in an array

Maybe this cannot be solved the way I want, but perhaps you can help me.
I have a lot of malformed words in the names of my products.
Some of them have a leading ( and a trailing ), or only one of the two; the same goes for the / and " signs.
What I do is explode the product name on spaces and examine each word.
I want to strip those characters. But a hard drive could be named 40GB ATA 3.5" hard drive. I need to process every word, but I cannot use the same method for 3.5" as for () or //, because the quote in 3.5" is valid.
So I need to remove the quotes only when there is one at the start of the string AND one at the end.
$cases = [
'(testone)',
'(testtwo',
'testthree)',
'/otherone/',
'/othertwo',
'otherthree/',
'"anotherone',
'anothertwo"',
'"anotherthree"',
];
$patterns = [
'/^\(/',
'/\)$/',
'~^/~',
'~/$~',
//Here is what I can not imagine, how to add the rule for `"`
];
$result = preg_replace($patterns, '', $cases);
This works well, but can it be done in one preg_replace()? If so, can somebody help me with the pattern(s) for the quotes?
The result for the quotes should be this:
'"anotherone', //no quote at the end, so leave the leading one
'anothertwo"', //no quote at the start, so leave the trailing one
'anotherthree', //there are quotes at the start and the end, so remove them.

You may use another approach: rather than define an array of patterns, use one single alternation based regex:
preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $s)
See the regex demo
Details:
^[(/] - a literal ( or / at the start of the string
| - or
[/)]$ - a literal ) or / at the end of the string
| - or
^"(.*)"$ - a " at the start of the string, then any 0+ characters (due to /s option, the . matches a linebreak sequence, too) that are captured into Group 1, and " at the end of the string.
The replacement pattern is $1 that is empty when the first 2 alternatives are matched, and contains Group 1 value if the 3rd alternative is matched.
Note: In case you need to replace until no match is found, use a preg_match with preg_replace together (see demo):
$s = '"/some text/"';
$re = '~^[(/]|[/)]$|^"(.*)"$~s';
$tmp = '';
while (preg_match($re, $s) && $tmp != $s) {
    $tmp = $s;
    $s = preg_replace($re, '$1', $s);
}
echo $s;
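To see how the single-pattern version behaves on the test cases from the question, here is a small sketch. The extra '3.5"' entry is an addition taken from the hard drive example in the question text; everything else is the question's own $cases array.

```php
<?php
$cases = [
    '(testone)', '(testtwo', 'testthree)',
    '/otherone/', '/othertwo', 'otherthree/',
    '"anotherone', 'anothertwo"', '"anotherthree"',
    '3.5"', // added here: the valid inch mark that must survive
];

// One pass per string: strip a leading ( or /, a trailing ) or /,
// and remove quotes only when they wrap the whole string.
$result = preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $cases);

print_r($result);
```

Because preg_replace accepts an array as the subject, the whole list is processed in one call and the lone quotes are left untouched.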

This should work, using two patterns in an array:
preg_replace(['~^[/(]?(.+?)[/)]?$~', '~^"(.+)"$~'], '$1', $string)

PHP regex for math operations

I'm trying to create a regex, without success.
This is what I get as input strings:
String A: "##(ABC 50a- {+} UDF 69,22g,-) {*} 3##"
String B: "##ABC 0,10,- DEF {/} 9 ABC {*} UHG 3-##"
And this is what I need the regex to produce:
Result A: "(50+69,22)*3"
Result B: "0,10/9*3"
I just can't get the number extraction combined with the operation symbols.
This is what i got:
'/[^0-9\+\-\*\/\(\)\.]/'
Thanks for any help.

One simple solution consists of getting rid of everything you don't want.
So replace this:
\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)
With $1.
Simple enough:
$input = "(ABC 50a- {+} UDF 69,22g,-) {*} 3";
$output = preg_replace('#\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)#', '$1', $input);
\{(.+?)\} part matches everything inside {...} and outputs it (it gets replaced by $1)
[^0-9,{}()]+ gets rid of every character not belonging to the ones we're trying to keep (it's replaced with an empty string)
(?<!\d),|,(?!\d) throws out commas which are not part of a number
Unfortunately, I can't say much else without a better spec.
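For reference, here is the same replacement applied to both input strings from the question, including the surrounding ## markers. The wrapper name cleanFormula is hypothetical, and ~ is used as the pattern delimiter only so that # needs no special treatment; the pattern itself is the one given above.

```php
<?php
// Hypothetical wrapper around the replacement described above.
function cleanFormula(string $input): string
{
    return preg_replace(
        '~\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)~',
        '$1', // only the {...} alternative sets group 1; the others are replaced by ''
        $input
    );
}

echo cleanFormula('##(ABC 50a- {+} UDF 69,22g,-) {*} 3##'), "\n";   // (50+69,22)*3
echo cleanFormula('##ABC 0,10,- DEF {/} 9 ABC {*} UHG 3-##'), "\n"; // 0,10/9*3
```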

A good start would be to write down in words the patterns that you want to match. For instance, you've said that you know the operations are inside {}, but that doesn't appear anywhere in your first attempt at a regex.
You can also break it down into separate sections, and then build it up later. So for instance you might say:
if you see parentheses, keep them in the final answer
a number is made up either of digits...
...or digits followed by a comma and more digits
an operation is always in curly braces, and is either +, -, *, or /
everything else should be thrown away
Given the above list:
matching parentheses is easy: [()]
matching a digit can be done with [0-9] or \d; at least one is +; so "digits" is \d+
comma digits is easy: ,\d+; make it optional with ? and you get \d+(,\d+)?
any of four operations is just [+*/-]; escape the / and - to get [+*\/\-]
don't forget that { and } have special meanings in regexes, so they need to be escaped as \{ and \}; our list of operations in braces becomes: \{[+*\/\-]\}
Now we have to put it together; one way would be to use preg_match_all to find all occurrences of any of those patterns, in order, and then we can stick them back together. So our regex is just "this or this or this or this":
/[()]|\d+(,\d+)?|\{[+*\/\-]\}/
I haven't tested this, but given the explanation of how I arrived at it, hopefully you can figure out how to test parts of it and tweak it if necessary.

I'm not good at regex, but I found another approach.
Do an EXTRA check of the input before running eval!!!
$string = "(ABC 50a- {+} UDF 69,22g) {*} 3";
$new = '';
$string = str_split($string);
foreach ($string as $char) {
    // keep digits and every non-alphanumeric character ({, (, + etc.); drop letters
    if (!ctype_alnum($char) || ctype_digit($char)) {
        $new .= $char;
    }
}
//echo $new; will output -> ( 50- {+} 69,22) {*} 3
//remove the braces, although you could do it in the if statement ...
$new = str_replace(array('{', '}'), array('', ''), $new);
//floating point numbers use a dot, not a comma
$new = str_replace(',', '.', $new);
$p = eval('return ' . $new . ';');
print $p; // -57.66
Used: ctype_digit, ctype_alnum, eval, str_split, str_replace
P.S: I assumed that the minus before the base operation is taken into account.

Just a quick try before leaving the office ;-)
$data = array(
    "(ABC 50a- {+} UDF 69,22g) {*} 3",
    "ABC 0,10- DEF {/} 9 ABC {*} UHG 3-"
);
foreach ($data as $d) {
    echo $d . " = " . extractFormula($d) . "\n";
}
function extractFormula($string) {
    $regex = '/([()])|([0-9]+(,[0-9]+)?)|\{([+\*\/-])\}/';
    preg_match_all($regex, $string, $matches);
    $formula = implode(' ', $matches[0]);
    // an empty string rather than NULL: passing NULL as the replacement is deprecated in newer PHP versions
    $formula = str_replace(array('{', '}'), '', $formula);
    return $formula;
}
Output:
(ABC 50a- {+} UDF 69,22g) {*} 3 = ( 50 + 69,22 ) * 3
ABC 0,10- DEF {/} 9 ABC {*} UHG 3- = 0,10 / 9 * 3
If someone would like to fiddle around with the code, here is a live example: http://sandbox.onlinephpfunctions.com/code/373d76a9c0948314c1d164a555bed847f1a1ed0d

How to split string to array while retaining its punctuation mark in PHP [duplicate]

For example, I have an article that should be split at sentence boundaries such as ".", "?", "!" and ":".
But as we all know, both preg_split and explode remove the delimiter.
Any help would be really appreciated!
EDIT:
I could only come up with the code below; it works great, though.
$content=preg_replace('/([\.\?\!\:])/',"\\1[D]",$content);
Thank you, everyone! It took only five minutes to get 3 answers! And I must apologize for not reading the PHP manual carefully before asking the question. Sorry.

I feel this is worth adding. You can keep the delimiter in the "after" string by using regex lookahead to split:
$input = "The address is http://stackoverflow.com/";
$parts = preg_split('#(?=http://)#', $input);
// $parts[1] is "http://stackoverflow.com/"
And if the delimiter is of fixed length, you can keep the delimiter in the "before" part by using lookbehind:
$input = "The address is http://stackoverflow.com/";
$parts = preg_split('#(?<=http://)#', $input);
// $parts[0] is "The address is http://"
This solution is simpler and cleaner in most cases.
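Applied to the sentence-splitting problem from the question, the lookbehind variant might look like the sketch below. The sample text is made up for illustration, and the PREG_SPLIT_NO_EMPTY flag and the trim step are additions that this answer does not mention: the flag drops the empty piece that would otherwise follow the final delimiter.

```php
<?php
$content = 'Is it done? Yes! Note: it works.';

// Split right after each ., ?, ! or : (a single character, so the
// lookbehind has a fixed length). The delimiter stays with the sentence before it.
$sentences = preg_split('/(?<=[.?!:])/', $content, -1, PREG_SPLIT_NO_EMPTY);
// ['Is it done?', ' Yes!', ' Note:', ' it works.']

// Optional: remove the whitespace that separated the sentences.
$sentences = array_map('trim', $sentences);

print_r($sentences);
```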

You can set the flag PREG_SPLIT_DELIM_CAPTURE when using preg_split and capture the delimiters too. Then you can take each pair of elements at indexes 2n and 2n+1 and put them back together:
$parts = preg_split('/([.?!:])/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$sentences = [];
for ($i = 0, $n = count($parts) - 1; $i <= $n; $i += 2) {
    $sentences[] = $parts[$i] . ($parts[$i+1] ?? '');
}
Note that the delimiter pattern must be wrapped in a capturing group; otherwise the delimiters won't be captured.

preg_split with PREG_SPLIT_DELIM_CAPTURE flag
For example
$parts = preg_split("/([\.\?\!\:])/", $string, -1, PREG_SPLIT_DELIM_CAPTURE);

Try T-Regx
<?php
$parts = pattern('([.?!:])')->split($string);

Parsing English sentences has a lot of nuance and fringe cases. This makes crafting a perfect parser very difficult to do. It is important to have sufficient test cases using your real project data to make sure that you are covering all scenarios.
There is no need to use lookarounds or capture groups for this task. You simply match the punctuation symbol(s), then forget them with \K, then match one or more whitespace characters that occurs between sentences. Using the PREG_SPLIT_NO_EMPTY flag prevents creating empty elements if your string starts with or ends with characters that satisfy the pattern.
Code: (Demo)
$str = 'Heading: This is a string. Very exciting! What do you think? ...one more thing, this is cool.';
var_export(
preg_split('~[.?!:]+\K\s+~', $str, 0, PREG_SPLIT_NO_EMPTY)
);
Output:
array (
0 => 'Heading:',
1 => 'This is a string.',
2 => 'Very exciting!',
3 => 'What do you think?',
4 => '...one more thing, this is cool.',
)

How to manipulate a string so I can make implicit multiplication explicit in a math expression?

I want to manipulate a string like "...4+3(4-2)-...." to become "...4+3*(4-2)-....", but of course it should recognize any number, d, followed by a '(' and change it to 'd*('. I also want to change ')(' to ')*(' at the same time, if possible. It would be nice if there were a way to add support for constants like pi or e too.
For now, I just do it this stupid way:
private function make_implicit_multiplication_explicit($string)
{
    $i=1;
    if(strlen($string)>1)
    {
        while(($i=strpos($string,"(",$i))!==false)
        {
            // compare with false explicitly: strpos returns 0 for the digit "0"
            if(strpos("0123456789",substr($string,$i-1,1))!==false)
            {
                $string=substr_replace($string,"*(",$i,1);
                $i++;
            }
            $i++;
        }
        $string=str_replace(")(",")*(",$string);
    }
    return $string;
}
But I believe this could be done much more nicely with preg_replace or some other regex function. Those manuals are really hard to grasp, though, I think.

Let's start with what you are looking for:
either of the following ((a|b) will match either a or b):
any digit: \d
the character ): \)
followed by (: \(
Which creates this pattern: (\d|\))\(. But since you want to modify the string and keep both parts, you can also group the \(, which results in (\(), making it harder to read but easier to handle.
Now all that is left is to say how to rearrange the parts, which is simple: \\1*\\2, leaving you with code like this:
$regex = "/(\d|\))(\()/";
$replace = "\\1*\\2";
$new = preg_replace($regex, $replace, $test);
To see that the pattern actually matches all cases, see this example.
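A quick run of this pattern on a sample expression, written with single-quoted strings and the $1/$2 replacement syntax (equivalent to \\1 and \\2). The input string is just an illustrative value.

```php
<?php
$test = '(4+3(4-2)-3)(2+3)';

// Group 1: a digit or a closing parenthesis; group 2: the opening parenthesis.
// The replacement puts an asterisk between the two groups.
$new = preg_replace('/(\d|\))(\()/', '$1*$2', $test);

echo $new; // (4+3*(4-2)-3)*(2+3)
```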

To recognize any number followed by a ( OR a )( combination and place an asterisk in between, you can use a combination of lookaround assertions.
echo preg_replace("/
(?<=[0-9)]) # look behind to see if there is: '0' to '9', ')'
(?=\() # look ahead to see if there is: '('
/x", '*', '(4+3(4-2)-3)(2+3)');
The positive lookbehind asserts that what precedes is either a digit or a right parenthesis, while the positive lookahead asserts that what follows is a left parenthesis.
Another option is to use the \K escape sequence in place of the lookbehind. \K resets the starting point of the reported match: any previously consumed characters are no longer included (it throws away everything that has been matched up to that point).
echo preg_replace("/
[0-9)] # any character of: '0' to '9', ')'
\K # resets the starting point of the reported match
(?=\() # look ahead to see if there is: '('
/x", '*', '(4+3(4-2)-3)(2+3)');
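The question also asks about constants such as pi or e. One possible extension of the lookbehind version is sketched below; it is an assumption on my part, not something this answer covers. Each top-level alternative inside the lookbehind has a fixed length, and the \b anchors are meant to keep function names that merely end in "e" or "pi" (such as a hypothetical sine) from being treated as constants.

```php
<?php
$expr = 'pi(3)+e(1)+sine(4)+3(2)(5)';

// Insert * before "(" when it is preceded by a digit, a ")",
// or one of the standalone constants pi / e.
$explicit = preg_replace('/(?<=[0-9)]|\bpi|\be)(?=\()/', '*', $expr);

echo $explicit; // pi*(3)+e*(1)+sine(4)+3*(2)*(5)
```

A constant glued directly to a number (for example 2pi) is not recognized by this sketch, because there is no word boundary between the digit and the letters.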

Your PHP code should be:
<?php
$mystring = "4+3(4-2)-(5)(3)";
$regex = '~\d+\K\(~';
$replacement = "*(";
$str = preg_replace($regex, $replacement, $mystring);
$regex1 = '~\)\K\(~';
$replacement1 = "*(";
echo preg_replace($regex1, $replacement1, $str); //=> 4+3*(4-2)-(5)*(3)
?>
Explanation:
~\d+\K\(~ matches one or more digits followed by a (. Because of \K, the \d+ is excluded from the reported match.
The matched part is replaced with *(, which produces 3*(, and the result is stored in another variable.
\)\K\( matches )( and excludes the first ). This is replaced by *(, which produces )*(
DEMO 1
DEMO 2

Silly method :^ )
$value = '4+3(4-2)(1+2)';
$search = ['1(', '2(', '3(', '4(', '5(', '6(', '7(', '8(', '9(', '0(', ')('];
$replace = ['1*(', '2*(', '3*(', '4*(', '5*(', '6*(', '7*(', '8*(', '9*(', '0*(', ')*('];
echo str_replace($search, $replace, $value);
