php replace with regex and remove specified pattern with regex


|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Subject Name]|h|r
my usual printed out variable is ^
|cffffc700|Hitem:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x|h[SUBJECT_NAME]|h|r
my pattern is ^
ALL X's can be a-Z, 0-9
in one column I have many variables like that (up to 8).
and all variables are mixed with strings like that:
|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Gold]|h|r NEW SOLD |affffc451|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Copper]|h|r maximum price 15k|affffx312|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Silver]|h|r
In one variable I want to clean all these unnecessary patterns and leave only subject name in brackets. []
So;
|cffffc700|Hitem:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x:x|h[SUBJECT NAME]|h|r
needs to leave only SUBJECT_NAME in my variable.
Just to remind: every variable always contains more than one of these patterns (up to 8).
I've searched everywhere but couldn't find any reasonable answers or good patterns. I tried to write one myself; I guess I need to collect all these patterns into an array, strip them, and leave only the subject names, but I don't know exactly how to do it.
How do I convert this:
|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Gold]|h|r NEW SOLD |affffc451|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Copper]|h|r maximum price 15k|affffx312|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Silver]|h|r
this:
Gold NEW SOLD Copper maximum price 15k Silver
what should I use, preg_replace?
one more thing left, when I have a string without my special pattern, I get empty result from the function eg:
$str = "15KKK sold, 20KK updated";
expected result:
"15KKK sold, 20KK updated" // same without any pattern
but ^ that one returns EMPTY result..
another string:
$str = "|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Uranium]|h|r 155kk |affffc451|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Metal]|h|r is sold";
expected result:
"Uranium 155kk Metal is sold"
If I use that function on a string that contains no pattern, it returns an empty result; that's my problem now.
thank you very much

I'd do:
$str = '|affffc100|Hitem:bb:101:1:1:1:1:48:-30:47:18:5:2:6:6:0:0:0:0:0:0:0:0|h[Gold]|h|r NEW SOLD |affffc451|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Copper]|h|r maximum price 15k|affffx312|Hitem:bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0|h[Silver]|h|r';
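preg_match_all('/h(\[.+?\])\|h\|r([^|]*)/', $str, $m);
$res = ''; // initialise before appending to avoid an undefined-variable notice
for($i=0; $i<count($m[0]); $i++) {
    // $m[1] holds the bracketed name, $m[2] the free text that follows it
    $res .= $m[1][$i] . ' ' . $m[2][$i] . ' ';
}
echo $res,"\n";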
Output (with the extra spaces collapsed):
[Gold] NEW SOLD [Copper] maximum price 15k [Silver]
If you want to keep the strings that don't match, test the result of preg_match_all:
if (preg_match_all('/h(\[.+?\])\|h\|r([^|]*)/', $str, $m)) {
    $res = ''; // initialise before appending
    for($i=0; $i<count($m[0]); $i++) {
        $res .= $m[1][$i] . ' ' . $m[2][$i] . ' ';
    }
} else {
    // no item link found: keep the original string
    $res = $str;
}
echo $res,"\n";

try this regex:
\|\w{9}\|Hitem(?::-?\w+)+\|h\[(?<SUBJECTNAME>\w+)\]\|h\|r
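It matches each item-link sequence and captures the subject name in the named group SUBJECTNAME. Note that \w+ inside the brackets does not match spaces, so a name like [Subject Name] needs a wider class such as [^\]]+.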
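Since the goal is to keep the surrounding text, preg_replace may be a closer fit than collecting matches. Below is a minimal sketch, assuming the capture group is loosened to [^\]]+ so that names with spaces also match. stripItemLinks is a hypothetical helper name, not something from the question. Each link is replaced by its name padded with spaces, then whitespace is collapsed, so a string without any item link comes back unchanged (apart from whitespace normalisation).

```php
<?php
// Hypothetical helper: replace every item link with the name inside [ ].
function stripItemLinks(string $text): string
{
    // |<9 word chars>|Hitem:<fields...>|h[<name>]|h|r
    $pattern = '/\|\w{9}\|Hitem(?::-?\w+)+\|h\[([^\]]+)\]\|h\|r/';

    // Pad with spaces so "15k|affff..." does not glue onto the next name.
    $withNames = preg_replace($pattern, ' $1 ', $text);

    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $withNames));
}

$fields = ':bb:101:1:1:1:1:25:-33:12:42:5a:2f:6w:6:0:0:0:0f:0:0a:0b:0';
$str = '|affffc100|Hitem' . $fields . '|h[Gold]|h|r NEW SOLD '
     . '|affffc451|Hitem' . $fields . '|h[Copper]|h|r maximum price 15k'
     . '|affffx312|Hitem' . $fields . '|h[Silver]|h|r';

echo stripItemLinks($str), "\n";                        // Gold NEW SOLD Copper maximum price 15k Silver
echo stripItemLinks('15KKK sold, 20KK updated'), "\n";  // 15KKK sold, 20KK updated
```

Because nothing is rebuilt from the match list, text before the first link and strings with no link at all are preserved, which is the case that returned an empty result in the question.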

Related

Find numbers on a string and order by them

I have this string
$s = "red2 blue5 black4 green1 gold3";
I need to order by the number, but can't show the numbers.
Numbers will always appear at the end of the word.
the result should be like:
$s = "green red gold black blue";
Thanks!

Does it always follow this pattern - separated by spaces?
I would break down the problem as such:
I would first start with parsing the string into an array where the key is the number and the value is the word. You can achieve this with a combination of preg_match_all and array_combine
Then you could use ksort in order to sort by the keys we set with the previous step.
Finally, if you wish to return your result as a string, you could implode the resulting array, separating by spaces again.
An example solution could then be:
<?php
$x = "red2 blue5 black4 green1 gold3";
function sortNumberedWords(string $input) {
preg_match_all('/([a-zA-Z]+)([0-9]+)/', $input, $splitResults);
$combined = array_combine($splitResults[2], $splitResults[1]);
ksort($combined);
return implode(' ', $combined);
}
echo sortNumberedWords($x); // call the function under the name it was defined with
The regex I'm using here matches two separate groups (indicated by the parentheses):
The first group is any length of a string of characters a-z (or capitalised). It's worth noting this only works on the Latin alphabet; it won't match ö, for example.
The second group matches any length of a string of numbers that appears directly after that string of characters.
The results of these matches are stored in $splitResults, which will be an array of 3 elements:
[0] A combined list of all the matches.
[1] A list of all the matches of group 1.
[2] A list of all the matches of group 2.
We use array_combine to then combine these into a single associative array. We wish for group 2 to act as the 'key' and group 1 to act as the 'value'.
Finally, we sort by the key, and then implode it back into a string.
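One caveat with array_combine: if two words carry the same number, the later one overwrites the earlier one because array keys are unique. The following is a minimal sketch of a variant that keeps every word by sorting the match sets directly; sortWordsByTrailingNumber is a hypothetical name, and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical variant: sort the matches themselves instead of using the
// numbers as array keys, so duplicate numbers do not overwrite each other.
function sortWordsByTrailingNumber(string $input): string
{
    // PREG_SET_ORDER yields one [full, word, number] entry per match.
    preg_match_all('/([a-zA-Z]+)([0-9]+)/', $input, $matches, PREG_SET_ORDER);

    // Compare numerically, so 10 sorts after 9.
    usort($matches, fn($a, $b) => (int)$a[2] <=> (int)$b[2]);

    // Keep only the word part of each match.
    return implode(' ', array_column($matches, 1));
}

echo sortWordsByTrailingNumber('red2 blue5 black4 green1 gold3'), "\n"; // green red gold black blue
```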

$s = "red2 blue5 black4 green1 gold3";
$a=[];
preg_replace_callback('/[a-z0-9]+/',function($m) use (&$a){
$a[(int)ltrim($m[0],'a..z')] = rtrim($m[0],'0..9');
},$s);
ksort($a);
print " Version A: ".implode(' ',$a);
$a=[];
foreach(explode(' ',$s) as $m){
$a[(int)ltrim($m,'a..z')] = rtrim($m,'0..9');
}
ksort($a);
print " Version B: ".implode(' ',$a);
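$a=[]; // reset so Version C does not reuse Version B's result
preg_match_all("/([a-z0-9]+)/",$s,$m);
foreach($m[1] as $i){
    // substr() reads only the last character, so this variant assumes single-digit numbers
    $a[(int)substr($i,-1,1)] = rtrim($i,'0..9');
}
ksort($a);
print " Version C: ".implode(' ',$a);
Use one of them, but also try to understand what's going on here.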

PHP: How to extract a substring from a specified index until the next whitespace or end of line

I have an input string:
$subject = "This punctuation! And this one. Does n't space that one.";
I also have an array containing exceptions to the replacement I wish to perform, currently with one member:
$exceptions = array(
0 => "n't"
);
The reason for the complicated solution I would like to achieve is because this array will be extended in future and could potentially include hundreds of members.
I would like to insert whitespace at word boundaries (duplicate whitespace will be removed later). Certain boundaries should be ignored, though. For example, the exclamation mark and full stops in the above sentence should be surrounded with whitespace, but the apostrophe should not. Once duplicate whitespaces are removed from the final result with trim(preg_replace('/\s+/', ' ', $subject));, it should look like this:
"This punctuation ! And this one . Does n't space that one ."
I am working on a solution as follows:
Use preg_match('/\b/', $subject, $offsets, PREG_OFFSET_CAPTURE); to gather an array of indexes where whitespace may be inserted.
Iterate over the $offsets array.
Split $subject from the whitespace before the current offset until the next whitespace or end of line.
Check whether the result of the split is contained in the $exceptions array.
If it is not, insert a whitespace character at the current offset.
So far I have the following code:
$subject="This punctuation! And this one. Does n't space that one.";
$pattern = '/\b/';
preg_match($pattern, $subject, $offsets, PREG_OFFSET_CAPTURE );
if(COUNT($offsets)) {
$indexes = array();
for($i=0;$i<COUNT($offsets);$i++) {
$offsets[$i];
$substring = '?';
// Replace $substring with substring from after whitespace prior to $offsets[$i] until next whitespace...
if(!array_search($substring, $exceptions)) {
$indexes[] = $offsets[$i];
}
}
// Insert whitespace character at each offset stored in $indexes...
}
I can't find an appropriate way to create the $substring variable in order to complete the above example.

$res = preg_replace("/(?:n't|ALL EXCEPTIONS PIPE SEPARATED)(*SKIP)(*F)|(?!^)(?<!\h)\b(?!\h)/", " ", $subject);
echo $res;
Output:
This punctuation ! And this one . Doesn't space that one .
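Because the exceptions list is expected to grow, the alternation can be generated from the array instead of being typed by hand. This is a minimal sketch under that assumption; spacePunctuation is a hypothetical name, each exception is passed through preg_quote so regex metacharacters in it are taken literally, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: build the (*SKIP)(*F) alternation from $exceptions.
function spacePunctuation(string $subject, array $exceptions): string
{
    // Escape each exception so characters like "." or "?" are literal.
    $alternatives = array_map(fn($e) => preg_quote($e, '/'), $exceptions);

    // Anything matched by an exception is skipped; everything else is
    // checked for a word boundary that is not already next to whitespace.
    $skip = $alternatives
        ? '(?:' . implode('|', $alternatives) . ')(*SKIP)(*F)|'
        : '';
    $pattern = '/' . $skip . '(?!^)(?<!\h)\b(?!\h)/';

    return preg_replace($pattern, ' ', $subject);
}

$exceptions = ["n't"];
echo spacePunctuation("This punctuation! And this one. Doesn't space that one.", $exceptions), "\n";
// This punctuation ! And this one . Doesn't space that one .
```

With hundreds of exceptions the pattern becomes long, so it may be worth building it once and reusing it rather than rebuilding it per call.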

One "easy" (but not necessarily fast, depending on how many exceptions you have) solution would be to first replace all the exceptions in the string with something unique that doesn't contain any punctuation, then perform your replacements, then convert back the unique replacement strings into their original versions.
Here's an example using md5 (but could be lots of other things):
$subject = "This punctuation! And this one. Doesn't space that one.";
$exceptions = ["n't"];
$result = $subject;
foreach ($exceptions as $exception) {
    // Work on $result so earlier placeholders survive later iterations
    $result = str_replace($exception, md5($exception), $result);
}
$result = preg_replace('/[^a-z0-9\s]/i', ' \0', $result);
foreach ($exceptions as $exception) {
$result = str_replace(md5($exception), $exception, $result);
}
echo $result; // This punctuation ! And this one . Doesn't space that one .

How can I check whether the input string partially matches any word in the given array in php?

For example my input string is
$edition = Vol.123 or Edition 1920 or Volume 951 or Release A20 or Volume204 or Edition967
How can I check whether the string matches any word in the array?
$editionFormats = ['Vol','Volume','Edition','Release'];
Basically I need to check whether the input has Vol or Volume or Edition or Release.
Can anyone please provide a way to check the pattern?
I tried strpos(), preg_grep(), preg_match(), split(), and str_split(). My idea was to split the string after the first occurrence of a period, whitespace, or digit,
but I wasn't able to make it work.

Solution with regexp:
$edition[] = 'Vol.123';
$edition[] = 'Edition 1920';
$edition[] = 'Volume 951';
$edition[] = 'Release A20';
$edition[] = 'Unknown data';
$editionFormats = ['Vol','Volume','Edition','Release'];
$pattern = implode('|', $editionFormats);
foreach ($edition as $e) {
if (preg_match('/' . $pattern. '/', $e)) {
echo $e . ' matches' . PHP_EOL;
} else {
echo $e . ' NOT matches' . PHP_EOL;
}
}
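If it also matters which format was found, the alternation can capture it. The sketch below makes a few assumptions that are not in the question: the format word sits at the start of the string, matching is case-insensitive, and longer formats are tried first so that "Volume" is reported instead of its prefix "Vol". findEditionFormat is a hypothetical name, and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical helper: return the matched format word, or null if none.
function findEditionFormat(string $edition, array $formats): ?string
{
    // Longest first, so "Volume" is preferred over its prefix "Vol".
    usort($formats, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape each format so it is matched literally.
    $alternatives = array_map(fn($f) => preg_quote($f, '/'), $formats);
    $pattern = '/^\s*(' . implode('|', $alternatives) . ')/i';

    return preg_match($pattern, $edition, $m) ? $m[1] : null;
}

$editionFormats = ['Vol', 'Volume', 'Edition', 'Release'];
var_dump(findEditionFormat('Vol.123', $editionFormats));      // string(3) "Vol"
var_dump(findEditionFormat('Volume204', $editionFormats));    // string(6) "Volume"
var_dump(findEditionFormat('Unknown data', $editionFormats)); // NULL
```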

Assuming your input is a single string (it wasn't obvious to me from the question)
A non regex way to do it is to look at the intersection between the set of words in the incoming string, and the set of words you're interested in:
$edition = 'Vol.123 or Edition 1920 or Volume 951 or Release A20';
$editionFormats = ['Vol','Volume','Edition','Release'];
// Break $edition into single words on the space character.
$edition_words = explode(" ", $edition);
$present = !empty(array_intersect($edition_words, $editionFormats));
If you had meant that the $edition would be just one of those;
i.e.
$edition = 'Volume 951';
This approach will still work; Note that splitting on the space character only works if there is a space, so your 'Vol.123' wouldn't get matched, unless you also included 'Vol.' in your $editionFormats.

Find a Word after a particular word

I am not an expert in regular expressions. I have created a function that finds the word following the first matching keyword from an array. The code is:
$searchkey1 = array('NUMBER', 'REF', 'TRANSACTION NUMBER');
$searchkey2 = array('CURRENT BALANCE RS.', 'NEW BALANCE IS');
$smsdata = "TRANSACTION NUMBER NER13082010132400255 TO RECHARGE 198 INR TO 9999999999 IS SUCCESSFUL. YOUR NEW BALANCE IS 8183.79 INR.";
function get_details($searchkey, $smsdata)
{
foreach ($searchkey as $key) {
$result = preg_match_all("/(?<=(" . $key . "))(\s\w*)/", $smsdata, $networkID);
$myanswer = @$networkID[0][0];
}
return $myanswer;
}
echo get_details($searchkey1, $smsdata);
echo "<BR>";
echo get_details($searchkey2, $smsdata);
I am using this function to find other words as well, and it works fine for any other word except a decimal value.
Here in my example code the return value is 11192 but i would like to get with decimal as 11192.75
Please rectify my error.
$result=preg_match_all("/(?<=(".$key."))(\s\w*)/",$smsdata,$networkID);
gives the results NER13082010132400255 and 8183 // here the decimal part is ignored.
$result=preg_match_all("/(?<=(".$key."))\s*(\d*\.?\d*)/",$smsdata,$networkID);
gives the results "blank" and 8183.79 // here the first result is blank.
My desired result is NER13082010132400255 and 8183.79

I don't believe \w matches a period. Use a mixture of \d and \.
$result=preg_match_all("/(?<=(".$key."))(\s[\d\.]+)/",$smsdata,$networkID);

The OP's regexp captures the leading space too.
@DevZer0's regexp captures the trailing dot too.
This gives the balance only:
$result=preg_match_all("/(?<=(".$key."))\s*(\d*\.?\d*)/",$smsdata,$networkID);
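To get both desired results from one pattern, the capture can try a decimal number first and fall back to a plain word. A minimal sketch, assuming the value is separated from the keyword by whitespace and that the first keyword found in the message should win; getValueAfter is a hypothetical name and differs from the original function, which keeps the result of the last keyword.

```php
<?php
// Hypothetical helper: return the token after the first keyword that occurs.
function getValueAfter(array $keys, string $text): ?string
{
    foreach ($keys as $key) {
        // preg_quote() keeps the "." in keys like "RS." literal.
        // \d+\.\d+ is tried first so 8183.79 is not cut at the dot;
        // \w+ covers ids such as NER13082010132400255.
        $pattern = '/' . preg_quote($key, '/') . '\s+(\d+\.\d+|\w+)/';
        if (preg_match($pattern, $text, $m)) {
            return $m[1];
        }
    }
    return null; // none of the keywords occurred
}

$searchkey1 = ['NUMBER', 'REF', 'TRANSACTION NUMBER'];
$searchkey2 = ['CURRENT BALANCE RS.', 'NEW BALANCE IS'];
$smsdata = "TRANSACTION NUMBER NER13082010132400255 TO RECHARGE 198 INR TO 9999999999 IS SUCCESSFUL. YOUR NEW BALANCE IS 8183.79 INR.";

echo getValueAfter($searchkey1, $smsdata), "\n"; // NER13082010132400255
echo getValueAfter($searchkey2, $smsdata), "\n"; // 8183.79
```

The captured group holds no leading space and no trailing dot, so no extra trimming is needed.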

Using regex to fix phone numbers in a CSV with PHP

My new phone does not recognize a phone number unless its area code matches the incoming call. Since I live in Idaho where an area code is not needed for in-state calls, many of my contacts were saved without an area code. Since I have thousands of contacts stored in my phone, it would not be practical to manually update them. I decided to write the following PHP script to handle the problem. It seems to work well, except that I'm finding duplicate area codes at the beginning of random contacts.
<?php
//the script can take a while to complete
set_time_limit(200);
function validate_area_code($number) {
//digits are taken one by one out of $number, and insert in to $numString
$numString = "";
for ($i = 0; $i < strlen($number); $i++) {
$curr = substr($number,$i,1);
//only copy from $number to $numString when the character is numeric
if (is_numeric($curr)) {
$numString = $numString . $curr;
}
}
//add area code "208" to the beginning of any phone number of length 7
if (strlen($numString) == 7) {
return "208" . $numString;
//remove country code (none of the contacts are outside the U.S.)
} else if (strlen($numString) == 11) {
return preg_replace("/^1/","",$numString);
} else {
return $numString;
}
}
//matches any phone number in the csv
$pattern = "/((1? ?\(?[2-9]\d\d\)? *)? ?\d\d\d-?\d\d\d\d)/";
$csv = file_get_contents("contacts2.CSV");
preg_match_all($pattern,$csv,$matches);
foreach ($matches[0] as $key1 => $value) {
/*create a pattern that matches the specific phone number by adding slashes before possible special characters*/
$pattern = preg_replace("/\(|\)|\-/","\\\\$0",$value);
//create the replacement phone number
$replacement = validate_area_code($value);
//add delimiters
$pattern = "/" . $pattern . "/";
$csv = preg_replace($pattern,$replacement,$csv);
}
echo $csv;
?>
Is there a better approach to modifying the CSV? Also, is there a way to minimize the number of passes over the CSV? In the script above, preg_replace is called thousands of times on a very large String.

If I understand you correctly, you just need to prepend the area code to any 7-digit phone number anywhere in this file, right? I have no idea what kind of system you're on, but if you have some decent tools, here are a couple options. And of course, the approaches they take can presumably be implemented in PHP; that's just not one of my languages.
So, how about a sed one-liner? Just look for 7-digit phone numbers, bounded by either beginning of line or comma on the left, and comma or end of line on the right.
sed -r 's/(^|,)([0-9]{3}-[0-9]{4})(,|$)/\1208-\2\3/g' contacts.csv
Or if you want to only apply it to certain fields, perl (or awk) would be easier. Suppose it's the second field:
perl -F, -ane '$"=","; $F[1]=~s/^[0-9]{3}-[0-9]{4}$/208-$&/; print "@F";' contacts.csv
The -F, indicates the field separator, the $" is the output field separator (yes, it gets assigned once per loop, oh well), the arrays are zero-indexed so second field is $F[1], there's a run-of-the-mill substitution, and you print the results.

Ah programs... sometimes a 10-min hack is better.
If it were me... I'd import the CSV into Excel, sort it by something - maybe the length of the phone number or something. Make a new col for the fixed phone number. When you have a group of similarly-fouled numbers, make a formula to fix. Same for the next group. Should be pretty quick, no? Then export to .csv again, omitting the bad col.

A little more digging on my own revealed the issues with the regex in my question. The problem is with duplicate contacts in the csv.
Example:
(208) 555-5555, 555-5555
After the first pass becomes:
2085555555, 2085555555
and after the second pass becomes
2082085555555, 2082085555555
I worked around this by changing the replacement regex to:
//add escapes for special characters
$pattern = preg_replace("/\(|\)|\-|\./","\\\\$0",$value);
//add delimiters, and optional area code
$pattern = "/(\(?[0-9]{3}\)?)? ?" . $pattern . "/";
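On the question of reducing passes: preg_replace_callback can rewrite every number in a single pass, which also sidesteps the double-prefix problem because each match is replaced exactly once. This is a minimal sketch, assuming a phone pattern close to the one in the question with digit lookarounds added so a match cannot start or end inside a longer digit run. normalizePhone and fixPhonesInCsv are hypothetical names, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: reduce one matched phone number to plain digits.
function normalizePhone(string $raw, string $areaCode = '208'): string
{
    $digits = preg_replace('/\D/', '', $raw);   // keep digits only

    if (strlen($digits) === 7) {
        return $areaCode . $digits;             // add the missing area code
    }
    if (strlen($digits) === 11 && $digits[0] === '1') {
        return substr($digits, 1);              // drop the US country code
    }
    return $digits;
}

// Hypothetical helper: rewrite all phone numbers in one pass over the CSV.
function fixPhonesInCsv(string $csv): string
{
    // Optional "1", optional area code, then 3 + 4 digits.
    $pattern = '/(?<!\d)(?:1? ?\(?[2-9]\d\d\)?[ -]*)?\d{3}-?\d{4}(?!\d)/';

    return preg_replace_callback($pattern, fn($m) => normalizePhone($m[0]), $csv);
}

echo fixPhonesInCsv('(208) 555-5555, 555-5555'), "\n"; // 2085555555, 2085555555
```

Like the original script, this treats the file as one large string; fields that merely look like phone numbers would be rewritten too, so restricting the callback to known phone columns may be safer.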
