I'm converting text from a txt file into an array. I need to split the strings in this array using regex.
This is the array from my text file.
Array
(
[0] => 65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89
[1] => 06W01232BOOK IS SUCCESSFUL1.000YJ160.00021.853 496.00
[2] => 67E45643DO YOU HAVE A PEN? 7/56.450EQ9000.3451.432 765.12
)
To explain with one line as an example:
input => 65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89
required sections => 65S34523 APPLE IS VERY BEAUTIFUL 6.000 TX 786.345 63.67 5 234.89
The target I want:
Array
(
[0] => 65S34523
[1] => APPLE IS VERY BEAUTIFUL
[2] => TX
[3] => 786.345
)
I need multiple regex patterns to achieve this. I need to extract the data I want, in order, in a loop. But since there is no fixed layout, I don't know how to choose the regex patterns.
I've tried various code to break up this array.
$smash =
array('65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89',
'06W01232BOOK IS SUCCESSFUL1.000YJ160.00021.853 496.00',
'67E45643DO YOU HAVE A PEN? 7/56.450EQ9000.3451.432 765.12');
I'm trying to loop over the array with foreach and parse it. For example, I tried to get the text first.
foreach ($smash as $row) {
$delete_numbers = preg_replace('/\d/', '', $smash);
}
echo "<pre>";
print_r($delete_numbers);
echo "</pre>";
But the result turned out like this:
Array
(
[0] => SAPPLE IS VERY BEAUTIFUL.TX.. .
[1] => WBOOK IS SUCCESSFUL.YJ.. .
[2] => EDO YOU HAVE A PEN? /.EQ.. .
)
Naturally, this is not what I want. Each array element has a different structure, so I have to check with if-else too.
As you can see in the example, there is no pure text. Here
TX, YJ, EQ should be deleted. The dots should be wiped using apples. The first letters at the beginning of the text should
be removed. The remaining special characters must be removed.
I have tried many of the above and looked at alternative examples.
As a result, I'm at a dead end.
Code: (Demo)
$smash = ['65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89',
'06W01232BOOK IS SUCCESSFUL1.000YJ160.00021.853 496.00',
'67E45643DO YOU HAVE A PEN? 7/56.450EQ9000.3451.432 765.12'];
foreach ($smash as $line) {
$result[] = preg_match('~(\w+\d)(\D+)[^A-Z]+([A-Z]{2})(\d+\.\d{3})~', $line, $out) ? array_slice($out, 1) : [];
}
var_export($result);
Output:
array (
0 =>
array (
0 => '65S34523',
1 => 'APPLE IS VERY BEAUTIFUL',
2 => 'TX',
3 => '786.345',
),
1 =>
array (
0 => '06W01232',
1 => 'BOOK IS SUCCESSFUL',
2 => 'YJ',
3 => '160.000',
),
2 =>
array (
0 => '67E45643',
1 => 'DO YOU HAVE A PEN? ',
2 => 'EQ',
3 => '9000.345',
),
)
My pattern assumes:
The first group will consist of numbers and letters and conclude with a digit.
The second group contains no digits.
The third group is consistently 2 uppercase letters.
The fourth group will reliably have three decimal places.
p.s. If you don't want that pesky trailing space after PEN?, you could use this:
https://3v4l.org/9XpA6
~(\w+\d)([^\d ]+(?: [^\d ]+)*) ?[^A-Z]+([A-Z]{2})(\d+\.\d{3})~
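As a minimal sketch, the second pattern can be wrapped in a small function with named groups so that each assumption is visible in the code. The function name parseRow() and the group names are hypothetical; only the pattern itself comes from the answer above.

```php
<?php
// Hypothetical helper: parseRow() and the group names are illustrative only.
function parseRow(string $line): array
{
    $pattern = '~(?<code>\w+\d)'            // letters/digits, ending in a digit
        . '(?<text>[^\d ]+(?: [^\d ]+)*) ?' // words without digits, no trailing space
        . '[^A-Z]+'                         // skipped section (e.g. 6.000)
        . '(?<unit>[A-Z]{2})'               // exactly two uppercase letters
        . '(?<price>\d+\.\d{3})~';          // number with three decimal places

    if (!preg_match($pattern, $line, $m)) {
        return []; // no match: return an empty row, as the original loop does
    }
    return [$m['code'], $m['text'], $m['unit'], $m['price']];
}

$smash = [
    '65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89',
    '06W01232BOOK IS SUCCESSFUL1.000YJ160.00021.853 496.00',
    '67E45643DO YOU HAVE A PEN? 7/56.450EQ9000.3451.432 765.12',
];
$result = array_map('parseRow', $smash);
var_export($result);
```

A line that breaks any of the four assumptions simply yields an empty array, which makes such rows easy to spot and handle separately.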
So I'm using the following:
<?php
$string='JUD. NEAMT, SAT ROMEDC ALEXANDRE COM. COMENKA, STR. EXAMMS RANTEM, NR.6';
$result=preg_split("/(?:JUD\.\s*|\s*SAT\s*|\s*COM\.\s*|\s*STR\.\s*|\s*SECTOR\s*|\s*B-DUL\s*|\s*NR\.\s*|\s*ET\.\s*|\s*MUN\.\s*|\s*BL\.\s*|\s*SC\.\s*|\s*AP\.\s*)/", $string);
array_walk($result,function($value,$key) use (&$result){
if(stristr($value, ","))
{
$result[$key]=explode(",", $value)[0];
}
});
print_r(array_filter($result));
the output would be:
Array
(
[1] => NEAMT
[2] => ROMEDC ALEXANDRE
[3] => COMENKA
[4] => EXAMMS RANTEM
[5] => 6
)
The main problem is that $string is different each time and can contain different parameters; for example, 'SAT' might not appear in another string because it is replaced by 'SECTOR'.
All of these are localization words like house number ('NR.') or town name ('JUD').
What I want is to convert the above array into something like this:
Array
(
['JUD'] => NEAMT
['SAT'] => ROMEDC ALEXANDRE
['COM'] => COMENKA
['STR'] => EXAMMS RANTEM
['NR'] => 6
)
I hope you get the idea:
From an 'address' string I'm extracting different parameters such as apartment number, building number and so on. These vary by customer (someone living in a house has no apartment number), so having words instead of numbers as array keys would help me output the info in different columns.
Any idea is welcome.
Thank you.
$string='JUD. NEAMT, SAT ROMEDC ALEXANDRE COM. COMENKA, STR. EXAMMS RANTEM, NR.6';
//fix missing commas
$string = preg_replace('#([A-Z]+) ([A-Z]+\.)#',"$1, $2",$string);
//a trick to fix the missing space in `NR.6`: add a space after every dot, then collapse double spaces
$string = str_replace(['.','  '],['. ',' '],$string);
//get the parts separated by comma, trim to remove spaces
$ex = array_map('trim',explode(',',$string));
//iterate over it
foreach($ex as $e){
//explode the part by space
$new = array_map('trim',explode(' ',$e));
//take the first part as key, remove spaces and dot
$key = trim(array_shift($new),' . ');
//collect via key and implode rest with a space
$coll[$key]=implode(' ',$new);
}
//done
print_r($coll);
Result:
Array
(
[JUD] => NEAMT
[SAT] => ROMEDC ALEXANDRE
[COM] => COMENKA
[STR] => EXAMMS RANTEM
[NR] => 6
)
a fast lane to rome...
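Since the question says the labels vary from string to string, here is a sketch of a different approach from the one above: match each known label together with the text that follows it, up to the next label. The label list is taken from the preg_split() pattern in the question; the variable names are illustrative, and the sketch assumes labels only ever appear as whole words.

```php
<?php
$string = 'JUD. NEAMT, SAT ROMEDC ALEXANDRE COM. COMENKA, STR. EXAMMS RANTEM, NR.6';

// Labels taken from the preg_split() pattern in the question.
$labels = 'JUD|SAT|COM|STR|SECTOR|B-DUL|NR|ET|MUN|BL|SC|AP';

// A label, an optional dot, then the shortest text that runs up to
// the next label (as a whole word) or the end of the string.
$pattern = '~\b(' . $labels . ')\b\.?\s*(.*?)\s*,?\s*(?=\b(?:' . $labels . ')\b|$)~';

preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$parts = [];
foreach ($matches as [, $label, $value]) {
    $parts[$label] = $value; // label becomes the key, following text the value
}
print_r($parts);
```

Because of the word boundaries, COM does not match inside COMENKA. A label that happens to occur as a whole word inside a value would still split that value, so the label list needs to fit the data.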
Below is the data I'm trying to parse:
50‐59 1High300.00 Avg300.00
90‐99 11High222.00 Avg188.73
120‐1293High204.00 Avg169.33
The first section is a weight range, next is a count, followed by Highprice, ending with Avgprice.
As an example, I need to parse the data above into an array which would look like
[0]50-59
[1]1
[2]High300.00
[3]Avg300.00
[0]90-99
[1]11
[2]High222.00
[3]Avg188.73
[0]120‐129
[1]3
[2]High204.00
[3]Avg169.33
I thought about creating an array of what the possible weight ranges can be but I can't figure out how to use the values of the array to split the string.
$arr = array("10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109","110-119","120-129","130-139","140-149","150-159","160-169","170-179","180-189","190-199","200-209","210-219","220-229","230-239","240-249","250-259","260-269","270-279","280-289","290-299","300-309");
Any ideas would be greatly appreciated.
Hope this will work:
$string='50-59 1High300.00 Avg300.00
90-99 11High222.00 Avg188.73
120-129 3High204.00 Avg169.33';
$requiredData=array();
$dataArray=explode("\n",$string);
$counter=0;
foreach($dataArray as $data)
{
if(preg_match('#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#', $data,$matches))
{
$requiredData[$counter][]=$matches[1];
$requiredData[$counter][]=$matches[2];
$requiredData[$counter][]=$matches[3];
$requiredData[$counter][]=$matches[4];
$counter++;
}
}
print_r($requiredData);
'#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#'
I don't think that will work because of the space you have in the regex
between the weight and count. The thing I'm struggling with is a row
like this where there is no space. 120‐1293High204.00 Avg169.33 that
needs to be parsed like [0]120‐129 [1]3 [2]High204.00 [3]Avg169.33
You are right. That can be remedied by limiting the number of weight digits to three and making the space optional.
'#^(\d+-\d{1,3}) *…
$arr = array('50-59 1High300.00 Avg300.00',
'90-99 11High222.00 Avg188.73',
'120-129 3High204.00 Avg169.33');
foreach($arr as $str) {
if (preg_match('/^(\d+-\d{1,3})\s*(\d+)(High\d+\.\d\d) (Avg\d+\.\d\d)/i', $str, $m)) {
array_shift($m); //remove group 0 (ie. the whole match)
$result[] = $m;
}
}
print_r($result);
Output:
Array
(
[0] => Array
(
[0] => 50-59
[1] => 1
[2] => High300.00
[3] => Avg300.00
)
[1] => Array
(
[0] => 90-99
[1] => 11
[2] => High222.00
[3] => Avg188.73
)
[2] => Array
(
[0] => 120-129
[1] => 3
[2] => High204.00
[3] => Avg169.33
)
)
Explanation:
/ : regex delimiter
^ : beginning of string
( : start group 1
\d+-\d{1,3} : 1 or more digits, a dash, and 1 up to 3 digits, i.e. the weight range
) : end group 1
\s* : 0 or more whitespace characters
(\d+) : group 2, i.e. the count
(High\d+\.\d\d) : group 3, literal High followed by the price
" " : a single literal space
(Avg\d+\.\d\d) : group 4, literal Avg followed by the price
/i : regex delimiter and case-insensitive modifier
To be more generic, you could replace High and Avg with [a-z]+
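A quick sketch of what the three-digit limit does when the space between range and count is missing. The pattern is the one from the code above; both test rows are made up for illustration (with a plain ASCII dash), and the second one shows that the limit alone cannot resolve a two-digit range end.

```php
<?php
// Pattern copied from the answer above.
$pattern = '/^(\d+-\d{1,3})\s*(\d+)(High\d+\.\d\d) (Avg\d+\.\d\d)/i';

// Hypothetical rows with no space between the weight range and the count.
$rows = [
    '120-1293High204.00 Avg169.33', // three-digit range end: splits as 120-129 / 3
    '90-9911High222.00 Avg188.73',  // two-digit range end: \d{1,3} takes one digit too many
];

$parsed = [];
foreach ($rows as $row) {
    if (preg_match($pattern, $row, $m)) {
        $parsed[] = array_slice($m, 1); // drop the full match, keep the groups
    }
}
print_r($parsed);
```

The first row comes out as intended, while the second is read as range 90-991 with count 1. That remaining ambiguity can only be resolved by using more knowledge about the range format.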
This is a pattern you can trust (Pattern Demo):
/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m
The other answers overlooked the digital pattern in the weight range substring. The range start integer always ends in 0, and the range end integer always ends in 9; the range always spans ten integers.
My pattern will capture the digits that precede the 0 in the starting integer and reference them immediately after the dash, then require that captured number to be followed by a 9.
I want to point out that your sample input was a little bit tricky because your ‐ is not the standard - that is between the 0 and = on my keyboard. This was a sneaky little gotcha for me to solve.
Method (Demo):
$text = '50‐59 1High300.00 Avg300.00
90‐99 11High222.00Avg188.73
120‐1293High204.00 Avg169.33';
preg_match_all(
'/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m',
$text,
$matches,
PREG_SET_ORDER
);
var_export(
array_map(
fn($captured) => [
'weight range' => $captured[1],
'count' => $captured[3],
'Highprice' => $captured[4],
'Avgprice' => $captured[5]
],
$matches
)
);
Output:
array (
0 =>
array (
'weight range' => '50‐59',
'count' => '1',
'Highprice' => '300.00',
'Avgprice' => '300.00',
),
1 =>
array (
'weight range' => '90‐99',
'count' => '11',
'Highprice' => '222.00',
'Avgprice' => '188.73',
),
2 =>
array (
'weight range' => '120‐129',
'count' => '3',
'Highprice' => '204.00',
'Avgprice' => '169.33',
),
)
I need a regex that matches if the array contains a certain word, which could appear anywhere. For example, take this array:
Array
(
[1] => Array
(
[0] => http://www.test1.com
[1] => 4
[2] => 4
)
[2] => Array
(
[0] => http://www.test2.fr/blabla.html
[1] => 2
[2] => 2
)
[3] => Array
(
[0] => http://www.stuff.com/admin/index.php
[1] => 2
[2] => 2
)
[4] => Array
(
[0] => http://www.test3.com/blabla/bla.html
[1] => 2
[2] => 2
)
[5] => Array
(
[0] => http://www.stuff.com/bla.html
[1] => 2
[2] => 2
)
)
I want to return all but the arrays that have the word stuff in them, and when I try to test with this it doesn't quite work:
return !preg_match('/(stuff)$/i', $element[0]);
Is there any solution for that?
Thanks
You don't need a regular expression for performing a simple search. Use array_filter() in conjunction with strpos():
$result = array_filter($array, function ($elem) {
return (strpos($elem[0], 'stuff') === FALSE); // keep only elements that do NOT contain "stuff"
});
Now, to answer your question, your current regex pattern will only match strings that contain stuff at the end of the line. You don't want that, so get rid of the "end of the line" anchor $ from your regex.
The updated regex should look like below:
return !preg_match('/stuff/i', $element[0]);
If the actual use-case is different from what is shown in your question and if the operation involves more than just simple pattern matching, then preg_match() is the right tool. As shown above, this can be used with array_filter() to create a new array that satisfies your requirements.
Here's how you'd do it with a callback function:
$result = array_filter($array, function ($elem) {
return preg_match('/stuff/i', $elem[0]);
});
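Note: The actual regex might be more complex - I've used /stuff/ as an example. Also, note that I've removed the negation !... from the statement, so this callback keeps the elements that contain stuff; put the ! back to keep everything else instead.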
Your pattern will only match a string where stuff appears at the end of the string or line. To fix this, just get rid of the end anchor ($):
return !preg_match('/stuff/i', $element[0]);
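Putting the corrected pattern together with array_filter(), a minimal runnable sketch using data shaped like the array in the question (the variable names $kept and $reindexed are illustrative):

```php
<?php
// Sample data shaped like the array in the question.
$array = [
    1 => ['http://www.test1.com', 4, 4],
    2 => ['http://www.test2.fr/blabla.html', 2, 2],
    3 => ['http://www.stuff.com/admin/index.php', 2, 2],
    4 => ['http://www.test3.com/blabla/bla.html', 2, 2],
    5 => ['http://www.stuff.com/bla.html', 2, 2],
];

// Keep every element whose URL does NOT contain "stuff" (case-insensitive).
$kept = array_filter($array, function ($element) {
    return !preg_match('/stuff/i', $element[0]);
});

// array_filter() preserves the original keys; array_values() reindexes from 0.
$reindexed = array_values($kept);
print_r(array_keys($kept));
```

The original keys survive the filtering, so the reindexing step is only needed if consecutive keys matter.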
I'm processing a single string which contains many pairs of data. Each pair is separated by a ; sign. Each pair contains a number and a string, separated by an = sign.
I thought it would be easy to process, but I've found that the string half of the pair can contain the = and ; signs, making simple splitting unreliable.
Here is an example of a problematic string:
123=one; two;45=three=four;6=five;
For this to be processed correctly I need to split it up into an array that looks like this:
'123', 'one; two'
'45', 'three=four'
'6', 'five'
I'm at a bit of a dead end, so any help is appreciated.
UPDATE:
Thanks to everyone for the help, this is where I am so far:
$input = '123=east; 456=west';
// split matches into array
preg_match_all('~(\d+)=(.*?);(?=\s*(?:\d|$))~', $input, $matches);
$newArray = array();
// extract the relevant data
for ($i = 0; $i < count($matches[2]); $i++) {
$type = $matches[2][$i];
$price = $matches[1][$i];
// add each key-value pair to the new array
$newArray[$i] = array(
'type' => "$type",
'price' => "$price"
);
}
Which outputs
Array
(
[0] => Array
(
[type] => east
[price] => 123
)
)
The second item is missing as it doesn't have a semicolon on the end; I'm not sure how to fix that.
I've now realised that the numeric part of the pair sometimes contains a decimal point, and that the last pair does not have a semicolon after it. Any hints would be appreciated as I'm not having much luck.
Here is the updated string taking into account the things I missed in my initial question (sorry):
12.30=one; two;45=three=four;600.00=five
You need a look-ahead assertion for this; the look-ahead matches if a ; is followed by a digit or the end of your string:
$s = '12.30=one; two;45=three=four;600.00=five';
preg_match_all('/(\d+(?:\.\d+)?)=(.+?)(?=(;\d|$))/', $s, $matches);
print_r(array_combine($matches[1], $matches[2]));
Output:
Array
(
[12.30] => one; two
[45] => three=four
[600.00] => five
)
I think this is the regex you want:
\s*(\d+)\s*=(.*?);(?=\s*(?:\d|$))
The trick is to consider only the semicolon that's followed by a digit as the end of a match. That's what the lookahead at the end is for.
You can see a detailed visualization on www.debuggex.com.
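The lookahead idea can be sketched so that it also covers the two later requirements from the question (decimal numbers and a missing final semicolon). The function name splitPairs() is hypothetical, and the sketch assumes that a semicolon followed by a digit never occurs inside a text value.

```php
<?php
// Hypothetical helper, not taken from the answers above.
function splitPairs(string $input): array
{
    // A number with optional decimals, "=", then the shortest text that is
    // followed either by ";" + digit (the next pair) or by an optional ";"
    // at the very end of the string.
    preg_match_all('/(\d+(?:\.\d+)?)=(.*?)(?=;\s*\d|;?$)/', $input, $m, PREG_SET_ORDER);

    $pairs = [];
    foreach ($m as [, $number, $text]) {
        $pairs[] = [$number, $text];
    }
    return $pairs;
}

print_r(splitPairs('12.30=one; two;45=three=four;600.00=five'));
```

Returning a list of pairs instead of using array_combine() keeps the numbers as strings and does not lose entries when the same number appears twice.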
You can use following preg_match_all code to capture that:
$str = '123=one; two;45=three=four;6=five;';
if (preg_match_all('~(\d+)=(.+?);(?=\d|$)~', $str, $arr))
print_r($arr);
Live Demo: http://ideone.com/MG3BaO
$str = '123=one; two;45=three=four;6=five;';
preg_match_all('/(\d+)=([a-zA-Z ;=]+)/', $str,$matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
Output:
Array
(
[0] => Array
(
[0] => 123=one; two;
[1] => 45=three=four;
[2] => 6=five;
)
[1] => Array
(
[0] => 123
[1] => 45
[2] => 6
)
[2] => Array
(
[0] => one; two;
[1] => three=four;
[2] => five;
)
)
Then you can combine them:
echo '<pre>';
print_r(array_combine($matches[1],$matches[2]));
echo '</pre>';
Output:
Array
(
[123] => one; two;
[45] => three=four;
[6] => five;
)
Try this. The code is written in C#, but you can convert it to PHP:
string[] res = Regex.Split("123=one; two;45=three=four;6=five;", @";(?=\d)");
--SJ
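A possible PHP translation of the split above, as a sketch: preg_split() with the same lookahead, followed by explode() with a limit of 2 so that an = inside the text is left alone. Trimming the trailing semicolon with rtrim() is an addition not present in the C# line.

```php
<?php
$str = '123=one; two;45=three=four;6=five;';

// Split only on semicolons that are directly followed by a digit.
$chunks = preg_split('/;(?=\d)/', rtrim($str, ';'));

$pairs = [];
foreach ($chunks as $chunk) {
    // Limit 2: only the first "=" separates the number from the text.
    [$number, $text] = explode('=', $chunk, 2);
    $pairs[] = [$number, $text];
}
print_r($pairs);
```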
I have a string like this:
0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)
and I want to turn it into an array.
I first tried adding separators so that I could use the PHP function explode.
;0,d(Hi),i(Hello);4,d(who),i(where);540,d(begin),i(began)
It works, but the problem is that I want to minimize the separators to save disk space.
Therefore I want to know whether it's possible, using preg_split and a regular expression, to get an array like this without using separators:
Array ( [0] => Array ( [0] => 0 [1] => d(Hi) [2] => i(Hello) )
[1] => Array ( [0] => 4 [1] => d(who) [2] => i(where) )
[2] => Array ( [0] => 540 [1] => d(begin) [2] => i(began) )
)
I tried some code and regexes, but I saw that the text matched by the regular expression was not present in the final result (as with the explode function, the final array does not contain the delimiter).
Moreover, I have some difficulty building the regex. Here is the one that I made:
$modif = preg_split("/[0-9]+(d(.+))?(i(.+))?/", $data);
I should point out that d() or i() may be absent (but at least one of them is always present).
Thanks
If you do
preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);
on your original string, then you'll get an array where
$result[$i][0]
contains the ith match (i. e. $result[0][0] would be 0d(Hi)i(Hello)) and where
$result[$i][$c]
contains the cth capturing group of the ith match (i.e. $result[0][1] is 0, $result[0][2] is d(Hi) and $result[0][3] is i(Hello)).
Is that what you wanted?
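To get from that result to the nested array shown in the question, a sketch along these lines might work. It drops the full match and any empty group (an absent d() part shows up as an empty string). The variable name $rows is illustrative.

```php
<?php
$subject = '0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)';

preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);

$rows = array_map(
    function (array $set) {
        $groups = array_slice($set, 1); // drop index 0, the full match
        // Compare against '' explicitly: a plain array_filter() would also
        // drop the string "0", which is falsy in PHP.
        return array_values(array_filter($groups, fn($g) => $g !== ''));
    },
    $result
);
print_r($rows);
```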