Regex to split lines from a file

I can have a line like this
BK0001 PHP and MySQL Web Development (4th Edition) $49.95 (Clearance Price!)
or
BK0013 Wanna be a Master at PHP? (Free!)
I want to split like [BK0001] [PHP and MySQL Web Development (4th Edition)] [$49.95] [(Clearance Price!)]
or
[BK0013] [Wanna be a Master at PHP?] [(Free!)]
The regex I'm using is:
$re = '~^(?<id>\S+)\s+(?<title>.*?)\s+(?<price>\$\d[\d.]*)\s*(?<clearance>.*)$~';
However when used in my code,
foreach ($bookFile as $book) {
$re = '~^(?<id>\S+)\s+(?<title>.*?)\s+(?<price>\$\d[\d.]*)\s*(?<clearance>.*)$~';
if (preg_match($re, $book, $parts)) {
$b_price = substr($parts['price'], 1);
$bookObj = new Book($parts['id'], $parts['title'], number_format($b_price, 2), $parts['clearance']);
array_push($bookList, $bookObj);
}
}
I'm unable to get any free books. How can I also get free books using regex?

Since it helped you, I will post the expression:
'~^(?<id>\S+)\s+(?<title>.*?)(?:\s+(?<price>\$\d[\d.]*))?\s*(?<clearance>\([^()]*\))$~'
Details
^ - start of string
(?<id>\S+) - Group "id": 1+ non-whitespace chars
\s+ - 1+ whitespace chars
(?<title>.*?) - Group "title": any 0+ chars, as few as possible
(?:\s+(?<price>\$\d[\d.]*))? - an optional group:
\s+ - 1+ whitespace chars
(?<price>\$\d[\d.]*) - Group "price": $, a digit and then 0+ digits or .
\s* - 0+ whitespaces
(?<clearance>\([^()]*\)) - Group "clearance": (, 0+ chars other than ( and ) and then )
$ - end of string.
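
As a rough sketch of how the expression could be used on both kinds of lines: parseBookLine is a hypothetical helper, not code from the question, and treating a line without a price as 0.00 is an assumption. Because the price group is optional and sits in the middle of the pattern, it comes back as an empty string when the line has no price.

```php
<?php
// Hypothetical helper (not from the original post): parse one catalogue line
// into id, title, price and note. Returns null when the line does not match.
function parseBookLine(string $line): ?array
{
    $re = '~^(?<id>\S+)\s+(?<title>.*?)(?:\s+(?<price>\$\d[\d.]*))?\s*(?<clearance>\([^()]*\))$~';
    if (!preg_match($re, trim($line), $parts)) {
        return null;
    }
    // An unmatched optional group in the middle of the pattern is returned as ''.
    // Assumption: a line without a price is treated as costing 0.00.
    $price = $parts['price'] !== '' ? (float) substr($parts['price'], 1) : 0.0;

    return [
        'id'        => $parts['id'],
        'title'     => $parts['title'],
        'price'     => number_format($price, 2),
        'clearance' => $parts['clearance'],
    ];
}

$lines = [
    'BK0001 PHP and MySQL Web Development (4th Edition) $49.95 (Clearance Price!)',
    'BK0013 Wanna be a Master at PHP? (Free!)',
];
foreach ($lines as $line) {
    print_r(parseBookLine($line));
}
```

Note that the clearance group is still required here, so a line with neither a price nor a parenthesised note returns null.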

Related

Regex only grabbing first digit

I'm trying to grab everything after the following digits, so I end up with just the store name in this string:
full string: /stores/1077029-gacha-pins
what I want to ignore: /stores/1077029-
what I need to grab: gacha-pins
Those digits can change at any time so it's not specifically that ID, but any numbers after /stores/
My attempt so far is only grabbing /stores/1
\/stores\/[0-9]
I'm still trying, just thought I would see if I can get some help in the meantime too, will post an answer if I solve.

You may use
'~/stores/\d+-\K[^/]+$~'
Or a more specific one:
'~/stores/\d+-\K\w+(?:-\w+)*$~'
Details
/stores/ - a literal string
\d+ - 1+ digits
- - a hyphen
\K - match reset operator
[^/]+ - any 1+ chars other than /
\w+(?:-\w+)* - 1+ word chars and then 0+ sequences of - and 1+ word chars
$ - end of string.
See the PHP demo:
$s = "/stores/1077029-gacha-pins";
$rx = '~/stores/\d+-\K[^/]+$~';
if (preg_match($rx, $s, $matches)) {
echo "Result: " . $matches[0];
}
// => Result: gacha-pins

You should do it like this:
$string = '/stores/1077029-gacha-pins';
preg_match('#/stores/[0-9-]+(.*)#', $string, $matches);
$part = $matches[1];
print_r($part);
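
One thing to watch with [0-9-]+ is that it also swallows digits and hyphens at the start of the store name itself, so /stores/1077029-2020-pins would yield just pins. A slightly stricter sketch with a capturing group might look like this (storeSlug is a made-up name, and the second URL is invented for illustration):

```php
<?php
// Hypothetical helper: return the store name after "/stores/<digits>-",
// or null when the path does not have that shape.
function storeSlug(string $path): ?string
{
    // \d+- consumes only the numeric ID and the hyphen right after it;
    // the capture group then takes everything up to the end of the string.
    if (preg_match('~/stores/\d+-([^/]+)$~', $path, $m)) {
        return $m[1];
    }
    return null;
}

echo storeSlug('/stores/1077029-gacha-pins'), PHP_EOL;
echo storeSlug('/stores/1077029-2020-pins'), PHP_EOL;
var_dump(storeSlug('/stores/gacha-pins'));
```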

How to do preg_replace that only matches particular conditions?

I am struggling to write a preg_replace command that achieves what I need.
Essentially I have the following array (all the items follow one of these four patterns):
$array = array('Dogs/Cats', 'Dogs/Cats/Mice', 'ANIMALS/SPECIES Dogs/Cats/Mice', '(Animals/Species) Dogs/Cats/Mice' );
I need to be able to get the following result:
Dogs/Cats = Dogs or Cats
Dogs/Cats/Mice = Dogs or Cats or Mice
ANIMALS/SPECIES Dogs/Cats/Mice = ANIMALS/SPECIES Dogs or Cats or Mice
(Animals/Species) Dogs/Cats/Mice = (Animals/Species) Dogs or Cats or Mice
So basically replace slashes in anything that isn't capital letters or brackets.
I am starting to grasp it but still need some guidance:
preg_replace('/(\(.*\)|[A-Z]\W[A-Z])[\W\s\/]/', '$1 or', $array);
As you can see this recognises the first patterns but I don't know where to go from there
Thanks!

You might use the \G anchor to assert the position at the end of the previous match, and \K to forget what was matched so far, so that only the / itself is matched.
You could optionally match ANIMALS/SPECIES or (Animals/Species) at the start.
(?:^(?:\(\w+/\w+\)\h+|[A-Z]+/[A-Z]+\h+)?|\G(?!^))\w+\K/
Explanation
(?: Non capturing group
^ Assert start of string
(?: Non capturing group, match either
\(\w+/\w+\)\h+ Match between (....) 1+ word chars with a / between ending with 1+ horizontal whitespace chars
| Or
[A-Z]+/[A-Z]+\h+ Match 1+ times [A-Z], / and again 1+ times [A-Z], followed by 1+ horizontal whitespace chars
)? Close non capturing group and make it optional
| Or
\G(?!^) Assert the position at the end of the previous match, but not at the start of the string
)\w+ Close non capturing group and match 1+ times a word char
\K/ Forget what was matched, and match a /
In the replacement, use " or " (a space, the word or, and another space).
For example
$array = array('Dogs/Cats', 'Dogs/Cats/Mice', 'ANIMALS/SPECIES Dogs/Cats/Mice', '(Animals/Species) Dogs/Cats/Mice');
$re = '~(?:^(?:\(\w+/\w+\)\h+|[A-Z]+/[A-Z]+\h+)?|\G(?!^))\w+\K/~';
$array = preg_replace($re, " or ", $array);
print_r($array);
Result:
Array
(
[0] => Dogs or Cats
[1] => Dogs or Cats or Mice
[2] => ANIMALS/SPECIES Dogs or Cats or Mice
[3] => (Animals/Species) Dogs or Cats or Mice
)

Given the example strings in your question, this:
$result = preg_replace('~(?:\S+ )?[^/]*+\K.~', ' or ', $array);
looks sufficient. In other words, you only need to check whether there is a space somewhere: if so, the pattern consumes the start of the string up to and including that space and discards it from the match result with \K.
But to avoid future disappointments, it is sometimes useful to put yourself in the shoes of the Devil to consider more complex cases and ask embarrassing questions:
What if a category, a subcategory or an item contains a space?
~
(?:^
(?:
\( [^)]* \)
|
\p{Lu}+ (?> [ ] \p{Lu}+ \b )*
(?> / \p{Lu}+ (?> [ ] \p{Lu}+ \b )* )*
)
[ ]
)?
[^/]*+ \K .
~xu
In the same way, to deal with hyphens, single quotes or whatever, you can replace [ ] with [^\pL/] (a class that excludes letters and the slash) or something more specific.
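
To try the extended pattern above, something along these lines should work. The sample strings are invented for illustration and are not from the question; they only exercise the "contains a space" case.

```php
<?php
// The extended-mode pattern from the answer, unchanged.
// x: whitespace in the pattern is ignored (except inside [ ]); u: UTF-8 mode.
$re = '~
    (?:^
        (?:
            \( [^)]* \)
          |
            \p{Lu}+ (?> [ ] \p{Lu}+ \b )*
            (?> / \p{Lu}+ (?> [ ] \p{Lu}+ \b )* )*
        )
        [ ]
    )?
    [^/]*+ \K .
~xu';

// Made-up sample data where categories and items contain spaces.
$items = [
    'Dogs/Cats',
    'WILD ANIMALS/SPECIES Big Dogs/Cats',
    '(Wild Animals/Species) Big Dogs/Small Cats/Mice',
];

// preg_replace accepts an array subject and returns an array.
$result = preg_replace($re, ' or ', $items);
print_r($result);
```

The slashes inside the leading category are left alone because that whole prefix is consumed before \K, so only the slash matched by the final . is replaced.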

php string replace after character #

We load a dynamic producttitle using the following code <?php echo $producttitle; ?>
Input could be:
HP ProBook 450 G5 15.6 inch i5-8250U - 4LT51EA#ABB - Black
This line is dynamic, so the code after the # can change and also be #ACC.
Should become:
HP ProBook 450 G5 15.6" i5 - 4LT51EA - Black
These product-titles can contain a value something like #.
It is always displayed as # without any spaces.
We want to remove the value # until the next space.
And these product-titles can also contain the value 15.6 inch, which we want to replace with 15.6". So the text inch should be replaced for the sign ".
We also want to change the value i5-8250U into i5. But i5 can also be i3 or i7. So it should replace everything from - until the next space.
How can we include both replacements inside this code?
I currently have the following:
<?php $trans = array(' inch' => '"'); ?>
<h1><?php echo strtr($producttitle, $trans); ?></h1>
But now I need to include the # part, how can we achieve that?

What you might do is use preg_replace with an array of 2 regexes and an array of 2 replacements.
In the replacement you could refer to the first capturing group with $1 to keep that as the replacement.
First part
\s+inch\s+(i[357])-\w+
\s+inch\s+ Match 1+ whitespace characters, inch and then 1+ whitespace characters (to avoid also matching newlines, you could use \h+ instead of \s+ to match 1+ horizontal whitespace characters)
(i[357]) Capturing group to match i followed by 3, 5, or 7
- Match a hyphen literally
\w+ Match 1+ times a word character
Replace with
" $1
Second part
(\w+)#\w+
(\w+) Capturing group which matches 1+ word characters
#\w+ Match # followed by 1+ word characters
Replace with
$1
For example:
$string = 'HP ProBook 450 G5 15.6 inch i5-8250U - 4LT51EA#ABB - Black';
$find = array('/\s+inch\s+(i[357])-\w+/', '/(\w+)#\w+/');
$replace = array('" $1', '$1');
$result = preg_replace($find, $replace, $string);
echo $result; // HP ProBook 450 G5 15.6" i5 - 4LT51EA - Black

Split address street name house number and room number

I need to split the address Main Str. 202-52 into
street=Main Str.
house No.=202
room No.=52
I tried to use this:
$data['address'] = "Main Str. 202-52";
$data['street'] = explode(" ", $data['address']);
$data['building'] = explode("-", $data['street'][0]);
It works when the street name is one word. How do I split an address where the street name has several words?
I tried $data['street'] = preg_split('/[0-9]/', $data['address']); but I get only the street name...

You may use a regular expression like
/^(.*)\s(\d+)\W+(\d+)$/
if you need all up to the last whitespace into group 1, the next digits into Group 2 and the last digits into Group 3. \W+ matches 1+ chars other than word chars, so it matches - and more. If you have a - there, just use the hyphen instead of \W+.
See the regex demo and a PHP demo:
$s = "Main Str. 202-52";
if (preg_match('~^(.*)\s(\d+)\W+(\d+)$~', $s, $m)) {
echo $m[1] . "\n"; // Main Str.
echo $m[2] . "\n"; // 202
echo $m[3]; // 52
}
Pattern details:
^ - start of string
(.*) - Group 1 capturing any 0+ chars other than line break chars as many as possible up to the last....
\s - whitespace, followed with...
(\d+) - Group 2: one or more digits
\W+ - 1+ non-word chars
(\d+) - Group 3: one or more digits
$ - end of string.
Also, note that if the last part is optional, wrap \W+(\d+) in an optional non-capturing group, (?:...)?, which gives (?:\W+(\d+))?.
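
A small sketch of the optional variant, in case the room number can be missing. splitAddress is a hypothetical helper name, and returning null for a missing room is just one possible choice.

```php
<?php
// Hypothetical helper: split "street house[-room]" into its parts.
// Returns null when the address does not match at all.
function splitAddress(string $address): ?array
{
    if (!preg_match('~^(.*)\s(\d+)(?:\W+(\d+))?$~', $address, $m)) {
        return null;
    }
    return [
        'street' => $m[1],
        'house'  => $m[2],
        // Group 3 is absent from $m when the optional part did not match.
        'room'   => $m[3] ?? null,
    ];
}

print_r(splitAddress('Main Str. 202-52'));
print_r(splitAddress('Main Str. 202'));
```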

split string in numbers and text but accept text with a single digit inside

Let's say I want to split this string in two variables:
$string = "levis 501";
I will use
preg_match('/\d+/', $string, $num);
preg_match('/\D+/', $string, $text);
but then let's say I want to split this one in two
$string = "levis 5° 501";
as $text = "levis 5°"; and $num = "501";
So my guess is I should add a rule to the preg_match('/\d+/', $string, $num); that looks for numbers only at the END of the string and I want it to be between 2 and 3 digits.
But also the $text match now has one number inside...
How would you do it?

To split a string in two parts, use any of the following:
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
This regex matches:
^ - the start of the string
(.*?) - Group 1 capturing any zero or more characters, as few as possible (as *? is a "lazy" quantifier) up to...
\s* - zero or more whitespace symbols
(\d+) - Group 2 capturing 1 or more digits
\D* - zero or more characters other than digit (it is the opposite shorthand character class to \d)
$ - end of string.
The s modifier (placed after the closing ~ delimiter) is the DOTALL flag: it makes . match any character, including a newline, which it does not match without this modifier.
Or
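preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2);
The limit of 2 matters: without it, the lookahead also succeeds at positions inside the trailing number, so the result would be split further. This \s*(?=\s*\d+\D*$) pattern: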
\s* - zero or more whitespaces, but only if followed by...
(?=\s*\d+\D*$) - zero or more whitespaces followed with 1+ digits followed with 0+ characters other than digits followed with end of string.
The (?=...) construct is a positive lookahead that does not consume characters and just checks if the pattern inside matches and if yes, returns "true", and if not, no match occurs.
See IDEONE demo:
$s = "levis 5° 501";
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
print_r($matches[1] . ": ". $matches[2]. PHP_EOL);
print_r(preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2));
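
The question also asks for the trailing number to be 2 to 3 digits long, which the patterns above do not enforce. A possible variation is sketched below; splitTrailingNumber is a made-up helper, and rejecting longer digit runs outright (rather than taking their last 3 digits) is an assumption about what is wanted.

```php
<?php
// Hypothetical helper: split off a trailing run of exactly 2 or 3 digits.
// Returns null when the string does not end in such a run.
function splitTrailingNumber(string $s): ?array
{
    // (?<!\d) stops the number from starting in the middle of a longer digit run.
    if (!preg_match('~^(.*?)\s*(?<!\d)(\d{2,3})$~s', $s, $m)) {
        return null;
    }
    return ['text' => $m[1], 'num' => $m[2]];
}

print_r(splitTrailingNumber('levis 5° 501'));
print_r(splitTrailingNumber('levis 501'));
var_dump(splitTrailingNumber('levis 12345'));
```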
