Regex to match the lines of text after a line that contains only specific characters - PHP

I have the text of book pages that may have footnotes at the end of the string, like the following example:
Al-khātim, with a kasra under the tā', is an active participle, as if he came as the last of the messengers; al-khātam, with a fatha on the tā', is an instrument noun, as if the message was sealed with him.
__________
(1) - Surat Al-Ahzab verse : 43.
(2) - Surat Al-Baqarah verse : 157.
(3) - Surat Al-An'am verse : 17.
(4) - Surat Al-Kahf verse : 19.
The line I mean in the sample is the one made of specific characters, in this case Kashidas _ (not dashes -); in Latin script this character is called an underscore. What I need is to match the four lines, or any number of lines, below that line.
What I have tried only matches the first line below it: /_.*\n*(.*)/gum. The only way I found to get them all is to repeat the pattern portion \n*(.*) n times, where n equals the number of footnote lines (four times in this example), and that is not a practical solution.

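You can use the \G anchor here. \G(?!^) matches only at the position where the previous match ended (never at the start of the string), so the first match has to start at an underscore followed by a line break, and each following match continues on the next line; \K drops the consumed line break(s) from the reported match: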
preg_match_all('~(?:\G(?!^)|_)\R+\K[^\n]+~', $str, $matches);
print_r($matches[0]);
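A minimal sketch that wraps the \G pattern above in a reusable function. The name extractFootnotes is hypothetical, and two changes to the pattern are assumptions on top of the answer: the u modifier, so the subject is treated as UTF-8 (as the Arabic sample would be), and [^\r\n]+ instead of [^\n]+, so Windows line endings do not leave a trailing \r on each footnote. It also assumes no line of the body text ends with an underscore, since such a line would be taken as the separator.

```php
<?php
// Hypothetical helper: returns every non-empty line that follows the
// underscore separator line, one array element per footnote.
function extractFootnotes(string $page): array
{
    // (?:\G(?!^)|_) : either continue where the previous match ended
    //                 (\G, but never at the very start of the string),
    //                 or start at an underscore...
    // \R+           : ...followed by one or more line breaks.
    // \K            : drop everything matched so far from the result.
    // [^\r\n]+      : the footnote line itself, without its line break.
    // u             : treat the subject as UTF-8 (assumed encoding).
    preg_match_all('~(?:\G(?!^)|_)\R+\K[^\r\n]+~u', $page, $matches);
    return $matches[0];
}

$page = "Body text of the page.\n__________\n(1) - First note.\n(2) - Second note.\n(3) - Third note.";
print_r(extractFootnotes($page));
```

Because \R+ accepts several line breaks in a row, blank lines between footnotes are skipped instead of ending the sequence of matches.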

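Basically, it is not easy to capture the lines and then every individual match in one go. What you can do instead is capture everything after the line, and then match each line again in a second pass.
You can do that with the following patterns (PHP's preg functions have no g modifier; preg_match_all() already returns every match, and + instead of * avoids empty matches between the footnotes):
/_{4,}.+/ums
/(\(.*?\.)+/ums
I hope that is good enough for you.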
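A sketch of this two-pass approach in PHP. The function name footnotesTwoPass is hypothetical. The second pattern here is a simplified \(.*?\. without s and without a repetition group, so . stays on one line and no empty matches are produced. It assumes each footnote ends at its first period, which holds for the sample but would cut off a note such as "p. 12".

```php
<?php
// Hypothetical helper: two simple patterns instead of one clever one.
function footnotesTwoPass(string $text): array
{
    // Pass 1: everything from the first run of 4+ underscores to the end
    // of the text (s lets . cross line breaks).
    if (!preg_match('/_{4,}.+/su', $text, $block)) {
        return [];
    }
    // Pass 2: each "(n) ... ." footnote inside that block. Without s,
    // . stops at line breaks; the lazy .*? stops at the first period.
    preg_match_all('/\(.*?\./u', $block[0], $notes);
    return $notes[0];
}

print_r(footnotesTwoPass("Body text.\n__________\n(1) - First note.\n(2) - Second note."));
```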

I just tested this successfully:
$text = "_________\r\n\r\nLine 1\r\nLine 2\r\nLine 3\r\n";
// Extract all footnotes. \R+ skips one or more line breaks (\r\n, \n or \r)
// after the underscores, and the s modifier lets . match newlines too.
$pattern = '/_+\R+(.+)/s';
$footnotes = '';
if (preg_match($pattern, $text, $matches)) {
    $footnotes = $matches[1]; // $matches[0] is the whole matched string,
                              // $matches[1] is the part within ()
}
// Extract individual footnotes: one line each, without its line break
preg_match_all('/[^\r\n]+/', $footnotes, $matches);
foreach ($matches[0] as $match) { // $matches[0] holds the full-pattern matches
    // Do something with each footnote
}

Related

Change 10.28 by 1028

I have a problem converting a string to a number. I am not good with this kind of pattern (!\d+!).
I used it, but the approach is not correct.
Thank you.
preg_match_all('!\d+!', $product_price[$i], $matches);
$price_extracted = (float)implode('.', $matches[0]);
$item['normal_price'] = $price_extracted;
if ($item['normal_price'] > 800) ......
I get these results:
1 299,99 $ (original) is converted to 1.2999 but should be 1299.99
549,99 $ (original) is converted to 549.99 and should be 549.99
44,99 $ (original) is converted to 44.99 and should be 44.99
The problem with your approach is that every run of digits becomes a separate array element.
This means that in the first string you provided, where the thousands are separated by a whitespace, the "1" is registered as a match of its own.
preg_match_all('!\d+!', '1 299,99 $', $matches) -> fills $matches as follows:
$matches[0][0] = '1'
$matches[0][1] = '299'
$matches[0][2] = '99'
If you take my approach, though, and first replace all whitespace with nothing and then split the numbers into the array:
preg_match_all('!\d+!', preg_replace('/\s/', '', '1 299,99 $'), $matches) -> fills $matches as follows:
$matches[0][0] = '1299'
$matches[0][1] = '99'
After that you can still implode them:
$price_extracted = (float)implode('.', $matches[0]); // "1299.99" -> 1299.99
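Putting the steps of this answer together gives a small helper. This is a minimal sketch: the name parsePrice is hypothetical, and it assumes the thousands separator is a whitespace character matched by \s and that at most two digit groups remain after removing it (the integer part and the cents).

```php
<?php
// Hypothetical helper: "1 299,99 $" -> 1299.99
function parsePrice(string $raw): float
{
    // 1. Remove whitespace, so "1 299,99 $" becomes "1299,99$".
    $compact = preg_replace('/\s/', '', $raw);
    // 2. Collect every run of digits; ["1299", "99"] ends up in $matches[0].
    preg_match_all('!\d+!', $compact, $matches);
    // 3. Join the runs with a dot and cast: "1299.99" -> 1299.99.
    return (float) implode('.', $matches[0]);
}

foreach (['1 299,99 $', '549,99 $', '44,99 $'] as $price) {
    var_dump(parsePrice($price));
}
```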
EDIT
A little explanation about preg_replace, preg_match_all and regex:
The regex '!\d+!' searches for digits (\d); the '!' characters are simply the pattern delimiters, since PHP accepts any non-alphanumeric, non-backslash, non-whitespace character in place of '/'. The "+" means "one or more". So the line
preg_match_all('!\d+!', 'someString', $myArray)
could be translated into English as follows:
Find every run of one or more digits,
and store each run as a separate element of $myArray[0].
The second regex used in my solution, '/\s/', searches for whitespace characters. preg_replace is a "find and replace" function, so:
preg_replace('/\s/', '', 'someString')
translated into English:
Find all whitespace characters in 'someString' and replace them with nothing
For reference:
preg_match_all
preg_replace
regex cheat sheet
Patterns can be tested on:
PHP Live Regex

Find next word after colon in regex

I am getting a result returned by a Laravel console command, like
Some text as: 'Nerad'
Now I tried
$regex = '/(?<=\bSome text as:\s)(?:[\w-]+)/is';
preg_match_all( $regex, $d, $matches );
but it returns an empty result.
My guess is that something is wrong with the single quotes, so I need to change the regex.
Any ideas?
Note that you get no match because the ' before Nerad is neither matched nor checked by the lookbehind.
If you need to check the context but avoid including it in the match, PHP regex lets you do that with the \K match reset operator:
$regex = '/\bSome text as:\s*\'\K[\w-]+/i';
The output array structure will be cleaner than when using a capturing group and you may check for unknown width context (lookbehind patterns are fixed width in PHP PCRE regex):
$re = '/\bSome text as:\s*\'\K[\w-]+/i';
$str = "Some text as: 'Nerad'";
if (preg_match($re, $str, $match)) {
echo $match[0];
} // => Nerad
Alternatively, anchor the pattern at the end of the string and capture the word in a group; group 1 will hold the required string:
/:\s*'(\w+)'$/
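A short PHP sketch of the capture-group alternative; the function name wordAfterColon is hypothetical. By default $ in PCRE also matches just before a single trailing newline, so console output that ends in \n still matches.

```php
<?php
// Hypothetical helper: returns the quoted word at the end of the line, or null.
function wordAfterColon(string $line): ?string
{
    // :\s*    a colon and optional whitespace
    // '(\w+)' a single-quoted word; group 1 captures it without the quotes
    // $       end of the string (or just before a final newline)
    if (preg_match('/:\s*\'(\w+)\'$/', $line, $m)) {
        return $m[1];
    }
    return null;
}

echo wordAfterColon("Some text as: 'Nerad'"), "\n";
```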

How to strip and get the code between two lines in PHP?

I'm trying to assign it to a variable in PHP, but none of the regexes or preg_replace calls I've tried work. Here is a sample text:
Claim Code:
7241B-2HWRXR9-2P2BA
$1.00
I want to pull out exactly what is in the middle, which is 7241B-2HWRXR9-2P2BA.
You can use the following to match:
Claim Code:\s*([\w-]+)\s*\$(\d+(?:\.\d+)?)
And you can pull out what you want from group 1 ($1, i.e. $matches[1] after preg_match).
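A minimal PHP sketch using this pattern with preg_match(); the function name parseClaim and the returned array keys are hypothetical. Group 1 holds the code and group 2 the amount after the dollar sign. Because \s* also matches \r\n, Windows line endings work too.

```php
<?php
// Hypothetical helper: returns ['code' => ..., 'amount' => ...] or null.
function parseClaim(string $text): ?array
{
    // Group 1: the code, made of word characters and dashes.
    // Group 2: the amount after the literal dollar sign.
    $pattern = '/Claim Code:\s*([\w-]+)\s*\$(\d+(?:\.\d+)?)/';
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }
    return ['code' => $m[1], 'amount' => $m[2]];
}

$str = "Claim Code:\n7241B-2HWRXR9-2P2BA\n\$1.00";
print_r(parseClaim($str));
```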
If the code always has the same format (5-7-5 characters), you can use:
$str = 'Claim Code:
7241B-2HWRXR9-2P2BA
$1.00';
preg_match('~[\w]{5}\-[\w]{7}\-[\w]{5}~', $str, $matches);
echo $matches[0]; // returns 7241B-2HWRXR9-2P2BA
UPDATE
For a variable code length, this regex works:
preg_match('~.*:\s+([\w\-]{10,25})\s+.*~', $str, $matches);
echo $matches[1]; // ^^ here set the min and max code length, or remove it
// without setting min/max code length:
preg_match('~.*:\s+([\w\-]+)\s+.*~', $str, $matches);
Less elegant than a regexp / preg_replace, but this should work, too: Put the string into an array, then only use line 2 (array element number 1).
<?php
$string = 'Claim Code: ...............';
$lines = explode("\n", $string); // Transform the string into an array split on new lines (\n); each array element is a single line of the string
echo trim($lines[1]); // the 2nd line of the string, i.e. the claim code; trim() drops a trailing \r left by Windows line endings

Make two simple regex's into one

I am trying to make a regex that looks past .txt, then past the "-", and gets the first digit. In the example, it would be 1.
$record_pattern = '/.txt.+/';
preg_match($record_pattern, $decklist, $record);
print_r($record);
.txt?n=chihoi%20%283-1%29
I want to write this as one expression but can only seem to do it as two. This is my first time working with regexes.
You can use this:
$record_pattern = '/\.txt.+-(\d)/';
Now the first capturing group ($record[1] after preg_match) contains what you want.
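A small PHP sketch of the capture-group version; the function name recordDigit is hypothetical. One design note: .+ is greedy, so if the string contained several dashes, this pattern would pick the digit after the last dash that is followed by a digit, whereas [^-]* (as in the \K variant) stops at the first dash.

```php
<?php
// Hypothetical helper built around the pattern from this answer.
function recordDigit(string $decklist): ?string
{
    // \.txt : a literal ".txt" (the dot is escaped)
    // .+-   : anything, then a dash; .+ is greedy and backtracks from the end
    // (\d)  : group 1, the single digit after that dash
    if (preg_match('/\.txt.+-(\d)/', $decklist, $record)) {
        return $record[1];
    }
    return null;
}

echo recordDigit('.txt?n=chihoi%20%283-1%29'), "\n";
```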
Your regex would be:
\.txt[^-]*-\K\d
You don't need any groups. It matches from .txt up to the literal -. The \K then discards the characters matched so far (in this case the .txt?n=chihoi%20%283- string), so the reported match is only the first digit right after the -.
Your PHP code would be:
<?php
$mystring = ".txt?n=chihoi%20%283-1%29";
$regex = '~\.txt[^-]*-\K\d~';
if (preg_match($regex, $mystring, $m)) {
    $yourmatch = $m[0];
    echo $yourmatch; // => 1
}

regex function[filename] pattern and function[string_with_escaped_characters] pattern

I'm writing a script to parse a file.
Please help with a regex in PHP to find and replace the following patterns:
From: "This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]"
To: "This is a bar_txt_content within a bar2_txt_content"
Something along those lines:
$subject = "This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]";
$pattern = '/regex-needed/';
preg_match($pattern, $subject, $matches);
foreach($matches as $match) {
$subject = str_replace('foo['.$match[0].']', file_get_contents($match[0]), $subject);
}
And my second request is to have:
From: 'This is a foo2[bar bar \] bar bar].'
To: "this is a returned"
Something along those lines:
$subject = 'This is a foo2[bar bar \] bar bar].';
$pattern = '/regex-needed/';
preg_match($pattern, $subject, $matches);
foreach($matches as $match) {
$subject = str_replace('foo2['.$match[0].']', my_function($match[0]), $subject);
}
Please help in constructing these patterns...
If you always have a structure like foo[ ... ],
then it is very easy:
foo\[([^]]+)\]
That is .NET syntax, but I'm sure the expression is simple enough for you to convert.
Description of the regex:
Match the characters “foo” literally «foo»
Match the character “[” literally «[»
Match the regular expression below and capture its match into backreference number 1 «([^]]+)»
Match any character that is NOT a “]” «[^]]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “]” literally «]»
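In PHP the same pattern can be used unchanged between delimiters, because a ] placed first in a negated class (right after ^) is a literal in PCRE. A minimal sketch pairing it with preg_replace_callback() for the question's first request; expandFooTags is a hypothetical name, the lookup array stands in for file_get_contents() so the example runs without the files, and the arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical helper: replaces each foo[...] with $loader(<text inside the brackets>).
function expandFooTags(string $subject, callable $loader): string
{
    return preg_replace_callback(
        '/foo\[([^]]+)\]/',                      // group 1: everything up to the next ]
        fn (array $m): string => $loader($m[1]),
        $subject
    );
}

// Stand-in for the real files; in practice pass 'file_get_contents' as $loader.
$files = ['/www/bar.txt' => 'bar_txt_content', '/etc/bar.txt' => 'bar2_txt_content'];
echo expandFooTags(
    'This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]',
    fn (string $path): string => $files[$path]
), "\n";
```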
Luc,
this should help you get started.
http://php.net/manual/en/function.preg-replace.php
You may have to set up a loop and increase the counter, using preg_replace with a limit of 1 to replace only the first instance.
In order to match foo[/www/bar.txt]:
the regex should be something like:
foo\[\/www\/([A-Za-z0-9]*)\.txt\]
The backslashes are there to cancel the special meaning of some characters in your regexp.
It will match foo[/www/[some file name].txt], and ${1} will contain the file name without the .txt, because round brackets form groups that can be used in the replacement expression. ${1} will contain what was matched in the first round brackets, ${2} what was matched in the second, etc.
Therefore your replaced expression should be something like "${1}_txt_content". Or in the second iteration "${1}2_txt_content".
[A-Za-z0-9]* means any alphanumeric character 0 or more times, you may want to replace the * with a + if you want at least 1 character.
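So try:
$pattern = '/foo\[\/www\/([A-Za-z0-9]*)\.txt\]/';
$replace = '${1}_txt_content'; // single quotes, so PHP does not interpolate ${1}
$total_count = 1;
do {
    // Replace only the first remaining match and keep the result in $subject
    $subject = preg_replace($pattern, $replace, $subject, 1, $count);
    $replace = '${1}' . ++$total_count . '_txt_content'; // "." concatenates strings in PHP
} while ($count != 0);
echo $subject;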
(warning, this is my first ever PHP program, so it may have mistakes as I cannot test it ! but I hope you get the idea)
Hope that helps !
Tony
PS: I am not a PHP programmer but I know this works in C#, for example, and looking at the PHP documentation it seems that it should work.
PS2: I always keep this website bookmarked for reference when I need it: http://www.regular-expressions.info/
$pattern = '/\[([^\]]+)\]/';
preg_match_all($pattern, $subject, $matches);
print_r($matches[1]); // group 1: the text inside each [...]
found the correct regex I needed for escaping:
'/foo\[[^\[]*[^\\\]\]/'
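For the second request, an alternative sketch (not the pattern above) that treats a backslash as an escape inside foo2[...]: the body is any run of characters that are neither ] nor \, or a backslash plus any one character, so \] does not end the tag. The function name expandFoo2Tags and the unescaping step are assumptions, and the callback stands in for the question's my_function().

```php
<?php
// Hypothetical helper: replaces each foo2[...] with $fn(<unescaped body>).
function expandFoo2Tags(string $subject, callable $fn): string
{
    // Regex seen by PCRE: foo2\[((?:[^\]\\]|\\.)*)\]
    //   [^\]\\]  any character except ] and backslash
    //   \\.      a backslash followed by any character (e.g. \])
    $pattern = '/foo2\[((?:[^\]\\\\]|\\\\.)*)\]/';
    return preg_replace_callback($pattern, function (array $m) use ($fn): string {
        // Undo the escaping: "\]" becomes "]", "\\" becomes "\".
        $body = preg_replace('/\\\\(.)/', '$1', $m[1]);
        return $fn($body);
    }, $subject);
}

echo expandFoo2Tags('This is a foo2[bar bar \] bar bar].', 'strtoupper'), "\n";
```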
