Regex skipping value

Regex skipping value - php

Greetings All
I am trying to get the values in the 4th column from the left for this url. I can get all the values but it skips the first one (e.g. 30 i think is the value on top right now )
My regex is
~<td align="center" class="row2">.*([\d,]+).*</td>~isU
NOTE: HTML PARSING IS NOT AN OPTION RIGHT NOW AS THIS IS PART OF A HUGE SYSTEM AND CANNOT
BE CHANGED
Thanking you
Imran

You could just use:
/([\d,]+)/
As the javascript function can be exploited as a "regex selection point"
If you want your regex to work you need to use non-greedy expression, i.e. change .* to .*?
Also your first align match attribute in the HTML is surrounded in '' quotation marks, not "" in the HTML, for some weird inconsistent reason. Try this:
|<td align=["\']center["\'] class="row2">.*?([\d,]+).*?</td>|is
Edit:
$a = file_get_contents('http://www.zajilnet.com/forum/index.php?showforum=31');
preg_match_all('|<td align=["\']center["\'] class="row2">.*?([\d,]+).*?</td>|is',$a,$m);
print_r($m[1]);
Result:
Array
(
[0] => 30
[1] => 16
[2] => 56
[3] => 14
[4] => 96
[5] => 4
[6] => 0
[7] => 17
[.... and more....]

Related

PHP Regex: How to get optional text if present?

Let's take an example of following string:
$string = "length:max(260):min(20)";
In the above string, :max(260):min(20) is optional. I want to get it if it is present otherwise only length should be returned.
I have following regex but it doesn't work:
/(.*?)(?::(.*?))?/se
It doesn't return anything in the array when I use preg_match function.
Remember, there can be something else than above string. Maybe like this:
$string = "number:disallow(negative)";
Is there any problem in my regex or PHP won't return anything? Dumping preg_match returns int 1 which means the string matches the regex.
Fully Dumped:
int 1
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)

You're using single character (.) matching in the case of being lazy, at the very beginning. So it stops at the zero position. If you change your preg_match function to preg_match_all you'll see the captured groups.
Another problem is with your Regular Expression. You're killing the engine. Also e modifier is deprecated many many decades before!!! and yet it was used in preg_replace function only.
Don't use s modifier too! That's not needed.
This works at your case:
/([^:]+)(:.*)?/
Online demo

I tried to prepare a regex which can probably solve your issue and also add some value to it
this regex will not only match the optional elements but will also capture in key value pair
Regex
/(?<=:|)(?'prop'\w+)(?:\((?'val'.+?)\))?/g
Test string
length:max(260):min(20)
length
number:disallow(negative)
Result
MATCH 1
prop [0-6] length
MATCH 2
prop [7-10] max
val [11-14] 260
MATCH 3
prop [16-19] min
val [20-22] 20
MATCH 4
prop [24-30] length
MATCH 5
prop [31-37] number
MATCH 6
prop [38-46] disallow
val [47-55] negative
try demo here
EDIT
I think I understand what you meant by duplicate array with different key, it was due to named captures eg. prop & val
here is the revision without named capturing
Regex
/(?<=:|)(\w+)(?:\((.+?)\))?/
Sample code
$str = "length:max(260):min(20)";
$str .= "\nlength";
$str .= "\nnumber:disallow(negative)";
preg_match_all("/(?<=:|)(\w+)(?:\((.+?)\))?/",
$str,
$matches);
print_r($matches);
Result
Array
(
[0] => Array
(
[0] => length
[1] => max(260)
[2] => min(20)
[3] => length
[4] => number
[5] => disallow(negative)
)
[1] => Array
(
[0] => length
[1] => max
[2] => min
[3] => length
[4] => number
[5] => disallow
)
[2] => Array
(
[0] =>
[1] => 260
[2] => 20
[3] =>
[4] =>
[5] => negative
)
)
try demo here

Finding (regex?) 10 digits in a row (PHP)

I am facing a problem i am not capable to solve. I have a string consisting of not needed text and 10 digit numbers who always start with "2" or "6". I need to get those in 10digit numbers into an array. I thought of regex and found this article Regular Expression for matching a numeric sequence? which is pretty close to what i need (except the descending/ascending thing) yet, as i could never and will NEVER be able to understand regex, i cant modify to my needs. If anyone could help me out here i would highly appreciate it!
Here is a sample of my string:
".........693 7098469 - ZQH X Bop. Hrtepou 50 flerpoUrroXn ........210 5014166 - 0E000PA E KapaoAn Anpn-rPou 21
EAArivtg .....................................................210 9618677 - MAPIA KapaoAri Arpn-rptou 21 Elanvolo .. 210 9643623 - MAPIA E ...................................................... 210 9643887 - MAPIA 0 loucrrivou 8 HX.toOrran ..............210 9914534 AIPITAKHE APTEMIOE n Avrtnopou 22
Reptcrrept ....._.........._......._................697 7440896 , -10AN."
Thank you very much in advance!
Greetings from Greece!

As I see your string your digits have an space between, and if you want strictly make your selections this is the regex:
[62]\d{2}\s*\d{7}
Explanation:
[62] # Start with 6 or 2
\d{2} # 2 more digits
\s* # any number of white spaces
\d{7} # 7 more digits
Live demo
and PHP code which has preg_match_all to match all occurrences of those strings:
preg_match_all("/[62]\d{2}\s*\d{7}/", $text, $matches);
Output:
Array
(
[0] => 693 7098469
[1] => 210 5014166
[2] => 210 9618677
[3] => 210 9643623
[4] => 210 9643887
[5] => 210 9914534
[6] => 697 7440896
)
PHP live demo

Maybe like this:
<?php
$x=
".........693 7098469 - ZQH X Bop. Hrtepou 50 flerpoUrroXn ........210 5014166 - 0E000PA E KapaoAn Anpn-rPou 21 EAArivtg ....................................................210 9618677 - MAPIA KapaoAri Arpn-rptou 21 Elanvolo .. 210 9643623 - MAPIA E ...................................................... 210 9643887 - MAPIA 0 loucrrivou 8 HX.toOrran ..............210 9914534 AIPITAKHE APTEMIOE n Avrtnopou 22
Reptcrrept ....._.........._......._................697 7440896 , -10AN.";
$x=str_replace(' ','',$x);
preg_match_all('/((2|6)\d{9})/',$x,$matches);
print_r($matches[0]);
And the result:
Array
(
[0] => 6937098469
[1] => 2105014166
[2] => 2109618677
[3] => 2109643623
[4] => 2109643887
[5] => 2109914534
[6] => 6977440896
)

there is a pretty cool page, that visualize the regex code for better understading:
https://www.debuggex.com/
this should work
((?:2|6)[0-9]{2} [0-9]{7})

Separating a few things with preg_split

For the life of me, I can't figure out how to write the regex to split this.
Lets say we have the sample text:
15HGH(Whatever)ASD
I would like to break it down into the following groups (numbers, letters by themselves, and parenthesis contents)
15
H
G
H
Whatever
A
S
D
It can have any combination of the above such as:
15HGH
12ABCD
ABCD(Whatever)(test)
So far, I have gotten it to break apart either the numbers/letters or just the parenthesis part broken away. For example, in this case:
<?php print_r(preg_split( "/(\(|\))/", "5(Test)(testing)")); ?>
It will give me
Array
(
[0] => 5
[1] => Test
[2] => testing
)
I am not really sure what to put in the regex to match on only numbers and individual characters when combined. Any suggestions?

I don't know if preg_match_all satisfying you:
$text = '15HGH(Whatever)ASD';
preg_match_all("/([a-z]+)(?=\))|[0-9]+|([a-z])/i", $text, $out);
echo '<pre>';
print_r($out[0]);
Array
(
[0] => 15
[1] => H
[2] => G
[3] => H
[4] => Whatever
[5] => A
[6] => S
[7] => D
)

I've got this: Example (I don't know how is written the \n) but the substitution is working.
(\d+|\w|\([^)]++\)) Not too much to explain, first tries to get a number, then a char, and if there's nothing there, tries to get a whole word between parentheses. (They can't be nested)

Check this out using preg_match_all():
$string = '15HGH(Whatever)(Whatever)ASD';
preg_match_all('/\(([^\)]+)\)|(\d+)|([a-z])/i', $string, $matches);
$results = array_merge(array_filter($matches[1]),array_filter($matches[2]),array_filter($matches[3]));
print_r($results);
\(([^\)]+)\) --> Matches everything between parenthesis
\d+ --> Numbers only
[a-z] --> Single letters only
i --> Case insensitive

$ not matching position immediately before a newline that is the last character

$ is not matching a position immediately before a newline that is the last character.
Ideally /1...$/ should match but match happens with the pattern /1....$/ which seems to be wrong.
What could be the reason?
PHP doc also says A dollar character ($) is an assertion which is TRUE only if the current matching point is at the end of the subject string, or immediately before a newline character that is the last character in the string (by default).
$subject = 'abc#
123#
';
$pattern = '/1...$/';
preg_match_all($pattern,$subject,$matches); // no match
Update:
I suspect extra dot due to \r\n format of newline.
I did following experiment and see some hint.
$pattern = '/1...(.)$/';
echo bin2hex($matches[1]); // 28
28 seems to be equal to \r (CR) so basically $ is matching before \n not before \r\n, that may be the reason of my problem.
Image after non printable character turn on

Issue was due to different newline representation of window file and linux file
Why this issue:
I created php file in window and transferred to linux where PHP was installed.
Windows uses \r\n to represent newline and linux \n ==> that's why initially it was taking extra dot to match.
Below experiment confirmed the same:
$subject = 'abc#
123#
';
$pattern = '/1...(.)$/';
preg_match_all($pattern,$subject,$matches);
echo bin2hex($matches[1]); // 28
// 28 is equivalent of \r or CR(carriage return)
Created new file in linux system and /1...$/ catches the match :)
I hope this will save someone's time if stuck with same problem.

Your string is multi-line. By default regex won't do multi-line. You have to add the m modifier for this to happen.
For example:
/1...$/m

I have been stuck on this issue for two days. I did a lot of testing to find any logic that lies behind this because it all depends on where your data comes from (internal and controlled vs. external and uncontrolled). In my case it was input field (<textarea>) on my website available from various browsers (and various OS-es) and there were no such problems in JavaScript with pattern testing/matching/checking. Here is a hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end ($) of any line in multiple lines mode (/m).
<?php
// Various OS-es have various end line (a.k.a line break) chars:
// - Windows uses CR+LF (\r\n);
// - Linux LF (\n);
// - OSX CR (\r).
// And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?).
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
// C 3 p 0 _
$pat1='/\w$/mi'; // This works excellent in JavaScript (Firefox 7.0.1+)
$pat2='/\w\r?$/mi'; // Slightly better
$pat3='/\w\R?$/mi'; // Somehow disappointing according to php.net and pcre.org when used improperly
$pat4='/\w(?=\R)/i'; // Much better with allowed lookahead assertion (just to detect without capture) without multiline (/m) mode; note that with alternative for end of string ((?=\R|$)) it would grab all 7 elements as expected
$pat5='/\w\v?$/mi';
$pat6='/(*ANYCRLF)\w$/mi'; // Excellent but undocumented on php.net at the moment (described on pcre.org and en.wikipedia.org)
$n=preg_match_all($pat1, $str, $m1);
$o=preg_match_all($pat2, $str, $m2);
$p=preg_match_all($pat3, $str, $m3);
$r=preg_match_all($pat4, $str, $m4);
$s=preg_match_all($pat5, $str, $m5);
$t=preg_match_all($pat6, $str, $m6);
echo $str."\n1 !!! $pat1 ($n): ".print_r($m1[0], true)
."\n2 !!! $pat2 ($o): ".print_r($m2[0], true)
."\n3 !!! $pat3 ($p): ".print_r($m3[0], true)
."\n4 !!! $pat4 ($r): ".print_r($m4[0], true)
."\n5 !!! $pat5 ($s): ".print_r($m5[0], true)
."\n6 !!! $pat6 ($t): ".print_r($m6[0], true);
// Note the difference among the three very helpful escape sequences in $pat2 (\r), $pat3 and $pat4 (\R), $pat5 (\v) and altered newline option in $pat6 ((*ANYCRLF)) - for some applications at least.
/* The code above results in the following output:
ABC ABC
123 123
def def
nop nop
890 890
QRS QRS
~-_ ~-_
1 !!! /\w$/mi (3): Array
(
[0] => C
[1] => 0
[2] => _
)
2 !!! /\w\r?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
3 !!! /\w\R?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
4 !!! /\w(?=\R)/i (6): Array
(
[0] => C
[1] => 3
[2] => f
[3] => p
[4] => 0
[5] => S
)
5 !!! /\w\v?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
6 !!! /(*ANYCRLF)\w$/mi (7): Array
(
[0] => C
[1] => 3
[2] => f
[3] => p
[4] => 0
[5] => S
[6] => _
)
*/
?>
Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.

regex to find year/month substring

Can someone help me with a regular expression to get the year and month from a text string?
Here is an example text string:
http://www.domain.com/files/images/2012/02/filename.jpg
I'd like the regex to return 2012/02.

This regex pattern would match what you need:
(?<=\/)\d{4}\/\d{2}(?=\/)

Depending on your situation and how much your strings vary - you might be able to dodge a bullet by simply using PHP's handy explode() function.
A simple demonstration - Dim the lights please...
$str = 'http://www.domain.com/files/images/2012/02/filename.jpg';
print_r( explode("/",$str) );
Returns :
Array
(
[0] => http:
[1] =>
[2] => www.domain.com
[3] => files
[4] => images
[5] => 2012 // Jack
[6] => 02 // Pot!
[7] => filename.jpg
)
The explode() function (docs here), splits a string according to a "delimiter" that you provide it. In this example I have use the / (slash) character.
So you see - you can just grab the values at 5th and 6th index to get the date values.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Regex skipping value - php

Related

PHP Regex: How to get optional text if present?

Finding (regex?) 10 digits in a row (PHP)

Separating a few things with preg_split

$ not matching position immediately before a newline that is the last character

regex to find year/month substring

Categories

Resources