Using preg_replace in PHP - Strange Behaviour

I have the following:
$string = '4745518 some text 4510018 some text 4743618 4745518 some text 4510518 some text';
$newstring = preg_replace('/[1-9]{7,7}/','NEWTRANSACTION: $0',$string);
My intent is "replace all occurrences of seven digits with 'NEWTRANSACTION: ' plus those seven digits."
However, my result is:
NEWTRANS: 4745518 some text 4510018 some text NEWTRANS: 4743618 NEWTRANS: 4745518
some text 4510518 some text
In other words, it appears that only some of the seven-digit groups are being replaced. If I edit the original string, shift the seven digit groups around, those same seven digit groups get replaced. It's like only certain combinations of numbers are being marked for replacement. My actual input string is hundreds of lines long, and it really appears that random seven-digit groups are being replaced.
Can anyone see what's wrong? Thanks in advance.
=== EDIT ===
Thanks for all of the help so quickly. I wound up using
/\b\d{7}\b/
and it works like a charm. I'm new to regex, so I learned a bit here -- although not realizing the missing '0' was total boneheadedness on my part.
My bad, showing 'NEWTRANSACTION: ' in the code, but showing 'NEWTRAN:' in the output. I was just typing the output, instead of copy/paste, and shortened it accidentally.
Thanks again.

Your code works fine after changing [1-9] to [0-9], because some of your seven-digit groups contain a 0 and [1-9] does not match it.
<?php
$string = '4745518 some text 4510018 some text 4743618 4745518 some text 4510518 some text';
echo $newstring = preg_replace('/[0-9]{7,7}/','NEWTRANSACTION: $0',$string);
https://eval.in/984686
Note: a much shorter pattern was given in the comments by @GrumpyCrouton, @kaii and @Barmar:
/\b\d{7}\b/
Output:-https://eval.in/984792
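To see why only some groups were replaced, here is a minimal sketch that runs the original character class and the word-boundary pattern side by side on the string from the question. The variable names $broken and $fixed are only for illustration.

```php
<?php
$string = '4745518 some text 4510018 some text 4743618 4745518 some text 4510518 some text';

// [1-9] excludes 0, so any seven-digit group containing a zero is skipped.
$broken = preg_replace('/[1-9]{7}/', 'NEWTRANSACTION: $0', $string);

// \d matches 0-9; \b stops the match from landing inside a longer run of digits.
$fixed = preg_replace('/\b\d{7}\b/', 'NEWTRANSACTION: $0', $string);

echo substr_count($broken, 'NEWTRANSACTION: '), PHP_EOL; // 3
echo substr_count($fixed, 'NEWTRANSACTION: '), PHP_EOL;  // 5
```

The two groups that the first pattern misses (4510018 and 4510518) are exactly the ones that contain a zero.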

Related

Detect phone number with preg_replace with some specifics

It's a basic preg_replace that detects phone numbers (and just long numbers). My problem is I want to avoid detecting numbers between double "", single '' and forward slashes //
$text = preg_replace("/(\+?[\d-\(\)\s]{8,25}[0-9]?\d)/", "<strong>$1</strong>", $text);
I poked around but nothing is working for me. Your help will be appreciated.

I predict that your pattern is going to let you down more than it is going to satisfy you (or you are very comfortable with "over-matching" within the scope of your project).
While my suggestion really blows out the pattern length, a (*SKIP)(*FAIL) technique will serve you well enough by consuming and discarding the substrings that require disqualification. There may be a way of dictating the pattern logic with lookaround instead, but with an initial pattern with so many potential holes in it and no sample data, there are just too many variables to make a confident suggestion.
Regex101 Demo
Code: (Demo)
$text = <<<TEXT
A number 555555555 then some more text and a quoted number "(123)4567890" and
then 1 2 3 4 6 (54) 3 -2 and forward slashed /+--------0/ versus
+--------0 then something more realistic '234 588 9191' no more text.
This is not closed by the same character on both
ends: "+012345678901/ which of course is a _necessary_ check?
TEXT;
echo preg_replace(
'~([\'"/])\+?[\d()\s-]{8,25}\d{1,2}\1(*SKIP)(*FAIL)|((?!\s)\+?[\d()\s-]{8,25}\d{1,2})~',
"<strong>$2</strong>",
$text);
Output:
A number <strong>555555555</strong> then some more text and a quoted number "(123)4567890" and
then <strong>1 2 3 4 6 (54) 3 -2</strong> and forward slashed /+--------0/ versus
<strong>+--------0</strong> then something more realistic '234 588 9191' no more text.
This is not closed by the same character on both
ends: "<strong>+012345678901</strong>/ which of course is a _necessary_ check?
For the technical breakdown, see the Regex101 link.
Otherwise, this is effectively checking for "phone numbers" (by your initial pattern) and if they are wrapped by ', ", or / then the match is ignored and the regex engine continues looking for matches AFTER that substring. I have added (?!\s) at the start of the second usage of your phone pattern so that leading spaces are omitted from the replacement.
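As a rough illustration of the (*SKIP)(*FAIL) idea on its own, here is a deliberately simplified sketch. It is not the phone pattern above: it assumes that "a number" is just a run of digits and that only double quotes disqualify a match. The sample input is made up.

```php
<?php
$text = 'call 12345 or "67890" now';

// First alternative: consume a whole double-quoted substring, then discard it
// with (*SKIP)(*FAIL) so the engine resumes searching after the closing quote.
// Second alternative: the thing we actually want to wrap.
$result = preg_replace('~"[^"]*"(*SKIP)(*FAIL)|\d+~', '<strong>$0</strong>', $text);

echo $result, PHP_EOL;
// call <strong>12345</strong> or "67890" now
```

The same shape scales up: put each "disqualifying wrapper" on the left of the alternation, followed by (*SKIP)(*FAIL), and keep the real pattern as the last alternative.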

It seems that you're not validating input, so you might be trying to write an expression with fewer boundaries, such as:
^\+?[0-9()\s-]{8,25}[0-9]$
If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also follow this link to see how it matches against some sample inputs.

PHP - While writing a fixed length file, PHP is not processing `\t` as Tab after the first one. Why?

I am writing a fixed length file using PHP. Each character will have a predefined length. If characters are less than the length then it will insert that number of blank spaces. Each field is separated by TAB.
$str1 = str_pad('ten',10)."\t";
$str2 = str_pad('seven',7)."\t";
$str3 = str_pad('fifteen',15)."\t";
$str = $str1.$str2.$str3;
file_put_contents("newfile.txt",$str);
But TAB is not working as expected.
I can see TAB only after first word (ten). After that there are no more TABs in the output file.

Your text is rendered correctly.
A tab does not mean a fixed number of spaces or a printable character. It means "forward the cursor to the next tab stop".
A tab stop in computers is typically set to every 4 or 8 characters. It's a display issue that will behave differently across various systems and user configurations.
If you copy your output to your preferred text editor and manually select it, you can see spaces as dots and tabs as lines, as in the examples below:
Note: your output is the first line ("ten seven fifteen"). I added the second and third lines to illustrate the configured tab length in the text editor.
If we set tab length to 4:
The first string contains 10 chars ("t", "e", "n" plus 7 dots), and the next tab stop is at the 12th char. So it's only 2 chars away, that's why your tab character (grey line) is only 2 chars long. The second string has 7 chars, and the next stop is only 1 char away so the tab will have only 1 char. It's the same logic for the third string.
If we set tab length to 8:
It follows the same logic, but in this example, the first tab is now bigger because the nearest stop is at position 16, so the tab has 6 chars. The second and third tabs have coincidentally only one char.
If we set tab length to an odd number like 3:
This is not common, but it's something possible. In this example, we can see all tabs as multiple spaces. It follows the same logic as explained above, but we can visually see that your code is producing the expected output.
Notice that I did not change your output in any way. I just played with the editor's configuration.
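The tab-stop arithmetic described above can be checked with a small sketch. The helper name tabWidths is hypothetical, and the code assumes single-byte characters and tab stops at every multiple of the tab size, which is how most editors behave.

```php
<?php
// Hypothetical helper: returns how many columns each tab occupies when the
// string is rendered with tab stops every $tabSize columns.
function tabWidths(string $text, int $tabSize): array
{
    $column = 0;
    $widths = [];
    foreach (str_split($text) as $char) {
        if ($char === "\t") {
            // A tab advances the cursor to the next multiple of $tabSize.
            $width = $tabSize - ($column % $tabSize);
            $widths[] = $width;
            $column += $width;
        } else {
            $column++;
        }
    }
    return $widths;
}

// Same string as in the question.
$str = str_pad('ten', 10) . "\t" . str_pad('seven', 7) . "\t" . str_pad('fifteen', 15) . "\t";

echo implode(',', tabWidths($str, 4)), PHP_EOL; // 2,1,1
echo implode(',', tabWidths($str, 8)), PHP_EOL; // 6,1,1
echo implode(',', tabWidths($str, 3)), PHP_EOL; // 2,2,3
```

All three tab characters are present in every case; only their rendered width changes with the tab size.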
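If you want a fixed-length spacer between your fields, you should use literal spaces instead of tabs. Note that "\s" is a regex shorthand, not a PHP string escape, so in a PHP string it is just a backslash followed by an "s". You can repeat a space like this: "    ", or using str_repeat(" ", 4); or " " * 4 if you were using Python.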

It looks like when you are adding the padding to the string it is considering the TAB character as a space. I would suggest you change the code to the following to get the desired effect.
$str1 = str_pad('ten',10);
$str2 = str_pad('seven',7);
$str3 = str_pad('fifteen',15);
$str = $str1."\t".$str2."\t".$str3."\t";
file_put_contents("newfile.txt",$str);
If there are a huge number of strings, I would suggest putting them in an array and using a foreach loop to append the tab character to each one.
By the way, there is no tab character in your final string; all you are seeing is the padding of spaces after the strings.

In your case, try this code:
$str1 = str_pad('ten',10)."\t";
$str2 = str_pad('seven',7)."\t\t";
$str3 = str_pad('fifteen',15)."\t\t";
$str = $str1.$str2.$str3;
file_put_contents("newfile.txt",$str);

Regex preg_match issue with commas

This is my code to preg_match an amount that looks like this: $ 99.00 (and it works):
if (preg_match_all('/[$]\s\d+(\.\d+)?/', $tout, $matches)) {
    $tot2 = $matches[0];
    $tot2 = preg_replace("/\\\$/", '', $tot2);
}
I need to do the same thing for an amount that looks like this (with a comma): $ 99,00
Thank you for your help. (Changing the dot to a comma does not help; there is an "escape" thing I do not understand.)
Ideally I need to preg_match any number that looks like an amount, with a dot or a comma and with or without a dollar sign before or after it (I know, it's a lot to ask :) since the result form I want to scan also contains phone and street numbers.
UPDATE (for some reason I cannot comment on replies): To test properly, I need to preg_replace the comma with a dot (since we are dealing with sums, I don't think calculations can be done on numbers with commas in them).
So to clarify my question, I should say: I need to transform, let's say, "$ 200,24" to "200.24" (amounts could be between 0.10 and 1000.99):
$tot2 = preg_replace("/\\\$/", '', $tot2);
(This code only deals with the $ (it works); I need an adaptation that also changes the (,) to a (.).)

No, using , in place of \. works perfectly fine.
It's just that your input does not contain a space between dollar sign and amount $ 99,00 like your .-using source did.
Make the \s optional.

How about:
$str='$ 200,24';
echo str_replace(array('$',',',' '), array('','.',''), $str);
output:
200.24

replace the . character with a character class [,.] which includes both a dot(.) and comma(,)
'/[$]\s\d+([.,]\d+)?/'
edit: comment is correct, regex fixed.
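Putting the suggestions in this thread together, here is a hedged sketch that extracts the amounts and converts them to floats in one pass. The sample input in $tout is made up, the arrow function needs PHP 7.4 or later, and the pattern assumes there are no thousands separators (an amount like 1,000.99 would not be handled correctly).

```php
<?php
// Made-up sample input: two amounts and an unrelated phone number.
$tout = 'Total: $ 99,00 Tax: $ 5.25 Phone: 555 1234';

// A dollar sign, an optional space, then digits with an optional decimal
// part separated by either a dot or a comma. Group 1 excludes the "$".
preg_match_all('/\$\s?(\d+(?:[.,]\d+)?)/', $tout, $matches);

// Swap commas for dots and cast, so the values can be used in arithmetic.
$amounts = array_map(
    fn(string $amount): float => (float) str_replace(',', '.', $amount),
    $matches[1]
);

echo array_sum($amounts), PHP_EOL; // 104.25
```

Requiring the dollar sign is what keeps the phone number out of the results; making it optional would mean every digit run in the text becomes a candidate.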

Select one number between others with spaces

Can I select the number 3433 in this example of generated file with so many spaces that I can not control?
BIOLOGIQUES 3433 130906 / 3842
Please see the example here : http://regexr.com?368ku
The number 3433 could change from one file to another, but it will always be in the same position.
I'm using regex with PHP.
It's a PDF document that I convert with the pdftotext tool from xpdf, so I need to extract that number, which changes from one PDF to another.
It's very badly positioned and I don't know how to capture it via regex.
I tried:
BIOLOGIQUES [^0-9]*\K([0-9]*)(.*)
http://regexr.com?368ku
but it takes all the numbers,
I need only the first one.

You are making this far too complicated. Something like this will work:
BIOLOGIQUES\s+(\d+)
Which matches the string "BIOLOGIQUES" literally, then one or more whitespace characters, then captures one or more digits, saving your number in capturing group 1.
Use it in PHP like this:
$str = 'DES ANALYSES BIOLOGIQUES 3433 130906 / 3842';
preg_match( '/BIOLOGIQUES\s+(\d+)/', $str, $matches);
echo $matches[1];
You can see from this demo that this produces:
3433

I tried BIOLOGIQUES[^0-9]*\K([0-9]*)() and it worked fine.

PHP: preg_replace to match some numbers but not others

So I've been working on a little project to write a syntax highlighter for a game's scripting language. It's all gone off without a hitch, except for one part: the numbers.
Take these lines for example
(5:42) Set database entry {healthpoints2} to the value 100.
(5:140) Move the user to position (29,40) on the map.
I want to highlight that 100 on the end, without highlighting the (5:42) or the 2 in the braces. The numbers won't always be in the same place, and there won't always only be one number.
I basically need a regexp to say:
"Match any numbers that aren't anywhere between {} and don't match the (#:#) pattern."
I've been at this for a day and a half now and I'm pulling out my hair trying to figure it out. Help with this would be greatly appreciated!
I've already looked at regular-expressions.info, and tried playing around with RegexBuddy, but i'm just not getting it :c
Edit: By request, here's some more lines copied right from the script editor.
(0:7) When somebody moves into position (**10** fhejwkfhwjekf **20**,
(0:20) When somebody rolls exactly **10** on **2** dice of **6** sides,
(0:31) When somebody says {...},
(3:3) within the diamond (**5**,**10**) - **20** //// **25**,
(3:14) in a line starting at (#, #) and going # more spaces northeast.
(5:10) play sound # to everyone who can see (#,#).
(5:14) move the user to (#,#) if there's nobody already there.
(5:272) set message ~msg to be the portion of message ~msg from position # to position #.
(5:302) take variable %var and add # to it.
(5:600) set database entry {...} about the user to #.
(5:601) set database entry {...} about the user named {...} to #.

You might kick yourself when you see this solution...
Assuming this desired number will always be used in a sentence, it should always have a space preceding it.
$pattern = '/ [0-9]+/s';
If the preceding space isn't always present, let me know and I'll update the answer.
Here's the updated regex to match the 2 examples in your question:
$pattern = '/[^:{}0-9]([0-9,]+)[^:{}0-9]/s';
Another update to account for your question revisions:
$pattern = '/[^:{}0-9a-z#]([0-9]+[, ]?[0-9]*)[^:{}0-9a-z#]/s';
So you don't highlight the number in things like
{update 29 testing}
you might want to pre strip the braces, like so:
$pattern = '/[^:{}0-9a-z#]([0-9]+[, ]?[0-9]*)[^:{}0-9a-z#]/s';
$str = '(0:7) Hello {update 29 testing} 123 Rodger alpha charlie 99';
$tmp_str = preg_replace('/{[^}]+}/s', '', $str);
preg_match($pattern, $tmp_str, $matches);
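An alternative that avoids pre-stripping is sketched below. It swaps in a different technique from the one in this answer: the (*SKIP)(*FAIL) verbs, which consume the {...} blocks and the (#:#) prefixes and discard them, so only the remaining numbers are wrapped. The <b> tag is just a placeholder for whatever highlighting markup you use, and the sketch assumes braces are never nested.

```php
<?php
$lines = [
    '(5:42) Set database entry {healthpoints2} to the value 100.',
    '(5:140) Move the user to position (29,40) on the map.',
];

// Alternatives 1 and 2 match what must NOT be highlighted and throw it away;
// alternative 3 matches any digit run that is left over.
$pattern = '~\{[^}]*\}(*SKIP)(*FAIL)|\(\d+:\d+\)(*SKIP)(*FAIL)|\d+~';

$highlighted = preg_replace($pattern, '<b>$0</b>', $lines);

echo implode(PHP_EOL, $highlighted), PHP_EOL;
// (5:42) Set database entry {healthpoints2} to the value <b>100</b>.
// (5:140) Move the user to position (<b>29</b>,<b>40</b>) on the map.
```

Because the replacement runs on the original string, the braces and their contents stay in the output untouched.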

(\d+,\s?\d+)|(?<![\(\{:]|:\d)\d+(?![\)\}])
http://regexr.com?30omd
Would this work?
