I'm trying to get all numbers from a string, having - or _ before the number and optional - _ space or the end of string at the end of the number.
So, my regex looks like this:
[-_][\d]+[-_ $]?
My problem is that numbers directly following each other are not all matched. From a "foo-5234_2123_54-13-20" string, I only get 5234, 54 and 20.
What I tried is the following regexes: (?:[-_])[\d]+(?:[-_ $])? and [-_]([\d]+)[-_ $]? that obviously didn't work. I've been looking for hours now, and I know it can't be that hard, so I hope someone can help me here.
If that makes any difference, I'm using PHP preg_match_all.
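You just need to use look-arounds, so that the delimiters are checked without being consumed (in your pattern the trailing delimiter is consumed, so it is no longer available as the leading delimiter of the next number):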
(?<=[-_])\d+(?=[-_ ]|$)
Fortunately, PHP supports at least fixed-width look-behinds, and we can use it here.
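A minimal sketch comparing the two patterns on the string from the question. The variable names are only illustrative; the patterns are the ones quoted above.

```php
<?php
// Sample string from the question.
$subject = 'foo-5234_2123_54-13-20';

// Original pattern: the optional trailing delimiter is consumed, so the
// next number no longer has a leading delimiter left to match.
preg_match_all('/[-_][\d]+[-_ $]?/', $subject, $consuming);

// Look-around pattern: delimiters are asserted but not consumed.
preg_match_all('/(?<=[-_])\d+(?=[-_ ]|$)/', $subject, $numbers);

echo implode(', ', $consuming[0]), "\n"; // -5234_, _54-, -20
echo implode(', ', $numbers[0]), "\n";   // 5234, 2123, 54, 13, 20
```

Because the look-behind is a single character class, it is fixed-width and therefore fine for PCRE.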
Running into this problem and I've been searching for days. I'm using PHP to parse formulas for a platform.
A formula could be something like:
object.Field
ADD(object.NumberOfTHings, object.NumberOfThings)
object.DoSomething(ADD(object.NumberOfTHings, object.NumberOfThings), 'words!')
The idea is, it can nest many levels. Users can include quotes (double and single) as well.
I'm working on a function that will return each parameter at the highest level. So
object.DoSomething(ADD(object.NumberOfTHings, object.NumberOfThings), 'words!')
Will need to return the following array:
ADD(object.NumberOfTHings, object.NumberOfThings)
'words!'
We then go back and parse each parameter appropriately (some are object calls, function calls, etc.). I'm open to parsing it all at once, but figured that would just be more complicated.
My current regex is as follows:
/(?'pullsinglequotes'\'.+?\')|(?'pulldoublequotes'\".+?\")|(?'pullfunctions'[^,]\(([^()]|(?R))*\))/
It MOSTLY works, but has two issues:
Won't return objects yet (ex. if I reference object.Field as a parameter).
Only includes the last letter of a function.
Here's a REGEXR with the issue:
https://regexr.com/41e20
I've tried many different variations of REGEX and each has its downsides.
My question is: Does anyone have enough regex knowledge to solve those two issues? If so, any help would be greatly appreciated.
Update
If anyone is interested, the following was my final regex.
/(?'pullsinglequotes'\'.+?\')|(?'pulldoublequotes'\".+?\")|(?'pullfunctions'\b[\w.]+\s*\(([^()]|(?R))*\))|(?'pullvars'\w+(?:\.\w+)?)/
Your pullfunctions group only matches one character that's not a comma, followed by a parenthesis. Allow that character to repeat and precede it with a word boundary.
For the vars and objects, just use a repeating word character with an optional dot-separated part. You can change this to a character class to allow other characters like - (note that \w already matches _).
Full regex:
(?'pullsinglequotes'\'.+?\')|(?'pulldoublequotes'\".+?\")|(?'pullfunctions'\b[\w]+\s*\(([^()]|(?R))*\))|(?'pullvars'\w+(?:\.\w+)?)
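Here is a rough sketch of how the final pattern from the question's update (the variant with [\w.]) could be used to pull the top-level parameters. It assumes the argument list of the outer call has already been extracted into a string; $arguments and $params are illustrative names, not part of the original code.

```php
<?php
// Final pattern from the update, kept in a nowdoc so that quotes and
// backslashes reach PCRE exactly as written.
$pattern = <<<'REGEX'
/(?'pullsinglequotes'\'.+?\')|(?'pulldoublequotes'\".+?\")|(?'pullfunctions'\b[\w.]+\s*\(([^()]|(?R))*\))|(?'pullvars'\w+(?:\.\w+)?)/
REGEX;

// Assumption: this is the text between the outer parentheses of
// object.DoSomething(...).
$arguments = "ADD(object.NumberOfTHings, object.NumberOfThings), 'words!'";

preg_match_all($pattern, $arguments, $matches);

// Index 0 holds the full text of each top-level match.
$params = $matches[0];
print_r($params);
```

The nested call is returned as one piece because (?R) recurses into the whole pattern whenever a function call appears inside the parentheses.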
I am stuck trying to create a regex that will allow for letters, numbers, and the following chars: _ - ! ? . ,
Here is what I have so far:
/^[-\'a-zA-Z0-9_!?,.\s]+$/ //not escaping the ?
and this version too:
/^[-\'a-zA-Z0-9_!\?,.\s]+$/ //attempting to escape the ?
Neither of these seem to be able to match the following:
"Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?"
Can somebody point out what I am doing wrong? I must point out that my script takes the user input (the paragraph in quotes in this case) and strips all white space so actual input has no white space.
Thanks!
UPDATE:
Thanks to Lix's advice, this is what I have so far:
/^[-\'a-zA-Z0-9_!\?,\.\s]+$/
However, it's still not working??
UPDATE2
Ok, based on input this is what's happening.
User inputs string, then I run the string through following functions:
$comment = preg_replace('/\s+/', '',
htmlspecialchars(strip_tags(trim($user_comment_orig))));
So in the end, user input is just a long string of chars without any spaces. Then that string of chars is run using:
preg_match("#^[-_!?.,a-zA-Z0-9]+$#",$comment)
What could possibly be causing trouble here?
FINAL UPDATE:
Ended up using this regex:
"#[-'A-Z0-9_?!,.]+#i"
Thanks all! lol, y'all are going to kill me once you find out where my mistake was!
Ok, so I had this piece of code:
if(!preg_match($pattern,$comment) || strlen($comment) < 2 || strlen($comment) > 60){
GEEZ!!! I never bothered to look at the strlen part of the code. Of course it was going to fail every time...I only allowed 60 chars!!!!
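To make the mistake concrete, here is a small sketch that checks the two conditions separately. It assumes the whitespace-stripped sample paragraph as input and leaves out the strip_tags/htmlspecialchars step; $patternOk and $lengthOk are illustrative names.

```php
<?php
$input = "Oh why, oh why is this regex not working! It's getting pretty frustrating? "
       . "Frustrating - that is to say the least. Hey look, an underscore_ "
       . "I wonder if it will match this time around?";

// Same whitespace stripping as in the question.
$comment = preg_replace('/\s+/', '', trim($input));

// Anchored, case-insensitive variant of the final character class.
$pattern = '#^[-\'A-Z0-9_?!,.]+$#i';

$patternOk = preg_match($pattern, $comment) === 1;
$lengthOk  = strlen($comment) >= 2 && strlen($comment) <= 60;

var_dump($patternOk); // bool(true): the regex was never the problem
var_dump($lengthOk);  // bool(false): the comment is longer than 60 characters
```

Testing each condition on its own like this would have pointed at the length check right away.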
When in doubt, it's always safe to escape non alphanumeric characters in a class for matching, so the following is fine:
/^[\-\'a-zA-Z0-9\_\!\?\,\.\s]+$/
When run through a regular expression tester, this finds a match with your target just fine, so I would suggest you may have a problem elsewhere if that doesn't take care of everything.
I assume you're not including the quotes you used around the target when actually trying for a match? Since you didn't build double quote matching in...
You also wrote: "I must point out that my script takes the user input (the paragraph in quotes in this case) and strips all white space so actual input has no white space."
In that case you don't need the \s, provided the stripping works correctly.
I got the following code to work as expected (running PHP 5):
<?php
$pattern = "#[-'A-Z0-9_?!,.\s]+#i";
$string = "Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?";
$results = array();
preg_match($pattern, $string, $results);
echo '<pre>';
print_r($results);
echo '</pre>';
?>
The output from print_r($results) was as follows:
Array
(
[0] => Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?
)
Tested on http://writecodeonline.com/php/.
It's not necessary to escape most characters inside []. Also, \s keeps its meaning inside a character class in PCRE, so your class is fine as written. If you prefer to be explicit, you have two options: either manually expand (/^[-\'a-zA-Z0-9_!?,. \t\n\r]+$/) or use alternation (/^(?:[-\'a-zA-Z0-9_!?,.]|\s)+$/).
Note that I left the \ before the ' because I'm assuming you're putting this in a PHP string and I wouldn't want to suggest a syntax error.
The only characters with a special meaning within a character class are:
the caret, but only if it is the first character (it then negates the class),
the dash (since it is used to delimit ranges), except if it is at the beginning or the end (since in that case it cannot be part of any range),
the closing bracket,
the backslash.
In "pure regex parlance", your character class can be written as:
[-_!?.,a-zA-Z0-9\s]
Now, you need to escape whatever needs to be escaped according to your language and how strings are written. Given that this is PHP, you can take the above sample as is. Note that \s is interpreted in character classes as well, so this will match anything which is matched by \s outside of a character class.
While some manuals recommend using escapes for safety, knowing the general regex rules for character classes and applying them leads to shorter and easier to read results ;)
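A quick way to convince yourself is to run both spellings of the class against the same text. This is only a sketch; the sample sentence and variable names are made up for illustration.

```php
<?php
$text = "Hey look, an underscore_ I wonder if it's going to match?";

// Unescaped class: the dash comes first, so it is a literal dash.
$plain = '/^[-_!?.,\'a-zA-Z0-9\s]+$/';

// "Escape everything" class: legal, but noisier.
$escaped = '/^[\-\_\!\?\.\,\'a-zA-Z0-9\s]+$/';

$plainResult   = preg_match($plain, $text);
$escapedResult = preg_match($escaped, $text);

// A caret as the first character negates the class.
$negatedResult = preg_match('/^[^0-9]+$/', 'no digits here');

var_dump($plainResult, $escapedResult, $negatedResult); // int(1) three times
```

Both classes accept the spaces in the sample, which also shows that \s is honoured inside the brackets.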
I want to find any pattern matching: ###-##-####
and replace the ###-##, with ***-**
but leave the -####
I tried this below, but nothing is being replaced at all.
preg_replace('/(^[\d]{3})(-)([\d]{2})(-[\d]{4}$)/','\2\4',$myText);
Any help is appreciated
Update, here is my entire code string as it currently stands, after trying a few of the suggestions below. I am comparing the second echo output to the first... and the social numbers all remain the same.
Also, as was mentioned below, my string contains more than just a social: it is thousands of characters long, which I think is my real issue. Sorry if I didn't make that clear in the beginning.
//Make the CSC credit report request.
$strCscResponse = $Csc->makeRequest($strFixedFormatRecord);
echo "<br/><br/><pre>" . $strCscResponse . "</pre><br/><br/>";
$strCscResponse = str_replace("!", " ", $strCscResponse);
$strCscResponse = preg_replace('/^\d{3}-\d{2}(-\d{4})$/','***-**$1',$strCscResponse);
echo "<br/><br/><pre>" . $strCscResponse . "</pre><br/><br/>";
update
I'd like to mark all the answers as "the answer", because I didn't clarify that the string has more than just a social in it. Thank you for the help with this issue; embarrassingly enough, it has been driving me wild for a couple of days now.
There is one possible problem: you might not be matching the right string (if you are trying to find SSNs buried in a large block of text) - the ^ and $ anchors will only match beginning of string (or sometimes beginning of line) - if this is not what you want, but instead you want to find SSNs in a long string, you need to get rid of those anchors.
The other problem, potentially, is that you seem to want to replace things with asterisks, but you do not include asterisks in your replacement expression. You need to use a replacement expression like
`***-**\4`
Try this regex:
(\d{3})(-)(\d{2})(-\d{4})
Try this:
preg_replace('/^\d{3}-\d{2}(-\d{4})$/','***-**$1',$myText);
You have ^ and $ in your pattern, but I see no m modifier, so this will only match if ###-##-#### is the entire string.
[\d] can be shortened to \d.
Your \2\4 will leave --####. If you wanted ***-**-####, you can simply use ***-**\4.
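Putting the answers together, a sketch of the unanchored replacement might look like this. The sample text is invented, and the \b word boundaries are my own assumption to avoid matching inside longer digit runs.

```php
<?php
// Invented sample: SSNs buried in a longer string.
$text = "Applicant SSN: 123-45-6789, co-applicant SSN: 987-65-4321.";

// Anchored pattern: only matches when the SSN is the entire string.
$anchored = preg_replace('/^\d{3}-\d{2}(-\d{4})$/', '***-**$1', $text);

// Unanchored pattern: finds every SSN inside the text.
$masked = preg_replace('/\b\d{3}-\d{2}(-\d{4})\b/', '***-**$1', $text);

echo $anchored, "\n"; // unchanged
echo $masked, "\n";   // Applicant SSN: ***-**-6789, co-applicant SSN: ***-**-4321.
```

Only the last group is captured, so the replacement needs just one back-reference.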
I'm building this regex with a positive look-ahead in it. Basically it must select all text in the line up to the last period that precedes a ":" and add a "|" to the end to delimit it. Some sample text is below. I am testing this in gskinner and EditPad Pro, which apparently has full grep regex support, so if I could get the answers in that form I'd appreciate it.
The regex below works to a degree, but I am unsure if it is correct. Also, it falls down if the text contains brackets.
Finally, I would like to add another ignore rule like the one that ignores but includes "Co." in the selection. This second rule would ignore but include periods that have a single capital letter before them. Sample text is below too. Thanks for all the help.
^(?:[^|]+\|){3}(.*?)[^(?:Co)]\.(?=[^:]*?\:)
121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.
122| O' Toole, H.Y. |2004. |(Note on the regex). Pages 90-91 In: Ryan, A. & Toole, B.L. (Editors) Guide to the regex functionality in php. Timmy, Tommy& Stewie, Quohog. * Produced for Family Guy in Quohog.
I don't think I understand what you want to do. But this part [^(?:Co)] is definitely not correct.
With the square brackets you are creating a character class, because of the ^ it is a negated class. That means at this place you don't want to match one of those characters (?:Co), in other words it will match any other character than "?)(:Co".
Update:
I don't think it's possible. How should I distinguish between L. Co. or something similar and the end of the sentence?
But I found another error in your regex. The last part (?=[^:]*?\:) should be (?=[^.]*?\:) if you want to match the last dot before the ":". With your expression it will match on the first dot.
This seems to do what you want.
(.*\.)(?=[^:]*?:)
It quite simply matches all text up to the last full stop that occurs before the colon.
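As a rough illustration in PHP (an assumption on my part, since the question was tested in other tools), the pattern can be combined with preg_replace to append the "|" delimiter to the first sample line:

```php
<?php
// First sample line from the question.
$line = "121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.";

// Greedy .* backtracks to the last period that still has a colon after it.
$delimited = preg_replace('/(.*\.)(?=[^:]*?:)/', '$1|', $line);

echo $delimited, "\n";
// 121| Ryan, T.N. |2001. |I like regex.| But does it like me (2) 2: 615-631.
```

The final period of the line is skipped because no colon follows it, so the look-ahead fails there.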
I'm new to regular expressions in PHP.
I have some data in which some of the values are stored as zero (0). What I want to do is replace them with '-'. I don't know which values will be zero, as my database table is updated daily, so I have to apply the replacement to all the data.
$r_val=preg_replace('/(0)/','-',$r_val);
The code I'm using replaces every zero it finds; e.g. it even replaces the zero in 104.67, giving the output 1-4.67, which is wrong. I want only values that are exactly zero to be replaced by '-', not every zero it encounters.
Can anyone please help!!
Examples of the values that $r_val can hold:
10.31,
391.05,
113393,
15.31,
1000 etc.
This depends a lot on how your data is formatted inside $r_val, but a good place to start would be to try:
$r_val = preg_replace('/(?<!\.)\b0\b(?!\.)/', '-', $r_val);
Here \b is a zero-width assertion that matches at the start or end of a 'word'.
Strange as it may sound, the Perl regex documentation is actually really good for explaining the regex part of the preg_* functions, since they are built on PCRE, a library that follows Perl's regex syntax.
Again, it would be more than helpful if you could supply an example of what the $r_val string really looks like.
Note that \b on its own matches at any word boundary, so a plain /\b0\b/ would also turn a string like "0.75" into "-.75". Not a desirable result, I guess. The look-arounds for the dot are what prevent that.
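A short sketch of the difference, assuming $r_val holds a comma-separated list (the question does not say how the data is actually laid out):

```php
<?php
// Assumed layout: several values in one comma-separated string.
$r_val = '10.31, 0, 104.67, 0.75, 1000';

// Word boundaries only: the zero in "0.75" is also replaced.
$boundaryOnly = preg_replace('/\b0\b/', '-', $r_val);

// Word boundaries plus look-arounds that reject a neighbouring dot.
$withLookarounds = preg_replace('/(?<!\.)\b0\b(?!\.)/', '-', $r_val);

echo $boundaryOnly, "\n";    // 10.31, -, 104.67, -.75, 1000
echo $withLookarounds, "\n"; // 10.31, -, 104.67, 0.75, 1000
```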
Whilst the other answer does work, it seems overly complex to me. I think you need only to use the ^ and $ chars either side of 0.
$r_val = preg_replace('/^0+$/', '-', $r_val);
^ indicates the regex should match from the beginning of the string (or of each line, with the m modifier).
$ indicates the regex should match to the end of the string (or of each line, with the m modifier).
+ means match this pattern 1 or more times.
I altered the minus sign to its HTML code equivalent too. Paranoid, yes, but we are dealing with numbers after all, so I thought throwing a raw minus sign in there might not be the best idea.
Why not just do this?
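// is_numeric() guards against non-numeric strings, which compare equal to 0 in PHP versions before 8
if ( is_numeric($r_val) && $r_val == 0 )
$r_val = '-';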
You do not need to use a regex for this. In fact, I'd advise against doing so for performance reasons. The operation above is approximately 20x faster than the regex solution.
Also, the PHP manual advises against using regexes for simple replacements:
If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().
http://us.php.net/manual/en/function.str-replace.php
Hope that helps!
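If the values arrive as an array, which is an assumption on my part, the same comparison can be applied per value. dashIfZero is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: returns '-' for values that are numerically zero
// and leaves everything else untouched.
function dashIfZero($value)
{
    // is_numeric() keeps non-numeric strings from being treated as zero.
    return (is_numeric($value) && (float) $value == 0.0) ? '-' : $value;
}

// Sample values in the style of the question, plus a zero, "0.00" and a
// non-numeric entry for illustration.
$values = ['10.31', '0', '113393', '0.00', '1000', 'n/a'];

$cleaned = array_map('dashIfZero', $values);
print_r($cleaned); // 10.31, -, 113393, -, 1000, n/a
```

This treats "0.00" as zero too, which may or may not be what you want for your data.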