Regular expression doesn't quite work - php

I have created the regular expression below (in PHP); it must match ALL terms within the given string that contain only a-z0-9, ., _ and -.
My expression is: '~(?:\(|\s{0,},\s{0,})([a-z0-9._-]+)(?:\s{0,},\s{0,}|\))$~i'.
My target string is: ('word', word.2, a_word, another-word).
Expected terms in the results are: word.2, a_word, another-word.
I am currently getting: another-word.
My Goal
I am detecting a MySQL function from my target string, this works fine. I then want all of the fields from within that target string. It's for my own ORM.
I suppose there could be a situation whereby further parentheses are included inside this expression.

From what I can tell, you have a list of comma-separated terms and wish to find only the ones which satisfy [a-z0-9._\-]+. If so, this should be correct (it returns the correct results for your example at least):
'~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i'
The main issues were:
$ at the end, which was anchoring the query to the end of the string
When matching all, you continue from the end of the previous match - this means that if you consume a comma/close parenthesis at the end of one match, it's no longer there to match at the beginning of the next one. I've solved this with a lookbehind ((?<=...)) and a lookahead ((?=...)).
Your backslashes are best double-escaped, since PHP may consume a backslash when parsing the string literal. (In a single-quoted string this only happens before another backslash or a quote, so \s survives either way, but doubling is harmless.)
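To check the corrected pattern against the target string from the question, here is a minimal sketch. The variable names are illustrative only and not part of the original answer.

```php
<?php
// Pattern copied from the answer above; the input is the asker's target string.
$pattern = '~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i';
$target  = "('word', word.2, a_word, another-word)";

// Group 1 holds each bare term without the surrounding whitespace.
preg_match_all($pattern, $target, $matches);
$terms = $matches[1];

print_r($terms); // lists word.2, a_word, another-word
```

The quoted 'word' is skipped because the quote character is not in the allowed character class.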
EDIT: Since you said in a comment that some of the terms may be strings that contain commas you will first want to run your input through this:
$input = preg_replace('~(\'([^\']+|(?<=\\\\)\')+\'|"([^"]+|(?<=\\\\)")+")~', '"STRING"', $input);
which should replace all strings with '"STRING"', which will work fine for matching the other regex.

Maybe using a regex is overkill. For this kind of text you can just remove the parentheses and explode the string by comma.
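A rough sketch of that non-regex-heavy approach could look like the following. It assumes no term contains a comma itself, and it still uses one small pattern to drop the quoted entries; the variable names are made up for illustration.

```php
<?php
$target = "('word', word.2, a_word, another-word)";

// Strip the outer parentheses (and stray spaces), then split on commas.
$inner = trim($target, '() ');
$parts = array_map('trim', explode(',', $inner));

// Keep only the parts made up entirely of the allowed characters.
$terms = array_values(array_filter($parts, static function ($part) {
    return preg_match('/^[a-z0-9._-]+$/i', $part) === 1;
}));

print_r($terms); // lists word.2, a_word, another-word
```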

Related

PHP (preg) Regular Expression For Content Indexing/Update

I have the following code:
/* record 863.content.en */
UPDATE language_def
SET en='<html>blah blah markup</html>'
WHERE page_id=863,
AND string_id='content';
/* record_end 863.content.en */
I would like to create an expression to match that statement where:
the data in between the periods of 863.content.en are variable BUT SPECIFIC (there will be many of these statements in a row)
the data in between the two comments is variable but NOT specific
This is what I have so far:
'[/*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*/].*[/*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*/]'
There are a few problems with your regex.
First of all, as FrankeTheKneeMan pointed out, you need delimiters. # is a good choice for HTML matches (the standard choice is / but that interferes with tags too often):
'#[/*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*/].*[/*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*/]#'
Now while [.] is a nice way of escaping a single character, it doesn't work the same for [/*]. This is a character class, that matches either / or *. Same for [*/]. Use this instead:
'#/[*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*]/.*/[*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*]/#'
Now .* is the remaining problem. Actually there are two: one is critical, the other might not be. The first is that . does not match line breaks by default. You can change this by using the s (singleline) modifier. The second is that * is greedy. Should a section appear twice in the string, you would get everything from the first corresponding /* record to the last corresponding /* record_end, even if there is unrelated stuff in between. Since your records seem to be very specific, I suppose this is not the case. But it is still generally good practice to make the quantifier ungreedy, so that it consumes as little as possible. Here is your final regex string:
'#/[*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*]/.*?/[*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*]/#s'
For your presented example, this is
'#/[*]\s*record\s*863[.]content[.]en\s*[*]/.*?/[*]\s*record_end\s*863[.]content[.]en\s*[*]/#s'
If you want to find all of these sections, then you can make 863, content and en variable, capture them (using parentheses) and use a backreference to make sure you get the corresponding record_end:
'#/[*]\s*record\s*(\d+)[.](\w+)[.](\w+)\s*[*]/.*?/[*]\s*record_end\s*\1[.]\2[.]\3\s*[*]/#s'
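As a sketch of how the backreference version might be used with preg_match_all, consider the snippet below. The two-record input is made up for illustration (the second record is hypothetical), and the variable names are assumptions.

```php
<?php
// Pattern from the answer above, with captures and backreferences.
$pattern = '#/[*]\s*record\s*(\d+)[.](\w+)[.](\w+)\s*[*]/.*?/[*]\s*record_end\s*\1[.]\2[.]\3\s*[*]/#s';

// Hypothetical input containing two consecutive records.
$sql = <<<'SQL'
/* record 863.content.en */
UPDATE language_def
SET en='<html>blah blah markup</html>'
WHERE page_id=863
AND string_id='content';
/* record_end 863.content.en */
/* record 864.title.en */
UPDATE language_def
SET en='Hello'
WHERE page_id=864
AND string_id='title';
/* record_end 864.title.en */
SQL;

// PREG_SET_ORDER groups the captures per record instead of per group.
preg_match_all($pattern, $sql, $records, PREG_SET_ORDER);

$ids = [];
foreach ($records as $record) {
    // $record[0] is the whole section; 1..3 are number, string id, language.
    $ids[] = $record[1] . '.' . $record[2] . '.' . $record[3];
}
print_r($ids); // lists 863.content.en and 864.title.en
```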
'#/\* record (\S+) \*/.*<html>(.*)</html>.*/\* record_end \1 \*/#is'
This regular expression will split your string up into individual records. Feel free to replace any spaces with \s*; I left them this way for readability. \S+ matches one or more non-whitespace characters, but you can replace it with your specific strings if you like. Otherwise, you can loop over the matches returned by preg_match_all and use the first subcapture to get the specific record, and the second subcapture to get the information between the html tags. The #s are the delimiters PHP needs around the regular expression; the trailing modifiers are i for case-insensitive matching and s to make . match newlines.

Regex to find content of the last occurrence of square brackets

Hi Everybody,
I'm currently using preg_match and I'm trying to extract some information enclosed in square brackets.
So far, I have used this:
/\[(.*)\]/
But I want it to be only the content of the last occurrence - or the first one, if starting from the end!
In the following:
string = "Some text here [value_a] some more text [value_b]"
I need to get:
"value_b"
Can anybody suggest something that will do the trick?
Thanks!
Match against:
/.*\[([^]]+)\]/
using preg_match (no need for the _all version here, since you only want the last group) and capture the group inside.
Your current regex, with your input, would capture value_a] some more text [value_b. In the regex above, the leading .* swallows everything, but must backtrack for a [ to be matched -- and the first one it finds that way is the last one in the input.
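The difference between the two patterns can be seen side by side in this short sketch, which uses the string from the question (variable names are illustrative):

```php
<?php
$text = 'Some text here [value_a] some more text [value_b]';

// Original pattern: the greedy .* runs from the first "[" to the last "]".
preg_match('/\[(.*)\]/', $text, $greedy);

// Suggested pattern: the leading .* pushes the match to the last "[".
preg_match('/.*\[([^]]+)\]/', $text, $last);

echo $greedy[1], "\n"; // value_a] some more text [value_b
echo $last[1], "\n";   // value_b
```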
If you are only expecting numbers/letters (no symbols) you could use \[([\w\d]+)\] with preg_match_all() and pull the last element of the array as the end variable. You can add any custom symbols by escaping them in the character class definition.
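A minimal sketch of that preg_match_all() approach, using end() to take the last captured value (the variable names are assumptions):

```php
<?php
$text = 'Some text here [value_a] some more text [value_b]';

// Collect every bracketed word; group 1 holds the contents without brackets.
preg_match_all('/\[([\w\d]+)\]/', $text, $matches);

// end() returns the last element of the array of captures.
$lastValue = end($matches[1]);

echo $lastValue, "\n"; // value_b
```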
\[([^\]]*)\][^\[]*$
See it here on regexr
var someText="Some text here [value_a] some more text [value_b]";
alert(someText.match(/\[([^\]]*)\][^\[]*$/)[1]);
The part inside the brackets is stored in capture group 1, therefore you need to use match()[1] to access the result.
For simple brackets, see the source used for this answer: Regex for getting text between the last brackets ()

php regex question for matching google searchterms in url

I'm finding search words from Google request URLs.
I'm using
preg_match("/[q=](.*?)[&]/", $requesturl, $match);
but it fails when the 'q' parameter is the last parameter of the string.
So I need to fetch everything that comes after 'q=', but the match must stop IF it finds '&'.
How do I do that?
EDIT:
I eventually landed on this for matching google request url:
/[?&]q=([^&]+)/
Because sometimes they have a param that ends with q. like 'aq=0'
You need /q=([^&]+)/. The trick is to match everything except & in the query.
To build on your query, this is a slightly modified version that will (almost) do the trick, and it's the closest to what you have there: /q=(.*?)(&|$)/. It puts the q= out of the brackets, because inside the brackets it will match either of them, not both together, and at the end you need to match either & or the end of the string ($). There are, though, a few problems with this:
1. Sometimes you will have an extra & at the end of the match; you don't need it. To solve this problem you can use a lookahead: (?=&|$)
2. It introduces an extra group at the end (not necessarily bad, but can be avoided) -- actually, this is fixed by 1.
So, if you want a slightly longer query to expand what you have there, here it is: /q=(.*?)(?=&|$)/
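The lookahead variant can be tried on two URLs, one with q in the middle and one with q as the last parameter. This is only a sketch, and both URLs are made up for illustration:

```php
<?php
// Lookahead variant from the answer above: stop before "&" or at end of string.
$pattern = '/q=(.*?)(?=&|$)/';

// Hypothetical request URLs.
$urls = [
    'http://www.google.com/search?q=php+regex&hl=en',
    'http://www.google.com/search?hl=en&q=php+regex',
];

$found = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $match)) {
        // $match[0] is "q=..." without a trailing "&"; $match[1] is the value.
        $found[] = $match[1];
    }
}
print_r($found); // php+regex for both URLs
```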
Try this:
preg_match("/q=([^&]+)/", $requesturl, $match);
A little explaining:
[q=] will search for either q or =, but not one after another.
[&] is not needed as there is only one character. & is fine.
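the ? operator normally tells the regex to match 0 or 1 occurrences of the preceding element; directly after a quantifier, as in .*?, it instead makes that quantifier lazy (match as little as possible).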
[^&] will tell it to match any character except for &. Which means you'll get all the query string until it hits &.
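To illustrate why the asker ended up requiring ? or & in front of q, here is a small sketch with a made-up URL that also contains aq=0. The urldecode() call at the end is an addition for readability, not something the answers mention.

```php
<?php
// Hypothetical URL where another parameter name ends in "q" (aq=0).
$url = 'http://www.google.com/search?aq=0&q=php+regex';

preg_match('/q=([^&]+)/', $url, $loose);      // also matches inside "aq=0"
preg_match('/[?&]q=([^&]+)/', $url, $strict); // requires ? or & before q

$looseValue = $loose[1];              // "0", taken from aq=0
$searchTerm = urldecode($strict[1]);  // "php regex"

echo $looseValue, "\n", $searchTerm, "\n";
```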

Parse block with php regex

I'm trying to write a (I think) pretty simple RegEx with PHP but it's not working.
Basically I have a block defined like this:
%%%%blockname%%%%
stuff goes here
%%%%/blockname%%%%
I'm not any good at RegEx, but this is what I tried:
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/i',$input,$matches);
It returns an array with 4 empty entries.
I guess it also, apart from actually working, needs some sort of pointer for the third match because it should be equal to the first one?
Please enlighten me :)
You need to allow the dot to match newlines, and to allow ^ and $ to match at the start and end of lines (not just the entire string):
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/sm',$input,$matches);
The s (single-line) option makes the dot match any character including newlines.
The m (multi-line) option allows ^ and $ to match at the start and end of lines.
The i option is unnecessary in your regex since there are no letters in it.
Then, to answer the second part of your question: If blockname is the same in both cases, then you can make that explicit by using a backreference to the first capturing group:
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm',$input,$matches);
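As a sketch of how the backreference version behaves, the snippet below runs it over two blocks. The block names and contents are hypothetical, and the resulting array layout is just one possible way to collect the matches.

```php
<?php
// Pattern from the answer above: \1 forces the closing name to equal the opening one.
$pattern = '/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm';

// Hypothetical input with two blocks.
$input = "%%%%first%%%%\nstuff goes here\n%%%%/first%%%%\n"
       . "%%%%second%%%%\nmore stuff\n%%%%/second%%%%";

preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$blocks = [];
foreach ($matches as $m) {
    // $m[1] is the block name, $m[2] the raw content including line breaks.
    $blocks[$m[1]] = trim($m[2]);
}
print_r($blocks);
```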
I'm pretty sure you can't since these operations would need to save a variable and you can't in regex. You should try to do this using PHP's built-in token parser. http://php.net/manual/en/function.token-get-all.php

Replacing a string using preg_match

I'm having trouble using preg_match to find and replace a string. The string of interest is:
<span style="font-size:0.6em">EXPIRATION DATE: 04/30/2011</span>
I need to target and replace the date, "04/30/2011" with a different date. Can someone throw me a bone and give me the regular expression to match this pattern using preg_match in PHP? I also need it to match in such a way that it only replaces up to the first closing span and not closing span tags later in the code, e.g.:
<span style="font-size:0.6em">EXPIRATION DATE: 04/30/2011</span><span class="hello"></span>
I'm not versed in regex, and although I've spent the last hour trying to learn enough to make this work, I'm utterly failing. Thanks so much!
EDIT: As you can see this has gotten me exhausted. I did mean preg_replace, not preg_match.
If you're after a replacement, consider using preg_replace(), something like
preg_replace('#(\d{2})/(\d{2})/(\d{4})#', '<new date>', $string);
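One possible way to restrict the replacement to the expiration date is sketched below. Anchoring on the "EXPIRATION DATE:" label and passing a limit of 1 are assumptions added here, not part of the answer above, and the new date is a made-up value.

```php
<?php
$html = '<span style="font-size:0.6em">EXPIRATION DATE: 04/30/2011</span><span class="hello"></span>';
$newDate = '05/31/2012'; // hypothetical replacement date

// Capture the label so it can be put back; only the date itself is replaced.
$pattern = '#(EXPIRATION DATE:\s*)\d{2}/\d{2}/\d{4}#';

// ${1} rather than $1, because the replacement continues with a digit.
// The final argument limits the call to the first match.
$result = preg_replace($pattern, '${1}' . $newDate, $html, 1);

echo $result, "\n";
```

Because the pattern never touches the span tags, later closing spans are left alone.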
How about this:
$toBeFoundPattern = '/([0-9][0-9])\/([0-9][0-9])\/([0-9][0-9][0-9][0-9])/';
$toBeReplacedPattern = '$2.$1.$3';
$inString = '<span style="font-size:0.6em">EXPIRATION DATE: 04/30/2011</span>';
// Will convert from US date format 04/30/2011 to european format 30.04.2011
echo preg_replace( $toBeFoundPattern, $toBeReplacedPattern, $inString );
and prints
<span style="font-size:0.6em">EXPIRATION DATE: 30.04.2011</span>
Patterns always begin and end with identical so-called delimiter characters. Often the character / is used.
$1 references the text matched by the first ([0-9][0-9]) group, $2 the text matched by the second (...) group, and $3 the four digits matched by the last (...).
[...] matches a single character, which is one of those listed inside the brackets. E.g. [a-z] matches all lower case letters.
To use the special meaning character / inside of a pattern, you need to escape it by \ to make it be the literal slash character.
Update: Using {..} as pointed out below is shorthand for repeated patterns.
Regex should be:
(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d
If you want to only match one instance, this is OK. For multiple instances, use preg_match_all instead. Taken from http://www.regular-expressions.info/regexbuddy/datemmddyyyy.html.
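A short sketch of that pattern in PHP follows. The # delimiters and the ^ and $ anchors are additions made here so that the whole string is validated. Note that the pattern only checks ranges, so it would still accept a date such as 02/31/2011.

```php
<?php
// Pattern from the answer above, wrapped in # delimiters and anchored
// (the anchors are an assumption for whole-string validation).
$datePattern = '#^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$#';

$valid   = preg_match($datePattern, '04/30/2011'); // month 04 is in range
$invalid = preg_match($datePattern, '13/30/2011'); // month 13 is rejected

var_dump($valid, $invalid);
```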
Edit: are you looking to just search and replace inside a PHP script or do you want to do some javascript live replacement?
