Having a little trouble with my RegEx today


I am having a bit of a hard time with my regex today:
\('[\d',]+
In the string:
INSERT INTO `order_status_histories` VALUES ('3602','52efabe9-5f8c-4512-a994-3227c63dd20e','1','','Order recieved','2014-02-03 16:47:05','2014-02-03 16:47:05'),('3603','52eff713-54fc-4be0-9389-68d5c63dd20e','1','','Order recieved','2014-02-03 22:07:47','2014-02-03 22:07:47'),('3604','52effd1a-bc14-4095-97fd-6d46c63dd20e','1','','Order recieved','2014-02-03 22:33:30','2014-02-03 22:33:30')
As you can see, this is an INSERT statement. The first value in each record is the ID of the record, which I do not need inserted, so I am attempting to find all of them and simply blank them out. To do so I need to match the number, the two ' characters around it, and the , after it, so I thought that I would start with the opening (.
The regex I posted here is grabbing what I need, but also a bit extra: it seems to be grabbing this ('3670','5304 (for instance in that first insertable record)
How can I do what I need here?

What about \('\d+', - so explicitly looking for digits, then ' then ,

The character class [\d',] isn't doing what you want - it's matching both the opening quote of the second field and the decimal digits after it, until it reaches a letter.
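To make the difference concrete, here is a minimal sketch that runs both patterns against a shortened copy of the statement from the question. It assumes that "blanking out" the ID means dropping the first value entirely; if the ID column stays in the statement, replacing with "(NULL," instead of "(" may be closer to what is needed.

```php
<?php
// Sample from the question, shortened to two records.
$sql = "INSERT INTO `order_status_histories` VALUES "
     . "('3602','52efabe9-5f8c-4512-a994-3227c63dd20e','1','','Order recieved','2014-02-03 16:47:05','2014-02-03 16:47:05'),"
     . "('3603','52eff713-54fc-4be0-9389-68d5c63dd20e','1','','Order recieved','2014-02-03 22:07:47','2014-02-03 22:07:47')";

// Original pattern: the class [\d',] also accepts the quote, the comma and
// the leading digits of the second field, so the match runs too far.
preg_match("/\\('[\\d',]+/", $sql, $m);
$overMatch = $m[0];

// Suggested pattern: "(", a quote, digits only, a quote, then the comma.
// Replacing with "(" removes the ID value (assumption: that is the goal).
$stripped = preg_replace("/\\('\\d+',/", "(", $sql);

echo $overMatch, "\n";
echo $stripped, "\n";
```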

Related

Retrieve 0 or more matches from comma separated list inside parenthesis using regex

I am trying to retrieve matches from a comma separated list that is located inside parenthesis using regular expression. (I also retrieve the version number in the first capture group, though that's not important to this question)
What's worth noting is that the expression should ideally handle all possible cases, where the list could be empty or could have more than 3 entries = 0 or more matches in the second capture group.
The expression I have right now looks like this:
SomeText\/(.*)\s\(((,\s)?([\w\s\.]+))*\)
The string I am testing this on looks like this:
SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)
Result of this is:
1. [6-11] `1.0.4`
2. [32-52] `, Macbook Pro Retina`
3. [32-34] `, `
4. [34-52] `Macbook Pro Retina`
The desired result would look like this:
1. [6-11] `1.0.4`
2. [32-52] `debug`
3. [32-34] `OS X 10.11.2`
4. [34-52] `Macbook Pro Retina`
According to the image above (as far as I can see), the expression should work on the test string. What is the cause of the weird results and how could I improve the expression?
I know there are other ways of solving this problem, but I would like to use a single regular expression if possible. Please don't suggest other options.

When dealing with a varying number of groups, regex ain't the best. Solve it in two steps.
First, break down the statement using a simple regex:
SomeText\/([\d.]*) \(([^)]*)\)
1. [9-14] `1.0.4`
2. [16-55] `debug, OS X 10.11.2, Macbook Pro Retina`
Then just explode the second result by ', ' (comma plus space) to get your groups.
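A minimal sketch of the two steps in PHP. The function name parseSomeText is made up for illustration, and trimming each piece after splitting on a bare comma is my own choice so that inconsistent spacing does not matter.

```php
<?php
// Hypothetical helper: returns the version and the list entries, or null
// when the input does not have the expected shape.
function parseSomeText(string $input): ?array
{
    // Step 1: one simple regex for the version and the raw list.
    if (!preg_match('/SomeText\/([\d.]*) \(([^)]*)\)/', $input, $m)) {
        return null;
    }
    // Step 2: split the raw list; an empty "()" yields no entries at all.
    $items = $m[2] === '' ? [] : array_map('trim', explode(',', $m[2]));

    return ['version' => $m[1], 'items' => $items];
}

print_r(parseSomeText('SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)'));
```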

Probably the \G anchor works best here for binding the match to an entry point. This regex is designed for input that is always similar to the sample that is provided in your question.
(?<=SomeText\/|\G(?!^))[(,]? *\K[^,)(]+
(?<=SomeText\/|\G) the lookbehind is the part where matches should be glued to
\G matches where the previous match ended; (?!^) keeps it from matching at the start of the string
[(,]? * matches an optional opening parenthesis or comma, followed by any number of spaces
\K resets beginning of the reported match
[^,)(]+ matches the wanted characters, that are none of ( ) ,
Demo at regex101 (grab matches of $0)
Another idea with use of capture groups.
SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)
This one, without the lookbehind, is a bit more accurate (it also requires the opening parenthesis), performs better (needs fewer steps) and is probably easier to understand and maintain.
SomeText\/([^(]*)\( the entry anchor and version is captured here to $1
|\G(?!^),? *([^,)]+) or glued to previous match: capture to $2 one or more characters, that are not , ) preceded by optional space or comma.
Another demo at regex101
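For reference, a rough sketch of how the capture-group variant could be consumed from PHP with preg_match_all. The loop and variable names are mine; note that [^(]* also picks up the space before the parenthesis, so the version is trimmed here.

```php
<?php
$input   = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';
$pattern = '/SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)/';

// PREG_SET_ORDER gives one array per match, in the order they were found.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$version = null;
$items   = [];
foreach ($matches as $m) {
    if (isset($m[2]) && $m[2] !== '') {
        $items[] = $m[2];        // a list entry, glued to the previous match by \G
    } else {
        $version = trim($m[1]);  // the entry match: "SomeText/<version> ("
    }
}

echo $version, "\n";
print_r($items);
```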

Actually, stribizhev was close:
(?:SomeText\/([^() ]*)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\))
Just had to make that one class expect at least one match
(?:SomeText\/([0-9.]+)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\)) is a little more clear as long as the version number is always numbers and periods.

I wanted to come up with something more elegant than this (though this does actually work):
SomeText\/(.*)\s\(([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?\)
Obviously, the
([^\,]+)?\,?\s?
is repeated 6 times.
(It can be repeated any number of times and it will work for any number of comma-separated items equal to or below that number of times).
I tried to shorten the long, repetitive list of ([^\,]+)?\,?\s? above to
(?:([^\,]+)\,?\s?)*
but it doesn't work and my knowledge of regex is not currently good enough to say why not.
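The likely reason: in PCRE a capture group has one slot, no matter how often it is repeated. Each pass through the * loop overwrites that slot, so after the match only the last entry is left. The sketch below shows this with the shortened pattern (backslashes before the commas dropped, which does not change the meaning).

```php
<?php
$input = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

// Group 2 sits inside (?: ... )*, so it is filled once per list entry
// and every iteration replaces what the previous one captured.
preg_match('/SomeText\/(.*)\s\((?:([^,]+),?\s?)*\)/', $input, $m);

$version  = $m[1];
$lastOnly = $m[2];

echo $version, "\n";
echo $lastOnly, "\n";
```

This is also why the long, hand-repeated version works: there every entry has its own group.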

This should solve your problem. Use the code you already have and add something like this. strpos() finds the position of the first comma in your string, and substr() then cuts the string relative to that position.
Use trim() to delete whitespace at the start or the end.
$a = strpos($line, ",");
$line = trim(substr($line, 55-$a));
I hope, this helps you!

Dynamic regex to capture

Not sure if it is actually possible, but consider the following text:
INSERT INTO cms_download_history
SET
user_id = '{$userId}',
download_id = '{$fileId}',
remote_addr = '{$remote_addr}',
doa = GetDate()";
I want to change that to be:
INSERT INTO cms_download_history
(user_id,download_id,remote_addr,doa)
VALUES('{$userId}','{$fileId}','{$remote_addr}',GetDate());
Doing a regex to find and replace this one is easy as I know how many columns I have but what if I am trying to do this for multiple similar queries without knowing the number of columns, i.e.:
INSERT INTO mystery_table
SET
col1 = val1
col2 = val2
.... unknown number of columns and values.
Is there a dynamic regex that I can write that would detect that example?

Actually, if all queries look like this, with only a variable amount of columns, you can get the field names using a somewhat simple regex:
(\w+)\s*=\s*['"].+?(?<!\\)['"],
Here is an example. Here is what it does:
It captures one or more word characters, if followed by:
Zero or more whitespace characters
An equal sign
Zero or more whitespace characters (again)
A ' or " (start of a String)
One or more characters
An unescaped ' or "
A comma
Note that this does assume that all values are strings. If you also need support for numbers, please let me know.
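If the goal is the full rewrite from the question, one possible sketch is below. It uses its own, slightly different pattern that also accepts unquoted values such as GetDate(), and the function name setToValues is made up. It assumes one assignment per line or comma-separated assignments, and it does not handle quotes escaped inside values or commas inside function calls.

```php
<?php
// Hypothetical helper: rewrites "INSERT INTO t SET a = x, b = y"
// into "INSERT INTO t (a,b) VALUES(x,y);". Returns null if the
// statement does not have that shape.
function setToValues(string $query): ?string
{
    // Split the statement into the table name and the assignment list.
    if (!preg_match('/^\s*INSERT\s+INTO\s+(\w+)\s+SET\s+(.*)$/is', $query, $m)) {
        return null;
    }
    // One "column = value" pair per match; a value is either a quoted
    // string or a run of characters up to the next comma or line break.
    preg_match_all('/(\w+)\s*=\s*(\'[^\']*\'|[^,\r\n]+)/', $m[2], $pairs);

    $columns = implode(',', $pairs[1]);
    $values  = implode(',', array_map('trim', $pairs[2]));

    return "INSERT INTO {$m[1]}\n({$columns})\nVALUES({$values});";
}

// Nowdoc, so that {$userId} and friends stay literal text.
$query = <<<'SQL'
INSERT INTO cms_download_history
SET
user_id = '{$userId}',
download_id = '{$fileId}',
remote_addr = '{$remote_addr}',
doa = GetDate()
SQL;

echo setToValues($query), "\n";
```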

Regex match number consisting of specific range, and length?

I'm trying to match a number that may consist of [1-4], with a length of {1,1}.
I've tried multiple variations of the following, which won't work:
/^string\-(\d{1,1})[1-4]$/
Any guidelines? Thanks!

You should just use:
/^string-[1-4]$/
This matches the start of the string, followed by the literal text "string-", followed by a single digit from 1 to 4, followed by the end of the string. It will match only this string and nothing else.
If this is part of a larger string and all you want is the one part you can use something like:
/string-[1-4]\b/
which matches pretty much the same as above just as part of a larger string.
You can (in either option) also wrap the character class ([1-4]) in parentheses to get that as a separate part of the matches array (when using preg_match/preg_match_all).
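A small sketch of the anchored version with the digit captured. The function name stringSuffix is only for illustration.

```php
<?php
// Hypothetical helper: returns the digit after "string-" as an int,
// or null when the input is not exactly "string-" plus one digit 1-4.
function stringSuffix(string $input): ?int
{
    if (preg_match('/^string-([1-4])$/', $input, $m)) {
        return (int) $m[1];   // $m[1] holds the captured digit
    }
    return null;
}

var_dump(stringSuffix('string-3'));
var_dump(stringSuffix('string-5'));
```

One caveat: in PCRE, $ also matches just before a trailing newline, so "string-3\n" would pass as well. Adding the D modifier (/^string-([1-4])$/D) makes the end anchor strict.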

This is not hard:
/^string-([1-4]{1})$/

PHP: get numeric value in the end of a given formatted string

I "inherited" a buggy PHP page. I'm not an expert of this language but I think I found the origin of the bug. Inside a loop, the page sends a formatted string to the server: the string I found in the HTML page is like this one:
2011-09-19__full_1
so, it seems we have three parts:
a date (0,10);
a string (10,6);
a final number (17,1);
The code that handles this situation is the following:
$datagrid[] = array("date"=>substr($post_array_keys[$i], 0, 10),"post_mode"=>substr($post_array_keys[$i], 10, 6),"class_id"=>substr($post_array_keys[$i], 17, 1),"value"=>$_POST[$post_array_keys[$i]]);
What happens: the final number can contain more than one character, so this piece:
"class_id"=>substr($post_array_keys[$i], 17, 1)
is not correct because it seems to retrieve only one character starting from the 17th (and this seems to cause strange behaviors to the website).
Since the number is the last part of the string, could I safely change this line as follows to get the entire number?
"class_id"=>substr($post_array_keys[$i], 17, strlen($post_array_keys[$i])-17);

If you change the code the way you suggest, you would get all the digits at the end, starting at position 17. The original code gets only the first digit.
And it seems you did your homework: the line
$datagrid[] = array("date"=>substr($post_array_keys[$i], 0, 10),"post_mode"=>substr($post_array_keys[$i], 10, 6),"class_id"=>substr($post_array_keys[$i], 17, 1),"value"=>$_POST[$post_array_keys[$i]]);
does give you a very good clue of what you should expect in the variable:
first 10 is the date
then you have 6 chars for post_mode
then you have 1 char for class_id
If you also confirmed that sometimes the class_id can be more than 1 char, your suggested change would give you the complete class_id at the end.
Good luck.

you could use
$array = explode("_", $string);
This function returns an array with the elements of the string delimited by "_".
I suggest this because the double underscore may hide another value that is empty in that particular case.

If it's only the last integer causing trouble, you can use strrchr to get the "tail" of the string, starting with the last '_'.
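To compare the suggestions in this thread side by side, here is a short sketch. The sample key with a two-digit number is made up; the offsets are the ones from the question.

```php
<?php
// Made-up sample key in the same format, with a two-digit class id.
$key = '2011-09-19__full_12';

// The change proposed in the question: explicit length up to the end.
$viaStrlen = substr($key, 17, strlen($key) - 17);

// Same result with less typing: without a length, substr() runs to the end.
$viaSubstr = substr($key, 17);

// strrchr() returns the tail starting at the last "_", so drop that character.
$viaStrrchr = substr(strrchr($key, '_'), 1);

// explode() keeps the empty piece hidden in the double underscore.
$parts      = explode('_', $key);
$viaExplode = end($parts);

var_dump($viaStrlen, $viaSubstr, $viaStrrchr, $viaExplode, $parts);
```

The last two variants do not depend on the fixed offset 17, which may matter if the middle part can ever change length.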

PHP Regex - Finding two consecutive words with unknown number of spaces (" ") between them

I am trying to create a PHP REGEX that will match if two words appear next to each other with ANY number of spaces between them.
For example, match "Daniel   Baylis" (3 spaces between 'Daniel' and 'Baylis'). I tried with this but it doesn't seem to work:
"/DANIEL[ ]{1,5}BAYLIS/" (this was to check up to 5 spaces which the most I expect in the data)
and
"/DANIEL[ ]{*}BAYLIS/"
I need to extract names from within larger bodies of text and names can appear anywhere within that text. User input error is what creates the multiple spaces.
Thanks all! - Dan

/DANIEL[ ]+BAYLIS/ should do. The + matches one or more occurrences of the previous character (or character class), in this case a literal space.
Also, assuming you want to match regardless of case, you'll need to make the regex case-insensitive. With PHP's preg_* functions that is done by adding the i modifier after the closing delimiter: /DANIEL[ ]+BAYLIS/i
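A quick sketch of how that could look when pulling names out of a larger text. The sample sentence is invented, and collapsing the extra spaces afterwards is an optional extra step, not something the question asked for.

```php
<?php
// Invented sample text with a mistyped (triple-spaced) name in it.
$text = 'Order placed by Daniel   Baylis on Monday; contact DANIEL BAYLIS.';

// " +" means one or more spaces; the trailing "i" ignores case.
preg_match_all('/DANIEL +BAYLIS/i', $text, $matches);
$found = $matches[0];

// Optional clean-up: collapse each run of spaces into a single space.
$normalized = preg_replace('/ +/', ' ', $found);

print_r($found);
print_r($normalized);
```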
