Regex for two latitudes and longitudes not working - php

I am using some data which gives paths for Google Maps either as a path or as a set of two latitudes and longitudes. I have stored both kinds of value as a BLOB in a MySQL database, but I need to detect the values which are not paths when they come out in the result. In an attempt to do this, I have saved them in the BLOB in the following format:
array(lat,lng+lat,lng)
I am using preg_match to find these results, but I haven't managed to get any pattern to work. Here are the regexes I have tried:
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)\+(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)[\)]{1}^
Regex confuses me sometimes (as it is doing now). Can anyone help me out?
Edit:
The lat can be 2 digits followed by a decimal point and 8 more digits, and the lng can be 3 digits followed by a decimal point and 8 more digits. Both can be positive or negative.
Here are some example lat lngs:
51.51160000,-0.12766000
-53.36442000,132.27519000
51.50628000,0.12699000
-51.50628000,-0.12699000
So a full match would look like:
array(51.51160000,-0.12766000+-53.36442000,132.27519000)
Further Edit
I am using the preg_match() php function to match the regex.

Here are some pointers for writing regex:
If you have a single possibility for a character, for example, the a in array, you can indeed write it as [a]; however, you can also write it as just a.
If you are looking to match exactly one of something, you can indeed write it as a{1}, however, you can also write it as just a.
Applying this lots, your example of ^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^ reduces to ^array\([1-9\.\,\+]{1*}\)^ - that's certainly an improvement!
Next, numbers may also include 0's, as well as 1-9. In fact, \d - any digit - is usually used instead of 1-9.
You are using ^ as the delimiter - usually that is /; I didn't recognize it at first. PHP accepts any non-alphanumeric, non-backslash, non-whitespace character as the delimiter, so ^ is legal, but it is easily confused with the start-of-string anchor, so I'll change it to the usual /. This makes the above regex /array\([\d\.\,\+]{1*}\)/.
To match one or more of a character or character set, use +, rather than {1*}. This makes your query /array\([\d\.\,\+]+\)/
Then, to collect the resulting numbers (assuming you want only the part between the brackets), put it in (non-escaped) brackets, thus: /array\(([\d\.\,\+]+)\)/ - you would then need to split them, first by +, then by ,. Alternatively, if there are exactly two lat,lng pairs, you might want: /array\(([\d\.]+),([\d\.]+)\+([\d\.]+),([\d\.]+)\)/ - this will return 4 values, one for each number; the additional stuff (+, ,) will already be removed, because it is not in (unescaped) brackets ().
Edit: If you want negative lats and longs (and why wouldn't you?) you will need \-? (a "literal -", rather than part of a range) in the appropriate places; the ? makes it optional (i.e. 0 or 1 dashes). For example, /array\((\-?[\d\.]+),(\-?[\d\.]+)\+(\-?[\d\.]+),(\-?[\d\.]+)\)/
You might also want to check out http://regexpal.com - you can put in a regex and a set of strings, and it will highlight what matches/doesn't match. You will need to exclude the delimiter / or ^.
Note that this is a little fast and loose; it would also match array(5,0+0,1...........). You can nail it down a little more, for example, by using (\-?\d*\.\d+)\) instead of (\-?[\d\.]+)\) for the numbers; that will match (0 or 1 literal -) followed by (0 or more digits) followed by (exactly one literal dot) followed by (1 or more digits).
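As a rough sketch of the capture-then-split approach mentioned above (one pair of brackets around the contents, then split by + and by ,), something like the following should work. The variable names are only illustrative, and - has been added to the character class so that negative values survive.

```php
<?php
$value  = 'array(51.51160000,-0.12766000+-53.36442000,132.27519000)';
$points = [];

// Capture everything between the brackets; "-" is included in the class
// so that negative coordinates are kept.
if (preg_match('/array\(([\d.,+-]+)\)/', $value, $m)) {
    // Split into "lat,lng" pairs on "+", then split each pair on ","
    foreach (explode('+', $m[1]) as $pair) {
        $points[] = explode(',', $pair);
    }
}

print_r($points);
```

Note that this keeps the numbers as strings and does not check that each piece is a well-formed number; that is what the stricter patterns are for.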

This is the regex I made:
array\((-*\d+\.\d+),(-*\d+\.\d+)\+(-*\d+\.\d+),(-*\d+\.\d+)\)
This also breaks the four numbers into groups so you can get the individual numbers.
You will note the repeated pattern of
(-*\d+\.\d+)
Explanation:
-* means 0 or more matches of the - sign (so the - sign is optional; -? would allow at most one)
\d+ means 1 or more matches of a digit
\. means a literal period (decimal point)
\d+ means 1 or more matches of a digit
The whole thing is wrapped in brackets to make it a captured group.
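For what it's worth, here is a minimal sketch of how that pattern could be used with preg_match() to tell the lat/lng values apart from paths. The function name parseLatLngPair is made up for the example, and I have assumed two small tightenings that are not in the pattern above: -? instead of -*, and ^...$ anchors so the whole value has to match.

```php
<?php
// Hypothetical helper: returns the four coordinates as floats, or null for
// anything that is not in the array(lat,lng+lat,lng) format (e.g. a path).
function parseLatLngPair(string $value): ?array
{
    $number  = '(-?\d+\.\d+)'; // optional minus, digits, dot, digits
    $pattern = '/^array\(' . $number . ',' . $number . '\+' . $number . ',' . $number . '\)$/';

    if (preg_match($pattern, $value, $m) !== 1) {
        return null;
    }

    // $m[0] is the whole match; $m[1]..$m[4] are lat1, lng1, lat2, lng2
    return array_map('floatval', array_slice($m, 1));
}

var_dump(parseLatLngPair('array(51.51160000,-0.12766000+-53.36442000,132.27519000)'));
var_dump(parseLatLngPair('array(5,0+0,1...........)')); // NULL
```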


Calculate the max length of a regex output string

A user can define the format of an identifier in my system, and this is stored in the d/b as a regex string (for example, "/^\d{6}$/", or a more complicated example of "/^[A-Z]{2}\d{8}$/").
Can anyone suggest how I can calculate the maximum length of the string that the given regex can match (thanks #Ulver)?
Many thanks for reading!
This answer assumes 5 things:
1. The expressions are simple, as per your examples.
2. You do not have * or + operators in your expression.
3. You do not have patterns of the type foo{n, }, where n is some positive, integer value.
4. Each expression starts with ^ and ends with $.
5. I am also assuming that each term is followed by the amount of times you expect to match it.
To calculate the number of characters they match, you could go through the expression and look for 2 patterns:
{n}, which translates to match exactly n times. In this case, extract n.
{n, m}, which translates to match at least n times, and at most m times. In this case, extract m.
Once you have all the n and m values, you simply add them together.
Some more details on the assumptions:
1. As expressions get more complicated, you will need to keep track of various characters. For instance, ^[A-Z]{2}$ means match 2 upper case letters, so the length of what is matched will be 2. On the other hand, foo{2} means fooo, since the quantifier applies only to the last o. But afooo and foooobar will also be matched, so without anchors you have no control over the length of what is matched. Also, (abc){2} means match abc twice; in this case you would need to multiply the value of n (the value in the braces) by the length of whatever lies within the brackets which precede it, if any. Of course, you could have nested values.
2. The * and + operators denote 0 or more, and 1 or more, respectively. Thus, there is, theoretically, no limit on the length of whatever is matched.
3. Similar to point 2, {n,} means match at least n times. Thus, there is no upper limit.
4. Similar to point 1, without the ^ and $ anchors, an expression can match inside any longer string. The expression foo can match afoo, foobar, foooooooooooooooooooooooo and so on.
5. I took this assumption for reasons similar to point 1. You could enhance your application to look for [] pairs and count them as 1 character, but I think you could have other caveats.
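The approach above could be sketched in PHP roughly as follows. This is only a sketch under the same five assumptions; the function name maxMatchLength is made up, and it returns null (rather than guessing) as soon as it sees *, + or {n,} anywhere in the expression.

```php
<?php
// Hypothetical helper: sums the upper bound of every {n} / {n,m} quantifier.
// Only meaningful under the assumptions listed above (anchored pattern,
// every term followed by a quantifier, no groups, no * or +).
function maxMatchLength(string $regex): ?int
{
    // Unbounded quantifiers (*, + or {n,}) mean there is no maximum.
    // This is deliberately conservative: a literal + or * also bails out.
    if (preg_match('/[*+]|\{\d+,\s*\}/', $regex)) {
        return null;
    }

    // Capture n from {n}, and both n and m from {n,m}
    preg_match_all('/\{(\d+)(?:,\s*(\d+))?\}/', $regex, $matches, PREG_SET_ORDER);

    $total = 0;
    foreach ($matches as $m) {
        // Use m when present, otherwise n
        $total += (int) (isset($m[2]) && $m[2] !== '' ? $m[2] : $m[1]);
    }

    return $total;
}

var_dump(maxMatchLength('/^\d{6}$/'));         // int(6)
var_dump(maxMatchLength('/^[A-Z]{2}\d{8}$/')); // int(10)
```

Anything more general (groups, alternation, nested quantifiers) would need a real parser for the expression rather than a scan for braces.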

PCRE(php) Is it possible to check if sequence of numbers contains only unique number for that sequence?

Assuming I have a set of numbers (from 1 to 22) divided by some trivial delimiters (comma, point, space, etc). I need to make sure that this set of numbers does not contain any repetition of the same number. Examples:
1,14,22,3 // good
1,12,12,3 // not good
Is it possible to do via regular expression?
I know it's easy to do using just PHP, but I really wonder how to make it work with regex.
Yes, you could achieve this through regex via a negative lookahead.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$
(?!.*\b(\d+)\b.*\b\1\b) Negative lookahead at the start asserts that there is no repeated number anywhere in the string. \b(\d+)\b.*\b\1\b matches the repeated number.
\d+ matches one or more digits.
(?:,\d+)+ One or more occurrences of a comma followed by one or more digits.
$ Asserts that we are at the end.
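A quick way to try the comma-separated pattern above from PHP (the sample inputs are the two from the question plus one with a non-adjacent repeat):

```php
<?php
// Negative lookahead rejects the string if any whole number occurs twice
$pattern = '/^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$/';

foreach (['1,14,22,3', '1,12,12,3', '1,12,3,12'] as $input) {
    echo $input, ' => ', preg_match($pattern, $input) ? 'unique' : 'rejected', PHP_EOL;
}
```

The word boundaries matter here: without them, the 1 in 14 or the 2 in 22 would count as a repeat.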
OR
Regex for the numbers separated by space, dot, comma as delimiters.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:([.\s,])\d+)(?:\2\d+)*$
(?:([.\s,])\d+) The capturing group inside this non-capturing group lets us check that the following delimiters are of the same type, i.e. the above regex won't match strings like 2,3 5.6
You can use this regex:
^(?!.*?(\b\d+)\W+\1\b)\d+(\W+\d+)*$
Negative lookahead (?!.*?(\b\d+)\W+\1\b) avoids the match when 2 similar numbers appear one after another separated by 1 or more non-word characters.
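One thing to be aware of with this pattern, as the description says, is that it only looks at neighbouring numbers. A small PHP check (inputs chosen for illustration) shows the difference:

```php
<?php
// Lookahead only rejects a number that is immediately followed by itself
$adjacentOnly = '/^(?!.*?(\b\d+)\W+\1\b)\d+(\W+\d+)*$/';

var_dump(preg_match($adjacentOnly, '1,12,12,3')); // int(0) - adjacent repeat is rejected
var_dump(preg_match($adjacentOnly, '1,12,3,12')); // int(1) - non-adjacent repeat still passes
```

So if repeats anywhere in the sequence must be rejected, the lookahead needs a .* between the two occurrences.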
Here is the solution that fits my current need:
^(?>(?!\2\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\2\b)(1\d{1}|2[0-2]{1}|\d{1}+))$
It returns all the sequences with unique numbers divided by one or more separators, limits each number to the range 1 to 22, and allows only 3 numbers in the sequence.
It's not perfect yet, but it works fine! Thanks a lot to everyone who gave me a hand with this!

Regex to allow numbers and only one hyphen in the middle

I am trying to write a regular expression to allow numbers and only one hyphen in the middle (it cannot be at the start or at the end).
Say patterns 02-04 and 02 are acceptable, but
patterns --, -, -02, 04- and 02-04-06 are unacceptable.
I tried something like this but this would allow - at the beginning and also allow multiple -
'/^[0-9 \-]+$/'
I am not that good with regex so a little explanation would be real helpful.
EDIT: Sorry to bug you again with this, but I need the numbers to be of only 2 digits; 123-346 should be considered invalid.
Try this one:
/^\d{1,2}(-\d{1,2})?$/
One or two digits, followed by, optionally, ( a hyphen followed by one or two digits)
Fairly easy:
^\d+(-\d+)?$
At least one (+) digit (\d), followed by an optional group containing a hyphen-minus (-), followed by at least one digit again.
For strings containing only that pattern the following should work
^(\d{2}-)?\d{2}$
A group of 2 digits followed by minus ending with a group of 2 digits without minus.
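To see how this behaves on the inputs from the question, here is a small PHP loop using the first pattern above, /^\d{1,2}(-\d{1,2})?$/; the list of samples is just the question's examples.

```php
<?php
// One or two digits, optionally followed by a hyphen and one or two digits
$pattern = '/^\d{1,2}(-\d{1,2})?$/';
$samples = ['02-04', '02', '--', '-', '-02', '04-', '02-04-06', '123-346'];

foreach ($samples as $s) {
    printf("%-9s %s\n", $s, preg_match($pattern, $s) ? 'valid' : 'invalid');
}
```

Only 02-04 and 02 come out as valid. If exactly two digits are required on each side, \d{2} can be used in place of \d{1,2}.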

Need to know what this regex does, is it safe?

Updating someone else's old PHP project and I'm unfamiliar with regular expressions.
Question one is: What does this do?
preg_match('/^[0-9]+[.]?[0-9]*$/', $variable)
Question two is: Is this a safe filter for insertion into a MySQL DB without mysql_real_escape_string()? I know the answer is probably no, but it is set up to use mysql_real_escape_string() only if this regex doesn't pass.
Thanks.
^ // start of string
[0-9]+ // one or more numbers (could also be \d+)
[.]? // zero or one period (could also be \.?)
[0-9]* // zero or more numbers (could also be \d*)
$ //end of string
So, it makes sure the input is a number, such as 12 or 3.6 (52. will also match). It will not match .35 or 12a6.
It seems safe enough for DB insertion, because it only allows numbers.
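The behaviour described above is easy to confirm with a short PHP script. The last input is my own addition: in PCRE, $ also matches just before a trailing newline unless the D modifier is used, so a value ending in "\n" gets through the original pattern as well.

```php
<?php
$pattern = '/^[0-9]+[.]?[0-9]*$/';

// 12, 3.6 and 52. match; .35 and 12a6 do not; "12\n" also matches
foreach (['12', '3.6', '52.', '.35', '12a6', "12\n"] as $input) {
    var_dump(preg_match($pattern, $input));
}

// With the D (dollar-end-only) modifier, $ matches only at the very end
var_dump(preg_match('/^[0-9]+[.]?[0-9]*$/D', "12\n")); // int(0)
```

Whether the trailing newline matters depends on how the query is assembled, but it is one more reason not to treat the regex as a substitute for escaping.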
It matches strings that:
start with at least 1 digit from 0-9
have a decimal point after those first digits 0 or 1 times
end with 0 or more further digits
It does not sanitise the string for the database.
It checks if $variable matches this pattern...
starts with one or more digits (^[0-9]+)
followed by optional . ([.]?)
followed by as many or as few digits as you like ([0-9]*)
followed by the end of the string ($)
It's attempting to match a decimal number (albeit poorly). It doesn't modify $variable anyway, so you would need to escape it properly before passing to MySQL.
That will match a number that has at least one digit before the decimal point (if there is a decimal point). If the value matches this regex, I don't see how it could be unsafe to insert it into the database.
It checks whether the whole string is an exact match.
it matches
234234232432343.231313132321
and
2232233223
and
322332.
and not
.32232
and not
Is this a safe filter for insertion into a mysql DB without mysql_real_escape_string()?
Assuming the possible use of this variable, I'd say that mysql_real_escape_string() would be quite useless for it.
Need the query assembling code to be certain though.

PHP regex non-capture non-match group

I'm making a date matching regex, and it's all going pretty well, I've got this so far:
"/(?:[0-3])?[0-9]-(?:[0-1])?[0-9]-(?:20)[0-1][0-9]/"
It will (hopefully) match single or double digit days and months, and double or quadruple digit years in the 21st century. A few trials and errors have gotten me this far.
But, I've got two simple questions regarding these results:
(?: ) what is a simple explanation for this? Apparently it's a non-matching group. But then...
What is the trailing ? for? e.g. (?: )?
[Edited (again) to improve formatting and fix the intro.]
This is a comment and an answer.
The answer part... I do agree with alex' earlier answer.
(?: ), in contrast to ( ), is used to avoid capturing text, generally so as to have fewer back references thrown in with those you do want or to improve speed performance.
The ? following the (?: ) -- or when following anything except * + ? or {} -- means that the preceding item may or may not be found within a legitimate match. E.g., /^z34?$/ will match z3 as well as z34, but it won't match z35 or z.
The comment part... I made what might be considered improvements to the regex you were working on:
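Both points can be seen in a few lines of PHP. The patterns here are small illustrations rather than the full date regex, and they are anchored with ^ and $ so that the whole string has to match.

```php
<?php
// ? makes the preceding item (here the 4) optional
var_dump(preg_match('/^z34?$/', 'z3'));  // int(1)
var_dump(preg_match('/^z34?$/', 'z34')); // int(1)
var_dump(preg_match('/^z34?$/', 'z35')); // int(0)

// (?:20)? is an optional group that is not captured,
// so the two-digit year is always group 1
preg_match('/^(?:20)?([0-1][0-9])$/', '2011', $long);
preg_match('/^(?:20)?([0-1][0-9])$/', '11', $short);

print_r($long);  // [0] => 2011, [1] => 11
print_r($short); // [0] => 11,   [1] => 11
```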
(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)
-- First, it avoids things like 0-0-2011
-- Second, it avoids things like 233443-4-201154564
-- Third, it includes things like 1-1-2022
-- Fourth, it includes things like 1-1-11
-- Fifth, it avoids things like 34-4-11
-- Sixth, it allows you to capture the day, month, and year so you can refer to these more easily in code.. code that would, for example, do a further check (is the second captured group 2 and is either the first captured group 29 and this a leap year or else the first captured group is <29) in order to see if a feb 29 date qualified or not.
Finally, note that you'll still get dates that won't exist, eg, 31-6-11. If you want to avoid these, then try:
(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))|(?:(0?[1-9]|[1-2][0-9])-(0?2)))-((?:20)?[0-9][0-9])(?:\s|$)
Also, I assumed the dates would be preceded and followed by a space (or beginning/end of line), but you may want to adjust that (e.g., to allow punctuation).
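As a sketch of the "capture, then check further in code" idea, the first (shorter) regex can be combined with PHP's built-in checkdate(), which already knows about month lengths and leap years. The function name parseDayMonthYear is made up, and treating two-digit years as 20xx is my assumption.

```php
<?php
// Hypothetical helper: returns [day, month, year] or null if no valid date is found.
function parseDayMonthYear(string $text): ?array
{
    $pattern = '/(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)/';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }

    // Groups 1, 2 and 3 are the day, month and year
    [$day, $month, $year] = [(int) $m[1], (int) $m[2], (int) $m[3]];
    if ($year < 100) {
        $year += 2000; // assumption: two-digit years mean 20xx
    }

    // checkdate() rejects impossible dates such as 31-6 or 29-2 in a non-leap year
    return checkdate($month, $day, $year) ? [$day, $month, $year] : null;
}

var_dump(parseDayMonthYear('1-1-11'));    // day 1, month 1, year 2011
var_dump(parseDayMonthYear('31-6-11'));   // NULL
var_dump(parseDayMonthYear('29-2-2023')); // NULL
```

This keeps the regex readable and leaves the calendar rules to checkdate() instead of encoding them in alternations.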
A commenter elsewhere referenced this resource which you might find useful:
http://rubular.com/
It is a non capturing group. You can not back reference it. Usually used to declutter backreferences and/or increase performance.
It means the preceding group is optional.
Subpatterns
Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpattern does two things:
1. It localizes a set of alternatives. For example, the pattern cat(aract|erpillar|) matches one of the words "cat", "cataract", or "caterpillar". Without the parentheses, it would match "cataract", "erpillar" or the empty string.
2. It sets up the subpattern as a capturing subpattern (as defined above). When the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller via the ovector argument of pcre_exec(). Opening parentheses are counted from left to right (starting from 1) to obtain the numbers of the capturing subpatterns.
For example, if the string "the red king" is matched against the pattern the ((red|white) (king|queen)) the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3.
The fact that plain parentheses fulfill two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern the ((?:red|white) (king|queen)) the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 65535. It may not be possible to compile such large patterns, however, depending on the configuration options of libpcre.
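The two examples from the manual text above can be reproduced with preg_match(); the variable names are only for illustration.

```php
<?php
// Plain parentheses: every group captures, numbered by opening parenthesis
preg_match('/the ((red|white) (king|queen))/', 'the red king', $capturing);
print_r($capturing);    // the red king, red king, red, king

// (?:...) groups without capturing, so the later group moves up to number 2
preg_match('/the ((?:red|white) (king|queen))/', 'the white queen', $nonCapturing);
print_r($nonCapturing); // the white queen, white queen, queen
```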
As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns
(?i:saturday|sunday)
(?:(?i)saturday|sunday)
match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".
It is possible to name a subpattern using the syntax (?P<name>pattern). This subpattern will then be indexed in the matches array by its normal numeric position and also by name. PHP 5.2.2 introduced two alternative syntaxes (?<name>pattern) and (?'name'pattern).
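A short illustration of named subpatterns, using the (?P<name>...) and (?<name>...) forms; the group names day and month and the sample date are invented for the example.

```php
<?php
// Named groups appear in $m both under their name and their numeric position
preg_match('/(?P<day>\d{1,2})-(?<month>\d{1,2})-\d{4}/', '31-12-2011', $m);

echo $m['day'], ' / ', $m['month'], PHP_EOL; // 31 / 12
echo $m[1], ' / ', $m[2], PHP_EOL;           // 31 / 12
```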
Sometimes it is necessary to have multiple matching, but alternating subgroups in a regular expression. Normally, each of these would be given their own backreference number even though only one of them would ever possibly match. To overcome this, the (?| syntax allows having duplicate numbers. Consider the following regex matched against the string Sunday:
(?:(Sat)ur|(Sun))day
Here Sun is stored in backreference 2, while backreference 1 is empty. Matching Saturday instead yields Sat in backreference 1 while backreference 2 does not exist. Changing the pattern to use (?| fixes this problem:
(?|(Sat)ur|(Sun))day
Using this pattern, both Sun and Sat would be stored in backreference 1.
Reference : http://php.net/manual/en/regexp.reference.subpatterns.php
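The Saturday/Sunday example can be checked directly in PHP; the variable names are illustrative.

```php
<?php
// Ordinary alternation: each branch has its own group number
preg_match('/(?:(Sat)ur|(Sun))day/', 'Sunday', $plainSun);
preg_match('/(?:(Sat)ur|(Sun))day/', 'Saturday', $plainSat);
print_r($plainSun); // [0] => Sunday, [1] => (empty), [2] => Sun
print_r($plainSat); // [0] => Saturday, [1] => Sat (no index 2)

// Branch reset (?| ... ): both branches share group number 1
preg_match('/(?|(Sat)ur|(Sun))day/', 'Sunday', $resetSun);
preg_match('/(?|(Sat)ur|(Sun))day/', 'Saturday', $resetSat);
print_r($resetSun); // [0] => Sunday, [1] => Sun
print_r($resetSat); // [0] => Saturday, [1] => Sat
```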
