How can I use regex to solve this?

I have two strings that I need to pull data out of, but I can't seem to get it working. I wish I knew regular expressions, but unfortunately I don't. I have read some beginner tutorials, but I can't find an expression that does what I need.
Out of this first string, which is delimited by an equals sign, I need to skip the first 6 characters and grab the following 9 characters. After the equals sign, I need to grab the first 4 characters, which are a day and year. Lastly, for this string, I need the remaining numbers, which are a date in YYYYmmdd format.
636014034657089=130719889904
The second string seems a little more difficult because the spaces between the characters differ but always seem to be delimited by at minimum, a single space. Sometimes, there are as many as 15 or 20 spaces separating the blocks of data.
Here are two different samples that show the space difference.
!!92519 C 01 M600200BLNBRN D55420090205M1O
!!95815 A M511195BRNBRN D62520070906 ":%/]Q2#0*&
The data that I need out of these last two strings are:
The zip code following the 2 exclamation marks.
The single letter 'M' following that. It always appears to be in a 13 character block
The 3 numbers after the single letter
The next 3 numbers which are the person's height
The following next 3 are the person's weight
The next 3 are eye color
The next block of 3 which are the person's hair color
The last block that I need data from:
I need to get the single letter which in the example appears to be a 'D'.
Skip the next 3 numbers
The last and remaining 8 numbers which is a date in YYYYmmdd
If someone could help me resolve this, I'd be very grateful.

For the first string you can use this regular expression:
^[0-9]{6}([0-9]{9})=([0-9]{4})([0-9]{4})([0-9]{2})([0-9]{2})$
Explanation:
^ Start of string/line
[0-9]{6} Match the first 6 digits
([0-9]{9}) Capture the next 9 digits
= Match an equals sign
([0-9]{4}) Capture the "day and year" (what format is this in?)
([0-9]{4}) Capture the year
([0-9]{2}) Capture the month
([0-9]{2}) Capture the day
$ End of string/line
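A minimal PHP sketch of how this pattern might be applied with preg_match(). The variable names are illustrative and do not come from the question.

```php
<?php
// Sample string from the question.
$input = '636014034657089=130719889904';

// Skip 6 digits, capture 9, then capture the 4-digit block and the date parts.
$pattern = '/^[0-9]{6}([0-9]{9})=([0-9]{4})([0-9]{4})([0-9]{2})([0-9]{2})$/';

if (preg_match($pattern, $input, $m)) {
    // $m[0] holds the whole match; the captured groups start at index 1.
    [, $id, $dayYear, $year, $month, $day] = $m;
    echo "$id $dayYear $year-$month-$day\n";
}
```

preg_match() returns 1 on a match and 0 otherwise, so the if block is skipped for input that does not fit the expected layout.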
For the second:
^!!([0-9]{5}) +.*? +M([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})
It works in a similar way to the first. You may need to adjust it slightly if your data is not exactly in the format that the regular expression expects. You might want to replace the .*? with something more precise but I'm not sure what because you haven't described the format of the parts you are not interested in.
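As a rough sketch, the second pattern could be used like this in PHP. The array keys (zip, height, and so on) are my reading of the question's field list, and the extra spaces in the second sample were added here to mimic the variable spacing the question describes.

```php
<?php
// Samples from the question; the runs of spaces between blocks vary in length.
$lines = [
    '!!92519 C 01 M600200BLNBRN D55420090205M1O',
    '!!95815     A      M511195BRNBRN   D62520070906 ":%/]Q2#0*&',
];

$pattern = '/^!!([0-9]{5}) +.*? +M([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})'
         . ' +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/';

$records = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        // Key names are assumptions based on the question's description.
        $records[] = [
            'zip'    => $m[1],
            'height' => $m[2],
            'weight' => $m[3],
            'eyes'   => $m[4],
            'hair'   => $m[5],
            'letter' => $m[6],
            'date'   => "$m[7]-$m[8]-$m[9]",
        ];
    }
}
print_r($records);
```

Lines that do not match are simply skipped, which makes it easy to spot records whose layout differs from the two samples.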

Related

Regular expression for exposed filter in a view [Drupal 8]

I am trying to filter out a result in a Drupal 8 view [Exposed Filter] using a regular expression. What I need is to search the keyword in the last 4 or 5 digits/letters of a specific field.
For example:
2006ABC00022
2014DEF03120
2019GHI03128
2019GHI07437
This is the data I need to filter. If someone searches for "0022", I want to show 2006ABC00022, because its last 4 digits are 0022. The 'Ends with' operator can do this. But I want something different: if someone filters with "312", I want to show 2014DEF03120 and 2019GHI03128, because in both strings the last 4 digits start with 312. The 'Ends with' operator does not cover this scenario, so I went for a regular expression.
"[0- 9]{4}$"
I tried the regex above and realized that it does not work as I expected: it searches the whole string.
If I search for 2019, it shows the last 2 results, but the result should be empty.
I just want to search for the keyword in the last 4 digits and, if the keyword is 5 digits long, in the last 5 digits.

It seems that a valid match here is a keyword starting with a four digit year, followed by three letters, then followed by 5 digits. This implies the following pattern:
\b\d{4}[A-Z]{3}\d{5}\b
Specifically, if you wanted to find matches ending in 0022 then modify the above pattern and use:
\b\d{4}[A-Z]{3}\d0022\b
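To also cover the "312" case from the question, the pattern can be built from the keyword. The sketch below is plain PHP, not Drupal API code: buildSuffixPattern() and filterCodes() are hypothetical helpers, and it assumes a keyword shorter than 5 digits must start at the first of the last 4 digits. Whether the Views filter accepts the same syntax depends on the database's regex dialect, which this sketch does not check.

```php
<?php
// Hypothetical helper (not part of Drupal): anchor a numeric keyword inside
// the trailing five digits of codes shaped like 2014DEF03120.
function buildSuffixPattern(string $keyword): ?string
{
    if (!preg_match('/^\d{1,5}\z/', $keyword)) {
        return null; // this sketch only handles keywords of 1 to 5 digits
    }
    $len = strlen($keyword);
    if ($len === 5) {
        // A 5-digit keyword has to equal the whole trailing block.
        return '/^\d{4}[A-Z]{3}' . $keyword . '$/';
    }
    // Shorter keywords start at the first of the last 4 digits:
    // skip one digit, match the keyword, then allow the remaining digits.
    return '/^\d{4}[A-Z]{3}\d' . $keyword . '\d{' . (4 - $len) . '}$/';
}

// Hypothetical helper: return the codes that match the keyword.
function filterCodes(array $codes, string $keyword): array
{
    $pattern = buildSuffixPattern($keyword);
    return $pattern === null ? [] : array_values(preg_grep($pattern, $codes));
}

$codes = ['2006ABC00022', '2014DEF03120', '2019GHI03128', '2019GHI07437'];
print_r(filterCodes($codes, '312'));
```

With the sample data, "312" selects the two codes whose last 4 digits start with 312, while "2019" selects nothing because the year prefix is never searched.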

Regular Expression in a METAR

I have some kind of simple and tricky problem.
Here I have a METAR (Weather in a very specific string format).
LIEA 051550Z 21005KT 9999 FEW020 19/14 Q1011
In this string, 051550Z means that the weather bulletin was issued on the 5th of the month at 15:50 UTC, ... and 9999 indicates the visibility, ...
I tried to write a RegExp that would return the visibility, but I couldn't get it to work.
preg_match_all() returns the numbers
0515 (from the time group)
2100 (from the wind group)
9999 (wanted)
1011 (from the pressure group)
with the RegExp I've tried
([0-9]{4})
And then, I blindly added a
(?!Z)
trying not to get at least the time group...
But it doesn't work...
Looking at the problem itself, is it better to consider taking every time the third element of the array (without (?!Z) RegExp addition) or trying to catch directly the right value?
In my opinion the last choice would be better...
So, how can I get the visibility?

You could use a word boundary \b and then match 4 digits to get the visibility:
\b\d{4}\b
If it has to be the 4 digits at the fourth position, you could instead match one or more non-whitespace characters (\S+) followed by one or more horizontal whitespace characters (\h+), and repeat that group 3 times.
Then use \K to forget what has been matched so far, and match 4 digits followed by a word boundary.
^(?:\S+\h+){3}\K\d{4}\b
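Both patterns can be tried from PHP roughly as follows. This is only a sketch against the single sample report. Real METARs can contain additional groups, so relying on the fourth position is an assumption.

```php
<?php
$metar = 'LIEA 051550Z 21005KT 9999 FEW020 19/14 Q1011';

// Option 1: every standalone run of exactly four digits.
preg_match_all('/\b\d{4}\b/', $metar, $all);

// Option 2: skip three whitespace-separated groups; \K then drops them
// from the reported match, leaving only the four digits.
$visibility = null;
if (preg_match('/^(?:\S+\h+){3}\K\d{4}\b/', $metar, $m)) {
    $visibility = $m[0];
}

print_r($all[0]);
var_dump($visibility);
```

For this sample, both options yield only 9999: the other four-digit runs are glued to further digits or letters, so \b does not match around them.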

Regex for two latitudes and longitudes not working

I am using some data which gives paths for Google Maps either as a path or as a set of two latitudes and longitudes. I have stored both values as a BLOB in a MySQL database, but I need to detect the values which are not paths when they come out in the result. In an attempt to do this, I have saved them in the BLOB in the following format:
array(lat,lng+lat,lng)
I am using preg_match to find these results, but I haven't managed to get any to work. Here are the regex codes I have tried:
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^
^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)\+(\-?\d+(\.\d+)?),(\-?\d+(\.\d+)?)[\)]{1}^
Regex confuses me sometimes (as it is doing now). Can anyone help me out?
Edit:
The lat can be 2 digits followed by a decimal point and 8 more digits, and the lng can be 3 digits followed by a decimal point and 8 more digits. Both can be positive or negative.
Here are some example lat lngs:
51.51160000,-0.12766000
-53.36442000,132.27519000
51.50628000,0.12699000
-51.50628000,-0.12699000
So a full match would look like:
array(51.51160000,-0.12766000+-53.36442000,132.27519000)
Further Edit
I am using the preg_match() php function to match the regex.

Here are some pointers for writing regex:
If you have a single possibility for a character, for example, the a in array, you can indeed write it as [a]; however, you can also write it as just a.
If you are looking to match exactly one of something, you can indeed write it as a{1}, however, you can also write it as just a.
Applying this lots, your example of ^[a]{1}[r]{2}[a]{1}[y]{1}[\(]{1}[1-9\.\,\+]{1*}[\)]{1}^ reduces to ^array\([1-9\.\,\+]{1*}\)^ - that's certainly an improvement!
Next, numbers may also include 0's, as well as 1-9. In fact, \d - any digit - is usually used instead of 1-9.
You are using ^ as the delimiter - usually that is /; I didn't recognize it at first. I'm not sure what you can use for the delimiter, so, just in case, I'll change it to the usual /. This makes the above regex /array\([\d\.\,\+]{1*}\)/.
To match one or more of a character or character set, use +, rather than {1*}. This makes your query /array\([\d\.\,\+]+\)/
Then, to collect the resulting numbers (assuming you want only the part between the brackets), put it in (non-escaped) brackets, thus: /array\(([\d\.\,\+]+)\)/ - you would then need to split them, first by +, then by ,. Alternatively, if there are exactly two lat,lng pairs, you might want: /array\(([\d\.]+),([\d\.]+)\+([\d\.]+),([\d\.]+)\)/ - this will return 4 values, one for each number; the additional stuff (+, ,) will already be removed, because it is not in (unescaped) brackets ().
Edit: If you want negative lats and longs (and why wouldn't you?) you will need \-? (a "literal -", rather than part of a range) in the appropriate places; the ? makes it optional (i.e. 0 or 1 dashes). For example, /array\((\-?[\d\.]+),(\-?[\d\.]+)\+(\-?[\d\.]+),(\-?[\d\.]+)\)/
You might also want to check out http://regexpal.com - you can put in a regex and a set of strings, and it will highlight what matches/doesn't match. You will need to exclude the delimiter / or ^.
Note that this is a little fast and loose; it would also match array(5,0+0,1...........). You can nail it down a little more, for example, by using (\-?\d*\.\d+)\) instead of (\-?[\d\.]+)\) for the numbers; that will match (0 or 1 literal -) followed by (0 or more digits) followed by (exactly one literal dot) followed by (1 or more digits).
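Putting the tightened number pattern together, a PHP sketch might look like this. The ^ and $ anchors and the $points structure are additions for illustration and are not part of the answer above.

```php
<?php
$blob = 'array(51.51160000,-0.12766000+-53.36442000,132.27519000)';

// Each number: optional minus, optional integer digits, one dot, decimals.
$num = '(-?\d*\.\d+)';
$pattern = '/^array\(' . $num . ',' . $num . '\+' . $num . ',' . $num . '\)$/';

$points = null;
if (preg_match($pattern, $blob, $m)) {
    // Groups 1 to 4 are the four numbers in order of appearance.
    $points = [
        ['lat' => (float) $m[1], 'lng' => (float) $m[2]],
        ['lat' => (float) $m[3], 'lng' => (float) $m[4]],
    ];
}

// The loose example from the note no longer matches.
$looseMatches = preg_match($pattern, 'array(5,0+0,1...........)');

print_r($points);
```

A value that is a path rather than two points leaves $points as null, which gives a simple way to tell the two kinds of BLOB content apart.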

This is the regex I made:
array\((-*\d+\.\d+),(-*\d+\.\d+)\+(-*\d+\.\d+),(-*\d+\.\d+)\)
This also breaks the four numbers into groups so you can get the individual numbers.
You will note the repeated pattern of
(-*\d+\.\d+)
Explanation:
-* means 0 or more matches of the - sign (so the - sign is optional; note that it also allows several in a row, whereas -? would allow at most one)
\d+ means 1 or more matches of a number
\. means a literal period (decimal)
\d+ means 1 or more matches of a number
The whole thing is wrapped in brackets to make it a captured group.
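One detail worth checking is the difference between -* and -? for the sign. In the short PHP sketch below, the ^ and $ anchors are added only so that a single number can be tested in isolation.

```php
<?php
$loose  = '/^(-*\d+\.\d+)$/';  // any number of leading minus signs
$strict = '/^(-?\d+\.\d+)$/';  // at most one leading minus sign

$looseAcceptsDouble  = preg_match($loose, '--51.5116');
$strictAcceptsDouble = preg_match($strict, '--51.5116');
$strictAcceptsSingle = preg_match($strict, '-51.5116');

var_dump($looseAcceptsDouble, $strictAcceptsDouble, $strictAcceptsSingle);
```

If malformed values such as --51.5116 can never occur in the stored data, either form behaves the same.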

Prevent the number of characters over 20 to pass the validation

I have this lightweight regular expression to validate coordinates: ^([0-9.,-]+){18,20}$^. If the coordinates, for example 33.431441,-170.15625, are under 18 characters, I get an error message. Good! But if I add more characters, say 23 or more, the string passes even though I have set a limit of 20.
You can test the regexp on http://regexpal.com/.
How can I fix this problem?
Thanks in advance.

You need the leading caret, or you are only checking the last 18-20 characters in the string:
/^[0-9\.,-]{18,20}$/
Edit: also, drop the plus sign, as others have noted.
Edit2: Parens are superfluous
Edit3: the period does not strictly need escaping here, because inside a character class it is already literal; the backslash is harmless but optional.

It should be just ^([0-9.,-]){18,20}$. The + means one or more of the preceding elements, and then you have 18 to 20 of those. You want just 18 to 20 of the preceding elements. You don't need the caret at the end as that means "beginning of string"
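The effect of the inner + is easier to see with smaller numbers. The PHP sketch below scales the broken pattern down to {3,4} as an illustration (the same mechanism applies to {18,20}) and then runs the corrected pattern on the coordinate strings.

```php
<?php
// Scaled-down illustration: the inner + lets each of the 3 to 4 repetitions
// swallow several characters, so a 7-character string still passes.
$nestedAcceptsSeven = preg_match('/^([0-9.,-]+){3,4}$/', '1234567');
$plainAcceptsSeven  = preg_match('/^[0-9.,-]{3,4}$/', '1234567');

// Corrected pattern from the answers: each repetition is exactly one character.
$fixed   = '/^[0-9.,-]{18,20}$/';
$ok      = '33.431441,-170.15625';     // 20 characters
$tooLong = '33.431441,-170.15625123';  // 23 characters

$fixedAcceptsOk      = preg_match($fixed, $ok);
$fixedAcceptsTooLong = preg_match($fixed, $tooLong);

var_dump($nestedAcceptsSeven, $plainAcceptsSeven, $fixedAcceptsOk, $fixedAcceptsTooLong);
```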

Here's a suggestion if you want to constrain the limits and validate the coordinate format.
Assuming Lat,Long:
^-?\d{1,2}(\.\d{1,5})?,-?\d{1,3}(\.\d{1,5})?$
The comma always consumes 1 character.
Max longitude primary digits are 3 characters.
Max latitude primary digits are 2 characters.
The optional decimal points take up 2 characters (one per value).
The optional minus signs take up 2 characters (one per value).
This leaves us with 10 characters reserved, giving us 5 left over on each side for decimal places. You can adjust \d{1,5} to something like \d{4,5} and enforce the decimal places if you require a minimum of 18.
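A quick PHP sketch of how this format check behaves on a few inputs. The sample values other than the question's coordinate are made up for illustration. Note that the pattern only limits digit counts and does not check that the values fall within valid latitude or longitude ranges.

```php
<?php
$pattern = '/^-?\d{1,2}(\.\d{1,5})?,-?\d{1,3}(\.\d{1,5})?$/';

$samples = [
    '33.43144,-170.15625',   // 5 decimals on each side
    '33.431441,-170.15625',  // question's example: 6 decimals on the latitude
    '133.4,-170.1',          // latitude with 3 integer digits
    '-33,170',               // decimals are optional
];

$checks = [];
foreach ($samples as $s) {
    $checks[$s] = preg_match($pattern, $s) === 1;
}
var_export($checks);
```

The question's own example is rejected because its latitude has six decimal places, so \d{1,5} would need widening if that precision is expected.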

Regex help to match more than one letter

I am using the following regex to match an account number. When we originally put this regex together, the rule was that an account number would only ever begin with a single letter. That has since changed and I have an account number that has 3 letters at the beginning of the string.
I'd like to have a regex that will match a minimum of 1 letter and a maximum of 3 letters at the beginning of the string. The last issue is the length of the string. It can be as long as 9 characters and a minimum of 3.
Here is what I am currently using.
'/^([A-Za-z]{1})([0-9]{7})$/'
Is there a way to match all of this?

You want:
^[A-Za-z]([A-Za-z]{2}|[A-Za-z][0-9]|[0-9]{2})[0-9]{0,6}$
The initial [A-Za-z] ensures that it starts with a letter, the second bit ([A-Za-z]{2}|[A-Za-z][0-9]|[0-9]{2}) ensures that it's at least three characters long and consists of between one and three letters at the start, and the final bit [0-9]{0,6} allows you to go up to 9 characters in total.
Further explaining:
^ Start of string/line anchor.
[A-Za-z] First character must be alpha.
( [A-Za-z]{2} Second/third character are either alpha/alpha,
|[A-Za-z][0-9] alpha/digit,
|[0-9]{2} or digit/digit
) (also guarantees minimum length of three).
[0-9]{0,6} Then up to six digits (to give length of 3 thru 9).
$ End of string/line marker.
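To check the pattern against the stated rules, it can be run over a handful of candidates. The account numbers in this PHP sketch are invented test values.

```php
<?php
$pattern = '/^[A-Za-z]([A-Za-z]{2}|[A-Za-z][0-9]|[0-9]{2})[0-9]{0,6}$/';

// Invented samples: the first four follow the rules, the rest break one each.
$accounts = [
    'A1234567',    // 1 letter + 7 digits (the original format)
    'ABC123',      // 3 letters + 3 digits
    'AB1',         // minimum length of 3
    'A12345678',   // maximum length of 9
    'A1',          // too short
    'ABCD12',      // 4 leading letters
    'A123456789',  // 10 characters
    '1234567',     // no leading letter
];

// preg_grep() keeps the entries that match the pattern.
$valid = array_values(preg_grep($pattern, $accounts));
print_r($valid);
```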

Try this:
'/^([A-Za-z]{1,3})([0-9]{0,6})$/'
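That allows 1 to 3 letters followed by 0 to 6 digits, so 1 to 9 characters in total. Note that it does not enforce the 3-character minimum on its own, and a single letter followed by 7 digits (the original format) has more digits than {0,6} permits.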
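A quick check suggests the shorter pattern accepts a lone letter and rejects one letter followed by seven digits. One possible adjustment, shown here as a sketch rather than as the answerer's intent, is a lookahead that enforces the overall length of 3 to 9 while the digit count is left open.

```php
<?php
$simple = '/^([A-Za-z]{1,3})([0-9]{0,6})$/';
// Assumed variant: (?=.{3,9}$) checks the total length up front.
$withLength = '/^(?=.{3,9}$)([A-Za-z]{1,3})([0-9]*)$/';

$simpleAcceptsLoneLetter = preg_match($simple, 'A');
$simpleAcceptsOldFormat  = preg_match($simple, 'A1234567');

$variantAcceptsLoneLetter = preg_match($withLength, 'A');
$variantAcceptsOldFormat  = preg_match($withLength, 'A1234567', $m);

var_dump($simpleAcceptsLoneLetter, $simpleAcceptsOldFormat);
var_dump($variantAcceptsLoneLetter, $variantAcceptsOldFormat);
```

The two capture groups are kept, so after a match with the variant $m[1] holds the letters and $m[2] the digits.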
