I am having some trouble figuring out how to parse information collected from users. The information I am collecting is:
Age
Sex
Zip Code
Following are some examples of how I may receive this from users:
30 Male 90250
30/M/90250
30 M 90250
M 30 90250
30-M-90250
90250,M,30
I started off with the explode function, but I was left with a huge list of if/else statements trying to work out how the user separated the information (space, comma, slash, or hyphen).
Any feedback is appreciated.
Thanks
It's easy enough. The ZIP code is always 5 digits, so a regex such as /\b\d{5}\b/ will find it. The age is a number of 1 to 3 digits, so /\b\d{1,3}\b/ takes care of that; the word boundaries keep it from matching part of the ZIP code. As for the gender, you could just look for an f for female and, if there isn't one, assume male.
With all that said, what's wrong with separate input fields?
You might want to use a few regular expressions:
One that looks for 5 numeric digits: (?<!\d)\d{5}(?!\d)
One that looks for 2 numeric digits: (?<!\d)\d{2}(?!\d)
One that looks for a single letter: [a-zA-Z]
[EDIT]
I've edited the RegExes. They now match every one of the presented alternatives, and don't require any alteration of the input string (which makes it a more efficient choice). They can also be run in any order.
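A minimal sketch of how the three patterns could be combined in PHP. The function name parseUserInfo and the returned array keys are hypothetical, and the age pattern is widened to 1 to 3 digits as an assumption. Lookarounds are used instead of [^\d] so that a value at the very start or end of the input still matches.

```php
<?php
// Hypothetical helper: pulls age, sex and ZIP code out of one free-form line.
// Returns null when any of the three pieces cannot be found.
function parseUserInfo(string $input): ?array
{
    // ZIP: exactly five digits that are not part of a longer digit run.
    if (!preg_match('/(?<!\d)\d{5}(?!\d)/', $input, $zip)) {
        return null;
    }
    // Age: one to three digits that are not part of a longer digit run.
    if (!preg_match('/(?<!\d)\d{1,3}(?!\d)/', $input, $age)) {
        return null;
    }
    // Sex: the first letter found (assumes the only letters are M/F/Male/Female).
    if (!preg_match('/[a-zA-Z]/', $input, $sex)) {
        return null;
    }
    return [
        'age' => (int) $age[0],
        'sex' => strtoupper($sex[0]),
        'zip' => $zip[0],
    ];
}
```

Because each pattern only accepts a digit run of the right length, the order of the fields and the choice of separator do not matter.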
Related
Good afternoon, I am creating a form with a number reservation field for the user to fill in. The system I had only allowed reservations number by number, for example 1,2,3, which would book the numbers 1, 2 and 3.
Now I would like to add the option to book several numbers at once, for example 1-5,9,10, which would book numbers 1 to 5, 9 and 10.
I'm using the following regex, but it's not working as I want:
^\d{1,5}(?:-\d{1,5})*(?:,\d{1,5})*(?:,\d{1,5}-\d{1,5})*(?:-\d{1,5},\d{1,5})*$
The problem with this pattern is that once the user enters two ranges, such as 1-3,4-6, it only allows one more number. For example, 1-3,4-6,2,3 shows an error when the ,3 is entered.
There is also a problem where it allows several dashes without commas, for example 1-3-6-8-9.
Perhaps something like this:
\A\d{1,5}(?:-\d{1,5})?(?:,\d{1,5}(?:-\d{1,5})?)*\z
The idea:
the range is optional (?:-\d{1,5})? (and follows the first number)
The group, that contains a comma followed by a number or a range, can occur zero or more times
Note that a regex alone can't solve the whole problem, since inputs such as 6-4 or 1-5,2,3,4 still pass the pattern. So sooner or later you will need to explode the string and check that the numbers and ranges are coherent.
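The two steps (syntax check, then coherence check) might be sketched like this. The function name expandReservation is hypothetical, and treating a repeated number as an error is an assumption; you may prefer to merge duplicates silently instead.

```php
<?php
// Hypothetical helper: returns the sorted list of booked numbers,
// or null when the input is malformed or incoherent.
function expandReservation(string $input): ?array
{
    // Step 1: syntax check (numbers or ranges separated by commas).
    if (!preg_match('/\A\d{1,5}(?:-\d{1,5})?(?:,\d{1,5}(?:-\d{1,5})?)*\z/', $input)) {
        return null;
    }
    $numbers = [];
    // Step 2: split on commas and inspect each number or range.
    foreach (explode(',', $input) as $part) {
        $bounds = array_map('intval', explode('-', $part));
        $start = $bounds[0];
        $end = $bounds[1] ?? $start; // a single number is a range of one
        if ($start > $end) {
            return null; // reversed range such as 6-4
        }
        for ($n = $start; $n <= $end; $n++) {
            if (isset($numbers[$n])) {
                return null; // number booked twice, as in 1-5,2 (assumed invalid)
            }
            $numbers[$n] = true;
        }
    }
    $result = array_keys($numbers);
    sort($result);
    return $result;
}
```

With this sketch, expandReservation('1-5,9,10') yields the numbers 1 to 5, 9 and 10, while '6-4', '1-5,2,3,4' and '1-3-6-8-9' are all rejected.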
This code to get all sequences of 8 digits works fine:
preg_match_all('/[0-9]{8}/', $string, $match);
However, I am only interested in matches that start with 20.
I know I have to add ^20 somewhere but I have tried many times with no success. I have looked at many regex tutorials but none of them seems to explain how to do 2 separate searches.
I am actually trying to parse ICAL files to extract the dates. If the 8 digit integer starts with 20 it almost certainly is a date.
For example: DTSTART:20150112T120000Z
How about this solution:
/(20)\d{6}/
This will probably find what you are looking for:
(?=20)(\d{8})
It uses a positive lookahead so that the 8-digit group is captured only if it starts with 20.
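For illustration, here is how the lookahead version might be dropped into the original preg_match_all call. The function name findDateCandidates and the sample text are made up for this sketch.

```php
<?php
// Hypothetical helper: returns every 8-digit run that begins with "20".
function findDateCandidates(string $text): array
{
    // (?=20) checks that the next two characters are "20" without consuming them;
    // \d{8} then consumes the full 8-digit run.
    preg_match_all('/(?=20)\d{8}/', $text, $match);
    return $match[0];
}

$sample = "DTSTART:20150112T120000Z\nDTEND:20150113T120000Z\nUID:19991231";
$dates = findDateCandidates($sample);
// $dates now holds the two values starting with 20; 19991231 is skipped.
```

Note that /20\d{6}/ gives the same matches here; the lookahead is just one way to express "starts with 20".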
The answer depends heavily on what you want to achieve. Do you want to extract any and all dates from an iCalendar file? If so, you might miss birth dates, as their years are most likely to start with 19xx.
Also, matching any date will most likely yield many undesired dates, like UNTIL, TRIGGER, DTEND, ...
Assuming from your example you want to extract events start dates, you could try:
DTSTART[a-zA-Z_.%+/=;-]*:(\d{8})(?:T\d{6})?
Keep in mind that DTSTART can be followed by a time zone definition like TZID=America/New_York and/or the value type definition DATE or DATE-TIME (see RFC 5545, DATE-TIME).
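A rough sketch of how event start dates could be pulled out in PHP. The function name extractStartDates is hypothetical, and the pattern is a variant that captures the whole 8-digit date and treats the time part as optional, so that VALUE=DATE lines also match. It assumes unfolded lines and has not been checked against every parameter form RFC 5545 allows.

```php
<?php
// Hypothetical helper: returns the YYYYMMDD part of every DTSTART line.
function extractStartDates(string $ical): array
{
    // ^ with the m flag anchors at the start of each line.
    // [\w.%+/=;-]* skips parameters such as ;TZID=America/New_York or ;VALUE=DATE.
    // (\d{8}) captures the date; the time part (T plus 6 digits) is optional.
    preg_match_all('~^DTSTART[\w.%+/=;-]*:(\d{8})(?:T\d{6})?~m', $ical, $m);
    return $m[1];
}

$ical = "DTSTART;TZID=America/New_York:20150112T120000\n"
      . "DTEND;TZID=America/New_York:20150112T130000\n"
      . "DTSTART;VALUE=DATE:20150301\n"
      . "DTSTART:20150405T090000Z\n";
$starts = extractStartDates($ical);
```

The ~ delimiter is used only so that the / inside the character class does not need escaping.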
I am currently working with an Oracle database trying to validate phone numbers within my PHP code. I have one column "PHONE 1" that contains a string phone number. This phone number may contain a leading country code "1" or a trailing phone extension (usually 4 digits). I need PHONE 1 to only contain the 10 digit phone number and then if it has country code or extension, I need to remove them and store them in separate columns which are currently empty within my Oracle database ("PHONE 1 COUNTRY CODE" and "PHONE 1 EXTENSION"). I have found a way to remove the leading country code, but I am not sure how to remove the trailing extension. I looked into possibly using the explode() function but cannot figure it out. Here is my code that I am using to remove the leading 1:
while ($row = oci_fetch_array($array, OCI_ASSOC + OCI_RETURN_NULLS)) {
    // VALIDATE PHONE NUMBERS AND COUNTRY CODES
    if (isset($row["PHONE 1"])) {
        if (strlen($row["PHONE 1"]) > 10) {
            // Strip the leading country code and store it separately.
            $row["PHONE 1"] = preg_replace("/^1/", '', $row["PHONE 1"]);
            $row["PHONE 1 COUNTRY CODE"] = "1";
            if (strlen($row["PHONE 1"]) > 10) {
                // insert code here that will remove the extension and add it to the column $row["PHONE 1 EXTENSION"]
            }
        }
    }
}
I think that nesting the second if statement inside the first will be the easiest way to remove the extension. Essentially, this should say: if there are more than 10 digits, remove the leading 1, and then if there are still more than 10 digits, remove the trailing extension. I just need to figure out how to code the latter. Any input on how to improve my current code or add the new part will be appreciated.
How you handle the phone numbers varies, depending on what you're starting with and what you want to end up with.
A good strategy for this type of problem is to "divide & conquer".
Divide your data into two groups - for example, those with extensions & those without.
Take the ones with extensions and divide them further, say into groups like "8005551212x100" (i.e. with an 'x' denoting the extension) and others. Now you know how to handle the first group: split the string on the 'x', put the phone number into one column in your database and the extension into another. You still have some phone numbers left (in a variety of formats, probably), but those extensions have been taken care of.
Methods for handling the phone numbers include:
explode - good for separating strings with clearly defined delimiters. E.g. explode("x", "8005551212x100") == array("8005551212", "100")
substr - good for strings where specific information appears at specific locations in the string. E.g. substr("8005551212",3,3) == "555"
regular expressions - can be good for variable data. E.g. for phone numbers where the extension is delimited by "x" or "ext" or "extension": preg_match("/^[0-9]+(x|ext|extension)[0-9]+$/", "8005551212ext100"). Be careful, though, to resist the temptation to write one regular expression that covers ALL types of numbers in your database. It's probably possible, but you'll drive yourself crazy trying to write and debug an insanely long regular expression. That's why I suggest dividing your data up into groups of similar formats.
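As a sketch of the divide-and-conquer idea, the function below handles two groups: numbers with an explicit extension marker, and plain digit strings. The name splitPhone and the array keys are hypothetical. It assumes a 10-digit national number, that a leading 1 on a longer string is the country code, and that any remaining trailing digits are the extension. Real data will likely need more groups.

```php
<?php
// Hypothetical helper: splits a raw phone string into country code,
// 10-digit number and extension (null when a part is absent).
function splitPhone(string $raw): array
{
    $result = ['country_code' => null, 'number' => null, 'extension' => null];

    // Group 1: explicit extension marker (x, ext or extension).
    if (preg_match('/^(\d+)\s*(?:x|ext|extension)\.?\s*(\d+)$/i', $raw, $m)) {
        $digits = $m[1];
        $result['extension'] = $m[2];
    } else {
        // Group 2: everything else; keep only the digits.
        $digits = preg_replace('/\D/', '', $raw);
    }

    // Assumption: a leading 1 on a string longer than 10 digits is the country code.
    if (strlen($digits) > 10 && $digits[0] === '1') {
        $result['country_code'] = '1';
        $digits = substr($digits, 1);
    }

    // Assumption: digits beyond the first 10 are a trailing extension.
    if (strlen($digits) > 10) {
        $result['extension'] = $result['extension'] ?? substr($digits, 10);
        $digits = substr($digits, 0, 10);
    }

    $result['number'] = $digits;
    return $result;
}
```

For example, splitPhone('180055512124321') gives country code 1, number 8005551212 and extension 4321, and each part can then be written to its own column.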
I am asking for help with PHP code to manipulate a string that I am retrieving from an SQL database. The string is in this format: groups of 3-5 alphanumerics separated by periods. The number of alphanumeric groups in the string is quite variable (from 1 to 20 or more groups).
Example 1: J89.NEWTT.IIU.MZZ.OXI.
Example 2: ORD6.BAE.J89.DLH.YRL.N5500.W9700.NUGSM.N6500.
I need to do 2 things with these:
separate into a new string just the first 3 groups between periods (in Example 1, results would be "J89.NEWTT.IIU")
separate into a new string just the last 3 groups between periods (in Example 2, results would be "W9700.NUGSM.N6500")
I'm having trouble getting the usual players to work with this. Thanks for any help!
Split it (explode), slice out the parts you want, then join them back together with . again.
This is a strange format and picking the first/last three chunks seems really arbitrary. Is it structured data? If so, why are you storing it in a single field in your database?
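The split, slice and join steps might look like this in PHP. The function names firstGroups and lastGroups are made up for this sketch, and the trailing period is assumed to be a terminator rather than an empty final group.

```php
<?php
// Hypothetical helper: first $count period-separated groups.
function firstGroups(string $route, int $count = 3): string
{
    // trim() drops the trailing period so explode() yields no empty last element.
    $groups = explode('.', trim($route, '.'));
    return implode('.', array_slice($groups, 0, $count));
}

// Hypothetical helper: last $count period-separated groups.
function lastGroups(string $route, int $count = 3): string
{
    $groups = explode('.', trim($route, '.'));
    // A negative offset makes array_slice() count from the end of the array.
    return implode('.', array_slice($groups, -$count));
}
```

If a string has fewer than three groups, both functions simply return all of them.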
I have two strings that I need to pull data out of but can't seem to get it working. I wish I knew regular expression but unfortunately I don't. I have read some beginner tutorials but I can't seem to find an expression that will do what I need.
Out of this first string delimited by the equal character, I need to skip the first 6 characters and grab the following 9 characters. After the equal character, I need to grab the first 4 characters which is a day and year. Lastly for this string, I need the remaining numbers which is a date in YYYYmmdd.
636014034657089=130719889904
The second string seems a little more difficult because the spaces between the characters differ but always seem to be delimited by at minimum, a single space. Sometimes, there are as many as 15 or 20 spaces separating the blocks of data.
Here are two different samples that show the space difference.
!!92519 C 01 M600200BLNBRN D55420090205M1O
!!95815 A M511195BRNBRN D62520070906 ":%/]Q2#0*&
The data that I need out of these last two strings are:
The zip code following the 2 exclamation marks.
The single letter 'M' following that. It always appears to be in a 13 character block
The 3 numbers after the single letter
The next 3 numbers which are the person's height
The following next 3 are the person's weight
The next 3 are eye color
The next block of 3 which are the person's hair color
The last block that I need data from:
I need to get the single letter which in the example appears to be a 'D'.
Skip the next 3 numbers
The last and remaining 8 numbers which is a date in YYYYmmdd
If someone could help me resolve this, I'd be very grateful.
For the first string you can use this regular expression:
^[0-9]{6}([0-9]{9})=([0-9]{4})([0-9]{4})([0-9]{2})([0-9]{2})$
Explanation:
^ Start of string/line
[0-9]{6} Match the first 6 digits
([0-9]{9}) Capture the next 9 digits
= Match an equals sign
([0-9]{4}) Capture the "day and year" (what format is this in?)
([0-9]{4}) Capture the year
([0-9]{2}) Capture the month
([0-9]{2}) Capture the day
$ End of string/line
For the second:
^!!([0-9]{5}) +.*? +M([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})
It works in a similar way to the first. You may need to adjust it slightly if your data is not exactly in the format that the regular expression expects. You might want to replace the .*? with something more precise but I'm not sure what because you haven't described the format of the parts you are not interested in.
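To show how the captured groups come out in PHP, here is a small sketch that applies both patterns with preg_match. The helper name captures and the variable names are hypothetical; the sample strings are the ones from the question.

```php
<?php
// Pattern for the first string: skip 6 digits, capture 9, then the parts after "=".
$first = '/^[0-9]{6}([0-9]{9})=([0-9]{4})([0-9]{4})([0-9]{2})([0-9]{2})$/';

// Pattern for the second string: ZIP, the M block, then the letter and date block.
$second = '/^!!([0-9]{5}) +.*? +M([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})'
        . ' +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/';

// Hypothetical helper: returns only the captured groups, or null on no match.
function captures(string $pattern, string $line): ?array
{
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    // Element 0 is the whole match; the groups start at index 1.
    return array_slice($m, 1);
}

$a = captures($first, '636014034657089=130719889904');
$b = captures($second, '!!92519 C 01 M600200BLNBRN D55420090205M1O');
```

Checking for null before using the result is a cheap way to find rows that do not follow the expected layout.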