I need to write a regex statement for street addresses. It's for a class so it doesn't have to be too fancy. Basically, I want it to accept a) a group of numbers, b) a space, c) a street name, either starting with a letter or number, and d) anything after that.
So far, this is what I have:
^\d+\s[0-9a-zA-Z]*
I'm using the example 123 Sesame Street. It matches 123 Sesame, but not Street; in other words, it doesn't match d) anything after that.
Thanks!
You forgot to include a space in the street-name character class.
^\d+\s[0-9a-zA-Z ]* // or \s instead of space
Agreed, you won't learn anything by having someone give you the answer. This guide is really fantastic:
https://www.princeton.edu/~mlovett/reference/Regular-Expressions.pdf
That said, in its simplest form, this accomplishes your goal:
^[0-9]+\s.*$
But what if you lived at 147B N. Henderson Way? This wouldn’t match your criteria.
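To see the difference, the patterns above can be run against both example addresses. This is a JavaScript sketch (these simple patterns behave the same as in PHP's PCRE); the withSuffix variant, which allows one optional letter after the house number, is a hypothetical extension and not one of the answers above.

```javascript
// Patterns from the question and the answers above.
const original = /^\d+\s[0-9a-zA-Z]*/;   // question: no space in the class
const withSpace = /^\d+\s[0-9a-zA-Z ]*/; // answer: space added to the class
const simplest = /^[0-9]+\s.*$/;         // answer: digits, whitespace, anything
// Hypothetical variant: allow one optional letter after the number (147B).
const withSuffix = /^\d+[A-Za-z]?\s.*$/;

const a = "123 Sesame Street";
const b = "147B N. Henderson Way";

console.log(a.match(original)[0]);  // 123 Sesame (stops at the second space)
console.log(a.match(withSpace)[0]); // 123 Sesame Street
console.log(simplest.test(b));      // false: "B" follows the digits
console.log(withSuffix.test(b));    // true
```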
I have tried the UK postcode regex from the .gov website, as suggested in many questions here, but it doesn't seem to work for short postcodes.
My regex:
preg_match('^(([gG][iI][rR] {0,}0[aA]{2})|((([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y]?[0-9][0-9]?)|(([a-pr-uwyzA-PR-UWYZ][0-9][a-hjkstuwA-HJKSTUW])|([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y][0-9][abehmnprv-yABEHMNPRV-Y]))) {0,}[0-9][abd-hjlnp-uw-zABD-HJLNP-UW-Z]{2}))^', $this->post['location'], $matches)
When I use a full postcode in the format AA9 9ZZ it works, but one in the format AA9 doesn't. I need the following formats to work:
AA9
AA99
AA9 9ZZ
AA99 9ZZ
Starting from the pattern you gave and making the second part optional, I get:
~^(?:gir(?: *0aa)?|[a-pr-uwyz](?:[a-hk-y]?[0-9]+|[0-9][a-hjkstuw]|[a-hk-y][0-9][abehmnprv-y])(?: *[0-9][abd-hjlnp-uw-z]{2})?)$~i
Or, to make it more readable (using the x modifier):
~ # pattern delimiter
^ # start of the string anchor
(?: # branch 1
gir
(?:[ ]*0aa)? # second part optional (branch 1)
| # branch 2
[a-pr-uwyz] # I put it in factor to shorten the pattern
(?:
[a-hk-y]?[0-9]+
|
[0-9][a-hjkstuw]
|
[a-hk-y][0-9][abehmnprv-y]
)
(?:[ ]*[0-9][abd-hjlnp-uw-z]{2})? # second part optional (branch 2)
)
$ # end of the string anchor
~ix
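A quick way to check the four required formats is to port the one-line pattern and run a few strings through it. This JavaScript version is a sketch: the sample strings are format examples chosen for illustration (not verified as issued postcodes), and the only change from the PHP pattern is the delimiters.

```javascript
// JavaScript port of the one-line pattern above: the "~" delimiters become
// "/", and the i flag keeps it case-insensitive.
const ukPostcode =
  /^(?:gir(?: *0aa)?|[a-pr-uwyz](?:[a-hk-y]?[0-9]+|[0-9][a-hjkstuw]|[a-hk-y][0-9][abehmnprv-y])(?: *[0-9][abd-hjlnp-uw-z]{2})?)$/i;

// Illustrative samples in the AA9, AA99, AA9 9ZZ and AA99 9ZZ shapes, plus extras.
const samples = ["AB1", "AB10", "AB1 2DE", "AB10 1AB", "SW1A 1AA", "GIR 0AA", "12345", "AB1 2"];
for (const s of samples) {
  console.log(s, ukPostcode.test(s));
}
```

All of these print true except the last two: "12345" does not start with a letter, and "AB1 2" has an incomplete second part.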
^[A-Z]{1,2}\d{1,2}(?:(?: )?\d[A-Z]{2})?$ is the pattern I came up with, and it seems to work.
This could probably be improved, though. I'm not a Regexpert.
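For comparison, the simpler pattern can be exercised the same way. In this JavaScript sketch, with illustrative sample strings, it accepts the formats from the question, but it has no branch for a letter after the district digits (a format like SW1A 1AA), and without an i flag it rejects lowercase input.

```javascript
// The simpler pattern from this answer, unchanged apart from the delimiters.
const simple = /^[A-Z]{1,2}\d{1,2}(?:(?: )?\d[A-Z]{2})?$/;

console.log(simple.test("AB10 1AB")); // true
console.log(simple.test("AB1 2DE"));  // true
console.log(simple.test("SW1A 1AA")); // false: letter after the district digits
console.log(simple.test("ab10 1ab")); // false: no i flag
```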
Due to the way UK postcodes work, validating them by using regex is not a foolproof solution. Two of the main problems are:
Royal Mail could change the format of some postcodes, adding sub-districts that aren't covered by the regex you choose to use.
Validating by regex only ensures the postcode is in a valid format according to your regex rules, not that it's a postcode that exists as part of an address.
Royal Mail provide the PAF (Postcode Address File) database, which contains all UK addresses, including postcodes. Many companies have exposed this data through APIs and website plugins.
For example, I work for a company called PCA Predict, and we have a demo of our solution in action. It's a plugin for online checkout forms that lets a customer start typing any part of their address and auto-fills the fields when they select theirs.
We also offer REST APIs to silently validate and return addresses.
Please feel free to comment if you need any help with address validation, as it can be much harder than it initially looks! I'm not saying you have to use our services; give it a go, but have a Google for other options as well.
I have had the same problem: it is hard to validate a postcode, because Royal Mail adds and removes postcodes quite frequently. So for the past 2 weeks I have been building an address database, and I have created a free (if very nasty-looking) API: you can validate a postcode and it will return every address for that postcode.
I hope this is helpful.
https://www.pervazive.co.uk/free-api-for-uk-postcode-lookup/
Endpoint
GET: https://api.pervazive.co.uk/postcode.php?postcode=[POSTCODE]
Response
Request: GET https://api.pervazive.co.uk/postcode.php?postcode=AB10+1AB
Return Format: JSON
{"predictions":[
{"ID":"0","Address":"Aberdeen City Council, Marischal College, Broad Street, Aberdeen, Aberdeenshire, AB10 1AB","Postcode":"AB10 1AB"}
],"Execution_Time":"0.50983214378357","status":"200"}
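For reference, a client for this endpoint only needs to build the query string and read the predictions array. This JavaScript sketch makes no network request: it builds the URL and parses the sample response shown above. The function names are hypothetical, and the response shape is assumed from that single sample.

```javascript
// Build the request URL; URLSearchParams encodes the space in "AB10 1AB"
// as "+", matching the example request.
function buildLookupUrl(postcode) {
  const params = new URLSearchParams({ postcode });
  return `https://api.pervazive.co.uk/postcode.php?${params}`;
}

// Pull the address strings out of a response body shaped like the sample.
function parsePredictions(body) {
  const data = JSON.parse(body);
  // "status" is a string in the sample response, not a number.
  if (data.status !== "200") return [];
  return data.predictions.map((p) => p.Address);
}

const sampleBody = `{"predictions":[
{"ID":"0","Address":"Aberdeen City Council, Marischal College, Broad Street, Aberdeen, Aberdeenshire, AB10 1AB","Postcode":"AB10 1AB"}
],"Execution_Time":"0.50983214378357","status":"200"}`;

console.log(buildLookupUrl("AB10 1AB"));
console.log(parsePredictions(sampleBody));
```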
I am trying to match street addresses containing the street and number.
I need the expression to match words for the street name, followed by the number.
For example I want to match "somestreet 25", "some other street 23","a-third street 190", but not "a_fourth street 67".
I have been trying for an hour, but I am not even close to good with regexes.
So far all I've got is /^[a-zA-Z]+([\s][a-zA-Z]+)([\s][0-9]){1,4}$/ but needless to say, it is not working.
--- EDIT ---
I understand that there is no standard, global way of writing a street address, and that a regular expression can't realistically cover the problem on a global scale, but the site is for a local restaurant, and all I want is for the address to look like it could be an address (even then, without map and telephone verification it could still be a fake one).
There will, however, be human verification at all times before anything is sent, and also it is a rather small neighborhood, so both the delivery person and the restaurant owner know if the order is fake or not.
All I want is to keep them from getting spammed with silly !##$ characters in the address, and have a decent readable address formatting for them to work with.
This should work on your examples:
/^[a-zA-Z]([a-zA-Z-]+\s)+\d{1,4}$/
You've overcomplicated it a little. This case-insensitive expression looks for letters, hyphens and spaces, followed by whitespace and then digits, matching your stated criteria.
/^([a-z- ]+)\s+([0-9]+)$/i
But what about me? I live on 30th Ave.
By the way, I used [0-9]+ for one or more digits at the end, instead of your {1,4} range. If you must not allow more than 4 digits, switch back to {1,4}.
This will do:
/^([A-Z][-A-Z ]+)\s+(\d+)$/i
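The three patterns above can be checked side by side against the question's examples. This JavaScript harness is illustrative; it also treats "30th Ave" from the comment above as a string that all three reject, which is exactly the limitation that comment points out.

```javascript
// The three patterns suggested in the answers above.
const patterns = {
  answer1: /^[a-zA-Z]([a-zA-Z-]+\s)+\d{1,4}$/,
  answer2: /^([a-z- ]+)\s+([0-9]+)$/i,
  answer3: /^([A-Z][-A-Z ]+)\s+(\d+)$/i,
};

// Examples from the question, plus "30th Ave" from the comment.
const shouldMatch = ["somestreet 25", "some other street 23", "a-third street 190"];
const shouldFail = ["a_fourth street 67", "30th Ave"];

for (const [name, re] of Object.entries(patterns)) {
  const ok = shouldMatch.every((s) => re.test(s)) && !shouldFail.some((s) => re.test(s));
  console.log(name, ok ? "behaves as described" : "differs"); // all three: behaves as described
}
```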
I think street names follow no regular format, so a regular expression is not really applicable here.
I know it can be done for bad words (checking against an array of preset words), but how can I detect telephone numbers in a long text?
I'm building a website in PHP for a client who needs to stop people from putting their mobile phone numbers in the description field (see craigslist, etc.).
Besides that, he's going to need some moderation, but I was wondering if there is a way to block at least the obvious formats like nnn-nnn-nnnn. I'm not asking to block other weird ways of writing them, like HeiGHT*/four*/nine etc.
Welcome to the world of regular expressions. You're basically going to want to use preg_replace to look for (some pattern) and replace it with a string.
Here's something to start you off:
$text = preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);
This looks for:
an optional plus sign, followed by a digit, followed by 4 to 20 digits, brackets, dashes, spaces or plus signs, followed by a digit,
and replaces the match with the string [blocked].
This catches all the obvious combinations I can think of:
012345 123123
+44 1234 123123
+44(0)123 123123
0123456789
Placename 123456 (although this one will leave 'Placename')
However, it will also strip out any run of 6 or more digits, which might not be desirable!
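The same replacement works in JavaScript with only the delimiters changed; the g flag makes replace() substitute every match, as preg_replace does by default. The helper name is hypothetical, and the last two calls show the false positive mentioned above and a 5-digit run that is left alone.

```javascript
// JavaScript equivalent of the preg_replace call above.
function blockPhoneNumbers(text) {
  return text.replace(/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/g, "[blocked]");
}

console.log(blockPhoneNumbers("call 012345 123123 now")); // call [blocked] now
console.log(blockPhoneNumbers("+44(0)123 123123"));       // [blocked]
console.log(blockPhoneNumbers("Placename 123456"));       // Placename [blocked]
console.log(blockPhoneNumbers("Order 1234567 shipped"));  // Order [blocked] shipped (false positive)
console.log(blockPhoneNumbers("Room 12345"));             // Room 12345 (only 5 digits)
```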
As you may know, you'll need regular expressions to do this.
I found this pattern that could be useful for your project:
<?php
preg_match("/(^(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}$)/", $yourText, $matches);
//matches variable will contain the array of matched strings
?>
More information about this pattern can be found here http://gskinner.com/RegExr/?2rirv where you can even test it online. It's a great tool to test regular expressions.
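One thing to watch with that pattern: because it is wrapped in ^ and $, it only matches when the whole subject is a phone number, so it won't find one inside a longer description. A JavaScript sketch of the difference; dropping the anchors and adding the g flag is my adjustment, not part of the linked pattern, and findPhoneNumbers is a hypothetical helper.

```javascript
// The pattern above, ported to JavaScript. The ^ and $ anchors mean the
// whole subject must be a phone number.
const anchored = /(^(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}$)/;
// Same pattern without the anchors, with g so every occurrence is returned.
const unanchored = /(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}/g;

function findPhoneNumbers(text) {
  // A match may begin with the optional separator (here a space), so trim it.
  return (text.match(unanchored) || []).map((m) => m.trim());
}

const description = "Nice bike, call me on 555-123-4567 for details";
console.log(anchored.test("555-123-4567")); // true
console.log(anchored.test(description));    // false: the text is more than a number
console.log(findPhoneNumbers(description)); // [ '555-123-4567' ]
```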
preg_match($pattern, $subject) returns 1 if the pattern is found in the subject, 0 if it isn't, and false if an error occurs.
A pattern to match the example you give might be '/\d{3}-\d{3}-\d{4}/'
However whatever you choose for your pattern will suffer from both false positives and false negatives.
You might also consider looking for words like mob, cell or tel next to the number.
The full details of PHP pattern syntax can be found at http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
Ian
P.S. It can't be done reliably for bad words either, as the people of Scunthorpe will tell you.
I think that using too strict a regular expression would make you miss a great number of detections.
You should check for stretches of 10 consecutive characters containing more than 5 digits.
Because of the computational cost, you will probably want an analysis routine queued to run after each message is inserted.
Once the 6 or more digits have been isolated, replace them as you prefer, including any neighbouring digits.
In any case, it is better to preserve the original data, so you can keep tuning your detection algorithm until it works as well as possible.
Then you can also study your user data to create more complex heuristics, such as numbers written out as words in any letter case, mixed, dot-separated, etc.
It's not about writing the perfect regex; it's about approaching the problem statistically and dynamically.
And remember: once you take action, users will change their habits as a consequence, so the statistics will change and you will need to keep learning and updating your heuristics.
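The "10 consecutive characters containing more than 5 digits" check can be sketched as a sliding window. This is a minimal JavaScript sketch: the helper name is hypothetical, and masking digits with "#" is an arbitrary choice, since the answer leaves the replacement up to you.

```javascript
// Hypothetical helper: flag every window of `windowSize` characters that
// contains more than `maxDigits` digits, then mask the digits in flagged windows.
function maskDenseDigits(text, windowSize = 10, maxDigits = 5) {
  const chars = Array.from(text);
  const isDigit = (c) => c >= "0" && c <= "9";
  const flagged = new Array(chars.length).fill(false);
  let count = 0; // digits in the current window
  for (let i = 0; i < chars.length; i++) {
    if (isDigit(chars[i])) count++;
    // Drop the character that just slid out of the window.
    if (i >= windowSize && isDigit(chars[i - windowSize])) count--;
    if (count > maxDigits) {
      // Flag the whole current window [i - windowSize + 1, i].
      for (let j = Math.max(0, i - windowSize + 1); j <= i; j++) flagged[j] = true;
    }
  }
  // Replace only the digits inside flagged windows.
  return chars.map((c, i) => (flagged[i] && isDigit(c) ? "#" : c)).join("");
}

console.log(maskDenseDigits("call 555 123 4567 now")); // call ### ### #### now
console.log(maskDenseDigits("I am 25 years old"));     // I am 25 years old
```

Keeping the original text and storing the masked copy separately fits the answer's advice to preserve data for later tuning.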
I need to segment text at full stops using PHP/JavaScript. The problem is that if I split the text on ".", abbreviations, dates (12.03.2010) and URLs get split as well, which I need to prevent. There are many such cases, more than I can anticipate.
How can I recognize when "." is used as a full stop and not for something else?
When I googled, I found SRX (http://www.lisa.org/fileadmin/standards/srx20.html). Is there any open-source PHP project that segments text using these rules?
A Linux-based command-line utility would also do, as long as it is free.
This issue concerns cases where a segment is broken at a dot (.) because the dot is treated as a full stop. We need to distinguish between a dot (.) and a full stop.
Cases where . is not a full stop:
http://www.yahoo.com'>it is a good link. i liked it - only one valid fullstop
This is a test case. Lets try it no valid fullstop
http://www.yahoo.com'>Testing is done by amold12#…. - no valid fullstop
Mr. Abc is in town today - no valid fullstop
S. Khan had done it - no valid fullstop
The U.S. is emerging from a recession. - no valid fullstop
As far as code is concerned, I am using JavaScript's text.split(".") method.
Thanks
Human language is quirky. Whatever rules you come up with, some corner case is likely to defeat you. How important is it that you are 100% accurate? Would missing the occasional full stop really matter? Or would being a tad too aggressive really matter? If your objective is (for example) a statistical analysis of sentence length in published material, then I doubt that some over- or under-counting would be crucial.
My suggestion would be to look for patterns such as:
full-stop space(s) Capital letter
full-stop quote
full-stop new line
Run that across your sample text and see what anomalies remain.
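A minimal JavaScript sketch of that approach (JavaScript being what the question uses). It only treats a "." as a full stop when one of the three patterns follows it; the function name and the trimming of each piece are my additions. Running it over the question's cases is the "see what anomalies remain" step: "Mr. Abc" is still split, because it looks just like a sentence ending.

```javascript
// Split only where a full stop is followed by one of the suggested patterns;
// every other "." (abbreviations, dates, URLs) is left alone.
function splitSentences(text) {
  const sentences = [];
  const dot = /\.(["']?)/g; // a full stop, optionally followed by a quote
  let start = 0;
  let m;
  while ((m = dot.exec(text)) !== null) {
    const end = m.index + m[0].length; // just past the "." (and quote)
    const rest = text.slice(end);
    const isBoundary =
      m[1] !== "" ||               // full-stop quote
      /^[ \t]*\r?\n/.test(rest) || // full-stop new line
      /^ +[A-Z]/.test(rest);       // full-stop space(s) capital letter
    if (isBoundary) {
      sentences.push(text.slice(start, end).trim());
      start = end;
    }
  }
  const tail = text.slice(start).trim();
  if (tail) sentences.push(tail);
  return sentences;
}

console.log(splitSentences("The U.S. is emerging from a recession.")); // one sentence
console.log(splitSentences("Prices rose on 12.03.2010. Nobody was surprised.")); // two
console.log(splitSentences("Mr. Abc is in town today")); // still split after "Mr."
```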
Yours sincerely, David J. N. Artus. (Not a complete sentence yet, because I didn't use a . in that way, and that previous . isn't one either. But that last . was.)
Using PHP, how can I verify if a phone # is well formed?
It seems easiest to simply strip all non-numeric characters, leaving only the digits, and then check that exactly 10 digits remain.
Is this the best and easiest way?
The best? No. Issues I see with this approach:
Some area codes - like 000-###-#### - are not valid. See http://en.wikipedia.org/wiki/List_of_NANP_area_codes
Some exchanges - like ###-555-#### - are not valid. See http://en.wikipedia.org/wiki/555_%28telephone_number%29
Some people will enter a 1 before their number, i.e. 1-###-###-####.
Some people are only reachable at an extension, like ###-###-#### x####.
Some companies tack on extra digits, like 1-800-GO-FLOWERS. The additional digits are simply ignored by the phone system, but a user might expect to be able to enter the whole thing.
International phone numbers are not necessarily 10 digits, even if you discount the country codes.
Good enough? Quite possibly, but that's up to you and your app.
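To make the list above concrete, here is a JavaScript sketch of the strip-and-count approach with some of those checks added. It assumes US/Canadian (NANP) numbers only, so it deliberately ignores the international case; the function name, the extension syntax it accepts ("x123" or "ext. 123") and the returned object shape are hypothetical. Area codes and exchanges beginning with 0 or 1 are rejected because NANP requires both to start with a digit from 2 to 9, and the 555 exchange is rejected as the answer suggests.

```javascript
// Hypothetical helper: returns { number, extension } for a plausible
// NANP number, or null if the input fails a check.
function normalizeNanp(input) {
  // Split off an extension written as "x123" or "ext. 123" (assumed formats).
  const extMatch = input.match(/\s*(?:x|ext\.?)\s*(\d+)\s*$/i);
  const extension = extMatch ? extMatch[1] : null;
  const main = extMatch ? input.slice(0, extMatch.index) : input;

  // The approach from the question: strip everything that is not a digit.
  let digits = main.replace(/\D/g, "");

  // Accept a leading 1, as in 1-###-###-####.
  if (digits.length === 11 && digits[0] === "1") digits = digits.slice(1);
  if (digits.length !== 10) return null;

  const areaCode = digits.slice(0, 3);
  const exchange = digits.slice(3, 6);
  // Area codes and exchanges must start with 2-9, which rules out 000-###-####.
  if (!/^[2-9]/.test(areaCode) || !/^[2-9]/.test(exchange)) return null;
  // Treat the 555 exchange as invalid, as the list above suggests.
  if (exchange === "555") return null;

  return { number: digits, extension };
}

console.log(normalizeNanp("1-212-456-7890"));      // { number: '2124567890', extension: null }
console.log(normalizeNanp("(212) 456-7890 x123")); // { number: '2124567890', extension: '123' }
console.log(normalizeNanp("000-456-7890"));        // null
```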
You can use a regex for it:
$pattern_phone = "|^[0-9\+][0-9\s+\-]*$|i";
if(!preg_match($pattern_phone,$phone)){
// Something's wrong
}
Haven't tested the regex, so it may not be 100% correct.
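A quick way to test that regex is to run a few of the formats mentioned in this thread through a JavaScript port. The i flag is dropped because the pattern contains no letters, and the sample numbers are illustrative. It accepts spaces and dashes, rejects parentheses and dots, and has no minimum length.

```javascript
// The pattern above in JavaScript: a digit or "+", then any mix of digits,
// whitespace, "+" and "-".
const phonePattern = /^[0-9+][0-9\s+\-]*$/;

console.log(phonePattern.test("+44 1234 123123")); // true
console.log(phonePattern.test("111-222-3333"));    // true
console.log(phonePattern.test("(111) 222-3333"));  // false: parentheses are not allowed
console.log(phonePattern.test("111.222.3333"));    // false: neither are dots
console.log(phonePattern.test("1"));               // true: no minimum length
```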
Checking for 10 digits after stripping will check the syntax but won't check the validity. For that you'd need to determine what valid numbers are available in the region/country and probably write a regex to match the patterns.
The problem with validating/filtering data like this usually comes down to the answer to this question: "How strict do I want to be?", which then devolves into a series of "feature" questions:
Are you going to accept international numbers?
Are you going to accept extensions?
Are you going to allow various formats, e.g. (111) 222-3333 vs 111.222.3333?
Depending on your business rules, the answers to these questions can vary. But to be as flexible as possible, I recommend three fields for taking a phone number:
Country Code (optional)
Phone Number
Extension (optional)
All three fields can be programmatically restricted to numeric values only. You can then combine them into some parseable format before storing, or store each value individually.
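A JavaScript sketch of the three-field approach: each field is checked to be digits only, then they are combined into one string for storage. The function name and the "+<country code> <number> x<extension>" output format are assumptions; pick whatever parseable format your app needs.

```javascript
// Hypothetical helper: combine the three fields into one storable string.
// Each field must be digits only; country code and extension may be empty.
function combinePhoneFields({ countryCode = "", number = "", extension = "" }) {
  const optionalDigits = /^\d*$/;
  const requiredDigits = /^\d+$/;
  if (!optionalDigits.test(countryCode) || !optionalDigits.test(extension)) return null;
  if (!requiredDigits.test(number)) return null;

  // Assumed storage format: "+<country code> <number> x<extension>".
  let result = countryCode ? `+${countryCode} ${number}` : number;
  if (extension) result += ` x${extension}`;
  return result;
}

console.log(combinePhoneFields({ countryCode: "44", number: "1234123123", extension: "12" }));
// +44 1234123123 x12
console.log(combinePhoneFields({ number: "2224443333" })); // 2224443333
console.log(combinePhoneFields({ number: "222-444" }));    // null
```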
Answering whether something is "the best" thing to do is nearly impossible (unless you're the one answering your own question).
The way you propose it, stripping all non-digits and then check if there are 10 digits, might result in unwanted behaviour for a string like:
George Washington (February 22, 1732 –
December 14, '99) was the commander
of the Continental Army in the
American Revolutionary War and served
as the first President of the United
States of America.
since stripping all non-digits will result in the string 2217321499, which is 10 digits long, but I highly doubt that the entire string should be considered a valid phone number.
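The example is easy to reproduce. This JavaScript sketch joins the quoted passage into one string, shows the 10-digit result of stripping, and adds a hypothetical pre-check (looksLikePhoneNumber) that only allows digits, whitespace, parentheses, "+", "." and "-" before counting digits. That rejects the passage but still accepts a format like (111) 222-3333.

```javascript
// The quoted passage, joined into a single string.
const passage =
  "George Washington (February 22, 1732 – December 14, '99) was the commander " +
  "of the Continental Army in the American Revolutionary War and served " +
  "as the first President of the United States of America.";

// Strip-then-count, as proposed in the question.
const stripped = passage.replace(/\D/g, "");
console.log(stripped, stripped.length); // 2217321499 10

// Hypothetical stricter check: before stripping, require that the input
// contains only characters that plausibly belong in a phone number.
function looksLikePhoneNumber(input) {
  if (!/^[\d\s()+.\-]+$/.test(input)) return false;
  return input.replace(/\D/g, "").length === 10;
}

console.log(looksLikePhoneNumber(passage));          // false
console.log(looksLikePhoneNumber("(111) 222-3333")); // true
```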
What format do you need? You can use regular expressions for this.