Extract house numbers from address string - php

I am importing user data from a foreign database on demand. While I keep house numbers separate from the street names, the other database does not.
I use
preg_match_all('!\d+!')
to extract the numbers. This works fine for an address line like this:
streetname 60
But it does not work for an address line like this:
streetname 60/2/3
In that case I end up extracting 60, and /2/3 stays in the name of the street.
Unfortunately I am not a regex expert, quite the contrary. My problem is that I need to detect not only digits, but also slashes and hyphens.
Can someone help me out here?

Try:
preg_match_all('![0-9/-]+!', 'streetname 60/2/3', $matches);

To give a definite answer we would have to know the patterns in your data.
For example, in Germany we sometimes have house numbers like 13a or 23-42, which could also be written as 23 - 42.
One possible solution would be to match everything after a whitespace that starts with a digit:
preg_match_all('!\s(\d.*)!', 'streetname 60/2/3', $matches);
This would produce false positives, though, if you have American data with streets like 13street.
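A rough sketch of how the ideas above could be combined, assuming the house number always sits at the end of the line and consists of digits with optional letter suffixes, slashes, and hyphens. The function name splitAddressLine and the exact pattern are my own assumptions for illustration, so adjust them to whatever your data really looks like:

```php
<?php
// Hypothetical helper: splits an address line into street name and house number.
// Assumption: the house number is the trailing part of the line.
function splitAddressLine(string $line): ?array
{
    // (.+?)      street name, as short as possible
    // \d+ ...    a number with an optional letter (13a), optionally followed
    //            by more parts joined with "/" or "-" (60/2/3, 23 - 42)
    $pattern = '~^(.+?)\s+(\d+\s*[a-zA-Z]?(?:\s*[/-]\s*\d+\s*[a-zA-Z]?)*)$~';

    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null; // no trailing house number found
    }

    return ['street' => $m[1], 'number' => $m[2]];
}

var_dump(splitAddressLine('streetname 60/2/3'));
var_dump(splitAddressLine('Hauptstrasse 23 - 42'));
```

Lines without a trailing number return null, so you can log them and handle them by hand instead of guessing.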

This approach does not use regex. It splits the address on spaces and returns the first token that starts with a digit. Ideal for addresses such as 12 Street Road or Street Name 1234B.
function getStreetNumberFromStreetAddress($streetAddress){
    // Consecutive spaces produce empty tokens, which are skipped below
    $array = explode(' ', $streetAddress);
    foreach ($array as $a) {
        // Check the first character only, so that 1234B still counts
        if ($a !== '' && ctype_digit($a[0])) {
            return $a;
        }
    }
    return null;
}

Related

Stripping down Phonenumber (mobile)

Is there a function or an easy way to strip down phone numbers to a specific format?
Input can be a number (mobile, different country codes)
maybe
+4917112345678
+49171/12345678
0049171 12345678
or maybe from another country
004312345678
+44...
I'm doing a
$mobile_new = preg_replace("/[^0-9]/","",$mobile);
to remove everything except digits, because I need it in the format 49171 (without + or 00 at the beginning). But I also need to handle the cases where 00 is entered first, where someone uses +49(0)171, or where someone enters 0171 (which needs to become 49171).
So the first digits ALWAYS need to be the country code, without +/00 and without any (0) in between.
Can someone give me advice on how to solve this?
You can use
(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)
to match most of your cases and simply replace them with nothing. For example:
$mobile = "+4917112345678";
$mobile_new = preg_replace("/(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)/","",$mobile);
echo $mobile_new;
//output: 4917112345678
regex101 Demo
Explanation:
I'm making use of OR here, matching each of your cases one by one:
(?:^(?:00|\+|\+\d{2})) matches 00 or + at the beginning of your string (the third alternative, + followed by two digits, is never reached, because the plain + alternative matches first)
\/ matches a / anywhere in the string
\s matches a whitespace character anywhere in the string (it matches the newline in the regex101 demo, but I suppose you match each number on its own)
\(\d\) matches a number enclosed in brackets anywhere in the string
The only case not covered by this regex is the input format 01712345678, as you can only take a guess what the country specific prefix can be. If you want it to be 49 by default, then simply replace each input starting with a single 0 with the 49:
$mobile = "01712345678";
$mobile_new = preg_replace("/^0(?!0)/","49",$mobile); // (?!0) leaves input starting with 00 alone
echo $mobile_new;
//output: 491712345678
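Putting both steps together could look roughly like the sketch below. The function name normalizeMobile and the default country code parameter are assumptions of mine, not part of your code, and it only covers the input formats listed in the question:

```php
<?php
// Hypothetical helper: reduces a phone number to digits, starting with the country code.
// Assumption: numbers without "+" or "00" are national numbers of $defaultCountryCode.
function normalizeMobile(string $input, string $defaultCountryCode = '49'): string
{
    $number = trim($input);

    // Drop an optional trunk prefix written as "(0)", as in +49(0)171
    $number = preg_replace('/\(0\)/', '', $number);

    // Remember whether the number was given in international form
    $international = preg_match('/^(\+|00)/', $number) === 1;

    // Keep digits only
    $digits = preg_replace('/\D/', '', $number);

    if ($international) {
        // "+49..." is already fine, "0049..." loses its leading 00
        return preg_replace('/^00/', '', $digits);
    }

    // National form: replace a single leading 0 with the default country code
    if (strncmp($digits, '0', 1) === 0) {
        return $defaultCountryCode . substr($digits, 1);
    }

    return $digits;
}

foreach (['+4917112345678', '+49171/12345678', '0049171 12345678', '+49(0)171 12345678', '01712345678'] as $in) {
    echo $in, ' => ', normalizeMobile($in), "\n";
}
```

Checking for the international prefix before stripping the non-digits matters: once the + is gone you can no longer tell +49... from a national number that happens to start with 49.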
This pattern (49)\(?([0-9]{3})[\)\s\/]?([0-9]{8}) splits the number into three groups:
49 - country code
3 digits - area code
8 digits - number
After matching, you can construct a clean number by concatenating them as \1\2\3.
Demo: https://regex101.com/r/tE5iY3/1
If this does not suit you, please explain more precisely what you want, with test input and expected output.
I recommend taking a look at LibPhoneNumber by Google and its port for PHP.
It has support for many formats and countries and is well-maintained. Better not to figure this out yourself.
https://github.com/giggsey/libphonenumber-for-php
$phoneUtil = \libphonenumber\PhoneNumberUtil::getInstance();
$usNumberProto = $phoneUtil->parse("+1 650 253 0000", "US");

PHP RegEx pattern matching - beginner questions to get me started

This is a homework assignment and my first experience using RegEx. I am starting to grasp the syntax and symbols used and can do some simple pattern matching/manipulation, but can't quite foresee how to achieve some of the goals of this assignment.
I have been given a text file that is formatted like this:
Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300
Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500
Igor Chevsky:385-375-8395:3567 Populus Place, Caldwell, NJ 23875:6/18/68:23400
Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
There are about 50 lines of names and corresponding info, each entry is on a new line and each 'field' is separated by a colon. Mostly I need to find specific things from the file and print them on a webpage but I don't quite understand.
Here is one problem I solved:
$myFile = "datebook.txt";
$data = file($myFile);//I have used this to place all data in an array, but it may be necessary to place the data into a string?
//1) Print all lines containing the pattern Street (case insensitive).
$pattern = "/street/i";
$linesFound = preg_grep($pattern, $data);
echo "<pre>", print_r($linesFound, true), "</pre>";
Here are some I have not and specific questions regarding them:
2) Print the first and last names in which the first name starts with a letter ‘B’.
How do I only search for first names and not last names, city names, etc?
How do I print the full name and only the full name?
5) Print Lori Gortz’s name and address.
I understand how to find the pattern 'Lori Gortz' but how do I return her address as well?
11) Print lines that end in exactly five digits.
12) Print the file with the first and last names reversed.
14) Give everyone a $250.00 raise.
Don't know how to do any of these. I assume the last number for each entry is their salary.
Any help is appreciated. Please respond with an explanation of the code as well, thank you.
Check the RegEx quick reference, I think you'll figure out most of your tasks there. For example, Lori's address would be the string after the second colon and before the third colon (in her line, of course).
The best way to do all the tasks would be to go over each line and make an array with all the elements. That way you could easily replace names, increase salaries, check if it ends with 5 digits, etc.
You can also try this online tester. Good luck.
Edit:
Little help for a start:
^[A-Za-z ]* this gets full names
^[A-Za-z]* this gets first names
etc...
Edit2:
See what this code does:
$line = "Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500";
$regex = "/\s|:/";
$result = preg_split($regex, $line);
:)
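The "go over each line and make an array" idea could look roughly like this. It is only a sketch: the parseRecord helper and the field names are made up for illustration, and it assumes every line has exactly five colon-separated fields with the salary last:

```php
<?php
// Hypothetical helper: turns one line of the file into an associative array.
function parseRecord(string $line): array
{
    // The five fields are separated by colons
    [$name, $phone, $address, $birthday, $salary] = explode(':', trim($line));

    // Assumption: the name is "First Last", split at the first space
    [$first, $last] = explode(' ', $name, 2);

    return [
        'first'    => $first,
        'last'     => $last,
        'phone'    => $phone,
        'address'  => $address,
        'birthday' => $birthday,
        'salary'   => (int) $salary,
    ];
}

$record = parseRecord('Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500');

echo $record['last'], ' ', $record['first'], "\n";    // names reversed (task 12)
echo $record['first'], ' ', $record['last'], ': ', $record['address'], "\n"; // name and address (task 5)
echo $record['salary'] + 250, "\n";                   // salary after the raise (task 14)
```

Once the fields are separated like this, most of the tasks no longer need a regex at all.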
I don't want to do all of them, but here's some hints.. For question 2:
^B[A-Za-z]* .*$
^ anchors the match at the start of the line.
Next we match a B, the first letter of the first name.
[A-Za-z]* means any number of letters, i.e. the rest of the first name.
Next we match a space.
The .* means any number of other characters.
Lastly, we match the end of the line using $.
This can definitely be improved and made more flexible, but I'll let you do that..
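For what it's worth, here is one way questions 2 and 11 might be approached with preg_match and preg_grep. Treat it as a sketch: the two function names are my own, and the patterns assume names are made of plain ASCII letters and that each line ends with the salary field:

```php
<?php
$lines = [
    'Steve Blenheim:238-923-7366:95 Latham Lane, Easton, PA 83755:11/12/56:20300',
    'Betty Boop:245-836-8357:635 Cutesy Lane, Hollywood, CA 91464:6/23/23:14500',
    'Igor Chevsky:385-375-8395:3567 Populus Place, Caldwell, NJ 23875:6/18/68:23400',
    'Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700',
];

// Question 2: full names whose first name starts with the given letter.
// The ^ anchor ties the match to the start of the line, so last names,
// cities etc. are never considered.
function namesWithFirstInitial(array $lines, string $initial): array
{
    $pattern = '/^(' . preg_quote($initial, '/') . '[A-Za-z]*) ([A-Za-z]+):/';
    $names = [];
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m)) {
            $names[] = $m[1] . ' ' . $m[2]; // only the captured name parts
        }
    }
    return $names;
}

// Question 11: lines that end in exactly five digits.
// (?<!\d) rejects a sixth digit in front of the five.
function linesEndingInFiveDigits(array $lines): array
{
    return array_values(preg_grep('/(?<!\d)\d{5}$/', $lines));
}

print_r(namesWithFirstInitial($lines, 'B'));
print_r(linesEndingInFiveDigits($lines));
```

The capture groups (the parts in parentheses) are what let you print the name and only the name: $m[0] holds the whole match, $m[1] and $m[2] hold the groups.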

compare array values

I have a string like this, which I need to extract the address from:
$string="xyz company 7 th floor hotel yyyy 88 main Road mumbai 400000 this is sample comapny address 9456 and some other";
$word=str_word_count($string,1,'0...9');
Now word has each word like word[0]=xyz, word[1]=company, word[2]=7, etc.
I need to compare each value. If the word is a number then I want to save it in a temp variable until I get another number.
For example word[2] is 7, so I need to save the values from then until 88 in a temp variable. So the temp should contain "7 th floor hotel yyyy 88".
If the temp variable has fewer than 25 characters then we compare until we get another number. So here we need to keep going from 88 to 400000 and append that to the temp variable.
The temp should finally look like this: "7 th floor hotel yyyy 88 main Road mumbai 400000"
Any help please?
The question was already asked here, where I responded. Although preg_match does not follow your thought process, it accomplishes the result you're looking for. The only change you've made between that question and this one is the 25 character restriction. This can easily be resolved by accepting 25 characters of any type before checking for the terminating number:
preg_match('/[0-9]+.{0,25}[^0-9]*[0-9]+\s/',$string,$matches);
return $matches[0];
There is no need to use str_word_count. If you insist on using it, say so in the comments and we can try to accommodate a solution using your thought process. However, preg_match is likely the most efficient way of accomplishing the whole task.
Try using preg_match_all():
if (preg_match_all('!\b\d+\b.*\b\d+\b!', $string, $matches)) {
    echo $matches[0][0];
}
What this is doing is testing for a sequence of digits followed by any number of characters followed by another sequence of digits. The expressions are greedy so the middle pattern (.*) should grab as many characters as possible, meaning you'll be grabbing from the first to the last set of digits.
The \b assertions in there check that the numbers sit on word boundaries. You may or may not need this and you may or may not need to tweak it depending on your exact requirements.
The above works on the whole string.
If you must (or just prefer) to operate on the words:
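// $words is the array of words built from the string ($word in the question)
$start = false;
$last = false;
$i = 0;
foreach ($words as $word) {
    if (is_numeric($word)) {
        if ($start === false) {
            $start = $i;
        }
        $last = $i;
    }
    $i++;
}
$substring = '';
if ($start !== false) {
    // array_slice() returns the range; array_splice() would cut it out of the array instead
    $word_range = array_slice($words, $start, $last - $start + 1);
    $substring = implode(' ', $word_range);
}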
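If you want the 25 character rule from the question applied word by word, something along these lines might do. It is a sketch under my reading of the question (start at the first number, stop at the first later number once at least 25 characters have been collected); the function name extractAddress and the $minLength parameter are invented for the example:

```php
<?php
// Hypothetical helper implementing the word-by-word approach from the question.
function extractAddress(array $words, int $minLength = 25): ?string
{
    $collected = [];

    foreach ($words as $word) {
        $isNumber = ctype_digit($word);

        if ($collected === []) {
            // Nothing collected yet: wait for the first number
            if ($isNumber) {
                $collected[] = $word;
            }
            continue;
        }

        $collected[] = $word;

        // Stop at a number, but only once the text is long enough
        if ($isNumber && strlen(implode(' ', $collected)) >= $minLength) {
            return implode(' ', $collected);
        }
    }

    return null; // no suitable closing number found
}

$string = "xyz company 7 th floor hotel yyyy 88 main Road mumbai 400000 this is sample comapny address 9456 and some other";
$words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

echo extractAddress($words), "\n";
```

Here "7 th floor hotel yyyy 88" is only 24 characters long, so the loop keeps going until it reaches 400000.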

Regex - Return First and Last Name

I'm looking for the best reliable way to return the first and last name of a person given the full name, so far the best I could think of is the following regular expression:
$name = preg_replace('~\b(\p{L}+)\b.+\b(\p{L}+)\b~i', '$1 $2', $name);
The expected output should be something like this:
William -> William // Regex Fails
William Henry -> William Henry
William Henry Gates -> William Gates
I also want it to support accents, for instance "João".
EDIT: I understand that some names will not be properly identified, but this isn't a problem for me, since this is going to be used on a local site where the last word is the last name (might not be the whole surname though) but this isn't a problem since all I want is a quick way to say "Dear FIRST_NAME LAST_NAME"... So all this discussion, while totally valid, is useless to me.
Can someone help me with this?
This might not be what you want to hear, but I don't think this problem is suited to a regular expression since names are not regular. I don't think they are even context-sensitive or context-free. If anything, they are unrestricted (I would have to sit down and think that through more than I did before I say that for sure, though) and no regular expression engine can parse an unrestricted grammar.
Instead of a regex you might find it easier to do something like:
$parts = explode(" ", $name);
$first = $parts[0];
$last = "";
if (count($parts) > 1) {
    $last = $parts[count($parts) - 1];
}
You might want to replace multiple consecutive bits of whitespace with a single space first, so you don't get empty bits, and get rid of trailing/leading whitespace:
$name = preg_replace("/\s+/", " ", trim($name));
As is, you're requiring a last name -- which, of course, your first example doesn't have.
Use clustered grouping, (?:...), and 0-or-1 count, ?, for the middle and last names as a whole to allow them to be optional:
'~\b(\p{L}+)\b (?: .+\b(\p{L}+)\b )?~ix' # x for spacing
This should allow the first name to be captured whether middle/last names are given or not.
$name = preg_replace('~\b(\p{L}+)\b(?:.+\b(\p{L}+)\b)?~i', '$1 $2', $name);
Depending on how clean your data is, I think you are going to have a tough time finding a single regex that does what you want. What different formats do you expect the names to be in? I've had to write similar code and there can be a lot of variations:
- first last
- last, first
- first middle last
- last, first middle
And then you have things like suffixes (Junior, senior, III, etc.) and prefixes ( Mr., Mrs, etc), combined names (e.g. John and Mary Smith). As some others have already mentioned you also have to deal with multi-part last names (e.g. Victor de la Hoya) as well.
I found I had to deal with all of those possibilities before I could reliably pull out the first and last names.
If you're defining first and last name as the text before the first space and after the last space, then just split the string on spaces and grab the first and last elements of the array.
However, depending on the context/scope of what you're doing, you may need to re-evaluate things - not all names around the world will meet this pattern.
I think your best option is to simply treat everything after the first name as the surname i.e.
William Henry Gates
Forename: William
Surname: Henry Gates
It's the safest mechanism as not everyone will enter their middle name anyway. You can't simply extract William - ignore Henry - and extract Gates as for all you know, Henry is part of the Surname.
Here is a simple non-regex way
$name=explode(" ",$name);
$first_name=reset($name);
$last_name=end($name);
$result=$first_name.' '.$last_name;
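A slightly more defensive variant of the split approach, in case the input has repeated spaces or only a single name. This is a sketch, not a definitive solution: firstAndLastName is a name I made up, and it assumes the input is valid UTF-8 (the u modifier makes preg_split fail otherwise):

```php
<?php
// Hypothetical helper: returns "First Last", or just "First" for single names.
function firstAndLastName(string $fullName): string
{
    // Split on any run of whitespace and drop empty pieces
    $parts = preg_split('/\s+/u', trim($fullName), -1, PREG_SPLIT_NO_EMPTY);

    if ($parts === false || $parts === []) {
        return ''; // empty input or invalid UTF-8
    }

    $first = $parts[0];
    $last = count($parts) > 1 ? $parts[count($parts) - 1] : '';

    // trim() removes the trailing space when there is no last name
    return trim($first . ' ' . $last);
}

echo firstAndLastName('William'), "\n";
echo firstAndLastName('William Henry Gates'), "\n";
echo firstAndLastName('João  da Silva'), "\n";
```

Because the name is only split on whitespace and never matched letter by letter, accented names such as João need no special handling.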

How do you format a 10 digit string into a phone number?

I have database records in the form of 10 character long strings, such as 4085551234.
I wish to format these into this format: (408) 555-1234.
I think this is regex related. I'm new to programming and completely self-taught here, so any sort of resource relating to performing text processing would be appreciated as well. Thanks!
A regex is definitely overkill for this one. If you wanted to take a "phone number" and normalize it to 10 digits, that would be a good use for a regex. To do what you're asking, just do something like:
echo '('.substr($data, 0, 3).') '.substr($data, 3, 3).'-'.substr($data,6);
Since you already know how to divide up your data, you can just use substr or something similar to grab the parts you want. RegEx is useful for matching strings which don't always have a strict format. (Like variable numbers of spaces, variable stuff before or after it, extra dashes, etc). But in your case the input is always strictly formatted 10 digits, nothing else, so you don't need the extra overhead of a RegEx to format it.
Take a look here: Format phone number
function format_phone($phone)
{
    // Strip everything that is not a digit
    $phone = preg_replace("/\D/", "", $phone);
    if (strlen($phone) == 7)
        return preg_replace("/(\d{3})(\d{4})/", "$1-$2", $phone);
    elseif (strlen($phone) == 10)
        return preg_replace("/(\d{3})(\d{3})(\d{4})/", "($1) $2-$3", $phone);
    else
        return $phone;
}
I'd probably go with
$num = "4085551234"; // given
$formatted = "(".substr($num,0,3).") ".substr($num,3,3)."-".substr($num,6);
Regex isn't really appropriate here.
Trivially you could do something like:
\(\d\{3\}\)\(\d\{3\}\)\(\d\{4\}\)
To match the 10 digits into 3 subgroup expressions, and then print them out using each subgroup:
(\1) \2-\3
But in practice free form data is usually a little trickier
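If the stored values are not guaranteed to be clean, a small guard in front of the substr approach may help. A minimal sketch, assuming 10 digit US style numbers; formatUsPhone is a hypothetical name:

```php
<?php
// Hypothetical helper: formats a 10 digit number as (408) 555-1234.
// Returns null when the input does not contain exactly 10 digits.
function formatUsPhone(string $raw): ?string
{
    // Normalize first: drop spaces, dashes, brackets and anything else non-numeric
    $digits = preg_replace('/\D/', '', $raw);

    if (strlen($digits) !== 10) {
        return null;
    }

    return sprintf('(%s) %s-%s', substr($digits, 0, 3), substr($digits, 3, 3), substr($digits, 6));
}

echo formatUsPhone('4085551234'), "\n";
echo formatUsPhone('408-555-1234'), "\n";
```

The regex is only used for the normalizing step; the formatting itself stays plain substr, as suggested above.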
I had to do this question for my advanced placement computer science class.
Java:
Write a program that accepts a 10 digit # and formats it as a phone number.
Ex: 7057262552
Output: (705)726-2552
import java.util.Scanner;
public class TelNumCorrection {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Please enter a 10 digit number");
        String num = scan.nextLine();
        String a = num.substring(0, 3);
        String b = num.substring(3, 6);
        String c = num.substring(6);
        System.out.println("(" + a + ")" + b + "-" + c);
    }
}
