Find and replace emails and phone numbers in PHP - php

I was hoping for a little help with this, as it's confusing me a little. I run a website that allows users to send messages back and forth, but in the inbox I need to hide both emails and phone numbers.
Example: this is what a sample message would look like.
Hi, my phone is +44 5555555 and email is jack@jack.com
I need it to look like this:
Hi, my phone is (phone hidden) and email is (email hidden)
Do you have any ideas? I'd really appreciate it!

$x = 'Hi, my phone is +44 5555555 and email is jack@jack.com';
$x = preg_replace('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i','(email hidden)',$x); // hide email
$x = preg_replace('/(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?/','(phone hidden)',$x); // hide phone number
echo $x; // Hi, my phone is +44 (phone hidden) and email is (email hidden)
// Note: the phone pattern targets North American formats, so the "+44 " prefix is left in place.
Kudos to fatcat for the phone number regex.

Trying to do this with 100% accuracy when users can type all sorts of things in is impossible - you can't really definitively say if a substring is a phone number or just another number, or an email address or just something that could be a valid one.
However, if you want to try, you should probably use a regular expression. See http://php.net/manual/en/function.preg-replace.php

<pre>
/*
 * First parameter: the input string.
 * Second parameter: the replacement string, e.g. *******.
 * Returns the string with emails and phone numbers replaced.
 */
function email_phone_validation_replace_php($str='',$rep='*******') {
    $str = preg_replace('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i',$rep,$str); // hide email
    $str = preg_replace('/(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?/',$rep,$str); // hide phone number
    return $str;
}
$str = 'Hi, my name is amin khan banarsi <br> phone is +91 9770534045 and email is banarsiamin@gmail.com';
echo email_phone_validation_replace_php($str);
</pre>

If I understand correctly, your users can send messages to each other and you're worried that if they send a message with personal information in it that information might be too visible.
I guess that you're therefore trying to remove this information from the message's preview (but still have it available if you open the message?).
If this is the case then you can have a really sloppy regular expression removing anything that looks even a little bit like a number or email. It doesn't matter if you hide non-personal information because the non-censored version of the message is always available.
I would go with something like this (untested):
# Take any string that contains an @ symbol and replace it with ...
# The @ symbol must be surrounded by at least one character on both sides
$message = preg_replace('/[^ ]+@[^ ]+/','...',$message); # for emails
# Take any run of digits, spaces and dashes that starts and ends with a digit, replace with ...
# Can optionally have a + before it.
$message = preg_replace('/\+?[0-9][0-9\- ]*[0-9]/','...',$message); # for phone numbers
This is going to match lots of things, more than just emails and phone numbers. It may also miss emails and phone numbers in formats that I didn't think of; this is one of the problems with writing regular expressions for these kinds of things.
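A minimal sketch of the preview-only approach described above, assuming a hypothetical helper named mask_preview (the name and the exact patterns are illustrative, not from the original answer). Emails are masked first so that digits inside an address are not picked up by the phone pattern, and the phone pattern needs at least two digits so that lone digits and spaces are left alone.

```php
<?php
// Hypothetical helper: masks anything that loosely resembles an email
// address or a phone number. Intended for message previews only, where
// over-matching is acceptable because the full message stays available.
function mask_preview(string $message): string
{
    // Any run of non-whitespace characters containing an @ sign.
    $message = preg_replace('/\S+@\S+/', '...', $message) ?? $message;
    // Optional +, then digits, spaces and dashes, starting and ending on a digit.
    return preg_replace('/\+?\d[\d\- ]*\d/', '...', $message) ?? $message;
}
```

The original message should be stored untouched; only the string returned by the helper would be shown in the inbox list.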

If you want to hide emails and phone numbers in your messages or chat, in PHP or any other language, you need to use regular expressions. You can read about regex on w3schools.
I have an easy and complete solution for you.
<?php
$regex_email = '/[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})/';
$regex_phone = "/[0-9]{5,}|\d[ 0-9 ]{1,}\d|\sone|\stwo|\sthree|\sfour|\sfive|\ssix|\sseven|\seight|\snine|\sten/i";
$str = " Hello My Email soroutlove1996@gmail.com AND Phone No. is +919992799999 and +91 9992799999, or 9 9 9 2 7 9 9 9 9 9 and Nine Nine";
$str = preg_replace($regex_email,'(email hidden)',$str); // hide email
$str = preg_replace($regex_phone,'(phone hidden)',$str); // hide phone
echo $str;
Output: Hello My Email (email hidden) AND Phone No. is +(phone hidden) and +(phone hidden), or (phone hidden) and(phone hidden)(phone hidden)

Related

PHP - Searching a string for phone numbers and emails

I am trying to write a small script to find out whether a given string contains a phone number and/or an email address.
Here is what I have so far:
function findContactInfo($str) {
    // Find possible email
    $pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.[a-z]{2,3}?/i';
    $emailresult = preg_match($pattern, $str);
    // Find possible phone number
    preg_match_all('/[0-9]{3}[\-][0-9]{6}|[0-9]{3}[\s][0-9]{6}|[0-9]{3}[\s][0-9]{3}[\s][0-9]{4}|[0-9]{9}|[0-9]{3}[\-][0-9]{3}[\-][0-9]{4}/', $str, $matches);
    $matches = $matches[0];
}
The part with the emails works fine, but I am open to improvements.
With the phone number I have some problems. First of all, the strings given to the function will most likely contain German phone numbers. The problem is all the different formats. It could be something like
030 - 1234567 or 030/1234567 or 02964-723689 or 01718290918
and so on. So basically it is almost impossible for me to know which combination will be used. So I was thinking it might be a good idea to just find a run of at least three digits following each other. Example:
$stringOne = "My name is John and my phone number is 040-3627781";
// would be found
$stringTwo = "My name is Becky and my phone number is 0 4 0 3 2 0 5 4 3";
// would not be found
The problem I have is that I don't know how to find such combinations. Even after almost an hour of searching the web I can't find a solution.
Does anyone have a suggestion on how to approach this?
Thanks!
You could use
\b\d[- /\d]*\d\b
See a demo on regex101.com.
Long version:
\b\d # this requires a "word boundary" and a digit
[- /\d]* # one of the characters in the class
\d\b # a digit and another boundary.
In PHP:
<?php
$regex = '~\b\d[- /\d]*\d\b~';
preg_match_all($regex, $your_string_here, $numbers);
print_r($numbers);
?>
The problem with this is that you might get a lot of false positives, so it will certainly improve your accuracy if these matches are cleaned, normalized and then tested against a database.
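The cleaning and normalizing step mentioned above could be sketched roughly as follows. The helper name find_phone_candidates and the 6 to 15 digit bounds are assumptions made for illustration; the database lookup is left out.

```php
<?php
// Hypothetical helper: collects candidate phone numbers from free text,
// keeps only their digits, and drops candidates that are too short or
// too long. The default bounds are an assumption, not a rule from the answer.
function find_phone_candidates(string $text, int $minDigits = 6, int $maxDigits = 15): array
{
    // Same pattern as above: a digit, then digits/spaces/dashes/slashes, then a digit.
    preg_match_all('~\b\d[- /\d]*\d\b~', $text, $matches);
    $candidates = [];
    foreach ($matches[0] as $raw) {
        $digits = preg_replace('/\D/', '', $raw); // normalize: digits only
        $length = strlen($digits);
        if ($length >= $minDigits && $length <= $maxDigits) {
            $candidates[] = $digits;
        }
    }
    // Remove duplicates that differ only in formatting and reindex.
    return array_values(array_unique($candidates));
}
```

Because the separators are stripped before comparison, 030 - 1234567 and 030/1234567 collapse to the same candidate, and short numbers such as a room number are filtered out by the length check.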
As for your email question, I often use:
\S+@\S+
# not a whitespace, at least once
# @
# same as above
There are dozens of different valid email formats; the only way to check whether there's an actual human being behind an address is to send an email with a link (and even this could be automated).

Calculating if an email fits within a regex?

So I am trying to determine if someone is using a temporary email made by our system. If a user tries to log in with a social account (Twitter / Facebook) and they decline access to their email, I generate an email for our system, which is AccountID@facebook.com or AccountID@twitter.com, so an example would be 123456789@facebook.com. This is a temporary email until the user enters a real one. I am trying to detect this using regex.
if (preg_match("/^[0-9]@twitter.com/", Auth::user()->email, $matches)) {
}
However, I think my regex is incorrect. How would one check whether a string consists of N digits followed by @twitter.com or @facebook.com?
You can use this regex:
'/^\d+@(?:facebook|twitter)\.com$/'
You are using ^[0-9]@, which allows only a single digit at the start. Besides, the dot is a special character in regex and needs to be escaped. Also note the use of the end anchor $ in the pattern to avoid matching unwanted input.
You forgot to allow multiple digits in the ID:
if (preg_match("/^[0-9]+@(twitter|facebook)\.com/", Auth::user()->email, $matches))
{
//Your code here
}
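Wrapped in a function, the check from the answers above might look like this. The name is_temporary_email is hypothetical, and the D modifier is an extra assumption added here so that $ does not accept a trailing newline.

```php
<?php
// Hypothetical helper: true when the address looks like a system-generated
// placeholder such as 123456789@facebook.com or 123456789@twitter.com.
function is_temporary_email(string $email): bool
{
    // ^\d+  one or more digits at the start (the account ID)
    // \.    an escaped, literal dot
    // $     end of input; the D modifier stops it matching before a final newline
    return preg_match('/^\d+@(?:facebook|twitter)\.com$/D', $email) === 1;
}
```

In the question's setting it would presumably be called as is_temporary_email(Auth::user()->email).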

Need to modify the regex for validating the UK Phone

I have a validation function for UK phone numbers entered by potential clients in my form.
It looks like this:
function phoneUK(phone){
var regex = /^(?:(?:(?:00\s?|\+)44\s?)|(?:\(?0))(?:\d{2}\)?\s?\d{4}\s?\d{4}|\d{3}\)?\s?\d{3}\s?\d{3,4}|\d{4}\)?\s?(?:\d{5}|\d{3}\s?\d{3})|\d{5}\)?\s?\d{4,5})$/i;
return regex.test(phone);
}
The problem is that in this form it only allows phone numbers with a 0 in front followed by ten digits (0XXXX XXXXXX), and recently there are also legitimate phone numbers with eleven digits after the leading zero.
How can I alter the code so that these types of phone numbers are allowed?
Try this:
function phoneUK(phone){
var regex = /^(?:(?:(?:00\s?|\+)44\s?)|(?:\(?0))(?:\d{2}\)?\s?\d{4}\s?\d{4}|\d{2}\)?\s?\d{5}\s?\d{4}|\d{3}\)?\s?\d{3}\s?\d{3,4}|\d{3}\)?\s?\d{4}\s?\d{3,4}|\d{4}\)?\s?(?:\d{5}|\d{3}\s?\d{3}|\d{4}\s?\d{3})|\d{5}\)?\s?\d{4,5}|\d{5,6}\)?\s?\d{4,6})$/i;
return regex.test(phone);
}
Hope this helps.

Finding Values Until Regex

I want to extract values from a string. Here is a sample string:
"Sample string Name: Mosa Phone: 020202020 Email: email@domain.com the rest of the sample string"
The phone number isn't necessarily a plain sequence of digits; it could look like (00-98550-22), (+025-588) or (92122/222).
The good news is that these fields are always consecutive, separated by a tab, white space, or a new line.
So I am thinking about how to make it find each field until the next field is found; that is, find Name: and then continue until you find Phone:
I am trying to achieve this using regex.
This is the code I have already written, but each field is evaluated on its own:
$namepattern = "/(Name\:\s[a-zA-Z]+\s[a-zA-Z]+)/";
$phonepattern = "/(Phone\:\s\d+)/";
$emailpattern = "/(Email:\s([_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})))/";
I'm using this regex to match name, phone and email in one:
/Name:\s(.+)\sPhone:\s(\d+)\sEmail:\s([a-zA-Z0-9_-]+@[a-zA-Z0-9]+\.[a-zA-Z]+)/
Here's a demo: http://regex101.com/r/uO4tF7
This will match what you need:
/Name:\s(.+)\s*Phone:\s(.*)\s*Email:\s([a-zA-Z0-9_.-]+@[a-zA-Z0-9]+\.[a-zA-Z]+)/
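One possible way to express "match each field until the next label" in PHP is with lazy quantifiers and named groups. This is only a sketch: the helper name extract_contact_fields is hypothetical, and the loose \S+@\S+ email pattern is an assumption rather than part of the answers above.

```php
<?php
// Hypothetical helper: pulls the three consecutive fields out of free text.
// Returns null when the labels are not found in the expected order.
function extract_contact_fields(string $text): ?array
{
    // .+? is lazy, so each field stops at the whitespace before the next label.
    // The s modifier lets . cross line breaks, since fields may be newline-separated.
    $pattern = '/Name:\s*(?<name>.+?)\s+Phone:\s*(?<phone>.+?)\s+Email:\s*(?<email>\S+@\S+)/s';
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    return ['name' => $m['name'], 'phone' => $m['phone'], 'email' => $m['email']];
}
```

Because the phone field is captured as "anything up to Email:", formats such as 00-98550-22 or 92122/222 are accepted without listing them in the pattern.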

Validating telephone, email address and Name inputs with PHP; Need regexp if there are any available

I have a PHP script in which I need to validate several inputs.
Are there any reliable regular expressions to check against for telephone number, name and email address validation?
Could somebody please supply these, as I am a novice with regexps?
What I want is, for example:
Telephone number: all digits allowed, at least 6 digits and at most 12, '+' sign allowed, space allowed, '-' sign allowed, as well as other things I haven't thought about yet.
Name: no digits allowed, only letters in both lower and upper case. Also the three Swedish characters 'Å, Ä, Ö' in both lower and upper case; space and '-' sign allowed, plus anything else I haven't thought about.
Email: email addresses are pretty standard all over the world, so I don't know exactly what to ask for here, but you probably know what I want.
Thanks for all help
As Andrew White said, emails shouldn't be validated [only] by regex, but you can check out this one:
'/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i'
It's the closest to the email address spec I've ever found (I haven't found a test that it fails). I can't remember where it's from; I will edit my answer as soon as I find it again.
[EDIT]
Found it, definitely worth a read: http://fightingforalostcause.net/misc/2006/compare-email-regex.php
[EDIT]
This should do for the phone numbers:
<?php
function is_valid_phonenumber( $subject ) {
// strip all valid chars
$stripped = preg_replace( '{[0-9 +-]}', '', $subject );
// check if there are remains, if yes: fail
if( !empty( $stripped ) )
return false;
// get digit count by replacing everything except digits with nothing
$digits = strlen( preg_replace( '{[^0-9]}', '', $subject ) );
// invalid if less than 6 or more than 12 in length
if( $digits < 6 || $digits > 12 )
return false;
// if nothing fails before this, we're good to go
return true;
}
?>
Something similar can be done for the names, but don't forget the case-insensitive flag (i.e. '{pattern}i'). There are also some good regex cheat sheets out there, for example this one from addedbytes.com: http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
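As a rough sketch of the name check, following only the rules stated in the question (letters, the Swedish characters Å, Ä, Ö, spaces and hyphens): the helper name is_valid_name is hypothetical, and the pattern will reject many real names, for example ones with apostrophes or other accented letters.

```php
<?php
// Hypothetical helper based on the question's rules: letters (including
// Å, Ä, Ö in either case), spaces and hyphens are allowed; digits are not.
function is_valid_name(string $subject): bool
{
    // \x{C5} \x{C4} \x{D6} \x{E5} \x{E4} \x{F6} stand for Å Ä Ö å ä ö.
    // i = case-insensitive, u = treat pattern and input as UTF-8,
    // D = $ matches only at the very end of the input.
    $letters = 'a-z\x{C5}\x{C4}\x{D6}\x{E5}\x{E4}\x{F6}';
    $pattern = '/^[' . $letters . '][' . $letters . ' -]*$/iuD';
    return preg_match($pattern, $subject) === 1;
}
```

The first character must be a letter, so an empty string or a lone hyphen is rejected.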
Validating telephone numbers with a regex is one thing but e-mails should not be validated by regex. I guess if you just wanted a very basic this-is-an-email regex you could use...
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
For telephone I think the following is good for US/Canada...
^[01]?[- .]?\(?[2-9]\d{2}\)?[- .]?\d{3}[- .]?\d{4}$
For names, good luck, since names can be just about anything, including numbers in some odd cases (Sr. Francis John 2nd vs II). That said, I recommend you look into library-specific validators for each type if it really matters, but my PHP is a bit rusty, so I don't have a recommendation there.
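For the email part, one built-in option in PHP is filter_var with FILTER_VALIDATE_EMAIL, which avoids maintaining a hand-written pattern. The wrapper name is_plausible_email below is hypothetical; the filter only checks the syntax of the address, not whether the mailbox exists.

```php
<?php
// Hypothetical helper wrapping PHP's built-in email filter.
// filter_var returns the address on success and false on failure.
function is_plausible_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
```

A confirmation email with a link would still be needed to establish that the address is actually in use.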
Have a read of: http://www.regular-expressions.info/email.html which discusses validating email addresses with Regex
