PHP regex to find URL and email

I need to find "http" and "https" and "email" inside text. I have tried:
$regex = "((https|ftp)\:\/\/)"; // http and https
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)"; // email
if(preg_match("/^$regex$/", $comment))
{
$r = 'find';
} else {
$r = 'not find';
}
with the following text:
$comment = 'hello https:// and hello email mail#mail.com'
But it doesn't work. Probably because of wrong split.
I tried with filter like this:
if (filter_var($comment, FILTER_VALIDATE_EMAIL)) {
return new \Exception('exist url');
}
but it only works when the comment contains nothing except the email address. If I have 'hello mail#mail.com', it does not find the email.
FILTER_VALIDATE_URL has the same problem: it won't find the URL in text like 'hello erl https:\hello'.
How do I write the right regex to find a URL or an email in some text?

In this line
if(preg_match("/^$regex$/", $comment))
you have ^, then $regex, then $. The ^ anchors the match to the beginning of the line and the $ anchors it to the end. Therefore, this only matches if the line contains the match and nothing else.
Remove both anchors to get matches anywhere in a line.
if(preg_match("/$regex/", $comment))
However, I would suggest simply looking for a library that offers the matching you need. A library will have had much more testing to cover edge cases and other things that are easily missed.
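As a rough illustration of unanchored matching, here is a minimal sketch that scans free text for anything shaped like a URL or an email address. The function name findUrlsAndEmails and both patterns are assumptions made for this example, not a complete URL or email grammar, and the address is written with @ on the assumption that the # in the sample text stands in for it.

```php
<?php
// Sketch only: both patterns are deliberately loose.
function findUrlsAndEmails(string $text): array
{
    // A scheme followed by "://" and any non-space characters after it.
    $urlPattern = '~\b(?:https?|ftp)://\S*~i';
    // Something shaped like local@domain.tld.
    $emailPattern = '~[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}~i';

    // No ^ or $ anchors, so matches may occur anywhere in the text.
    preg_match_all($urlPattern, $text, $urls);
    preg_match_all($emailPattern, $text, $emails);

    return ['urls' => $urls[0], 'emails' => $emails[0]];
}

$found = findUrlsAndEmails('hello https:// and hello email mail@mail.com');
print_r($found);
```

Keeping the two patterns separate makes it possible to report which kind of match was found, which a single combined pattern would hide.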

Related

Find first position where pattern matching failed.

I am trying to find the common errors users make when entering email IDs. I can always validate an email using the PHP email filter
$email = "someone#exa mple.com";
if(!filter_var($email, FILTER_VALIDATE_EMAIL))
{
echo "E-mail is not valid";
}
else
{
echo "E-mail is valid";
}
or pattern matching
$email = test_input($_POST["email"]);
if (!preg_match("/([\w\-]+\#[\w\-]+\.[\w\-]+)/",$email))
{
$emailErr = "Invalid email format";
}
I agree that these are not foolproof ways to validate emails. However, they should capture 80% of cases.
What I want to know is: at which position did the email become invalid? If it's a space, at what position did the user enter it? Or did it fail because of a "." at the end?
Any pointers?
-Ajay
PS: I have seen other threads regarding email validation. I can add complexity and make it 100%. The concern here is to capture the most common mistakes people make when entering an email ID.

This is difficult because sometimes it's not always a single character that makes an email address invalid. The example you give could easily be solved by:
$position = strpos('someone#exa mple.com', ' ');
However, it seems you are not interested in an all encompassing solution but rather something that will catch the majority of character based errors. I would take the approach of using the regular expression but capture each section of the email address in a sub pattern for further validation. For example:
$matches = null;
$result = preg_match("/(([\w\-]+)\#([\w\-]+)\.([\w\-]+))/", $email, $matches);
var_dump($matches);
By capturing sections of the regex validation in sub patterns you could then dive further into each section and run similar or different tests to determine where the user went wrong. For example you could try and match up the TLD of the email address against a whitelist. Of course there are also much more robust email validators in frameworks like Zend or Symfony that will tell you more specifically WHY an email address is not valid, but in terms of knowing which specific character position is at fault (assuming it's a character that is at fault) I think a combination of tactics would work best.
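For character-level mistakes such as a stray space, one possible sketch is to ask preg_match for the offset of the first character outside an allowed set, using the PREG_OFFSET_CAPTURE flag. The helper name firstBadCharPosition and the allowed character set are assumptions for illustration, and the address uses @ on the assumption that the # in the examples stands in for it.

```php
<?php
// Hypothetical helper: byte offset of the first character that is not
// in a small allowed set, or null if every character is allowed.
function firstBadCharPosition(string $email): ?int
{
    // Allowed here: word characters, dot, plus, hyphen and "@".
    if (preg_match('/[^\w.+@-]/', $email, $m, PREG_OFFSET_CAPTURE)) {
        return $m[0][1]; // with PREG_OFFSET_CAPTURE each match is [text, offset]
    }
    return null;
}

var_dump(firstBadCharPosition('someone@exa mple.com')); // int(11)
var_dump(firstBadCharPosition('someone@example.com'));  // NULL
```

This only catches illegal characters. Structural errors, such as a trailing dot or a missing @, still need their own checks.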

There is no way I know of in Java to report back the point at which a regex failed. What you could do is start building a set of common errors (as described by Manu) that you can check for (this might or might not use regular expressions). Then categorize failures into these known errors and 'other', counting the frequency of each. When an 'other' error occurs, develop a regex that would catch it.
If you want some assistance with tracking down why the regex failed you could use a utility such as regexbuddy, shown in this answer.

Just implement some checks on your own:
Dot at the end:
if(substr($email, -1) == '.')
echo "Please remove the dot at the end of your email";
Spaces found:
$spacePos = strpos($email, ' ');
if($spacePos !== false)
echo "Please remove the space at pos: ".$spacePos;
And so on...

First of all, the reason your example fails is not the space as such. It is the lack of a '.' in the part before the space and the lack of a '#' in the part after it, so neither part matches on its own.
If you input
'someone#example.co m' or 's omeone#example.com', it will succeed.
So you may need 'begin with' and 'end with' anchors to check strictly.
As far as I know there is no existing method to check where a regular expression match fails, since a check only gives you the matches. But if you really want to find out, you can 'break down' the regular expression.
Let's take a look at your example check.
preg_match ("/^[\w\-]+\#[\w\-]+\.[\w\-]+$/",'someone#example.com.');
If it fails, you can check where its sub-expressions succeed and find out where the problem is:
$email = "someone#example.com.";
if(!preg_match ("/^[\w\-]+\#[\w\-]+\.[\w\-]+$/",$email)){ // fails because the final '.'
if(preg_match("/^[\w\-]+\#[\w\-]+\./",$email,$matches)){ // successes
$un_match = "[\w\-]+"; // What is taken from the tail of the regular expression.
foreach ($matches as $match){
$email_tail = str_replace($match,'',$email); // The email without the matching part. in this case : 'com.'
if(preg_match('/^'.$un_match.'/',$email_tail,$match_tails)){ // Check and delete the part that tail match the sub expression. In this example, 'com' matches /[\w\-]+/ but '.' doesn't.
$result = str_replace($match_tails[0],'',$email_tail);
}else{
$result = $email_tail;
}
}
}
}
var_dump($result); // you will get the last '.'
If you understand the example above, we can make the solution more general, for instance:
$email = 'som eone#example.com.';
$pattern_chips = array(
'/^[\w\-]+\#[\w\-]+\./' => '[\w\-]+',
'/^[\w\-]+\#[\w\-]+/' => '\.',
'/^[\w\-]+\#/' => '[\w\-]+',
'/^[\w\-]+/' => '\#',
);
if(!preg_match ("/^[\w\-]+\#[\w\-]+\.[\w\-]+$/",$email)){
$result = $email;
foreach ($pattern_chips as $pattern => $un_match){
if(preg_match($pattern,$email,$matches)){
$email_tail = str_replace($matches[0],'',$email);
if(preg_match('/^'.$un_match.'/',$email_tail,$match_tails)){
$result = str_replace($match_tails[0],'',$email_tail);
}else{
$result = $email_tail;
}
break;
}
}
if(empty($result)){
echo "There has to be something more follows {$email}";
}else{
var_dump($result);
}
}else{
echo "success";
}
and you will get output:
string ' eone#example.com.' (length=18)

PHP: need explanation using [a-zA-Z0-9]

I am new to PHP (not programming overall), and having problems with this simple line of code. I want to check whether some input field has been filled as anysymbolornumber#anysymbolornumber just for checking whether correct email was typed. I don't get any error, but the whole check system doesn't work. Here is my code and thanks!
if ($email = "[a-zA-Z0-9]#[a-zA-Z0-9]")
{

Since you're new to PHP, I suggest you buy a book or read a tutorial or two.
For email validation you should use filter_var, a built-in function that comes with PHP 5.2 and up:
<?php
if(!filter_var("someone#example....com", FILTER_VALIDATE_EMAIL)){
echo("E-mail is not valid");
}else{
echo("E-mail is valid");
}
?>

You can use other functions instead of regular expressions:
if(filter_var($email,FILTER_VALIDATE_EMAIL)){
echo "Valid email";
}else{
echo "Not a valid email";
}

As correctly pointed out in the comments, the regex you are using isn't actually a very good way of validating the email. There are much better ways, but if you are just wanting to get a look at how regular expressions work, it is a starting point. I am not an expert in regex, but this will at least get your if statement working :)
if(preg_match("/[a-zA-Z0-9]#[a-zA-Z0-9]/",$email))
{
// Your stuff
}

It looks like you're trying to verify that an email address matches a certain pattern. But you're not using the proper function. You probably want something like preg_match( $pattern, $target ).
Also, your regex isn't doing what you would want anyway. In particular, you need some quantifiers, or else your email addresses will only be able to consist of one character ahead of the #, and one after. And you need anchors at the beginning and end of the sequence so that you're matching against the entire address, not just the two characters closest to the #.
Consider this:
if( preg_match("/^[a-zA-Z0-9._-]+#[a-zA-Z0-9._-]+$/", $email ) ) {
// Whatever
}
Keep in mind, however, that this is really a poor man's approach to validating an email address. Email addresses can contain a lot more characters than those listed in the character class I provided. Furthermore, it would also be possible to construct an invalid email address with those same character classes. It doesn't even begin to deal with Unicode. Using a regex to validate an email address is quite difficult. Friedl takes a shot at it in Mastering Regular Expressions (O'Reilly), and his effort takes a 2KB regular expression pattern. At best, this is only a basic sanity check. It's not a secure means of verifying an email address. At worst, it misses valid addresses and still matches invalid ones.
There is the mailparse_rfc822_parse_addresses function which is more reliable in detecting and matching email addresses.
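To make the point about quantifiers and anchors concrete, here is a small sketch comparing the bare pattern with the anchored one. Both are wrapped in / delimiters, which preg_match requires. The variable names are made up for the example, and @ is used on the assumption that the # in the patterns above stands in for it.

```php
<?php
// One character on each side of "@", and no anchors: matches inside junk.
$bare = '/[a-zA-Z0-9]@[a-zA-Z0-9]/';
// One or more characters on each side, and the whole string must fit.
$anchored = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+$/';

$junk = '!!! a@b ???';

var_dump(preg_match($bare, $junk));                      // int(1)
var_dump(preg_match($anchored, $junk));                  // int(0)
var_dump(preg_match($anchored, 'someone@example.com'));  // int(1)
```

The bare pattern accepts the junk string because it only needs to find "a@b" somewhere inside it. The anchored pattern rejects it because the surrounding characters are not in the character class.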

You need to use preg_match to run the regular expression.
Right now you're assigning the regular expression string to $email.
It could look like:
if ( preg_match("/[a-zA-Z0-9]#[a-zA-Z0-9]/", $email ))
Also keep in mind that when comparing inside an if you must use the == operator, not =.
I believe best practice would be to use filter_var instead, like:
if( ! filter_var( $email , FILTER_VALIDATE_EMAIL )) {
// Failed.
}

Another way taken from: http://www.linuxjournal.com/article/9585
function check_email_address($email) {
// First, we check that there's one # symbol,
// and that the lengths are right.
if (!ereg("^[^#]{1,64}#[^#]{1,255}$", $email)) {
// Email invalid because wrong number of characters
// in one section or wrong number of # symbols.
return false;
}
// Split it into sections to make life easier
$email_array = explode("#", $email);
$local_array = explode(".", $email_array[0]);
for ($i = 0; $i < sizeof($local_array); $i++) {
if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$", $local_array[$i])) {
return false;
}
}
// Check if domain is IP. If not,
// it should be valid domain name
if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1])) {
$domain_array = explode(".", $email_array[1]);
if (sizeof($domain_array) < 2) {
return false; // Not enough parts to domain
}
for ($i = 0; $i < sizeof($domain_array); $i++) {
if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$", $domain_array[$i])) {
return false;
}
}
}
return true;
}

PHP: Extract video ID from YouTube URLs

Do I do something wrong?
I need the youtube code, but it doesn't return the real value.
if(preg_match_all("http:\/\/www\.youtube\.com\/v\/(.*)(.*)", $row->n_texto, $matches){
$code = $image_to_thumb .= "http://i1.ytimg.com/vi/".$matches[1][0]."/0.jpg";
}
Edit - ircmaxell Based on the comment, the link structure in the text is:
http:// www.youtube.com/v/plMvAh10HVg%26hl=en%26fs=1%26rel=0
Update
The problem is: my code return a link like this:
http://www.youtube.com/v/plMvAh10HVg%26hl=en%26fs=1%26rel=0
Can I stop it with regexp before appear %26hl=en%26fs=1%26rel=0?

Your regex is not correct. There are more than a few things wrong with it. Now, as far as what you want, try this:
#http://(?:.*)youtube.com/v/([^/\#?]+)#
Now, as for why, let's look at the regex:
http://(?:.*)youtube.com
You're looking for a string that starts with http://, has anything after (www., ww2., or nothing).
/v/
You're looking for /v/ as the start of the path.
([^/\#?]+)
You're looking for everything else UP TO another /, a query string (?) or an anchor (#). So that should match the ID you're looking for.
So, it would be
if(preg_match("#http://(?:.*)youtube.com/v/([^/\#?]+)#", $row->n_texto, $matches)){
$code = $image_to_thumb .= "http://i1.ytimg.com/vi/".$matches[1]."/0.jpg";
}
If you wanted to find all:
if(preg_match_all("#http://(?:.*)youtube.com/v/([^/\#?]+)#", $row->n_texto, $matches)){
foreach ($matches[1] as $match) {
$code = $image_to_thumb .= "http://i1.ytimg.com/vi/".$match."/0.jpg";
}
}

The link provided has a space before the first w in www.youtube.com. The code you need is:
if(preg_match_all("%http://www\.youtube\.com/v/([\w]+)%i", $row->n_texto , $matches)){
$code = $image_to_thumb .= "http://i1.ytimg.com/vi/".$matches[1][0]."/0.jpg";
}
Also, the URL you have is encoded, so you may want to call urldecode($row->n_texto) before using it.
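Putting the decoding step and a restricted character class together, a minimal sketch might look like this. The helper name youtubeIdFromText is hypothetical, and the /v/<id> URL shape is taken from the question. The class [\w-] stops at the first & once %26 has been decoded.

```php
<?php
// Hypothetical helper: returns the video ID from a /v/<id> style link,
// or null when no such link is present.
function youtubeIdFromText(string $text): ?string
{
    $decoded = urldecode($text); // turns %26 back into &
    if (preg_match('~https?://(?:www\.)?youtube\.com/v/([\w-]+)~i', $decoded, $m)) {
        return $m[1]; // first capture group: the ID only
    }
    return null;
}

$url = 'http://www.youtube.com/v/plMvAh10HVg%26hl=en%26fs=1%26rel=0';
$id = youtubeIdFromText($url);
echo "http://i1.ytimg.com/vi/" . $id . "/0.jpg\n";
```

Because % is not in [\w-] either, the pattern would also stop at the right place without urldecode. Decoding first just keeps the intent explicit.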

^http://\w{0,3}.?youtube+\.\w{2,3}/watch\?v=[\w-]{11}
according to http://www.regexlib.com/REDetails.aspx?regexp_id=2569

Only execute script if entered email is from a specific domain

I am trying to create a script that will only execute its actions if the email address the user enters is from a specific domain. I created a regex that seems to work when I test it in a regex utility, but when it's used in my PHP script, it tells me that valid emails are invalid. In this case, I want any email that is from #secondgearsoftware.com, #secondgearllc.com or asia.secondgearsoftware.com to echo success and all others to be rejected.
$pattern = '/\b[A-Z0-9\._%+-]+#((secondgearsoftware|secondgearllc|euro\.secondgearsoftware|asia\.secondgearsoftware)+\.)+com/';
$email = urldecode($_POST['email']);
if (preg_match($pattern, $email))
{
echo 'success';
}
else
{
echo 'opposite success';
}
I am not really sure what's futzed with the pattern. Any help would be appreciated.

Your regular expression is a bit off (it will allow foo#secondgearsoftwaresecondgearsoftware.com) and can be simplified:
$pattern = '/#((euro\.|asia\.)?secondgearsoftware|secondgearllc)\.com$/i';
I've made it case-insensitive and anchored it to the end of the string.
There doesn't seem to be a need to check what's before the "#" - you should have a proper validation routine for that if necessary, but it seems you just want to check if the email address belongs to one of these domains.

You probably need to use /\b[A-Z0-9\._%+-]+#((euro\.|asia\.)?secondgearsoftware|secondgearllc)\.com/i (note the i at the end) in order to make the regex case-insensitive. I also dropped the +s as they allow for infinite repetition, which doesn't make sense in this case, and made the euro./asia. prefix optional so the bare domain still matches.

Here's an easy to maintain solution using regular expressions
$domains = array(
'secondgearsoftware',
'secondgearllc',
'euro\.secondgearsoftware',
'asia\.secondgearsoftware'
);
preg_match("`#(" .implode("|", $domains). ")\.com$`i", $userProvidedEmail);
Here's a couple of tests:
$tests = array(
'bob#secondgearsoftware.com',
'bob#secondgearllc.com',
'bob#Xsecondgearllc.com',
'bob#secondgearllc.net',
'bob#euro.secondgearsoftware.org',
'bob#euro.secondgearsoftware.com',
'bob#euroxsecondgearsoftware.com',
'bob#asia.secondgearsoftware.com'
);
foreach ( $tests as $test ) {
echo preg_match("`#(" .implode("|", $domains). ")\.com$`i", $test),
" <- $test\n";
}
Result (1 is passing of course)
1 <- bob#secondgearsoftware.com
1 <- bob#secondgearllc.com
0 <- bob#Xsecondgearllc.com
0 <- bob#secondgearllc.net
0 <- bob#euro.secondgearsoftware.org
1 <- bob#euro.secondgearsoftware.com
0 <- bob#euroxsecondgearsoftware.com
1 <- bob#asia.secondgearsoftware.com

I suggest you drop the regex and simply use stristr to check if it matches. Something like this should work:
<?php
// Fill out as needed
$domains = array('secondgearsoftware.com', 'secondgearllc.com');
$email = urldecode($_POST['email']);
$found = false;
for($i=0;$i<count($domains);$i++)
{
if ($domains[$i] == stristr($email, $domains[$i]))
$found = true;
}
if ($found) ...
?>
The function stristr returns the e-mail address from the part where it found a match to the end, which should be the same as the match in this case. Technically there could be something prior to the domains (fkdskjfsdksfks.secondgearsoftware.com), but you can just use "#domainneeded.com" as the search string to prevent this. This code is also slightly longer, but easily extended with new domains without worrying about regex.
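Another regex-free variant, sketched under the assumption that the # in the addresses above stands in for @, is to cut the address at the last @ and compare the lowercased domain against a list. The helper name isAllowedDomain is made up for this example.

```php
<?php
// Hypothetical helper: true when the part after the last "@" is exactly
// one of the allowed domains (compared case-insensitively).
function isAllowedDomain(string $email, array $allowed): bool
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return false; // no "@" at all
    }
    $domain = strtolower(substr($email, $at + 1));
    return in_array($domain, $allowed, true);
}

$allowed = [
    'secondgearsoftware.com',
    'secondgearllc.com',
    'euro.secondgearsoftware.com',
    'asia.secondgearsoftware.com',
];

var_dump(isAllowedDomain('bob@secondgearllc.com', $allowed));  // bool(true)
var_dump(isAllowedDomain('bob@Xsecondgearllc.com', $allowed)); // bool(false)
```

Comparing the whole domain avoids the prefix problem mentioned above, because Xsecondgearllc.com is simply not in the list.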

basic if/then with PHP

Okay so i set up this thing so that I can print out page that people came from, and then put dummy tags on certain pages. Some pages have commented out "linkto" tags with text in between them.
My problem is that some of my pages don't have "linkto" text. When I link to this page from there I want it to grab everything between "title" and "/title". How can I change the eregi so that if it turns up empty, it should then grab the title?
Here is what I have so far, I know I just need some kind of if/then but I'm a rank beginner. Thank you in advance for any help:
<?php
$filesource = $_SERVER['HTTP_REFERER'];
$a = fopen($filesource,"r"); //fopen("html_file.html","r");
$string = fread($a,1024);
?>
<?php
if (eregi("<linkto>(.*)</linkto>", $string, $out)) {
$outdata = $out[1];
}
//echo $outdata;
$outdatapart = explode( " " , $outdata);
echo $part[0];
?>

Here you go: if eregi() fails to match, the $outdata assignment will never happen as the if block will not be executed. If it matches, but there's nothing between the tags, $outdata will be assigned an empty string. In both cases, !$outdata will be true, so we can fallback to a second match on the title tag instead.
if(eregi("<linkto>(.*?)</linkto>", $string, $link_match)) {
$outdata = $link_match[1];
}
if(!$outdata && eregi("<title>(.*?)</title>", $string, $title_match)) {
$outdata = $title_match[1];
}
I also changed the (.*) in the match to (.*?). This means, don't be greedy. In the (.*) form, if you had $string set to
<title>Page Title</title> ...
... <iframe><title>A second title tag!</title></iframe>
The regex would match
Page Title</title> ... ... <iframe><title>A second title tag!
Because it tries to match as much as possible, as long as the text is between any <title> and any other </title>. In the (.*?) form, the match does what you'd expect - it matches
Page Title
And stops as soon as it is able.
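The difference can be checked with a short sketch. It uses preg_match rather than eregi, since lazy quantifiers are a PCRE feature. The variable names are only for illustration.

```php
<?php
$string = '<title>Page Title</title> ... <iframe><title>A second title tag!</title></iframe>';

// Greedy: (.*) runs to the LAST </title> in the string.
preg_match('~<title>(.*)</title>~s', $string, $greedy);
// Lazy: (.*?) stops at the FIRST </title> it can reach.
preg_match('~<title>(.*?)</title>~s', $string, $lazy);

echo $greedy[1], "\n"; // Page Title</title> ... <iframe><title>A second title tag!
echo $lazy[1], "\n";   // Page Title
```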
...
As an aside, this thing is an interesting scheme, but why do you need it? Pages can link to other pages and pass parameters via the query string:
...
Then somescript.php can access the prevpage parameter via the $_GET['prevpage'] superglobal variable.
Would that solve your problem?

The POSIX regex extension (ereg etc.) is deprecated as of PHP 5.3.0 and was removed completely in PHP 7.0, so you're better off using the PCRE functions (preg_match and friends).
The PCRE functions are also faster, binary safe and support more features like non-greedy matching etc.
Just a pointer.
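As a sketch of what the fallback from the question could look like with the PCRE functions, the following tries the linkto tag first and falls back to the title tag when the first is missing or empty. The helper name pageLabel is hypothetical.

```php
<?php
// Hypothetical helper: text of <linkto> if present and non-empty,
// otherwise text of <title>, otherwise null.
function pageLabel(string $html): ?string
{
    foreach (['linkto', 'title'] as $tag) {
        // i: case-insensitive (like eregi), s: let "." span newlines.
        if (preg_match("~<$tag>(.*?)</$tag>~is", $html, $m) && trim($m[1]) !== '') {
            return trim($m[1]);
        }
    }
    return null;
}

echo pageLabel('<linkto>Back to list</linkto><title>Home</title>'), "\n"; // Back to list
echo pageLabel('<linkto></linkto><title>Home</title>'), "\n";             // Home
```

Listing the tags in priority order keeps the fallback logic in one loop, so adding a third fallback tag would be a one-word change.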

you need if, else.
if(eregi(...))
{
.
.
.
}
else
{
just grab title;
}
perhaps you should have done a quick google search to find this very simple answer.

Just add another if test before you assign the match to $outdata:
if (eregi("<linkto>(.*)</linkto>", $string, $out)) {
if ($out[1] != "") {
$outdata = $out[1];
} else {
// Look in the title.
}
}
