I'm trying to get a part of a string that starts with for example Name:. If the whole string looks like Name: Carl, I just want the Carl part and not the Name: prefix.
How can I do that? I have tried with:
$data = file_get_contents('page.html');
$regex = '/Name:.*/';
preg_match($regex,$data,$match);
var_dump($match);
But I get the output:
array(1) { [0]=> string(28) "Name: Carl"
The other thing I don't understand is why the array(1) { [0]=> string(28) is showing.
You have to put what you want to retrieve in ():
'/Name:(.*)/i'
For your match line, do the following instead:
$regex = '/Name:(.*)/';
The matched portion (inside (.*)) will be in $match[1]; $match[0] still holds the full match, including the Name: prefix.
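As a minimal sketch of the capture-group approach, the snippet below uses an inline sample string (an assumption; the question reads page.html instead) and adds \s* so the space after the colon is not captured. It also touches on the second question: var_dump() prints the type and length of every element, which is where array(1) { [0]=> string(...) comes from, whereas echo prints only the value.

```php
<?php
// Assumed sample input; the original code loads page.html with file_get_contents().
$data = "Name: Carl\nAge: 30";

// \s* skips the blank after the colon; (.*) captures the rest of the line,
// because . does not match a newline by default.
if (preg_match('/Name:\s*(.*)/', $data, $match)) {
    $name = $match[1];   // captured group only
    echo $name, PHP_EOL; // prints "Carl"
}
```

Here $match[0] is the whole match ("Name: Carl") and $match[1] is the first parenthesised group.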
I am new to RegEx. I am parsing an HTML page, and because it is buggy I cannot use an XML or HTML parser, so I am using a regular expression.
My code looks like this:
$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="[A-Z\\d]+" data-index="\\d+"/', $html, $result);
var_dump($result);
The output looks good so the code is working. Now I want to extract the matched values. I did it exactly as described in this answer and now the code looks like this:
$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="#([A-Z\\d]+)" data-index="#(\\d+)"/', $html, $result);
var_dump($result);
But it outputs an empty array. What is wrong? Please don't improve the pattern by adding the closing '>' or making it robust against white spaces. I just need to get the code running.
You could write the code and the pattern like this, using a single backslash to match digits (\d) and omitting the # from the pattern, as it does not occur in the example data:
$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="([A-Z\d]+)" data-index="(\d+)"/', $html, $result);
var_dump($result);
Output
array(3) {
[0]=>
array(1) {
[0]=>
string(38) "<div data-id="ABC012" data-index="123""
}
[1]=>
array(1) {
[0]=>
string(6) "ABC012"
}
[2]=>
array(1) {
[0]=>
string(3) "123"
}
}
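If several divs match, it may be handier to get one array per match rather than one array per group. The following is a sketch using the PREG_SET_ORDER flag with the same pattern; the $rows and $pairs names are only illustrative.

```php
<?php
$html = '<html><div data-id="ABC012" data-index="123" ...';

// PREG_SET_ORDER groups the results per match: $row[0] is the full match,
// $row[1] the data-id value and $row[2] the data-index value.
preg_match_all('/<div data-id="([A-Z\d]+)" data-index="(\d+)"/', $html, $rows, PREG_SET_ORDER);

$pairs = [];
foreach ($rows as $row) {
    $pairs[$row[1]] = (int) $row[2];
}
var_dump($pairs);
```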
My HTML form code replaces some words with <-#word#-> using the code
$string = preg_replace("/($p)/i", '<-#$1#->', $string);
The problem is that if the form has errors, the word becomes <-#<-#<-#word#->#->#-> because it is wrapped again every time someone resubmits the form. Is it possible to replace the word only if it has not already been replaced?
This is what I tried, using the NOT operator, but it is not working:
$string = preg_replace("/^(<-#)($p)^(#->)/i", '<-#$1#->', $string);
You could use negative lookarounds to assert that what is directly to the left is not <-# and what is directly to the right is not #->:
(?<!<-#)(word)(?!#->)
Your code could look like:
$string = preg_replace("/(?<!<-#)($p)(?!#->)/i", '<-#$1#->', $string);
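To check that the lookarounds make the replacement safe to repeat, here is a small sketch that applies the same call twice. The value of $p and the sample sentence are assumptions made for illustration.

```php
<?php
$p = 'word'; // hypothetical search term; the real $p comes from the form code
$pattern = "/(?<!<-#)($p)(?!#->)/i";

// First pass wraps the bare word.
$first = preg_replace($pattern, '<-#$1#->', 'a word here');

// Second pass finds the word already preceded by <-#, so nothing changes.
$second = preg_replace($pattern, '<-#$1#->', $first);

echo $first, PHP_EOL;  // a <-#word#-> here
echo $second, PHP_EOL; // a <-#word#-> here
```

Note that a match is skipped when either side is already wrapped, since both lookarounds have to succeed for the replacement to happen.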
Another method might be to use preg_match_all() to capture the innermost text of an already wrapped string and rebuild it with a single pair of markers:
$string = '<-#<-#<-#Any alphanumeric input that user may wish#->#->#->';
preg_match_all("/(<-#)+([A-Za-z0-9_\s]+)(#->)+/s", $string, $matches);
$string = '<-#' . $matches[2][0] . '#->';
var_dump($string);
which outputs:
string(47) "<-#Any alphanumeric input that user may wish#->"
var_dump($matches); would return:
array(4) {
[0]=>
array(1) {
[0]=>
string(59) "<-#<-#<-#Any alphanumeric input that user may wish#->#->#->"
}
[1]=>
array(1) {
[0]=>
string(3) "<-#"
}
[2]=>
array(1) {
[0]=>
string(41) "Any alphanumeric input that user may wish"
}
[3]=>
array(1) {
[0]=>
string(3) "#->"
}
}
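The same clean-up can probably be done in a single step with preg_replace(), which also copes with several wrapped words in one string. This is only a sketch; the sample string is made up.

```php
<?php
// Assumed sample: one word wrapped three times, one wrapped once.
$string = '<-#<-#<-#word#->#->#-> and <-#other#->';

// (?:<-#)+ swallows any run of opening markers, (.*?) lazily captures the
// inner text, and (?:#->)+ swallows the run of closing markers.
$clean = preg_replace('/(?:<-#)+(.*?)(?:#->)+/s', '<-#$1#->', $string);

echo $clean, PHP_EOL; // <-#word#-> and <-#other#->
```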
Code:
$pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$urls = array();
preg_match($pattern, $comment, $urls);
return $urls;
According to an online regex tester, this regex is correct and should be working:
http://regexr.com?35nf9
I am outputting the $links array using:
$linkItems = $model->getLinksInComment($model->comments);
//die(print_r($linkItems));
echo '<ul>';
foreach($linkItems as $link) {
echo '<li>'.$link.'</li>';
}
echo '</ul>';
The output looks like the following:
http://google.com
http
The $model->comments looks like the following:
destined for surplus
RT#83015
RT#83617
http://google.com
https://google.com
non-link
The generated list is only supposed to contain links, and there should be no empty lines. Is there something wrong with what I did? The regex seems to be correct.
If I'm understanding right, you should use preg_match_all in your getLinksInComment function instead:
preg_match_all($pattern, $comment, $matches);
if (isset($matches[0])) {
return $matches[0];
}
return array(); #in case there are no matches
preg_match_all gets all matches in a string (even if the string contains newlines) and puts them into the array you supply as the third argument. However, anything matched by your regex's capture groups (e.g. (http|https|ftp|ftps)) will also be put into your $matches array (as $matches[1] and so on). That's why you want to return just $matches[0] as your final array of matches.
I just ran this exact code:
$line = "destined for surplus\n
RT#83015\n
RT#83617\n
http://google.com\n
https://google.com\n
non-link";
$pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
preg_match_all($pattern, $line, $matches);
var_dump($matches);
and got this for my output:
array(3) {
[0]=>
array(2) {
[0]=>
string(17) "http://google.com"
[1]=>
string(18) "https://google.com"
}
[1]=>
array(2) {
[0]=>
string(4) "http"
[1]=>
string(5) "https"
}
[2]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(0) ""
}
}
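If the extra $matches[1] and $matches[2] entries are unwanted, one option is to make the groups non-capturing with (?: ... ). The sketch below is a simplified variant of the pattern from the question, not a general URL matcher, and extractLinks() is a hypothetical name standing in for the getLinksInComment() method.

```php
<?php
// Hypothetical helper name; in the thread this logic lives in getLinksInComment().
function extractLinks(string $comment): array
{
    // (?: ... ) groups without capturing, so only $matches[0] gets filled.
    // ~ is used as the delimiter so the slashes need no escaping.
    $pattern = '~(?:https?|ftps?)://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*)?~';
    preg_match_all($pattern, $comment, $matches);

    return $matches[0];
}

$comment = "destined for surplus\nRT#83015\nRT#83617\nhttp://google.com\nhttps://google.com\nnon-link";
$links = extractLinks($comment);
print_r($links);
```

With the sample comment from the question, $links holds the two URLs and nothing else.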
Your comment is structured as multiple lines, some of which contain the URLs in which you're interested and nothing else. This being the case, you need not use anything remotely resembling that disaster of a regex to try to pick URLs out of the full comment text; you can instead split by newline, and examine each line individually to see whether it contains a URL. You might therefore implement a much more reliable getLinksInComment() thus:
function getLinksInComment($comment) {
    $links = array();
    // Split the comment into lines and keep only those that start with "http".
    foreach (preg_split('/\r?\n/', $comment) as $line) {
        if (!preg_match('/^http/', $line)) {
            continue;
        }
        array_push($links, $line);
    }
    return $links;
}
With suitable adjustment to serve as an object method instead of a bare function, this should solve your problem entirely and free you to go about your day.
I have, for example, strings such as "7-th Road", "7th number some other words" or "Some word 8-th word".
I need to get the first occurrence of a number together with all the characters that follow it, up to the first space.
So for the examples above I need the values "7-th", "7th" and "8-th".
Then, in other operations, I need to extract only the digits from matches like "7-th".
Thanks in advance!
Regex should be /(\d+)([^\d]+)\s/ and the numbers would resolve to $1 and the ending characters to $2
Sample Code:
$string = '7-th Road';
preg_match_all('/(\d+)([^\d]+)\s/', $string, $result, PREG_PATTERN_ORDER);
var_dump($result[1]);
array(1) {
[0]=> string(1) "7"
}
var_dump($result[2]);
array(1) {
[0]=> string(3) "-th"
}
Are you asking for something like this?
#(\d+)-?(?:st|nd|rd|th)#
Example
If you only want the numbers from the text, use this (they end up in $matches[1]):
preg_match_all('/(\d+)(?:-?th)?/', '7-th", "7th", "8-th', $matches);
But if you want to remove 'th' or another suffix, do a replacement instead:
preg_replace('/(\d+)-?(?:st|nd|rd|th)/', '$1', 'some string')
Not sure about the last one...
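The two steps from the question (take the token from the first digit up to the next space, then keep only its digits) could be sketched as follows. The function names firstNumberToken() and leadingDigits() are made up for this example.

```php
<?php
// Returns the first digit plus everything up to the next whitespace, or null.
function firstNumberToken(string $text): ?string
{
    return preg_match('/\d\S*/', $text, $m) ? $m[0] : null;
}

// Returns only the digits at the start of a token such as "7-th", or null.
function leadingDigits(string $token): ?string
{
    return preg_match('/^\d+/', $token, $m) ? $m[0] : null;
}

foreach (['7-th Road', '7th number some other words', 'Some word 8-th word'] as $s) {
    $token = firstNumberToken($s);
    echo $token, ' => ', leadingDigits($token), PHP_EOL;
}
// 7-th => 7
// 7th => 7
// 8-th => 8
```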
I'd like a reg exp which can take a block of string, and find the strings matching the format:
....
And for all strings which match this format, it will extract out the email address found after the mailto:. Any thoughts?
This is needed for an internal app and not for any spammer purposes!
If you want to match the whole link, from the opening <a tag to the closing </a>:
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>(.*?)\<\/a\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
To make it faster and shorter:
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
The 2nd matching group will be whatever email it is.
Example:
$html ='<div><a href="mailto:test@live.com">test</a></div>';
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>(.*?)\<\/a\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
var_dump($matches);
Output:
array(1) {
[0]=>
array(5) {
[0]=>
string(39) "<a href="mailto:test@live.com">test</a>"
[1]=>
string(1) " "
[2]=>
string(13) "test@live.com"
[3]=>
string(0) ""
[4]=>
string(4) "test"
}
}
There are plenty of different options on regexp.info
One example would be:
\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b
The "mailto:" is trivial to prepend to that.
/(mailto:)(.+)(\")/
The second matching group will be the email address.
You can work with the internal PHP filter http://us3.php.net/manual/en/book.filter.php
(they have one which is specially there for validating or sanitizing email -> FILTER_VALIDATE_EMAIL)
Greets
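Combining the two ideas, a rough sketch could first pull out every mailto: target with a regex and then keep only the values that pass FILTER_VALIDATE_EMAIL. The sample HTML and variable names below are assumptions for illustration.

```php
<?php
// Assumed sample markup: one valid address and one bogus mailto target.
$html = '<div><a href="mailto:test@live.com">test</a> '
      . '<a href="mailto:not an email">broken</a></div>';

// Capture everything after mailto: up to the closing quote or a ?query part.
preg_match_all('/href="mailto:([^"?]+)/i', $html, $m);

// filter_var() returns the address when it is valid and false otherwise.
$emails = array_values(array_filter($m[1], function ($candidate) {
    return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}));

print_r($emails);
```

The regex only locates candidates; the filter decides whether each one is a plausible address.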
For me, ~<mailto(.*?)>~ worked.
It will return an array containing the elements found.
Here you can test it: https://regex101.com/r/rTmKR4/1