Get a list of domains from a table via regex - php

I have a list of domains in a table with more info:
<td>example1.com</td>
<td>example2.org</td>
<td>example3.com</td>
<td>example4.com</td>
I need to get the .com domains using a regex. I tried something like:
'<td>(.............).com'
But what can I write instead of the dots? What do I need to use?
I need to get the data between the tags: <td>domain.com</td> -> domain.com
'<td>([^<]+\.com)</td>'
This is better, but I need the result without the tags.

<?php
$html = '<td>example1.com</td>
<td>example2.org</td>
<td>example3.com</td>
<td>example4.com</td>';
$matches = array();
preg_match_all('/<td>(.*?\.com)<\/td>/i', $html, $matches);
var_dump($matches[1]);
prints:
array(3) {
[0]=>
string(12) "example1.com"
[1]=>
string(12) "example3.com"
[2]=>
string(12) "example4.com"
}

Something like this:
'<td>([^<]+\.com)</td>'
However, you shouldn't use regular expressions to parse HTML.
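As a rough sketch of the non-regex route mentioned above, the cells can be read with PHP's DOM extension and then filtered. The wrapping <table> markup and the variable names are assumptions made for illustration, since the question only shows the <td> rows.

```php
<?php
// Sketch: collect .com domains from <td> cells with DOMDocument instead of a regex.
// The <table><tr> wrapper is an assumption; the question only shows the cells.
$html = '<table><tr><td>example1.com</td>
<td>example2.org</td>
<td>example3.com</td>
<td>example4.com</td></tr></table>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$domains = [];
foreach ($doc->getElementsByTagName('td') as $cell) {
    $text = trim($cell->textContent);
    // Keep only cells whose text ends in ".com" (case-insensitive).
    if (preg_match('/\.com$/i', $text) === 1) {
        $domains[] = $text;
    }
}

print_r($domains);
```

The regex here only checks the suffix of already-extracted text, so the HTML structure itself is handled by the parser.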

You can use lookaheads and lookbehinds if you want to match something only when it is surrounded by something else. Here the match is the .com domain only, without the tags.
<?php
$html = '<td>example1.com</td>
<td>example2.org</td>
<td>example3.com</td>
<td>example4.com</td>';
$pattern = '!(?<=<td>)[^<]*\.com(?=</td>)!';
preg_match_all($pattern,$html,$matches);
$urls = $matches[0];
print_r($urls);
?>
Output
Array
(
[0] => example1.com
[1] => example3.com
[2] => example4.com
)

Related

remove part of string after 4th slash in php

I have an array that contains links, and I am trying to cut each link off after the 4th slash.
[0]=>
string(97) "https://www.nowhere.com./downtoalley/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shimokita4040/outline"
[1]=>
string(105) "https://www.example.com./wowar-waseda/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shinjuku-w25861/outline"
[2]=>
string(91) "https://www.hey.com./gotoashbourn/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=kinuta7429/outline"
The expected output is like this:
[0]=>
string(97) "https://www.nowhere.com./downtoalley/"
[1]=>
string(105) "https://www.example.com./wowar-waseda/"
[2]=>
string(91) "https://www.hey.com./gotoashbourn/"
The lengths are different, so I can't use strtok. Are there any other options for this?
Try the following code:
<?php
$arr = array(
0 => "https://www.nowhere.com./downtoalley/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shimokita4040/outline",
1 => "https://www.example.com./wowar-waseda/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shinjuku-w25861/outline",
2 => "https://www.hey.com./gotoashbourn/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=kinuta7429/outline");
$resultArray = array();
foreach($arr as $str) {
array_push($resultArray, current(explode("?",$str)));
}
print_r($resultArray);
?>
You can use preg_replace to replace everything in each string after the fourth / with nothing using this regex
^(([^/]*/){4}).*$
which looks for 4 sets of non-/ characters followed by a /, collecting that text in capture group 1; and then replacing with $1 which gives only the text up to the 4th /:
$strings = array("https://www.nowhere.com./downtoalley/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shimokita4040/outline",
"https://www.example.com./wowar-waseda/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shinjuku-w25861/outline",
"https://www.hey.com./gotoashbourn/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=kinuta7429/outline");
print_r(array_map(function ($v) { return preg_replace('#^(([^/]*/){4}).*$#', '$1', $v); }, $strings));
Output:
Array (
[0] => https://www.nowhere.com./downtoalley/
[1] => https://www.example.com./wowar-waseda/
[2] => https://www.hey.com./gotoashbourn/
)
There is no built-in function that does this directly. You can use PHP code like the following:
$explodingLimit = 4;
$string = "https://www.nowhere.com./downtoalley/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shimokita4040/outline";
$stringArray = explode ("/", $string);
$neededElements = array_slice($stringArray, 0, $explodingLimit);
echo implode("/", $neededElements);
I wrote this for a single element, but you can apply it to each element of your array. Note that the result has no trailing '/', so append one if you need it. Hope it helps.
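A minimal sketch of how the explode approach above could be applied to the whole array, with the trailing '/' added back. The helper name cutAfterNthSlash is hypothetical and not part of any answer.

```php
<?php
// Hypothetical helper: keep everything up to and including the $n-th slash.
function cutAfterNthSlash(string $url, int $n = 4): string
{
    $parts = explode('/', $url);
    // A string with fewer than $n slashes has at most $n parts: nothing to cut.
    if (count($parts) <= $n) {
        return $url;
    }
    // Rejoin the first $n parts and restore the $n-th slash.
    return implode('/', array_slice($parts, 0, $n)) . '/';
}

$links = [
    'https://www.nowhere.com./downtoalley/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shimokita4040/outline',
    'https://www.example.com./wowar-waseda/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=shinjuku-w25861/outline',
    'https://www.hey.com./gotoashbourn/?iad2=sumai-pickup&argument=CH4fRVnN&dmai=kinuta7429/outline',
];

$result = array_map('cutAfterNthSlash', $links);
print_r($result);
```

Strings that contain fewer than four slashes are returned unchanged, which may or may not be what you want for malformed input.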

Remove All Text 2 Places After Decimal Place PHP

I'm trying to scrape a product's price from a page. Unfortunately it's not in a nice clean div, so I have to clear out all the other junk.
Note: I have looked at several examples, but they all assume you only have nicely organised numbers in your variable, not raw HTML stuffed on the end.
An example of the string my variable may hold:
$2.87 <span>10% Off Sale</span>
I've played about with substr and strrpos, read the manual and still can't figure it out on my own.
I want to just cut the string two digits after the first decimal point is found... No doubt it's extremely simple when you know how!
What I want to end up with:
$2.87
An example of the mess I've got myself into trying:
$whatIWant = substr($data, strrpos($data, ".") + 2);
Thanks in advance,
Try this solution:
<?php
$string = "$2.87 <span>10% Off Sale</span>";
$matches = array();
preg_match('/(\$\d+\.\d{2})/', $string, $matches);
var_dump($matches);
Output:
array(2) {
[0]=>
string(5) "$2.87"
[1]=>
string(5) "$2.87"
}
For more information (such as why the result is an array), check the PHP manual entry for the preg_match() function.
The code below should grab the matches for you:
$pattern = '/(\$\d+\.\d{2})/';
$string = '$2.87 <span>10% Off Sale</span>';
$matches = array();
preg_match($pattern, $string, $matches);
print_r($matches);
Outputs:
Array ( [0] => $2.87 [1] => $2.87 )
There is definitely a better way to do this. For a start, assuming the HTML structure that you have above will always be the same, you could do something like:
$var = "$85.25 <span>10% Off Sale</span>";
$spl = explode("<", $var);
echo $spl[0];
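Note that the explode approach above leaves the trailing space in place ("$85.25 "), and the regex answers do not say what happens when nothing matches. A small sketch that covers both cases, using a hypothetical helper name extractPrice:

```php
<?php
// Hypothetical helper: return the first "$<digits>.<two digits>" price, or null if none is found.
function extractPrice(string $data): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match('/\$\d+\.\d{2}/', $data, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(extractPrice('$2.87 <span>10% Off Sale</span>'));
var_dump(extractPrice('<span>No price here</span>'));
```

The first call yields the string "$2.87" and the second yields null, so the caller can tell a missing price apart from an empty one.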

Extract email address from string - php

I want to extract an email address from a string, for example:
<?php // code
$string = 'Ruchika <ruchika@example.com>';
?>
From the above string I only want to get the email address ruchika@example.com.
Please recommend how to achieve this.
Try this
<?php
$string = 'Ruchika < ruchika@example.com >';
$pattern = '/[a-z0-9_\-\+\.]+@[a-z0-9\-]+\.([a-z]{2,4})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $string, $matches);
var_dump($matches[0]);
?>
Second method
<?php
$text = 'Ruchika < ruchika@example.com >';
preg_match_all("/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i", $text, $matches);
print_r($matches[0]);
?>
Parsing e-mail addresses properly is very hard and would require a very complicated regular expression. For example, consider this regular expression for matching RFC 822 addresses: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
Amazing, right?
Instead, you can use mailparse_rfc822_parse_addresses(), a function provided by the mailparse PECL extension.
It takes a string as its argument and returns an array of associative arrays with the keys display, address and is_group.
So,
$to = 'Wez Furlong <wez@example.com>, doe@example.com';
var_dump(mailparse_rfc822_parse_addresses($to));
would yield:
array(2) {
[0]=>
array(3) {
["display"]=>
string(11) "Wez Furlong"
["address"]=>
string(15) "wez@example.com"
["is_group"]=>
bool(false)
}
[1]=>
array(3) {
["display"]=>
string(15) "doe@example.com"
["address"]=>
string(15) "doe@example.com"
["is_group"]=>
bool(false)
}
}
Try this code:
<?php
function extract_emails_from($string){
preg_match_all("/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i", $string, $matches);
return $matches[0];
}
$text = "blah blah blah blah blah blah email2@address.com";
$emails = extract_emails_from($text);
print(implode("\n", $emails));
?>
This will work.
Thanks.
This is based on Niranjan's response, assuming the input email is enclosed within < and > characters. Instead of using a regular expression to match the email address itself, I grab the text between the < and > characters. If there are no such characters, I fall back to the entire string. I don't validate the email address here; whether you need to depends on your scenario.
<?php
$string = 'Ruchika <ruchika@example.com>';
$pattern = '/<(.*?)>/i';
preg_match_all($pattern, $string, $matches);
var_dump($matches);
$email = $matches[1][0] ?? $string;
echo $email;
?>
Of course, if my assumption isn't correct, then this approach will fail. But based on your input, I believe you wanted to extract emails enclosed within < and > chars.
This function extracts all email addresses from a string and returns them in an array.
function extract_emails_from($string){
preg_match_all( '/([\w+\.]*\w+@[\w+\.]*\w+[\w+\-\w+]*\.\w+)/is', $string, $matches );
return $matches[0];
}
This works well and it's minimal, as long as a bracketed address ends with the closing >:
$email = strpos($from, '<') !== false ? substr($from, strpos($from, '<') + 1, -1) : $from;
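A slightly more defensive sketch of the same idea, which takes the text between < and > when present and then checks the result with filter_var. The helper name addressFromHeader is hypothetical, and the example assumes conventional addresses written with an @ sign.

```php
<?php
// Hypothetical helper: pull the address out of "Name <address>" or accept a bare address.
// Returns null when the candidate does not pass FILTER_VALIDATE_EMAIL.
function addressFromHeader(string $from): ?string
{
    if (preg_match('/<([^<>]+)>/', $from, $m) === 1) {
        // Text between the first pair of angle brackets.
        $candidate = trim($m[1]);
    } else {
        // No brackets: treat the whole string as the candidate.
        $candidate = trim($from);
    }
    $valid = filter_var($candidate, FILTER_VALIDATE_EMAIL);
    return $valid === false ? null : $valid;
}

var_dump(addressFromHeader('Ruchika <ruchika@example.com>'));
var_dump(addressFromHeader('ruchika@example.com'));
var_dump(addressFromHeader('Ruchika (no address given)'));
```

Unlike the one-liner, this does not depend on > being the last character, and it rejects input that is not an address at all.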
Use (my) function getEmailArrayFromString to easily extract email addresses from a given string.
<?php
function getEmailArrayFromString($sString = '')
{
$sPattern = '/[\._\p{L}\p{M}\p{N}-]+@[\._\p{L}\p{M}\p{N}-]+/u';
preg_match_all($sPattern, $sString, $aMatch);
$aMatch = array_keys(array_flip(current($aMatch)));
return $aMatch;
}
// Example
$sString = 'foo@example.com XXX bar@example.com XXX <baz@example.com>';
$aEmail = getEmailArrayFromString($sString);
/**
* array(3) {
[0]=>
string(15) "foo@example.com"
[1]=>
string(15) "bar@example.com"
[2]=>
string(15) "baz@example.com"
}
*/
var_dump($aEmail);
Based on Priya Rajaram's code, I have optimised the function a little more so that each email address only appears once.
If, for example, an HTML document is parsed, you usually get everything twice, because the mail address is also used in the mailto link.
function extract_emails_from($string){
preg_match_all("/[\._a-zA-Z0-9-]+@[\._a-zA-Z0-9-]+/i", $string, $matches);
return array_values(array_unique($matches[0]));
}
This will work even on subdomains. It extracts all emails from text.
$pattern = "/[a-zA-Z0-9-_]{1,}@[a-zA-Z0-9-_]{1,}(\.[a-zA-Z]{1,}){1,}/";
preg_match_all ($pattern , $string, $matches);
print_r($matches);
$matches[0] has all emails.
Array
(
[0] => Array
(
[0] => clotdesormakilgehr@prednisonecy.com
[1] => **********@******.co.za.com
[2] => clotdesormakilgehr@prednisonecy.com
[3] => clotdesormakilgehr@prednisonecy.mikedomain.com
[4] => clotdesormakilgehr@prednisonecy.com
)
[1] => Array
(
[0] => .com
[1] => .com
[2] => .com
[3] => .com
[4] => .com
)
)
A relatively straightforward approach is to use PHP's built-in functions for splitting text into words and validating e-mail addresses:
function fetchEmails($text) {
$words = str_word_count($text, 1, '.@-_1234567890');
return array_filter($words, function($word) {return filter_var($word, FILTER_VALIDATE_EMAIL);});
}
This returns the e-mail addresses found in the text variable.

Replace all links from the text that don't contain query string (PHP)

I want to alter all YouTube links in a block of text that don't have a query string by appending a fixed query string.
For example, text code could look like this:
http://youtube.com/embed/ABC
http://youtube.com/embed/DEF?foo=bar
http://youtube.com/embed/EFG
And I want it to look like:
http://youtube.com/embed/ABC?sup=bro
http://youtube.com/embed/DEF?foo=bar
http://youtube.com/embed/EFG?sup=bro
What is the best way of achieving that using PHP?
Simply check whether either:
there's no ?, with ^([^?]+)$
the query string is empty, with \?$
$links = array(
'http://youtube.com/embed/ABC',
'http://youtube.com/embed/DEF?foo=bar',
'http://youtube.com/embed/EFG',
'http://youtube.com/embed/HIJ?',
);
$nlinks = preg_replace('/^([^?]+)$|\?$/', '$1?sup=bro', $links);
var_dump($nlinks);
/*
* array(4) {
* [0]=> string(36) "http://youtube.com/embed/ABC?sup=bro"
* [1]=> string(36) "http://youtube.com/embed/DEF?foo=bar"
* [2]=> string(36) "http://youtube.com/embed/EFG?sup=bro"
* [3]=> string(36) "http://youtube.com/embed/HIJ?sup=bro"
* }
*/
EDIT
I added a case for urls with empty query string, like http://youtube.com/embed/HIJ?
A solution without regex. Get all the URLs into an array, use parse_url() to grab the query string part (if there exists one) and append the custom $query inside the loop.
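$query = 'sup=bro'; // define this
$urls = array(
    'http://youtube.com/embed/ABC',
    'http://youtube.com/embed/DEF?foo=bar',
    'http://youtube.com/embed/EFG',
);
foreach ($urls as &$url) {
    $parts = parse_url($url);
    // Append the custom query only when the URL has no query string.
    if (!isset($parts['query'])) {
        $url .= '?' . $query;
    }
}
unset($url); // break the reference left over from the loop
print_r($urls);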
Output:
Array
(
[0] => http://youtube.com/embed/ABC?sup=bro
[1] => http://youtube.com/embed/DEF?foo=bar
[2] => http://youtube.com/embed/EFG?sup=bro
)
I think you need to check each string using preg_match. If there is something after the "?", leave the URL as is; otherwise, add "sup=bro".
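One way the preg_match check described above could look, as a sketch. The sample URLs are taken from the question plus one with a dangling "?", and the variable names are assumptions.

```php
<?php
$query = 'sup=bro';
$urls = [
    'http://youtube.com/embed/ABC',
    'http://youtube.com/embed/DEF?foo=bar',
    'http://youtube.com/embed/EFG',
    'http://youtube.com/embed/HIJ?',
];

$result = [];
foreach ($urls as $url) {
    // A "?" followed by at least one character means a query string is present.
    if (preg_match('/\?.+/', $url) === 1) {
        $result[] = $url;
    } else {
        // rtrim drops a dangling "?" so the result does not contain "??".
        $result[] = rtrim($url, '?') . '?' . $query;
    }
}

print_r($result);
```

This sketch ignores URL fragments (#...), which would need separate handling.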
You can use this pattern:
$urls = <<<LOD
http://youtube.com/embed/ABC
http://youtube.com/embed/DEF?foo=bar
http://youtube.com/embed/EFG
LOD;
$urls = preg_replace('~/[^/?\s]*\K$~m', '?sup=bro', $urls);

Why preg_match fails to get the result?

I have the below text displayed in the browser and I am trying to get the URL from the string.
string 1 = voice-to-text from #switzerland: http://bit.ly/lnpDC12D
When I try to use preg_match to get the URL, it fails:
$urlstr = "";
preg_match('/\b((?#protocol)https?|ftp):\/\/((?#domain)[-A-Z0-9.]+)((?#file)\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?((?#parameters)\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/i', $urlstr, $match);
echo $match[0];
I think #switzerland: contains one more http:// link. Could that be the problem?
The above pattern works perfectly for the string below:
voice-to-text: http://bit.ly/jDcXrZg
In this case I think parse_url will be a better choice than regex-based code. Something like this may work (assuming your URL always starts with http):
$str = "voice-to-text from #switzerland: http://bit.ly/lnpDC12D";
$pos = strrpos($str, "http://");
if ($pos !== false) {
var_dump(parse_url(substr($str, $pos)));
}
OUTPUT
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(6) "bit.ly"
["path"]=>
string(9) "/lnpDC12D"
}
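The two ideas can also be combined: a short regex locates the URL wherever it sits in the text, and parse_url then splits it. This is only a sketch; the pattern assumes the URL starts with http:// or https:// and runs until the next whitespace.

```php
<?php
$str = 'voice-to-text from #switzerland: http://bit.ly/lnpDC12D';

$url = null;
$parts = [];
// Match the first http(s) URL: scheme, then everything up to the next whitespace.
if (preg_match('~https?://\S+~i', $str, $m) === 1) {
    $url = $m[0];
    // parse_url returns false for seriously malformed URLs.
    $parsed = parse_url($url);
    if ($parsed !== false) {
        $parts = $parsed;
    }
}

var_dump($url);
print_r($parts);
```

Trailing punctuation such as a closing parenthesis or full stop would be captured as part of the URL, so real-world text may need extra trimming.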
As far as I understand your request, here is a way to do it:
$str = 'voice-to-text from <a href="search.twitter.com/…;: http://bit.ly/lnpDC12D';
preg_match('~(bit\.ly/\S+)~', $str, $m);
print_r($m);
output:
Array
(
[0] => bit.ly/lnpDC12D
[1] => bit.ly/lnpDC12D
)
