I have the following code, which extracts domain names from the input and stores them in an array:
foreach ($output as $domList)
{
$extensionList = explode(",", "org,com,net");
$pattern = '/(\s{0,}|\.)([-a-z0-9]+\.(' . implode("|", $extensionList) . '))\s{1,}/i';
$matches = array();
preg_match_all($pattern, $domList, $matches);
}
$matches[0] contains all the extracted domains.
How can I modify it to extract subdomains as well?
Sample input and expected output would definitely help (I took creative license with the input). The idea in the new regex is to keep consuming dot-separated labels until one of .com, .org, or .net is reached. $matches[0] should now yield all domains and subdomains.
$output = array("a" => " test.com test.sub.com", "b"=> "a.com a.b.com b.c.a.com" );
foreach ($output as $domList)
{
    $extensionList = explode(",", "org,com,net");
    // Group the extensions so the alternation applies only to the final label
    $pattern = '/\s*([-a-z0-9]+\.)+(?:' . implode("|", $extensionList) . ')\s*/i';
    $matches = array();
    preg_match_all($pattern, $domList, $matches);
    foreach ($matches[0] as $val) {
        echo "matched: " . trim($val) . "\n";
    }
}
It shouldn't be difficult to tweak this to your needs.
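As a rough sketch of one way to tweak it, the extraction could be wrapped in a function. The name extractDomains() and its default extension list are assumptions made for illustration, not part of the original code. Lookarounds replace the \s* parts so that the matches come back without surrounding whitespace, and preg_quote() keeps each extension literal.

```php
<?php
// Hypothetical helper: extractDomains() is an illustrative name, not from the original answer.
function extractDomains(string $text, array $extensions = ['org', 'com', 'net']): array
{
    // Escape each extension so it is treated literally inside the pattern.
    $alternation = implode('|', array_map(fn($e) => preg_quote($e, '/'), $extensions));
    // One or more "label." parts followed by a known extension. The lookarounds
    // stop a match from starting or ending in the middle of a longer name.
    $pattern = '/(?<![-a-z0-9.])(?:[-a-z0-9]+\.)+(?:' . $alternation . ')(?![-a-z0-9])/i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(extractDomains(' test.com test.sub.com'));
```

Because of the trailing lookahead, a string such as foo.common is not reported as foo.com.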
In my article, I want to automatically add links to keywords.
My keywords array:
$keywords = [
0=>['id'=>1,'slug'=>'getName','url'=>'https://example.com/1'],
1=>['id'=>2,'slug'=>'testName','url'=>'https://example.com/2'],
2=>['id'=>3,'slug'=>'ign','url'=>'https://example.com/3'],
];
This is my code:
private function keywords_replace(string $string, array $key_array)
{
$array_first = $key_array;
$array_last = [];
foreach ($array_first as $key=>$value)
{
$array_last[$key] = [$key, $value['slug'], '<a target="_blank" href="' . $value['url'] . '" title="' . $value['slug'] . '">' . $value['slug'] . '</a>'];
}
$count = count($array_last);
for ($i=0; $i<$count;$i++)
{
for ($j=$count-1;$j>$i;$j--)
{
if (strlen($array_last[$j][1]) > strlen($array_last[$j-1][1]))
{
$tmp = $array_last[$j];
$array_last[$j] = $array_last[$j-1];
$array_last[$j-1] = $tmp;
}
}
}
$keys = $array_last;
foreach ($keys as $key)
{
$string = str_ireplace($key[1],$key[0],$string);
}
foreach ($keys as $key)
{
$string = str_ireplace($key[0],$key[2],$string);
}
return $string;
}
result:
$str = "<p>Just a test: getName testName";
echo $this->keywords_replace($str,$keywords);
The output looks like this: Just a test: getName testName (with getName and testName rendered as links).
Very important: an approach that relies on spaces between words will not match, because I will use other languages whose sentences are not separated by spaces the way English is. The goal is something like WordPress keyword auto-linking.
I don't think my code is ideal. Is there a better algorithm to implement this function? Thanks!
You can use array_reduce and preg_replace to replace all occurrences of the slug words in your string with the corresponding url values:
$keywords = [
0=>['id'=>1,'slug'=>'getName','url'=>'https://www.getname.com'],
1=>['id'=>2,'slug'=>'testName','url'=>'https://www.testname.com'],
2=>['id'=>3,'slug'=>'ign','url'=>'https://www.ign.com'],
];
$str = "<p>Just a test: getName testName";
echo array_reduce($keywords, function ($c, $v) { return preg_replace('/\\b(' . $v['slug'] . ')\\b/', $v['url'], $c); }, $str);
Output:
<p>Just a test: https://www.getname.com https://www.testname.com
Demo on 3v4l.org
Update
To change the text into links, you need to use this:
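echo array_reduce($keywords,
    function ($c, $v) {
        return preg_replace('/\\b(' . $v['slug'] . ')\\b/',
            '<a href="' . $v['url'] . '">$1</a>', $c);
    },
    $str);
Output:
<p>Just a test: <a href="https://www.getname.com">getName</a> <a href="https://www.testname.com">testName</a>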
Updated demo
Update 2
Because some of the links being substituted include words that are also slug values, it's necessary to do all the replacements at once using the array form of strtr. We build an array that maps each slug to its replacement using array_column, array_combine and array_map, then pass that to strtr:
$reps = array_combine(array_column($keywords, 'slug'),
    array_map(function ($k) { return '<a href="' . $k['url'] . '">' . $k['slug'] . '</a>'; }, $keywords
));
$newstr = strtr($str, $reps);
New demo
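To illustrate why the single-pass behaviour matters, here is a small sketch comparing strtr with a pair-by-pair replacement. The slugs and URLs below are made up for the demonstration and are not from the question.

```php
<?php
// Illustrative data: these slugs and URLs are assumptions for the demonstration.
$pairs = [
    'ign' => '<a href="https://www.ign.com">ign</a>',
    'com' => '<a href="https://www.example.com/com">com</a>',
];
$str = 'ign com';

// str_replace() with arrays works pair by pair, so the second pair also
// rewrites the "com" inside the link inserted by the first pair.
$sequential = str_replace(array_keys($pairs), array_values($pairs), $str);

// strtr() with an array scans the input once and never re-examines text
// it has already replaced.
$once = strtr($str, $pairs);

echo $sequential, "\n", $once, "\n";
```

strtr also tries the longest keys first, which helps when one slug is a prefix of another.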
First, change the structure of the array to key/value pairs using a loop, storing the result in $newKeywords. Then use preg_replace_callback() to select every word in the string and check whether it exists as a key of that array. If it does, wrap it in an anchor tag.
$newKeywords = [];
foreach ($keywords as $keyword)
$newKeywords[$keyword['slug']] = $keyword['url'];
$newStr = preg_replace_callback("/(\w+)/", function($m) use($newKeywords){
return isset($newKeywords[$m[0]]) ? "<a href='{$newKeywords[$m[0]]}'>{$m[0]}</a>" : $m[0];
}, $str);
Output:
<p>Just a test: <a href='https://www.getname.com'>getName</a> <a href='https://www.testname.com'>testName</a></p>
Check result in demo
My answer uses preg_replace as does Nick's above.
It relies on the patterns and replacements being equally sized arrays, with corresponding patterns and replacements.
Word boundaries need to be respected, which I doubt you can do with a simple string replacement.
<?php
$keywords = [
0=>['id'=>1,'slug'=>'foo','url'=>'https://www.example.com/foo'],
1=>['id'=>2,'slug'=>'bar','url'=>'https://www.example.com/bar'],
2=>['id'=>3,'slug'=>'baz','url'=>'https://www.example.com/baz'],
];
$patterns = $replacements = [];
foreach ($keywords as $item)
{
    $patterns[] = '#\b(' . $item['slug'] . ')\b#i';
    $replacements[] = '<a href="' . $item['url'] . '">$1</a>';
}
$html = "<p>I once knew a barbed man named <i>Foo</i>, he often visited the bar.</p>";
print preg_replace($patterns, $replacements, $html);
Output:
<p>I once knew a barbed man named <i><a href="https://www.example.com/foo">Foo</a></i>, he often visited the <a href="https://www.example.com/bar">bar</a>.</p>
This is my answer, thanks to @Nick:
$content = array_reduce($keywords , function ($c, $v) {
    return preg_replace('/(>[^<>]*?)(' . $v['slug'] . ')([^<>]*?<)/', '$1<a href="' . $v['url'] . '">$2</a>$3', $c);
}, $str);
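A possible alternative to restricting the regex to text between tags is to split the HTML into tags and text first and only touch the text pieces. The sketch below assumes the $keywords structure from the question; linkKeywords() is a hypothetical name. It uses strtr, so it does not depend on word boundaries or spaces. It does not try to detect text that already sits inside an existing link.

```php
<?php
// Hypothetical helper: linkKeywords() is an illustrative name.
function linkKeywords(string $html, array $keywords): string
{
    // Map each slug to its anchor markup.
    $map = [];
    foreach ($keywords as $k) {
        $map[$k['slug']] = '<a href="' . htmlspecialchars($k['url']) . '">'
            . htmlspecialchars($k['slug']) . '</a>';
    }
    // Split into tags and text; with PREG_SPLIT_DELIM_CAPTURE the tags land
    // on odd indexes and the text between them on even indexes.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Single-pass replacement, so inserted links are not rescanned.
            $parts[$i] = strtr($part, $map);
        }
    }
    return implode('', $parts);
}

$keywords = [
    ['id' => 1, 'slug' => 'getName', 'url' => 'https://example.com/1'],
    ['id' => 2, 'slug' => 'testName', 'url' => 'https://example.com/2'],
];
echo linkKeywords('<p title="getName">Just a test: getName testName</p>', $keywords);
```

The title attribute in the sample is left untouched because it is part of a tag, not of the text.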
I am very new to PHP and want to learn. I am trying to make a top-list for my server but I have a problem. My file is built like this:
"Name" "Kills"
"^0user1^7" "2"
"user2" "2"
"user3" "6"
"user with spaces" "91"
But when I try to read this with PHP it fails, because a user name can contain spaces.
That's the method I use to read the file:
$lines = file('top.txt');
foreach ($lines as $line) {
$parts = explode(' ', $line);
echo isset($parts[0]) ? $parts[0] : 'N/A' ;
}
Maybe someone knows a better method, because this doesn't work very well :D.
You need REGEX :-)
<?php
$lines = array(
'"^0user1^7" "2"',
'"user2" "2"',
'"user3" "6"',
'"user with spaces" "91"',
);
$regex = '#"(?<user>[a-zA-Z0-9\^\s]+)"\s"(?<num>\d+)"#';
foreach ($lines as $line) {
preg_match($regex, $line, $matches);
echo 'user = '.$matches['user'].', num = '.$matches['num']."\n";
}
In the regex, we have # delimiters, then look for stuff between quotes. Using (?<name>PATTERN) gives you a named capture group. The first group looks for letters, digits, carets and whitespace, the second for digits only.
See here to understand how the regex is matching!
https://regex101.com/r/023LlL/1/
See it here in action https://3v4l.org/qDVuf
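If a regex feels like too much, the same lines can also be read as space-separated, double-quoted fields with str_getcsv. This is only a sketch under the assumption that every line has exactly two quoted fields. parseTopList() is a hypothetical name, and sorting by kills is added because the goal is a top list.

```php
<?php
// Hypothetical helper: parseTopList() and the sample lines are illustrative.
function parseTopList(array $lines): array
{
    $rows = [];
    foreach ($lines as $line) {
        // Space-separated fields with double quotes as the enclosure.
        $fields = str_getcsv(trim($line), ' ', '"', '\\');
        // Skip blank lines and the "Name" "Kills" header row.
        if (count($fields) !== 2 || !ctype_digit($fields[1])) {
            continue;
        }
        $rows[] = ['user' => $fields[0], 'kills' => (int) $fields[1]];
    }
    // Highest kill count first.
    usort($rows, fn($a, $b) => $b['kills'] <=> $a['kills']);
    return $rows;
}

$top = parseTopList([
    '"Name" "Kills"',
    '"^0user1^7" "2"',
    '"user3" "6"',
    '"user with spaces" "91"',
]);
print_r($top);
```

With a real file, the array would come from file('top.txt') as in the question.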
For your case, this might help:
$content = file_get_contents('top.txt');
$lines = explode(PHP_EOL, $content); // split the file content line by line
foreach ($lines as $value_line) {
    echo str_replace(" ", "", $value_line); // strip all spaces from the line
}
As I commented above, below is a simple example with JSON.
Assuming, you have stored records in JSON format:
$json = '{
"user1": "12",
"sad sad":"23"
}';
$decoded = json_decode($json);
foreach($decoded as $key => $value){
echo 'Key: ' . $key . ' And value is ' . $value;
}
And here is the demo link: https://3v4l.org/ih1P7
How can I determine, using a regexp or something else in PHP, that the following URLs match patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
Wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone analyze whether this code is good enough? Also, could you explain why there are so many backslashes in this part: $to_replace = ['/\\\\\*/']; (I found this replacement in a solution on the Internet).
If you know the format beforehand, you can use preg_match. For example, in the first example you know %node can only be numbers. Matching multiple patterns is just as easy: store the regexes in an array:
$patterns = array(
'node/%node' => '|node/[0-9]+$|',
'node/%node/news' => '|node/[0-9]+/news|',
'album/%album/shadowbox/%photo' => '|album/[0-9]+/shadowbox/[0-9]+|',
'media/photo' => '|media/photo|',
'blogs' => '|blogs|',
'news' => '|news|',
'node/%node/players' => '|node/[0-9]+/players|',
);
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
preg_match($regex, $url, $results);
if (!empty($results)) {
echo "<pre>" . $pattern . "</pre>";
}
}
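Notice how I added the end-of-string anchor $ to the end of the first rule; this ensures that it doesn't also match URLs meant for the second rule. The remaining rules are unanchored, so |news| for example would also match node/1/news; add ^ and $ if you need exact matches.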
Here is a generic version of the solution above:
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get the literal strings around each *
// In this case ['/node/', '/hello/', '']
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can pretty much convert the code above into a function and pass parameters to check against each pattern in the array. The first step uses a regex to convert all variables into *, and then explodes the pattern by *. Finally, loop over the resulting array and keep comparing its pieces to the URL with simple string comparison to see if the pattern matches.
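For comparison, here is a sketch of the function form built on a regex instead of string comparison. It uses the %token syntax from the question. patternToRegex() and matchUrl() are hypothetical names, and it assumes each token name appears only once per pattern and that a token matches exactly one path segment.

```php
<?php
// Hypothetical helpers: patternToRegex() and matchUrl() are illustrative names.
function patternToRegex(string $pattern): string
{
    // Split on %token placeholders and keep them in the result:
    // even indexes hold literal text, odd indexes hold the tokens.
    $parts = preg_split('/(%[a-z_][a-z0-9_]*)/i', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '';
    foreach ($parts as $i => $part) {
        $regex .= ($i % 2 === 1)
            ? '(?<' . substr($part, 1) . '>[^/]+)'  // token: one path segment
            : preg_quote($part, '#');                // literal text, escaped
    }
    return '#^' . $regex . '$#';
}

function matchUrl(string $url, array $patterns): ?array
{
    foreach ($patterns as $pattern) {
        if (preg_match(patternToRegex($pattern), $url, $m)) {
            // Keep only the named captures (the string keys).
            $tokens = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
            return ['pattern' => $pattern, 'tokens' => $tokens];
        }
    }
    return null;
}

$patterns = ['node/%node', 'node/%node/news', 'album/%album/shadowbox/%photo', 'blogs'];
print_r(matchUrl('album/34234/shadowbox/321023', $patterns));
```

Because every generated regex is anchored with ^ and $, the order of the patterns does not affect which one matches.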
As long as the pattern is fixed, you can use preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.
I need some help with Twitter hashtags: I need to extract a certain hashtag as a string variable in PHP.
So far I have this:
$hash = preg_replace ("/#(\\w+)/", "#$1", $tweet_text);
but this just transforms the hashtag string into a link.
Use preg_match() to identify the hash and capture it to a variable, like so:
$string = 'Tweet #hashtag';
preg_match("/#(\\w+)/", $string, $matches);
$hash = $matches[1];
var_dump( $hash); // Outputs 'hashtag'
Demo
I think this function will help you:
echo get_hashtags($string);
function get_hashtags($string, $str = 1) {
    preg_match_all('/#(\w+)/', $string, $matches);
    // $str truthy: comma-separated string; otherwise: array of tag names
    return $str ? implode(', ', $matches[1]) : $matches[1];
}
As I understand it, in the text/paragraph/post you want to show the tag with the hash sign (#), like #tag, but in the URL you want to drop the # sign, because anything after # in a URL is treated as a fragment and is not sent to the server with the request. I have edited your code accordingly; try this:
$string="www.funnenjoy.com is best #SocialNetworking #website";
$text=preg_replace('/#(\\w+)/','<a href=/hash/$1>$0</a>',$string);
echo $text; // output will be www.funnenjoy.com is best <a href=/hash/SocialNetworking>#SocialNetworking</a> <a href=/hash/website>#website</a>
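Extract multiple hashtags into an array:
$body = 'My #name is #Eminem, I am rap #god, #Yoyoya check it #out';
$hashtag_set = [];
// Drop the first piece: it is the text before the first '#'
$array = array_slice(explode('#', $body), 1);
foreach ($array as $row) {
    if (!empty($row)) {
        $hashtag = explode(' ', $row);
        // Strip trailing punctuation such as the comma in "#Eminem,"
        $tag = rtrim($hashtag[0], ',.;:!?');
        if ($tag !== '') {
            $hashtag_set[] = '#' . $tag;
        }
    }
}
print_r($hashtag_set);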
You can use the preg_match_all() PHP function:
preg_match_all('/(?<!\w)#\w+/', $description, $allMatches);
gives you an array of the hashtags only (in $allMatches[0]), while
preg_match_all('/#(\w+)/', $description, $allMatches);
gives you both the hashtags ($allMatches[0]) and the tags without the # sign ($allMatches[1]).
print_r($allMatches);
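The two patterns can also be combined. The following sketch uses a made-up input string (an assumption for illustration) to show what ends up in each part of $allMatches and what the lookbehind is for:

```php
<?php
// Illustrative input: the e-mail-like "a#b.com" is there to show the lookbehind.
$description = 'My #name is #Eminem, write to a#b.com';

// (?<!\w) rejects a '#' that directly follows a word character.
preg_match_all('/(?<!\w)#(\w+)/', $description, $allMatches);

$withHash = $allMatches[0];     // full matches, including the '#'
$withoutHash = $allMatches[1];  // first capture group, without the '#'

print_r($allMatches);
```

Here the "#b" in a#b.com is skipped, because the '#' comes right after a letter.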
You can extract a value in a string with preg_match function
preg_match("/#(\w+)/", $tweet_text, $matches);
$hash = $matches[1];
preg_match stores the matching results in an array. Take a look at the documentation to see how to work with it.
Here's a non Regex way to do it:
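<?php
$tweet = "Foo bar #hashTag hello world";
$hashPos = strpos($tweet, '#');
$hashTag = '';
if ($hashPos !== false) {
    // Take everything from '#' up to the next space (or the end of the string)
    $hashTag = substr($tweet, $hashPos, strcspn($tweet, ' ', $hashPos));
}
echo $hashTag;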
Demo
Note: This will only pick up the first hashtag in the tweet.
If I have a string that contains a URL (for example's sake, we'll call it $url), such as:
$url = "Here is a funny site http://www.tunyurl.com/34934";
How do I remove the URL from the string?
The difficulty is that URLs might also show up without the http://, such as:
$url = "Here is another funny site www.tinyurl.com/55555";
There is no HTML present. How would I search for http or www and, if found, remove the text/numbers/symbols up to the first space?
I re-read the question, here is a function that would work as intended:
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,'http') || (count(explode('.',$u)) > 1)) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
return implode(' ',$U);
}
$url = "Here is another funny site www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
Edit #2/#3 (I must be bored). Here is a version that verifies there is a TLD within the URL:
function containsTLD($string) {
    // Match a dot, a known TLD, then the end of the string or a slash.
    $tlds = 'AC|AD|AE|AERO|AF|AG|AI|AL|AM|AN|AO|AQ|AR|ARPA|AS|ASIA|AT|AU|AW|AX|AZ|'
        . 'BA|BB|BD|BE|BF|BG|BH|BI|BIZ|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|'
        . 'CA|CAT|CC|CD|CF|CG|CH|CI|CK|CL|CM|CN|CO|COM|COOP|CR|CU|CV|CX|CY|CZ|'
        . 'DE|DJ|DK|DM|DO|DZ|EC|EDU|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|'
        . 'GA|GB|GD|GE|GF|GG|GH|GI|GL|GM|GN|GOV|GP|GQ|GR|GS|GT|GU|GW|GY|'
        . 'HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|INFO|INT|IO|IQ|IR|IS|IT|'
        . 'JE|JM|JO|JOBS|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|'
        . 'LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|'
        . 'MA|MC|MD|ME|MG|MH|MIL|MK|ML|MM|MN|MO|MOBI|MP|MQ|MR|MS|MT|MU|MUSEUM|MV|MW|MX|MY|MZ|'
        . 'NA|NAME|NC|NE|NET|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|ORG|'
        . 'PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|PRO|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|'
        . 'SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SY|SZ|'
        . 'TC|TD|TEL|TF|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TRAVEL|TT|TV|TW|TZ|'
        . 'UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|'
        . 'XN--0ZWM56D|XN--11B5BS3A9AJ6G|XN--80AKHBYKNJ4F|XN--9T4B11YI5A|XN--DEBA0AD|XN--G6W251D|'
        . 'XN--HGBK6AJ7F53BBA|XN--HLCJ6AYA9ESC7A|XN--JXALPDLP|XN--KGBECHTV|XN--ZCKZAH|'
        . 'YE|YT|YU|ZA|ZM|ZW';
    return preg_match('/\.(?:' . $tlds . ')($|\/)/i', $string) === 1;
}
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,".")) { //only preg_match if there is a dot
if (containsTLD($u) === true) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
}
return implode(' ',$U);
}
$url = "Here is another funny site badurl.badone somesite.ca/worse.jpg but this badsite.com www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
returns:
Cleaned: Here is another funny site badurl.badone but this and and
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);
Parsing text for URLs is hard and looking for pre-existing, heavily tested code that already does this for you would be better than writing your own code and missing edge cases. For example, I would take a look at the process in Django's urlize, which wraps URLs in anchors. You could port it over to PHP, and--instead of wrapping URLs in an anchor--just delete them from the text.
Thanks, Mike. I updated it a bit, because it returned a notice error:
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);
$url = "Here is a funny site http://www.tunyurl.com/34934";
$replace = 'http www .com .org .net';
$with = '';
$clean_url = clean($url,$replace,$with);
echo $clean_url;
function clean($url, $replace, $with) {
    $needles = explode(" ", $replace);
    $words = explode(" ", $url);
    foreach ($words as $key => $word) {
        foreach ($needles as $needle) {
            // stripos() returns false when nothing is found, so compare strictly
            if (stripos($word, $needle) !== false) {
                $words[$key] = $with;
                break;
            }
        }
    }
    return implode(" ", $words);
}
You would need to write a regular expression to extract out the urls.
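As a minimal sketch of such a regular expression, the following removes anything that starts with http://, https:// or www. up to the next whitespace, which is what the question describes. removeUrls() is a hypothetical name, and the pattern is deliberately simple: it will not catch bare domains such as img.hostingsite.com/badpic.jpg.

```php
<?php
// Hypothetical helper: removeUrls() is an illustrative name.
function removeUrls(string $text): string
{
    // A URL here is a whitespace-delimited word that starts with a scheme
    // (http:// or https://) or with "www.".
    $stripped = preg_replace('~(?<!\S)(?:https?://|www\.)\S+~i', '', $text);
    // Collapse the gaps left behind and trim the ends.
    return trim(preg_replace('/\s{2,}/', ' ', $stripped));
}

echo removeUrls('Here is a funny site http://www.tunyurl.com/34934'), "\n";
echo removeUrls('Here is another funny site www.tinyurl.com/55555'), "\n";
```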