This is my code:
<?php
$string = 'this
this
good
good
hahah';
$rows = explode("\n",$string);
$unwanted = 'this|good';
$cleanArray= preg_grep("/$unwanted/i",$rows,PREG_GREP_INVERT);
$cleanString=implode("\n",$cleanArray);
print_r ( $cleanString );
?>
It displays:
hahah
I want output like this:
this
good
hahah
I want to keep one of each...
Please help me. Thanks, guys.
This code checks each line against your $unwanted pattern, and it also keeps an array of the matches it has already encountered. If a line matches and its match has been seen before (checked with in_array()), unset() removes that line from the original $rows:
$string = 'this
this
good
good
hahah';
$rows = explode("\n",$string);
$unwanted = 'this|good';
$matched = [];
foreach ( $rows as $line => $row ) {
if ( preg_match("/$unwanted/i",$row, $matches)) {
if ( in_array(trim($matches[0]), $matched) === true ) {
unset($rows[$line]);
}
$matched[] = $matches[0];
}
}
$cleanString=implode("\n",$rows);
print_r ( $cleanString );
<?php
$string = 'this
this
good
yyyy
good
xxxx
hahah';
print_r(
implode("\n",
array_diff(array_unique(
array_map(function($v) { return trim($v);}, explode("\n",$string))
)
,array('xxxx', 'yyyy')))
);
?>
output:
this
good
hahah
Refer: https://ideone.com/Eo0MIM
You can use this simple code to get the result:
$result = array_unique(explode("\n",str_replace(" ", "", $string)));
print_r ($result);
If you want more control over your data, use this code:
$rows = explode("\n", $string);
$words = [];
foreach($rows as $row) {
$row = trim($row);
$words[$row] = true;
}
foreach($words as $word => $tmp) {
echo $word . "\n";
}
Here is one way you could do this:
$string = 'this
this
good
good
hahah';
preg_match_all('/([a-z])+/', $string, $matches);
$string = implode("\n",array_unique($matches[0]));
echo $string;
You can use PHP's built-in array_unique function:
<?php
$string = 'this
this
good
good
haha';
$rows = explode("\n",$string);
$cleanArray = array_unique($rows);
$cleanString=implode("\n",$cleanArray);
print_r ( $cleanString );
//result is this good haha
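The answers above share one idea: split the text into lines, normalise each line, and keep only the first occurrence. A minimal sketch of that idea follows; the helper name uniqueLines is made up for illustration, and dropping empty lines is an assumption about what you want.

```php
<?php
// Hypothetical helper: keep the first occurrence of each line, drop later repeats.
function uniqueLines(string $text): string
{
    // Split on any newline style, then trim stray spaces and "\r" from each line.
    $lines = array_map('trim', preg_split('/\R/', $text));

    // Assumption: empty lines are not wanted in the result.
    $lines = array_filter($lines, function ($line) {
        return $line !== '';
    });

    // array_unique keeps the first occurrence of each value and preserves order.
    return implode("\n", array_unique($lines));
}

$string = "this\nthis \ngood\ngood\nhahah";
echo uniqueLines($string), "\n";
```

Note that the comparison is case-sensitive, so "This" and "this" would both be kept; lowercase the lines first if that is not what you want.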
I'm trying to extract links from an HTML page using DOM:
$html = file_get_contents('links.html');
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$a = $DOM->getElementsByTagName('a');
foreach($a as $link){
//echo out the href attribute of the <A> tag.
echo $link->getAttribute('href').'<br/>';
}
Output:
http://dontwantthisdomain.com/dont-want-this-domain-name/
http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/
http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/
http://domain1.com/page-X-on-domain-com.html
http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html
http://dontwantthisdomain2.com/dont-want-any-pages-from-this-domain/
http://domain.com/page-XZ-on-domain-com.html
http://dontwantthisdomain.com/another-page-from-same-domain-that-i-dont-want-to-be-included/
http://dontwantthisdomain2.com/same-as-above/
http://domain3.com/page-XYZ-on-domain3-com.html
I would like to remove all results matching dontwantthisdomain.com, dontwantthisdomain2.com and dontwantthisdomain3.com, so that the output looks like this:
http://domain1.com/page-X-on-domain-com.html
http://domain.com/page-XZ-on-domain-com.html
http://domain3.com/page-XYZ-on-domain3-com.html
Some people say I should not use regex for HTML, and others say it's OK. Could somebody point out the best way to remove unwanted URLs from my HTML file? :)
Maybe something like this:
function extract_domains($buffer, $whitelist) {
preg_match_all("#<a\s+.*?href=\"(.+?)\".*?>(.+?)</a>#i", $buffer, $matches);
$result = array();
foreach($matches[1] as $url) {
$url = urldecode($url);
$parts = @parse_url((string) $url);
if ($parts !== false && isset($parts['host']) && in_array($parts['host'], $whitelist)) {
$result[] = $parts['host'];
}
}
return $result;
}
$domains = extract_domains(file_get_contents("/path/to/html.htm"), array('stackoverflow.com', 'google.com', 'sub.example.com'));
It does a rough match on all the <a> tags with href=, grabs what's between the quotes, then filters the hosts based on your whitelist of domains.
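If you would rather keep the DOM approach from the question and drop unwanted domains instead of whitelisting, here is a rough sketch. The function name filterLinks and the sample markup are made up for illustration, and skipping links without a host (relative URLs) is an assumption you may want to change.

```php
<?php
// Hypothetical helper: return the hrefs whose host is NOT in $blockedHosts.
function filterLinks(string $html, array $blockedHosts): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely valid.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $kept = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // Assumption: links without a host (relative URLs) are skipped too.
        if (!is_string($host) || in_array(strtolower($host), $blockedHosts, true)) {
            continue;
        }
        $kept[] = $href;
    }
    return $kept;
}

// Made-up sample markup using URLs from the question.
$html = '<a href="http://dontwantthisdomain.com/dont-want-this-domain-name/">x</a>'
      . '<a href="http://domain1.com/page-X-on-domain-com.html">y</a>'
      . '<a href="http://dontwantthisdomain2.com/same-as-above/">z</a>';
$blocked = ['dontwantthisdomain.com', 'dontwantthisdomain2.com', 'dontwantthisdomain3.com'];

print_r(filterLinks($html, $blocked));
```

Comparing the parsed host exactly, rather than searching the whole URL for a substring, avoids dropping a wanted link that merely mentions a blocked domain in its path.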
Non-regex solution (without potential errors :-)):
$html='
http://dontwantthisdomain.com/dont-want-this-domain-name/
http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/
http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/
http://domain1.com/page-X-on-domain-com.html
http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html
http://dontwantthisdomain2.com/dont-want-any-pages-from-this-domain/
http://domain.com/page-XZ-on-domain-com.html
http://dontwantthisdomain.com/another-page-from-same-domain-that-i-dont-want-to-be-included/
http://dontwantthisdomain2.com/same-as-above/
http://domain3.com/page-XYZ-on-domain3-com.html
';
$html=explode("\n", $html);
$dontWant=array('dontwantthisdomain.com','dontwantthisdomain2.com','dontwantthisdomain3.com');
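$final_result = array();
foreach ($html as $link) {
$link = trim($link);
// skip blank lines
if ($link == '') continue;
$ok=true;
foreach($dontWant as $notWanted) {
// strpos() returns 0 for a match at the start, so compare against false
if (strpos($link, $notWanted) !== false) {
$ok=false;
}
}
if ($ok) $final_result[]=$link;
}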
echo '<pre>';
print_r($final_result);
echo '</pre>';
outputs
Array
(
[0] => http://domain1.com/page-X-on-domain-com.html
[1] => http://domain.com/page-XZ-on-domain-com.html
[2] => http://domain3.com/page-XYZ-on-domain3-com.html
)
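One thing to watch with strpos(): it returns 0 when the match is at the very start of the string, and 0 is not greater than 0. A small sketch of the same filter written with array_filter and a strict comparison; the scheme-less entry in the sample list is made up to show the edge case.

```php
<?php
$blocked = ['dontwantthisdomain.com', 'dontwantthisdomain2.com', 'dontwantthisdomain3.com'];

$links = [
    'http://dontwantthisdomain.com/dont-want-this-domain-name/',
    'http://domain1.com/page-X-on-domain-com.html',
    'dontwantthisdomain2.com/same-as-above/', // made-up entry: match at position 0
    'http://domain3.com/page-XYZ-on-domain3-com.html',
];

// Keep a link only if none of the blocked names occurs anywhere in it.
$kept = array_values(array_filter($links, function ($link) use ($blocked) {
    foreach ($blocked as $needle) {
        // Strict comparison with false, because a match at offset 0 is still a match.
        if (strpos($link, $needle) !== false) {
            return false;
        }
    }
    return true;
}));

print_r($kept);
```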
// get CONTENT from united domains footer
$content = file_get_contents('http://www.uniteddomains.com/index/footer/');
// remove spaces from CONTENT
$content = preg_replace('/\s+/', '', $content);
// match all tld tags
$regex = '#target="_parent">.(.*?)</a></li><li>#';
preg_match($regex, $source, $matches);
print_r($matches);
I want to match all of the TLDs.
Each TLD is preceded by target="_parent">. and followed by </a></li><li>
I want to end up with an array like array('africa','amsterdam','bnc', ...)
What am I doing wrong here?
NOTE: The second step, removing all the whitespace, is just to simplify things.
Here's a regular expression that will do it for that page.
\.\w+(?=</a></li>)
PHP
$content = file_get_contents('http://www.uniteddomains.com/index/footer/');
preg_match_all('/\.\w+(?=<\/a><\/li>)/m', $content, $matches);
print_r($matches);
Here are the results:
.africa, .amsterdam, .bcn, .berlin, .boston, .brussels, .budapest, .gent, .hamburg, .koeln, .london, .madrid, .melbourne, .moscow, .miami, .nagoya, .nyc, .okinawa, .osaka, .paris, .quebec, .roma, .ryukyu, .stockholm, .sydney, .tokyo, .vegas, .wien, .yokohama, .africa, .arab, .bayern, .bzh, .cymru, .kiwi, .lat, .scot, .vlaanderen, .wales, .app, .blog, .chat, .cloud, .digital, .email, .mobile, .online, .site, .mls, .secure, .web, .wiki, .associates, .business, .car, .careers, .contractors, .clothing, .design, .equipment, .estate, .gallery, .graphics, .hotel, .immo, .investments, .law, .management, .media, .money, .solutions, .sucks, .taxi, .trade, .archi, .adult, .bio, .center, .city, .club, .cool, .date, .earth, .energy, .family, .free, .green, .live, .lol, .love, .med, .ngo, .news, .phone, .pictures, .radio, .reviews, .rip, .team, .technology, .today, .voting, .buy, .deal, .luxe, .sale, .shop, .shopping, .store, .eus, .gay, .eco, .hiv, .irish, .one, .pics, .porn, .sex, .singles, .vin, .vip, .bar, .pizza, .wine, .bike, .book, .holiday, .horse, .film, .music, .party, .email, .pets, .play, .rocks, .rugby, .ski, .sport, .surf, .tour, .video
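To tie this back to the code in the question: it passes $source to the regex call although the page was stored in $content, and preg_match() stops at the first hit, whereas preg_match_all() collects every match. Below is a small sketch with a capture group so the array comes out without the leading dots; the sample markup is made up and only imitates the structure described in the question.

```php
<?php
// Made-up sample standing in for the fetched footer (not the real page).
$content = '<li><a href="/a" target="_parent">.africa</a></li>'
         . '<li><a href="/b" target="_parent">.amsterdam</a></li>'
         . '<li><a href="/c" target="_parent">.bcn</a></li>';

// The escaped dot matches the literal "." before each TLD;
// the capture group (\w+) holds the label without that dot.
preg_match_all('#target="_parent">\.(\w+)</a></li>#', $content, $matches);

// $matches[0] holds the full matches, $matches[1] the captured labels.
$tlds = $matches[1];
print_r($tlds);
```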
Using the DOM is cleaner:
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.uniteddomains.com/index/footer/');
$xpath = new DOMXPath($doc);
$items = $xpath->query('/html/body/div/ul/li/ul/li[not(@class)]/a[@target="_parent"]/text()');
$result = '';
foreach($items as $item) {
$result .= $item->nodeValue;
}
$result = explode('.', $result);
array_shift($result);
print_r($result);
How can I extract the number 12345 from the following string in PHP?
<span id="jordan934" itemprop="distance"><span class='WebDistance'>#$#20B9; </span>12345</span></h3>
I was using the following, which worked as long as the '#$#20B9;' string was not in it:
$results = $dom->query('#jordan934"]');
$distance = false;
if (count($results)) {
$distance = (int)trim($results->current()->textContent);
}
return $distance;
}
Try using a regular expression:
$str = 'jordan934';
preg_match_all('!\d+!', $str, $matches);
print_r($matches);
You could use a DOM object:
<?php
$html = '<span id="jordan934" itemprop="distance"><span class=\'WebDistance\'>#$#20B9; </span>12345</span></h3>';
$dom = new DomDocument();
$dom->loadHTML($html);
$finder = new DomXPath($dom);
$classname="my-class";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
var_dump($nodes);
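Applied to the markup from the question, a sketch could look like the following. It selects only the text nodes that are direct children of the outer span, so whatever sits in the nested WebDistance span is ignored. The "Rs " text is a placeholder for the currency entity, which is an assumption on my part.

```php
<?php
// Markup modelled on the question; "Rs " is a placeholder for the currency entity.
$html = '<h3><span id="jordan934" itemprop="distance">'
      . "<span class='WebDistance'>Rs </span>12345</span></h3>";

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// text() matches only direct text children of the span with this id,
// not the text inside the nested span.
$nodes = $xpath->query('//span[@id="jordan934"]/text()');

$distance = false;
foreach ($nodes as $node) {
    $text = trim($node->nodeValue);
    // Take the first direct text node that consists of digits only.
    if ($text !== '' && ctype_digit($text)) {
        $distance = (int) $text;
        break;
    }
}

var_dump($distance);
```

If no numeric text node is found, $distance stays false, matching the default in the question's code.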
Another, less widely known, function is filter_var:
$str = 'jordan934';
$int = filter_var($str, FILTER_SANITIZE_NUMBER_INT);
http://php.net/manual/pl/function.filter-var.php
I want to modify my code, which extracts an array of the strings found between a start marker and an end marker:
<?php
$file = ('http://gdata.youtube.com/feeds/base/users/BBCArabicNews/uploads?alt=rss&v=2&orderby=published&client=ytapi-youtube-profile');
$string=file_get_contents($file);
function findinside($start, $end, $string) {
preg_match_all('/' . preg_quote($start,'/') . '(.+?)'. preg_quote($end, '/').'/si', $string, $m);
return $m[1];
}
$start = ':video:';
$end = '</guid>';
$out = findinside($start, $end, $string);
foreach($out as $string){
echo $string;
echo "<p></td>\n";
}
?>
Results
Q80QSzgPDD8
ozei4GysBN8
ak3bbs_UxP0
rUs-r3ilTG4
p4BO6FI5sPY
j5lclrPzeVU
dK5VWTYsJaM
mERug-d536k
h0zqd3bC0-E
ije5kuSfLKY
H9XXMPvEpHM
EK5UoQqYl4U
This works properly for extracting one array of strings. I also want to add:
$start = '</pubDate><atom:updated>';
$end = '</atom:updated>';
I want to show both arrays of strings together.
Example:
xSD0XJLkLQid
2011-11-08T17:36:14.000Z
bFU066NwVnD
2011-12-08T17:36:14.000Z
Can I do this with this code?
Greetings
You can use PHP's DOMDocument parser like this:
$objDOM = new DOMDocument();
$objDOM->load($file); // the long one from youtube
$dates = $objDOM->getElementsByTagName("pubDate");
foreach ($dates as $node)
{
echo $node->nodeValue;
}
Use a DOM parser first, then apply a regex to individual elements in the DOM (located with methods such as getElementById()). This works better and is more robust.
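If you prefer to stay with the findinside() function from the question, one possible sketch is to call it once per marker pair and walk both result arrays in step. The feed string below is a made-up sample that reuses the IDs and dates from the question's example, and the sketch assumes every item yields exactly one ID and one date.

```php
<?php
// Same helper as in the question: every substring found between $start and $end.
function findinside($start, $end, $string) {
    preg_match_all('/' . preg_quote($start, '/') . '(.+?)' . preg_quote($end, '/') . '/si', $string, $m);
    return $m[1];
}

// Made-up sample standing in for the feed contents.
$feed = '<item><guid>tag:youtube.com,2008:video:xSD0XJLkLQid</guid>'
      . '<pubDate>x</pubDate><atom:updated>2011-11-08T17:36:14.000Z</atom:updated></item>'
      . '<item><guid>tag:youtube.com,2008:video:bFU066NwVnD</guid>'
      . '<pubDate>y</pubDate><atom:updated>2011-12-08T17:36:14.000Z</atom:updated></item>';

$ids   = findinside(':video:', '</guid>', $feed);
$dates = findinside('</pubDate><atom:updated>', '</atom:updated>', $feed);

// Assumption: both arrays line up one-to-one, so the same index pairs them.
foreach ($ids as $i => $id) {
    echo $id, "\n";
    echo $dates[$i] ?? '', "\n";
}
```

If an item can lack one of the two values, the arrays drift out of step; parsing each item with a DOM parser, as suggested above, avoids that.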