I am using Simple HTML DOM to fetch data from other websites. While fetching, it returns hyperlinks both with and without plain text. I want to remove the hyperlinks that have no plain text (link text) while fetching the data.
I have tried the code below:
if($title==""){ echo "No text";}
and
if(ctype_space($title)) { echo "No text";}
where $title is the plain text fetched from the website,
but neither method worked. Can anyone help?
Thanks in advance for your help.
Until you give us more information on what the value of $title is, my best guess would be to try something like this:
if(empty($title))
{
echo "No Text";
}
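One possible reason the == "" and ctype_space() checks fail is that the scraped text holds a non-breaking space or an &nbsp; entity rather than nothing at all. Here is a minimal sketch of a stricter check; isBlankLinkText is a made-up helper name, not part of Simple HTML DOM, and it assumes the text is UTF-8.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns true when the
// link text contains no visible characters. Assumes UTF-8 input.
function isBlankLinkText($text)
{
    // Turn entities such as &nbsp; into real characters first.
    $decoded = html_entity_decode((string) $text, ENT_QUOTES, 'UTF-8');
    // Strip whitespace, including the non-breaking space (U+00A0).
    $stripped = preg_replace('/[\s\x{00A0}]+/u', '', $decoded);
    return $stripped === '';
}

var_dump(isBlankLinkText(''));
var_dump(isBlankLinkText(" \n\t"));
var_dump(isBlankLinkText('&nbsp;'));
var_dump(isBlankLinkText('0'));
```

Note that the last call reports the text as not blank, whereas empty('0') is true in PHP, so a link whose text is just "0" would be dropped by a plain empty() check.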
Does it really need to be "plain text validation"?
Reading your question it seems you just want to remove links with empty values.
If the latter is true, you can do something like this:
$html = <<<EOL
Text
More Text
EOL;
$dom = new DOMDocument;
$dom->loadHTML($html);
$links = $dom->getElementsByTagName('a');
foreach (iterator_to_array($links) as $link) { // copy the live node list so removing nodes does not skip elements
if (strlen(trim($link->nodeValue)) == 0) {
$link->parentNode->removeChild($link);
}
}
var_dump($dom->saveHTML());
$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom); // DOMXPath expects the DOMDocument, not the HTML string
$links_array = $xPath->query("//a"); // select all a tags
$totalLinks = $links_array->length; // how many links there are.
for($i = 0; $i < $totalLinks; $i++) // process each link one by one
{
$title = trim($links_array->item($i)->nodeValue); // get link text, ignoring surrounding whitespace
if($title == '') // if no link text
{
$url = $links_array->item($i)->getAttribute('href');
// do here what you want
}
}
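If the goal is only to drop the empty links, the check can also be pushed into the XPath query itself. This is a rough sketch with invented sample markup; normalize-space() is a standard XPath 1.0 function, and everything else uses the same DOM classes as above.

```php
<?php
// Sample markup invented for this sketch.
$html = '<div><a href="/a">Text</a><a href="/b"></a>'
      . '<a href="/c">   </a><a href="/d">More Text</a></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

// Select only anchors whose text is empty or whitespace.
$emptyLinks = $xPath->query('//a[not(normalize-space())]');

$removed = array();
// Work on an array copy so removing nodes cannot disturb the iteration.
foreach (iterator_to_array($emptyLinks) as $link) {
    $removed[] = $link->getAttribute('href');
    $link->parentNode->removeChild($link);
}

print_r($removed);
echo $dom->saveHTML();
```

With this sample input, $removed should end up holding the href values of the two links that have no text.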
You need to use preg_match, with a regular expression, to extract the link text. For example
if (preg_match("/<a.*?>(.*?)</",$title,$matches))
{
echo $matches[1];
}
Related
I've searched around and around and I'm not sure how this really works.
I have the tags
<taghere>content</taghere>
and I want to pull out the "content" so I can use it in an if statement, since the "content" varies depending on the page,
e.g.
<taghere>HelloWorld</taghere>
$content = //function that returns the text between <taghere> and </taghere>
if($content == "HelloWorld")
{
//execute function;
}
else if($content =="Bonjour")
{
//execute seperate function
}
I tried using preg, but it doesn't seem to work: it just returns whatever value is in the lines field instead of giving me the text within the tags.
If I understand your question correctly, you want the data INSIDE the tag "taghere".
If you are parsing HTML, you should use DOMDocument
Try something similar to this:
<?php
// Assuming your content (the html where those tags are found) is available as $html
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your HTML
libxml_clear_errors();
// Note: Tag names are case sensitive
$nodes = $doc->getElementsByTagName('taghere'); // returns a DOMNodeList
// Echo the content of the first matching tag
echo $nodes->item(0)->nodeValue;
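To tie this back to the if/else in the question, here is a small sketch that wraps the lookup in a function. firstTagText is a hypothetical name, and the sample HTML string is made up for illustration.

```php
<?php
// Hypothetical helper: returns the text of the first <$tagName> element in
// $html, or null when no such element exists.
function firstTagText($html, $tagName)
{
    $doc = new DOMDocument();
    // Unknown tags such as <taghere> raise parser warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName($tagName);
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}

$content = firstTagText('<p>intro</p><taghere>HelloWorld</taghere>', 'taghere');
if ($content === 'HelloWorld') {
    echo "matched HelloWorld\n";
} elseif ($content === 'Bonjour') {
    echo "matched Bonjour\n";
}
```

Returning null for a missing tag keeps the caller from calling nodeValue on a non-existent node.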
you can use DomDocument and loadXML to do this
<?php
function doAction($word=""){
$html="<taghere>$word</taghere>";
$doc = new DOMDocument();
$doc->loadXML($html);
$hTwo = $doc->getElementsByTagName('taghere'); // use your desired tag here
if($hTwo->item(0)->nodeValue== "HelloWorld")
{
echo "1";
}
else if($hTwo->item(0)->nodeValue== "Bonjour")
{
echo "2";
//execute seperate function
}
}
doAction("Bonjour");
You cannot do it like that. Technically it is possible but it's more than an overkill. And you mixed up PHP with HTML in a way that doesn't work.
To achieve the thing that you want you have to do something like this:
$content = 'something';
if ($content === 'something') {
//do something
}
if ($content === 'something else') {
//do something else
}
echo '<tag>'. $content . '</tag>' ;
Of course you can change $content in the ifs.
Don't forget, you can always add an ID to a tag so you can reference it with JavaScript.
<tag id='tagid'>blah blah blah </tag>
<script>
document.getElementById('tagid')
</script>
This might be a much simpler way to get what you are thinking about than some of the other responses.
I don't know what regex you tried and therefore not what would have been wrong. It might have been the escaping of the <.
<?php
if(preg_match('#\<taghere>(.*)\</taghere>#', $document, $a)){
$content = $a[1];
}
?>
I suppose there will be only one such tag.
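If there can be more than one tag, the greedy (.*) becomes a problem. Here is a quick sketch of the difference, using a made-up input string:

```php
<?php
// Invented sample input with two tags on one line.
$document = '<taghere>HelloWorld</taghere><taghere>Bonjour</taghere>';

// Greedy (.*) runs to the last closing tag on the line.
preg_match('#<taghere>(.*)</taghere>#', $document, $greedy);

// Non-greedy (.*?) stops at the nearest closing tag; the s modifier lets
// the content span line breaks.
preg_match_all('#<taghere>(.*?)</taghere>#s', $document, $lazy);

echo $greedy[1], "\n";
print_r($lazy[1]);
```

The greedy version captures everything between the first opening and the last closing tag, while preg_match_all with (.*?) returns each tag's content separately.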
I know there are similar questions, but while studying PHP I ran into this problem and I want to understand why it occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
?>
this prints just "hello!". I want to print the value extracted with the xpath, but the last echo doesn't do anything.
You have some errors in your code:
You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/, but it is actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so fetch the iframe URL directly.
You use the function loadHTML(), which loads an HTML string. What you need is loadHTMLFile(), which takes the path or URL of an HTML document as a parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php).
You assume there is a tbody element on the page, but there isn't one, so remove it from your query.
Working code :
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
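The tbody point can be checked offline. This is a sketch with invented markup standing in for the remote page (the real table may look different); it feeds a string to loadHTML() and runs both queries.

```php
<?php
// Invented markup standing in for the remote page; note there is no <tbody>.
$markup = '<table id="table33">'
        . '<tr><td>a</td><td>b</td><td><b>header</b></td></tr>'
        . '<tr><td>c</td><td>d</td><td><b>123.45</b></td></tr>'
        . '</table>';

$html = new DOMDocument();
$html->loadHTML($markup); // loadHTML() expects markup, not a URL
$xpath = new DOMXPath($html);

$withTbody = $xpath->query("//*[@id='table33']/tbody/tr[2]/td[3]/b");
$withoutTbody = $xpath->query("//*[@id='table33']/tr[2]/td[3]/b");

echo $withTbody->length, "\n";
echo $withoutTbody->length, "\n";
echo $withoutTbody->item(0)->nodeValue, "\n";
```

Browsers insert a tbody when they build the page, which is why an XPath copied from browser developer tools often contains one. As far as I know the libxml parser behind DOMDocument does not add it, so only the second query should match here.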
I have an HTML table.
My parsing code is:
$src = new DOMDocument('1.0', 'utf-8');
$src->formatOutput = true;
$src->preserveWhiteSpace = false;
@$src->loadHTML($result);
$xpath = new DOMXPath($src);
$data=$xpath->query('//td[ contains (@class, "bodytext1") ]');
foreach($data as $datas)
{
echo $datas->nodeValue."<br />";
}
$values=$xpath->query('//tr[ contains (@bgcolor, "f3fafe") ]');
foreach($values as $value)
{
echo $value->nodeValue."<br />";
}
$values1=$xpath->query('//tr[ contains (@bgcolor, "def0fa") ]');
foreach($values1 as $value1)
{
echo $value1->nodeValue."<br />";
}
to be printed, and I want them to be repeated along with other lines as shown above in output i need.
I also want this whole thing in an array so that I can insert it into the database.
Can anyone please guide me or give me a hint on how to do this?
This should get you started.
$src = new DOMDocument('1.0', 'utf-8');
$src->formatOutput = true;
$src->preserveWhiteSpace = false;
$src->loadHTML($result);
$xpath = new DOMXPath($src);
// get header data
$data=$xpath->query('//table[1]//td');
$htno = trim(explode(":",$data->item(0)->nodeValue)[1]);
$name = trim(explode(":",$data->item(1)->nodeValue)[1]);
$fatherName=trim(explode(":",$data->item(2)->nodeValue)[1]);
// rows from 2nd table
$values1=$xpath->query('//table[2]//tr');
$header = true; // flag to track whether we've read the header row.
foreach($values1 as $value1)
{
if (!$header) {
$rowdata = str_replace("\r\n"," ",$value1->nodeValue);
echo $htno," ",$name," ",$fatherName," ",$rowdata,"\n";
}
$header = false;
}
Note:
The $header flag is a quick fix. A better Xpath query might eliminate the need for it.
the str_replace near the bottom is ugly but expedient. You might want to play with the xpath query to see if you can improve it.
Output is not formatted for HTML - lines are delimited by \n
I got a warning on one line where it contained &, so I changed it to AND. You might have to preprocess your tables to eliminate those somehow.
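As a rough follow-up to the notes above, here is one way the header flag could be dropped and the rows collected into an array for a database insert. The $result string is invented sample data (the real tables were not shown), and the column layout is an assumption.

```php
<?php
// Invented stand-in for $result; the real tables come from the asker's page.
$result = '<table><tr><td>Hall Ticket No: 123</td><td>Name: A & B</td></tr></table>'
        . '<table>'
        . '<tr><td>Code</td><td>Subject</td><td>Marks</td></tr>'
        . '<tr><td>101</td><td>Maths</td><td>80</td></tr>'
        . '<tr><td>102</td><td>Physics</td><td>75</td></tr>'
        . '</table>';

$src = new DOMDocument();
// Collect parser warnings (e.g. for a bare &) instead of printing them.
libxml_use_internal_errors(true);
$src->loadHTML($result);
libxml_clear_errors();
$xpath = new DOMXPath($src);

// Header values from the first table.
$header = $xpath->query('//table[1]//td');
$htno = trim(explode(':', $header->item(0)->nodeValue)[1]);
$name = trim(explode(':', $header->item(1)->nodeValue)[1]);

$rows = array();
// position() > 1 skips the header row, so no flag is needed.
foreach ($xpath->query('//table[2]//tr[position() > 1]') as $tr) {
    $cells = array($htno, $name);
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->nodeValue);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Each entry of $rows repeats the header values in front of the row's own cells, which matches the repeated-lines output asked for. Reading the cells one by one also avoids the str_replace on the whole row text.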
You could use a third-party library such as "Html Agility Pack", a tool designed for parsing HTML into an XML-like document. Note that it is a .NET library, so it cannot be used directly from PHP.
I'm trying to replace video links inside a string - here's my code:
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName("a") as $link)
{
$url = $link->getAttribute("href");
if(strpos($url, ".flv"))
{
echo $link->outerHTML();
}
}
Unfortunately, outerHTML doesn't work when I'm trying to get the html code for the full hyperlink like <a href='http://www.myurl.com/video.flv'></a>
Any ideas how to achieve this?
As of PHP 5.3.6 you can pass a node to saveHtml, e.g.
$domDocument->saveHtml($nodeToGetTheOuterHtmlFrom);
Previous versions of PHP did not implement that possibility. You'd have to use saveXml(), but that would create XML compliant markup. In the case of an <a> element, that shouldn't be an issue though.
See http://blog.gordon-oheim.biz/2011-03-17-The-DOM-Goodie-in-PHP-5.3.6/
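Applied to the loop from the question, that could look roughly like this. The $content string is made up for illustration, and the strpos() comparison is tightened because strpos() can return 0.

```php
<?php
// Invented sample content containing one .flv link and one other link.
$content = '<p>Watch <a href="http://www.myurl.com/video.flv">the video</a>'
         . ' or <a href="/page.html">read</a>.</p>';

$doc = new DOMDocument();
$doc->loadHTML($content);

$found = array();
foreach ($doc->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    // strpos() can return 0, so compare against false explicitly.
    if (strpos($url, '.flv') !== false) {
        // Passing the node serializes just that element (PHP 5.3.6+).
        $found[] = $doc->saveHTML($link);
    }
}
print_r($found);
```

Here $found should contain only the markup of the .flv anchor, including its opening and closing tags.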
You can find a couple of propositions in the users notes of the DOM section of the PHP Manual.
For example, here's one posted by xwisdom :
<?php
// code taken from the Raxan PDI framework
// returns the html content of an element
protected function nodeContent($n, $outer=false) {
$d = new DOMDocument('1.0');
$b = $d->importNode($n->cloneNode(true),true);
$d->appendChild($b); $h = $d->saveHTML();
// remove outer tags
if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
return $h;
}
?>
One option is to define your own function that returns the outer HTML:
function outerHTML($e) {
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($e, true));
return $doc->saveHTML();
}
Then you can use it in your code:
echo outerHTML($link);
Save the HTML that contains the links as links.html, or replace links.html with a URL such as google.com/fly.html that has .flv links in it. Change flv to wmv etc. depending on which links you want. If there are other href attributes containing that extension,
it will pick them up as well.
<?php
$contents = file_get_contents("links.html");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = false;
$domdoc->loadHTML($contents);
$xpath = new DOMXpath($domdoc);
$query = '//@href';
$nodeList = $xpath->query($query);
foreach ($nodeList as $node){
if(strpos($node->nodeValue, ".flv") !== false){
$linksList = $node->nodeValue;
$htmlAnchor = new DOMElement("a", $linksList);
$htmlURL = new DOMAttr("href", $linksList);
$domdoc->appendChild($htmlAnchor);
$htmlAnchor->appendChild($htmlURL);
$domdoc->saveHTML();
echo ("<a href='". $node->nodeValue. "'>". $node->nodeValue. "</a><br />");
}
}
echo("done");
?>
I have the following scenario and I've already spent hours trying to handle it: I'm developing a WordPress theme (hence PHP) and I want to check whether the content of a post (which is HTML) contains a tag with a certain id/class. If so, I want to extract it from the content and place it somewhere else.
Example: Let's say the text content of the Wordpress post is
<?php
/* $content actually comes from WP function get_the_content() */
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
?>
So how can I extract that div with the class (could also live with giving it an ID), output it (with tags and all that) in one place of the template, and output the rest (without the extracted tag, of course) in another place of the template?
I've already tried with the DOMDocument class, p.i.t.a. to me, maybe I'm too stupid.
Try:
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
$dom = new DomDocument;
$dom->loadHtml($content);
$xpath = new DomXpath($dom);
$contents = '';
foreach ($xpath->query('//div[@class="the-wanted-element"]') as $node) {
$contents = $dom->saveXml($node);
break;
}
echo $contents;
How to get the remaining xml/html:
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
$dom = new DomDocument;
$dom->loadHtml($content);
$xpath = new DomXpath($dom);
foreach ($xpath->query('//div[@class="the-wanted-element"]') as $node) {
$node->parentNode->removeChild($node);
break;
}
$contents = '';
foreach ($xpath->query('//body/*') as $node) {
$contents .= $dom->saveXml($node);
}
echo $contents;
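Both steps can be combined so the content is only parsed once. This is a sketch under a few assumptions: splitByClass is a made-up helper name, the class name contains no quote characters, and the element may carry additional classes (which an exact @class="..." comparison would miss).

```php
<?php
// Hypothetical helper: splits $content into the first element carrying
// $class and the remaining markup. Assumes $class contains no quotes.
function splitByClass($content, $class)
{
    $dom = new DOMDocument;
    $dom->loadHTML($content);
    $xpath = new DOMXPath($dom);

    // Matches $class as one token of a multi-valued class attribute.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $wanted = '';
    $node = $xpath->query($query)->item(0);
    if ($node !== null) {
        $wanted = $dom->saveXML($node);
        $node->parentNode->removeChild($node);
    }

    // Serialize whatever is left inside <body>, text nodes included.
    $rest = '';
    foreach ($xpath->query('//body/node()') as $child) {
        $rest .= $dom->saveXML($child);
    }
    return array($wanted, $rest);
}

list($wanted, $rest) = splitByClass(
    '<p>some text</p> <div class="the-wanted-element extra"><p>I WANT THIS DIV!!!</p></div>',
    'the-wanted-element'
);
echo $wanted, "\n", $rest, "\n";
```

In a theme, $wanted and $rest could then be echoed in two different places of the template.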