simple html dom parser or a regular expression

An HTML page contains the following block:
<table class="tborder" cellpadding="6" cellspacing="1" border="0" width="100%" align="center">
<tr>
<td class="tcat" colspan="2">
Some regular text <span class="normal">the desired text 1</span>
</td>
</tr>
<tr>
<td class="alt1" colspan="2">
<span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
</td>
</tr>
</table>
Help me parse this with the Simple HTML DOM library or a regular expression so that only the following is output:
the desired text 1 <span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
If I do this:
<?php
include 'simple_html_dom.php';
$html = file_get_html('http://some-url.com/power.html');
foreach($html->find('td[class="tcat"]') as $element1)
echo $element1. '<br>';
foreach($html->find('span[class="smallfont"]') as $element2)
echo $element2. '<br>';
?>
Then, along with the data I need, other similar elements on the page are also displayed (those matching the same 'td class="tcat"' and 'class="smallfont"' selectors).
I need only this to be output:
the desired text 1 <span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>

It's all about knowing CSS selectors. Passing 0 as the second argument to find() returns only the first match:
echo $html->find('td.tcat span', 0)->text();
echo $html->find('span.smallfont', 0);
//the desired text 1 <span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
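For comparison, here is a minimal sketch of the same extraction using PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. The markup is the block from the question, inlined as a string; that swap and the variable names are assumptions for illustration, not part of the answer above.

```php
<?php
// Sample markup copied from the question; a real page would be fetched instead.
$markup = <<<'HTML'
<table class="tborder">
<tr><td class="tcat" colspan="2">
Some regular text <span class="normal">the desired text 1</span>
</td></tr>
<tr><td class="alt1" colspan="2">
<span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
</td></tr>
</table>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// item(0) keeps only the first match, like the 0 passed to find().
$title = $xpath->query('//td[@class="tcat"]//span')->item(0);
$links = $xpath->query('//span[@class="smallfont"]')->item(0);

$titleText = trim($title->textContent); // text only, tags dropped
$linksHtml = $doc->saveHTML($links);    // outer HTML of the span

echo $titleText, ' ', $linksHtml, "\n";
```

If the page contains several such tables, a more specific path (for example one anchored on the table's class) would be needed to avoid picking up the wrong first match.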

Related

Simple HTML DOM: accessing html elements within results

I'm trying to get a better understanding of PHP Simple HTML DOM and am stuck on the following.
I am trying to retrieve information from one of my user pages using the following code:
$dom = file_get_html('http://127.0.0.1/comments/top-commenters/');
foreach($dom->find('tr[id*=commenter]') as $result) {
print_r($result->innertext);
}
For each commenter profile, $result->innertext produces the following:
<td class="Position"># 3 </td>
<td class="img" align="center">
<a href="/images/users/814ocnqlN6.jpg">
<img src="/images/users/814ocnqlN6.jpg" info="Image" border="0"/></a>
<a uid="814ocnqlN6"></td>
<td> <b>User 3.</b>
<div class="tiny">Most recent comments</div>
</td>
<td class="NumCredits"> 471 </td>
<td class="NumComments"> 5.439 </td>
<td class="PercUpVotes"> 93% </td>
Now, within each result (inside the same foreach loop), I would like to access, for example:
<td class="Position"># 3 </td>
and
<td class="NumComments"> 5.439 </td>
What would be the best way to accomplish this?

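Try calling find() on each result. Passing 0 as the second argument returns the first matching element rather than an array, and plaintext gives its text without tags:
$dom = file_get_html('http://127.0.0.1/comments/top-commenters/');
foreach($dom->find('tr[id*=commenter]') as $result) {
    // search only inside the current row
    echo $result->find('td.Position', 0)->plaintext;
    echo $result->find('td.NumComments', 0)->plaintext;
}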
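The same "search inside each result" idea can be sketched with PHP's built-in DOMXPath, which accepts a context node as the second argument to query(). The row id "commenter-3" and the trimmed-down markup are assumptions made up for this example.

```php
<?php
// Hypothetical row modelled on the output shown in the question.
$markup = <<<'HTML'
<table>
<tr id="commenter-3">
<td class="Position"># 3 </td>
<td class="NumCredits"> 471 </td>
<td class="NumComments"> 5.439 </td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$rows = array();
foreach ($xpath->query('//tr[contains(@id, "commenter")]') as $row) {
    // The leading "./" plus the second argument scopes each query to this row.
    $position = $xpath->query('./td[@class="Position"]', $row)->item(0);
    $comments = $xpath->query('./td[@class="NumComments"]', $row)->item(0);
    $rows[] = array(
        'position' => trim($position->textContent),
        'comments' => trim($comments->textContent),
    );
}

print_r($rows);
```

Without the leading "./" and the context node, the inner queries would search the whole document and return the first commenter's cells on every iteration.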

use of not in xpath query

I am fetching specific data from a site using XPath. I need to exclude some elements, for which I have to use not(), but it is not working in my code. Please explain what I have to do to make it work.
Here is the HTML:
<tr><td colspan="2" valign="top" align="left"><span class="tl-document">
<left>some text here
</left>
</span></td></tr>
<tr><td colspan="2" valign="top" align="left">
<span class="text-id">some text here,<sup>a</sup><sup>b</sup></span>
<span class="text-id">some text here,<sup>a</sup></span>
</td></tr>
<tr><td colspan="2" valign="top" class="right">
<sup>a</sup>some text here<br>
</td></tr>
<tr><td colspan="2" valign="top" class="right">
<sup>b</sup>some text here<br>
</td></tr>
<td colspan="2" valign="top">
<br><div>
<span class="tl-default">Objective</span>
<p>some text here,</p>
</div>
<div>
<span class="tl-default">Methods</span>
<p>some text here,</p>
</div>
<div>
</td>
<td colspan="2" valign="top">
<br><div>
<span class="tl-default">Objective</span>
<p>some text here,</p>
</div>
</td>
I am trying to fetch only the td elements that have neither a class nor an align attribute, and this is the code I am using for my XPath query:
$getnew="http://www.example.com/";
$html = new DOMDocument();
@$html->loadHtmlFile($getnew);
$xpath = new DOMXPath( $html );
$y = $xpath->query('//td[@colspan="2" and valign="top" and (not(@class and @align))]');
$ycnt = $y->length;
for ( $idf=6; $idf<$ycnt; $idf++)
{ if($idf==6){
echo "<p class='artbox'>".$y->item($idf)->nodeValue."</p>";}
}
I am new to this, so please share your suggestions.

The problem with your logic is that no elements have both @class and @align, so the not() will always yield true. (Your query also tests valign without the @ prefix, which looks for a child element named valign rather than the attribute.)
Instead you should exclude elements that have either attribute:
//td[@colspan="2" and @valign="top" and not(@class or @align)]
Alternatively, to match elements that have only those two attributes, you can add a count() condition:
//td[@colspan="2" and @valign="top" and count(@*)=2]
Update:
$query = '//td[@colspan="2" and @valign="top" and not(@class or @align)]';
foreach ($xpath->query($query) as $node) {
// do something with $node
}
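As a quick check of both queries, here is a minimal self-contained sketch. The three-row table is a cut-down stand-in for the question's markup, an assumption made so the example runs without fetching a page.

```php
<?php
// One cell with align, one with class, one with neither.
$markup = <<<'HTML'
<table>
<tr><td colspan="2" valign="top" align="left">has align</td></tr>
<tr><td colspan="2" valign="top" class="right">has class</td></tr>
<tr><td colspan="2" valign="top">wanted</td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Exclude any td that carries either attribute.
$query = '//td[@colspan="2" and @valign="top" and not(@class or @align)]';
$matches = array();
foreach ($xpath->query($query) as $node) {
    $matches[] = trim($node->nodeValue);
}

// The count() variant: exactly two attributes, and they are colspan and valign.
$strict = $xpath->query('//td[@colspan="2" and @valign="top" and count(@*)=2]');
$strictCount = $strict->length;

print_r($matches);
```

The count() form is stricter: it would also reject a cell that carries some third attribute such as width, which the not() form would still accept.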

how to fetch required content only using strip_tags function in php

I'm using the strip_tags function to fetch only the required content, but it fetches all the data from the link.
Here is the example code I'm using to fetch content from a link:
<?php
$a=fopen("http://example.com/","r");
$contents=stream_get_contents($a);
fclose($a);
$contents1=strtolower($contents);
$start='<div id="content">';
$start_pos=strpos($contents1,$start);
$first_trim=substr($contents1,$start_pos);
$stop='</div><!-- content -->';
$stop_pos=strpos($first_trim,$stop);
$second_trim=substr($first_trim,0,$stop_pos+6);
$second_trim = strip_tags($second_trim, '<div><table><tbody><tr><td><a><h2><h4>');
echo "<div>$second_trim</div>";
?>
Here is the HTML fetched into $second_trim:
<div><div id="content">
<div id="issuedescription"></div>
<h2 class="wsite-content-title" style="text-align:center;">download content<br /><font color="#f30519">table of content</font><br /> <font color="#f80117"> content </font></h2>
<h2>table of contents</h2>
<h4 class="tocsectiontitle">editorial</h4>
<h2 class="wsite-content-title" style="text-align:left;">technical note</h2>
<table class="tocarticle" width="100%">
<tr valign="top">
<td class="toctitle" width="95%" align="left">where are we at and where are we heading to? </td>
<td class="tocgalleys" width="5%" align="left">
pdf
</td>
</tr>
<tr>
<td class="tocauthors" width="95%" align="left">
sergio eduardo de paiva gonã§alves </td>
<td class="tocpages" width="5%" align="left">1-2</td>
</tr>
</table>
<div class="separator"></div>
<h4 class="tocsectiontitle">some text here</h4>
<table class="tocarticle" width="100%">
<tr valign="top">
<td class="toctitle" width="95%" align="left">some text here</td>
<td class="tocgalleys" width="5%" align="left">
pdf
</td>
</tr>
<tr>
<td class="tocauthors" width="95%" align="left">
some text here, some text here, some text here, some text here, some text here, some text here </td>
<td class="tocpages" width="5%" align="left">3-10</td>
</tr>
</table>
<a target="_blank" rel="license" href="http://example.com/">
</a>
some text here<a rel="license" target="_blank" href="http://example.com/">example</a>.
</div></div>
Now my problem is that I want to fetch only a particular tag from the whole content using the strip_tags function, for example the second of the two anchors given below:
pdf
some text here
and the second of the two header tags given below:
<h2 class="wsite-content-title" style="text-align:center;">download content<br /><font color="#f30519">table of content</font><br /> <font color="#f80117"> content </font></h2>
<h2>table of contents</h2>
But the strip_tags function fetches either all of them or none of them. How can I make it fetch only the tag I want instead of all the similar tags? If there is a better way to do this, please share your ideas here.

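A regular expression can do this. The callback below unwraps links that carry a given class (here "file") and leaves all other links untouched: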
function handle_link($data) {
list($link, $attributes, $content) = $data;
$classes = preg_match('#class=[\'"]([^\'"]+)[\'"]#', $attributes, $match) ? preg_split('#\s+#', $match[1]) : array();
// If the link has the "file" class
if(in_array('file', $classes)) {
return $content; // only the internal content (like strip_tags would do)
// or you can return a new link:
// return '' . $content . '';
} else {
return $link; // any other link is returned unchanged
}
}
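// Keep <a> in the allowed list so the links survive until the callback runs
$second_trim = strip_tags($second_trim, '<div><table><tbody><tr><td><a><h2><h4>');
// U = ungreedy, s = let "." match newlines inside a link
$second_trim = preg_replace_callback('#<a([^>]*)>(.*)</a>#Us', 'handle_link', $second_trim);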
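If the goal is simply "the second h2" and "the second anchor", a DOM parser may be easier to reason about than strip_tags plus a regular expression. The following is a minimal sketch using PHP's built-in DOMDocument; the fragment is a shortened version of the question's output, and picking elements by position is an assumption about what the asker wants.

```php
<?php
// Shortened stand-in for the HTML held in $second_trim.
$fragment = <<<'HTML'
<div id="content">
<h2 class="wsite-content-title">download content</h2>
<h2>table of contents</h2>
<a target="_blank" rel="license" href="http://example.com/"></a>
some text here<a rel="license" target="_blank" href="http://example.com/">example</a>.
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($fragment);
libxml_clear_errors();

// item() is zero-based, so index 1 is the second match in document order.
$secondHeading = $doc->getElementsByTagName('h2')->item(1);
$secondLink    = $doc->getElementsByTagName('a')->item(1);

$headingText = trim($secondHeading->textContent);
$linkText    = trim($secondLink->textContent);
$linkHref    = $secondLink->getAttribute('href');

echo $headingText, "\n", $linkText, ' -> ', $linkHref, "\n";
```

Selecting by position is fragile if the page layout changes; matching on a class or attribute, where one exists, is usually more robust.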

How to make my output text scrollable using PHP?

The text that I am trying to echo from $data is way too long and it is going off the screen beyond the boundaries of the table. In addition to that all the text is getting displayed without line breaks (or blank lines) or proper spacing.
My simple PHP code:
<div id="sampleid1" class="tabcontent" style="margin-left:48px;">
<table width="510" border="0" cellspacing="4" cellpadding="4" class="SampleClass">
<tr>
<td>
<?php echo Sample1_LABEL;?>
</td>
<th align="left"><strong>:</strong>
</th>
<td>
<?php echo $data[ 'Sample1'];?>
</td>
</tr>
<tr>
<td>
<?php echo Sample2_LABEL;?>
</td>
<th align="left"><strong>:</strong>
</th>
<td>
<?php echo $data[ 'Sample2'];?>
</td>
</tr>
</table>
</div>
In Summary:
I need the text that I am retrieving using the $data to get formatted in such a way that the line breaks are displayed in the output upon echoing.
I need that output text to be scrollable so no text will go off the screen beyond the boundaries.

You need to apply CSS to the td where you echo $data:
<td style="height:150px; overflow-y:scroll;">
My mistake: a td does not honor overflow, so wrap the content in a div instead:
<td style="height:150px"><div style="height:100%; overflow-y:scroll;">..PHPCODE...</div></td>

You could echo a <pre> tag around output where whitespace is significant. Something like:
<?php echo "<pre>".Sample1_LABEL."</pre>";?>

You can try this
<td>
<div style="height:100px; overflow:auto">
<?php echo $data['Sample2'];?>
</div>
</td>
You can also try nl2br() to keep the line breaks:
echo nl2br($data['Sample2']); // converts newlines to <br /> HTML tags
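Putting the two suggestions together, a minimal sketch might look like the following. The sample text in $data and the 150px height are assumptions; the htmlspecialchars() call is an extra precaution not mentioned in the answers, added on the assumption that the text is plain text rather than trusted HTML.

```php
<?php
// Hypothetical stand-in for the question's $data array.
$data = array('Sample2' => "First line\nSecond line with <b>markup</b>");

// Escape first so stored text cannot inject markup, then turn newlines into <br />.
$body = nl2br(htmlspecialchars($data['Sample2'], ENT_QUOTES, 'UTF-8'));

// A height-limited block with overflow:auto shows a scrollbar only when needed.
$cell = '<td><div style="max-height:150px; overflow:auto;">' . $body . '</div></td>';

echo $cell, "\n";
```

The order matters: calling nl2br() before htmlspecialchars() would escape the inserted br tags and print them as literal text.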

How to echo a form in a php var without losing editor highlight syntax

I've been trying the heredoc method like this:
<?php echo $form = <<<HTML ?>
here follows the html...
<form method="post" >
<table border="0" cellpadding="0" cellspacing="0" id="reserv_table">
<tr>
<td><span id="message"><b>for you : 9€</b> </span></td>
<td><input type="text" name="_membre" id="_membre" style="width: 40px;" class="text disabled" disabled="disabled" /></td>
<td valign="top">
<table>
<tr>
<?php HTML; ?>
But I get an error.

If you're just echoing the form you don't even need PHP's heredoc syntax. Just break out of PHP using the closing delimiter, then return to PHP once you're done outputting the HTML form.
EDIT: OK, so you need the form output to be stored in a variable, but you also want it printed to the browser? No problem, use output buffering (in particular, ob_start() and ob_get_flush()):
<?php
// Begin output buffering
ob_start();
// Break out of PHP...
?>
<form method="post" >
<table border="0" cellpadding="0" cellspacing="0" id="reserv_table">
<tr>
<td><span id="message"><b>for you : 9€</b> </span></td>
<td><input type="text" name="_membre" id="_membre" style="width: 40px;" class="text disabled" disabled="disabled" /></td>
<td valign="top">
<table>
<tr>
<?php
// Back to PHP, store the HTML in $form AND print it to the browser
$form = ob_get_flush();
?>

Your PHP tags <?php and ?> are interfering with the heredoc delimiters. To fix it:
Remove the ?> at the end of the first line.
Remove the <?php and the space at the start of the last line.
Remove the space and the ?> at the end of the last line, and place the ?> one line below.
This is all assuming that you need all the code to be placed into the $form variable for some reason. If you don't, pick BoltClock's answer.
If, after fixing your code, your editor still won't syntax highlight it properly, then your editor is at fault. You could either switch to a different editor or avoid heredoc syntax altogether. I tend to avoid heredoc myself, mainly because I think it makes things messy.
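Applied to the question's code, the three fixes above give something like this minimal sketch. The form is shortened, and the closing tags were added so the snippet is complete on its own; both changes are assumptions for illustration.

```php
<?php
// The opening identifier ends its line; the closing identifier sits alone
// at the start of its own line, followed only by the semicolon.
$form = <<<HTML
<form method="post">
<table border="0" cellpadding="0" cellspacing="0" id="reserv_table">
<tr>
<td><span id="message"><b>for you : 9€</b></span></td>
</tr>
</table>
</form>
HTML;

echo $form;
```

Because this is a heredoc rather than a nowdoc, any $variable inside the HTML would be interpolated; a literal dollar sign has to be escaped as \$.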

Why not write the heredoc according to its syntax rules?
That said, you don't need a heredoc to print this form.
Just output it as is:
here follows the html...
<form method="post" >
<table border="0" cellpadding="0" cellspacing="0" id="reserv_table">
<tr>
<td><span id="message"><b>for you : 9€</b> </span></td>
<td><input type="text" name="_membre" id="_membre" style="width: 40px;" class="text disabled" disabled="disabled" /></td>
<td valign="top">
<table>
<tr>

Why don't you just remove the PHP code, like this:
<form method="post" >
<table border="0" cellpadding="0" cellspacing="0" id="reserv_table">
<tr>
<td><span id="message"><b>for you : 9€</b> </span></td>
<td><input type="text" name="_membre" id="_membre" style="width: 40px;" class="text disabled" disabled="disabled" /></td>
<td valign="top">
<table>
<tr>
That is, no echo is needed.

If you need to send static data, you can put it directly in the file outside the PHP tags, e.g.:
<?php
echo "PHP Code";
?>
HTML code
<?php
echo "PHP Code again";
?>
Also, if you are trying to create a PHP string, you need to keep all the content inside your PHP code, e.g.:
<?php
echo "PHP Code";
$heredoc_variable = <<<HTMLCODE
<b> This is html code into a heredoc variable in php </b>
HTMLCODE;
?>

If you really need it to be a variable and are looking for a really simple method, just include the entire form syntax in a variable with escaped quotes like so:
<?php
$output = "
<form method=\"post\" >
<table border=\"0\" cellpadding=\"0\" cellspacing=\"0\" id=\"reserv_table\">
<tr>
<td><span id=\"message\"><b>for you : 9€</b> </span></td>
<td><input type=\"text\" name=\"_membre\" id=\"_membre\" style=\"width: 40px;\" class=\"text disabled\" disabled=\"disabled\" /></td>
<td valign=\"top\">
<table>
<tr>
";
echo $output; //or wherever you need to echo it...
?>
This method is a simple solution that, in my opinion, is bound to use fewer resources than ob_start().
