I'm trying to extract part of my website to get some content information. The content I'm trying to put into a variable looks like this:
<table class="tabelaHistorico">
<tr>
<td bgcolor=#ccccc></td>
<td bgcolor=#ccccc>2014</td>
</tr>
<tr>
<td>
Jan
</td>
<td>
9719,46
</td>
</tr>
<tr>
<td>
Fev
</td>
<td>
9421,65
</td>
</tr>
</table>
I tried to do:
$content = file_get_contents("www.website.com");
$pos = strpos($content,"table" , 0);
echo $pos;
printf($pos);
$rest = substr($content, $pos, 5);
echo $rest;
You'll need a proper HTML parser. Luckily, PHP has one built-in.
http://www.php.net/manual/en/domdocument.loadhtml.php
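As a rough sketch of the parser approach (not the only way to do it), the following loads a trimmed copy of the question's table with DOMDocument and reads each month and value through DOMXPath. The inline sample string and the variable names are my own assumptions; in practice $content would come from file_get_contents() called with a full URL, scheme included.

```php
<?php
// Trimmed sample based on the question's markup (an assumption for illustration).
$content = <<<HTML
<html><body>
<table class="tabelaHistorico">
<tr><td></td><td>2014</td></tr>
<tr><td>Jan</td><td>9719,46</td></tr>
<tr><td>Fev</td><td>9421,65</td></tr>
</table>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$values = [];
// Skip the header row (position() > 1) and map month => value for the rest.
foreach ($xpath->query('//table[@class="tabelaHistorico"]//tr[position() > 1]') as $row) {
    $cells = $row->getElementsByTagName('td');
    $values[trim($cells->item(0)->textContent)] = trim($cells->item(1)->textContent);
}
print_r($values);
```

Unlike strpos/substr, this does not depend on character offsets, so it should keep working if the page around the table changes.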
preg_match('!<table class="tabelaHistorico">.+?</table>!s', $content, $match);
echo $match[0];
I'm using the Symfony Crawler component, which uses XPath itself.
I have the HTML of a nutrition table:
<table>
<tr>
<td> Carbohydrate </td>
<td> 10g </td>
</tr>
<tr>
<td> Fat </td>
<td> < 0,1 </td>
</tr>
</table>
This is what I tried
$fatCell = $browser->filterXPath('//td[contains(text(), "Fat")]');
$fatCell->outerHtml() will return
<td>\n
Fat\n
</td>
$fatCell->nextAll()->outerHtml() will return
<td>\n
\n
</td>
I'm trying to get the information with an XPath query, but when I try to access the fat information, it's empty. It seems that the < character is misunderstood by XPath.
Can I do something about this?
Try this markup, with the rows wrapped in a tbody and no whitespace around the cell text:
<table>
<tbody>
<tr>
<td>Carbohydrate</td>
<td>10g</td>
</tr>
<tr>
<td>Fat</td>
<td>< 0,1</td>
</tr>
</tbody>
</table>
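If the markup cannot be changed at the source, one possible workaround is to escape the bare < before parsing. This is only a sketch and rests on an assumption: that the HTML parser, not XPath itself, drops the text after a < that does not start a tag. The regex and variable names are mine, and the example uses plain DOMDocument/DOMXPath rather than the Crawler.

```php
<?php
$html = <<<HTML
<table>
<tr><td> Carbohydrate </td><td> 10g </td></tr>
<tr><td> Fat </td><td> < 0,1 </td></tr>
</table>
HTML;

// Escape any "<" that cannot start a tag, closing tag, comment or doctype.
$escaped = preg_replace('~<(?![a-zA-Z/!?])~', '&lt;', $html);

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($escaped);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Find the "Fat" label cell, then take the cell right after it.
$cell = $xpath->query('//td[contains(text(), "Fat")]/following-sibling::td[1]')->item(0);
$fat = trim($cell->textContent);
echo $fat, "\n";
```

The same escaped string could presumably be handed to the Crawler instead of the raw HTML.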
I have this table:
<?php
$a ="<table class='table table-condensed'>
<tr>
<td>Monthly rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>
<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>
<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>
</tr>
<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>
</table>";
What I need is to get the value of each <td> inside a <tr> as key/value pairs, as in:
Monthly rent => Fr. 1'950.
Rooms(s) => 3
Surface => 93m2
Date of Contract => 01.04.17
So far, only this code returns a result close to what I need, but not in the format I was expecting:
preg_match_all("/<td>.*/", $a, $matches);
I am trying to find any improvements on this.
You can use the following regexes to get the contents of the table rows as key/value pairs:
Regex to get the keys: (?<=<td>)(?!<strong>).*?(?=<\/td>)
Regex to get the values: (?<=<strong>).*?(?=<\/strong>)
PHP
<?php
$re = '/(?<=<strong>).*?(?=<\/strong>)/';
$str = '<table class=\'table table-condensed\'>
<tr>
<td>Monthly rent</td>
<td><strong>Fr. 1\'950. </strong></td>
</tr>
<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>
<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>
</tr>
<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>
</table>';
preg_match_all($re, $str, $matches);
print_r($matches);
?>
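The snippet above only collects the values. To get the key/value pairs the question asks for, both regexes can be run and their results zipped together. This is a minimal sketch that assumes every row has exactly one plain <td> followed by one <strong> cell; the compacted sample string and the $pairs name are mine.

```php
<?php
$str = '<table class="table table-condensed">
<tr><td>Monthly rent</td><td><strong>Fr. 1\'950. </strong></td></tr>
<tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>
<tr><td>Surface</td><td><strong>93m2</strong></td></tr>
<tr><td>Date of Contract</td><td><strong>01.04.17</strong></td></tr>
</table>';

// Keys: text of a <td> that does not start with <strong>.
preg_match_all('/(?<=<td>)(?!<strong>).*?(?=<\/td>)/', $str, $keys);
// Values: text between <strong> and </strong>.
preg_match_all('/(?<=<strong>).*?(?=<\/strong>)/', $str, $values);

// Pair the n-th key with the n-th value, trimming stray whitespace.
$pairs = array_combine(
    array_map('trim', $keys[0]),
    array_map('trim', $values[0])
);
print_r($pairs);
```

If the two match counts ever differ, array_combine() will fail, so real input may need a check first.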
I was just trying some things and wanted to build a table, but when I run it, I get an error in this part of the code:
echo '
<table border = 1 width = 300px>
<tr>
<td> Name:
</td>
<td '.$Contacts[$Index]['Name']>
</td>
</tr>
<tr>
<td> Phone:
</td>
<td '.$Contacts[$Index]['Phone number']>
</td>
</tr>
<tr>
<td> Email:
</td>
<td '.$Contacts[$Index]['email']>
</td>
</tr>
';
Can someone please tell me if there is anything wrong I am not seeing?
Concatenation is not done properly.
<?php
echo '
<table border = 1 width = 300px>
<tr>
<td> Name:
</td>
<td>' . $Contacts[$Index]['Name'] .'
</td>
</tr>
<tr>
<td> Phone:
</td>
<td>' . $Contacts[$Index]['Phone number'] . '
</td>
</tr>
<tr>
<td> Email:
</td>
<td>'. $Contacts[$Index]['email'] . '
</td>
</tr>
</table>
';
There is an error in your concatenation. I would suggest closing your PHP tags, writing plain HTML, and using PHP only to echo variables where required. It keeps the code clean. Here is an example:
?>
<table border = 1 width = 300px>
<tr>
<td> Name:</td>
<td> <?php echo $Contacts[$Index]['Name']; ?> </td>
</tr>
<tr>
<td> Phone:</td>
<td><?php echo $Contacts[$Index]['Phone number']; ?></td>
</tr>
<tr>
<td> Email:</td>
<td> <?php echo $Contacts[$Index]['email'];?></td>
</tr>
<?php
There is another error: you are adding the values as <td value></td>, while the correct way is <td>value</td>.
I have fixed that in the example.
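One more variation, in case the contact values can contain characters such as < or &: building each row through a small helper that escapes the value. This is just a sketch; contactRow() and the sample data are hypothetical names of mine, not something from the question.

```php
<?php
// Hypothetical sample data in the shape the question uses.
$Contacts = [
    ['Name' => 'Ana <Admin>', 'Phone number' => '555-0100', 'email' => 'ana@example.com'],
];
$Index = 0;

// Build one label/value row; the value is escaped so it cannot break the markup.
function contactRow(string $label, string $value): string
{
    return '<tr><td>' . $label . ':</td><td>' . htmlspecialchars($value) . "</td></tr>\n";
}

$table = "<table border=\"1\" width=\"300\">\n"
    . contactRow('Name', $Contacts[$Index]['Name'])
    . contactRow('Phone', $Contacts[$Index]['Phone number'])
    . contactRow('Email', $Contacts[$Index]['email'])
    . "</table>\n";
echo $table;
```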
I'm trying to extract a specific link from a table, but nothing is displayed. It's the 3rd link in the td. I thought this would work, but it doesn't.
Here is the code:
<?php
$site = 'site';
$html = file_get_html($site);
foreach($html->find('td a', 3) as $element)
echo $element->href;
?>
Here is the HTML
<tr class="evenrow team-600-359">
<td>
Aug 17
</td>
<td>
FT
</td>
<td align="right">
Arsenal
</td>
<td align="center">
1-3
</td>
<td>Aston Villa</td>
<td style="text-align:right;">60,003</td>
</td>
<td>
Premier League
</td>
</tr>
Your HTML is invalid, which may be the cause.
Check the td with the 60,003 value: it is closed twice.
Just use the native DOMDocument:
$str = <<<STR
<tr class="evenrow team-600-359">
<td>
Aug 17
</td>
<td>
FT
</td>
<td align="right">
Arsenal
</td>
<td align="center">
1-3
</td>
<td>Aston Villa</td>
<td style="text-align:right;">60,003</td>
</td>
<td>
Premier League
</td>
</tr>
STR;
$dom = new DOMDocument();
libxml_use_internal_errors(true); // the markup is invalid, so keep parser warnings quiet
$dom->loadHTML($str);
$elements = $dom->getElementsByTagName('td');
echo '<pre>' . print_r($dom->saveXML($elements->item(2)), true) . '</pre>';
OUTPUT
<td align="right">
Arsenal
</td>
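To get at a link rather than a cell, the same approach can be combined with DOMXPath. The sketch below uses a hypothetical row, since the HTML in the question was posted without its links; the hrefs are made up. As far as I know, passing an index to simple_html_dom's find() returns a single element rather than a list, which would explain why the foreach in the question prints nothing.

```php
<?php
// Hypothetical row with links (the question's HTML did not include them).
$str = <<<HTML
<table><tr>
<td><a href="/team/a">Arsenal</a></td>
<td><a href="/match/1">1-3</a></td>
<td><a href="/team/b">Aston Villa</a></td>
</tr></table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// XPath positions are 1-based, so [3] is the third link found inside any td.
$link = $xpath->query('(//td//a)[3]')->item(0);
$href = $link !== null ? $link->getAttribute('href') : null;
echo $href, "\n";
```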
I'm grabbing the content of all the <td>s with class="job" in this table using this:
$table01 = $salary->find('table.table01');
$rows = $table01[0]->find('td.job');
Then I'm using this to output it. It works, but obviously it only outputs plain text, and I need to do more with it:
foreach($table01[0]->find('td.job') as $element) {
$jobs .= $element->plaintext . '<br />';
}
Ultimately I would like it output in this format. Notice that the a href uses the job name, with spaces and / replaced by a -.
<tr>
<td class="small"> Graphic Artist / Designer<br />
$23,755 – $55,335 </td>
</tr>
<tr>
<td class="small"> Sales Associate<br />
$15,577 – $56,290 </td>
</tr>
<tr>
<td class="small"> Film / Video Editor<br />
$24,184 – $94,493 </td>
</tr>
Here's the table I'm scraping:
<table cellpadding="0" cellspacing="0" border="0" class="table01">
<tr>
<td class="head">Test</td>
<td class="job">
Graphic Artist / Designer<br/>
$23,755 – $55,335
</td>
</tr>
<tr>
<td class="head">Test</td>
<td class="job">
Sales Associate<br/>
$15,577 – $56,290
</td>
</tr>
<tr>
<td class="head">Test</td>
<td class="job">
Film / Video Editor<br/>
$24,184 – $94,493
</td>
</tr>
</table>
It may be better to use regular expressions:
<?php
$html=file_get_contents('1.html');
$jobs='';
if(preg_match_all("/<tr>.*?<td.*?>.*?<\/td>.*?<td\sclass=\"job\">.*?<a.+?href=\"(.+?)\".+?>(.*?)<\/a>(.*?)<\/td>.*?<\/tr>/ims", $html, $res))
{
foreach($res[1] as $i=>$uri)
{
$uri=strtolower(urldecode($uri));
$uri=preg_replace("/_\/_/",'-',$uri);
$uri=preg_replace("/_/",'-',$uri);
$jobs.='<tr><td class="small"> <a href="'.$uri.'">'.$res[2][$i].'</a>'.$res[3][$i].'</td></tr>'."\n";
}
}
echo $jobs;
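If the link has to be built from the job name itself, as the question describes, a small helper can do the replacement. This is only a sketch: jobSlug() and jobRow() are hypothetical names, lowercasing the slug is my assumption, and the sample range uses a plain hyphen to sidestep encoding concerns.

```php
<?php
// Turn "Graphic Artist / Designer" into "graphic-artist-designer".
function jobSlug(string $name): string
{
    // Collapse each run of whitespace and slashes into a single hyphen.
    return trim(preg_replace('~[\s/]+~', '-', strtolower($name)), '-');
}

// Build one output row in the format shown in the question.
function jobRow(string $name, string $range): string
{
    return '<tr><td class="small"> <a href="' . jobSlug($name) . '">'
        . htmlspecialchars($name) . '</a><br />' . htmlspecialchars($range) . " </td></tr>\n";
}

$row = jobRow('Graphic Artist / Designer', '$23,755 - $55,335');
echo $row;
```

The name and range for each row would still have to be split out of each td.job cell, for example on the <br/> tag.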