Get data from table using regex php

I want to extract some data from a table using PHP's preg_match_all(). I have the HTML below,
and I want to get the values in each td,
for example the RC063154016 in "Product code: RC063154016".
How can I do that? I don't have any experience with regex.
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td><span>Product code:</span> RC063154016</td>
<td><span>Gender:</span> Female</td>
</tr>
</tbody>
</table>

Use DOMDocument:
$str = <<<STR
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td><span>Product code:</span> RC063154016</td>
<td><span>Gender:</span> Female</td>
</tr>
</tbody>
</table>
STR;
$dom = new DOMDocument();
$dom->loadHTML($str); // parse the markup; without this call the document stays empty
$tds = $dom->getElementsByTagName('td');
foreach ($tds as $td) {
    echo $td->nodeValue . '<br>';
}
Output:
Product code: RC063154016
Gender: Female

This should work for you:
preg_match_all('|<td><span>Product code:</span>([^<]*)</td>|', $html, $match);
If there may be arbitrary whitespace around the tags, use this one instead:
preg_match_all('|<td>\s*<span>\s*Product code:\s*</span>([^<]*)</td>|', $html, $match);
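As a minimal sketch of how the capture groups can be used, the same idea can be generalised to collect every "label: value" cell at once. The `$fields` array and the extra label capture group are illustrative additions, not part of the answer above, and the captured values are trimmed because the group includes the leading space.

```php
<?php
// Sample markup copied from the question.
$html = '<tr>
<td><span>Product code:</span> RC063154016</td>
<td><span>Gender:</span> Female</td>
</tr>';

// Group 1 is the label (without the colon), group 2 is the value.
preg_match_all(
    '|<td>\s*<span>\s*([^<:]+):\s*</span>([^<]*)</td>|',
    $html,
    $match,
    PREG_SET_ORDER
);

// $fields is a hypothetical name used only for this illustration.
$fields = [];
foreach ($match as $m) {
    $fields[trim($m[1])] = trim($m[2]);
}

print_r($fields);
```

This only holds while the markup keeps exactly this shape; any attribute on the td or span would stop the pattern from matching.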

$data = <<<HTML
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td><span>Product code:</span> RC063154016</td>
<td><span>Gender:</span> Female</td>
</tr>
</tbody>
</table>
HTML;
if (preg_match_all('#<td>\s*<span>Product code:</span>\s*([^<]*)</td>#i', $data, $matches)) {
    print_r($matches);
}

Use an HTML parser to parse the markup and work with the result. Don't use the preg_* functions here. Please read this answer: How do you parse and process HTML/XML in PHP?
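One possible way to follow that advice, sketched with PHP's bundled DOM extension: DOMXPath can select the text node that directly follows the labelled span. The XPath expression and the `$code` variable are assumptions made for this illustration, using the markup from the question.

```php
<?php
// Sample markup copied from the question.
$html = '<table><tbody><tr>
<td><span>Product code:</span> RC063154016</td>
<td><span>Gender:</span> Female</td>
</tr></tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Pick the first text node that follows the span whose text is "Product code:".
$nodes = $xpath->query(
    "//td/span[normalize-space(.)='Product code:']/following-sibling::text()[1]"
);

// $code is a hypothetical name; it stays null when the label is not found.
$code = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;

echo $code, "\n";
```

Unlike the regex versions, this keeps working if attributes or extra whitespace are added to the td or span elements.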

Related

PHP parsing won't find "span" tags

I'm trying to find the span tags on a website similar to this: http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225. The tags I need are these:
However, when I use code such as the following:
$my_url = 'http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225';
$html = file_get_contents($my_url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
//Put your XPath Query here
$my_xpath_query = "//span";
$result_rows = $xpath->query($my_xpath_query);
// Create an array to hold the content of the nodes
$statsListings = array();
//here we loop through our results (a DOMDocument Object)
foreach ($result_rows as $result_object) {
$statsListings[] = $result_object->nodeValue;
}
echo json_encode($statsListings);
The only output I get is [].
If I replace $statsListings[] = $result_object->nodeValue; with $statsListings[] = $result_object->childNodes->item(0)->nodeValue;, I still get the same [] as output. When there are clearly span tags with values, why am I getting nothing?

XPath is not at fault.
The span tags are added dynamically by JavaScript. Look at the page source with "view-source:" rather than at the DOM structure in the browser, which JavaScript may already have modified; the source is exactly the HTML that XPath gets to parse.
It would be a good idea to look at the table with class tablelines; it probably has everything you need.
Skip the "maincolor" and "tableheader" rows and start processing at the rows with class "light".
<table width="98%" class="tablelines" cellpadding="2" border="0" cellspacing="1">
<tr class="maincolor">
<td colspan="8" align="right">All Times Local</td>
</tr>
<tr class="tableheader">
<td width="4%">
<b>GN</b>
</td>
<td nowrap width="21%">
<b>AWAY</b>
</td>
<td nowrap width="21%">
<b>HOME</b>
</td>
<td width="14%"><b>DATE</b></td>
<td width="11%"><b>TIME</b></td>
<td width="8%"><b>SCORE</b></td>
<td nowrap align="right" width="*"><b>BOXSCORE</b></td>
<td nowrap align="center" width="4%"><b>GS</b></td>
</tr>
<tr class="light">
<td></td>
<td>Sioux City
<b>1</b></td>
<td>Sioux Falls
<b>5</b></td>
<td>Tue, Apr 14</td>
<td> 7:05 PM</td>
<td> <b>1 - 5</b> </td>
<td align="right">
<img src="/images/gamelive_icon.gif" title="Click here for Game Live!" alt="Click here for Game Live" border="0">
Final</td>
<td align="center">
<img src="/images/playersection/prostats/gslink.gif" border="0">
</td>
</tr>
For example, try this:
$my_url = 'http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225';
$html = file_get_contents($my_url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Put your XPath query here
$my_xpath_query = "//tr[@class='light']/td";
$result_rows = $xpath->query($my_xpath_query);
echo $result_rows->length;
// Create an array to hold the content of the nodes
$statsListings = array();
// Loop through the results (a DOMNodeList)
foreach ($result_rows as $result_object) {
    $statsListings[] = $result_object->nodeValue;
}
echo json_encode($statsListings);
I have probably found what you need, and in convenient JSON form:
http://www.pointstreak.com/ajax/trending_ajax.html?action=divisionscoreboard&divisionid=12299&seasonid=14225
{"trending_list":null,"lacrosse_list":null,"hockey_list":null,"soccer_list":null,"baseball_list":null,"softball_list":null,"basketball_list":null,"news_list":null,"news_hockey_list":null,"news_baseball_list":null,"news_baseball_list2":null,"news_softball_list":null,"news_basketball_list":null,"games_list":[{"status":"FINAL","hometeam":"Sioux Falls","homescore":"4","awayteam":"Muskegon","awayscore":"2","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"15\/05","link":"..\/prostats\/boxscore.html?gameid=2672134"},{"status":"FINAL","hometeam":"Muskegon","homescore":"1","awayteam":"Sioux Falls","awayscore":"6","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"10\/05","link":"..\/prostats\/boxscore.html?gameid=2672133"},{"status":"FINAL","hometeam":"Muskegon","homescore":"2","awayteam":"Sioux Falls","awayscore":"3","timeremaining":"0:00","currentperiod":"1st","schedtime":"7:15 pm","gamedate":"09\/05","link":"..\/prostats\/boxscore.html?gameid=2672132"},{"status":"FINAL","hometeam":"Dubuque","homescore":"3","awayteam":"Muskegon","awayscore":"4","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"05\/05","link":"..\/prostats\/boxscore.html?gameid=2662061"},{"status":"FINAL","hometeam":"Muskegon","homescore":"0","awayteam":"Dubuque","awayscore":"6","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"02\/05","link":"..\/prostats\/boxscore.html?gameid=2662060"},{"status":"FINAL","hometeam":"Sioux Falls","homescore":"7","awayteam":"Tri-City","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"02\/05","link":"..\/prostats\/boxscore.html?gameid=2662055"},{"status":"FINAL","hometeam":"Muskegon","homescore":"3","awayteam":"Dubuque","awayscore":"1","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"01\/05","link":"..\/prostats\/boxscore.html?gameid=2662059"},{"status":"FINAL","hometeam":"Sioux 
Falls","homescore":"4","awayteam":"Tri-City","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:04 pm","gamedate":"01\/05","link":"..\/prostats\/boxscore.html?gameid=2662054"},{"status":"FINAL","hometeam":"Tri-City","homescore":"2","awayteam":"Sioux Falls","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"29\/04","link":"..\/prostats\/boxscore.html?gameid=2664638"},{"status":"FINAL","hometeam":"Dubuque","homescore":"7","awayteam":"Muskegon","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"25\/04","link":"..\/prostats\/boxscore.html?gameid=2662058"}],"division_list":null,"site_network_title":null,"leagueshortname":"USHL","includesportlink":null,"showleaguename":0}
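A response of that shape could be consumed with json_decode(). The sketch below uses a trimmed excerpt of the response shown above (two games, a few fields each) as an inline string instead of fetching the URL; the `$lines` array and the output format are illustrative choices.

```php
<?php
// Trimmed excerpt of the response above; a real script would fetch the URL instead.
$json = '{"games_list":['
    . '{"status":"FINAL","hometeam":"Sioux Falls","homescore":"4","awayteam":"Muskegon","awayscore":"2"},'
    . '{"status":"FINAL","hometeam":"Muskegon","homescore":"1","awayteam":"Sioux Falls","awayscore":"6"}'
    . '],"leagueshortname":"USHL"}';

// Decode into associative arrays.
$data = json_decode($json, true);

// $lines is a hypothetical name: one "away score - home score" string per game.
$lines = [];
foreach ($data['games_list'] ?? [] as $game) {
    $lines[] = sprintf(
        '%s %s - %s %s',
        $game['awayteam'],
        $game['awayscore'],
        $game['hometeam'],
        $game['homescore']
    );
}

echo implode("\n", $lines), "\n";
```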

PHP Table Reader

How can I read the ISP value from this HTML table?
<table style="padding-top:10px;">
<tbody>
<tr>
<th>ISP:</th>
<td>My Provider</td>
</tr>
<tr><th>Organization:</th><td nowrap=""></td>
</tr>
<tr><th>Connection:</th>
</tbody></table>

Given how little information you have provided, a regular expression would be the easiest solution.
$matches = array();
// The pattern needs delimiters (# here); \s already covers \r, \n and \t.
preg_match('#<th>ISP:</th>\s*<td>(.*?)</td>#', "<th>ISP:</th><td>My Provider</td>...", $matches);
var_dump($matches);
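For comparison, here is a sketch of the same lookup with DOMDocument and DOMXPath rather than a regular expression. It assumes the markup from the question, which is not well-formed, so libxml warnings are suppressed; the `$isp` variable and the XPath expression are illustrative.

```php
<?php
// Markup copied from the question (note the unclosed last row).
$html = '<table style="padding-top:10px;">
<tbody>
<tr>
<th>ISP:</th>
<td>My Provider</td>
</tr>
<tr><th>Organization:</th><td nowrap=""></td>
</tr>
<tr><th>Connection:</th>
</tbody></table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Take the first td that follows the th whose text is "ISP:".
$isp = trim($xpath->evaluate(
    "string(//th[normalize-space(.)='ISP:']/following-sibling::td[1])"
));

echo $isp, "\n";
```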

PHP Simple HTML DOM Parser how to get Third table using find method

I have HTML code like following structure.
How can I fetch the 3rd table's content from this HTML code using PHP Simple HTML DOM find method?
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
.......
</table>
</td>
<tr>
..........
<td>
<tr>
</tbody>
</table>

To get the third table, pass an index as the second argument to the find method (indexes start from 0).
$html = file_get_html(<your_file_url/html_code>);
$html->find("table", 2);

The tables are nested so:
$dom->find("table", 0); # first table
$dom->find("table table", 0); # second table
$dom->find("table table table", 0); # third table

Just an idea; try this:
// Find first <table> in first <td>
$html = file_get_html('yours.htm');
$var = $html->find('td', 0)->find('table', 0);

I am not entirely sure, but you can try something like this:
$dom = new DOMDocument();
$dom->loadHTML($YourHTML); // loadHTML() tolerates markup that is not well-formed XML
// item() is zero-based, so item(2) points at the third table
$params = $dom->getElementsByTagName('table')->item(2);
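A small self-contained sketch of the DOMDocument approach on three nested tables. The id attributes and the cell text are made up for this illustration so that the selected table can be identified; they are not in the question's markup.

```php
<?php
// Hypothetical markup: three nested tables with ids added for illustration.
$html = '<table id="outer"><tr><td>
<table id="middle"><tr><td>
<table id="inner"><tr><td>target</td></tr></table>
</td></tr></table>
</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() returns tables in document order, nested ones included,
// so index 2 is the innermost of the three.
$third = $dom->getElementsByTagName('table')->item(2);

$id = $third->getAttribute('id');
$text = trim($third->textContent);

echo $id, ': ', $text, "\n";
```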

php regex or html dom parsing

I use regex for HTML parsing but I need your help to parse the following table:
<table class="resultstable" width="100%" align="center">
<tr>
<th width="10">#</th>
<th width="10"></th>
<th width="100">External Volume</th>
</tr>
<tr class='odd'>
<td align="center">1</td>
<td align="left">
http://xyz.com
</td>
<td align="right">210,779,783<br />(939,265 / 499,584)</td>
</tr>
<tr class='even'>
<td align="center">2</td>
<td align="left">
http://abc.com
</td>
<td align="right">57,450,834<br />(288,915 / 62,935)</td>
</tr>
</table>
I want to get all domains with their volume (in an array or variable), for example:
http://xyz.com - 210,779,783
Should I use regex or HTML DOM in this case? I don't know how to parse a large table. Can you please help? Thanks.

Here's an XPath example that happens to parse the HTML from the question.
<?php
$dom = new DOMDocument();
$dom->loadHTMLFile("./input.html");
$xpath = new DOMXPath($dom);
$trs = $xpath->query("//table[@class='resultstable'][1]/tr");
foreach ($trs as $tr) {
    // Second cell: the link holding the domain name
    $tdList = $xpath->query("td[2]/a", $tr);
    if ($tdList->length == 0) continue; // header row, or no link in the cell
    $name = $tdList->item(0)->nodeValue;
    // Third cell: the first child node is the text before the <br />
    $tdList = $xpath->query("td[3]", $tr);
    $vol = $tdList->item(0)->childNodes->item(0)->nodeValue;
    echo "name: {$name}, vol: {$vol}\n";
}
?>
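If the domain cells contain plain text rather than links, as in the table as it is printed in the question, a variant along these lines may be closer to what is needed. It is only a sketch: the inline `$html` string, the `$volumes` array and the use of normalize-space() are assumptions for illustration.

```php
<?php
// Table markup as printed in the question (domains as plain text).
$html = <<<HTML
<table class="resultstable" width="100%" align="center">
<tr><th width="10">#</th><th width="10"></th><th width="100">External Volume</th></tr>
<tr class='odd'>
<td align="center">1</td>
<td align="left">
http://xyz.com
</td>
<td align="right">210,779,783<br />(939,265 / 499,584)</td>
</tr>
<tr class='even'>
<td align="center">2</td>
<td align="left">
http://abc.com
</td>
<td align="right">57,450,834<br />(288,915 / 62,935)</td>
</tr>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// $volumes is a hypothetical name: domain => volume.
$volumes = [];
// tr[td] skips the header row, which only has th cells.
foreach ($xpath->query("//table[@class='resultstable']//tr[td]") as $tr) {
    $domain = $xpath->evaluate("normalize-space(td[2])", $tr);
    // text()[1] is the part of the cell before the <br />.
    $volume = $xpath->evaluate("normalize-space(td[3]/text()[1])", $tr);
    $volumes[$domain] = $volume;
}

print_r($volumes);
```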

What's wrong with this preg_match_all

I'm using file_get_contents to read a .html file that has a table.
<table id="someTable" style="width:100%;margin-bottom:0;">
<tr style="display:none;">
<td style="padding-left:25px;">Some text</td>
</tr>
<tr style="display:none;">
<td style="padding-left:25px;">another text</td>
</tr>
</table>
When I use preg_match_all to read the table, I get nothing when I count $matches[1]
preg_match_all('/<table id="someTable" style="width:100%;margin-bottom:0;">(.*)<\/table>/', $html, $matches);
$co = count($matches[1]);

Add the s modifier to your pattern so that . also matches newlines:
preg_match_all('/<table id="someTable" style="width:100%;margin-bottom:0;">(.*)<\/table>/s', $html, $matches);
See http://ideone.com/3w0K2
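The effect of the s (PCRE_DOTALL) modifier can be checked with a short sketch. The table markup is shortened here (style attributes dropped) and the variable names are illustrative.

```php
<?php
// Shortened version of the table from the question, spread over several lines.
$html = "<table id=\"someTable\">\n"
    . "<tr><td>Some text</td></tr>\n"
    . "<tr><td>another text</td></tr>\n"
    . "</table>";

// Without /s the dot does not match newlines, so the pattern finds nothing.
$without = preg_match_all('/<table id="someTable">(.*)<\/table>/', $html, $m1);

// With /s the dot also matches newlines and the group spans every row.
$with = preg_match_all('/<table id="someTable">(.*)<\/table>/s', $html, $m2);

echo "without s: $without, with s: $with\n";

// Pull the individual cell texts out of the captured table body.
preg_match_all('/<td[^>]*>(.*?)<\/td>/s', $m2[1][0], $cells);
print_r($cells[1]);
```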

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
