Extracting text using preg_match - php

I am trying to extract a piece of text from an HTML page using the PHP function preg_match.
I've successfully parsed the HTML into a variable, but now I'm stuck extracting the right piece of information, probably because I'm a bit confused by the syntax of preg_match.
So basically, here is a piece of the HTML I am interested in:
...<tr >
<td >Metuje</td>
<td ><a href="./detail_stanice/307158.html" >Maršov nad Metují</a></td>
<td >A</td>
<td >90</td>
<td >120</td>
<td >150</td>
<td >cm</td>
<td >04.08. 14:20</td>
<td >31</td>
<td >0.53</td>
<td ><img src="./img/ldown.png" width="15" /></td>
</tr>...
What I need is to find this particular row in the table (which contains a couple of other rows). Basically, I need to search for the name "Maršov nad Metují" in the second cell and then extract the values of the subsequent cells in that row into a string. In this particular case I would like a string with the values A, 90, 120, etc., up to the end of the row.
The page has other rows with exactly the same format, just different values, so I would reuse the same approach to extract the values of rows with other names in the second cell.
I have tried it myself, but I was not able to get the right output.
I tried something like the following, but it doesn't solve the problem. I know I somehow have to handle the TD cells, but unfortunately I wasn't able to get it right in this particular case:
preg_match("/Maršov nad Metují(.*?)\<\/tr/", $html, $results);
Any help is very much appreciated.
Thanks

Try this:
<?php
$info = '<tr ><td >Metuje</td><td ><a href="./detail_stanice/307158.html" >Maršov nad Metují</a></td><td >A</td><td >90</td><td >120</td><td >150</td><td >cm</td><td >04.08. 14:20</td><td >31</td><td >0.53</td><td ><img src="./img/ldown.png" width="15" /></td></tr>';
preg_match('/<a href="(.*)" >(.*)</Ui',$info,$result);
print_r($result[2]);// Maršov nad Metují

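// Collect the contents of every <td> in the row stored in $info
preg_match_all("/<td.*?>(.+?)<\/td>/is", $info, $matches);
$result = $matches[1];
// Drop the first two cells (river name and station link)
array_shift($result);
array_shift($result);
print implode(', ', $result); // the last cell still contains the raw <img> tag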
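The answer above works on a single row held in $info. On a full page, the two steps can be combined: first anchor on the station name to isolate the rest of its row, then collect that row's cells. A minimal regex sketch, assuming every row has the same shape as the snippet in the question (rowValuesByRegex is a hypothetical name). Regular expressions break easily on real-world HTML, so treat this as a quick fix rather than a robust parser:

```php
<?php
// Hypothetical helper: find the row whose linked station name equals $name
// and return the text of every cell after the link's cell.
function rowValuesByRegex(string $html, string $name): array
{
    // 1) Capture everything between the station link's </td> and the row's </tr>.
    //    s = "." also matches newlines, u = pattern and subject are UTF-8.
    $rowPattern = '~>\s*' . preg_quote($name, '~') . '\s*</a>\s*</td>(.*?)</tr>~su';
    if (!preg_match($rowPattern, $html, $row)) {
        return []; // no row with that name (or invalid UTF-8 input)
    }

    // 2) Capture the contents of each remaining <td>...</td> in that row.
    preg_match_all('~<td[^>]*>(.*?)</td>~su', $row[1], $cells);

    // 3) Strip inner markup (e.g. the trailing <img>) and surrounding whitespace.
    return array_map(fn($c) => trim(strip_tags($c)), $cells[1]);
}
```

Usage: echo implode(', ', array_filter(rowValuesByRegex($html, 'Maršov nad Metují'), 'strlen')); prints A, 90, 120, 150, cm, 04.08. 14:20, 31, 0.53 for the sample row. The array_filter call drops the empty last cell, which only holds the arrow image.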

Related

Scraping using php - preg_match_all

I'm trying to get the value of Internet Data Volume Balance; the script should echo 146.30 MB.
I'm new to all this and am working through the tutorials.
How can this be done?
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Account Status</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">You exceeded your allowed credit.</FONT></div></td>
</tr>
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Period Free Time Remaining</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text">0:00:00 hours</FONT></div></td>
</tr>
<tr >
<td bgcolor="#F8F8F8"><div align="left"><B><FONT class="tplus_text">Internet Data Volume Balance</FONT></B></div></td>
<td bgcolor="#FFFFFF"><div align="left"><FONT class="tplus_text" style="text-transform:none;">146.30 MB</FONT></div></td>
</tr>
If you are willing to install phpQuery, or already have it installed, you can use that:
phpQuery::newDocumentFileHTML('htmlpage.html');
echo pq('td:eq(5)')->text(); // :eq() is zero-based, so index 5 is the sixth cell
PHP can interact with the DOM just like JavaScript can. This is far more reliable than parsing the markup with regular expressions, which most people will tell you is the wrong approach anyway:
Loading from an HTML File
// Start by creating a new document
$doc = new DOMDocument();
// I've loaded the table into an external file, and am loading it into the $doc
$doc->loadHTMLFile( 'htmlpage.html' );
// Since you have six table cells, I'm calling up all of them
$cells = $doc->getElementsByTagName("td");
// I'm grabbing the sixth cell's textContent property
echo $cells->item(5)->textContent;
This code will output "146.30 MB" to the screen.
Loading from a String
If you have the HTML stored in a string, you can load that into your document as well. Just swap the file-loading method for the one that loads from a string:
$str = "<table><tr><td>Foo</td></tr>...</table>";
$doc->loadHTML( $str );
We would then proceed with the same code as above to select the cells, and show their textContent in the output.
Check out the DOMDocument class.
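Picking the cell by position, as with item(5) above, breaks as soon as a row is added or reordered. A sketch that looks the value up by its label with DOMXPath instead, assuming the label is always in the first cell of a row and the value in the second (valueForLabel is a hypothetical helper):

```php
<?php
// Look up a value by its row label instead of by cell position.
// Assumption: label in the first <td> of the row, value in the second.
function valueForLabel(string $html, string $label): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate the sloppy markup of real pages
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // normalize-space() sees through the <div>/<B>/<FONT> wrappers and stray whitespace.
    // Assumption: $label contains no single quote, since it is embedded in the XPath literal.
    $nodes = $xpath->query("//tr[normalize-space(td[1]) = '$label']/td[2]");

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}
```

With the page from the question, echo valueForLabel(file_get_contents('htmlpage.html'), 'Internet Data Volume Balance'); should print 146.30 MB, and it keeps working if the provider inserts another row above it.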

Reducing the number of images shown from a database (PHP, MySQL)

Let's start with:
echo $query_row['winkels'];
This will echo:
<td style="margin-left:3px;"><img src="logo/15.png"/></td> <td style="margin-left:3px;"><img src="logo/11.png"/></td>
That comes out of my MySQL database, but on the page it shows one image. If I put more in, for example:
<td style="margin-left:3px;"><img src="logo/15.png"/></td> <td style="margin-left:3px;"><img src="logo/11.png"/></td> <td style="margin-left:3px;"><img src="logo/15.png"/></td> <td style="margin-left:3px;"><img src="logo/11.png"/></td>
It will echo 2 images.
When more than 20 images are shown, I want to reduce that to 5 images.
How can I do that?
For example, I tried:
$winkels_inject = $query_row['winkels'];
$sub_winkels = substr($winkels_inject, 0, 191);
echo $sub_winkels;
This works well for shortening text, but that is exactly what it does here: it cuts through the image links and their HTML, so the markup breaks and no image is shown at all.
How to fix this?
Regards,
F4LLCON
It seems you have a design problem: the only thing you would need to store in the DB is the image number, with each number in its own row.
Anyway, a quick and very dirty solution:
$string_with_breaks = str_replace('td> <td', 'td>__break_here__<td', $query_row['winkels']);
$img_array = explode('__break_here__', $string_with_breaks);
// loop through array and only echo the first 5 elements
$count = 0;
foreach($img_array as $store)
{
    echo $store;
    $count++;
    if ($count > 4)
    {
        break;
    }
}
If the space between the td tags is missing (or there is an extra one), it will not work anymore...
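Since the string-splitting trick depends on the exact whitespace between cells, here is a sketch that pulls the src attributes out with preg_match_all (which ignores the spacing between cells) and only trims the list when there are more than 20 images, as the question asks. limitLogos is a hypothetical name, and the rebuilt cells reuse the markup from the question; storing one image number per row, as suggested above, remains the cleaner fix:

```php
<?php
// Hypothetical helper: extract image URLs from the stored markup regardless of
// whitespace, keep only the first $max when there are more than $threshold,
// and rebuild the <td> cells in the question's format.
function limitLogos(string $markup, int $threshold = 20, int $max = 5): string
{
    // Capture the src value of every <img ...> tag.
    preg_match_all('~<img\s[^>]*src="([^"]+)"~i', $markup, $m);
    $srcs = $m[1];

    if (count($srcs) > $threshold) {
        $srcs = array_slice($srcs, 0, $max); // keep only the first $max images
    }

    $out = '';
    foreach ($srcs as $src) {
        $out .= '<td style="margin-left:3px;"><img src="'
              . htmlspecialchars($src) . '"/></td> ';
    }
    return rtrim($out); // drop the trailing space after the last cell
}
```

Usage: echo limitLogos($query_row['winkels']); outputs the markup unchanged for 20 images or fewer, and only the first five cells above that.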

Digging deeper into DOMElement

I've used Zend_Dom_Query to extract some <tr> elements, and now I want to loop through them and do more with them. Each <tr> looks like the one below. How can I print the title Title1 and the id of the category td (id=categ-113)?
<tr class="sometr">
<th><a class="title">Title1</a></th>
<td class="category" id="categ-113"></td>
<td class="somename">Title 1 name</td>
</tr>
You should just play around with the results. I've never worked with it, but this is how far I got (and I'm kind of new to Zend myself):
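$dom = new Zend_Dom_Query($html);
$res = $dom->query('.sometr');
foreach ($res as $tr) {
    // each result is a DOMElement, so the standard DOM methods apply
    $a = $tr->getElementsByTagName('a');
    echo $a->item(0)->textContent; // the title
    $td = $tr->getElementsByTagName('td');
    echo $td->item(0)->getAttribute('id'); // the category id
}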
With this I think you're set to go. For more functions you can use on the results, look up DOMElement ( http://php.net/manual/de/class.domelement.php ). With that information you should be able to grab everything. But my question is:
Why make this so complicated? I don't really see a use case for it, since the title and everything else should presumably come from the database. And if it's XML, there are better solutions than relying on Dom_Query.
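For comparison, a sketch of the same lookup with PHP's built-in DOMDocument and DOMXPath, no Zend required. It assumes the markup shown in the question (a.title holds the title, td.category carries the id); titlesAndIds is a hypothetical helper:

```php
<?php
// Plain-DOM sketch: for each tr.sometr, collect the title link text and
// the id attribute of its td.category cell.
function titlesAndIds(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    // concat/contains is the usual XPath 1.0 idiom for "has class sometr".
    $trs = $xpath->query("//tr[contains(concat(' ', normalize-space(@class), ' '), ' sometr ')]");
    foreach ($trs as $tr) {
        // Relative queries (.//) search only inside the current row.
        $title = $xpath->query('.//a[@class="title"]', $tr)->item(0);
        $cat   = $xpath->query('.//td[@class="category"]', $tr)->item(0);
        $rows[] = [
            'title' => $title ? trim($title->textContent) : null,
            'id'    => $cat ? $cat->getAttribute('id') : null,
        ];
    }
    return $rows;
}
```

For the row in the question this yields one entry with the title Title1 and the id categ-113.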

PHP Using domdocument to extract data from html

I have a table with the following structure. I cannot seem to get the data I want.
<table class="gsborder" cellspacing="0" cellpadding="2" rules="cols" border="1" id="d00">
<tr class="gridItem">
<td>Code</td><td>0adf</td>
</tr><tr class="AltItem">
<td>CompanyName</td><td>Some Company</td>
</tr><tr class="Item">
<td>Owner</td><td>Jim Jim</td>
</tr><tr class="AltItem">
<td>DivisionName</td><td> </td>
</tr><tr class="Item">
<td>AddressLine1</td><td>9314 W. SPRING ST.</td>
</tr>
</table>
This table is, of course, nested within another table on the page. How can I use DOMDocument, for example, to refer to "Code" and "0adf" as a key-value pair? They don't actually need to be a key-value pair, but I should be able to access each of them separately.
EDIT:
Using PHP Simple HTML DOM Parser, I was able to extract the data I needed with this:
$foo = $html->getElementById("d00")->childNodes(1)->childNodes(1);
The problem with this though is that I am getting the two <td></td> tags with my data. Is there a way to only grab the raw data without the tags?
Also, is this the right way to get my data out of this table?
If you're not dead set on using DOMDocument, try using the PHP Simple HTML DOM Parser. This has the benefit of allowing you to parse HTML which is not valid XML as well as providing a nicer interface to the parsed document.
You could write something like:
$html = str_get_html(...);
foreach($html->find('tr') as $tr)
{
    print 'First td: ' . $tr->find('td', 0)->plaintext . "\n";
    print 'Second td: ' . $tr->find('td', 1)->plaintext . "\n";
}
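To answer the original DOMDocument question directly, here is a sketch that reads every row of the table with id d00 into a label => value array, so "Code" maps to "0adf". It assumes each row has the label in its first cell and the value in its second; tableToPairs is a hypothetical name. The XPath also accepts a tbody in case the page or parser adds one:

```php
<?php
// DOMDocument sketch: turn the two-column table with the given id into an
// associative array (label => value), ignoring any table it is nested in.
function tableToPairs(string $html, string $tableId = 'd00'): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $pairs = [];
    // Assumption: $tableId contains no single quote (it is embedded in the XPath literal).
    $rows = $xpath->query("//table[@id='$tableId']/tr | //table[@id='$tableId']/tbody/tr");
    foreach ($rows as $tr) {
        $cells = $xpath->query('td', $tr); // direct <td> children of this row only
        if ($cells->length >= 2) {
            $pairs[trim($cells->item(0)->textContent)] = trim($cells->item(1)->textContent);
        }
    }
    return $pairs;
}
```

Usage: $pairs = tableToPairs($pageHtml); echo $pairs['Code']; gives 0adf, and each label can be read separately the same way. textContent returns only the text, so no <td> tags come along.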

I'm using Simple HTML to grab data out of a table and need help

Sorry for the poor title, guys, but I'm whooped. I have a table like this:
<table class="gsborder" cellspacing="0" cellpadding="2" rules="cols" border="1" id="d00">
<tr class="gridItem">
<td>Code</td><td>0adf</td>
</tr><tr class="AltItem">
<td>CompanyName</td><td>Some Company</td>
</tr><tr class="Item">
<td>Owner</td><td>Jim Jim</td>
</tr><tr class="AltItem">
<td>DivisionName</td><td> </td>
</tr><tr class="Item">
<td>AddressLine1</td><td>9314 W. SPRING ST.</td>
</tr>
</table>
I'm using the following code to get my data out:
$foo = $html->getElementById("d00")->childNodes(1)->childNodes(1);
The problem with this though is that I am getting the two <td></td> tags with my data. Is there a way to only grab the raw data without the tags?
Also, is this the right way to get my data out of this table?
Try using:
$foo = $html->getElementById("d00")->childNodes(1)->childNodes(1)->plaintext;
or innertext.
// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);
echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"
Taken from: http://simplehtmldom.sourceforge.net/manual.htm
As a rule of thumb, whatever DOM API you are using, once you've located the element(s) you are interested in getting data from, accessing the text nodes they contain requires a bit more work.
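The same distinction exists in PHP's built-in DOM extension. A sketch mirroring the Simple HTML DOM manual example with DOMDocument: saveHTML($node) returns an element's markup, while textContent returns only its text, which is the "raw data without the tags" asked for:

```php
<?php
// DOMDocument counterpart of the outertext / innertext / plaintext example.
$doc = new DOMDocument();
$doc->loadHTML('<div>foo <b>bar</b></div>');
$div = $doc->getElementsByTagName('div')->item(0);

$outer = $doc->saveHTML($div);        // markup including the <div> tag itself
$inner = '';
foreach ($div->childNodes as $child) {
    $inner .= $doc->saveHTML($child); // markup of the children only
}
$text = $div->textContent;            // text nodes only, tags dropped

echo $div->nodeName, "\n", $outer, "\n", $inner, "\n", $text, "\n";
```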
Use strip_tags to get raw text.
http://us.php.net/manual/en/function.strip-tags.php
So:
$foo = strip_tags($html->getElementById("d00")->childNodes(1)->childNodes(1));
