preg_match_all() Uses - php

I have a variable $html in which this code is stored:
<form action="track_mobile.asp" method="post" name="TrackMobile">
<table width="99%" height="55" border="0" cellpadding="2" cellspacing="0">
<tr>
<td class="Heading2" colspan="2"> Track Any Mobile Location</td>
</tr>
<tr>
<td width="21" valign="top" rowspan="2" class="s2stextbox" valign="top"><img src="../images/operators_logo/Tata.png" width="134" height="121" align="left"> </td>
<td width="897" valign="top" class="s2stextbox"><font size="2"><b>Mobile Number: </b>918888888888</font></td>
</tr>
<tr>
<td width="897" valign="top" class="s2stextbox" valign="top"><font size="2"> <b>User Name:</b> We are unable to trace the Name for this Mobile Number<br>
<font size="2"><b>Mobile Operator Name:</b> TATA TELESERVICES<br>
<b>State/Region: </b>Maharashtra</font></td>
</tr> </table>
</form>
Every value after the ":" (colon) is random and different every time.
Please give me the correct syntax to echo:
Mobile Number: 918888888888
User Name: We are unable to trace the Name for this Mobile Number
Mobile Operator Name: IDEA
State/Region: Maharashtra
These values are generated randomly and differ every time, so I need preg_match to search these locations and echo whatever text is in them:
918888888888
We are unable to trace the Name for this Mobile Number
IDEA
Maharashtra

1. Strip out the HTML tags:
$text = strip_tags($html);
2. Match on the text before the colon.
I just changed the regexp so that (.*) captures everything that follows each label up to the end of the line. Run the matches on $text, not $html, and use a different delimiter (#) for State/Region, because that label contains a /:
preg_match('/Mobile Number: (.*)/', $text, $matches);
echo trim($matches[1]), "\n";
preg_match('/User Name: (.*)/', $text, $matches);
echo trim($matches[1]), "\n";
preg_match('/Mobile Operator Name: (.*)/', $text, $matches);
echo trim($matches[1]), "\n";
preg_match('#State/Region: (.*)#', $text, $matches);
echo trim($matches[1]), "\n";
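Since the question asks about preg_match_all(), here is a minimal sketch that collects all four fields in a single call after strip_tags(). It assumes the labels are exactly the four shown in the question and that each label/value pair ends up on its own line once the tags are stripped, as it does in the sample markup. extractFields() is a hypothetical helper name.

```php
<?php
// Sample markup from the question; the values after each label change on every request.
$html = <<<'HTML'
<form action="track_mobile.asp" method="post" name="TrackMobile">
<table width="99%" height="55" border="0" cellpadding="2" cellspacing="0">
<tr>
<td class="Heading2" colspan="2"> Track Any Mobile Location</td>
</tr>
<tr>
<td width="21" valign="top" rowspan="2" class="s2stextbox" valign="top"><img src="../images/operators_logo/Tata.png" width="134" height="121" align="left"> </td>
<td width="897" valign="top" class="s2stextbox"><font size="2"><b>Mobile Number: </b>918888888888</font></td>
</tr>
<tr>
<td width="897" valign="top" class="s2stextbox" valign="top"><font size="2"> <b>User Name:</b> We are unable to trace the Name for this Mobile Number<br>
<font size="2"><b>Mobile Operator Name:</b> TATA TELESERVICES<br>
<b>State/Region: </b>Maharashtra</font></td>
</tr> </table>
</form>
HTML;

/**
 * Hypothetical helper: strip the tags, then capture every "Label: value" line
 * with a single preg_match_all() call. Returns [label => value].
 */
function extractFields(string $html): array
{
    $text = strip_tags($html);
    // The m modifier makes ^ and $ work per line; # is the delimiter so "/" needs no escaping.
    preg_match_all(
        '#^\h*(Mobile Number|User Name|Mobile Operator Name|State/Region):(.*)$#m',
        $text,
        $m
    );
    // Pair each label (group 1) with its trimmed value (group 2).
    return array_combine($m[1], array_map('trim', $m[2]));
}

foreach (extractFields($html) as $label => $value) {
    echo $label, ': ', $value, "\n";
}
```

Because the label list is part of the pattern, a field the page adds later will simply be ignored until you add its label to the alternation.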

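This is a job for XPath. Since the markup is HTML rather than XML, load it with DOMDocument::loadHTML() and run these expressions with DOMXPath::evaluate(), which returns the string result directly. Anchoring each expression on its bold label, rather than on table positions, the XPaths look like this:
918888888888
normalize-space(//b[normalize-space()='Mobile Number:']/following-sibling::text()[1])
We are unable to trace the Name for this Mobile Number
normalize-space(//b[normalize-space()='User Name:']/following-sibling::text()[1])
TATA TELESERVICES
normalize-space(//b[normalize-space()='Mobile Operator Name:']/following-sibling::text()[1])
Maharashtra
normalize-space(//b[normalize-space()='State/Region:']/following-sibling::text()[1])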
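A minimal PHP sketch of the label-anchored XPath approach, assuming the DOM extension is available. extractByXPath() is a hypothetical helper; it reads each value as the first text node after the matching <b> label, silences libxml's parse warnings (the sample markup repeats the valign attribute), and yields an empty string for any label missing from the page.

```php
<?php
$html = <<<'HTML'
<form action="track_mobile.asp" method="post" name="TrackMobile">
<table width="99%" height="55" border="0" cellpadding="2" cellspacing="0">
<tr>
<td class="Heading2" colspan="2"> Track Any Mobile Location</td>
</tr>
<tr>
<td width="21" valign="top" rowspan="2" class="s2stextbox" valign="top"><img src="../images/operators_logo/Tata.png" width="134" height="121" align="left"> </td>
<td width="897" valign="top" class="s2stextbox"><font size="2"><b>Mobile Number: </b>918888888888</font></td>
</tr>
<tr>
<td width="897" valign="top" class="s2stextbox" valign="top"><font size="2"> <b>User Name:</b> We are unable to trace the Name for this Mobile Number<br>
<font size="2"><b>Mobile Operator Name:</b> TATA TELESERVICES<br>
<b>State/Region: </b>Maharashtra</font></td>
</tr> </table>
</form>
HTML;

/**
 * Hypothetical helper: read each value as the text node that follows its <b> label.
 * Returns [label => value]; a missing label yields an empty string.
 */
function extractByXPath(string $html): array
{
    $doc = new DOMDocument();
    // Keep libxml's warnings about the sloppy markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $fields = [];
    foreach (['Mobile Number', 'User Name', 'Mobile Operator Name', 'State/Region'] as $label) {
        // normalize-space() trims surrounding blanks; evaluate() returns the string itself.
        $fields[$label] = $xp->evaluate(
            "normalize-space(//b[normalize-space()='$label:']/following-sibling::text()[1])"
        );
    }
    return $fields;
}

foreach (extractByXPath($html) as $label => $value) {
    echo $label, ': ', $value, "\n";
}
```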


split content of string by space to new line [duplicate]

I have $str= "ns.kimsufi.com ks392904.kimsufi.com ks392904.kimsufi.com"
I want them as a string like this:
$str= "ns.kimsufi.com
ks392904.kimsufi.com
ks392904.kimsufi.com"
What is the easiest way to do such thing in PHP ?
The problem isn't solved.
Here is the whole code:
<?php
$str= 'res Athéna 2 rue Henri Bergson<br/>
<b>Tech City: </b>STRASBOURG<br/>
<b>Tech State/Province:<br/>
</b><b>Tech Postal Code: </b>67200<br/>
<b>Tech Country: </b>FR<br/>
<b>Tech Phone: </b>+33.679795486<br/>
<b>Tech Phone Ext:<br/>
</b><b>Tech Fax:<br/>
</b><b>Tech Fax Ext:<br/>
</b><b>Tech Email: </b>fnt25qgfilw16kj60goe#h.o-w-o.info<br/>
<b>Name Server: </b>ns.kimsufi.com<br/>
<b>Name Server: </b>ks392904.kimsufi.com<br/>
<b>Name Server: </b>ks392904.kimsufi.com<br/>
<b>DNSSEC: </b>unsigned<br/>
<b>URL of the ICANN WHOIS Data Problem Reporting System:<br/>
</b>http://wdprs.internic.net/<br/>
>>> Last update of WHOIS database: 2015-06-07T10:20:36.0Z <br />
</td></tr>
</table><br />
<form name="queryform" method="post" action="/index.php">
<table cellpadding="6" cellspacing="0" border="0" width="540" dir="ltr">
<tr><td bgcolor="#92CAFE">
<table width="100%" cellpadding="0" cellspacing="0" border="0" dir="ltr">
<tr class="upperrow">
<td align="left" valign="top" nowrap="nowrap"><font face="Arial" size="+0"><b>Enter any domain name:</b></font></td>
</tr>
<tr class="middlerow">
<td align="center" valign="middle" nowrap="nowrap"><input type="text" name="query" value="" class="queryinput" /> <input type="submit" name="submit" value="Check Domain" /></td>
</tr>
<tr class="lowerrow">
<td align="right" valign="bottom"></td>
</tr>
</table>';
$dom = new DOMDocument;
#$dom->loadHTML($str);
$xp = new DOMXPath($dom);
$links = $xp->query('//b[text()="Name Server: "]/following-sibling::a[1]');
foreach ($links as $link) {
$newlink = $link->nodeValue . PHP_EOL;
$newlink = str_replace(' ', "\n", $newlink);
echo $newlink;
}
?>
It still echoes like this:
ns.kimsufi.com ks392904.kimsufi.com ks392904.kimsufi.com
What is the problem? If it's not a space between them, what is it?
Is there any way to change the code without using str_replace?
You could use str_replace to do that.
$str = str_replace(" ", "\n", $str);
echo $str;
"\n" (in a double-quoted string) produces a line break.
Just replace each space with a newline character:
$str = preg_replace('~\h~', "\n", $str);
str_replace alone would be enough for this job, but if you also want to replace tabs, you need the regex above (\h matches any horizontal whitespace, tabs included):
$str = str_replace(' ', "\n", $str);
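A likely explanation, assuming the output is viewed in a browser: the "\n" characters are there, but HTML rendering collapses them into spaces. Separately, in the code as posted, $dom->loadHTML($str) is commented out (so the document is empty), and the XPath looks for a following <a> sibling even though each name server is a plain text node after its <b> label. The sketch below works on a trimmed excerpt of the question's markup; nameServers() is a hypothetical helper. It joins the results with "\n" and converts them to <br /> only when not running on the command line.

```php
<?php
// Trimmed excerpt of the WHOIS markup from the question.
$str = <<<'HTML'
<b>Tech Country: </b>FR<br/>
<b>Name Server: </b>ns.kimsufi.com<br/>
<b>Name Server: </b>ks392904.kimsufi.com<br/>
<b>Name Server: </b>ks392904.kimsufi.com<br/>
<b>DNSSEC: </b>unsigned<br/>
HTML;

/** Hypothetical helper: return the text that follows each "Name Server:" label. */
function nameServers(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $dom->loadHTML($html);                         // this call is commented out in the question
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $servers = [];
    // The server name is the text node right after <b>Name Server: </b>, not an <a> element.
    foreach ($xp->query("//b[normalize-space()='Name Server:']/following-sibling::text()[1]") as $node) {
        $servers[] = trim($node->nodeValue);
    }
    return $servers;
}

$text = implode("\n", nameServers($str));
// A browser collapses "\n" into a space, so emit <br /> tags for HTML output.
echo (PHP_SAPI === 'cli') ? $text . "\n" : nl2br($text);
```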

Simple DOM html parser read html table

I am trying to read specific values from this HTML table with a PHP DOM parser. I want my code to read only the td elements that have a width attribute, output only those items from the table, and produce this:
" WAITLIST, 91630, ACCY 2001, 10, Intro Financial Accounting, 3.00, Zou, Y, Duques 251, 9:35AM-10:50AM, 01/13/14-04/28/14 "
Here is the HTML table:
<table width="100%" border="0" cellspacing="1" cellpadding="0" bgcolor="#006699">
<tr align="center" class="tableRow1Font">
<td width="7%">WAITLIST</td>
<td width="5%">91630</td>
<td width="11%">
ACCY 2001
</td>
<td width="5%">10</td>
<td width="16%">Intro Financial Accounting</td>
<td width="6%">3.00</td>
<td width="8%"> Zou, Y</td>
<td width="8%"><A HREF="http://www.gwu.edu/~map/building.cfm?BLDG=DUQUES" target="_blank" >DUQUES</a> 251</td>
<td width="13%">TR<br>09:35AM - 10:50AM</td>
<td width="14%">
01/13/14 - 04/28/14
</td>
<td width="7%">
</td>
</tr>
</table>
Here is my PHP code, which grabs the whole table (including some elements I don't want in my output) and repeats the output multiple times:
// Retrieve the DOM from a given URL
$html = file_get_html('testdata.html');
foreach($html->find('table') as $e){
foreach($html->find('td') as $f){
echo $f->innertext . '<br>';
}
}
How can I change my code to only grab and output these elements:
"WAITLIST, 91630, ACCY 2001, 10, Intro Financial Accounting, 3.00, Zou, Y, Duques 251, 9:35AM-10:50AM, 01/13/14-04/28/14"
// Retrieve the DOM from a given URL
$html = file_get_html('testdata.html');
foreach($html->find('table') as $e){
foreach($e->find('td') as $f){ // search only inside the current table, not the whole document
echo strip_tags($f->innertext) . '<br>';
}
}
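You were pretty close already: the inner loop has to call find() on $e (the current table) instead of on $html, otherwise every td on the page is printed once per table.
You also forgot about the tags inside the cells, such as the <A> link and the <br>. See if strip_tags works for you:
http://us3.php.net/strip_tags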
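If you prefer PHP's built-in DOM extension over Simple HTML DOM, the same idea can be expressed with DOMXPath. This sketch (courseRow() is a hypothetical name) selects only the td elements that carry a width attribute, turns <br> into a space, collapses whitespace, and skips empty cells. Its output is close to, but not identical with, the hand-formatted line in the question: it keeps the page's DUQUES capitalization and the TR day code.

```php
<?php
$html = <<<'HTML'
<table width="100%" border="0" cellspacing="1" cellpadding="0" bgcolor="#006699">
<tr align="center" class="tableRow1Font">
<td width="7%">WAITLIST</td>
<td width="5%">91630</td>
<td width="11%">
ACCY 2001
</td>
<td width="5%">10</td>
<td width="16%">Intro Financial Accounting</td>
<td width="6%">3.00</td>
<td width="8%"> Zou, Y</td>
<td width="8%"><A HREF="http://www.gwu.edu/~map/building.cfm?BLDG=DUQUES" target="_blank" >DUQUES</a> 251</td>
<td width="13%">TR<br>09:35AM - 10:50AM</td>
<td width="14%">
01/13/14 - 04/28/14
</td>
<td width="7%">
</td>
</tr>
</table>
HTML;

/** Hypothetical helper: join the text of every td[@width] cell with ", ". */
function courseRow(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xp = new DOMXPath($doc);

    $values = [];
    foreach ($xp->query('//td[@width]') as $td) {
        // Replace <br> with a space so "TR<br>09:35AM" does not run together.
        foreach (iterator_to_array($xp->query('.//br', $td)) as $br) {
            $br->parentNode->replaceChild($doc->createTextNode(' '), $br);
        }
        // textContent drops the tags (e.g. the <A> link); collapse whitespace and skip empty cells.
        $text = trim(preg_replace('/\s+/', ' ', $td->textContent));
        if ($text !== '') {
            $values[] = $text;
        }
    }
    return implode(', ', $values);
}

echo courseRow($html), "\n";
```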

simple html dom parser or a regular expression

There is an HTML page that contains this block:
<table class="tborder" cellpadding="6" cellspacing="1" border="0" width="100%" align="center">
<tr>
<td class="tcat" colspan="2">
Some regular text <span class="normal">the desired text 1</span>
</td>
</tr>
<tr>
<td class="alt1" colspan="2">
<span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
</td>
</tr>
</table>
Help me parse it with the Simple HTML DOM library or a regular expression so that only this is output:
the desired text 1 <span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
If I do this:
<?php
include 'simple_html_dom.php';
$html = file_get_html('http://some-url.com/power.html');
foreach($html->find('td[class="tcat"]') as $element1)
echo $element1. '<br>';
foreach($html->find('span[class="smallfont"]') as $element2)
echo $element2. '<br>';
?>
This displays the data I need, but also other, similar elements on the page (with the same 'td class="tcat"' and 'class="smallfont"' attributes).
I need only this to be output:
the desired text 1 <span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
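It's all about knowing CSS selectors. Passing 0 as the second argument to find() returns just the first match instead of all of them: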
echo $html->find('td.tcat span', 0)->text();
echo $html->find('span.smallfont', 0);
//the desired text 1 <span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
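For comparison, a sketch of the same selection with PHP's DOM extension instead of Simple HTML DOM (desiredBlock() is a hypothetical helper). Wrapping a path in (...)[1] plays the role of the 0 index passed to find(), and DOMDocument::saveHTML($node) returns the span's markup with its tags.

```php
<?php
$html = <<<'HTML'
<table class="tborder" cellpadding="6" cellspacing="1" border="0" width="100%" align="center">
<tr>
<td class="tcat" colspan="2">
Some regular text <span class="normal">the desired text 1</span>
</td>
</tr>
<tr>
<td class="alt1" colspan="2">
<span class="smallfont">link1, <i><b><font color="#006400">link2</font></b></i></span>
</td>
</tr>
</table>
HTML;

/** Hypothetical helper: first span text inside td.tcat, then the first span.smallfont as HTML. */
function desiredBlock(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xp = new DOMXPath($doc);

    // (...)[1] keeps only the first match in document order.
    $text = $xp->query("(//td[@class='tcat']//span)[1]")->item(0);
    $span = $xp->query("(//span[@class='smallfont'])[1]")->item(0);
    if ($text === null || $span === null) {
        return '';
    }
    // saveHTML($node) serializes just that node, tags included.
    return trim($text->textContent) . ' ' . $doc->saveHTML($span);
}

echo desiredBlock($html), "\n";
```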

How To Format This Scraped Content

I'm grabbing the content from all the td elements with class="job" in this table using this:
$table01 = $salary->find('table.table01');
$rows = $table01[0]->find('td.job');
Then I'm using this to output it, which works, but it only outputs plain text, and I need to do more with it:
foreach($table01[0]->find('td.job') as $element) {
$jobs .= $element->plaintext . '<br />';
}
Ultimately I would like it output in this format. Notice that the a href uses the job name, with spaces and / replaced by a -.
<tr>
<td class="small"> Graphic Artist / Designer<br />
$23,755 – $55,335 </td>
</tr>
<tr>
<td class="small"> Sales Associate<br />
$15,577 – $56,290 </td>
</tr>
<tr>
<td class="small"> Film / Video Editor<br />
$24,184 – $94,493 </td>
</tr>
Here's the table I'm scraping:
<table cellpadding="0" cellspacing="0" border="0" class="table01">
<tr>
<td class="head">Test</td>
<td class="job">
Graphic Artist / Designer<br/>
$23,755 – $55,335
</td>
</tr>
<tr>
<td class="head">Test</td>
<td class="job">
Sales Associate<br/>
$15,577 – $56,290
</td>
</tr>
<tr>
<td class="head">Test</td>
<td class="job">
Film / Video Editor<br/>
$24,184 – $94,493
</td>
</tr>
</table>
It may be better to use regular expressions:
<?php
$html=file_get_contents('1.html');
$jobs='';
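// Assumes each td.job wraps the job name in an <a href="..."> link, as the pattern requires.
if(preg_match_all("/<tr>.*?<td.*?>.*?<\/td>.*?<td\sclass=\"job\">.*?<a.+?href=\"(.+?)\".+?>(.*?)<\/a>(.*?)<\/td>.*?<\/tr>/ims", $html, $res))
{
foreach($res[1] as $i=>$uri)
{
// Turn the link target into a lower-case slug: "_/_" and "_" become "-".
$uri=strtolower(urldecode($uri));
$uri=preg_replace("/_\/_/",'-',$uri);
$uri=preg_replace("/_/",'-',$uri);
// Use the slug as the href; $res[2] is the job name, $res[3] the rest of the cell.
$jobs.='<tr><td class="small"> <a href="'.$uri.'">'.$res[2][$i].'</a>'.$res[3][$i].'</td></tr>'."\n";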
}
}
echo $jobs;
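The regex answer relies on each td.job wrapping the job name in an <a href> link, which the table shown in the question does not contain. As an alternative, here is a DOM-based sketch that builds the requested rows from the table as shown. The href format (a bare lower-case slug) and the helper names slugify() and jobRows() are assumptions; adjust the sprintf() template to your real link scheme.

```php
<?php
$html = <<<'HTML'
<table cellpadding="0" cellspacing="0" border="0" class="table01">
<tr>
<td class="head">Test</td>
<td class="job">
Graphic Artist / Designer<br/>
$23,755 – $55,335
</td>
</tr>
<tr>
<td class="head">Test</td>
<td class="job">
Sales Associate<br/>
$15,577 – $56,290
</td>
</tr>
<tr>
<td class="head">Test</td>
<td class="job">
Film / Video Editor<br/>
$24,184 – $94,493
</td>
</tr>
</table>
HTML;

/** Hypothetical slug rule from the question: lower-case, runs of spaces and "/" become "-". */
function slugify(string $title): string
{
    return trim(preg_replace('~[\s/]+~', '-', strtolower($title)), '-');
}

/** Hypothetical helper: one <tr><td class="small"> row per td.job cell. */
function jobRows(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The XML declaration makes libxml read the input as UTF-8 (the ranges contain "–").
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xp = new DOMXPath($doc);

    $rows = '';
    foreach ($xp->query("//table[@class='table01']//td[@class='job']") as $td) {
        // Each cell holds "Title<br/>Range": collect its non-empty direct text nodes.
        $parts = [];
        foreach ($xp->query('text()', $td) as $node) {
            $t = trim($node->nodeValue);
            if ($t !== '') {
                $parts[] = $t;
            }
        }
        if (count($parts) < 2) {
            continue; // skip cells that do not follow the expected layout
        }
        [$title, $range] = $parts;
        $rows .= sprintf(
            "<tr>\n<td class=\"small\"><a href=\"%s\">%s</a><br />\n%s</td>\n</tr>\n",
            htmlspecialchars(slugify($title)),
            htmlspecialchars($title),
            htmlspecialchars($range)
        );
    }
    return $rows;
}

echo jobRows($html);
```

Compared with the regex, the DOM version does not depend on attribute order or whitespace inside the tags, at the cost of needing the DOM extension.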

What's wrong with this preg_match_all

I'm using file_get_contents to read a .html file that has a table.
<table id="someTable" style="width:100%;margin-bottom:0;">
<tr style="display:none;">
<td style="padding-left:25px;">Some text</td>
</tr>
<tr style="display:none;">
<td style="padding-left:25px;">another text</td>
</tr>
</table>
When I use preg_match_all to read the table, count($matches[1]) is 0:
preg_match_all('/<table id="someTable" style="width:100%;margin-bottom:0;">(.*)<\/table>/', $html, $matches);
$co = count($matches[1]);
Add the s modifier to your pattern. Without it, . does not match newline characters, so (.*) cannot span the multi-line table:
preg_match_all('/<table id="someTable" style="width:100%;margin-bottom:0;">(.*)<\/table>/s', $html, $matches);
See http://ideone.com/3w0K2
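A quick way to see the difference the s modifier makes, using the table from the question: this sketch runs the same pattern with and without /s and reports the match counts.

```php
<?php
$html = <<<'HTML'
<table id="someTable" style="width:100%;margin-bottom:0;">
<tr style="display:none;">
<td style="padding-left:25px;">Some text</td>
</tr>
<tr style="display:none;">
<td style="padding-left:25px;">another text</td>
</tr>
</table>
HTML;

$pattern = '/<table id="someTable" style="width:100%;margin-bottom:0;">(.*)<\/table>/';

// Without /s, "." stops at newlines, so (.*) cannot reach the closing </table>.
$without = preg_match_all($pattern, $html, $m1);
// With /s (dot-all), "." also matches "\n" and the whole table body is captured.
$with = preg_match_all($pattern . 's', $html, $m2);

echo "without /s: $without match(es), with /s: $with match(es)\n";
```

If the file can contain several such tables, a lazy (.*?) keeps each match from running on to the last </table>.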
