Get all strings between two other strings in html document in PHP - php

I'm currently building a kind of crawler/proxy. It lets users navigate another website while staying on my own site. While the proxy loads a page, I'd also like to collect all the links and data on it.
The website contains many <tr> elements, each of which holds a lot of other markup.
Here is one example of the many on the website:
<tr>
<td class="vertTh">
<center>
Other
<br>
Document
</center>
</td>
<td>
<div class="Name">
Document Title Info
</div>
<a href="http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters" title="Source">
<img src="/static/img/icon-source.png" alt="Source">
</a>
<font class="Desc">Uploaded 03-24 14:02, Size 267.35 KB, ULed by <a class="Desc" href="/s/user/username/" title="Browse username">username</a></font>
</td>
<td align="right">67</td>
<td align="right">9</td>
</tr>
Users browse the proxy site, and while they do, it captures info from the original website.
I figured out how to get a string between two other strings, but I don't know how to turn this into a foreach loop or something similar.
Say I want to get the source link. Then I would do something like this:
$url = $_GET['url'];
$str = file_get_contents('https://database.com/' . $url);
$source = 'http://example.com/source/to/' . getStringBetween($str,'example.com/source/to/','" title="Source">'); // Output looking like this: http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters
function getStringBetween($str,$from,$to)
{
$sub = substr($str, strpos($str,$from)+strlen($from),strlen($str));
return substr($sub,0,strpos($sub,$to));
}
But I can't just do this, because the page contains many of these rows. Is there a way to get the source, name and size from all of them?

You might want to use preg_match_all so that you get a list of many matches. Then you can loop over it.
http://php.net/manual/en/function.preg-match-all.php
$html = '<tr>
<td class="vertTh">
<center>
Other
<br>
Document
</center>
</td>
<td>
<div class="Name">
Document Title Info
</div>
<a href="http://another-example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters" title="Source">
<img src="/static/img/icon-source.png" alt="Source">
</a>
<a href="http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters" title="Source">
<img src="/static/img/icon-source.png" alt="Source">
</a>
<font class="Desc">Uploaded 03-24 14:02, Size 267.35 KB, ULed by <a class="Desc" href="/s/user/username/" title="Browse username">username</a></font>
</td>
<td align="right">67</td>
<td align="right">9</td>
</tr>';
// use | as delimiter for pattern to make it a little cleaner
preg_match_all('|href="(http://.+?)" title="Source"|', $html, $matches);
// loop over $matches
var_dump($matches);
foreach ($matches[1] as $match) {
// $match == http://example.com/source/to/document/which%20can%20be%20very%20long%20and%20have%20weird%20characters
}
You can try this example at... http://phpfiddle.org/ or run it in a .php file locally. Good luck.
FYI: I added an extra anchor tag to illustrate finding another source.
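If you need the name and size as well as the source link, a DOM-based approach may be easier to maintain than one regex per field. The following is only a sketch: extractDocuments() is a hypothetical helper name, and it assumes every row looks like the sample in the question (a div with class "Name", a link with title "Source" and a font tag with class "Desc"). If a row holds several source links, only the first one is returned.

```php
<?php
// Hypothetical helper: returns one entry per <tr> that contains a "Source" link.
// Assumes the row layout shown in the question.
function extractDocuments(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $documents = [];

    foreach ($xpath->query('//tr[.//a[@title="Source"]]') as $row) {
        // string(...) returns the text of the first matching node, or '' if none.
        $name   = $xpath->evaluate('string(.//div[@class="Name"])', $row);
        $source = $xpath->evaluate('string(.//a[@title="Source"]/@href)', $row);
        $desc   = $xpath->evaluate('string(.//font[@class="Desc"])', $row);

        // The size sits between "Size " and the next comma.
        $size = preg_match('/Size\s+([^,]+),/', $desc, $m) ? trim($m[1]) : null;

        $documents[] = [
            'name'   => trim($name),
            'source' => $source,
            'size'   => $size,
        ];
    }

    return $documents;
}
```

You would call it with the page you already fetch, for example extractDocuments($str), and loop over the returned array with foreach.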

Related

Using DOMXpath to find data in not so nice html

I am trying to get some data from a plant list site. This proves to be a bit problematic because their html isn't really well-formed. These are two lines from the search result (disclaimer: I am not responsible for this code):
<tr>
<td>
<i class="glyphicons-icon leaf"></i>
</td>
<td>
<a title="Cimicifuga simplex" href="/taxon/wfo-0000604773" class="result">
<h4 class="h4Results"><em>Cimicifuga simplex</em>(DC.) Wormsk. ex Turcz.</h4>
</a>
Bull. Soc. Imp. Naturalistes Moscou<br/>
<div>
<em>Status:</em><span id="entryStatus">Synonym of </span>
<em>Actaea simplex</em>(DC.) Wormsk. ex Prantl
</div>
<div>
<em>Rank:</em><span id="entryRank">Species</span>
</div>
<div>
<em>Family:</em> Ranunculaceae
</div>
</td>
<td>
<img title="No Image Available" src="/css/images/no_image.jpg" class="thumbnail pull-right"/>
</td>
</tr>
<tr>
<td>
<i class="glyphicons-icon leaf"></i>
</td>
<td>
<a title="Actaea simplex" href="/taxon/wfo-0000519124" class="result">
<h4 class="h4Results"><strong><em>Actaea simplex</em>(DC.) Wormsk. ex Prantl</strong></h4>
</a>
Bot. Jahrb. Syst.<br/>
<div>
<em>Status:</em><span id="entryStatus">Accepted Name</span>
</div>
<div>
<em>Rank:</em><span id="entryRank">Species</span>
</div>
<div>
<em>Family:</em> Ranunculaceae</div>
<div>
<em>Order:</em> Ranunculales
</div>
</td>
<td>
<img title="No Image Available" src="/css/images/no_image.jpg" class="thumbnail pull-right"/>
</td>
</tr>
I added some layout myself, otherwise it wasn't readable.
Anyway, I loaded the page in php and DOMXpath and now I want to get two things:
Select the row that has Accepted Name in it
Get the species name and the corresponding link from it
In this case the result would be "Actaea simplex" and "/taxon/wfo-0000519124". Mind that there will be more results resembling the first row, and that the position of the row that I am looking for doesn't have to be the second one.
Normally I just try, use Google and try some more, and in the end I get there. In this case, though, IDs are used like classes and are not unique. This makes it impossible to use an XPath tester, and perhaps even makes DOMXPath useless.
So, is it possible to get my data with DOMXpath, and if yes - what query do I use?
Try something like:
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$target = $xpath->query("//td[.//span[.='Accepted Name']]/a");
$link = $target[0]->getAttribute('href');
$title = $target[0]->getAttribute('title');
echo $title," ",$link;
Output
Actaea simplex /taxon/wfo-0000519124
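Since the real page is not well-formed, loadHTML() may be more forgiving than loadXML(), and looping over the result covers the case where more than one row is marked as accepted. A minimal sketch under those assumptions; findAcceptedNames() is a made-up helper name, and the query assumes the markup shown in the question:

```php
<?php
// Hypothetical helper: returns title and link for every row whose status
// span reads "Accepted Name".
function findAcceptedNames(string $html): array
{
    $dom = new DOMDocument();
    // Duplicate ids and sloppy markup produce warnings; keep them internal.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = "//td[.//span[normalize-space(.)='Accepted Name']]/a[@class='result']";

    $results = [];
    foreach ($xpath->query($query) as $anchor) {
        $results[] = [
            'title' => $anchor->getAttribute('title'),
            'link'  => $anchor->getAttribute('href'),
        ];
    }

    return $results;
}
```

The position of the row does not matter here, because the query selects by the text of the status span rather than by index.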

How to remove HTML tags in CSV file- PHP

I am trying to generate a CSV download from PHP values, but the output file contains all the HTML tags.
I have used the following part of code for CSV file.
header('Content-Type: application/x-csv');
header('Content-disposition:attachment;filename=file.csv');
CURRENT OUTPUT:
<tr><td> 2017-11-05 10:38:05 </td> <td> 3 </a> </td> <td> Full Speed </a>
</td><td>56</td><td> <div id= "level2" > Level 2 </div> </td></tr>
<tr><td>2017-11-05 10:37:03 </td> <td>
Expected OUTPUT:
2017-11-05 10:38:05, 3 , Full , Level 2 , 2017-11-05 10:37:03.
NOTE: data is extracted from array variables in php (not from sql/database)
Just run strip_tags() on the string you want to filter.
If you don't want to strip all tags, you can use the second parameter of strip_tags()*
$str = ' <tr><td> 2017-11-05 10:38:05 </td> <td> 3 </a> </td> <td> Full Speed </a>
</td><td>56</td><td> <div id= "level2" > Level 2 </div> </td></tr>
<tr><td>2017-11-05 10:37:03 </td> <td> ';
$str = strip_tags($str); // Your data without html tags
If you want to have it in a single line, you can go further and do a str_replace() on it:
$str = str_replace(PHP_EOL, '', $str);
*Let's say you don't want to remove paragraphs or links; you would call strip_tags() like this:
$str = strip_tags($str, '<p><a>');
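strip_tags() on the whole string removes the markup but also loses the cell boundaries, so you still have to build the CSV columns yourself. One possible approach, sketched below, is to split the markup into rows and cells first, clean each cell, and let fputcsv() handle quoting. The helper names htmlRowsToArrays() and arraysToCsv() are made up for this example, and the regular expressions assume simple, non-nested tables like the one in the question.

```php
<?php
// Hypothetical helper: turns each <tr> into an array of cleaned cell values.
function htmlRowsToArrays(string $html): array
{
    $rows = [];
    // Split at every opening <tr>; the chunk before the first row is dropped.
    foreach (array_slice(preg_split('~<tr\b[^>]*>~i', $html), 1) as $rowHtml) {
        preg_match_all('~<td\b[^>]*>(.*?)</td>~is', $rowHtml, $cells);
        $row = [];
        foreach ($cells[1] as $cell) {
            // Remove tags, then collapse whitespace (including line breaks).
            $row[] = trim(preg_replace('/\s+/', ' ', strip_tags($cell)));
        }
        if ($row !== []) {
            $rows[] = $row;
        }
    }
    return $rows;
}

// Hypothetical helper: writes the rows as CSV text using fputcsv().
function arraysToCsv(array $rows): string
{
    $handle = fopen('php://memory', 'w+');
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);
    return $csv;
}
```

For a download you would send the two header() calls from the question and then echo arraysToCsv(htmlRowsToArrays($str)). Unclosed cells, such as the last one in the sample output, are skipped by this sketch.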

Which technique can I use to extract values that have no specific structure?

The following HTML code has no specific structure around the values; they are just plain text. How can I get the following from it? (Below you can find what I did using regex.)
231435424
1800cc
163bhp
Automatic
Petrol
Blue
Here is the HTML code
<td class="details">
<span class="p_t">Audi A4</span>
<a class="info" href="./view/3505089/">(Details)</a><br>
<div class="attribs">
Roadster
<br>
P.O: 35562, <span class="p_l">BURON</span>, Phone. 231435424<br>
1800cc,
163bhp,
Automatic,
Petrol,
Blue,
</div>
</td>
Here is what I was doing with regex
$bhps = array();
$gears = array();
preg_match_all('/(\d{2,3})bhp\b,/', $str2b, $bhps);
preg_match_all('#(A(.*?)tomatic|Ma(.*?)ual)#u', $str2b, $gears);
foreach .......
$bhp = $bhps[1][$key];
$gear = $gears[1][$key];
........
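One way to avoid a separate regex per attribute is to anchor on the phone number and then split whatever follows on commas. This is only a sketch: parseListings() is a hypothetical name, and it assumes every listing has the "Phone. <digits><br>" line followed by the comma-separated attributes inside the same div, as in the sample above.

```php
<?php
// Hypothetical helper: returns, for every listing, the phone number followed
// by the comma-separated attributes (engine size, power, gearbox, fuel, colour).
function parseListings(string $html): array
{
    $listings = [];
    // Capture the phone number and everything between the next <br> and </div>.
    preg_match_all(
        '~Phone\.\s*(\d+)\s*<br\s*/?>(.*?)</div>~is',
        $html,
        $matches,
        PREG_SET_ORDER
    );

    foreach ($matches as $m) {
        // Split the tail on commas; the trailing comma leaves an empty piece,
        // which PREG_SPLIT_NO_EMPTY discards.
        $parts = preg_split('/\s*,\s*/', trim(strip_tags($m[2])), -1, PREG_SPLIT_NO_EMPTY);
        $listings[] = array_merge([$m[1]], $parts);
    }

    return $listings;
}
```

Each entry can then be read by position, for example $listing[2] for the bhp value, or validated further if some listings omit an attribute.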

make array object in for loop with horizontal view

<?php
$xml = simplexml_load_file('http://www.google.com/ig/api?weather=London');
$information = $xml->xpath("/xml_api_reply/weather/forecast_information");
$current = $xml->xpath("/xml_api_reply/weather/current_conditions");
$forecast_list = $xml->xpath("/xml_api_reply/weather/forecast_conditions");
?>
<html>
<head>
<title>Google Weather API</title>
</head>
<body>
<h1><?php print $information[0]->city['data']; ?></h1>
<h2>Today's weather</h2>
<div class="weather">
<img src="<?php echo 'http://www.google.com' . $current[0]->icon['data']?>" alt="weather">
<span class="condition">
<?php echo round(conver_f_c($current[0]->temp_f['data'])); ?>° C,
<?php echo $current[0]->condition['data'] ?>
</span>
</div>
<h2>Forecast</h2>
<?php foreach ($forecast_list as $forecast) : ?>
<div class="weather">
<img src="<?php echo 'http://www.google.com' . $forecast->icon['data']?>" alt="weather">
<div><?php echo $forecast->day_of_week['data']; ?></div>
<span class="condition">
<?php echo round(conver_f_c($forecast->low['data'])); ?>° C - <?php echo round(conver_f_c($forecast->high['data'])); ?>° C,
<?php echo $forecast->condition['data'] ?>
</span>
</div>
<?php endforeach ?>
</body>
</html>
<?php
function conver_f_c($F){
return ($F - 32) * 5 / 9;
}
I want the output laid out horizontally, exactly as in the screenshot below.
I tried a UL/LI list with display: inline, but that failed.
How can I render this with PHP? Any good suggestions for my problem are welcome.
Thanks
Screenshot: http://img163.imageshack.us/img163/7518/weatherhori.jpg
The snippet above currently displays the forecast vertically; I want to change that to horizontal, as in the screenshot.
<table>...</table>
Update
From your latest comment so far:
i know how to fetch array and display
it, but i dont know to fetch and
display in the verticl manner that is
the stuck up
I feel this is going to be a stupid answer but it appears to be what you don't understand...
The web is based on a markup language called HTML. This language has tags (delimited by angle brackets) that let you define the structure of a document. On top of this, you have another language called CSS, which lets you define how the HTML is displayed on screen.
You may argue that you already have a web page and you've written it in PHP instead of the two other languages I've mentioned. That's not entirely true: you code in PHP, sure, but you use PHP to generate HTML. And it's HTML that finally reaches the browser (Firefox, Explorer...). PHP is executed on the web server, not in the browser. The browser can only see whatever HTML you've generated.
To sum up: you have to forget about PHP, Google and the whole weather thingy. You first need to write a static HTML document and style it with CSS. Once you're done with that, you can finally replace the dynamic parts of the information with values taken from your PHP variables.
And since you seem to need a table to display tabular data, the appropriate HTML tag is <table>:
<table>
<tr>
<th>Wed</th>
<th>Thu</th>
<th>Fri</th>
<th>Sat</th>
</tr>
<tr>
<td><img src="/path/to/pics/cloudy.png" width="25" height="25" alt="Cloudy"></td>
<td><img src="/path/to/pics/sunny.png" width="25" height="25" alt="Sunny"></td>
<td><img src="/path/to/pics/rainy.png" width="25" height="25" alt="Rainy"></td>
<td><img src="/path/to/pics/cloudy.png" width="25" height="25" alt="Cloudy"></td>
</tr>
<tr>
<td>26ºC</td>
<td>26ºC</td>
<td>22ºC</td>
<td>25ºC</td>
</tr>
</table>
I suggest you find some tutorials about basic HTML and CSS. They'll be of invaluable help.
This is what's done by Google :
http://jsfiddle.net/bW8NA/1
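Once the static table looks right, generating it from PHP mostly means building each table row across all days instead of one block per day. A rough sketch, assuming the forecast has already been copied into a plain array; the function name renderForecastTable() and the array keys (day, icon, condition, low, high) are invented for this example:

```php
<?php
// Hypothetical helper: renders one table column per day, so the forecast
// reads left to right instead of top to bottom.
function renderForecastTable(array $days): string
{
    $head = $icons = $temps = '';

    foreach ($days as $day) {
        // Each day contributes one cell to each of the three rows.
        $head  .= '<th>' . htmlspecialchars($day['day']) . '</th>';
        $icons .= '<td><img src="' . htmlspecialchars($day['icon'])
                . '" alt="' . htmlspecialchars($day['condition']) . '"></td>';
        $temps .= '<td>' . (int) round($day['low']) . '°C - '
                . (int) round($day['high']) . '°C</td>';
    }

    return "<table>\n<tr>$head</tr>\n<tr>$icons</tr>\n<tr>$temps</tr>\n</table>";
}

// Example data; in the real page these values would come from $forecast_list.
$days = [
    ['day' => 'Wed', 'icon' => '/path/to/pics/cloudy.png', 'condition' => 'Cloudy', 'low' => 18, 'high' => 26],
    ['day' => 'Thu', 'icon' => '/path/to/pics/sunny.png',  'condition' => 'Sunny',  'low' => 19, 'high' => 26],
];

echo renderForecastTable($days);
```

The key design choice is collecting the cells for each row in separate strings during a single loop, which keeps the days aligned column by column.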

basic php form help

I'm trying to make a calculator that will take inputs from users and estimate for them how much money they'll save if they use various different VoIP services.
I've set it up like this:
<form method="get" action="voip_calculator.php">
How much is your monthly phone bill?
<input name="monthlybill" type="text" value="$" size="8">
<p><input type="submit" name="Submit" value="Submit">
</p>
</form>
On voip_calculator.php, the page the form points to, I want to read "monthlybill" but I can't figure out how to do it. I also can't figure out how to make it do the subtraction on the numbers in the rows.
This may be very simple to you but it's very frustrating to me and I am humbly asking for a bit of help. Thank you!
Here is the relevant stuff from voip_calculator, you can also click on the url and submit a number to see it in (in)action. I tried various times to call it with no success:
<table width="100%;" border="0" cellspacing="0" cellpadding="0"class="credit_table2" >
<tr class="credit_table2_brd">
<td class="credit_table2_brd_lbl" width="100px;">Services:</td>
<td class="credit_table2_brd_lbl" width="120px;">Our Ratings:</td>
<td class="credit_table2_brd_lbl" width="155px;">Your Annual Savings:</td>
</tr>
Your monthly bill was <?php echo 'monthlybill' ?>
<?php echo "$monthlybill"; ?>
<?php echo "monthlybill"; ?>
<?php echo '$monthlybill'; ?>
<?php echo 'monthlybill'; ?>
<?php
$monthybill="monthlybill";
$re=1;
$offer ='offer'.$re.'name';
$offername= ${$offer};
while($offername!="") {
$offerlo ='offer'.$re.'logo';
$offerlogo=${$offerlo};
$offerli ='offer'.$re.'link';
$offerlink=${$offerli};
$offeran ='offer'.$re.'anchor';
$offeranchor=${$offeran};
$offerst ='offer'.$re.'star1';
$offerstar=${$offerst};
$offerbot='offer'.$re.'bottomline';
$offerbottomline=${$offerbot};
$offerca ='offer'.$re.'calcsavings';
$offercalcsavings=${$offerca};
echo '<tr >
<td >
<a href="'.$offerlink.'" target="blank">
<img src="http://www.nextadvisor.com'.$offerlogo.'" alt="'.$offername.'" />
</a>
</td>
<td >
<span class="rating_text">Rating:</span>
<span class="star_rating1">
<img src="IMAGE'.$offerstar.'" alt="" />
</span>
<br />
<div style="margin-top:5px; color:#0000FF;">
Go to Site
<span style="margin:0px 7px 0px 7px;">|</span>
Review
</div>
</td>
<td >'.$offercalcsavings.'</td>
</tr>';
$re=$re+1;
$offer ='offer'.$re.'name';
$offername= ${$offer};
}
?>
The variables $offer1calcsavings through $offer7calcsavings come from a file called values.php, where they are defined like this:
$offer1calcsavings="24.99";
$offer2calcsavings="20.00";
$offer3calcsavings="21.95";
$offer4calcsavings="23.95";
$offer5calcsavings="19.95";
$offer6calcsavings="23.97";
$offer7calcsavings="24.99";
One thing I do is this:
echo "<pre>";
print_r($_GET);
echo "</pre>";
Put this somewhere on your receiving end and you'll get an understanding of what's happening.
There isn't enough detail for a full answer, but let's simplify and assume you have the 'savings' numbers in an array, say $companySavings. You need to subtract each of these from the value the user specifies, right? You don't need to call anything (though you could if you wanted to).
When the user clicks 'Submit' and the page loads again, pull monthlybill into a variable, e.g.
$monthlyBill = $_GET['monthlybill']; //you should do some checking to prevent attacks but that's another matter
Then, when you are building the list of savings it would look something like this
<?php
//...code for the rest of the page and starting your table
foreach($companySavings as $savings){//insert each row into the table
echo '<tr><td><!-- company name/image, whatever.. --></td><td>$' . ($monthlyBill - $savings) . '</td></tr>';
}
//... end the table and rest of code
?>
You need to get the value from the QueryString and put it into a PHP variable.
Like this:
$monthlyBill = $_GET['monthlybill'];
Now the variable $monthlyBill contains the value from the QueryString.
To display it:
echo "Your monthly bill is: $monthlyBill";
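Putting the pieces together, the receiving page could clean the input before doing any arithmetic, since the form pre-fills the field with a "$". The sketch below makes two assumptions that are not confirmed by the question: that the offer values are monthly prices, and that the annual saving is the monthly difference times 12. parseMonthlyBill() and annualSavings() are hypothetical helper names.

```php
<?php
// Hypothetical helper: turns raw form input such as "$49.99" into a float,
// or returns null when the input is not a non-negative number.
function parseMonthlyBill(string $raw): ?float
{
    $clean = trim(str_replace(['$', ','], '', $raw));
    return is_numeric($clean) && $clean >= 0 ? (float) $clean : null;
}

// Assumption: each offer value is a monthly price, so the yearly saving is
// the monthly difference multiplied by 12.
function annualSavings(float $monthlyBill, float $offerPrice): float
{
    return round(($monthlyBill - $offerPrice) * 12, 2);
}

$raw = $_GET['monthlybill'] ?? '';
$monthlyBill = is_string($raw) ? parseMonthlyBill($raw) : null;

if ($monthlyBill !== null) {
    // Example prices; in the real page these would come from values.php.
    foreach ([24.99, 20.00, 21.95] as $price) {
        echo '<tr><td>$' . number_format(annualSavings($monthlyBill, $price), 2) . "</td></tr>\n";
    }
}
```

Nothing is printed when the field is missing or not numeric, which also covers the basic input checking mentioned in the earlier answer.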
