extract specific data from webpage using php - php

I want to create a PHP script that alerts me when a new notice is published on my work website. Take the following page URL:
http://www.mahapwd.com/nit/ueviewnotice.asp?noticeid=1767
From this page I want variables for the Date & Time of Meeting (date and time as two separate variables),
Place of Meeting, and Published On.
Please help me create a working PHP script.
I tried to write the following script, but it gives too many errors:
<?php
$url1 = "http://www.mahapwd.com/nit/ueIndex.asp?district=12";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
preg_match("/href=(.*)\", $data, $urldata);
$url2 = "http://www.mahapwd.com/nit/$urldata[1];
curl_setopt($ch, CURLOPT_URL, $url2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data2 = curl_exec($ch);
preg_match("/Published On:</b>(.*)<\/font>", $data, $pubDt);
$PubDate = $pubDt[1];
preg_match("/Time of Meeting:</b>(.*)&nbsp", $data, $MtDt);
$MeetDate = $MtDt[1];
preg_match("/Time of Meeting:</b>$MtDt[1]&nbsp(.*)</font>", $data, $MtTime);
$MeetTime = $MtTime[1];
preg_match("/Place of Meeting:</b>(.*)<\/font>", $data, $pubDt);
$PubDate = $pubDt[1];
?>

Hello, I have written some simple code for you. You can download simple_html_dom.php from http://simplehtmldom.sourceforge.net/
<?php
require_once "simple_html_dom.php";
$url = 'http://www.mahapwd.com/nit/ueviewnotice.asp?noticeid=1767';
// fetch and parse the page
$html1 = file_get_html($url);
if (!$html1) {
    echo "no content";
} else {
    // find all tables in the parsed HTML
    $element1 = $html1->find('table');
    // the third table is the one you need
    $input = $element1[2];
    // now you can walk the cells of that table
    foreach ($input->find('td') as $element) {
        // read the cell text here (e.g. $element->plaintext), save it to a database, then check it
    }
}
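If you would rather stay with regular expressions, the patterns from the question can be repaired along these lines. This is only a sketch: the sample markup below is an assumption inferred from the patterns in the question, not the real page, and extract_notice_fields is a hypothetical helper name. It uses ~ as the regex delimiter so that the slashes in </b> and </font> do not need escaping, and lazy (.*?) groups so each match stops at the first closing tag.

```php
<?php
// Hypothetical markup, inferred only from the regexes in the question.
$sample = '<font><b>Date &amp; Time of Meeting:</b> 15/03/2015&nbsp;11:00 AM</font>'
        . '<font><b>Place of Meeting:</b> Conference Hall</font>'
        . '<font><b>Published On:</b> 12/03/2015</font>';

// Pull the four values out of a notice page; missing fields stay null.
function extract_notice_fields(string $html): array
{
    $fields = ['meet_date' => null, 'meet_time' => null, 'place' => null, 'published' => null];

    // Date and time are assumed to be separated by a single &nbsp; entity.
    if (preg_match('~Time of Meeting:</b>\s*(.*?)&nbsp;?\s*(.*?)</font>~is', $html, $m)) {
        $fields['meet_date'] = trim($m[1]);
        $fields['meet_time'] = trim($m[2]);
    }
    if (preg_match('~Place of Meeting:</b>\s*(.*?)</font>~is', $html, $m)) {
        $fields['place'] = trim($m[1]);
    }
    if (preg_match('~Published On:</b>\s*(.*?)</font>~is', $html, $m)) {
        $fields['published'] = trim($m[1]);
    }
    return $fields;
}

$notice = extract_notice_fields($sample);
print_r($notice);
```

In the original script, note also that the second page is stored in $data2 while the patterns are run against $data, so the later matches would search the wrong page.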

Related

How to pass store selection in Simple HTML DOM parser

I'm trying to fetch a product name and price on this website Toplivo.bg
I am using the Simple HTML DOM parser to get it. Here is my code
<?php
include_once('simple_html_dom.php');
$link="https://toplivo.bg/en/products/Construction-materials/Dry-construction-mixtures/Screeds-and-flooring";
$html = file_get_html($link);
//Price
foreach ($html->find('div[class="content"]') as $text){
echo $text -> plaintext.'<br>';
}
?>
The problem is that first, I need to select the warehouse on the website to get the price for "Baumit Cement screed Baumit Solido E160, 25 kg".
Can I select it by default through PHP code? For example, I want to select the "Plovdiv region -> Plovdiv Store"
Thanks for helping!
This can be achieved using cURL. Complete code below:
<?php
include_once('simple_html_dom.php');
$link = "https://toplivo.bg/en/products/Construction-materials/Dry-construction-mixtures/Screeds-and-flooring";
// let's use curl to create a get request first to select a store while keeping the session using a cookie file
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://toplivo.bg/izborNaSklad/39');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie-45fg.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie-45fg.txt');
$output = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, $link); // now let's fetch the raw content of the store products page
$output = curl_exec($ch);
$html = str_get_html($output); // since we have the raw input, we can use the str_get_html method instead of file_get_html
//Price
foreach ($html->find('div[class="content"]') as $text){
echo $text->plaintext . '<br>';
}
?>

curl_exec returns empty string

I'm still a bit new to using curl to pull data and I've recently started using Fiddler to help find what options need to be set.
I'm trying to see if I can pull an image from a site. I first hit a search page, set the search parameters, and then start following links in the results. When I attempt to follow a link in one of the results to an image, I get an empty string returned from curl_exec().
The weird thing is - at one point, it worked - I got the data back and successfully saved the image locally. But then it stopped, and I have no idea what I was doing to have it working. Naturally, everything works OK in the browser. :(
I'm using Simple HTML DOM to parse the results and cURL for the actual page requests. curl_error() does not show an error, and curl_getinfo() reports that everything is OK too. It's probably something trivial, but I'm not sure how to troubleshoot it beyond where I am.
<?php
include 'includes/simple_html_dom.php';
$url = "http://nwweb.co.bell.tx.us/NewWorld.Aegis.WebPortal/Corrections/InmateInquiry.aspx";
// Get Cookie - ASP.NET_SessionId
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$r = curl_exec($ch);
preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $r, $matches);
$cookies = array();
foreach($matches[1] as $item)
{
parse_str($item, $cookie);
$cookies = array_merge($cookies, $cookie);
}
$sessionCookie = "ASP_NET_SessionId=".$cookies['ASP_NET_SessionId'];
// now load up page into Simple HTML DOM and get all inputs - ignore buttons and populate our dates
$startDate = "02%2F01%2F2000";
$endDate = "02%2F07%2F2016";
$getInputs = str_get_html($r);
$inputs = $getInputs->find('input');
$inputs_array = array();
$buttons_array = array();
for ($i=0; $i<count($inputs); $i++)
{
if ($inputs[$i]->type != "submit")
{
$inputs_array[$inputs[$i]->id] = $inputs[$i]->value;
if (stripos($inputs[$i]->id, "FromDate") > 0)
$inputs_array[$inputs[$i]->id] = $startDate;
if (stripos($inputs[$i]->id, "ToDate") > 0)
$inputs_array[$inputs[$i]->id] = $endDate;
}
}
// build up our curl data - includes hidden inputs, our to & from dates, plus the Search button
$curl_data = http_build_query($inputs_array)."&ctl00%24DefaultContent%24uxSearch=Search";
// POST the data, include session cookie
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curl_data);
curl_setopt($ch, CURLOPT_COOKIE, $sessionCookie);
$response = curl_exec($ch);
// this shows that we can get data
// find the links from the HTML
$htmlDom = str_get_html($response); // load up Simple HTML DOM
// get the table of results
$divTable = $htmlDom->find('div#ctl00_DefaultContent_uxResultsWrapper',0)->find('table',0);
$rows = $divTable->find('tr');
for ($i=1; $i<count($rows);$i++)
{
if ($i>3) break; // limit the length of script for debugging
$link = $rows[$i]->find('td',1)->find('a',0)->href;
// build up query to get inmate details from the link above
$url = "http://nwweb.co.bell.tx.us/NewWorld.Aegis.WebPortal/Corrections/".$link;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_COOKIE, $sessionCookie);
$page = curl_exec($ch);
$pageData = str_get_html($page);
// Now find the Photo, there's a thumb in div.BookingPhotos
// It is linked to a full size image, the link is of the form http://nwweb.co.bell.tx.us/NewWorld.Aegis.WebPortal/GetImage.aspx?ImageKey=17C030IS, but in the href, it has ../GetImage.aspx?ImageKey=xxxx
$photoLink = $pageData->find('div.BookingPhotos',0)->find('a',0)->href;
// get rid of .. and put the base URL on the front
$imgLink = str_replace("..", "http://nwweb.co.bell.tx.us/NewWorld.Aegis.WebPortal", $photoLink);
// now attempt to pull the image
$ch = curl_init($imgLink);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_COOKIE, $sessionCookie);
// here is the PROBLEM - NO DATA RETURNED
$imgData = curl_exec($ch); // I get a header back, but NO data
}
?>
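One thing worth checking in the script above (it may or may not be the cause of the empty image) is the cookie name. parse_str() applies PHP's variable-name rules, so a dot in a name is turned into an underscore, and the cookie is then sent back as ASP_NET_SessionId instead of ASP.NET_SessionId. The sketch below splits each Set-Cookie pair by hand so the name survives unchanged; extract_cookies is a hypothetical helper and the header text is made-up sample data.

```php
<?php
// Collect name => value pairs from raw response headers without parse_str(),
// so names such as "ASP.NET_SessionId" keep their dot.
function extract_cookies(string $rawHeaders): array
{
    $cookies = [];
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $rawHeaders, $matches);
    foreach ($matches[1] as $pair) {
        $parts = explode('=', $pair, 2); // split on the first "=" only
        if (count($parts) === 2) {
            $cookies[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $cookies;
}

// Made-up sample headers for illustration.
$headers = "HTTP/1.1 200 OK\r\n"
         . "Set-Cookie: ASP.NET_SessionId=abc123; path=/; HttpOnly\r\n"
         . "Content-Type: text/html\r\n\r\n";

$cookies = extract_cookies($headers);

// For comparison: parse_str() rewrites the dot in the name.
parse_str('ASP.NET_SessionId=abc123', $mangled);

print_r($cookies);
print_r($mangled);
```

The session cookie string could then be built as "ASP.NET_SessionId=" . $cookies['ASP.NET_SessionId']. Keep in mind too that with CURLOPT_HEADER enabled, the string returned by curl_exec() starts with the response headers, so the image bytes (if any) follow them rather than standing alone.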

Processing CSV through php function

I made a PHP function that works with an API to show me the dollar balance of an account. This function is called: get_balance
function get_balance($account){
$url = 'http://mycompaniesurl.com/' . $account; // url of website
global $response;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$response = curl_exec($ch);
curl_close($ch);
}
The function get_balance returns the output in the variable $response
I'm certain that the function works, so I have no questions about that part. However, I'm trying to process account numbers stored in a CSV file. I read the CSV file with the following code:
$file = new SplFileObject("test.csv");
$file->setFlags(SplFileObject::READ_CSV);
$data = call_user_func_array('array_merge', iterator_to_array($file));
$data = array_combine(range(1, count($data)), $data);
extract($data, EXTR_PREFIX_ALL, 'variable');
I'm testing my code with a csv file called test.csv, containing 4 addresses (first one has a balance of 0, other 3 have a balance of >0).
With the following code I get the balance with the accountnumber printed to my screen:
get_balance($data[2]);
if ($response > 0){
echo $response,"---------------",$data[2];
}
Because $data[1] has a balance of 0, nothing is printed. $data[2], $data[3] and $data[4] have a balance of more than 0, so they do return the balance together with the account number.
My question is: is there a way to do this automatically? Something like
get_balance($data[]);
does not seem to work. The CSV that this PHP file has to process contains about 1000 account numbers and may contain more in the future, so typing everything from get_balance($data[1]) up to get_balance($data[999]) would be very time-consuming.
Is there a (simple) way to apply the function to ALL the entries in $data?
Yes, there is; it is called array_map:
$balanced_data = array_map("get_balance", $data);
This will fill $balanced_data with null values, because at the moment get_balance returns nothing and uses a global variable instead. Change it to something like this:
function get_balance($account){
$url = 'http://mycompaniesurl.com/' . $account; // url of website
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$response = curl_exec($ch);
curl_close($ch);
return $response;
}
If you don't want to or can't change get_balance, you will need a workaround like this:
$balanced_data = array();
// $data is keyed from 1 (see the array_combine call), so iterate over its own keys
foreach ($data as $i => $account) {
    $response = 0; // reset to make sure we get a new value
    get_balance($account);
    $balanced_data[$i] = $response;
}
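Once get_balance returns its value, the array_map call can be combined with array_filter to reproduce the "print only balances above 0" check from the question. A minimal sketch follows; positive_balances is a hypothetical helper, and $stub stands in for the real get_balance so the example runs without network access.

```php
<?php
// Apply $fetch to every account and keep only balances greater than 0.
// array_map preserves keys when given a single array, and so does
// array_filter, so each balance can be matched back to its account.
function positive_balances(array $accounts, callable $fetch): array
{
    $balances = array_map($fetch, $accounts);
    return array_filter($balances, fn($balance) => $balance > 0);
}

// Stand-in for get_balance(): made-up accounts and balances.
$stub = fn(string $account): float =>
    ['A1' => 0.0, 'B2' => 12.5, 'C3' => 3.0][$account] ?? 0.0;

$data = [1 => 'A1', 2 => 'B2', 3 => 'C3'];

foreach (positive_balances($data, $stub) as $i => $balance) {
    echo $balance, '---------------', $data[$i], PHP_EOL;
}
```

With the real function in place, the call would be positive_balances($data, 'get_balance').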

php curl search list of webpage

I have a list of URLs in a text file, separated by \n.
I want to fetch every page with PHP cURL and search it for a specific string.
The specific string is:
http://www.....com/xyz/...png or .gif
$ch = curl_init("ARRAY of page from txt????");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$text = curl_exec($ch);
$test = strpos($text, "HOW CREATE THE SPECIFIC STRING???");
if ($test==false)
{
echo $test;
}
else
{
echo "no exist";
}
<?php
$array = explode("\n", file_get_contents('fileName.txt'));
$result = array();
foreach ($array as $url) {
    $url = trim($url); // strip stray whitespace such as "\r"
    if ($url === '') continue; // skip blank lines
    $ch = curl_init();
    // set url to the current entry from the file
    curl_setopt($ch, CURLOPT_URL, $url);
    // return the transfer as a string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // $output contains the output string
    $output = curl_exec($ch);
    // close curl resource to free up system resources
    curl_close($ch);
    $result[] = $output;
}
?>
In the end, the array $result contains the HTML of every link listed in your text file.
Do an explode on the URL string, like:
$urls = explode("\n", $urlstring); // double quotes, so \n is a real newline
Now you can loop through the array of URLs.
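For the search itself, a regular expression is a better fit than strpos when the target is "any URL ending in .png or .gif". The sketch below is an assumption about what the question wants: find_image_urls is a hypothetical helper that collects every absolute http(s) URL with one of those two extensions, and the sample HTML is made up. If you do use strpos for a fixed string, compare its result with !== false, since a match at position 0 is also falsy.

```php
<?php
// Return the distinct absolute URLs ending in .png or .gif found in $html.
function find_image_urls(string $html): array
{
    preg_match_all('~https?://[^\s"\'<>]+\.(?:png|gif)\b~i', $html, $m);
    return array_values(array_unique($m[0]));
}

// Made-up page content for illustration.
$page = '<img src="http://example.com/xyz/logo.png"> '
      . '<a href="http://example.com/xyz/anim.gif">x</a> '
      . '<img src="http://example.com/a.jpg">';

$found = find_image_urls($page);
print_r($found);
```

Inside the loop over the fetched pages, each $output string can be passed to this function in place of $page.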

simplexml_load_file from file not ending with .xml

I'm trying to parse an XML file, starting with simplexml_load_file to load the contents. The file comes from a WordPress site, which serves an XML feed generated by a .php file.
The problem is that the XML file never loads. I'm not sure what I can do to make this work. Here is the code:
<?php
$url = "http://marshallmashup.usc.edu/feed.php";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$result = curl_exec($ch);
curl_close($ch);
$rss = simplexml_load_string($result);
if( ! $rss = simplexml_load_file($url,NULL, LIBXML_NOERROR | LIBXML_NOWARNING) )
{
echo 'unable to load XML file';
}
else
{
echo 'XML file loaded successfully';
}
?>
First of all after this line:
$result = curl_exec($ch);
you should add this one:
$result = utf8_encode($result);
That said, you'll have no problems with simplexml_load_string($result), which will correctly build a SimpleXMLElement from the string you pass it, that is, the feed fetched from the PHP page. You can see the result by calling var_dump($rss); after the statement $rss = simplexml_load_string($result);.
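To see why a feed fails to parse, rather than only that it failed, libxml's own error list can be inspected. The following is a sketch under assumptions: load_feed is a hypothetical helper and the sample strings are made up. It parses an XML string that has already been fetched (for example with cURL) and prints the parser's messages on failure. Note also that utf8_encode assumes ISO-8859-1 input and is deprecated as of PHP 8.2; mb_convert_encoding($result, 'UTF-8', 'ISO-8859-1') is the usual replacement when the feed really is in that encoding.

```php
<?php
// Parse an XML string; return null and print libxml's messages on failure.
function load_feed(string $xml): ?SimpleXMLElement
{
    // Collect parser errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $rss = simplexml_load_string($xml);
    if ($rss === false) {
        foreach (libxml_get_errors() as $error) {
            echo trim($error->message), PHP_EOL;
        }
        libxml_clear_errors();
    }
    // Restore whatever error mode was active before the call.
    libxml_use_internal_errors($previous);
    return $rss === false ? null : $rss;
}

// Made-up feed content for illustration.
$sample = '<rss version="2.0"><channel><title>Demo feed</title></channel></rss>';
$rss = load_feed($sample);
echo $rss->channel->title, PHP_EOL;
```

The file extension in the URL does not matter here, because the parser only ever sees the response body.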
