web scrape using preg_match_all - php

I'm trying to get the contact information from this site http://www.internic.net/registrars/registrar-967.html using PHP. I was able to get the email address from the href links by doing this:
$contactStr = "http://www.internic.net/registrars/registrar-967.html";
$contact_string = file_get_contents("$contactStr");
preg_match_all('/<a href="(.*)">(.*)<\/a>/i', $contact_string, $contactInfo);
$email = str_replace("mailto:", "", $contactInfo[1][6]);
However, I'm having a hard time getting the address and the phone number, since they are not wrapped in an HTML element (such as <p>) that I could match. I just need "1800 SW First Ave., Suite 440 Portland OR 97201 United States" and "310-467-2549" from this page. How can I do this
using preg_match_all, or in some other way? Thanks!

Instead of using regex, try DOMDocument, as others have said in the comments.
Here is an example (a bit hacky, though); I hope it helps:
function get_register_by_id($id){
    $site = file_get_contents('http://www.internic.net/registrars/registrar-'.$id.'.html');
    $dom = new DOMDocument();
    // Suppress warnings about malformed HTML
    @$dom->loadHTML($site);
    $result = array();
    foreach($dom->getElementsByTagName('td') as $td) {
        // Only look at cells with width="420"
        if($td->getAttribute('width')=='420'){
            $innerHTML= '';
            $children = $td->childNodes;
            foreach ($children as $child) {
                $innerHTML .= trim($child->ownerDocument->saveXML($child));
            }
            // Split on line breaks, then trim and strip tags from each piece
            $fixed = array_map('strip_tags', array_map('trim', explode("<br/>",trim($innerHTML))));
            foreach($fixed as $val){
                if(empty($val)){continue;}
                $result[] = str_replace(array('! '),'',$val);
            }
        }
    }
    return $result;
}
print_r(get_register_by_id(965));
/*Array
(
[0] => Domain Central Australia Pty Ltd.
[1] => Level 27
[2] => 101 Collins Street
[3] => Melbourne Victoria 3000
[4] => Australia
[5] => +64 300 4192
[6] => robert.rolls@domaincentral.com.au
)*/
print_r(get_register_by_id(966));
/*
Array
(
[0] => Web Business, LLC
[1] => PO Box 1417
[2] => Golden CO 80402
[3] => United States
[4] => +1.303.524.3469
[5] => support@webbusiness.biz
)*/
print_r(get_register_by_id(967));
/*
Array
(
[0] => #1 Host Australia, Inc.
[1] => 1800 SW First Ave., Suite 440
[2] => Portland OR 97201
[3] => United States
[4] => 310-467-2549
[5] => registry-operations@moniker.com
)*/
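As a rough alternative to rebuilding the inner HTML and splitting on <br/>, the text nodes of the cell can be collected directly with DOMXPath. This is only a sketch: the sample markup, the registrar details in it, and the function name extract_contact_lines are made up for illustration, and it assumes the contact details live in a td with width="420", as the code above does.

```php
<?php
// Hypothetical sample markup standing in for the real page (layout is an assumption).
$html = '<table><tr><td width="420">Example Registrar, Inc.<br>'
      . '1 Example St.<br>Portland OR 97201<br>United States<br>555-0100<br>'
      . '<a href="mailto:ops@example.com">ops@example.com</a></td></tr></table>';

// Hypothetical helper: returns every non-empty text node inside td[width=420].
function extract_contact_lines(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $lines = [];
    foreach ($xpath->query('//td[@width="420"]//text()') as $textNode) {
        $text = trim($textNode->nodeValue);
        if ($text !== '') {
            $lines[] = $text;
        }
    }
    return $lines;
}

$lines = extract_contact_lines($html);
print_r($lines);
```

Each line separated by a <br> becomes its own array element, so the address and phone number can be picked out by position or matched by pattern afterwards.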

PHP - concatenate ' checked' to each string value in multidimensional array using recursion

I have a multidimensional array in PHP, and want to concatenate a string onto each string element using recursion. The array is as follows:
$array = Array
(
[p] => Array
(
[0] => This Porsche 993 Carrera Cabriolet represents a great opportunity to acquire an open-top variant of one of the most coveted 911 models.
[1] => First registered on 5 August 1994, M912 SGY displays 10,630 miles on the odometer with a clock change at 66,244 miles in 2014.
[2] => The car’s Aventura Green metallic paintwork is reported to be in good condition, presenting well for its age and mileage.
[3] => The Marble Grey leather interior is believed to be entirely original.
[4] => Serviced by Porsche specialist Portiacraft in July 2020 at 76,598 miles, this consisted of an annual oil and filter service.
[5] => The last MOT was undertaken on 6 July 2020 at 76,598 miles.
[6] => It is supplied with a Porsche Club Great Britain folder with records of main dealer and specialist service history.
[7] => This Porsche 911 Carrera Cabriolet presents in highly original and well-maintained condition.
[8] => Summary of maintenance history:
[9] => Array
(
[strong] => The description of this auction lot is, to the best of the seller's knowledge, accurate and not misleading.
)
[10] => Array
(
[strong] => All UK-registered cars and motorbikes on Collecting Cars are run through an online HPI check. This vehicle shows no insurance database markers for damage or theft, and has no finance owing.
)
)
[ul] => Array
(
[li] => Array
(
[0] => 04/11/1996 – 16,120 miles
[1] => 18/11/1998 – 25,086 miles
[2] => 09/09/1999 – 28,769 miles
[3] => 21/02/2000 – 31,469 miles
[4] => 22/06/2001 – 36,055 miles
[5] => 29/10/2002 – 40,781 miles
[6] => 02/03/2005 – 46,238 miles
[7] => 24/03/2006 – 49,459 miles
[8] => 03/07/2007 – 53,051 miles
[9] => 17/12/2008 – 56,582 miles
[10] => 20/05/2010 – 57,385 miles
[11] => 08/06/2011 – 61,653 miles
[12] => 15/05/2012 – 64,425 miles
[13] => 17/04/2013 – 66,026 miles
[14] => 07/06/2014 – 66,244 miles
[15] => 14/09/2015 – 68,411 miles
[16] => 27/02/2018 – 74,856 miles
[17] => 06/08/2019 – ~76,400 miles
[18] => 06/07/2020 – 76,598 miles
)
)
)
Ideally, the result should look like this:
$array = Array
(
[p] => Array
(
[0] => This Porsche 993 Carrera Cabriolet represents a great opportunity to acquire an open-top variant of one of the most coveted 911 models. checked
[1] => First registered on 5 August 1994, M912 SGY displays 10,630 miles on the odometer with a clock change at 66,244 miles in 2014. checked
[2] => The car’s Aventura Green metallic paintwork is reported to be in good condition, presenting well for its age and mileage. checked
[3] => The Marble Grey leather interior is believed to be entirely original. checked
[4] => Serviced by Porsche specialist Portiacraft in July 2020 at 76,598 miles, this consisted of an annual oil and filter service. checked
[5] => The last MOT was undertaken on 6 July 2020 at 76,598 miles. checked
[6] => It is supplied with a Porsche Club Great Britain folder with records of main dealer and specialist service history. checked
[7] => This Porsche 911 Carrera Cabriolet presents in highly original and well-maintained condition. checked
[8] => Summary of maintenance history: checked
[9] => Array
(
[strong] => The description of this auction lot is, to the best of the seller's knowledge, accurate and not misleading. checked
)
[10] => Array
(
[strong] => All UK-registered cars and motorbikes on Collecting Cars are run through an online HPI check. This vehicle shows no insurance database markers for damage or theft, and has no finance owing. checked
)
)
[ul] => Array
(
[li] => Array
(
[0] => 04/11/1996 – 16,120 miles checked
[1] => 18/11/1998 – 25,086 miles checked
[2] => 09/09/1999 – 28,769 miles checked
[3] => 21/02/2000 – 31,469 miles checked
[4] => 22/06/2001 – 36,055 miles checked
[5] => 29/10/2002 – 40,781 miles checked
[6] => 02/03/2005 – 46,238 miles checked
[7] => 24/03/2006 – 49,459 miles checked
[8] => 03/07/2007 – 53,051 miles checked
[9] => 17/12/2008 – 56,582 miles checked
[10] => 20/05/2010 – 57,385 miles checked
[11] => 08/06/2011 – 61,653 miles checked
[12] => 15/05/2012 – 64,425 miles checked
[13] => 17/04/2013 – 66,026 miles checked
[14] => 07/06/2014 – 66,244 miles checked
[15] => 14/09/2015 – 68,411 miles checked
[16] => 27/02/2018 – 74,856 miles checked
[17] => 06/08/2019 – ~76,400 miles checked
[18] => 06/07/2020 – 76,598 miles checked
)
)
)
I have tried the following:
$addedChecked = $this->addCheckedRecursive($array);
private function addCheckedRecursive($array)
{
if(!is_array($array)) {
return $array . ' checked';
}
foreach($array as $v) {
$this->addCheckedRecursive($v);
}
}
and also
$addedChecked = array_walk_recursive($array, function (&$value) {
$value .= ' checked';
});
The latter simply returned true.
For info, every element of each array will always be a string, and I would also like to preserve the current array structure. Any help is appreciated.
There is a built-in function that does what you want: array_walk_recursive.
Use it as follows:
// Say you have your array $xmlArray
array_walk_recursive($xmlArray, function (&$value) {
$value .= ' checked';
});
// Since $xmlArray is now modified (in place)
echo '<pre>';
print_r($xmlArray);
echo '</pre>';
If you do not want $xmlArray to be changed as a side effect, first assign a copy of the array to a new variable (PHP arrays are copied by value on assignment):
$addedChecked = $xmlArray;
array_walk_recursive($addedChecked, function (&$value) {
$value .= ' checked';
});
// Since $addedChecked is now modified (in place)
echo '<pre>';
print_r($addedChecked);
echo '</pre>';
The function takes the array by reference and therefore modifies it directly. This is important to note: it does not return the modified array, only a boolean indicating whether the walk succeeded, which is why your attempt simply returned true.
It visits each key => value pair and recurses whenever a value is itself an array, so the callback only ever receives the leaf values. Because the callback takes the value by reference, concatenating ' checked' to it updates the array in place.
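A small self-contained sketch of the behaviour described above. The nested array is a made-up stand-in for the real data, and the variable names are only illustrative.

```php
<?php
// Made-up nested array with the same kind of shape as in the question.
$original = [
    'p'  => ['first', ['strong' => 'second']],
    'ul' => ['li' => ['third']],
];

// Work on a copy so that $original stays untouched.
$copy = $original;

// The return value is a bool, not the modified array.
$returned = array_walk_recursive($copy, function (&$value) {
    $value .= ' checked';
});

var_dump($returned);
print_r($copy);
```

The modified data ends up in $copy, while $returned only reports success, so assigning the return value to $addedChecked (as in the question) loses the result.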
Why don't you try array_walk_recursive or array_replace_recursive?
Both are documented in the PHP manual.
Try it like this:
$addedChecked = $this->addCheckedRecursive($array);
private function addCheckedRecursive($array)
{
    if (!is_array($array)) {
        return $array . ' checked';
    }
    // Iterate by key so that string keys such as 'p', 'ul' and 'strong' are handled too
    foreach ($array as $key => $value) {
        $array[$key] = $this->addCheckedRecursive($value);
    }
    return $array;
}

Can't get exactly values from other page

I am trying to get the score table from this page: http://www.skysports.com/football/competitions/bundesliga/table. I fetch it with:
$bundes = file('http://www.skysports.com/football/competitions/bundesliga/table');
I then display the $bundes array like this:
echo '<pre>', print_r($bundes), '</pre>';
The part I am interested in is displayed like this:
[1437] =>
[1022] => German Bundesliga 2015/16
# Team Pl W D L F A GD Pts Last 6
1 [1059] => [1060] => Bayern Munich [1061] => [1062] => 9 9 0 0 29 4 25 27 [1072] =>
[1073] =>
[1074] =>
This is the first row of the table. I can display $bundes[1060] and get "Bayern Munich", but how can I get the values from $bundes[1062], which are 9, 9, 0, 0, 29, 4, 25 and 27? I need to display each of these values in its own <td></td>.
When I try to echo $bundes[1062], I get nothing.
A more reliable way of extracting the data is using DOM manipulation classes to do something like:
$doc = new \DOMDocument();
// Suppress warnings about malformed HTML
@$doc->loadHTMLFile('http://www.skysports.com/football/competitions/bundesliga/table');
$xpath = new \DOMXPath($doc);
$rows = $xpath->query('//tbody/tr');
$data = [];
foreach ($rows as $i => $row) {
    // Query the cells relative to the current row
    $columns = $xpath->query('td', $row);
    foreach ($columns as $column) {
        $data[$i][] = trim($column->textContent);
    }
}
print_r($data);
Which gives you:
Array
(
[0] => Array
(
[0] => 1
[1] => Bayern Munich
[2] => 9
[3] => 9
[4] => 0
[5] => 0
[6] => 29
[7] => 4
[8] => 25
[9] => 27
[10] =>
)
...
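Since the question asks for each value in its own <td></td>, here is a minimal offline sketch of the same XPath approach that writes the cells back out as table rows. The inline HTML and team names are invented sample data, not the real page, and the real page's markup may differ.

```php
<?php
// Invented sample table standing in for the fetched page.
$html = '<table><tbody>
<tr><td>1</td><td><a href="#">Team A</a></td><td>9</td><td>27</td></tr>
<tr><td>2</td><td><a href="#">Team B</a></td><td>9</td><td>20</td></tr>
</tbody></table>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$out = '';
foreach ($xpath->query('//tbody/tr') as $row) {
    $cells = [];
    foreach ($xpath->query('td', $row) as $cell) {
        // textContent drops nested tags such as the <a> around the team name.
        $cells[] = '<td>' . htmlspecialchars(trim($cell->textContent)) . '</td>';
    }
    $out .= '<tr>' . implode('', $cells) . "</tr>\n";
}
echo $out;
```

Escaping with htmlspecialchars keeps any special characters in the scraped text from breaking the generated markup.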
Regarding Dagon's comment, no terms can disallow crawling and extracting the data (as long as you do so at a reasonable rate that does not impact the website's performance). Terms of use and copyright law, however, do dictate what you can and cannot do with the crawled content (e.g. republishing it).
Web scraping may be against the terms of use of some websites. The enforceability of these terms is unclear (see "FAQ about linking – Are website terms of use binding contracts?").
- Wikipedia, Web scraping: Legal issues
BTW, the page's robots meta tag does allow INDEX.

how to get data on month base from array in php

I have data in the following form
Array
(
[0] => Array
(
[event_id] => 2042632
[event_name] => Georgia Tech Yellow Jackets vs. North Carolina Tar Heels
[event_payment] => 156
[payment_status] => 1
[event_date] => 2014-01-29
)
[1] => Array
(
[event_id] => 2042632
[event_name] => Georgia Tech Yellow Jackets vs. North Carolina Tar Heels
[event_payment] => 89
[payment_status] => 1
[event_date] => 2014-01-29
)
[2] => Array
(
[event_id] => 2042632
[event_name] => Georgia Tech Yellow Jackets vs. North Carolina Tar Heels
[event_payment] => 772
[payment_status] => 1
[event_date] => 2014-01-29
)
[3] => Array
(
[event_id] => 2042633
[event_name] => Georgia Tech Yellow Jackets vs. North Carolina Tar Heels
[event_payment] => 256
[payment_status] => 0
[event_date] => 2013-12-29
)
)
I now want to display the data from this array in the following format:
January
Event-name, payment, status
georgia.. 1234, 0
georgia.. 3456, 1
December
Event-name, payment, status
georgia.. 1234, 0
georgia.. 3456, 1
and so on.
Please guide me on how to do that.
Try
$result = array();
foreach($arr as $ar){
    $date = DateTime::createFromFormat("Y-m-d", $ar['event_date']);
    $month = $date->format('F');
    $year = $date->format('Y');
    $result[$year][$month][] = $ar;
}
See demo here
$res = array();
foreach($arr as $key => $val) // $arr is your actual array
{
    $month = date('F-Y', strtotime($val['event_date']));
    $res[$month][] = $val;
}
The result will look like this:
[January-2014] = array(a1,a2...)
[December-2013] = array(a1,a2...)
Here I only suggest how to sort the array by date; I hope you can do the rest of the work yourself.
function cmp($a, $b) {
    // The spaceship operator (PHP 7+) returns -1, 0 or 1, so equal dates compare as equal
    return (new DateTime($a['event_date']))->getTimestamp() <=> (new DateTime($b['event_date']))->getTimestamp();
}
usort($array, 'cmp');
$curr_month = 0;
foreach ($array as $elem) {
    // output elements of array based on month
    // ...
}
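Putting the grouping idea from the answers above together with the output format requested in the question, a minimal sketch could look like the following. The event names and the helper name group_by_month are placeholders; the field names and dates follow the array shown in the question.

```php
<?php
// Shortened sample data using the field names from the question.
$events = [
    ['event_name' => 'Event A', 'event_payment' => 156, 'payment_status' => 1, 'event_date' => '2014-01-29'],
    ['event_name' => 'Event A', 'event_payment' => 89,  'payment_status' => 1, 'event_date' => '2014-01-29'],
    ['event_name' => 'Event B', 'event_payment' => 256, 'payment_status' => 0, 'event_date' => '2013-12-29'],
];

// Hypothetical helper: groups rows under a "Month Year" key, in first-seen order.
function group_by_month(array $events): array
{
    $grouped = [];
    foreach ($events as $event) {
        // The leading "!" resets unspecified fields (time of day) to zero.
        $date = DateTime::createFromFormat('!Y-m-d', $event['event_date']);
        if ($date === false) {
            continue; // skip rows whose date cannot be parsed
        }
        $grouped[$date->format('F Y')][] = $event;
    }
    return $grouped;
}

$grouped = group_by_month($events);
foreach ($grouped as $month => $rows) {
    echo $month, "\n";
    echo "Event-name, payment, status\n";
    foreach ($rows as $row) {
        echo $row['event_name'], ', ', $row['event_payment'], ', ', $row['payment_status'], "\n";
    }
}
```

Including the year in the key keeps January 2014 separate from any other January; sort the input first (for example with usort, as above) if the months must appear in chronological order.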

PHP Array traversing foreach loop?

Array
(
[0] => LinkedIn Corporation
[1] => www.linkedin.com
[2] => 2029 Stierlin Ct
Mountain View, CA 94043-4655
United States map
[3] => +1.650.687.3600
[4] => Software & Internet, E-commerce and Internet Businesses
Software & Internet, Data Analytics, Management and Storage
Business Services, HR and Recruiting Services
[5] => 1K - 10K
[6] => > $1B
[7] => Publicly Traded - NASDAQ : LNKD
)
By the way, foreach is a language construct, not a function.
Traverse the array like this:
foreach($yourarr as $k=>$v)
{
    echo "The Key:$k and The Value:$v<br>";
}

Getting values from specified keys

I use the IMDb API:
$homepage = file_get_contents('http://www.imdbapi.com/?i='.$imdbID);
$arr = json_decode($homepage);
In $arr I have all the data about the related movie:
Maguire [Year] => 1996 [Rated] => R [Released] => 13 Dec 1996 [Genre] => Comedy, Drama, Romance, Sport [Director] => Cameron Crowe [Writer] => Cameron Crowe [Actors] => Tom Cruise, Cuba Gooding Jr., Renée Zellweger, Kelly Preston [Plot] => When a sports agent has a moral epiphany and is fired for expressing it, he decides to put his new philosophy to the test as an independent with the only athlete who stays with him. [Poster] => http://ia.media-imdb.com/images/M/MV5BMTkxNjc2NjQwOF5BMl5BanBnXkFtZTcwMDE2NDU2MQ@@._V1_SX320.jpg [Runtime] => 2 hrs 19 mins [Rating] => 7.2 [Votes] => 97329 [ID] => tt0116695 [Response] => True )
What I want is to get the value for a specified key.
Is there a function for that? For example, I give the key Actors and get all the actors.
Like this?
$homepage = file_get_contents('http://www.imdbapi.com/?i='.$imdbID);
$arr = json_decode($homepage, true);
print($arr['Actors']);
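A small offline sketch of the same idea, using a hard-coded JSON string instead of the API call. The JSON below contains only a few of the keys shown in the question, and splitting the Actors string into an array is an extra step that assumes the names are comma-separated, as they appear in the output above.

```php
<?php
// Hard-coded sample response; the real API response has more keys.
$json = '{"Year":"1996","Actors":"Tom Cruise, Cuba Gooding Jr., Renée Zellweger, Kelly Preston","Response":"True"}';

// Passing true returns an associative array instead of a stdClass object.
$data = json_decode($json, true);

$actors = [];
if (is_array($data) && isset($data['Actors'])) {
    // Turn the comma-separated string into a list of names.
    $actors = array_map('trim', explode(',', $data['Actors']));
}

print_r($actors);
```

Checking is_array and isset first avoids notices when the request fails or the key is missing from the response.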
