Scraping company data using Guzzle (PHP)

I'm trying to audit a vast amount of company data from companycheck.co.uk. My current script appears to loop over only the first 10 results from the first page. I did have the script gathering more than 10 results at one point, but that caused a fatal error after around 600 results (not a timeout, but a connection error of some sort). I need the script to be more reliable, as I'm fetching over 40,000 results.
My code so far:
<?php
set_time_limit(0);
ini_set('max_execution_time', 0);

require 'vendor/autoload.php';
require "Guzzle/guzzle.phar";

// Add this to allow your app to use Guzzle and the Cookie plugin.
use Guzzle\Http\Client as GuzzleClient;
use Guzzle\Plugin\Cookie\Cookie;
use Guzzle\Plugin\Cookie\CookiePlugin;
use Guzzle\Plugin\Cookie\CookieJar\ArrayCookieJar;
use Guzzle\Plugin\Cookie\CookieJar\CookieJarInterface;

$Pagesurl = 'http://companycheck.co.uk/search/UpdateSearchCompany?searchTerm=cars&type=name';
$pagesData = json_decode(file_get_contents($Pagesurl), true);
$resultsFound = $pagesData["hits"]["found"];
$pages = ceil($resultsFound / 10);
//echo $pages;
echo "<br>";

for ($p = 0; $p < $pages; $p++) {
    $url = 'http://companycheck.co.uk/search/UpdateSearchCompany?searchTerm=cars&type=name&companyPage=' . $p;
    $data = json_decode(file_get_contents($url), true);

    // Each page holds at most 10 hits, so iterate over what is actually returned
    // instead of a fixed 0..10 range (index 10 was always undefined).
    $hitCount = count($data["hits"]["hit"]);
    for ($i = 0; $i < $hitCount; $i++) {
        $id = $data["hits"]["hit"][$i]["id"];
        $TradingAddress = $data["hits"]["hit"][$i]["data"]["address"][0];
        $companyName = $data["hits"]["hit"][$i]["data"]["companyname"][0];
        $companyNumber = $data["hits"]["hit"][$i]["data"]["companynumber"][0];

        $finalURL = "http://companycheck.co.uk/company/" . $id;

        $httpClient = new GuzzleClient($finalURL);
        $httpClient->setSslVerification(FALSE);

        // Create a new cookie plugin and add it to the client
        $cookieJar = new ArrayCookieJar();
        $cookiePlugin = new CookiePlugin($cookieJar);
        $httpClient->addSubscriber($cookiePlugin);
        $httpClient->setUserAgent("Opera/9.23 (Windows NT 5.1; U; en-US)");

        $request = $httpClient->get($finalURL);
        $response = $request->send();
        $body = $response->getBody(true);

        $matches = array();
        preg_match_all('/<table.*?>(.*?)<\/table>/si', $body, $table);
        preg_match('/<meta name=\"keywords\" content=\"(.*?)\"\/>/si', $body, $metaName);
        preg_match('/<p itemprop="streetAddress".*?>(.*?)<\/p>/si', $body, $regOffice);

        echo "<table><tbody>";
        echo "<tr><th>Company Name</th><td>";
        echo $companyName;
        echo "</td></tr>";
        echo "<tr><th>Company Number</th><td>";
        echo $companyNumber;
        echo "</td></tr>";
        echo "<tr><th>Registered Address</th><td>";
        echo str_replace("<br>", " ", $regOffice[0]);
        echo "</td></tr>
        <tr><th>Trading Address</th><td>";
        echo $TradingAddress;
        echo "</td></tr>
        <tr>
        <th>Director Name</th>
        <td>";
        $name = explode(',', $metaName[1]);
        echo $name[2];
        echo "</td>
        </tr></tbody></table>";
        echo $table[0][1];
        echo "<br><br><br>";
    }
}
To get each page, I use http://companycheck.co.uk/search/UpdateSearchCompany?searchTerm=cars&type=name&companyPage=1, which returns the JSON behind each page of http://companycheck.co.uk/search/results?SearchCompaniesForm[name]=cars&yt1=. It contains some of the data, but not all of it.
With this I can get the ID of each company to navigate to each link and scrape some data from the frontend of the site.
For example the first result is:
"hits":{"found":42842,"start":0,"hit":[{"id":"08958547","data":{"address":["THE ALEXANDER SUITE SILK POINT, QUEENS AVENUE, MACCLESFIELD, SK10 2BB"],"assets":[],"assetsnegative":[],"cashatbank":[],"cashatbanknegative":[],"companyname":["CAR2CARS LIMITED"],"companynumber":["08958547"],"dissolved":["0"],"liabilities":[],"liabilitiesnegative":[],"networth":[],"networthnegative":[],"postcode":["SK10 2BB"],"siccode":[]}}
So the first link is: http://companycheck.co.uk/company/08958547
Then from this I can pull table data such as:
Registered Office
THE ALEXANDER SUITE SILK POINT
QUEENS AVENUE
MACCLESFIELD
SK10 2BB
And information from the meta tags such as:
<meta name="keywords" content="CAR2CARS LIMITED, 08958547,INCWISE COMPANY SECRETARIES LIMITED,MR ROBERT CARTER"/>
An example of one of the results returned:
Company Name CAR2CARS LIMITED
Company Number 08958547
Registered Address
THE ALEXANDER SUITE SILK POINT QUEENS AVENUE MACCLESFIELD SK10 2BB
Trading Address THE ALEXANDER SUITE SILK POINT, QUEENS AVENUE, MACCLESFIELD, SK10 2BB
Director Name INCWISE COMPANY SECRETARIES LIMITED
Telephone No telephone number available.
Email Address No email address available.
Contact Person No contact person available.
Business Activity No Business Activity on record.
Each JSON page contains 10 company IDs to put into the URL for the company page. From each of these companies I need to scrape data from the full URL, then after these 10 move on to the next page, get the next 10, and loop until the last page.

The site is almost certainly blocking you deliberately due to an excessive number of requests. Try putting a pause between requests; that might help you fly under their radar.
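For illustration, here is a minimal sketch of what a pause-and-retry wrapper around the Guzzle 3 client from the question might look like. The helper name fetchWithRetry and the delay values are made up for this example; tune them to whatever the site tolerates.

<?php
// Minimal sketch, assuming the Guzzle 3 client from the question is already set up.
// fetchWithRetry() is a hypothetical helper name, not part of Guzzle.
function fetchWithRetry($httpClient, $url, $maxAttempts = 3, $delaySeconds = 2)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            // Issue the request exactly as in the original loop and return the body.
            return $httpClient->get($url)->send()->getBody(true);
        } catch (Exception $e) {
            // Connection-level failure: back off and try again, re-throw on the last attempt.
            if ($attempt === $maxAttempts) {
                throw $e;
            }
            sleep($delaySeconds * $attempt);
        }
    }
}

// Inside the inner loop, replace the direct request with:
//   $body = fetchWithRetry($httpClient, $finalURL);
// ...and pause briefly between companies to stay under any rate limit:
//   usleep(500000); // 0.5 seconds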
The website you are intending to scrape appears to be a private company that is reformatting and republishing data from Companies House, the official record of company information in the UK. This company offers an API which allows 10K requests per month, and this is either free or costs GBP200/month, depending on what data you need. Since you want 40K results immediately, it is no wonder they operate IP blocks.
The rights and wrongs of scraping are complicated, but there is an important point to understand: by copying someone else's data, you are attempting to avoid the costs of collating the data yourself. By taking them from someone else's server, you are also adding to their operational costs without reimbursing them, an economic phenomenon known as an externality.
There are some cases where I am sympathetic to passing on costs in this way, such as where the scrape target is engaged in potential market abuse (e.g. monopolistic practices) and scraping has an alleviating effect. I have heard that some airline companies operate anti-scraping devices because they don't want price scrapers to bring prices down. Since bringing prices down is in the interest of the consumer one could argue that the externality can be justified (on moral, if not legal grounds).
In your case, I would suggest obtaining this data directly from Companies House, where it might be available for a much lower cost. In any case, if you republish valuable data obtained from a scrape, having dodged technical attempts to block you, you may find yourself in legal trouble anyway. If in doubt (and if there is no moral or public interest defence such as I outlined earlier) get in touch with the site operator and ask if what you want to do is OK.

Related

How can we make multiple PHP HTTP requests at the same time, asynchronously?

I'm currently working on a project with my friends, so let me explain:
We have a MySQL database filled with English postcodes from London: one table with universities and one with hosts. We want to calculate the public-transport travel time between every host and every university and save it into another table of the database, with the host postcode, the university postcode, and the travel time between the two on each row.
For that we make an HTTP request to the TfL API, which returns a JSON document with all the journey details (including, of course, the travel time); we then decode it and keep only what we want (the travel time).
The problem is that the database is quite big, with almost 250 hosts and 800 universities. That gives us around 200,000 requests and a processing time far too long to be usable (with the API response time and the PHP processing, around 19 hours).
We tried to see whether we could use cURL to split the work across multiple loops, so that we could divide the processing time by the number of cURL handles, but we can't figure out how to do that.
The final goal is a small local app that, when we select one university, gives us the 10 nearest hosts by public transport.
Does anyone have experience with this kind of thing and can help us?
Here is what we have right now:
//$postCodeUni contains all the university objects
foreach ($postCodeUni as $uniPostCode) {
    //take the postcode from the university object
    $uni = $uniPostCode['Postcode'];

    //$postCodeHost contains all the host objects
    foreach ($postCodeHost as $hostPostCode) {
        //take the postcode from the host object
        $host = $hostPostCode['Postcode'];

        //make an HTTP request to the TfL API, which returns a journey between
        //the two postcodes (a JSON document with all the journey details)
        $data = json_decode(file_get_contents('https://api.tfl.gov.uk/journey/journeyresults/' . $uni . '/to/' . $host . '?app_key=a59c7dbb0d51419d8d3f9dfbf09bd5cc'), true);

        //collect the duration of each suggested journey (there are several ways
        //to travel between two points by public transport)
        $duration = $data['journeys'];
        $tableTemp = [];
        foreach ($duration as $durations) {
            $durationns = $durations['duration'];
            array_push($tableTemp, $durationns);
        }

        //then take the shortest one
        $min = min($tableTemp);
        echo "Shortest travel time: " . $min . " between " . $uni . " and " . $host . ".<br>";
        echo "<br>";

        //save this time in a table that will contain the travel time of all
        //the journeys, for comparison
        array_push($tableAllRequest, array($uni . " and " . $host => $min));
    }
}
There are many ways to achieve this; however, the easiest, in my opinion, would be to use Guzzle async requests (the cURL multi interface under the hood). Take a look at this answer: "Guzzle async requests not really async?". Example below:
<?php
use GuzzleHttp\Promise;
use GuzzleHttp\Client;

$client = new Client(['base_uri' => 'http://httpbin.org/']);

// Initiate each request but do not block
$promises = [
    'image' => $client->getAsync('/image'),
    'png'   => $client->getAsync('/image/png'),
    'jpeg'  => $client->getAsync('/image/jpeg'),
    'webp'  => $client->getAsync('/image/webp')
];

// Wait on all of the requests to complete. Throws a ConnectException
// if any of the requests fail
$results = Promise\unwrap($promises);

// Wait for the requests to complete, even if some of them fail
$results = Promise\settle($promises)->wait();

// Loop through each response in the results and fetch data etc.
foreach ($results as $promiseKey => $result) {
    // Data of the response
    $dataOfResponse = $result['value']->getBody()->getContents();

    // Status
    echo $promiseKey . ':' . $result['value']->getStatusCode() . "\r\n";
}
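To apply the same idea to the TfL lookups above without creating 200,000 promises at once, a bounded-concurrency pool might look roughly like the sketch below. The $pairs array, the concurrency value of 10, and the response fields ('journeys', 'duration') mirror the question and are assumptions to adjust against TfL's actual rate limits.

<?php
// Rough sketch: batch the TfL journey lookups with a bounded concurrency pool.
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client(['base_uri' => 'https://api.tfl.gov.uk/']);

// Hypothetical structure: key identifying the pair => request URI built from the postcodes.
$pairs = [
    // 'SW7 2AZ|E1 4NS' => '/journey/journeyresults/SW7%202AZ/to/E1%204NS?app_key=...',
];

$requests = function () use ($pairs) {
    foreach ($pairs as $key => $uri) {
        yield $key => new Request('GET', $uri);
    }
};

$shortest = [];
$pool = new Pool($client, $requests(), [
    'concurrency' => 10, // how many requests run in parallel; tune to taste
    'fulfilled' => function ($response, $key) use (&$shortest) {
        $data = json_decode($response->getBody(), true);
        $durations = array_column($data['journeys'] ?? [], 'duration');
        $shortest[$key] = $durations ? min($durations) : null;
    },
    'rejected' => function ($reason, $key) use (&$shortest) {
        $shortest[$key] = null; // record failures so they can be retried later
    },
]);
$pool->promise()->wait();

// $shortest now maps each university/host pair to its shortest travel time.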

How to invoke the demo url using VinceG php-first-data-api

I am trying to integrate First Data e4 Gateway using PHP. I downloaded the VinceG/php-first-data-api PHP First Data Service API class. The code comes with some examples.
I have my Terminal ID (API_LOGIN) and Password (32 character string).
What confuses me is that when I use one of the examples, I don't know how to tell the class that I want to use the demo url, not the production url.
The class comes with two constants:
const LIVE_API_URL = 'https://api.globalgatewaye4.firstdata.com/transaction/';
const TEST_API_URL = 'https://api.demo.globalgatewaye4.firstdata.com/transaction/';
In the First Data console, when I generated my password, it said to use the v12 API (/transaction/v12), so I changed the protected $apiVersion to 'v12'.
All I want to do is write my first development transaction using First Data e4. I have yet to get any kind of response. Obviously I need a lot of hand holding to get started.
When I set up a website to use BalancedPayments, they have a support forum that's pretty good, and I was able to get that running fairly quickly. First Data has a lot of documentation, but for some reason not much of it has good PHP examples.
My hope is that some expert has already mastered the VinceG/php-first-data-api, and can help me write one script that works.
Here's the pre-auth code I'm using, which invokes the FirstData class:
// Pre Auth Transaction Type
define("API_LOGIN", "B123456-01");
define("API_KEY", "xxxxxxxxxxyyyyyyyyyyyyzzzzzzzzzz");

$data = array();
$data['type'] = "00";
$data['number'] = "4111111111111111";
$data['name'] = "Cyrus Vance";
$data['exp'] = "0618";
$data['amount'] = "100.00";
$data['zip'] = "33333";
$data['cvv'] = "123";
$data['address'] = "1111 OCEAN BLVD MIAMI FL";
$orderId = "0001";

require_once("FirstData.php");
$firstData = new FirstData(API_LOGIN, API_KEY, true);

// Charge
$firstData->setTransactionType(FirstData::TRAN_PREAUTH);
$firstData->setCreditCardType($data['type'])
    ->setCreditCardNumber($data['number'])
    ->setCreditCardName($data['name'])
    ->setCreditCardExpiration($data['exp'])
    ->setAmount($data['amount'])
    ->setReferenceNumber($orderId);

if ($data['zip']) {
    $firstData->setCreditCardZipCode($data['zip']);
}
if ($data['cvv']) {
    $firstData->setCreditCardVerification($data['cvv']);
}
if ($data['address']) {
    $firstData->setCreditCardAddress($data['address']);
}

$firstData->process();

// Check
if ($firstData->isError()) {
    echo "!!!";
    // there was an error
} else {
    echo "###";
    // transaction passed
}
My number one problem was that I had not created (applied for, with instant approval) a demo account on First Data. I didn't realize this was a separate thing on First Data. On Balanced Payments, for instance, you have one account, and you can run your script against a test URL with test values.
From the Administration panel, click "Terminals", then your gateway number on the ECOMM row (it will look something like AH1234-03). Then click "Generate" next to the password (save it to your personal notes), then click UPDATE.
Now replace your parameter values in your test scripts. I use a variable assignment block that looks something like this:
define("API_LOGIN", "AH1234-05"); //fake
define("API_KEY", "44p7797xxx790098z1z2n6f270ys1z0x"); //fake
$data = array();
$data['type'] = "03";
$data['number'] = "4111111111111111";
$data['name'] = "Cyrus Vancce";
$data['exp'] = "0618";
$data['amount'] = "100.00";
$data['zip'] = "33320";
$data['cvv'] = "123";
$data['address'] = "1234 N OCEAN BLVD MIAMI BEACH FL";
$orderId = "0001";
require_once("FirstData.php");
$firstData = new FirstData(API_LOGIN, API_KEY, true);
At the end of the VinceG test scripts, I output my gateway response with a print_r, like this:
$firstData->process();

// Check
if ($firstData->isError()) {
    echo "!!!";
    // there was an error
} else {
    echo "###";
    // transaction passed
}

echo "<pre>";
print_r($firstData);

How to properly store and look up area codes from a phone number

How do I store area codes (NPA-NXX) in a database for fast lookup?
Here's the deal: I have a variable that contains a phone number and I need to look in a database for the city attached to that phone number.
The problem is, different countries have different formats.
Canada/USA: +19055551234 (+1 > Country, 905 > Area Code, 555 > City Code)
France: +33512345678 (+33 > Country, 5 > Areacode, 1 > City, Other numbers > Subscriber number)
and so on (infos based on wikipedia)
I created a table called 'npanxx' that contain the list of area codes, city code and city attached to each one (with the id of the country and the province/state id):
CountryId  RegionId  PrimaryCity  npa  nxx  fnpanxx
1          11        Acton Vale   450  236  +1450236
I am thinking about the following procedure:
Get all country codes from SQL into a PHP array
Go through each entry and check whether it matches the beginning of the phone number
When (if) a match is found:
Remove the matched prefix from the phone number
Get all NPA-NXX codes that belong to that country and put them in a PHP array
Go through each value of the array to find a matching beginning
When (if) a match is found:
Remove the matched prefix from the phone number
Store the data in different variables, like: $country = 'Canada'; $city = 'Acton Vale'...
etc, etc.
First mistake (I think): too many database requests (the npanxx table contains 3,000 records for only one province in Canada).
Second mistake: I'm pretty sure there's no need to go through each and every NPA-NXX code.
Another problem: if the phone number is a French one, it's not certain that this procedure will work.
And... if there's an entry for, let's say, 336 and another for 3364, it might give the wrong result.
Do you have any idea how I can solve this problem? (I'm not asking for code, I don't want you to do the work for me, but I would like some clues.)
This is for a personal project to collect donations for the Multiple Sclerosis Society of Canada, and I would really like to finish it :)
I would think some kind of set of regexes or other pattern matches to whittle down your options in terms of search: some basic way of "guessing" at the possibilities instead of searching all of them, as sketched below.
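As a rough illustration of that narrowing idea, a single longest-prefix query can do the "guessing" in one step instead of looping over every code in PHP. The table name npanxx_prefix and its columns below are hypothetical, not the asker's actual schema.

<?php
// Illustrative longest-prefix lookup (PDO/MySQL). Assumes a hypothetical table
// npanxx_prefix(prefix, country, city) where prefix is stored without the '+'.
function lookupNumber(PDO $db, $e164)
{
    // Normalise "+14502361234" to "14502361234".
    $digits = preg_replace('/\D/', '', $e164);

    // Match every stored prefix that the number starts with, keep the longest.
    $sql = "SELECT country, city, prefix
            FROM npanxx_prefix
            WHERE :num LIKE CONCAT(prefix, '%')
            ORDER BY LENGTH(prefix) DESC
            LIMIT 1";
    $stmt = $db->prepare($sql);
    $stmt->execute([':num' => $digits]);
    return $stmt->fetch(PDO::FETCH_ASSOC) ?: null;
}

// Example: lookupNumber($db, '+14502361234') would prefer a row for prefix
// 1450236 over a shorter 1450 row, which also avoids the 336 vs 3364 problem.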
Here's a small script I wrote in PHP to return the NPA/NXX as a JSON object in real time from area-codes.com.
It returns some very useful data. It's only for the NANP, so it doesn't do well trying to discern international calls. For that, I would suggest making a table of all international country codes and the appropriate methods to dial them internationally.
Additionally, network exchange operators demand an international dial code (like 011 from the USA, or + for cell phones in general) to figure out whether the number is international, and then take the steps above to figure out where you're trying to go. You could add this constraint to the input field and be done with it.
If you're just trying to get NPA/NXX information in the North American Numbering Plan, though, this script should be very helpful.
Just an aside: area-codes.com counts online lookups among its free services, and I have found nothing on the site to suggest that this code violates that policy. The code can be retooled to gather data from other providers, nonetheless.
<?php
// Small script to return and format all data from the NPA/NXX info site www.area-codes.com
// Returns a JSON object.
error_reporting(0); // suppress notices/warnings (E_NONE is not a valid PHP constant)

$npa = $_GET['npa'];
$nxx = $_GET['nxx'];

function parseInput($input) {
    $v = new DOMDocument();
    $v->formatOutput = true;
    $v->preserveWhiteSpace = false;
    $v->loadHTML($input);
    $list = $v->getElementsByTagName("td");
    $e = false;
    $dataOut = array();
    $p = "";
    foreach ($list as $objNode) {
        if (!$e) {
            // Even cells hold the field label: normalise it into a lowercase key
            $p = $objNode->nodeValue;
            $p = strtolower($p);
            $p = preg_replace("%[+ .:()\/_-]%", "", $p);
            $p = str_replace("\xc2\xa0", "", $p);
            $p = trim($p);
        } else {
            // Odd cells hold the value for the preceding label
            if ($p != "") {
                $d = trim($objNode->nodeValue);
                if ($d != "") $dataOut[$p] = $d;
            }
            $p = "";
        }
        $e = !$e;
    }
    return $dataOut;
}

function getNPANXX($npa, $nxx) {
    $url = "http://www.area-codes.com/exchange/exchange.asp?npa=$npa&nxx=$nxx";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_VERBOSE, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
    curl_setopt($ch, CURLOPT_URL, $url);
    $response = curl_exec($ch);
    curl_close($ch);

    // Cut out just the details table for this NPA/NXX before parsing
    $i = strpos($response, "<h3>AreaCode/Prefix $npa-$nxx Details</h3>");
    $i = strpos($response, "<table width=\"100%\" border=\"0\" cellpadding=\"2\" cellspacing=\"0\">", $i);
    $e = strpos($response, "</table>", $i);
    $scan = substr($response, $i, ($e - $i) + 8);
    return parseInput($scan);
}

$result = getNPANXX($npa, $nxx);
if (!isset($result['npaareacode'])) {
    $result = array("error" => "invalid");
}
echo json_encode($result);
die;
?>
For the query npanxx.php?npa=202&nxx=520 the JSON outputs as follows:
{
"npaareacode":"202",
"nxxusetype":"WIRELESS",
"nxxprefix":"520",
"nxxintroversion":"11\/16\/2007",
"city":"WASHINGTON",
"state":"DC",
"latitude":"38.901",
"county":"DISTRICT OF COLUMBIA",
"longitude":"-77.0315",
"countypopulation":"0",
"lata":"236",
"zipcode":"20005",
"zipcodecount":"0",
"ratecenter":"WSHNGTNZN1",
"zipcodefreq":"0",
"fips":"11001",
"ocn":"6664",
"observesdst":"Unknown",
"cbsacode":"47900",
"timezone":"Eastern (GMT -05:00)",
"cbsaname":"Washington-Arlington-Alexandria, DC-VA-MD-WV",
"carriercompany":"SPRINT SPECTRUM L.P."
}
For your example npanxx.php?npa=450&nxx=236 the data returned is a little limited, because it's Canada and Canada doesn't provide all the FIPS and carrier data the way the United States does, but the returned data is still quite useful:
{
"npaareacode":"450",
"nxxusetype":"WIRELESS",
"nxxprefix":"236",
"nxxintroversion":"2002-08-04",
"city":"ACTON VALE",
"state":"QC",
"latitude":"45.6523",
"longitude":"-72.5671",
"countypopulation":"51400",
"lata":"850",
"zipcodecount":"0",
"zipcodefreq":"-1",
"observesdst":"Unknown",
"timezone":"Eastern (GMT -05:00)"
}

Get live NFL scores/stats to read and manipulate?

I need some sort of database or feed to access live scores (and possibly player stats) for the NFL. I want to be able to display the scores on my site for my pickem league and show the users whether their pick is winning or not.
I'm not sure how to go about this. Can someone point me in the right direction?
Also, it needs to be free.
Disclaimer: I'm the author of the tools I'm about to promote.
Over the past year, I've written a couple Python libraries that will do what you want. The first is nflgame, which gathers game data (including play-by-play) from NFL.com's GameCenter JSON feed. This includes active games where data is updated roughly every 15 seconds. nflgame has a wiki with some tips on getting started.
I released nflgame last year, and used it throughout last season. I think it is reasonably stable.
Over this past summer, I've worked on its more mature brother, nfldb. nfldb provides access to the same kind of data nflgame does, except it keeps everything stored in a relational database. nfldb also has a wiki, although it isn't entirely complete yet.
For example, this will output all current games and their scores:
import nfldb

db = nfldb.connect()
phase, year, week = nfldb.current(db)

q = nfldb.Query(db).game(season_year=year, season_type=phase, week=week)
for g in q.as_games():
    print '%s (%d) at %s (%d)' % (g.home_team, g.home_score,
                                  g.away_team, g.away_score)
Since no games are being played, that outputs all games for next week with 0 scores. This is the output with week=1 of the 2013 season:
CLE (10) at MIA (23)
DET (34) at MIN (24)
NYJ (18) at TB (17)
BUF (21) at NE (23)
SD (28) at HOU (31)
STL (27) at ARI (24)
SF (34) at GB (28)
DAL (36) at NYG (31)
WAS (27) at PHI (33)
DEN (49) at BAL (27)
CHI (24) at CIN (21)
IND (21) at OAK (17)
JAC (2) at KC (28)
PIT (9) at TEN (16)
NO (23) at ATL (17)
CAR (7) at SEA (12)
Both are licensed under the WTFPL and are free to use for any purpose.
N.B. I realized you tagged this as PHP, but perhaps this will point you in the right direction. In particular, you could use nfldb to maintain a PostgreSQL database and query it with your PHP program.
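For illustration, reading those scores back from the nfldb PostgreSQL database in PHP could look roughly like the sketch below. The table and column names follow nfldb's documented schema but should be checked against your installed version; the connection credentials are placeholders.

<?php
// Hedged sketch: querying the database that nfldb maintains, from PHP via PDO.
// Table/column names (game, home_team, home_score, ...) are assumptions based on
// nfldb's documented schema.
$pdo = new PDO('pgsql:host=localhost;dbname=nfldb', 'nfldb', 'password');

$stmt = $pdo->prepare(
    'SELECT home_team, home_score, away_team, away_score
     FROM game
     WHERE season_year = :year AND season_type = :phase AND week = :week'
);
$stmt->execute([':year' => 2013, ':phase' => 'Regular', ':week' => 1]);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $g) {
    // Same output shape as the Python example above.
    printf("%s (%d) at %s (%d)\n",
        $g['home_team'], $g['home_score'],
        $g['away_team'], $g['away_score']);
}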
So I found something that gives me MOST of what I was looking for. It has live game stats, but doesn't include current down, yards to go, and field position.
Regular Season:
http://www.nfl.com/liveupdate/scorestrip/ss.xml
Post Season:
http://www.nfl.com/liveupdate/scorestrip/postseason/ss.xml
I'd still like to find a live player stat feed to use to add Fantasy Football to my website, but I don't think a free one exists.
I know this is old, but this is what I use for scores only... maybe it will help someone some day. Note: there are some elements that you will not use because they are specific to my site, but this should be a very good start for someone.
<?php
require('includes/application_top.php');

$week = (int)$_GET['week'];

//load the live scores XML into a variable as a string
$url = "http://www.nfl.com/liveupdate/scorestrip/ss.xml"; //LIVE GAMES
if ($xmlData = file_get_contents($url)) {
    $xml = simplexml_load_string($xmlData);
    $json = json_encode($xml);
    $games = json_decode($json, true);
}

$teamCodes = array(
    'JAC' => 'JAX',
);

//build the scores array, to group teams and scores together in games
$scores = array();
foreach ($games['gms']['g'] as $gameArray) {
    //after the json_encode/json_decode round trip, XML attributes land under '@attributes'
    $game = $gameArray['@attributes'];

    //ONLY PULL SCORES FROM COMPLETED GAMES - F=FINAL, FO=FINAL OVERTIME
    if ($game['q'] == 'F' || $game['q'] == 'FO') {
        $overtime = (($game['q'] == 'FO') ? 1 : 0);

        $away_team = $game['v'];
        $home_team = $game['h'];
        foreach ($teamCodes as $espnCode => $nflpCode) {
            if ($away_team == $espnCode) $away_team = $nflpCode;
            if ($home_team == $espnCode) $home_team = $nflpCode;
        }

        $away_score = (int)$game['vs'];
        $home_score = (int)$game['hs'];
        $winner = ($away_score > $home_score) ? $away_team : $home_team;

        $gameID = getGameIDByTeamID($week, $home_team);

        if (is_numeric(strip_tags($home_score)) && is_numeric(strip_tags($away_score))) {
            $scores[] = array(
                'gameID'       => $gameID,
                'awayteam'     => $away_team,
                'visitorScore' => $away_score,
                'hometeam'     => $home_team,
                'homeScore'    => $home_score,
                'overtime'     => $overtime,
                'winner'       => $winner
            );
        }
    }
}

//see how the scores array looks
//echo '<pre>' . print_r($scores, true) . '</pre>';
echo json_encode($scores);

//game results and winning teams can now be accessed from the scores array
//e.g. $scores[0]['awayteam'] contains the name of the away team (['awayteam'] part) from the first game on the page ([0] part)
I've spent the last year or so working on a simple CLI tool to easily create your own NFL databases. It currently supports PostgreSql and Mongo natively, and you can programmatically interact with the Engine if you'd like to extend it.
Want to create your own different database (eg MySql) using the Engine (or even use Postgres/Mongo but with your own schema)? Simply implement an interface and the Engine will do the work for you.
Running everything, including the database setup and updating with all the latest stats, can be done in a single command:
ffdb setup
I know this question is old, but I also realize that there's still a need out there for a functional and easy-to-use tool to do this. The entire reason I built this is to power my own football app in the near future, and hopefully this can help others.
Also, because the question is fairly old, a lot of the answers are not working at the current time, or reference projects that are no longer maintained.
Check out the github repo page for full details on how to download the program, the CLI commands, and other information:
FFDB Github Repository
$XML = "http://www.nfl.com/liveupdate/scorestrip/ss.xml";
$lineXML = file_get_contents($XML);
$subject = $lineXML;

//match and capture the week number, then print it
//(allow more than one digit so weeks 10-17 are captured too)
$week = '/w="([0-9]+)/';
preg_match_all($week, $subject, $week);
echo "week " . $week[1][0] . "<br/>";
$week2 = $week[1][0];
echo $week2;

//capture teams and scores in a two-dimensional array
$pattern = '/hnn="(.+)"\shs="([0-9]+)"\sv="[A-Z]+"\svnn="(.+)"\svs="([0-9]+)/';
preg_match_all($pattern, $subject, $matches);

//count the number of games played
$count = count($matches[0]);

//print array values
for ($x = 0; $x < $count; $x++) {
    echo "<br/>";
    //print home team, home score, visitor team, visitor score
    echo $matches[1][$x], " ",
        $matches[2][$x], " ",
        $matches[3][$x], " ",
        $matches[4][$x];
    echo "<br/>";
}
I was having problems finding a new source for the 2021 season. Well, I finally found one on ESPN.
http://site.api.espn.com/apis/site/v2/sports/football/nfl/scoreboard
Returns the results in JSON format.
I recommend registering at http://developer.espn.com and getting access to their JSON API. It took me just 5 minutes, and they have documentation for pretty much any call you need.
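For what it's worth, a minimal PHP sketch against that scoreboard URL might look like the following. The JSON field names (events, competitions, competitors, status) are based on the endpoint's publicly observed structure and may change, so treat them as assumptions and inspect the raw response if anything is missing.

<?php
// Quick sketch against the ESPN scoreboard URL from the answer above.
$json = file_get_contents('http://site.api.espn.com/apis/site/v2/sports/football/nfl/scoreboard');
$data = json_decode($json, true);

foreach ($data['events'] ?? [] as $event) {
    $competition = $event['competitions'][0] ?? null;
    if (!$competition) {
        continue;
    }

    // Each competition lists the two competitors with their team info and score.
    $line = [];
    foreach ($competition['competitors'] as $team) {
        $line[] = $team['team']['abbreviation'] . ' ' . $team['score'];
    }

    // e.g. "KC 31 - DET 21 (Final)"
    echo implode(' - ', $line) . ' (' . $event['status']['type']['description'] . ")\n";
}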

Google AdWords: incorrect statistics

I'm using the Google AdWords PHP API to access statistics from our account. However, I'm getting some really strange readouts from the statistics through the API. I'm trying to access the stats for individual ads or ad groups. The statistics returned, however, are way off from what they are in the client center. The code I'm using:
$user->SetClientCustomerId($clientId);
$adService = $user->GetService("AdGroupAdService", ADWORDS_VERSION);

$selector = new Selector();
$selector->fields = array("Id", "Name", "Clicks", "Impressions", "Cost");
$selector->predicates[] = new Predicate("AdGroupId", "IN", array($adGroupId));
$selector->dateRange = $dateRange;
$selector->paging = new Paging(0, AdWordsConstants::RECOMMENDED_PAGE_SIZE);

do {
    // Make the get request.
    $page = $adService->get($selector);
    if (isset($page->entries)) {
        foreach ($page->entries as $ad) {
            $newLineObject->adName = $ad->name;
            $newLineObject->clicks = $ad->ad->AdStats->clicks;
            $newLineObject->impressions = $ad->adStats->impressions;
            $newLineObject->cost = $ad->ad->AdStats->cost->microAmount / AdWordsConstants::MICROS_PER_DOLLAR;
        }
    } else {
        print "No matching ads were found.\n";
    }
    $selector->paging->startIndex += AdWordsConstants::RECOMMENDED_PAGE_SIZE;
} while ($page->totalNumEntries > $selector->paging->startIndex);
When I print the results I get numbers that are considerably larger than those displayed in the client center. For example, for one particular ad the API reported 2,000,000 impressions, while the client center showed 56,000.
What am I doing wrong?
Your code seems correct to me. However, your problem may be that the date range in your code is different from the one you see in your client center. Make sure you use the same date range when you cross-check.
Having tried the method detailed above extensively, I have altered my code completely. I now use AdHoc Reporting (described here: https://developers.google.com/adwords/api/docs/guides/reporting). This method was suggested to me by an AdWords developer. While it does not literally answer my question (i.e. why the above code returns incorrect statistics), it does provide an easy and clean way to obtain the data correctly.
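As a rough sketch of what the AdHoc Reporting approach can look like, modeled on the legacy googleads-php-lib examples of that era: the ReportUtils::DownloadReportWithAwql call, its signature, and the AWQL field names below are assumptions to verify against your library version.

<?php
// Hedged sketch of an AdHoc report download; check the exact helper name and
// signature in your version of the AdWords PHP client library before using.
require_once 'Google/Api/Ads/AdWords/Util/ReportUtils.php';

$user->SetClientCustomerId($clientId);

// AWQL query for per-ad clicks/impressions/cost over a fixed date range.
$reportQuery = 'SELECT Id, AdGroupId, Clicks, Impressions, Cost '
             . 'FROM AD_PERFORMANCE_REPORT '
             . 'WHERE AdGroupId IN [' . $adGroupId . '] '
             . 'DURING 20140101,20140131';

// Download as CSV; in the legacy library, passing null as the path is meant to
// return the report contents directly instead of writing a file.
$report = ReportUtils::DownloadReportWithAwql($reportQuery, null, $user, 'CSV');
echo $report;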
