Get my app's data from iTunes Connect - PHP

I'm looking for a script or series of scripts that downloads and parses iTunes Connect sales data, as well as App Store comments, ratings and rankings, for a given app. I want to show my app's data on my website.

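You can get an app's public App Store data (name, version, price, ratings) by making the following request in PHP. Note that this does not include iTunes Connect sales data: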
$response = file_get_contents("https://itunes.apple.com/lookup?id={YOUR_APP_ID}&entity=software");
For example:
$response = file_get_contents("https://itunes.apple.com/lookup?id=317469184&entity=software");
This returns the following JSON output:
{
"resultCount":1,
"results": [
{
"kind":"software", "features":[],
"supportedDevices":["iPad23G", "iPhone5s", "iPad2Wifi", "iPhone5c", "iPadThirdGen", "iPadFourthGen4G", "iPhone4", "iPadMini", "iPadThirdGen4G", "iPadFourthGen", "iPhone5", "iPodTouchFifthGen", "iPadMini4G", "iPhone4S"],
"isGameCenterEnabled":false,
"screenshotUrls":["http://a2.mzstatic.com/us/r30/Purple/v4/25/c1/45/25c145d7-5272-4f41-536f-c5744cd0c61e/screen1136x1136.jpeg", "http://a4.mzstatic.com/us/r30/Purple/v4/61/3c/0c/613c0c2e-1263-c54c-8658-2de8e6a0dd4b/screen1136x1136.jpeg", "http://a5.mzstatic.com/us/r30/Purple/v4/fa/19/ad/fa19ad92-85cc-c4bf-921b-f575a543327c/screen1136x1136.jpeg", "http://a1.mzstatic.com/us/r30/Purple4/v4/3d/5d/97/3d5d9701-f6d9-cc12-7e91-b9ede19e9776/screen1136x1136.jpeg", "http://a2.mzstatic.com/us/r30/Purple6/v4/52/cc/ee/52cceed3-5837-9987-4c10-7cd02bf33b1e/screen1136x1136.jpeg"], "ipadScreenshotUrls":[], "artworkUrl60":"http://a1156.phobos.apple.com/us/r30/Purple2/v4/54/93/ac/5493acfd-1a6e-7960-dda5-7de005232d35/AppIcon57x57.png", "artworkUrl512":"http://a1105.phobos.apple.com/us/r30/Purple4/v4/95/77/8b/95778b7e-8897-3b68-b8f9-134f34531b25/mzl.sobbgbbg.png", "artistViewUrl":"https://itunes.apple.com/us/artist/espn/id317469187?uo=4", "artistId":317469187, "artistName":"ESPN", "price":0.00, "version":"4.0.5",
"description":"Introducing the all-new SportsCenter app, a supercharged update to the popular ScoreCenter app packed with live scores, breaking news, video highlights, in-depth analysis, personalized alerts and more. What more could any sports fan ask for? \n\nFeatures include: \n- Instant scores and updates on the biggest games of the day as well as your favorite teams \n- Breaking news and analysis across hundreds of leagues and teams, all powered by ESPN's authoritative newsroom \n- Dozens of notification options: never miss another kickoff, scoring play, substitution, final whistle or tidbit of breaking news \n- Add, edit and remove favorite teams quickly and easily for a customized experience throughout \n- Deep Twitter integration for a social perspective on news, rumors and gossip", "currency":"USD", "genres":["Sports", "Entertainment"], "genreIds":["6004", "6016"], "releaseDate":"2009-06-02T07:00:00Z", "sellerName":"ESPN Inc.", "bundleId":"com.espn.ScoreCenter", "trackId":317469184, "trackName":"ESPN SportsCenter", "primaryGenreName":"Sports", "primaryGenreId":6004,
"releaseNotes":"- In-game highlights added to game pages during the Live game\n- SportsCenter TV Graphics now inside the app\n- Support for World Cup videos and games\n- Easily share video right from the News Feed\n- Improved performance \n- Enhanced design\n- The latest Breaking News", "minimumOsVersion":"7.0", "formattedPrice":"Free", "wrapperType":"software", "trackCensoredName":"ESPN SportsCenter", "languageCodesISO2A":["NB", "DA", "NL", "EN", "FR", "DE", "IT", "NN", "ES"], "fileSizeBytes":"18953341", "sellerUrl":"http://www.espn.com", "contentAdvisoryRating":"4+", "averageUserRatingForCurrentVersion":3.5, "userRatingCountForCurrentVersion":4317, "artworkUrl100":"http://a1105.phobos.apple.com/us/r30/Purple4/v4/95/77/8b/95778b7e-8897-3b68-b8f9-134f34531b25/mzl.sobbgbbg.png", "trackViewUrl":"https://itunes.apple.com/us/app/espn-sportscenter/id317469184?mt=8&uo=4", "trackContentRating":"4+", "averageUserRating":3.5, "userRatingCount":264216
}
]
}
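A minimal sketch of the same lookup in PHP: it builds the URL with `http_build_query()` and keeps only a few fields from the response. The helper names `itunesLookupUrl()` and `extractAppSummary()` are hypothetical; the field names (`trackName`, `version`, `averageUserRating`, `userRatingCount`, `formattedPrice`) are taken from the sample output above.

```php
<?php
// Sketch only: helper names are hypothetical; field names come from the sample JSON above.

// Build the iTunes Lookup API URL for one app id.
function itunesLookupUrl(int $appId): string
{
    return 'https://itunes.apple.com/lookup?' . http_build_query([
        'id'     => $appId,
        'entity' => 'software',
    ]);
}

// Decode the lookup response and keep only the fields a web page is likely to show.
// Returns null when the JSON is invalid or the app id was not found.
function extractAppSummary(string $json): ?array
{
    $data = json_decode($json, true);
    $app = is_array($data) ? ($data['results'][0] ?? null) : null;
    if (!is_array($app)) {
        return null;
    }
    return [
        'name'          => $app['trackName'] ?? null,
        'version'       => $app['version'] ?? null,
        'averageRating' => $app['averageUserRating'] ?? null,
        'ratingCount'   => $app['userRatingCount'] ?? null,
        'price'         => $app['formattedPrice'] ?? null,
    ];
}

// Usage (needs network access and allow_url_fopen enabled):
// $json = file_get_contents(itunesLookupUrl(317469184));
// $summary = ($json === false) ? null : extractAppSummary($json);
```

Separating the URL building from the parsing keeps the parsing testable without a network call, and makes it easy to cache the raw JSON on your own server instead of querying Apple on every page view.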


XML reading issue with files coming from a third party

I have created a script in PHP/Laravel that downloads thousands of XML files from a third-party server every day, and I don't have any control over those files. I extract them, read them and insert the data into my database. One of the XML files is throwing an error:
local.ERROR: XMLReader::readOuterXml(): DRHBN.xml:17106: parser error : Couldn't find end of Start Tag commo {"exception":"[object] (ErrorException(code: 0): XMLReader::readOuterXml(): DRHBN.xml:17106: parser error : Couldn't find end of Start Tag commo at
And here is line 17106 of DRHBN.xml:
<Listing><Address><commons:preference-order>1</commons:preference-order><commons:address-preference-order>1</commons:address-preference-order><commons:FullStreetAddress>6100 Goldenseal Ct. NW</commons:FullStreetAddress><commons:UnitNumber>Plan: The Oakwood</commons:UnitNumber><commons:City>Albuquerque</commons:City><commons:StateOrProvince>NM</commons:StateOrProvince><commons:PostalCode>87120</commons:PostalCode><commons:Country>US</commons:Country></Address><ListPrice commons:isgSecurityClass="Public">351990</ListPrice><ListingURL>https://listings.listhub.net/pages/DRHBN/94121-4714/?channel=visualshows</ListingURL><ProviderName>D.R. Horton Homes</ProviderName><ProviderURL>https://www.drhorton.com</ProviderURL><ProviderCategory>HomeBuilder</ProviderCategory><LeadRoutingEmail>infoabq#drhorton.com</LeadRoutingEmail><Bedrooms>3</Bedrooms><Bathrooms>2</Bathrooms><PropertyType otherDescription="Single Family">Residential</PropertyType><PropertySubType otherDescription="Single Family">Single Family Detached</PropertySubType><ListingKey>3yd-DRHBN-94121-4714</ListingKey><ListingCategory>Purchase</ListingCategory><ListingStatus>Active</ListingStatus><MarketingInformation><commons:PermitAddressOnInternet commons:isgSecurityClass="Public">true</commons:PermitAddressOnInternet><commons:VOWAddressDisplay commons:isgSecurityClass="Public">true</commons:VOWAddressDisplay><commons:VOWAutomatedValuationDisplay commons:isgSecurityClass="Public">true</commons:VOWAutomatedValuationDisplay><commons:VOWConsumerComment commons:isgSecurityClass="Public">true</commons:VOWConsumerComment></MarketingInformation><Photos><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/1?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/2?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/3?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/4?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/5?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/6?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/7?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/8?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/9?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/10?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/11?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/12?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/13?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/14?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/15?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/16?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/17?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/18?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/19?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/20?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/21?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/22?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/23?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/24?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/25?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/26?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/27?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/28?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/29?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/30?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/31?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/32?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/33?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/34?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/35?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/36?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/37?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/38?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/39?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/40?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/41?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/42?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/43?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/44?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/45?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/46?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/47?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/48?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/49?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/50?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/51?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/52?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/53?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/54?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/55?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/56?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/57?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/58?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/59?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/60?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/61?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/62?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/63?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/64?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/65?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/66?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/67?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/68?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/69?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/70?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/71?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/72?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/73?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/74?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/75?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/76?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/77?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/78?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/79?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/80?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/81?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/82?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/83?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/84?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/85?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/86?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/87?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/88?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/89?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/90?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/91?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp 
commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/92?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/93?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/94?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/95?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/96?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/97?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/98?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/99?lm=20210202T110438</MediaURL></Photo><Photo><MediaModificationTimestamp commons:isgSecurityClass="Public">2021-02-02T11:04:38+00:00</MediaModificationTimestamp><MediaURL>http://photos.listhub.net/DRHBN/94121-4714/100?lm=20210202T110438</MediaURL></Photo></Photos><DiscloseAddress>true</DiscloseAddress><ListingDescription>The Oakwood is a 
thoughtfully designed single-story home with three bedrooms and two bathrooms. The covered entry invites guests to your home. The tech room is found off the foyer for a second living area or office. The split design has the largest bedroom separate from the secondary bedrooms. The kitchen features a corner pantry and island and looks out over the spacious great room and dining area. The largest bedroom includes both a bathtub and a walk-in shower, with a large walk-in closet as well. The covered patio is a great place to entertain or just relax in your backyard.</ListingDescription><MlsId>DRHBN</MlsId><MlsName>D.R. Horton</MlsName><MlsNumber>94121+4714</MlsNumber><LivingArea>2105</LivingArea><ListingTitle>The Oakwood</ListingTitle><FullBathrooms>2</FullBathrooms><ThreeQuarterBathrooms>0</ThreeQuarterBathrooms><HalfBathrooms>0</HalfBathrooms><OneQuarterBathrooms>0</OneQuarterBathrooms><PartialBathrooms>0</PartialBathrooms><ListingParticipants><Participant><ParticipantKey>3yd-DRHBN-94121</ParticipantKey><ParticipantId>94121</ParticipantId><FirstName>Online</FirstName><LastName>Sales Counselor</LastName><Role>Listing</Role><PrimaryContactPhone>5057501209</PrimaryContactPhone><Email>infoabq#drhorton.com</Email><WebsiteURL>https://www.drhorton.com/new-mexico/albuquerque/albuquerque/la-cuentista.aspx</WebsiteURL></Participant></ListingParticipants><Offices><Office><OfficeKey>3yd-DRHBN-940</OfficeKey><OfficeId>940</OfficeId><OfficeCode><OfficeCodeId>940</OfficeCodeId></OfficeCode><Name>D.R. Horton - Albuquerque</Name><CorporateName>D.R. Horton - Albuquerque</CorporateName><BrokerId>divisions</BrokerId><Address><commons:preference-order>1</commons:preference-order><commons:address-preference-order>1</commons:address-preference-order><commons:FullStreetAddress>6100 Goldenseal Ct. 
NW</commons:FullStreetAddress><commons:City>Albuquerque</commons:City><commons:StateOrProvince>NM</commons:StateOrProvince><commons:PostalCode>87120</commons:PostalCode><commons:Country>US</commons:Country></Address><Website>https://www.drhorton.com</Website></Office></Offices><Brokerage><Name>D.R. Horton Homes</Name></Brokerage><Builder><Name>D.R. Horton - Albuquerque</Name><WebsiteURL>https://www.drhorton.com</WebsiteURL></Builder><Location><Latitude>35.17669</Latitude><Longitude>-106.7101</Longitude><Community><commons:Subdivision commons:isgSecurityClass="Public">La Cuentista</commons:Subdivision><commons:Schools><commons:School><commons:Name>Sunset View Elementary School</commons:Name><commons:SchoolCategory>Elementary</commons:SchoolCategory><commons:District commons:isgSecurityClass="Public">Albuquerque Public Schools</commons:District></commons:School><commons:School><commons:Name>Volcano Vista High School</commons:Name><commons:SchoolCategory>High</commons:SchoolCategory><commons:District commons:isgSecurityClass="Public">Albuquerque Public Schools</commons:District></commons:School><commons:School><commons:SchoolCategory>JuniorHigh</commons:SchoolCategory><commons:District commons:isgSecurityClass="Public">Albuquerque Public Schools</commons:District></commons:School><commons:School><commons:Name>Tony Hillerman Middle School</commons:Name><commons:SchoolCategory>Middle</commons:SchoolCategory><commons:District commons:isgSecurityClass="Public">Albuquerque Public Schools</commons:District></commons:School></commons:Schools></Community><Neighborhoods><Neighborhood><Name>La Cuentista</Name><Description>La Cuentista is the Westsides hottest community! Offering D.R Hortons most popular single-story floor plans from 1,746 to 2,482 square feet with views of the Sandia and city lights. 
Homes start in the low-$300,000s and offer the most popular, in-demand options like the Multi-Gen floor plan, 9 ceilings with 8 doors, hard tile living rooms and covered patios standard. Located on the Northwest side of Albuquerque, only minutes from I-40, Montano, Coors and Paseo Del Norte. Perfect location for commuting to Downtown or Northeast AlbuquerqueThe La Cuentista community is just a short distance away from Volcano Vista High School and has a variety of shopping centers nearby. Due to it's location near the edge of Albuquerque, there are many recreational activities that are easily accessible such as the Petroglyph National Park. Give us a call and schedule a tour of your dream home!When you choose Americas Builder to construct your home, you select the features and location that work best for you. Below is a list of reasons why its a smart idea to choose the largest home builder in the nation: The benefits of high quality, new home construction include the most advanced technology, energy efficiencies and building standards. A robust new home warranty provides buyers peace of mind. New houses typically dont require the upgrades and maintenance needed in pre-owned homes before moving in or during the initial years of ownership.</Description></Neighborhood></Neighborhoods></Location><DetailedCharacteristics><ArchitectureStyle otherDescription="">Other</ArchitectureStyle><IsNewConstruction>true</IsNewConstruction><NumFloors>1.0</NumFloors><NumParkingSpaces>2</NumParkingSpaces><RoofTypes><RoofType>Unknown</RoofType></RoofTypes><Rooms><Room>Bedroom</Room><Room>Bedroom</Room><Room>Bedroom</Room><Room>Full Bath</Room><Room>Full Bath</Room></Rooms></DetailedCharacteristics><ModificationTimestamp commons:isgSecurityClass="Public">2021-02-06T21:31:14+00:00</ModificationTimestamp><Disclaimer commons:isgSecurityClass="Public">Copyright © 2021 D.R. Horton. All rights reserved. 
All information provided by the listing agent/broker is deemed reliable but is not guaranteed and should be independently verified.</Disclaimer></Listing>
How do I get rid of this invalid XML issue?

How to get book title from ISBN with Knowledge Graph?

I volunteer in a communal library and I'm in charge of the digital transition.
I'm using the free and open-source software PMB, and I want to automate the retrieval of book titles with the Knowledge Graph API (which PMB doesn't seem to support, unless I missed something).
Why use the Knowledge Graph instead of ISBNdb or another free ISBN API? Because none of them is as complete or as high-quality as KG.
For example, take the ISBN of a French book: 9782884613736 ("Le foot illustré de A à Z").
It isn't found on ISBNdb.com or similar sites.
But when I google it, the Knowledge Graph returns exactly what I want:
> Screenshot of what I see
But when I use the API:
GET https://kgsearch.googleapis.com/v1/entities:search?languages=fr&query=9782884613736&types=Book&key={YOUR_API_KEY}
{
"@context": {
"@vocab": "http://schema.org/",
"goog": "http://schema.googleapis.com/",
"EntitySearchResult": "goog:EntitySearchResult",
"detailedDescription": "goog:detailedDescription",
"kg": "http://g.co/kg"
},
"@type": "ItemList",
"itemListElement": [
]
}
My GET request returns nothing. (It works properly if I query by the book title: the information comes back as expected.)
I tried different types from schema.org: Book, BookSeries, BookFormatType.
Is there a way to use the KG API the way I want?
I'm open to any suggestions (even a different method to reach my goal).
Thank you.
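For reference, a minimal PHP sketch of the same request, assuming the Knowledge Graph Search API's `entities:search` endpoint and its `itemListElement[].result.name` response shape. `kgSearchUrl()` and `kgEntityNames()` are hypothetical helper names, and `$apiKey` stands for your own key. This won't make the ISBN query match; it only shows how to build the URL safely and detect the empty `itemListElement` programmatically, so a script can fall back to another source.

```php
<?php
// Hypothetical helper: builds an entities:search URL; http_build_query()
// URL-encodes every parameter. $apiKey is your own API key.
function kgSearchUrl(string $query, string $apiKey, string $type = 'Book', string $language = 'fr'): string
{
    return 'https://kgsearch.googleapis.com/v1/entities:search?' . http_build_query([
        'query'     => $query,
        'types'     => $type,
        'languages' => $language,
        'limit'     => 5,
        'key'       => $apiKey,
    ]);
}

// Hypothetical helper: returns the "name" of each result, or [] when
// itemListElement is empty (as in the ISBN response above) or the JSON is invalid.
function kgEntityNames(string $json): array
{
    $data = json_decode($json, true);
    $names = [];
    foreach ($data['itemListElement'] ?? [] as $element) {
        if (isset($element['result']['name'])) {
            $names[] = $element['result']['name'];
        }
    }
    return $names;
}
```

A title query passed through `kgSearchUrl()` should yield candidate names from `kgEntityNames()`, while the ISBN query above yields an empty array.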
The ISBN seems to be wrong or doesn't exist in Google's database. Here's a sample using the Google Books API:
ISBN_10 : 2884610154
https://www.googleapis.com/books/v1/volumes?languages=fr&q=isbn:2884610154
ISBN_13 : 9782884610155
https://www.googleapis.com/books/v1/volumes?languages=fr&q=isbn:9782884610155
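Building on this answer, a PHP sketch that normalizes an ISBN, checks its ISBN-13 check digit, builds the `q=isbn:` query shown above, and pulls the first title out of the JSON. The helper names are hypothetical, and the `items[0].volumeInfo.title` path is assumed from the Google Books volumes response. The check digit is a quick way to tell a mistyped ISBN from one that is simply missing from Google's database; by this check, both 9782884613736 and 9782884610155 are well-formed. The URLs above also pass `languages=fr`; as far as I know, the Books API's documented language filter is `langRestrict`, so the sketch leaves it out.

```php
<?php
// Strip hyphens and spaces so "978-2-88461-015-5" becomes "9782884610155".
function normalizeIsbn(string $isbn): string
{
    return strtoupper(preg_replace('/[^0-9Xx]/', '', $isbn));
}

// ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over the first 12 digits.
function isValidIsbn13(string $isbn): bool
{
    $isbn = normalizeIsbn($isbn);
    if (!preg_match('/^\d{13}$/', $isbn)) {
        return false;
    }
    $sum = 0;
    for ($i = 0; $i < 12; $i++) {
        $sum += (int) $isbn[$i] * ($i % 2 === 0 ? 1 : 3);
    }
    return (10 - $sum % 10) % 10 === (int) $isbn[12];
}

// Same query as the sample URLs above, with the ISBN normalized and URL-encoded.
function googleBooksUrl(string $isbn): string
{
    return 'https://www.googleapis.com/books/v1/volumes?' . http_build_query([
        'q' => 'isbn:' . normalizeIsbn($isbn),
    ]);
}

// First title in a volumes response, or null when nothing matched.
function firstBookTitle(string $json): ?string
{
    $data = json_decode($json, true);
    return $data['items'][0]['volumeInfo']['title'] ?? null;
}
```

A typical call would be `$json = file_get_contents(googleBooksUrl('978-2-88461-015-5'));` followed by `firstBookTitle($json)`, checking for `false` and `null` along the way.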

How to recognise adult content programmatically?

I am currently developing a website for a client. Users can upload pictures, which are shown in a gallery on the site.
The problem is that every uploaded image obviously needs to be checked to make sure it is safe for the website (no pornographic or explicit pictures). However, my client doesn't want to manually approve every uploaded image, as this would be time-consuming and users' images would not go online instantly.
I am writing my code in PHP. If need be, I could switch to ASP.NET or C#. Is there any way this can be done?
2019 Update
A lot has changed since this original answer way back in 2013, mainly thanks to machine learning. There are now a number of libraries and APIs available for programmatically detecting adult content:
Google Cloud Vision API, which uses the same models Google uses for safe search.
NSFWJS uses TensorFlow.js, claims to achieve ~90% accuracy, and is open source under the MIT license.
Yahoo has a solution called Open NSFW under the BSD 2-Clause license.
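For the Google Cloud Vision option, a minimal sketch of the two pieces you'd write in PHP: the JSON body for a `SAFE_SEARCH_DETECTION` request and a check on the returned `safeSearchAnnotation` likelihoods. It assumes you POST the body to `https://vision.googleapis.com/v1/images:annotate?key=YOUR_KEY` with your HTTP client of choice (omitted here); the function names and the `LIKELY` cut-off are hypothetical choices, not Google recommendations.

```php
<?php
// Likelihood values the Vision API uses, ordered from least to most likely.
const LIKELIHOODS = ['UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'];

// JSON body asking images:annotate for SafeSearch only, on a publicly reachable image URL.
function safeSearchRequestBody(string $imageUrl): string
{
    return json_encode([
        'requests' => [[
            'image'    => ['source' => ['imageUri' => $imageUrl]],
            'features' => [['type' => 'SAFE_SEARCH_DETECTION']],
        ]],
    ]);
}

// True if neither "adult" nor "racy" reaches $threshold. UNKNOWN ranks lowest
// here, so an UNKNOWN verdict passes; tighten this if that's too lenient.
function passesSafeSearch(string $responseJson, string $threshold = 'LIKELY'): bool
{
    $limit = array_search($threshold, LIKELIHOODS, true);
    if ($limit === false) {
        throw new InvalidArgumentException("Unknown likelihood: $threshold");
    }
    $data = json_decode($responseJson, true);
    $annotation = $data['responses'][0]['safeSearchAnnotation'] ?? null;
    if (!is_array($annotation)) {
        return false; // no verdict (e.g. an error response): hold for manual review
    }
    foreach (['adult', 'racy'] as $category) {
        $level = array_search($annotation[$category] ?? 'UNKNOWN', LIKELIHOODS, true);
        if ($level !== false && $level >= $limit) {
            return false;
        }
    }
    return true;
}
```

Images that fail the check could go to a moderation queue instead of being rejected outright, which keeps most uploads instant.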
2013 Answer
There is a JavaScript library called nude.js for exactly this, although I have never used it. Here is a demo of it in use.
There is also PORNsweeper.
Another option is to "outsource" the moderation work using something like Amazon Mechanical Turk, a crowdsourced platform that "enables computer programs to co-ordinate the use of human intelligence to perform tasks which computers are unable to do". You would basically pay a small amount per moderation item and have an actual human moderate the content for you.
The only other solution I can think of is to make the images user-moderated: users can flag inappropriate posts/images for moderation, and if nobody wants to moderate them manually, images can simply be removed after a certain number of flags.
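The flag-and-remove idea can be sketched as a small class. This is a hypothetical in-memory version for illustration; on a real site the flags would live in a database table keyed by image and user, and the default threshold of 3 is arbitrary.

```php
<?php
// Hypothetical in-memory flag store; a real site would persist this in a database.
final class FlagModerator
{
    /** @var array<string, array<string, bool>> image id => set of user ids who flagged it */
    private array $flags = [];
    private int $threshold;

    public function __construct(int $threshold = 3)
    {
        $this->threshold = $threshold;
    }

    // Records a flag (each user counts once per image) and reports whether
    // the image has now reached the threshold and should be hidden.
    public function flag(string $imageId, string $userId): bool
    {
        $this->flags[$imageId][$userId] = true;
        return $this->isHidden($imageId);
    }

    public function isHidden(string $imageId): bool
    {
        return count($this->flags[$imageId] ?? []) >= $this->threshold;
    }
}
```

Counting distinct users rather than raw flags stops a single user from taking an image down alone.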
Here are a few other interesting links on the topic:
http://thomas.deselaers.de/publications/papers/deselaers_icpr08_porn.pdf
http://www.naun.org/multimedia/NAUN/computers/20-462.pdf
What is the best way to programmatically detect porn images?
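The example below does not give you 100% accurate results, but it should help at least a bit and works out of the box.
<?php
$url = 'http://server.com/image.png';
// urlencode() the image URL so it survives being passed as a query parameter
$response = file_get_contents('http://api.rest7.com/v1/detect_nudity.php?url=' . urlencode($url));
$data = ($response === false) ? null : json_decode($response);
// Bail out if the request failed, the JSON didn't parse, or the API reported failure
if ($data === null || !isset($data->success) || $data->success !== 1)
{
die('Failed');
}
echo 'Contains nudity? ' . $data->nudity . '<br>';
echo 'Nudity percentage: ' . $data->nudity_percentage . '<br>';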
If you are looking for an API-based solution, you may want to check out Sightengine.com.
It's an automated solution to detect things like adult content, violence, celebrities, etc. in images and videos.
Here is an example in PHP, using the SDK:
<?php
// SDK installation and class import omitted; see the Sightengine docs
$client = new SightengineClient('YourApplicationID', 'YourAPIKey');
$output = $client->check('nudity')->image('https://sightengine.com/assets/img/examples/example2.jpg');
The output then contains the classification:
{
"status": "success",
"request": {
"id": "req_VjyxevVQYXQZ1HMbnwtn",
"timestamp": 1471762434.0244,
"operations": 1
},
"nudity": {
"raw": 0.000757,
"partial": 0.000763,
"safe": 0.999243
},
"media": {
"id": "med_KWmB2GQZ29N4MVpVdq5K",
"uri": "https://sightengine.com/assets/img/examples/example2.jpg"
}
}
Have a look at the documentation for more details: https://sightengine.com/docs/#nudity-detection
(disclaimer: I work there)
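To act on a response like the one above without a human in the loop, here is a sketch of a three-way decision (accept, reject, or queue for review). The field names come from the sample output; `nudityVerdict()` and both cut-offs are hypothetical and would need tuning against your own images.

```php
<?php
// Returns 'accept', 'reject' or 'review' for a Sightengine-style nudity response.
// Hypothetical cut-offs: tune them on your own uploads.
function nudityVerdict(string $json, float $maxRaw = 0.5, float $minSafe = 0.5): string
{
    $data = json_decode($json, true);
    if (!is_array($data) || ($data['status'] ?? '') !== 'success'
        || !isset($data['nudity']['raw'], $data['nudity']['safe'])) {
        return 'review'; // API error or unexpected shape: let a human decide
    }
    if ($data['nudity']['raw'] >= $maxRaw) {
        return 'reject'; // clearly explicit
    }
    // Confidently safe goes online at once; anything in between is queued.
    return $data['nudity']['safe'] >= $minSafe ? 'accept' : 'review';
}
```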
There is a free API that detects adult content (porn, nudity, NSFW).
https://market.mashape.com/purelabs/sensitive-image-detection
We've been using it in our production environment, and I would say it works pretty well so far. There are some false detections, though; it seems they prefer to mark an image as unsafe when they are unsure.
It all depends on the level of accuracy you are looking for. Simple skin-tone detection (like nude.js) will probably get you 60-80% accuracy on a generous sample set; for anything more accurate than that, say 90-95%, you are going to need a specialized computer vision system with an evolving model that is revised over time. For the latter, you might want to check out http://clarifai.com or https://scanii.com (which I work on).
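To make the skin-tone idea concrete, here is a rough sketch of pixel-level skin detection using a commonly cited RGB rule of thumb. It is not nude.js's actual algorithm, just an illustration of the approach and of why it tops out at modest accuracy (faces, beaches and wood tones all trip it). The `[r, g, b]` input format is an assumption; with GD you could build it from `imagecolorat()`.

```php
<?php
// Crude per-pixel skin test using a commonly cited RGB rule of thumb
// (daylight conditions). Not nude.js's exact algorithm.
function isSkinPixel(int $r, int $g, int $b): bool
{
    return $r > 95 && $g > 40 && $b > 20
        && (max($r, $g, $b) - min($r, $g, $b)) > 15
        && abs($r - $g) > 15
        && $r > $g && $r > $b;
}

// Fraction of pixels classified as skin; $pixels is a list of [r, g, b] triples.
function skinRatio(array $pixels): float
{
    if ($pixels === []) {
        return 0.0;
    }
    $skin = 0;
    foreach ($pixels as [$r, $g, $b]) {
        if (isSkinPixel($r, $g, $b)) {
            $skin++;
        }
    }
    return $skin / count($pixels);
}
```

Given the false-positive rate, a high ratio is better used to route an image to review than to reject it outright.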
Microsoft Azure has a very cool API called Computer Vision, which you can use for free (either through the UI or programmatically) and has tons of documentation, including for PHP.
It has some amazingly accurate (and sometimes humorous) results.
Beyond detecting adult and "racy" material, it will read text, guess your age, identify primary colours, etc.
You can try it out at azure.microsoft.com.
Sample output from a "racy" image:
FEATURE NAME: VALUE:
Description { "tags": [ "person", "man", "young", "woman", "holding",
"surfing", "board", "hair", "laying", "boy", "standing",
"water", "cutting", "white", "beach", "people", "bed" ],
"captions": [ { "text": "a man and a woman taking a selfie",
"confidence": 0.133149087 } ] }
Tags [ { "name": "person", "confidence": 0.9997446 },
{ "name": "man", "confidence": 0.9587285 },
{ "name": "wall", "confidence": 0.9546831 },
{ "name": "swimsuit", "confidence": 0.499717563 } ]
Image format "Jpeg"
Image dimensions 1328 x 2000
Clip art type 0
Line drawing type 0
Black and white false
Adult content true
Adult score 0.9845981
Racy true
Racy score 0.964191854
Categories [ { "name": "people_baby", "score": 0.4921875 } ]
Faces [ { "age": 37, "gender": "Female",
"faceRectangle": { "top": 317, "left": 1554,
"width": 232, "height": 232 } } ]
Dominant color background "Brown"
Dominant color foreground "Black"
Accent Color #0D8CBE
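The adult/racy values shown above come back in the `adult` object of the Analyze Image response when `Adult` is among the requested visual features. Here is a sketch of turning them into a moderation decision, assuming the `isAdultContent`, `isRacyContent`, `adultScore` and `racyScore` field names; the function name and threshold are hypothetical.

```php
<?php
// Maps the "adult" object of an Analyze Image response to a moderation decision.
function azureAdultDecision(string $json, float $maxAdultScore = 0.5): string
{
    $data = json_decode($json, true);
    $adult = is_array($data) ? ($data['adult'] ?? null) : null;
    if (!is_array($adult)) {
        return 'review'; // "Adult" feature missing from the response
    }
    if (!empty($adult['isAdultContent']) || ($adult['adultScore'] ?? 0) >= $maxAdultScore) {
        return 'reject';
    }
    // Racy (e.g. swimwear) isn't necessarily disallowed; send it to a human.
    return !empty($adult['isRacyContent']) ? 'review' : 'accept';
}
```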

Scraping HN Front Page - Handling Simple HTML DOM Error

I'm using 'Simple HTML Dom' to scrape the HN Front Page (news.ycombinator.com), which works great most of the time.
However, every now and then they promote a job/company that lacks the elements that the scraper is looking for, i.e. score, username and number of comments.
This, of course, breaks the array and thus the output of my script:
<?php
// 2012-02-12 Maximilian (Extract news.ycombinator.com's Front Page)
// Set the header during development
//header ("content-type: text/xml");
// Call the external PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/manual.htm)
include('lib/simple_html_dom.php');
date_default_timezone_set('Europe/Berlin');
// Download 'news.ycombinator.com' content
//$tmp = file_get_contents('http://news.ycombinator.com');
//file_put_contents('get.tmp', $tmp);
// Retrieve the content
$html = file_get_html('tc.tmp');
// Set the extraction pattern for each item
$title = $html->find("tr td table tr td.title a");
$score = $html->find("tr td.subtext span");
$user = $html->find("tr td.subtext a[href^=user]");
$link = $html->find("tr td table tr td.title a");
$time = $html->find("tr td.subtext");
$additionals = $html->find("tr td.subtext a[href^=item?id]");
// Construct the feed by looping through the items
$result = '';
for($i=0;$i<29;$i++) {
$cr=1;
// Check if the item points to an external website
if (!strstr($link[$i]->href,'http')) {
$url = 'http://news.ycombinator.com/'.$link[$i]->href;
$description = "Join the discussion on Hacker News.";
} else {
$url = $link[$i]->href;
// Getting content here
if (empty($abstract)) {
$description ="Failed to load any relevant content. Please try again later.";
} else {
$description = $abstract;
}
}
// Put all the items together
$result .= '<item><id>f'.$i.'</id><title>'.htmlspecialchars(trim($title[$i]->plaintext)).'</title><description><![CDATA['.$description.']]></description><pubDate>'.str_replace(' | '.$additionals[$i]->plaintext,'',str_replace($score[$i]->plaintext.' by '.$user[$i]->plaintext.' ','',$time[$i]->plaintext)).'</pubDate><score>'.$score[$i]->plaintext.'</score><user>'.$user[$i]->plaintext.'</user><comments>'.$additionals[$i]->plaintext.'</comments><id>'.substr($additionals[$i]->href,8).'</id><discussion>http://news.ycombinator.com/'.$additionals[$i]->href.'</discussion><link>'.htmlspecialchars($url).'</link></item>';
}
$output = '<rss><channel><id>news.ycombinator.com Frontpage</id><buildDate>'.date('Y-m-d H:i:s').'</buildDate>'.$result.'</channel></rss>';
file_put_contents('tc.xml', $output);
?>
Here's an example of the correct output
<item>
<id>f0</id>
<title>Show HN: Bootswatch, free swatches for your Bootstrap site</title>
<description><![CDATA[Easy to Install Simply download the CSS file from the swatch of your choice and replace the one in Bootstrap. No messing around with hex values. Whole New Feel We've all been there with the black bar and blue buttons. See how a splash of color and typography can transform the feel of your site. Modular Changes are contained in just two LESS files, enabling modification and ensuring forward compatibility.]]></description>
<pubDate>3 hours ago</pubDate>
<score>196 points</score>
<user>parkov</user>
<comments>30 comments</comments>
<id>3594540</id>
<discussion>http://news.ycombinator.com/item?id=3594540</discussion>
<link>http://bootswatch.com</link>
</item>
<item>
<id>f1</id>
<title>Louis CK inspires Jim Gaffigan to sell comedy special for $5 online</title>
<description><![CDATA[Dear Internet Friends,Inspired by the brilliant Louis CK, I have decided to debut my all-new hour stand-up special on my website, Jimgaffigan.com.Beginning sometime in April, “Jim Gaffigan: Mr. Universe” will be available exclusively for download for only $5. A dollar from each download will go directly to The Bob Woodruff Foundation; a charity dedicated to serving injured Veterans and their families.I am confident that the low price of my new comedy special and the fact that 20% of each $5 download will be donated to this very noble cause will prevent people from stealing it. Maybe I’m being naïve, but I trust you guys.]]></description>
<pubDate>57 minutes ago</pubDate>
<score>25 points</score>
<user>rkudeshi</user>
<comments>4 comments</comments>
<id>3595285</id>
<discussion>http://news.ycombinator.com/item?id=3595285</discussion>
<link>http://www.whosay.com/jimgaffigan/content/218011</link>
</item>
And here's an example of incorrect output. Note that the elements are not empty, so I can't seem to catch the error and simply skip to the next item. Everything after the promoted post breaks:
<item>
<id>f14</id>
<title>Build the next Legos: We're hiring an iOS Developer & Web Developer (YC S11)</title>
<description><![CDATA[Interested in building the next generation of toys on digital devices such as the iPad? That’s what we’re doing here at Launchpad Toys with apps like Toontastic (Named one of the “Top 10 iPad Apps of 2011” by the New York Times and was recently added to the iTunes Hall of Fame) and an awesom]]><![CDATA[e suite of others we have under development. We’re looking for creative and playful coders that have made games or highly visual apps/sites in the past for our two open development positions. As a kid, you probably played with Legos endlessly and grew up to be a hacker because you still love building things. Sounds like you? Email us at howdy@launchpadtoys.com with a couple links to some projects and code that we can look at along with your resume.]]></description>
<pubDate>2 hours ago</pubDate>
<score>14 points</score>
<user>bproper</user>
<comments>7 comments</comments>
<id>3594944</id>
<discussion>http://news.ycombinator.com/item?id=3594944</discussion>
<link>http://launchpadtoys.com/blog/2012/02/iosdeveloper-webdeveloper/</link>
</item>
<item>
<id>f15</id>
<title>SOPA foe Fred Wilson supports a blacklist on pirate sites</title>
<description><![CDATA[VC Fred Wilson says Google, Bing, Facebook, and Twitter should warn people when they try to log in at known pirate sites: "We don't need legislation." Fred Wilson says: If they try to pass antipiracy legislation, it will once again be 'war.' (Credit: Greg Sandoval/CNET) Fred Wilson, a well-known ven]]><![CDATA[ture capitalist from New York, says he's in favor of creating a blacklist for Web sites found to traffic in pirated films, music, and other intellectual property. The co-founder of Union Square Ventures told a gathering of media executives at the Paley Center for Media yesterday that he believes a good antipiracy measure would be for Google, Twitter, Facebook, and other major sites to issue warnings to people when they try to connect with a known pirate site. Fred Wilson, a co-founder of Union Square Ventures, says 'Our children have been taught to steal.' (Credit: Union Square Ventures) Wilson favors establishing an independent group to create a "black and white list." "The blacklist are those sites we all know are bad news," he told the audience in New York.]]></description>
<pubDate>14 points by bproper 2 hours ago | 7 comments</pubDate>
<score>24 points</score>
<user>andrewcross</user>
<comments>12 comments</comments>
<id>3594558</id>
<discussion>http://news.ycombinator.com/item?id=3594558</discussion>
<link>http://news.cnet.com/8301-31001_3-57377862-261/post-sopa-influential-tech-investor-favors-blacklisting-pirate-sites/</link>
</item>
So here's my question: how can I handle a situation where a particular element is missing, given that find() doesn't throw an error? Do I have to start from scratch, or is there a better approach to scraping the HN front page?
For anyone curious, here's the whole XML file: http://thequeue.org/api/tc.xml
You have to work in chunks to handle that. There seems to be a dummy spacer element that can help you with that:
$news = preg_split('/<tr style="height:5px"><\/tr>/',$html->find('tbody',2)->innertext);
And then use subselectors:
foreach($news as $article){
$article = str_get_html($article);
// No upvote arrow found, so it's not a valid article
if(count($article->find('img')) === 0){
continue;
}
// ...extract the title, score, user, etc. from $article here
}
For the other elements, use the same selectors on each $article chunk.
Well, thanks to Ivan's train of thought, I am now splitting the initially scraped HTML into an array, each node representing a post. Then, looping through every post, I check whether the upvote arrow image exists. If it doesn't, I don't add the post to the result. In the end, everything is stitched back together and the sponsored post is left out. Here's the code:
$clean = '';
$array = explode('<tr style="height:5px"></tr>', $html);
foreach ($array as $post) {
// Keep only posts that have the upvote arrow (sponsored posts don't)
if (strpos($post, 'grayarrow.gif') !== false) {
$clean .= $post;
}
}
unset($array);
$html = str_get_html($clean.'</body></html>');
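The same idea (treat each story as one unit and skip it when a field is missing) can also be done without splitting strings, using PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. Here is a sketch, assuming a simplified, hypothetical markup where each story is a title row followed directly by its subtext row; HN's real markup has changed since 2012, so the XPath expressions would need adjusting.

```php
<?php
// Parses each story as a (title row, subtext row) pair and skips pairs with
// missing fields, instead of indexing several parallel find() result arrays.
function parseStories(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    $stories = [];
    foreach ($xpath->query('//tr[td[@class="title"]/a]') as $titleRow) {
        $link = $xpath->query('./td[@class="title"]/a', $titleRow)->item(0);
        // Assumes the subtext row directly follows the title row.
        $subtext = $xpath->query('./following-sibling::tr[1]/td[@class="subtext"]', $titleRow)->item(0);
        if ($subtext === null) {
            continue;
        }
        $score = $xpath->query('./span', $subtext)->item(0);
        $user = $xpath->query('./a[starts-with(@href, "user")]', $subtext)->item(0);
        if ($score === null || $user === null) {
            continue; // sponsored/job post: no score or user, so skip it
        }
        $stories[] = [
            'title' => trim($link->textContent),
            'url'   => $link->getAttribute('href'),
            'score' => trim($score->textContent),
            'user'  => trim($user->textContent),
        ];
    }
    return $stories;
}
```

Because every field is read relative to its own story, a missing score only drops that story instead of shifting every item after it.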

Generate zip code based on City and State with Google API

Is there a way to look up a zip code based on City/State input in a form? I'm thinking the Google Geocoding API might be the right direction. Any thoughts? My site is built on WordPress, so the code would have to use PHP. Thanks in advance.
YQL can do things like this:
select name from geo.places.children where parent_woeid in (select woeid from geo.places where text="sunnyvale, usa" limit 1) AND placetype = 11
returns:
{
"query": {
"count": 6,
"created": "2011-03-16T06:49:09Z",
"lang": "en-US",
"results": {
"place": [
{
"name": "94086"
},
{
"name": "94087"
},
{
"name": "94088"
},
{
"name": "94089"
},
{
"name": "94090"
},
{
"name": "94085"
}
]
}
}
}
YQL Console
The YQL Console has examples of how to implement queries like this in both PHP and JavaScript.
Geocoding is where you find the coordinates of an address. Yes, you could geocode a city/state, but this would give you the center of the city (as defined by the geocoder's internal database, typically a centroid or 'city hall').
Most cities have multiple zip codes: Do you want all of these?
Similarly, a zip code could contain multiple cities, especially in rural areas where zip codes can be large and cities are what other countries would call 'villages' and 'hamlets'.
So your best bet is probably to get a database. There might be some free ones around (GeoNames comes to mind, but I don't think it has zip codes), but you might end up having to buy one.
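If you go the database route, the lookup itself is simple. Here is a minimal sketch, assuming a hypothetical CSV export with one `zip,city,state` row per pair; it returns every zip code for the city, since most cities have several. `zipsForCity()` is a hypothetical name.

```php
<?php
// Hypothetical data format: one "zip,city,state" line per zip code/city pair,
// e.g. exported from whichever zip code database you end up using.
function zipsForCity(string $csv, string $city, string $state): array
{
    $zips = [];
    foreach (preg_split('/\R/', trim($csv)) as $line) {
        // Explicit delimiter/enclosure/escape arguments (the escape value is PHP's default).
        $row = str_getcsv($line, ',', '"', '\\');
        if (count($row) < 3) {
            continue; // skip blank or malformed lines
        }
        [$zip, $rowCity, $rowState] = $row;
        // Case-insensitive match so "sunnyvale", "ca" finds "Sunnyvale,CA".
        if (strcasecmp(trim($rowCity), trim($city)) === 0
            && strcasecmp(trim($rowState), trim($state)) === 0) {
            $zips[] = trim($zip);
        }
    }
    sort($zips);
    return $zips;
}
```

For a large file, loading it into a MySQL table with an index on (city, state) would be the natural next step in a WordPress setup.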
First, a note on the Google API: be aware of Google's TOS so you don't go down a wasted path as others have (sometimes unknowingly). Specifically: "Note: the Geocoding API may only be used in conjunction with a Google map; geocoding results without displaying them on a map is prohibited.".
Your best bet is to get a free zip code database if your project is not mission-critical; otherwise, you'll probably need a good commercial-grade database. Just google "commercial grade zip code database".
Also, see a good Stack Overflow thread about this topic.
