PHP script to extract artist & title from Shoutcast/Icecast stream

I found a script that can extract the artist and title from an Icecast or Shoutcast stream.
I want the script to update automatically when the song changes; at the moment it only works when I execute it manually. I'm new to PHP, so any help will be appreciated.
Thanks!
define('CRLF', "\r\n");
class streaminfo{
public $valid = false;
public $useragent = 'Winamp 2.81';
protected $headers = array();
protected $metadata = array();
public function __construct($location){
$errno = $errstr = '';
$t = parse_url($location);
$sock = fsockopen($t['host'], $t['port'], $errno, $errstr, 5);
$path = isset($t['path'])?$t['path']:'/';
if ($sock){
$request = 'GET '.$path.' HTTP/1.0' . CRLF .
'Host: ' . $t['host'] . CRLF .
'Connection: Close' . CRLF .
'User-Agent: ' . $this->useragent . CRLF .
'Accept: */*' . CRLF .
'icy-metadata: 1'.CRLF.
'icy-prebuffer: 65536'.CRLF.
(isset($t['user'])?'Authorization: Basic '.base64_encode($t['user'].':'.$t['pass']).CRLF:'').
'X-TipOfTheDay: Winamp "Classic" rulez all of them.' . CRLF . CRLF;
if (fwrite($sock, $request)){
$theaders = $line = '';
while (!feof($sock)){
$line = fgets($sock, 4096);
if('' == trim($line)){
break;
}
$theaders .= $line;
}
$theaders = explode(CRLF, $theaders);
foreach ($theaders as $header){
$t = explode(':', $header);
if (isset($t[0]) && trim($t[0]) != ''){
$name = preg_replace('/[^a-z][^a-z0-9]*/i','', strtolower(trim($t[0])));
array_shift($t);
$value = trim(implode(':', $t));
if ($value != ''){
if (is_numeric($value)){
$this->headers[$name] = (int)$value;
}else{
$this->headers[$name] = $value;
}
}
}
}
if (!isset($this->headers['icymetaint'])){
$data = ''; $metainterval = 512;
while(!feof($sock)){
$data .= fgetc($sock);
if (strlen($data) >= $metainterval) break;
}
$this->print_data($data);
$matches = array();
preg_match_all('/([\x00-\xff]{2})\x0\x0([a-z]+)=/i', $data, $matches, PREG_OFFSET_CAPTURE);
preg_match_all('/([a-z]+)=([a-z0-9\(\)\[\]., ]+)/i', $data, $matches, PREG_SPLIT_NO_EMPTY);
echo '<pre>';var_dump($matches);echo '</pre>';
$title = $artist = '';
foreach ($matches[0] as $nr => $values){
$offset = $values[1];
// Square brackets replace the curly-brace string offsets, which PHP 8 no longer accepts.
$length = ord($values[0][0]) +
(ord($values[0][1]) * 256)+
(ord($values[0][2]) * 256*256)+
(ord($values[0][3]) * 256*256*256);
$info = substr($data, $offset + 4, $length);
$seperator = strpos($info, '=');
$this->metadata[substr($info, 0, $seperator)] = substr($info, $seperator + 1);
if (substr($info, 0, $seperator) == 'title') $title = substr($info, $seperator + 1);
if (substr($info, 0, $seperator) == 'artist') $artist = substr($info, $seperator + 1);
}
$this->metadata['streamtitle'] = $artist . ' - ' . $title;
}else{
$metainterval = $this->headers['icymetaint'];
$intervals = 0;
$metadata = '';
while(1){
$data = '';
while(!feof($sock)){
$data .= fgetc($sock);
if (strlen($data) >= $metainterval) break;
}
//$this->print_data($data);
$len = join(unpack('C', fgetc($sock))) * 16; // 'C' = unsigned byte; the length is counted in 16-byte units
if ($len > 0){
$metadata = str_replace("\0", '', fread($sock, $len));
break;
}else{
$intervals++;
if ($intervals > 100) break;
}
}
$metarr = explode(';', $metadata);
foreach ($metarr as $meta){
$t = explode('=', $meta);
if (isset($t[0]) && trim($t[0]) != ''){
$name = preg_replace('/[^a-z][^a-z0-9]*/i','', strtolower(trim($t[0])));
array_shift($t);
$value = trim(implode('=', $t));
if (substr($value, 0, 1) == '"' || substr($value, 0, 1) == "'"){
$value = substr($value, 1);
}
if (substr($value, -1) == '"' || substr($value, -1) == "'"){
$value = substr($value, 0, -1);
}
if ($value != ''){
$this->metadata[$name] = $value;
}
}
}
}
fclose($sock);
$this->valid = true;
}else echo 'unable to write.';
}else echo 'no socket '.$errno.' - '.$errstr.'.';
}
public function print_data($data){
$data = str_split($data);
$c = 0;
$string = '';
echo "<pre>\n000000 ";
foreach ($data as $char){
$string .= addcslashes($char, "\n\r\0\t");
$hex = dechex(join(unpack('C', $char)));
if ($c % 4 == 0) echo ' ';
if ($c % (4*4) == 0 && $c != 0){
foreach (str_split($string) as $s){
//echo " $string\n";
if (ord($s) < 32 || ord($s) > 126){
echo '\\'.ord($s);
}else{
echo $s;
}
}
echo "\n";
$string = '';
echo str_pad($c, 6, '0', STR_PAD_LEFT).' ';
}
if (strlen($hex) < 1) $hex = '00';
if (strlen($hex) < 2) $hex = '0'.$hex;
echo $hex.' ';
$c++;
}
echo " $string\n</pre>";
}
public function __get($name){
if (isset($this->metadata[$name])){
return $this->metadata[$name];
}
if (isset($this->headers[$name])){
return $this->headers[$name];
}
return null;
}
}
$t = new streaminfo('http://64.236.34.196:80/stream/1014'); // get metadata
echo 'Meta Interval: ' . $t->icymetaint . "\n";
echo 'Current Track: ' . $t->streamtitle . "\n";

You will need to constantly query the stream at a set interval to find when the song changes.
This is best done by scheduling a cron job.
On Windows, use the Windows Task Scheduler instead.
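A scheduled run only helps if each run can tell whether the title differs from the previous one. Here is a minimal sketch of that comparison, assuming the last title is kept in a small state file; the function name trackChanged and the file name last_title.txt are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: returns true when $currentTitle differs from the title
// stored by the previous run, and stores the new title for the next run.
function trackChanged(string $currentTitle, string $stateFile): bool
{
    $previous = is_file($stateFile) ? (string) file_get_contents($stateFile) : '';
    if ($currentTitle === $previous) {
        return false;
    }
    file_put_contents($stateFile, $currentTitle, LOCK_EX);
    return true;
}

// Example wiring (assumes the streaminfo class from the question is loaded):
// $t = new streaminfo('http://64.236.34.196:80/stream/1014');
// if ($t->valid && trackChanged((string) $t->streamtitle, __DIR__ . '/last_title.txt')) {
//     echo 'Now playing: ' . $t->streamtitle . "\n";
// }
```

The cron job or scheduled task would then run this script every few seconds or minutes, and only the runs that see a new title need to do any further work.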

If you want to run the PHP script to keep your metadata up to date (I'm assuming you're building a website and using HTML audio tags here), you can use the ontimeupdate event with an AJAX function. If you're not, look in your audio player's documentation for something similar.
<audio src="http://ip:port/;" ontimeupdate="loadXMLDoc()">
You can find a great example here: http://www.w3schools.com/php/php_ajax_php.asp
Have your PHP script echo all the relevant information at once, using a single PHP variable at the very end of the script.
<?php ....
$phpVar=$streamtitle;
$phpVar2=$streamsong;
$result="I want my string to look like this: <br> {$phpVar} {$phpVar2}";
echo $result;
?>
Then use the function assigned to .onreadystatechange to update the particular elements you want on your page using .responseText (this will contain the same content as your PHP script's echo).

After scouring the web for 4 hours, this is the only Shoutcast metadata script I've found that works! Thank you.
To run this constantly, why not use a setInterval combined with jQuery's AJAX call?
<script>
$(function() {
setInterval(getTrackName,16000);
});
function getTrackName() {
$.ajax({
url: "track_name.php"
})
.done(function( data ) {
$( "#results" ).text( data );
});
}
</script>
Also, the last couple of echo lines in the script as originally posted were breaking it for me. Just put quotes around the Meta Interval: and Current Track: text.
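With every visitor polling track_name.php every 16 seconds, each request would otherwise open its own connection to the stream. One possible way around that is a small file cache in front of the lookup. This is only a sketch: the helper name cachedTitle, the cache file name and the 15-second lifetime are assumptions for illustration.

```php
<?php
// Hypothetical helper: return the cached title if it is younger than $ttl
// seconds, otherwise call $fetch() and refresh the cache file.
function cachedTitle(string $cacheFile, int $ttl, callable $fetch): string
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return (string) file_get_contents($cacheFile);
    }
    $title = (string) $fetch();
    file_put_contents($cacheFile, $title, LOCK_EX);
    return $title;
}

// Possible body of track_name.php (assumes the streaminfo class is loaded):
// header('Content-Type: text/plain; charset=utf-8');
// echo cachedTitle(__DIR__ . '/title.cache', 15, function () {
//     $t = new streaminfo('http://64.236.34.196:80/stream/1014');
//     return $t->valid ? $t->streamtitle : '';
// });
```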

Related

Search and replace a string within a given pattern in PHP

I am trying to generate html from a given string pattern, similar to a plugin.
There are three patterns, a no arg, a single arg and a multi arg string pattern. I can't change this pattern since it's from a CMS.
{pluginName} or {pluginName=3} or {pluginName id=3|view=simple|arg999=asv}
An example:
<p>Hi this is a html page</p>
<p>The following line should generate html</p>
{pluginName=3}
<p>The following line also should generate html</p>
{pluginName id=3|view=simple|arg999=asv}
My goal is to replace those "tags" with something (the processing itself is not relevant to this question). However, I want to be able to pass the given args to a class/function that handles that logic.
This is my first attempt, without using regexes, since I don't know how to approach this problem with them (and mainly because they are slower).
<?php
function processPlugins($text, $pos = 0, $start = '{', $end = '}') {
$plugins = array('plugin1', 'plugin2');
while(($pos = strpos($text, $start, $pos)) !== false) {
$startPos = $pos;
$pos += strlen($start);
foreach($plugins as $plugin) {
if(substr($text, $pos, strlen($plugin)) === $plugin
&& ($endPos = strpos($text, $end, $pos + strlen($plugin))) !== false) {
$char = substr($text, $pos + strlen($plugin), 1); // 1 is strlen of (= or ' ')
$pos += strlen($plugin) + 1; // 1 is strlen of (= or ' ')
$argString = substr($text, $pos, $endPos - $pos);
if($char === ' ') { //Multi arg
$params = explode('|', trim($argString));
$paramDict = array();
foreach ($params as $param) {
list($k, $v) = array_pad(explode('=', $param), 2, null);
$paramDict[$k] = $v;
}
//$output = $plugin->processDictionary($paramDict);
var_dump($paramDict);
} elseif ($char === '=') { //One arg
//$output = $plugin->processArg($argString);
echo $argString . "\n";
} elseif ($char === $end) { //No arg
//$output = $plugin->processNoArg();
echo $plugin. "\n";
}
$pos = $endPos + strlen($end);
break;
}
}
}
}
processPlugins('{plugin1}');
processPlugins('{plugin2=3}');
processPlugins('{plugin2 arg1=b|arg2=d}');
The previous code works in a PHP sandbox.
This code seems to work (for now) but it seems sketchy. Would you approach this problem differently? Could I refactor this code somehow?
If you opt for string manipulation functions over regex, why not use explode for stripping the input down to the significant part?
Here is an alternative implementation:
function processPlugins($text, $pos = 0, $start = '{', $end = '}') {
$t = substr($text, $pos);
if($pos > 0) {
echo "$pos characters removed from the beginning: $t" . PHP_EOL;
} else {
echo "Starting with '$t'" . PHP_EOL;
}
$parts = explode($start, $t);
$t = $parts[1];
$parts = explode($end, $t);
$t = $parts[0];
echo "The part between curly braces: '$t'" . PHP_EOL;
$t = str_replace(['plugin1', 'plugin2'], '', $t);
echo "After plugin name has been removed: '$t'" . PHP_EOL;
$n = strlen($t);
if(!$n) {
echo "Processing complete: " . trim($parts[0]) . PHP_EOL . PHP_EOL;
return;
}
$params = explode('|', $t);
echo 'Key-Values: ' . json_encode($params) . PHP_EOL;
$kv = [];
foreach($params as $p) {
list($k, $v) = explode('=', trim($p));
echo " Item: '$p', Key: '$k', Value: '$v'" . PHP_EOL;
if($k === '') {
echo "Processing complete: $v" . PHP_EOL . PHP_EOL;
return;
}
$kv[$k] = $v;
}
echo "Processing complete: " . json_encode($kv) . PHP_EOL . PHP_EOL;
}
echo '<pre>';
processPlugins('{plugin1}');
processPlugins('{plugin2=3}');
processPlugins('{plugin2 arg1=b|arg2=d}');
Of course the echo lines could be thrown away. With them in place we get this output:
Starting with '{plugin1}'
The part between curly braces: 'plugin1'
After plugin name has been removed: ''
Processing complete: plugin1
Starting with '{plugin2=3}'
The part between curly braces: 'plugin2=3'
After plugin name has been removed: '=3'
Key-Values: ["=3"]
Item: '=3', Key: '', Value: '3'
Processing complete: 3
Starting with '{plugin2 arg1=b|arg2=d}'
The part between curly braces: 'plugin2 arg1=b|arg2=d'
After plugin name has been removed: 'arg1=b|arg2=d'
Key-Values: [" arg1=b","arg2=d"]
Item: ' arg1=b', Key: 'arg1', Value: 'b'
Item: 'arg2=d', Key: 'arg2', Value: 'd'
Processing complete: {"arg1":"b","arg2":"d"}
This version works with inputs having more than one plugin token.
function processPlugins($text, $pos = 0, $start = '{', $end = '}') {
$processed = [];
$t = substr($text, $pos);
$parts = explode($start, $t);
array_shift($parts);
foreach($parts as $part) {
$pparts = explode($end, $part);
$t = trim($pparts[0]);
$t = str_replace(['plugin1', 'plugin2'], '', $t);
$n = strlen($t);
if(!$n) {
$processed[] = trim($pparts[0]);
continue;
}
$params = explode('|', $t);
$kv = [];
foreach($params as $p) {
list($k, $v) = explode('=', trim($p));
if(trim($k) === '') {
$processed[] = trim($v);
continue 2;
}
$kv[trim($k)] = trim($v);
}
$processed[] = $kv;
}
return $processed;
}
function test($case) {
$p = processPlugins($case);
echo "$case => " . json_encode($p) . PHP_EOL;
}
$cases = [
'{plugin1}',
'{plugin2=3}',
'{plugin2 arg1=b|arg2=d}',
'text here {plugin1} and more{plugin2=55}here {plugin2 arg1=b|arg2=d} till the end'
];
foreach($cases as $case) {
test($case);
}
The output:
{plugin1} => ["plugin1"]
{plugin2=3} => ["3"]
{plugin2 arg1=b|arg2=d} => [{"arg1":"b","arg2":"d"}]
text here {plugin1} and more{plugin2=55}here {plugin2 arg1=b|arg2=d} till the end => ["plugin1","55",{"arg1":"b","arg2":"d"}]
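For comparison, here is a regex-based sketch that performs the actual replacement in one pass with preg_replace_callback. It is an alternative to the string-function versions above, not a rewrite of them, and whether it is slower in practice would need measuring. The function name replacePluginTags and the $handlers map are assumptions made for this example.

```php
<?php
// Sketch: replace {name}, {name=value} and {name k=v|k2=v2} tags in $text.
// $handlers maps a plugin name to a callable that receives the parsed
// argument (null, a string, or an associative array) and returns HTML.
function replacePluginTags(string $text, array $handlers): string
{
    $names = implode('|', array_map(function ($n) {
        return preg_quote($n, '/');
    }, array_keys($handlers)));
    $pattern = '/\{(' . $names . ')(?:=([^}]*)| ([^}]*))?\}/';

    return preg_replace_callback($pattern, function (array $m) use ($handlers) {
        $arg = null;                               // {name}
        if (isset($m[3])) {                        // {name k=v|k2=v2}
            $arg = [];
            foreach (explode('|', $m[3]) as $pair) {
                [$k, $v] = array_pad(explode('=', $pair, 2), 2, null);
                $arg[trim($k)] = $v === null ? null : trim($v);
            }
        } elseif (isset($m[2]) && $m[2] !== '') {  // {name=value}
            $arg = $m[2];
        }
        return $handlers[$m[1]]($arg);
    }, $text);
}
```

Tags whose name is not a key of $handlers are left untouched, so unrelated braces in the page are not affected.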

Laravel Auto-Link library

I'm looking for a Laravel auto-link detection library.
I'm trying to make a social media website, and when my users post URLs I need them to show up as clickable links instead of just normal text.
Try Autologin for Laravel by dwightwatson, which lets you generate URLs that automatically log users in to your application and then redirect them to the appropriate location.
As far as I know, there's no equivalent in Laravel's core for the auto_link() helper function from CodeIgniter (assuming you are referring to the CI version).
Anyway, it's very simple to grab that code and use it in Laravel as a quick and dirty workaround. I did exactly that when I ran into the same issue.
Put a container class for your helpers in your App directory (or any container, for that matter; it just needs to be discoverable by the framework). In this case I used a UrlHelpers.php file. Then put these two static functions, taken from the CI version, inside it:
class UrlHelpers
{
static function auto_link($str, $type = 'both', $popup = FALSE)
{
// Find and replace any URLs.
if ($type !== 'email' && preg_match_all('#(\w*://|www\.)[^\s()<>;]+\w#i', $str, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER)) {
// Set our target HTML if using popup links.
$target = ($popup) ? ' target="_blank"' : '';
// We process the links in reverse order (last -> first) so that
// the returned string offsets from preg_match_all() are not
// moved as we add more HTML.
foreach (array_reverse($matches) as $match) {
// $match[0] is the matched string/link
// $match[1] is either a protocol prefix or 'www.'
//
// With PREG_OFFSET_CAPTURE, both of the above is an array,
// where the actual value is held in [0] and its offset at the [1] index.
$a = '<a href="' . (strpos($match[1][0], '/') ? '' : 'http://') . $match[0][0] . '"' . $target . '>' . $match[0][0] . '</a>';
$str = substr_replace($str, $a, $match[0][1], strlen($match[0][0]));
}
}
// Find and replace any emails.
if ($type !== 'url' && preg_match_all('#([\w\.\-\+]+@[a-z0-9\-]+\.[a-z0-9\-\.]+[^[:punct:]\s])#i', $str, $matches, PREG_OFFSET_CAPTURE)) {
foreach (array_reverse($matches[0]) as $match) {
if (filter_var($match[0], FILTER_VALIDATE_EMAIL) !== FALSE) {
$str = substr_replace($str, static::safe_mailto($match[0]), $match[1], strlen($match[0]));
}
}
}
return $str;
}
static function safe_mailto($email, $title = '', $attributes = '')
{
$title = (string)$title;
if ($title === '') {
$title = $email;
}
$x = str_split('<a href="mailto:', 1);
for ($i = 0, $l = strlen($email); $i < $l; $i++) {
$x[] = '|' . ord($email[$i]);
}
$x[] = '"';
if ($attributes !== '') {
if (is_array($attributes)) {
foreach ($attributes as $key => $val) {
$x[] = ' ' . $key . '="';
for ($i = 0, $l = strlen($val); $i < $l; $i++) {
$x[] = '|' . ord($val[$i]);
}
$x[] = '"';
}
} else {
for ($i = 0, $l = strlen($attributes); $i < $l; $i++) {
$x[] = $attributes[$i];
}
}
}
$x[] = '>';
$temp = array();
for ($i = 0, $l = strlen($title); $i < $l; $i++) {
$ordinal = ord($title[$i]);
if ($ordinal < 128) {
$x[] = '|' . $ordinal;
} else {
if (count($temp) === 0) {
$count = ($ordinal < 224) ? 2 : 3;
}
$temp[] = $ordinal;
if (count($temp) === $count) {
$number = ($count === 3)
? (($temp[0] % 16) * 4096) + (($temp[1] % 64) * 64) + ($temp[2] % 64)
: (($temp[0] % 32) * 64) + ($temp[1] % 64);
$x[] = '|' . $number;
$count = 1;
$temp = array();
}
}
}
$x[] = '<';
$x[] = '/';
$x[] = 'a';
$x[] = '>';
$x = array_reverse($x);
$output = "<script type=\"text/javascript\">\n"
. "\t//<![CDATA[\n"
. "\tvar l=new Array();\n";
for ($i = 0, $c = count($x); $i < $c; $i++) {
$output .= "\tl[" . $i . "] = '" . $x[$i] . "';\n";
}
$output .= "\n\tfor (var i = l.length-1; i >= 0; i=i-1) {\n"
. "\t\tif (l[i].substring(0, 1) === '|') document.write(\"&#\"+unescape(l[i].substring(1))+\";\");\n"
. "\t\telse document.write(unescape(l[i]));\n"
. "\t}\n"
. "\t//]]>\n"
. '</script>';
return $output;
}
}
The function safe_mailto is used in case there are email links in your string. If you don't need it, you are free to modify the code.
Then you can use the helper class anywhere in your Laravel code as usual (here inside a Blade template, but the principle is the same; adjust the namespace to match where you put the class):
<p>{!! \App\UrlHelpers::auto_link($string) !!}</p>
Quick and dirty, but it works. Hope this helps. Good luck!
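One thing to keep in mind: Blade's {!! !!} prints its argument without escaping, so user-posted text should be escaped before links are added to it. The following is a minimal sketch of that idea, not the CodeIgniter code: it handles only http/https URLs, and the function name autoLinkEscaped is an assumption for this example.

```php
<?php
// Sketch: escape user text for HTML and wrap http(s) URLs in anchor tags.
// Trailing punctuation right after a URL is treated as part of the link.
function autoLinkEscaped(string $text): string
{
    // With one capture group and DELIM_CAPTURE, odd indexes hold the URLs.
    $parts = preg_split('#(\bhttps?://[^\s<>()"\']+)#i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $out .= ($i % 2 === 1)
            ? '<a href="' . $safe . '" rel="nofollow">' . $safe . '</a>'
            : $safe;
    }
    return $out;
}
```

Because each piece is escaped separately, any markup in the surrounding text is shown as literal characters while the generated anchor tags stay intact.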

Can't extract metadata from some icecast streams

I'm trying to extract icecast metadata from streams.
I have code that works for some streams and not for others.
The issue is that some streams don't return the icymetaint value and that's where the code gets lost.
I can't get the icymetaint header from this stream:
http://radio.hbr1.com:19800/tronic.ogg
But when I put it in VLC media player it shows the meta just fine.
So what exactly am I missing here? What other ways are there for an Icecast stream to transmit metadata? The server version is Icecast 2.3.3.
This is code inside a class to retrieve the metadata and headers:
public function GetDataFromStream($parsedUrl)
{
$returnData = array();
$addr = $parsedUrl['host'];
$addr = gethostbyname($addr);
$sock = fsockopen($addr, $parsedUrl['port'], $errno, $errstr, 5);
$path = isset($parsedUrl['path'])?$parsedUrl['path']:'/';
if ($sock)
{
$request = 'GET '. $path .' HTTP/1.0' . CRLF .
'Host: ' . $parsedUrl['host'] . CRLF .
'Connection: Close' . CRLF .
'User-Agent: ' . $this->useragent . CRLF .
'Accept: */*' . CRLF .
'icy-metadata: 1'.CRLF.
'icy-prebuffer: 65536'.CRLF.
(isset($parsedUrl['user']) ? 'Authorization: Basic ' .
base64_encode($parsedUrl['user'] . ':' . $parsedUrl['pass']) . CRLF : '').
'X-TipOfTheDay: Winamp "Classic" rulez all of them.' . CRLF . CRLF;
if (fwrite($sock, $request))
{
$theaders = $line = '';
while (!feof($sock))
{
$line = fgets($sock, 4096);
if('' == trim($line))
break;
$theaders .= $line;
}
$theaders = explode(CRLF, $theaders);
foreach ($theaders as $header)
{
$t = explode(':', $header);
if (isset($t[0]) && trim($t[0]) != '')
{
$name = preg_replace('/[^a-z][^a-z0-9]*/i','', strtolower(trim($t[0])));
array_shift($t);
$value = trim(implode(':', $t));
if ($value != '')
{
if (is_numeric($value))
$this->headers[$name] = (int)$value;
else
$this->headers[$name] = $value;
}
}
}
if (isset($this->headers['icymetaint']))
{
$metainterval = $this->headers['icymetaint'];
$intervals = 0;
$metadata = '';
while(1)
{
$data = '';
while(!feof($sock))
{
$data .= fgetc($sock);
if (strlen($data) >= $metainterval)
break;
}
$len = join(unpack('C', fgetc($sock))) * 16; // 'C' = unsigned byte; the length is counted in 16-byte units
if ($len > 0)
{
$metadata = str_replace("\0", '', fread($sock, $len));
break;
}
else
{
$intervals++;
if ($intervals > 100) break;
}
}
$metarr = explode(';', $metadata);
foreach ($metarr as $meta)
{
$t = explode('=', $meta);
if (isset($t[0]) && trim($t[0]) != '')
{
$name = preg_replace('/[^a-z][^a-z0-9]*/i','', strtolower(trim($t[0])));
array_shift($t);
$value = trim(implode('=', $t));
if (substr($value, 0, 1) == '"' || substr($value, 0, 1) == "'")
$value = substr($value, 1);
if (substr($value, -1) == '"' || substr($value, -1) == "'")
$value = substr($value, 0, -1);
if ($value != '')
{
$tmp = &$this->metadata;
$tmp[$name] = $value;
}
}
}
$this->valid = true;
}
else
{
$this->valid = false;
}
fclose($sock);
}
else
echo 'unable to write.';
}
else
//echo 'no socket '.$errno.' - '.$errstr.'.';
;
}
You can use .xspf mountpoint extension, get XML and parse it:
<?php
$stream_url = "http://radio.hbr1.com:19800/tronic.ogg";
$xspf_url = $stream_url . ".xspf";
$xml = file_get_contents($xspf_url);
if($xml){
$data = simplexml_load_string($xml);
// Track artist
print $data->trackList->track->creator;
// Track title
print $data->trackList->track->title;
}
?>
Here is what the .xspf data looks like (I use lynx to read the URL content):
$ lynx -mime_header http://radio.hbr1.com:19800/tronic.ogg.xspf
HTTP/1.0 200 OK
Content-Type: application/xspf+xml
Content-Length: 615
<?xml version="1.0" encoding="UTF-8"?>
<playlist xmlns="http://xspf.org/ns/0/" version="1">
<title/>
<creator/>
<trackList>
<track>
<location>http://radio.hbr1.com:19800/tronic.ogg</location>
<creator>Res Q</creator>
<title>Fakesleep (2012)</title>
<annotation>Stream Title: HBR1 - Tronic Lounge
Stream Description: Music on Futurenet
Content Type:application/ogg
Bitrate: Quality 0,00
Current Listeners: 28
Peak Listeners: 45
Stream Genre: Tech House, Progressive House, Electro, Minimal</annotation>
<info>http://www.hbr1.com</info>
</track>
</trackList>
</playlist>
As you can see, the /playlist/trackList/track/title XML node is your song title, and /playlist/trackList/track/creator is usually the artist.
That's because you are only trying to parse the braindead ancient metadata slipstreaming introduced by Nullsoft in Shoutcast.
Proper streams use a container (e.g. Ogg or WebM) instead of throwing raw data out.
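For reference, the in-band scheme that the question's code parses can be sketched roughly like this: after every icy-metaint bytes of audio, the server sends one length byte (counted in 16-byte units), followed by that many bytes of NUL-padded text such as StreamTitle='...';. The function name parseIcyBlock is made up for this sketch, and it only looks at the first block in a buffer.

```php
<?php
// Sketch: extract the first ICY metadata block from a raw body buffer.
// $metaint is the value of the icy-metaint response header.
function parseIcyBlock(string $buffer, int $metaint): array
{
    if (strlen($buffer) <= $metaint) {
        return [];                                   // length byte not received yet
    }
    $length = ord($buffer[$metaint]) * 16;           // unsigned length byte, in 16-byte units
    $block  = rtrim(substr($buffer, $metaint + 1, $length), "\0");

    // Fields look like StreamTitle='Artist - Title';StreamUrl='...';
    preg_match_all("/(\w+)='(.*?)';/", $block, $m, PREG_SET_ORDER);
    $fields = [];
    foreach ($m as $match) {
        $fields[strtolower($match[1])] = $match[2];
    }
    return $fields;
}
```

Matching up to the closing '; rather than splitting on every semicolon keeps titles that themselves contain a semicolon or an apostrophe in one piece. A stream that sends no icy-metaint header carries no such blocks at all, so this approach has nothing to read there.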
Newer Icecast servers offer a JSON API (Version 2.4.1 and above). This is more useful than pulling in a whole stream just for the metadata.
If you are decoding the stream anyway, then you should look into proper libraries for parsing streams; libogg, libopus, and libvorbis come to mind.
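As a rough illustration of the JSON route mentioned above: Icecast 2.4.x serves its status as JSON at /status-json.xsl. The sketch below assumes the layout I have seen there, where icestats.source is a single object when one mount is active and a list when there are several, and where each source has listenurl and title fields. Check this against your own server's output; the function name titleForMount is invented for this example.

```php
<?php
// Sketch: pull the current title for one mount out of Icecast's status JSON.
// The field names (icestats, source, listenurl, title) are assumptions based
// on typical Icecast 2.4.x output; verify them against your server.
function titleForMount(string $json, string $mount): ?string
{
    $data = json_decode($json, true);
    $sources = $data['icestats']['source'] ?? [];
    if (isset($sources['listenurl'])) {
        $sources = [$sources];                 // normalise the single-mount case
    }
    foreach ($sources as $source) {
        $path = parse_url($source['listenurl'] ?? '', PHP_URL_PATH);
        if ($path === $mount) {
            return $source['title'] ?? null;
        }
    }
    return null;
}

// Possible use (host and port are placeholders):
// $json = file_get_contents('http://example.org:8000/status-json.xsl');
// echo titleForMount((string) $json, '/tronic.ogg');
```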

Change php script with variables from working in http to working in shell

I use a script from here to generate my sitemaps.
I can call it from the browser with http://www.example.com/sitemap.php?update=pages and it works fine.
I need to call it as a shell script so that I can automate it with the Windows Task Scheduler. For that, the script has to be changed so that it still receives the ?update=pages variable, but I haven't managed to change it correctly.
Could anybody help me so that I can execute the script from the command line with
...\php C:\path\to\script\sitemap.php update=pages. It would also be fine for me to hardcode the variables into the script, since I won't change them anyway.
define("BASE_URL", "http://www.example.com/");
define ('BASE_URI', $_SERVER['DOCUMENT_ROOT'] . '/');
class Sitemap {
private $compress;
private $page = 'index';
private $index = 1;
private $count = 1;
private $urls = array();
public function __construct ($compress=true) {
ini_set('memory_limit', '75M'); // 50M required per tests
$this->compress = ($compress) ? '.gz' : '';
}
public function page ($name) {
$this->save();
$this->page = $name;
$this->index = 1;
}
public function url ($url, $lastmod='', $changefreq='', $priority='') {
$url = htmlspecialchars(BASE_URL . 'xx' . $url);
$lastmod = (!empty($lastmod)) ? date('Y-m-d', strtotime($lastmod)) : false;
$changefreq = (!empty($changefreq) && in_array(strtolower($changefreq), array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'))) ? strtolower($changefreq) : false;
$priority = (!empty($priority) && is_numeric($priority) && abs($priority) <= 1) ? round(abs($priority), 1) : false;
if (!$lastmod && !$changefreq && !$priority) {
$this->urls[] = $url;
} else {
$url = array('loc'=>$url);
if ($lastmod !== false) $url['lastmod'] = $lastmod;
if ($changefreq !== false) $url['changefreq'] = $changefreq;
if ($priority !== false) $url['priority'] = ($priority < 1) ? $priority : '1.0';
$this->urls[] = $url;
}
if ($this->count == 50000) {
$this->save();
} else {
$this->count++;
}
}
public function close() {
$this->save();
}
private function save () {
if (empty($this->urls)) return;
$file = "sitemaps/xx-sitemap-{$this->page}-{$this->index}.xml{$this->compress}";
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($this->urls as $url) {
$xml .= ' <url>' . "\n";
if (is_array($url)) {
foreach ($url as $key => $value) $xml .= " <{$key}>{$value}</{$key}>\n";
} else {
$xml .= " <loc>{$url}</loc>\n";
}
$xml .= ' </url>' . "\n";
}
$xml .= '</urlset>' . "\n";
$this->urls = array();
if (!empty($this->compress)) $xml = gzencode($xml, 9);
$fp = fopen(BASE_URI . $file, 'wb');
fwrite($fp, $xml);
fclose($fp);
$this->index++;
$this->count = 1;
$num = $this->index; // should have already been incremented
while (file_exists(BASE_URI . "xxb-sitemap-{$this->page}-{$num}.xml{$this->compress}")) {
unlink(BASE_URI . "xxc-sitemap-{$this->page}-{$num}.xml{$this->compress}");
$num++;
}
$this->index($file);
}
private function index ($file) {
$sitemaps = array();
$index = "sitemaps/xx-sitemap-index.xml{$this->compress}";
if (file_exists(BASE_URI . $index)) {
$xml = (!empty($this->compress)) ? gzfile(BASE_URI . $index) : file(BASE_URI . $index);
$tags = $this->xml_tag(implode('', $xml), array('sitemap'));
foreach ($tags as $xml) {
$loc = str_replace(BASE_URL, '', $this->xml_tag($xml, 'loc'));
$lastmod = $this->xml_tag($xml, 'lastmod');
$lastmod = ($lastmod) ? date('Y-m-d', strtotime($lastmod)) : date('Y-m-d');
if (file_exists(BASE_URI . $loc)) $sitemaps[$loc] = $lastmod;
}
}
$sitemaps[$file] = date('Y-m-d');
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($sitemaps as $loc => $lastmod) {
$xml .= ' <sitemap>' . "\n";
$xml .= ' <loc>' . BASE_URL . $loc . '</loc>' . "\n";
$xml .= ' <lastmod>' . $lastmod . '</lastmod>' . "\n";
$xml .= ' </sitemap>' . "\n";
}
$xml .= '</sitemapindex>' . "\n";
if (!empty($this->compress)) $xml = gzencode($xml, 9);
$fp = fopen(BASE_URI . $index, 'wb');
fwrite($fp, $xml);
fclose($fp);
}
private function xml_tag ($xml, $tag, &$end='') {
if (is_array($tag)) {
$tags = array();
while ($value = $this->xml_tag($xml, $tag[0], $end)) {
$tags[] = $value;
$xml = substr($xml, $end);
}
return $tags;
}
$pos = strpos($xml, "<{$tag}>");
if ($pos === false) return false;
$start = strpos($xml, '>', $pos) + 1;
$length = strpos($xml, "</{$tag}>", $start) - $start;
$end = strpos($xml, '>', $start + $length) + 1;
return ($end !== false) ? substr($xml, $start, $length) : false;
}
public function __destruct () {
$this->save();
}
}
// start part 2
$sitemap = new Sitemap;
if (get('pages')) {
$sitemap->page('pages');
$result = mysql_query("SELECT uri FROM app_uri");
while (list($url, $created) = mysql_fetch_row($result)) {
$sitemap->url($url, $created, 'monthly');
}
}
$sitemap->close();
unset ($sitemap);
function get ($name) {
return (isset($_GET['update']) && strpos($_GET['update'], $name) !== false) ? true : false;
}
?>
You could install wget (it's available for Windows as well) and then call the URL via localhost in the Task Scheduler script:
wget.exe "http://localhost/path/to/script.php?update=pages"
This way you wouldn't have to rewrite the PHP script.
Otherwise, if the script is meant for shell usage only, pass the variables via the command line:
php yourscript.php variable1 variable2 ...
In the PHP script you can then access those variables using the $argv variable:
$variable1 = $argv[1];
$variable2 = $argv[2];
Have a look at:
How to pass GET variables to php file with Shell?
which already answers the same question.
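To keep the update=pages form from the question working on the command line, the key=value arguments can be parsed into an array shaped like $_GET. This is a minimal sketch; the function name cliParams is an assumption.

```php
<?php
// Sketch: turn "key=value" command-line arguments into an array shaped like $_GET.
function cliParams(array $argv): array
{
    $params = [];
    foreach (array_slice($argv, 1) as $arg) {   // $argv[0] is the script name
        parse_str($arg, $pair);                 // "update=pages" -> ['update' => 'pages']
        $params = array_merge($params, $pair);
    }
    return $params;
}

// Possible use at the top of sitemap.php:
// if (PHP_SAPI === 'cli') {
//     $_GET = cliParams($argv);
// }
```

With that in place the existing get() function can stay as it is. Note that $_SERVER['DOCUMENT_ROOT'] is normally empty when PHP runs from the command line, so BASE_URI would probably need a fixed path (for example one built from __DIR__) as well.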

php sitemap for large websites

I want to create a sitemap for a site with more than 30,000,000 pages. The site is updated daily, with pages being removed and new ones added.
I found this php script which I would like to run with a cron job.
Sitemap php script
I have all URIs in the table "myuri" in the column "uri" entries are written e.g. "/this-is-a-page.html". What parameters do I need to add to the script to get it running on my table?
<?php
/*
* author: Kyle Gadd
* documentation: http://www.php-ease.com/classes/sitemap.html
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
class Sitemap {
private $compress;
private $page = 'index';
private $index = 1;
private $count = 1;
private $urls = array();
public function __construct ($compress=true) {
ini_set('memory_limit', '75M'); // 50M required per tests
$this->compress = ($compress) ? '.gz' : '';
}
public function page ($name) {
$this->save();
$this->page = $name;
$this->index = 1;
}
public function url ($url, $lastmod='', $changefreq='', $priority='') {
$url = htmlspecialchars(BASE_URL . $url);
$lastmod = (!empty($lastmod)) ? date('Y-m-d', strtotime($lastmod)) : false;
$changefreq = (!empty($changefreq) && in_array(strtolower($changefreq), array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'))) ? strtolower($changefreq) : false;
$priority = (!empty($priority) && is_numeric($priority) && abs($priority) <= 1) ? round(abs($priority), 1) : false;
if (!$lastmod && !$changefreq && !$priority) {
$this->urls[] = $url;
} else {
$url = array('loc'=>$url);
if ($lastmod !== false) $url['lastmod'] = $lastmod;
if ($changefreq !== false) $url['changefreq'] = $changefreq;
if ($priority !== false) $url['priority'] = ($priority < 1) ? $priority : '1.0';
$this->urls[] = $url;
}
if ($this->count == 50000) {
$this->save();
} else {
$this->count++;
}
}
public function close() {
$this->save();
$this->ping_search_engines();
}
private function save () {
if (empty($this->urls)) return;
$file = "sitemap-{$this->page}-{$this->index}.xml{$this->compress}";
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($this->urls as $url) {
$xml .= ' <url>' . "\n";
if (is_array($url)) {
foreach ($url as $key => $value) $xml .= " <{$key}>{$value}</{$key}>\n";
} else {
$xml .= " <loc>{$url}</loc>\n";
}
$xml .= ' </url>' . "\n";
}
$xml .= '</urlset>' . "\n";
$this->urls = array();
if (!empty($this->compress)) $xml = gzencode($xml, 9);
$fp = fopen(BASE_URI . $file, 'wb');
fwrite($fp, $xml);
fclose($fp);
$this->index++;
$this->count = 1;
$num = $this->index; // should have already been incremented
while (file_exists(BASE_URI . "sitemap-{$this->page}-{$num}.xml{$this->compress}")) {
unlink(BASE_URI . "sitemap-{$this->page}-{$num}.xml{$this->compress}");
$num++;
}
$this->index($file);
}
private function index ($file) {
$sitemaps = array();
$index = "sitemap-index.xml{$this->compress}";
if (file_exists(BASE_URI . $index)) {
$xml = (!empty($this->compress)) ? gzfile(BASE_URI . $index) : file(BASE_URI . $index);
$tags = $this->xml_tag(implode('', $xml), array('sitemap'));
foreach ($tags as $xml) {
$loc = str_replace(BASE_URL, '', $this->xml_tag($xml, 'loc'));
$lastmod = $this->xml_tag($xml, 'lastmod');
$lastmod = ($lastmod) ? date('Y-m-d', strtotime($lastmod)) : date('Y-m-d');
if (file_exists(BASE_URI . $loc)) $sitemaps[$loc] = $lastmod;
}
}
$sitemaps[$file] = date('Y-m-d');
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($sitemaps as $loc => $lastmod) {
$xml .= ' <sitemap>' . "\n";
$xml .= ' <loc>' . BASE_URL . $loc . '</loc>' . "\n";
$xml .= ' <lastmod>' . $lastmod . '</lastmod>' . "\n";
$xml .= ' </sitemap>' . "\n";
}
$xml .= '</sitemapindex>' . "\n";
if (!empty($this->compress)) $xml = gzencode($xml, 9);
$fp = fopen(BASE_URI . $index, 'wb');
fwrite($fp, $xml);
fclose($fp);
}
private function xml_tag ($xml, $tag, &$end='') {
if (is_array($tag)) {
$tags = array();
while ($value = $this->xml_tag($xml, $tag[0], $end)) {
$tags[] = $value;
$xml = substr($xml, $end);
}
return $tags;
}
$pos = strpos($xml, "<{$tag}>");
if ($pos === false) return false;
$start = strpos($xml, '>', $pos) + 1;
$length = strpos($xml, "</{$tag}>", $start) - $start;
$end = strpos($xml, '>', $start + $length) + 1;
return ($end !== false) ? substr($xml, $start, $length) : false;
}
public function ping_search_engines () {
$sitemap = BASE_URL . 'sitemap-index.xml' . $this->compress;
$engines = array();
$engines['www.google.com'] = '/webmasters/tools/ping?sitemap=' . urlencode($sitemap);
$engines['www.bing.com'] = '/webmaster/ping.aspx?siteMap=' . urlencode($sitemap);
$engines['submissions.ask.com'] = '/ping?sitemap=' . urlencode($sitemap);
foreach ($engines as $host => $path) {
if ($fp = fsockopen($host, 80)) {
$send = "HEAD $path HTTP/1.1\r\n";
$send .= "HOST: $host\r\n";
$send .= "CONNECTION: Close\r\n\r\n";
fwrite($fp, $send);
$http_response = fgets($fp, 128);
fclose($fp);
list($response, $code) = explode (' ', $http_response);
if ($code != 200) trigger_error ("{$host} ping was unsuccessful.<br />Code: {$code}<br />Response: {$response}");
}
}
}
public function __destruct () {
$this->save();
}
}
?>
There is already an example of usage on the page:
<?php
require_once ('php/classes/Sitemap.php');
$sitemap = new Sitemap;
if (get('pages')) {
$sitemap->page('pages');
$result = db_query ("SELECT url, created FROM pages"); // 20 pages
while (list($url, $created) = $result->fetch_row()) {
$sitemap->url($url, $created, 'yearly');
}
}
if (get('posts')) {
$sitemap->page('posts');
$result = db_query ("SELECT url, updated FROM posts"); // 70,000 posts
while (list($url, $updated) = $result->fetch_row()) {
$sitemap->url($url, $updated, 'monthly');
}
}
$sitemap->close();
unset ($sitemap);
function get ($name) {
return (isset($_GET['update']) && strpos($_GET['update'], $name) !== false) ? true : false;
}
?>
I would change this part:
if (get('pages')) {
$sitemap->page('pages');
$result = db_query ("SELECT uri FROM myuri");
// fetch_row() matches the usage example above (assumes db_query returns a mysqli result)
while (list($url) = $result->fetch_row()) {
$sitemap->url($url, '', 'yearly');
}
}
I'm not sure whether $updated is needed; it looks like the function just defaults it to an empty string anyway. But you could add a timestamp column to your table to pull the last-updated date as well, and feed it into the function where I put ''.
Also, remove this part:
if (get('posts')) {
$sitemap->page('posts');
$result = db_query ("SELECT url, updated FROM posts"); // 70,000 posts
while (list($url, $updated) = $result->fetch_row()) {
$sitemap->url($url, $updated, 'monthly');
}
}
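With 30 million rows, pulling the whole result set in one query may not fit comfortably in memory (the class sets a 75M limit). One possible approach is to read the table in batches with keyset pagination. The sketch below assumes the table has a numeric id primary key, which the question does not mention, and the names iterateUris and $fetchBatch are invented for illustration.

```php
<?php
// Sketch: walk a very large table in fixed-size batches using keyset pagination.
// $fetchBatch(int $afterId, int $limit) must return rows ordered by id, each
// with 'id' and 'uri' keys. The 'id' column is an assumption, not in the question.
function iterateUris(callable $fetchBatch, int $batchSize = 10000): Generator
{
    $lastId = 0;
    do {
        $rows = $fetchBatch($lastId, $batchSize);
        foreach ($rows as $row) {
            $lastId = (int) $row['id'];
            yield $row['uri'];
        }
    } while (count($rows) === $batchSize);
}

// Possible wiring with the Sitemap class (query shown for illustration only):
// SELECT id, uri FROM myuri WHERE id > ? ORDER BY id LIMIT ?
// foreach (iterateUris($fetchBatch) as $uri) {
//     $sitemap->url($uri, '', 'daily');
// }
```

The Sitemap class already starts a new file after 50,000 URLs, so the loop only has to keep feeding it URIs.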
