Split a string into multiple parts in PHP

I'm writing an IRC bot in PHP and trying to split the notice below into multiple parts.
:irc.server.com NOTICE PHPServ :*** CONNECT: Client connecting on port 6667 (class users): Guest!Guest#127.0.0.1 (127.0.0.1) [Guest]
So far I am using:
while(1) {
while($data = fgets($socket)) {
echo nl2br($data);
flush();
$ex = explode(' ', $data);
if($ex[0] == "PING"){
fputs($socket, "PONG ".$ex[1]."\n");
}
if($ex[1] == "NOTICE"){
if($ex[6] == "connecting"){
$userstring = $ex[12];
$usernick = strstr($userstring, '!', true);
$userip = strstr($userstring, '#');
}
}
}
}
?>
So $usernick is working OK, but $userip includes the # as well as the IP address. Why does it include the # when the nickname doesn't include the !?
Also, how can I get $userident, which is the part between the ! and the #?

Try splitting on the '#' and taking the last piece:
$userParts = explode('#', $userstring);
$userip = end($userParts);

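The reason it includes the '#' is that, by default, strstr (source: http://php.net/strstr):
Returns part of haystack string starting from and including the first
occurrence of needle to the end of haystack.
With the third argument set to true it instead returns the part before the needle, excluding the needle itself, which is why $usernick has no '!'.
To solve this you could use substr like this: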
$userstring = $ex[12];
$exPos = strpos($userstring, '!');
$atPos = strpos($userstring, '#');
$usernick = strstr($userstring, '!', true);
$userip = substr($userstring, $atPos + 1);
$userident = substr($userstring, ($exPos + 1), ($atPos - $exPos) - 1);
I left the first strstr because it's easier to read/understand than a substring call.
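As a further option, all three parts can be pulled out in one pass with a regular expression. This is only a sketch: the function name parseUserString and its $sep parameter are made up for illustration, and '#' is the default separator only because that is what the sample notice in the question shows (IRC hostmasks are normally written nick!ident@host, in which case pass '@').

```php
<?php
// Hypothetical helper: split "nick!ident#host" into its three parts.
// $sep is the character between ident and host ('#' in the sample notice).
function parseUserString(string $userstring, string $sep = '#'): ?array
{
    $s = preg_quote($sep, '/');
    // nick = everything before '!', ident = everything up to $sep, host = the rest
    if (!preg_match('/^([^!]+)!([^' . $s . ']+)' . $s . '(.+)$/', $userstring, $m)) {
        return null; // not in the expected format
    }
    return ['nick' => $m[1], 'ident' => $m[2], 'host' => $m[3]];
}

$parts = parseUserString('Guest!Guest#127.0.0.1');
// $parts['nick'], $parts['ident'] and $parts['host'] now hold the three pieces
```

Returning null for a string that does not fit the format lets the caller skip notices that are not connect messages instead of reading undefined offsets.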

I modified a script to work with PHP 7.1 by changing eregi to preg_match; it worked for a few minutes on WAMP and then suddenly stopped working

The script below creates a log file for all bot visits, sends me an email, and also verifies the IP at ip2location. It worked just fine on PHP 5.2 with the eregi function, so I changed the eregi line to preg_match. After I added forward slashes to each bot variable (I was getting a "preg_match(): Delimiter must not be alphanumeric or backslash" warning), it worked for a few minutes on my WAMP testing server, but now it won't work and won't log any bots in the visits.log file.
The script still gives me the three messages below, but since they were only notices and warnings and it had begun working, I didn't pay much attention to them:
Notice: Undefined offset: 5 in C:\wamp\www\visits.php on line 28
Warning: preg_match(): Empty regular expression in C:\wamp\www\visits.php on line 28
Notice: Undefined index: js in C:\wamp\www\visits.php on line 62
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
$to = "email#here.com";
$log = "./visits.log";
$dateTime = date("r");
$agents[] = "/googlebot/";
$spiders[] = "/Google/";
$spiders[] = "/Googlebot/";
$agents[] = "/slurp/";
$spiders[] = "/Slurp (Inktomi's robot, HotBot)/";
$agents[] = "/msnbot/";
$spiders[] = "/MSN Robot (MSN Search, search\.msn\.com)/";
$agents[] = "/yahoo\! slurp/";
$spiders[] = "/Yahoo! Slurp/";
$agents[] = "/bingbot/";
$spiders[] = "/Bing\.com/";
$ip= $_SERVER['REMOTE_ADDR'];
$found = false;
for ($spi = 0; $spi < count($spiders); $spi++)
if ($found = preg_match($agents[$spi], $_SERVER['HTTP_USER_AGENT']))
break;
if ($found) {
$url = "http://" . $_SERVER['SERVER_NAME']. $_SERVER['PHP_SELF'];
if ($_SERVER['QUERY_STRING'] != "") {
$url .= '?' . $_SERVER['QUERY_STRING'];
}
$line = $dateTime . " " . $spiders[$spi] . " " . $ip." # " . $url;
$ip2location = "https://www.ip2location.com/".$_SERVER['REMOTE_ADDR'];
if ($log != "") {
if (@file_exists($log)) {
$mode = "a";
} else {
$mode = "w";
}
if ($f = @fopen($log, $mode)) {
@fwrite($f, $line . "\n");
@fclose($f);
}
}
if ($to != "") {
$to = "email#here.com";
$subject = $spiders[$spi]. " crawled your site";
$body = "$line". "\xA\xA" ."Whois verification available at: $ip2location";
mail($to, $subject, $body);
}
}
if ($_REQUEST["js"]) {
header("Content-Type: image/gif\r\n");
header("Cache-Control: no-cache, must-revalidate\r\n");
header("Pragma: no-cache\r\n");
@readfile("visits.gif");
}
?>
a) You have 6 elements in $spiders and only 5 in $agents, which causes the notice about offset 5 and the warning about an empty regular expression. The Google entry is doubled:
$spiders[] = "/Google/";
$spiders[] = "/Googlebot/";
Remove one of the two entries.
b) if ($_REQUEST["js"]) { should be replaced with:
if (isset($_REQUEST["js"])) { and, depending on what value you expect there, the value should be checked after the isset. Request values arrive as strings, so for instance, if you expect "1":
if (isset($_REQUEST["js"]) && $_REQUEST['js'] === '1') {
Brackets (parentheses) have a special meaning in preg_match's regex syntax. Just escape them and it should work fine. As for the first warning, loop to count($agents) instead of count($spiders), since $agents is the array being indexed and it has one element fewer, or just use foreach.
For the second warning, use if (isset($_REQUEST["js"])).
Good luck
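The foreach suggestion can be sketched as follows. The array contents, the /i flag and the helper name detectBot are illustrative assumptions, not taken from the original script; the point is only that keeping each pattern next to its name in one array makes a count mismatch impossible.

```php
<?php
// Map each user-agent pattern to its display name so the two can never
// get out of step (names and patterns here are illustrative).
$bots = [
    '/googlebot/i' => 'Googlebot',
    '/slurp/i'     => 'Yahoo! Slurp',
    '/msnbot/i'    => 'MSN Robot',
    '/bingbot/i'   => 'Bing.com',
];

// Hypothetical helper: return the bot name for a user agent, or null.
function detectBot(string $userAgent, array $bots): ?string
{
    foreach ($bots as $pattern => $name) {
        if (preg_match($pattern, $userAgent) === 1) {
            return $name;
        }
    }
    return null;
}

// A missing header falls back to an empty string instead of raising a notice.
$spider = detectBot($_SERVER['HTTP_USER_AGENT'] ?? '', $bots);
```

Because the display names are no longer passed to preg_match, they need neither delimiters nor escaping.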

PHP Strip domain name from url

I know there is a LOT of info on the web regarding this subject, but I can't seem to figure it out the way I want.
I'm trying to build a function which strips the domain name from a url:
http://blabla.com blabla
www.blabla.net blabla
http://www.blabla.eu blabla
Only the plain name of the domain is needed.
With parse_url I get the domain filtered but that is not enough.
I have 3 functions that strip the domain, but I still get some wrong outputs:
function prepare_array($domains)
{
$prep_domains = explode("\n", str_replace("\r", "", $domains));
$domain_array = array_map('trim', $prep_domains);
return $domain_array;
}
function test($domain)
{
$domain = explode(".", $domain);
return $domain[1];
}
function strip($url)
{
$url = trim($url);
$url = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url);
$url = preg_replace("/\/.*$/is" , "" ,$url);
return $url;
}
Every possible domain, URL and extension is allowed. After the function is finished, it must return an array of only the domain names themselves.
UPDATE:
Thanks for all the suggestions!
I figured it out with the help from you all.
function test($url)
{
// Check if the url begins with http:// www. or both
// If so, replace it
if (preg_match("/^(http:\/\/|www.)/i", $url))
{
$domain = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url);
}
else
{
$domain = $url;
}
// Now all thats left is the domain and the extension
// Only return the needed first part without the extension
$domain = explode(".", $domain);
return $domain[0];
}
How about
$wsArray = explode(".",$domain); //Break it up into an array.
$extension = array_pop($wsArray); //Get the Extension (last entry)
$domain = array_pop($wsArray); // Get the domain
http://php.net/manual/en/function.array-pop.php
Ah, your problem lies in the fact that TLDs can be either in one or two parts e.g .com vs .co.uk.
What I would do is maintain a list of TLDs. With the result after parse_url, go over the list and look for a match. Strip out the TLD, explode on '.' and the last part will be in the format you want it.
This does not seem as efficient as it could be but, with TLDs being added all the time, I cannot see any other deterministic way.
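A minimal sketch of that list-based idea is shown below. The suffix list is a deliberately tiny, hypothetical example, and the function name domainName is made up; a real list would be much longer and would need to be kept up to date.

```php
<?php
// Hypothetical, deliberately tiny suffix list.
$suffixes = ['co.uk', 'com', 'net', 'eu', 'uk'];

function domainName(string $url, array $suffixes): ?string
{
    // parse_url() only finds a host when a scheme is present, so add one.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $host = strtolower($host);

    // Try longer suffixes first so "co.uk" wins over "uk".
    usort($suffixes, function ($a, $b) { return strlen($b) - strlen($a); });
    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (strlen($host) > strlen($tail) && substr($host, -strlen($tail)) === $tail) {
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            return end($labels); // last label before the suffix
        }
    }
    return null; // suffix not in the list
}
```

With this list, 'www.blabla.net' and 'http://www.bbc.co.uk/news' would yield 'blabla' and 'bbc', while a host whose suffix is missing from the list yields null rather than a guess.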
OK... this is messy, and you should spend some time optimizing and caching previously derived domains. You should also have a friendly name server, and the last catch is that the domain must have an "A" record in its DNS.
This attempts to assemble the domain name in reverse order until it can resolve to a DNS "A" record.
At any rate, this was bugging me, so I hope this answer helps:
<?php
$wsHostNames = array(
"test.com",
"http://www.bbc.com/news/uk-34276525",
"google.uk.co"
);
foreach ($wsHostNames as $hostName) {
echo "checking $hostName" . PHP_EOL;
$wsWork = $hostName;
//attempt to strip out full paths to just host
$wsWork = parse_url($hostName, PHP_URL_HOST);
if ($wsWork != "") {
echo "Was able to cleanup $wsWork" . PHP_EOL;
$hostName = $wsWork;
} else {
//Probably had no path info or malformed URL
//Try to check it anyway
echo "No path to strip from $hostName" . PHP_EOL;
}
$wsArray = explode(".", $hostName); //Break it up into an array.
$wsHostName = "";
//Build domain one segment a time probably
//Code should be modified not to check for the first segment (.com)
while (!empty($wsArray)) {
$newSegment = array_pop($wsArray);
$wsHostName = $newSegment . $wsHostName;
echo "Checking $wsHostName" . PHP_EOL;
if (checkdnsrr($wsHostName, "A")) {
echo "host found $wsHostName" . PHP_EOL;
echo "Domain is $newSegment" . PHP_EOL;
continue 2; // host resolved: move on to the next host name
} else {
//This segment didn't resolve - keep building
echo "No Valid A Record for $wsHostName" . PHP_EOL;
$wsHostName = "." . $wsHostName;
}
}
//if you get to here in the loop it could not resolve the host name
}
?>
Try preg_replace, something like:
$domain = preg_replace($regex, '$1', $url);
where $regex is a regex that captures the domain name in its first group.

How to search for strings of webpage without saving content?

I know of one method where you can do this:
$url = "http://www.google.com/search?q=test";
$str = file_get_contents($url);
preg_match("/title\/tt\d{7}?/", $str, $matches);
print $matches[0];
But this reads the whole file and then scans for the match. Is there any way I can reduce the time taken by this matching process?
If you know where inside the webpage you need to look (i.e. only the first 3000 characters or so), you can use the maxlen parameter of file_get_contents to limit the reading:
file_get_contents($url, false, null, 0, 3000);
UPDATE
If you don't know where to look in the webpage and you want to minimize http request length, I worked up a nice solution for you :))
$url = "www.google.com";
$step = 3000;
$found = false;
$addr = gethostbyname($url);
$client = stream_socket_client("tcp://$addr:80", $errno, $errorMessage);
if ($client === false) {
throw new UnexpectedValueException("Failed to connect: $errorMessage");
}
fwrite($client, "GET /search?q=test HTTP/1.0\r\nHost: $url\r\nAccept: */*\r\n\r\n");
$str = "";
while(!feof($client)){
$str .= stream_get_contents($client, $step, -1);
if(preg_match("/tt\d{7}?/", $str, $matches)){
$found = true;
break;
}
}
fclose($client);
if($found){
echo $matches[0];
} else {
echo "not found";
}
EXPLANATION:
Set the $step variable to the number of bytes to read in each iteration, and change the "search?q=test" to your desired query (IMDB titles, judging by your regex? :) ). It will do the job wonderfully.
You can also do echo $str after the while loop to see exactly how much it has read until it found the requested string.
I believe this was what you were looking for.
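The read-until-match loop can also be pulled into a small function that works on any stream resource. This is a sketch only: findInStream is a hypothetical name, and the demonstration uses an in-memory stream with made-up content so that it runs without a network connection.

```php
<?php
// Hypothetical helper: read $stream in chunks of $step bytes and stop as
// soon as $pattern matches what has been read so far.
function findInStream($stream, string $pattern, int $step = 3000): ?string
{
    $buffer = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $step);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        // Keep the whole buffer so a match split across two chunks is still found.
        $buffer .= $chunk;
        if (preg_match($pattern, $buffer, $matches) === 1) {
            return $matches[0];
        }
    }
    return null;
}

// Demonstration on an in-memory stream instead of a network socket.
$stream = fopen('php://memory', 'w+');
fwrite($stream, str_repeat('x', 5000) . 'title/tt0123456/' . str_repeat('y', 5000));
rewind($stream);
$hit = findInStream($stream, '/tt\d{7}/', 1000);
fclose($stream);
```

Re-scanning the growing buffer on every chunk is simple but repeats work; for a pattern with a variable-length match, note also that a match could be reported before the rest of it has arrived.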

How to use preg_match to search from a string, including a symbol?

I have a pattern, status: available, but the colon symbol doesn't work somehow. How do I modify this pattern?
I have messed something up in the code; I will notify you when I find it. Thank you.
OK, I broke down the code. I'm writing a script to check domain availability.
<?php
$server = 'whois.cira.ca';
$pattern = 'status: available';
$domain = 'nonexistingdomain';
$extension = '.ca';
$buffer = NULL;
$sock = fsockopen($server, 43) or die('Error Connecting To Server: ' . $server);
fputs($sock, $domain.$extension . "\r\n");
while( !feof($sock) )
{
$buffer .= fgets($sock,128);
}
// If I give $buffer a value locally (like below) it works, but if $buffer takes its value from fgets() it won't
$buffer = "Domain name: nonexistingdomain.ca Domain status: available % WHOIS look-up made at 2013-01-16 12:35:45 (GMT) % % Use of CIRA's WHOIS service is governed by the Terms of Use in its Legal % Notice, available at http://www.cira.ca/legal-notice/?lang=en % % (c) 2013 Canadian Internet Registration Authority, (http://www.cira.ca/) NO";
fclose($sock);
if(preg_match("/$pattern/", $buffer))
echo "YEP";
else
echo "NO";
?>
If I change $pattern to "available" it works!
It seems like you are missing a delimiter.
Try this:
<?php
$pattern = '/status: available/';
$string = "String1: status: available";
$string1 = "String2: status: unavailable";
if (preg_match($pattern,$string))
echo 'String1 matches<br>';
else
echo 'String1 does not match<br>';
if (preg_match($pattern,$string1))
echo 'String2 matches<br>';
else
echo 'String 2 does not match<br>';
?>
Gives the following output:
String1 matches
String 2 does not match
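If the pattern still fails against the live reply, one possible cause (an assumption, since the raw server output is not shown in the question) is that the reply has several spaces or a tab between "status:" and "available". A sketch of a whitespace-tolerant pattern, with a made-up sample reply and a hypothetical helper name looseTextPattern:

```php
<?php
// Build a pattern from plain text: quote it, then let every space in the
// text match any run of whitespace in the subject.
function looseTextPattern(string $text): string
{
    $quoted = preg_quote($text, '/');
    return '/' . str_replace(' ', '\s+', $quoted) . '/i';
}

$pattern = looseTextPattern('status: available');

// Made-up sample reply with several spaces after the colon.
$buffer = "Domain name:     nonexistingdomain.ca\nDomain status:         available\n";

echo preg_match($pattern, $buffer) === 1 ? 'YEP' : 'NO';
```

Dumping the real reply with var_dump($buffer) would show whether extra whitespace is actually what is going on.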

Check if a user entered an email address that has a domain similar to the domain name they enter above

In my signup form, I ask users to enter an email with the same domain name as they enter in the url field above.
Right now, I collect data this way:
URL : http://www.domain.com The domain.com part is what the user enters. The http://www is hard coded.
Email : info# domain.com The bold part is entered by the user. The # is hard coded.
The domain.com part in the url and domain.com part in the email should match. Right now, I can match the two fields since they are separate.
But I want to give up the above approach and make the user enter the entire domain name and email. When that's the case, what would be a good way to check whether a user entered an email with the same domain he entered in the url field above?
I'm doing all this using php.
<?php
//extract domain from email
$email_domain_temp = explode("#", $_POST['email']);
$email_domain = $email_domain_temp[1];
//extract domain from url
$url_domain_temp = parse_url($_POST['url']);
$url_domain = strip_out_subdomain($url_domain_temp['host']);
//compare
if ($email_domain == $url_domain){
//match
}
function strip_out_subdomain($domain){
//do nothing if only 1 dot in $domain
if (substr_count($domain, ".") == 1){
return $domain;
}
$only_my_domain = preg_replace("/^(.*?)\.(.*)$/","$2",$domain);
return $only_my_domain;
}
So what this does is:
First, split the email string into 2 parts in an array. The second part is the domain.
Second, use PHP's built-in function to parse the URL, then extract the "host" while removing the (optional) subdomain.
Then compare.
You can do this with explode().
Suppose $url = 'bla#gmail.com':
$pieces = explode("#", $url);
$new = $pieces[1]; //which will be gmail.com
Now explode again:
$newpc= explode(".", $new );
$new1 = $newpc[0]; //which will be gmail
This is my version (tested, works):
<?php
$domain = 'www2.example.com'; // Set domain here
$email = 'info#example.com'; // Set email here
if(!preg_match('~^https?://.*$~i', $domain)) { // Does the URL start with http?
$domain = "http://$domain"; // No, prepend it with http://
}
if(filter_var($domain, FILTER_VALIDATE_URL)) { // Validate URL
$host = parse_url($domain, PHP_URL_HOST); // Parse the host, if it is an URL
if(substr_count($host, '.') > 1) { // Is there a subdomain?
$host = substr($host, -strrpos(strrev($host), '.')); // Get the host
}
if(strpos(strrev($email), strrev($host)) === 0) { // Does it match the end of the email?
echo 'Valid!'; // Valid
} else {
echo 'Does not match.'; // Invalid
}
} else {
echo 'Invalid domain!'; // Domain is invalid
}
?>
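You could do:
$parsedUrl = parse_url($yourEnteredUrl);
$domainHost = str_replace("www.", "", $parsedUrl["host"]);
// array_pop() takes its argument by reference, so store the exploded parts in a variable first
$emailParts = explode('#', $yourEnteredEmail);
$emailDomain = array_pop($emailParts);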
if( $emailDomain == $domainHost ) {
//valid data
}
$email = 'myemail#example.com';
$site = 'http://example.com';
$emailDomain = ltrim( strstr($email, '#'), '#' );
// or automate it using array_map(). Syntax is correct only for >= PHP5.4
$cases = ['http://'.$emailDomain, 'https://'.$emailDomain, 'http://www.'.$emailDomain, 'https://www.'.$emailDomain];
$bSameDomain = in_array($site, $cases);
var_dump($bSameDomain);
Use regular expressions with positive lookbehinds (i.e. only return the expression I'd like to match if it is preceded by a certain pattern, but don't include the lookbehind itself in the match), like so:
<?php
$url = preg_match("/(?<=http:\/\/www\.).*/",$_POST['url'],$url_match);
$email = preg_match("/(?<=#).*/",$_POST['email'],$email_match);
if ($url_match[0]==$email_match[0]) {
// Success Code
}
else {
// Failure Code
}
?>
Of course this is a bit oversimplified, as you also need to account for https, www2 and the like, but these require only minor changes to the regex, using the question mark as the "optional" operator.
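A sketch of a comparison that tolerates an https scheme and a leading "www." without a lookbehind is given below. The function name sameDomain and the $sep parameter are hypothetical; $sep is the character separating the local part of the address from its domain and defaults to '@' (pass '#' for addresses written as in the samples above). Other subdomains such as www2 are not stripped here.

```php
<?php
// Hypothetical helper: true when the e-mail's domain equals the URL's host,
// ignoring case and a leading "www.".
function sameDomain(string $url, string $email, string $sep = '@'): bool
{
    if (!preg_match('~^https?://~i', $url)) {
        $url = 'http://' . $url; // parse_url() needs a scheme to find the host
    }
    $host = parse_url($url, PHP_URL_HOST);
    $pos  = strrpos($email, $sep);
    if (!is_string($host) || $pos === false) {
        return false;
    }
    $host        = preg_replace('/^www\./i', '', strtolower($host));
    $emailDomain = strtolower((string) substr($email, $pos + 1));
    return $emailDomain !== '' && $emailDomain === $host;
}
```

For example, sameDomain('https://www.example.com/path', 'info@example.com') would be expected to return true, and an address at a different domain false.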
