I am writing code for a Bayesian filter. For a given word, I want to check whether it is in the stop word list, which I populate from a file on my PC.
Because I have to do this for many words, I don't want to read the stop word file from disk again and again.
I want to do something like this:
function isStopWord( $word ){
if(!isset($stopWordDict))
{
$stopWords = array();
$handle = fopen("StopWords.txt", "r");
if( $handle )
{
while( ( $buffer = fgets( $handle ) ) != false )
{
$stopWords[] = trim( $buffer );
}
}
echo "StopWord opened";
static $stopWordDict = array();
foreach( $stopWords as $stopWord )
$stopWordDict[$stopWord] = 1;
}
if( array_key_exists( $word, $stopWordDict ) )
return true;
else
return false;
}
I thought using a static variable would solve the issue, but it doesn't. Kindly help.
Put the static declaration at the beginning of the function:
function isStopWord( $word ){
static $stopWordDict = array();
if(!$stopWordDict)
{
$stopWords = file("StopWords.txt");
echo "StopWord opened";
foreach( $stopWords as $stopWord ) {
$stopWordDict[trim($stopWord)] = 1;
}
}
if( array_key_exists( $word, $stopWordDict ) )
return true;
else
return false;
}
This works because the static array keeps its contents between calls and an empty array is considered falsy, so the file is read only on the first call.
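To make the caching behaviour visible, here is a minimal sketch. The function name isStopWordDemo, the inline word list (standing in for StopWords.txt), and the $loadCount parameter are all made up for this demonstration; $loadCount exists only so the number of loads can be observed.

```php
<?php
// Hypothetical demo: $loadCount is passed by reference only to count how often the list is built.
function isStopWordDemo(string $word, int &$loadCount): bool
{
    static $stopWordDict = array();   // initialised once, kept between calls
    if (!$stopWordDict) {             // an empty array is falsy
        $loadCount++;
        // Stand-in for file("StopWords.txt"): a small inline list.
        foreach (array("the\n", "a\n", "an\n") as $stopWord) {
            $stopWordDict[trim($stopWord)] = 1;
        }
    }
    return isset($stopWordDict[$word]);
}

$loadCount = 0;
var_dump(isStopWordDemo("the", $loadCount));    // bool(true)
var_dump(isStopWordDemo("filter", $loadCount)); // bool(false)
echo $loadCount, "\n";                          // 1
```

One caveat of the falsy check: if the word list turns out to be empty, the array stays empty and the load is attempted again on every call.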
I've looked through the existing questions but couldn't find an answer to my problem. I want to add a counter to a recursive PHP function that looks for and deletes empty folders. The code below echoes "–" for each folder it checks and "|" for each folder it deletes. That works, but when there is a lot to purge, the output grows into gibberish on the screen. Instead, I'd like to see the number of folders checked versus deleted. Here's the code so far, compiled with help from StackOverflow. Any help, please?
function RemoveEmptySubFolders($path) {
echo "–";
$empty = true;
foreach ( glob ( $path . DIRECTORY_SEPARATOR . "*" ) as $file ) {
if (is_dir ( $file )) {
if (! RemoveEmptySubFolders ( $file ))
$empty = false;
} else {
$empty = false;
}
}
if ($empty) {
if (is_dir ( $path )) {
// echo "Removing $path...<br>";
rmdir ( $path );
echo "|";
}
}
return $empty;
}
Just pass the counters by reference:
function recursive($path, &$directories, &$removed) {
echo "-";
$directories ++;
$empty = true;
foreach ( glob ( $path . DIRECTORY_SEPARATOR . "*" ) as $file ) {
if (is_dir ( $file )) {
if (! recursive ( $file, $directories, $removed ))
$empty = false;
} else {
$empty = false;
}
}
if ($empty) {
if (is_dir ( $path )) {
rmdir ( $path ); // actually delete the empty folder, as in the question's code
$removed++;
echo "|";
}
}
return $empty;
}
$path = 'c:\exampledir'; // single quotes: in double quotes, "\e" would be read as an escape character
$directories = 0;
$removed = 0;
recursive($path, $directories, $removed);
echo("<br>$directories, $removed");
You could also use global variables, but that's very ugly, and every time you use a global variable other than the standard ones, a kitty dies.
Object-Oriented variant:
Put everything in a class and add the instance attributes $checked and $deleted, then increment them through a small method of your own or (less cleanly) directly with += 1 or $var++.
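A minimal sketch of that object-oriented variant might look like this. The class name EmptyFolderCleaner and the method name clean are invented for illustration; only $checked and $deleted come from the description above, and the demo builds a throwaway tree in the system temp directory.

```php
<?php
class EmptyFolderCleaner
{
    public $checked = 0;   // folders visited
    public $deleted = 0;   // folders removed

    // Returns true when $path ended up empty and was removed.
    public function clean(string $path): bool
    {
        $this->checked++;
        $empty = true;
        // glob() can return false on error, so fall back to an empty list.
        foreach (glob($path . DIRECTORY_SEPARATOR . "*") ?: array() as $file) {
            if (!is_dir($file) || !$this->clean($file)) {
                $empty = false;   // a file, or a subfolder that could not be removed
            }
        }
        if ($empty && is_dir($path) && rmdir($path)) {
            $this->deleted++;
            return true;
        }
        return false;
    }
}

// Demo tree: a/b is empty (so a becomes empty too), c contains a file.
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'cleaner_demo_' . uniqid();
mkdir($root . '/a/b', 0777, true);
mkdir($root . '/c', 0777, true);
file_put_contents($root . '/c/keep.txt', 'x');

$cleaner = new EmptyFolderCleaner();
$cleaner->clean($root);
echo "{$cleaner->checked} checked, {$cleaner->deleted} deleted\n";   // 4 checked, 2 deleted
```

Because the counters live on the object, the recursive calls need no extra parameters, and a fresh instance starts again from zero.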
I've been trying to build this recursive function for the better part of a day now, but I just can't seem to get it to work the way I want.
First, I have a property that holds some data the function has to access:
$this->data
And then I have this string, which is meant to be turned into a relative path:
$path = 'path.to.%id%-%folder%.containing.%info%';
The parts of the string that look like %value% are replaced with dynamic values found in the $this->data property (like so: $this->data['id']; or $this->data['folder'];).
To make things really interesting, a value in the property can itself reference other values, like so: $this->data['folder'] = 'foldername.%subfolder%';. It can also contain two %values% separated by a -, and that separator has to be left alone.
So, to the problem: I've been trying to write a recursive function that loads the dynamic values from the data property, then does so again if the new value contains another %value%, and so on until no %value% is left.
So far, this is what I've been able to come up with:
public function recursiveFolder( $folder, $pathArr = null )
{
$newPathArr = explode( '.', $folder );
if ( count ( $newPathArr ) !== 1 )
{
foreach( $newPathArr as $id => $folder )
{
$value = $this->recursiveFolder( $folder, $newPathArr );
$resultArr = explode( '.', $value );
if ( count ( $resultArr ) !== 1 )
{
foreach ( $resultArr as $nid => $result )
{
$nvalue = $this->recursiveFolder( $result, $newPathArr );
$resultArr[$nid] = $nvalue;
}
}
$resultArr = implode( '.',$resultArr );
$newPathArr[$id] = $resultArr;
}
}
else
{
$pattern = '/%(.*?)%/si';
preg_match_all( $pattern, $folder, $matches );
if ( empty( $matches[0] ) )
{
return $folder;
}
foreach ( $matches[1] as $mid => $match )
{
if ( isset( $this->data[$match] ) && $this->data[$match] != '' )
{
$folder = str_replace( $matches[0][$mid], $this->data[$match], $folder );
return $folder;
}
}
}
return $newPathArr;
}
Unfortunately, it is not really a recursive function at all: it grinds to a halt when there are multiple layers of %values%, and it only barely works with two layers. (I coded it just far enough to work at a bare minimum at this point.)
Here's how it should work:
It should turn:
'files.%folder%.blog-%type%.and.%time%'
into:
'files.foldername.blog-post.and.2013.feb-12th.09'
based on this:
$data['folder'] = 'foldername';
$data['type'] = 'post';
$data['time'] = '%year%.%month%-%day%';
$data['year'] = 2013;
$data['month'] = 'feb';
$data['day'] = '12th.%hour%';
$data['hour'] = '09';
Hope you can help!
Jay
I don't see the need for this to be solved recursively:
<?php
function putData($str, $data)
{
// Repeat the replacing process until no more matches are found:
while (preg_match("/%(.*?)%/si", $str, $matches))
{
// Use $matches to make your replaces
}
return $str;
}
?>
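One possible way to fill in that loop, as a sketch rather than a definitive implementation: repeat a preg_replace_callback pass until the string stops changing. The name expandPlaceholders and the $maxPasses parameter are made up here; the pass limit is an added safeguard so that self-referencing data (or an unknown %value%) cannot loop forever.

```php
<?php
function expandPlaceholders(string $str, array $data, int $maxPasses = 20): string
{
    for ($pass = 0; $pass < $maxPasses; $pass++) {
        $expanded = preg_replace_callback(
            '/%([^%]+)%/',
            function (array $m) use ($data) {
                // Unknown or empty placeholders are left untouched.
                return (isset($data[$m[1]]) && $data[$m[1]] !== '')
                    ? (string) $data[$m[1]]
                    : $m[0];
            },
            $str
        );
        if ($expanded === $str) {
            break;   // nothing was replaced in this pass, so we are done
        }
        $str = $expanded;
    }
    return $str;
}

$data = array(
    'folder' => 'foldername',
    'type'   => 'post',
    'time'   => '%year%.%month%-%day%',
    'year'   => 2013,
    'month'  => 'feb',
    'day'    => '12th.%hour%',
    'hour'   => '09',
);
echo expandPlaceholders('files.%folder%.blog-%type%.and.%time%', $data), "\n";
// files.foldername.blog-post.and.2013.feb-12th.09
```

Each pass replaces every placeholder currently in the string, so nested values such as %time% are resolved one layer per pass.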
I'm trying to get a list of all occurrences of a file being included in a php script.
I'm reading in the entire file, which contains this:
<?php
echo 'Hello there';
include 'some_functions.php';
echo 'Trying to find some includes.';
include 'include_me.php';
echo 'Testtest.';
?>
Then, I run this code on that file:
if (preg_match_all ("/(include.*?;){1}/is", $this->file_contents, $matches))
{
print_r($matches);
}
When I run this match, I get the expected results... which are the two include sections, but I also get repeats of the exact same thing, or random chunks of the include statement. Here is an example of the output:
Array (
[0] => Array ( [0] => include 'some_functions.php'; [1] => include 'include_me.php'; )
[1] => Array ( [0] => include 'some_functions.php'; [1] => include 'include_me.php'; ) )
As you can see, it's nesting arrays with the same result multiple times. I need 1 item in the array for each include statement, no repeats, no nested arrays.
I'm having some trouble with these regular expressions, so some guidance would be nice. Thank you for your time.
What about this one:
<?php
preg_match_all( "/include(_once)?\s*\(?\s*(\"|')(.*?)\.php(\"|')\s*\)?\s*;?/i", $this->file_contents, $matches );
// for file names
print_r( $matches[3] );
// for full lines
print_r( $matches[0] );
?>
If you want a better and cleaner way, the only way is PHP's token_get_all:
<?php
$tokens = token_get_all( $this->file_contents );
$files = array();
$index = 0;
$found = false;
foreach( $tokens as $token ) {
// in php 5.2+ Line numbers are returned in element 2
$token = ( is_string( $token ) ) ? array( -1, $token, 0 ) : $token;
switch( $token[0] ) {
case T_INCLUDE:
case T_INCLUDE_ONCE:
case T_REQUIRE:
case T_REQUIRE_ONCE:
$found = true;
if ( isset( $token[2] ) ) {
$index = $token[2];
}
$files[$index] = null;
break;
case T_COMMENT:
case T_DOC_COMMENT:
case T_WHITESPACE:
break;
default:
if ( $found && $token[1] === ";" ) {
$found = false;
if ( !isset( $token[2] ) ) {
$index++;
}
}
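if ( $found ) {
// skip parentheses; inside a switch a bare "continue" acts like "break"
// and triggers a warning on PHP 7.3+, so target the foreach explicitly
if ( in_array( $token[1], array( "(", ")" ) ) ) {
continue 2;
}
$files[$index] .= $token[1];
}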
break;
}
}
// if your php version is above 5.2
// $files index will be line numbers
print_r( $files );
?>
Use get_included_files(), or the built-in tokenizer if the script is not included
"I'm searching through a string of another file's contents and not the current file"
Then your best bet is the tokenizer. Try this:
$scriptPath = '/full/path/to/your/script.php';
$tokens = token_get_all(file_get_contents($scriptPath));
$matches = array();
$incMode = null;
foreach($tokens as $token){
// ";" should end include stm.
if($incMode && ($token === ';')){
$matches[] = $incMode;
$incMode = array();
}
// keep track of the code if inside include statement
if($incMode){
$incMode[1] .= is_array($token) ? $token[1] : $token;
continue;
}
if(!is_array($token))
continue;
// start of include stm.
if(in_array($token[0], array(T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE)))
$incMode = array(token_name($token[0]), '');
}
print_r($matches); // array(token name, code)
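Please read up on how preg_match_all works.
The first item in the array holds every piece of text matched by the whole regular expression.
The following items hold the text captured by each parenthesized group, in order.
Your pattern wraps everything in a single group, so $matches[0] and $matches[1] contain the same strings. You should use $matches[1].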
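A small sketch of that array layout, using the sample file from the question as an inline string. The pattern here is a simplified assumption (single-quoted file names only), not a complete include parser.

```php
<?php
$source = <<<'PHP'
<?php
echo 'Hello there';
include 'some_functions.php';
echo 'Trying to find some includes.';
include 'include_me.php';
echo 'Testtest.';
PHP;

// One capture group: the file name between the single quotes.
preg_match_all('/\binclude\s+\'([^\']+)\'\s*;/', $source, $matches);

print_r($matches[0]);   // full statements: include 'some_functions.php'; and include 'include_me.php';
print_r($matches[1]);   // group 1 only:    some_functions.php and include_me.php
```

With the default flag (PREG_PATTERN_ORDER), $matches always has one sub-array for the whole match plus one per group, which is why the original output looked nested and duplicated.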
I'm a little lost with this.
How can I retrieve the ISO country code of the visitors to a PHP page?
Thanks in advance
You can either do this by Geolocation of the IP or by inspecting the right headers.
Usually you want the latter, since it tells you which languages the browser/system uses. You will only want to use geolocation when you want to know the physical location.
The header is stored in $_SERVER['HTTP_ACCEPT_LANGUAGE']. It contains comma-separated entries, e.g.: en-GB,en;q=0.8,en-US;q=0.6,nl;q=0.4 (my own)
The Accept-Language header separates its languages with commas and each language's properties with a semicolon. The q-value ranges from 0 to 1, with 1 being the highest/most preferred; an entry without a q-value counts as 1. Here is some naive code to parse it:
$langs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
$preferred = "";
$prefvalue = 0;
foreach($langs as $lang){
$info = explode(';', trim($lang));
// the property looks like "q=0.8"; strip the "q=" prefix to get the number
$val = isset($info[1]) ? (float) substr(trim($info[1]), 2) : 1.0;
if($prefvalue < $val){
$preferred = $info[0];
$prefvalue = $val;
}
}
It is much simpler if you only want to test whether a specific language is accepted, e.g. Spanish (es):
if(strpos($_SERVER['HTTP_ACCEPT_LANGUAGE'], "es") !== false){
// Spanish is supported
}
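The strpos check matches "es" anywhere in the header, so a stricter variant could compare whole language tags instead. The following is only a sketch; parseAcceptLanguage is a made-up helper name, and the header value is the example from above rather than a live request.

```php
<?php
// Returns an array of lower-cased language tag => q-value, highest q first.
function parseAcceptLanguage(string $header): array
{
    $weights = array();
    foreach (explode(',', $header) as $entry) {
        $parts = explode(';', trim($entry));
        $tag = strtolower(trim($parts[0]));
        if ($tag === '') {
            continue;   // skip empty entries such as a trailing comma
        }
        $q = 1.0;       // a missing q-value means 1
        if (isset($parts[1]) && preg_match('/^\s*q=([0-9.]+)\s*$/i', $parts[1], $m)) {
            $q = (float) $m[1];
        }
        $weights[$tag] = $q;
    }
    arsort($weights);   // sort by q-value, descending, keeping the tags as keys
    return $weights;
}

$weights = parseAcceptLanguage('en-GB,en;q=0.8,en-US;q=0.6,nl;q=0.4');
print_r(array_keys($weights));       // en-gb, en, en-us, nl
var_dump(isset($weights['es']));     // bool(false)
var_dump(isset($weights['nl']));     // bool(true)
```

In a real page the argument would come from $_SERVER['HTTP_ACCEPT_LANGUAGE'], which may be absent, so an isset() check around it is advisable.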
I think you could use this PHP script, which takes an IP address and prints out a country code.
Example
http://api.hostip.info/country.php?ip=4.2.2.2
Gives US
Check out
http://www.hostip.info/use.html
for more info.
A library I use myself and can recommend is MaxMind GeoLite Country. To get the country code, you only need to copy two files to your server: the PHP code geoip.inc and the binary data file GeoIP.dat.
Using the library is also very straightforward:
function ipToCountry()
{
include_once('geoip/geoip.inc');
$gi = geoip_open(__DIR__ . '/geoip/GeoIP.dat', GEOIP_STANDARD);
$result = geoip_country_code_by_addr($gi, $_SERVER['REMOTE_ADDR']);
geoip_close($gi);
return $result;
}
This will use GeoIP and fall back to the Accept-Language header:
class Ip2Country
{
function get( $target )
{
$country = false;
if( function_exists( 'geoip_record_by_name' ) )
$country = $this->getFromIp( $target );
if( !$country && isset( $_SERVER['HTTP_ACCEPT_LANGUAGE'] ) )
$country = $this->getFromLang( $_SERVER['HTTP_ACCEPT_LANGUAGE'] );
return $country;
}
function getFromIp( $target )
{
$dat = @geoip_record_by_name( $target ); // "@" suppresses the warning for unknown hosts
return ( isset( $dat['country_code'] ) ) ? mb_strtolower( $dat['country_code'] ) : false;
}
function getFromLang( $str )
{
$info = array();
$langs = explode( ',', $str );
foreach( $langs as $lang )
{
$i = explode( ';', $lang );
$j = array();
if( !isset( $i[0] ) ) continue;
$j['code'] = $i[0];
if( strstr( $j['code'], '-' ) )
{
$parts = explode( '-', $j['code'] );
$j['lang'] = $parts[0];
$j['country'] = mb_strtolower( $parts[1] );
}
$info[] = $j;
}
return ( isset( $info[0]['country'] ) ) ? $info[0]['country'] : false;
}
}
$x = new Ip2Country();
var_dump( $x->get( 'canada.ca' ) );
I have created a script that generates information about a torrent file, but I'm stuck on writing a function to display seeds and peers. Someone told me that they are in the completed field defined in the torrent. Below is my class, torrent.php, which uses bencode.php to extract the data and converts it into readable form:
<?php
include_once('bencode.php');
class Torrent
{
// Private class members
private $torrent;
private $info;
// Public error message, $error is set if load() returns false
public $error;
// Load torrent file data
// $data - raw torrent file contents
public function load( &$data )
{
$this->torrent = BEncode::decode( $data );
if ( $this->torrent->get_type() == 'error' )
{
$this->error = $this->torrent->get_plain();
return false;
}
else if ( $this->torrent->get_type() != 'dictionary' )
{
$this->error = 'The file was not a valid torrent file.';
return false;
}
$this->info = $this->torrent->get_value('info');
if ( !$this->info )
{
$this->error = 'Could not find info dictionary.';
return false;
}
return true;
}
// Get comment
// return - string
public function getComment() {
return $this->torrent->get_value('comment') ? $this->torrent->get_value('comment')->get_plain() : null;
}
// Get creation date
// return - php date
public function getCreationDate() {
return $this->torrent->get_value('creation date') ? $this->torrent->get_value('creation date')->get_plain() : null;
}
// Get created by
// return - string
public function getCreatedBy() {
return $this->torrent->get_value('created by') ? $this->torrent->get_value('created by')->get_plain() : null;
}
// Get name
// return - filename (single file torrent)
// directory (multi-file torrent)
// see also - getFiles()
public function getName() {
return $this->info->get_value('name')->get_plain();
}
// Get piece length
// return - int
public function getPieceLength() {
return $this->info->get_value('piece length')->get_plain();
}
// Get pieces
// return - raw binary of piece hashes
public function getPieces() {
return $this->info->get_value('pieces')->get_plain();
}
// Get private flag
// return - -1 public, implicit
// 0 public, explicit
// 1 private
public function getPrivate() {
if ( $this->info->get_value('private') )
{
return $this->info->get_value('private')->get_plain();
}
return -1;
}
// Get a list of files
// return - array of Torrent_File
public function getFiles() {
// Load files
$filelist = array();
$length = $this->info->get_value('length');
if ( $length )
{
$file = new Torrent_File();
$file->name = $this->info->get_value('name')->get_plain();
$file->length = $this->info->get_value('length')->get_plain();
array_push( $filelist, $file );
}
else if ( $this->info->get_value('files') )
{
$files = $this->info->get_value('files')->get_plain();
foreach ( $files as $key => $value ) // each() was removed in PHP 8
{
$file = new Torrent_File();
$path = $value->get_value('path')->get_plain();
foreach ( $path as $key => $value2 )
{
$file->name .= "/" . $value2->get_plain();
}
$file->name = ltrim( $file->name, '/' );
$file->length = $value->get_value('length')->get_plain();
array_push( $filelist, $file );
}
}
return $filelist;
}
// Get a list of trackers
// return - array of strings
public function getTrackers() {
// Load tracker list
$trackerlist = array();
if ( $this->torrent->get_value('announce-list') )
{
$trackers = $this->torrent->get_value('announce-list')->get_plain();
foreach ( $trackers as $key => $value ) // each() was removed in PHP 8
{
if ( is_array( $value->get_plain() ) ) {
foreach ( $value as $key => $value2 )
{
foreach ( $value2 as $key => $value3 )
{
array_push( $trackerlist, $value3->get_plain() );
}
}
} else {
array_push( $trackerlist, $value->get_plain() );
}
}
}
else if ( $this->torrent->get_value('announce') )
{
array_push( $trackerlist, $this->torrent->get_value('announce')->get_plain() );
}
return $trackerlist;
}
// Helper function to make adding a tracker easier
// $tracker_url - string
public function addTracker( $tracker_url )
{
$trackers = $this->getTrackers();
$trackers[] = $tracker_url;
$this->setTrackers( $trackers );
}
// Replace the current trackers with the supplied list
// $trackerlist - array of strings
public function setTrackers( $trackerlist )
{
if ( count( $trackerlist ) >= 1 )
{
$this->torrent->remove('announce-list');
$string = new BEncode_String( $trackerlist[0] );
$this->torrent->set( 'announce', $string );
}
if ( count( $trackerlist ) > 1 )
{
$list = new BEncode_List();
foreach ( $trackerlist as $key => $value ) // each() was removed in PHP 8
{
$list2 = new BEncode_List();
$string = new BEncode_String( $value );
$list2->add( $string );
$list->add( $list2 );
}
$this->torrent->set( 'announce-list', $list );
}
}
// Update the list of files
// $filelist - array of Torrent_File
public function setFiles( $filelist )
{
// Load files
$length = $this->info->get_value('length');
if ( $length )
{
$filelist[0] = str_replace( '\\', '/', $filelist[0] );
$string = new BEncode_String( $filelist[0] );
$this->info->set( 'name', $string );
}
else if ( $this->info->get_value('files') )
{
$files = $this->info->get_value('files')->get_plain();
for ( $i = 0; $i < count( $files ); ++$i )
{
$file_parts = explode( '/', $filelist[$i] ); // split() was removed in PHP 7
$path = new BEncode_List();
foreach ( $file_parts as $part )
{
$string = new BEncode_String( $part );
$path->add( $string );
}
$files[$i]->set( 'path', $path );
}
}
}
// Set the comment field
// $value - string
public function setComment( $value )
{
$type = 'comment';
$key = $this->torrent->get_value( $type );
if ( $value == '' ) {
$this->torrent->remove( $type );
} elseif ( $key ) {
$key->set( $value );
} else {
$string = new BEncode_String( $value );
$this->torrent->set( $type, $string );
}
}
// Set the created by field
// $value - string
public function setCreatedBy( $value )
{
$type = 'created by';
$key = $this->torrent->get_value( $type );
if ( $value == '' ) {
$this->torrent->remove( $type );
} elseif ( $key ) {
$key->set( $value );
} else {
$string = new BEncode_String( $value );
$this->torrent->set( $type, $string );
}
}
// Set the creation date
// $value - php date
public function setCreationDate( $value )
{
$type = 'creation date';
$key = $this->torrent->get_value( $type );
if ( $value == '' ) {
$this->torrent->remove( $type );
} elseif ( $key ) {
$key->set( $value );
} else {
$int = new BEncode_Int( $value );
$this->torrent->set( $type, $int );
}
}
// Change the private flag
// $value - -1 public, implicit
// 0 public, explicit
// 1 private
public function setPrivate( $value )
{
if ( $value == -1 ) {
$this->info->remove( 'private' );
} else {
$int = new BEncode_Int( $value );
$this->info->set( 'private', $int );
}
}
// Bencode the torrent
public function bencode()
{
return $this->torrent->encode();
}
// Return the torrent's hash
public function getHash()
{
return strtoupper( sha1( $this->info->encode() ) );
}
}
// Simple class to encapsulate filename and length
class Torrent_File
{
public $name;
public $length;
}
?>
Please help me out!
Thanks in advance!
A little late, but the class you say you created comes from:
https://github.com/torrage/Torrage
Its original purpose was not to retrieve that kind of data.
For a class that gets you seeds and peers for a torrent, along with the rest of the data, see:
https://github.com/adriengibrat/torrent-rw
That information's not stored in the .torrent file. It's highly dynamic data, which can change every microsecond on a 'busy' torrent. The server's not going to build a custom .torrent file with up-to-the-minute statistics every time someone downloads it.
Think about it for a second. You download a .torrent file on Monday, but only look at it next Friday. The stats are now a week old and stale.
You can, however, take the tracker information in the .torrent and query those trackers for the stats.
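As a rough illustration of that last point: many HTTP trackers follow the "scrape" convention, in which the final announce in the tracker URL is replaced with scrape and the torrent's info hash is passed as info_hash. The bencoded reply then reports complete (seeders) and incomplete (leechers) for that hash. The sketch below only builds such a URL; buildScrapeUrl and the example tracker address are hypothetical, and not every tracker supports scraping.

```php
<?php
// Returns the scrape URL for an HTTP(S) announce URL, or null if the
// tracker URL does not follow the announce -> scrape convention.
function buildScrapeUrl(string $announceUrl, string $infoHashHex): ?string
{
    if (!preg_match('#^https?://#i', $announceUrl)) {
        return null;   // e.g. udp:// trackers use a different protocol
    }
    $slash = strrpos($announceUrl, '/');
    if ($slash === false || substr($announceUrl, $slash + 1, 8) !== 'announce') {
        return null;   // text after the last "/" must start with "announce"
    }
    $scrapeUrl = substr($announceUrl, 0, $slash + 1) . 'scrape' . substr($announceUrl, $slash + 9);
    $separator = (strpos($scrapeUrl, '?') === false) ? '?' : '&';
    // info_hash is sent as the raw 20-byte SHA-1 digest, URL-encoded
    return $scrapeUrl . $separator . 'info_hash=' . rawurlencode(hex2bin($infoHashHex));
}

// Hypothetical tracker and hash, for demonstration only.
echo buildScrapeUrl(
    'http://tracker.example.com:6969/announce',
    '0123456789ABCDEF0123456789ABCDEF01234567'
), "\n";
```

With the class from the question, getHash() could supply the hex hash and each entry of getTrackers() the announce URL; fetching the URL and decoding the bencoded reply is left out here.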