What is a workaround for colour coding in wordwrap() - php

So Minecraft uses section signs (§) for colour coding so for example, light green is §a (a is the color code id for green). An important note to remember is that these are VISUALLY ignored in-game. I'm using wordwrap() to make text look centred however these section signs get in the way of that because they're visually not there yet still considered as characters by the function itself.
Here's my attempt: if you take a look, I tried to count the number of occurrences of the section sign and multiply that by two to account for the colour code character. I later realized that this doesn't work well, because the adjustment applies to the entire string and not just to a specific line. This means the wrapped lines end up with uneven visible lengths, since each one contains more or less colour coding than the others. I also tried a rather dumb alternative where I'd use constants, but I quickly realized that wasn't going to work. Let me know if anything is unclear. Thanks in advance.
$line = "§r§7This is the §eAuction House§7! In the §eAuction House§7, you can sell and purchase items from other Luriders who have auctioned their items. The §eAuction House §7is a great way to make some cash by simply selling items that other players might be interested in buying.";
public static function itemLineOptimizer(string $line, int $width = 40)
{
$width += substr_count($line, '§') * 2;
return wordwrap($line, $width, "\n");
}
Console Output:
string(281) "§r§7This is the §eAuction House§7! In the §eAuction
House§7, you can sell and purchase items from other
Luriders who have auctioned their items. The §eAuction
House §7is a great way to make some cash by simply
selling items that other players might be interested in
buying."
In-Game Output: [screenshot of the in-game result]
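The mismatch described above can be made visible by comparing what wordwrap() counts (bytes) with what the player sees. The following is only a sketch: visibleLength() is a hypothetical helper, and it assumes UTF-8 input, the mbstring extension, and that every section sign is followed by exactly one code character.

```php
<?php
// Hypothetical helper: length of a line as drawn in-game, i.e. with every
// "§x" colour code removed. Assumes UTF-8 input and the mbstring extension.
function visibleLength(string $line): int
{
    $stripped = (string) preg_replace('/§./u', '', $line);
    return mb_strlen($stripped, 'UTF-8');
}

$line = "§r§7This is the §eAuction House§7! In the §eAuction House§7, you can sell and purchase items.";

foreach (explode("\n", wordwrap($line, 40, "\n")) as $wrapped) {
    // strlen() is the byte count wordwrap() works with;
    // visibleLength() is the number of characters actually shown.
    printf("%2d bytes, %2d visible: %s\n", strlen($wrapped), visibleLength($wrapped), $wrapped);
}
```

Lines carrying more colour codes show fewer visible characters for the same byte count, which is why one global adjustment to the width cannot even them out.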

Nowhere near as efficient as IMSoP's approach, but it is an alternative method I wanted to share. What I did was replace the section signs, remove them, wordwrap the text, then add them back in their correct places. It looks a bit complicated at first, but it's quite simple. Every line is commented.
function itemLineOptimizer(string $line, int $width = 40)
{
$line = str_replace("§", "&", $line); // Since section signs aren't just one-byte, we're going to make our lives easier and replace them with another one-byte symbol, I went with "&"
$colourCoding = array(); // Straightforward
$split = str_split($line); // Splitting the line into an array per character
foreach ($split as $key => $char){ // for every character has a $key (position) and the character itself: $char
if($char === "&") { // Check if it's a section sign / symbol chosen
array_push($colourCoding, [$key, $split[$key + 1]]); // add to $colourCoding an element which includes an array consisting of the position of the sign and the colour which the character at the position after
unset($split[$key]); // remove sign
unset($split[$key + 1]); // remove colour
}
}
// Now we've removed all colour coding from the line and saved it in $colourCoding
$bland = wordwrap(implode("", $split), $width, "\n"); // $bland is the now colourless wordwrapped line
foreach ($colourCoding as $array){ // Lastly we add the section signs back in their positions
$key = $array[0]; // position
$colour = $array[1]; // colour
$lineBreak = substr_count($bland, "§"); // Check for section signs already inside this line: they interfere with future loops since the correct position is different
$bland = substr_replace($bland, "§".$colour, $key + $lineBreak, 0); // Adding the colour coding back to its correct position
}
return $bland; // Straightforward
}
$line = "§r§7This is the §eAuction House§7! In the §eAuction House§7, you can sell and purchase items from other Luriders who have auctioned their items. The §eAuction House §7is a great way to make some cash by simply selling items that other players might be interested in buying.";
var_dump(wordwrap($line, 40), itemLineOptimizer($line, 40));

One way to approach this which I thought might be interesting is to take the internal implementation of wordwrap, and adapt it to our needs.
So I found the definition in the source, and in particular the special-case algorithm for handling a single-character line-break character which is all we need here, and saves us understanding all the other modes.
It works by copying the string, and then walking through it character by character, tracking when it last saw a space, and when it last saw or inserted a newline character. It then over-writes spaces with newline characters in place, without having to touch the rest of the string.
I first translated that literally into PHP (mostly a case of adding $ in front of each variable, and removing some special type handling macros), giving this:
function my_word_wrap($text, $linelength)
{
$newtext = $text;
$breakchar = "\n";
$laststart = $lastspace = 0;
$string_length = strlen($text);
for ($current = 0; $current < $string_length; $current++) {
if ( $text[$current] == $breakchar ) {
$laststart = $lastspace = $current + 1;
}
elseif ( $text[$current] == ' ' ) {
if ($current - $laststart >= $linelength) {
$newtext[$current] = $breakchar;
$laststart = $current + 1;
}
$lastspace = $current;
}
elseif ($current - $laststart >= $linelength && $laststart != $lastspace) {
$newtext[$lastspace] = $breakchar;
$laststart = $lastspace + 1;
}
}
return $newtext;
}
Two of those if statements include this condition which tracks how many characters we've seen since the last line break: $current - $laststart >= $linelength. What we could do is subtract from that the number of invisible characters we've seen, so they don't contribute to the "width" of lines: $current - $laststart - $invisibles >= $linelength.
Next, we need to detect section signs. My immediate guess was to use $text[$current] == '§', but that doesn't work because we're working in byte offsets, and § is not a single byte. Assuming UTF-8, it's specifically the pair of bytes which in hexadecimal are C2 A7, so we need to test the current and next character for that pair: $text[$current] == "\xC2" && $text[$current+1] == "\xA7".
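The byte-pair test can be tried in isolation. This is a small sketch, not part of the original function: isSectionSignAt() is a hypothetical name, and the isset() guard is an addition so that a string ending in a lone C2 byte does not trigger an out-of-range read.

```php
<?php
// Hypothetical helper: true when the UTF-8 encoding of "§" (bytes C2 A7)
// starts at byte offset $i of $text.
function isSectionSignAt(string $text, int $i): bool
{
    return isset($text[$i + 1])
        && $text[$i] === "\xC2"
        && $text[$i + 1] === "\xA7";
}

$sample = "a\xC2\xA7eb"; // "a§eb" written as explicit bytes

var_dump(strlen("\xC2\xA7"));          // the sign occupies two bytes
var_dump(isSectionSignAt($sample, 0)); // offset 0 is "a"
var_dump(isSectionSignAt($sample, 1)); // offset 1 starts the sign
```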
Now we can detect the invisible characters, we can increment our $invisibles counter. Since § is two bytes, and the following character is also invisible, we want to increment the counter by three, and also move the $current pointer an extra two steps:
elseif ( $text[$current] == "\xC2" && $text[$current+1] == "\xA7" ) {
$invisibles += 3;
$current += 2;
}
Finally, we need to reset the $invisibles counter whenever we insert a newline, or see an existing one - in other words, everywhere we reset $laststart.
So, the final result looks like this:
function special_word_wrap($text, $linelength)
{
$newtext = $text;
$breakchar = "\n";
$laststart = $lastspace = $invisibles = 0;
$string_length = strlen($text);
for ($current = 0; $current < $string_length; $current++) {
if ( $text[$current] == $breakchar ) {
$laststart = $lastspace = $current + 1;
$invisibles = 0;
}
elseif ( $text[$current] == ' ' ) {
if ($current - $laststart - $invisibles >= $linelength) {
$newtext[$current] = $breakchar;
$laststart = $current + 1;
$invisibles = 0;
}
$lastspace = $current;
}
elseif ( $text[$current] == "\xC2" && $text[$current+1] == "\xA7" ) {
$invisibles += 3;
$current += 2;
}
elseif ($current - $laststart - $invisibles >= $linelength && $laststart != $lastspace) {
$newtext[$lastspace] = $breakchar;
$laststart = $lastspace + 1;
$invisibles = 0;
}
}
return $newtext;
}
Here's a live demo of it in action with your sample input.
Not the most elegant, and probably not the most efficient way to do it, but I enjoyed the exercise, even if it's not what you were hoping for. :)

Related

Math / statistics problem analyse words in string

In need of some help - am trying to analyse news articles.
I have a list of positive words and negative words. I am searching the article contents for instances of the words and counting them up.
My problem is that the negative word list is a lot longer than the positive one, so all the results are skewed towards negative.
I am looking for a way to normalise the results so that a positive word is weighted slightly against a negative one, to even out the fact that there is a considerably higher chance of finding a negative word. Unfortunately I have no idea where to start.
Appreciate you taking the time to read this.
Below is the code I have so far.
function process_scores($content)
{
$positive_score = 0;
for ($i = 0; $i < count($this->positive_words); $i++) {
if($this->positive_words[$i] != "")
{
$c = substr_count( strtolower($content) , $this->positive_words[$i] );
if($c > 0)
{
$positive_score += $c;
}
}
}
$negative_score = 0;
for ($i = 0; $i < count($this->negative_words); $i++) {
if($this->negative_words[$i] != "")
{
$c = substr_count( strtolower($content) , $this->negative_words[$i] );
if($c > 0)
{
$negative_score += $c;
}
}
}
return ["positive_score" => $positive_score, "negative_score" => $negative_score];
}
So I don't know php, but this seems less like a php question and more of a question of method. Right now when you analyze an article, you assign words as positive or negative based on whether or not they are in your dictionary, but because your dictionaries are of different sizes, you feel like this isn't giving you a fair analysis of the article.
One method you could try is to assign each word in the article a value. If a word does not exist in your dictionary, have the program prompt for manual interpretation of the word through the command line. Then decide whether the word is positive, negative, or neutral, and have the program add that word to the appropriate dictionary. This will be really annoying at first, but English speakers use roughly the same 2000 words for almost all of our conversation, so after a few articles, you will have robust dictionaries and not have to worry about skew because every single word will have been assigned a value.
I would suggest just throwing in a weighting factor to the output. The exact weighting is determined by trial and error. I went ahead and refactored your code, since there was some repetition:
<?php
class WordScore {
private $negative_words = [];
private $positive_words = [];
private $positive_weight = 1;
private $negative_weight = 1;
public function setScore(float $pos = 1, float $neg = 1) {
$this->negative_weight = $neg;
$this->positive_weight = $pos;
}
public function processScores($content) {
$positive_score = $this->countWords($content, $this->positive_words);
$negative_score = $this->countWords($content, $this->negative_words);
return [
"positive_score" => $positive_score * $this->positive_weight,
"negative_score" => $negative_score * $this->negative_weight
];
}
private function countWords( string $content, array $words, float $weight = 1 ) {
$count = 0;
foreach( $words as $word ) {
$count += substr_count( strtolower($content) , strtolower($word) );
}
return $count;
}
}
working example at http://sandbox.onlinephpfunctions.com/code/19b4ac3c12d35cf253e9fa6049e91508e4797a2e
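As a possible starting point for the weights, instead of pure trial and error, each raw count could be divided by the size of its word list. This is an assumption on my part rather than something the answers above prescribe, and normalisedScore() is a hypothetical helper. It keeps the substring matching used in the original code.

```php
<?php
// Hypothetical helper: hits per dictionary entry, so that a longer word
// list does not automatically produce a larger score.
function normalisedScore(string $content, array $words): float
{
    // Drop empty entries; substr_count() rejects an empty needle.
    $words = array_filter($words, static function ($w) {
        return $w !== '';
    });
    if (count($words) === 0) {
        return 0.0;
    }

    $haystack = strtolower($content);
    $hits = 0;
    foreach ($words as $word) {
        $hits += substr_count($haystack, strtolower($word));
    }

    return $hits / count($words);
}

$article  = "Good news, bad weather, awful traffic";
$positive = ['good'];
$negative = ['bad', 'awful', 'terrible', 'horrid'];

printf("positive: %.2f\n", normalisedScore($article, $positive));
printf("negative: %.2f\n", normalisedScore($article, $negative));
```

Whether dividing by list length is the right correction depends on how common the listed words are in real articles, so the result is best treated as an initial guess to tune from.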

Search for pattern in a string

Pattern search within a string.
for eg.
$string = "111111110000";
FindOut($string);
Function should return 0
function FindOut($str){
$items = str_split($str, 3);
print_r($items);
}
If I understand you correctly, your problem comes down to finding out whether a substring of 3 characters occurs in a string twice without overlapping. This will get you the first occurrence's position if it does:
function findPattern($string, $minlen=3) {
$max = strlen($string)-$minlen;
for($i=0;$i<=$max;$i++) {
$pattern = substr($string,$i,$minlen);
if(substr_count($string,$pattern)>1)
return $i;
}
return false;
}
Or am I missing something here?
What you have here can conceptually be solved with a sliding window. For your example, you have a sliding window of size 3.
For each character in the string, you take the substring of the current character and the next two characters as the current pattern. You then slide the window up one position, and check if the remainder of the string has what the current pattern contains. If it does, you return the current index. If not, you repeat.
Example:
1010101101
|-|
So, pattern = 101. Now, we advance the sliding window by one character:
1010101101
|-|
And see if the rest of the string has 101, checking every combination of 3 characters.
Conceptually, this should be all you need to solve this problem.
Edit: I really don't like when people just ask for code, but since this seemed to be an interesting problem, here is my implementation of the above algorithm, which allows for the window size to vary (instead of being fixed at 3). The function is only briefly tested and omits obvious error checking:
function findPattern( $str, $window_size = 3) {
// Start the index at 0 (beginning of the string)
$i = 0;
// while( (the current pattern in the window) is not empty / false)
while( ($current_pattern = substr( $str, $i, $window_size)) != false) {
$possible_matches = array();
// Get the combination of all possible matches from the remainder of the string
for( $j = 0; $j < $window_size; $j++) {
$possible_matches = array_merge( $possible_matches, str_split( substr( $str, $i + 1 + $j), $window_size));
}
// If the current pattern is in the possible matches, we found a duplicate, return the index of the first occurrence
if( in_array( $current_pattern, $possible_matches)) {
return $i;
}
// Otherwise, increment $i and grab a new window
$i++;
}
// No duplicates were found, return -1
return -1;
}
It should be noted that this certainly isn't the most efficient algorithm or implementation, but it should help clarify the problem and give a straightforward example on how to solve it.
It looks like you want to use a substring function to walk along the string and check every three characters, rather than just breaking it into chunks of 3:
function fp($s, $len = 3){
$max = strlen($s) - $len; //borrowed from lafor as it was a terrible oversight by me
$parts = array();
for($i=0; $i <= $max; $i++){ // <= so the final window is checked too
$three = substr($s, $i, $len);
if(array_key_exists("$three",$parts)){
return $parts["$three"];
//if we've already seen it before then this is the first duplicate, we can return it
}
else{
$parts["$three"] = $i; //save the index of the starting position.
}
}
return false; //if we get this far then we didn't find any duplicate strings
}
Based on the str_split documentation, calling str_split on "1010101101" will result in:
Array(
[0] => 101
[1] => 010
[2] => 110
[3] => 1
)
None of these will match each other.
You need to look at each 3-long slice of the string (starting at index 0, then index 1, and so on).
I suggest looking at substr, which you can use like this:
substr($input_string, $index, $length)
And it will get you the section of $input_string starting at $index of length $length.
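To make the difference from str_split concrete, here is a minimal sketch that lists every 3-character slice using substr. The function name windows() is made up for this illustration.

```php
<?php
// Hypothetical helper: every slice of length $len, starting at each index
// in turn (overlapping), unlike str_split() which cuts disjoint chunks.
function windows(string $s, int $len = 3): array
{
    $out = [];
    for ($i = 0; $i + $len <= strlen($s); $i++) {
        $out[] = substr($s, $i, $len);
    }
    return $out;
}

print_r(windows("1010101101"));
```

For "1010101101" this yields eight slices, and "101" shows up more than once among them, which the str_split chunks above miss.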
A quick and dirty implementation of such a pattern search:
function findPattern($string){
$matches = 0;
$substrStart = 0;
while($matches < 2 && $substrStart+ 3 < strlen($string) && $pattern = substr($string, $substrStart++, 3)){
$matches = substr_count($string,$pattern);
}
if($matches < 2){
return null;
}
return $substrStart-1;
}

Count lines of code on a Database field

Using PHP and MySQL. I need to do something similar to how Github does when it shows source code,
122 lines (98 sloc) 4.003 kb
I need to get the total number of Lines Of Code and Source Lines Of Code from a MySQL database result.
The source code will be in a WordPress table as a meta field. I know how to accomplish this with a file, but not with a database result. I then also need to calculate the disk space used by this field, to show something like 4.003 kb
If you know how to do any of this, I would appreciate any help
UPDATE
This very ugly code is the only solution I have found so far, if you search SO or Google, all the results show how to count lines of code on a FILE, my problem is very different, I am not fetching a file, I have a result from a MySQL database held in a $variable
The code below does 1 of the 3 things: it gets the total number of lines (LOC), so I would just need to get (SLOC) and (Disk space used). However, I am very open to a better way of counting the LOC of a variable.
// Get LOC
$a = $code_source; // Variable holding our code
$result = count_chars($a, 0);
for ($i=0; $i < count($result); $i++) {
if ($result[$i] != 0) {
if (chr($i) == "\n") // line feed
$n = $result[$i];
if (chr($i) == "\r") // carriage return
$r = $result[$i];
}
}
if ($n > $r) $l = $n + 1;
if ($r >= $n) $l = $r + 1;
if (!isset($l) ) $l = "2";
echo "Line Of Code = " . $l;
Space
//Get Disk Space Used
if (function_exists('mb_strlen')) {
$size = mb_strlen($code_source, '8bit');
} else {
$size = strlen($code_source);
}
if($size >= 1024)
$size = round($size / 1024, 2).' KB';
else
$size = $size.' bytes';
echo 'size of file' . ': ' . $size;
2nd Update
Ok, here is the end result I have now. I omitted #refp's (LOC) code as I could not get it to work for some reason; I did use his SLOC code and put it all into a usable class. Please feel free to improve it.
class SourceCodeHelper{
// Count total number of Lines (LOC)
public function countLOC($string){
$result = count_chars($string, 0);
for ($i=0; $i < count($result); $i++) {
if ($result[$i] != 0) {
if (chr($i) == "\n") // line feed (chr() returns a single character, so "\n\n" could never match)
$n = $result[$i];
if (chr($i) == "\r") // carriage return
$r = $result[$i];
}
}
if ($n > $r) $l = $n + 1;
if ($r >= $n) $l = $r + 1;
if (!isset($l) ) $l = "2";
//substr_count ($data, "\n") + 1;
return $l;
}
// Count total Source code lines of Code (SLOC)
public function countSLOC($string){
return count(preg_split("/\n\s*/", $string));
}
// Calculate disk space usage of string
public function stringDiskSpace($string){
if (function_exists('mb_strlen')) {
$size = mb_strlen($string, '8bit');
} else {
$size = strlen($string);
}
return $this->diskSpacePretty($size); // method call needs $this->
}
// Format a disk space usage into human readable format
public function diskSpacePretty($size){
if($size >= 1024)
$size = round($size / 1024, 2).' KB';
else
$size = $size.' bytes';
return $size;
}
}
There is no straightforward way of counting the number of lines in a field from inside MySQL.
Instead I recommend that you add a new column to your table where you store the number of lines. Calculate the LOC/SLOC using PHP (or whatever backend language is responsible for inserting data) and update it when necessary.
To count the number of lines (LOC) in a string you can use substr_count which will count the number of occurrences of the second parameter string inside of the first one.
In the below example we will count how many times \n is present inside $data.
substr_count ($data, "\n") + 1; // since one \n creates two lines
To count the number of SLOC you can use preg_split to filter off all empty lines (lines of length 0 or only containing white spaces)
count (preg_split ("/\n\s*/", $data))
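The two counts can be combined into one helper. This is a sketch under my own assumptions, not the answer's exact method: countLines() is a hypothetical name, it also treats \r\n and \r as line breaks, and it counts SLOC by skipping lines that are empty after trim(), which avoids counting the empty piece a trailing newline would otherwise leave behind.

```php
<?php
// Hypothetical helper: total lines (LOC) and non-blank lines (SLOC)
// of a string, e.g. a value fetched from a database field.
function countLines(string $code): array
{
    if ($code === '') {
        return ['loc' => 0, 'sloc' => 0];
    }

    // Try \r\n first so a Windows line ending counts as one break.
    $lines = preg_split('/\r\n|\r|\n/', $code);

    $sloc = 0;
    foreach ($lines as $line) {
        if (trim($line) !== '') {
            $sloc++;
        }
    }

    return ['loc' => count($lines), 'sloc' => $sloc];
}

$source = "a();\n\nb();\r\n   \nc();";
$counts = countLines($source);
printf("%d lines (%d sloc) %d bytes\n", $counts['loc'], $counts['sloc'], strlen($source));
```

As with substr_count(...) + 1, a string that ends in a newline is reported with one extra (empty) line in LOC.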
To answer your question, the following query should be able to count the number of lines in a field. However, it's a bit hacky and I doubt it will scale well! It all depends on your use case, I guess.
SELECT
SUM( LENGTH(fieldname) - LENGTH(REPLACE(fieldname, '\n', ''))+1)
FROM tablename
Hope that helps.

How can I convert this function back?

function getVideoName($in) {
$index = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$base = strlen($index);
// Digital number <<-- alphabet letter code
$in = strrev($in);
$out = 0;
$len = strlen($in) - 1;
for ($t = 0; $t <= $len; $t++) {
$bcpow = bcpow($base, $len - $t);
$out = $out + strpos($index, substr($in, $t, 1)) * $bcpow;
}
$out = sprintf('%F', $out);
$out = substr($out, 0, strpos($out, '.'));
return $out;
}
This function returns a converted value.
How can I convert the value back to the input?
You probably won't be able to do this. The code looks to be one way only.
for ($t = 0; $t <= $len; $t++) {
// bcpow raises the first argument to the power of the second argument,
// the first argument being $base (the length of the $index string, 62),
// the second being the length minus one, minus the current position being inspected.
// This can make a pretty large number depending on the length of the string.
$bcpow = bcpow($base, $len - $t);
// Then that number is multiplied by the position of
// the currently inspected character in the string,
// as if it was in the key string given earlier,
// then that number is added to the running total.
$out = $out + strpos($index, substr($in, $t, 1)) * $bcpow;
}
// Then it's formatted as a floating point number
$out = sprintf('%F', $out);
// and then truncated at the decimal.
$out = substr($out, 0, strpos($out, '.'));
This is effectively one way because undoing it would require knowing the length of the string and the position of characters within it, and if you know that, you have the original string!
The function is also slightly buggy, getVideoName('a') returns 0. So does getVideoName('aaaaaaaaaa'). getVideoName('d') returns 3, so does getVideoName('da').
The numbering is predictable, and follows a pattern. It might be called deterministic, even. Given enough input from an outsider that doesn't know the formula, it could be possible to either reconstruct or guess an output... but that would be a very time consuming and annoying process.
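The collisions mentioned above ('a' and 'aaaaaaaaaa', 'd' and 'da') can be reproduced with a stripped-down version of the loop. This is only a sketch: decodeSketch() is a hypothetical name, it uses plain integers instead of bcpow and sprintf (so it is only safe for short inputs), and it assumes every character appears in $index.

```php
<?php
// Hypothetical re-statement of the loop: each character's position in
// $index is used as a digit, with the reversed input read most-significant first.
function decodeSketch(string $in): int
{
    $index = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    $base  = strlen($index);

    if ($in === '') {
        return 0;
    }

    $out = 0;
    foreach (str_split(strrev($in)) as $char) {
        // 'a' sits at position 0, so it contributes nothing to the total.
        $out = $out * $base + strpos($index, $char);
    }
    return $out;
}

foreach (['a', 'aaaaaaaaaa', 'd', 'da', 'ab'] as $input) {
    printf("%-10s => %d\n", $input, decodeSketch($input));
}
```

Because 'a' has position 0, any 'a' at the end of the input ends up at the front after strrev() and adds nothing, which is why those inputs share an output.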

Wrongly asked or am I stupid?

There's a blog post comment on codinghorror.com by Paul Jungwirth which includes a little programming task:
You have the numbers 123456789, in that order. Between each number, you must insert either nothing, a plus sign, or a multiplication sign, so that the resulting expression equals 2001. Write a program that prints all solutions. (There are two.)
Bored, I thought, I'd have a go, but I'll be damned if I can get a result for 2001. I think the code below is sound and I reckon that there are zero solutions that result in 2001. According to my code, there are two solutions for 2002. Am I right or am I wrong?
/**
* Take the numbers 123456789 and form expressions by inserting one of ''
* (empty string), '+' or '*' between each number.
* Find (2) solutions such that the expression evaluates to the number 2001
*/
$input = array(1,2,3,4,5,6,7,8,9);
// an array of strings representing 8 digit, base 3 numbers
$ops = array();
$numOps = sizeof($input)-1; // always 8
$mask = str_repeat('0', $numOps); // mask of 8 zeros for padding
// generate the ops array
$limit = pow(3, $numOps) -1;
for ($i = 0; $i <= $limit; $i++) {
$s = (string) $i;
$s = base_convert($s, 10, 3);
$ops[] = substr($mask, 0, $numOps - strlen($s)) . $s;
}
// for each element in the ops array, generate an expression by inserting
// '', '*' or '+' between the numbers in $input. e.g. element 11111111 will
// result in 1+2+3+4+5+6+7+8+9
$limit = sizeof($ops);
$stringResult = null;
$numericResult = null;
for ($i = 0; $i < $limit; $i++) {
$l = $numOps;
$stringResult = '';
$numericResult = 0;
for ($j = 0; $j <= $l; $j++) {
$stringResult .= (string) $input[$j];
switch (substr($ops[$i], $j, 1)) {
case '0':
break;
case '1':
$stringResult .= '+';
break;
case '2':
$stringResult .= '*';
break;
default :
}
}
// evaluate the expression
// split the expression into smaller ones to be added together
$temp = explode('+', $stringResult);
$additionElems = array();
foreach ($temp as $subExpressions)
{
// split each of those into ones to be multiplied together
$multplicationElems = explode('*', $subExpressions);
$working = 1;
foreach ($multplicationElems as $operand) {
$working *= $operand;
}
$additionElems[] = $working;
}
$numericResult = 0;
foreach($additionElems as $operand)
{
$numericResult += $operand;
}
if ($numericResult == 2001) {
echo "{$stringResult}\n";
}
}
Further down the same page you linked to.... =)
"Paul Jungwirth wrote:
You have the numbers 123456789, in
that order. Between each number, you
must insert either nothing, a plus
sign, or a multiplication sign, so
that the resulting expression equals
2001. Write a program that prints all solutions. (There are two.)
I think you meant 2002, not 2001. :)
(Just correcting for anyone else like
me who obsessively tries to solve
little "practice" problems like this
one, and then hit Google when their
result doesn't match the stated
answer. ;) Damn, some of those Perl
examples are ugly.)"
The number is 2002.
Recursive solution takes eleven lines of JavaScript (excluding string expression evaluation, which is a standard JavaScript function, however it would probably take another ten or so lines of code to roll your own for this specific scenario):
function combine (digit,exp) {
if (digit > 9) {
if (eval(exp) == 2002) alert(exp+'=2002');
return;
}
combine(digit+1,exp+'+'+digit);
combine(digit+1,exp+'*'+digit);
combine(digit+1,exp+digit);
return;
}
combine(2,'1');
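The same recursion can be sketched in PHP without eval, reusing the question's idea of splitting on '+' and then on '*'. The function names evaluate() and solve() are made up for this illustration.

```php
<?php
// Evaluate an expression made only of integers, '+' and '*':
// sum the terms, where each term is a product of its factors.
function evaluate(string $expr): int
{
    $sum = 0;
    foreach (explode('+', $expr) as $term) {
        $sum += array_product(array_map('intval', explode('*', $term)));
    }
    return $sum;
}

// Append digits 2..9 one at a time, trying '+', '*' and plain
// concatenation before each, and collect expressions equal to $target.
function solve(int $target, int $digit = 2, string $expr = '1'): array
{
    if ($digit > 9) {
        return evaluate($expr) === $target ? [$expr] : [];
    }
    return array_merge(
        solve($target, $digit + 1, $expr . '+' . $digit),
        solve($target, $digit + 1, $expr . '*' . $digit),
        solve($target, $digit + 1, $expr . $digit)
    );
}

foreach (solve(2002) as $solution) {
    echo $solution, "=2002\n";
}
```

Calling solve(2001) instead returns an empty array, in line with the question's finding.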
