Count lines of code on a Database field - php

Using PHP and MySQL. I need to do something similar to how Github does when it shows source code,
122 lines (98 sloc) 4.003 kb
I need to get the total number of Lines Of Code and Source Lines Of Code from a MySQL database result.
The source code will be in a wordpress table as a meta field, I know how to accomplish this with a file, but not with a Database result. I then need to also calculate the disk space used on this field to show something like 4.003 kb
If you know how to do any of this, I would appreciate any help
UPDATE
This very ugly code is the only solution I have found so far, if you search SO or Google, all the results show how to count lines of code on a FILE, my problem is very different, I am not fetching a file, I have a result from a MySQL database held in a $variable
This code below does 1 of the 3 things, it ets total number of lines of (LOC) so I would just need to get (SLOC) and (Disk space used) however I am very open to a better way of counting the LOC of a variable.
// Get LOC
$a = $code_source; // Variable holding our code
$result = count_chars($a, 0);
for ($i=0; $i < count($result); $i++) {
if ($result[$i] != 0) {
if (chr($i) == "\n") // line feed
$n = $result[$i];
if (chr($i) == "\r") // carriage return
$r = $result[$i];
}
}
if ($n > $r) $l = $n + 1;
if ($r >= $n) $l = $r + 1;
if (!isset($l) ) $l = "2";
echo "Line Of Code = " . $l;
Space
//Get Disk Space Used
if (function_exists('mb_strlen')) {
$size = mb_strlen($code_source, '8bit');
} else {
$size = strlen($code_source);
}
if($size >= 1024)
$size = round($size / 1024, 2).' KB';
else
$size = $size.' bytes';
echo 'size of file' . ': ' . $size;
2nd Update
Ok here is the end result I have now, I omitted #refp's (LOC) code as I could not get it to work for some reason, I did use his SLOC code and put in all into a usable class. Please fill free to improve
class SourceCodeHelper{
// Count total number of Lines (LOC)
public function countLOC($string){
$result = count_chars($string, 0);
for ($i=0; $i < count($result); $i++) {
if ($result[$i] != 0) {
if (chr($i) == "\n\n") // line feed
$n = $result[$i];
if (chr($i) == "\r") // carriage return
$r = $result[$i];
}
}
if ($n > $r) $l = $n + 1;
if ($r >= $n) $l = $r + 1;
if (!isset($l) ) $l = "2";
//substr_count ($data, "\n") + 1;
return $l;
}
// Count total Source code lines of Code (SLOC)
public function countSLOC($string){
return count(preg_split("/\n\s*/", $string));
}
// Calculate disk space usage of string
public function stringDiskSpace($string){
if (function_exists('mb_strlen')) {
$size = mb_strlen($string, '8bit');
} else {
$size = strlen($string);
}
return diskSpacePretty($size);
}
// Format a disk space usage into human readable format
public function diskSpacePretty($size){
if($size >= 1024)
$size = round($size / 1024, 2).' KB';
else
$size = $size.' bytes';
return $size;
}
}

There is no straight forward way of counting the number of lines in a field from the inside of mysql.
Instead I recommend you to append a new column to your table where you store the number of lines. Calculate the LOC/SLOC by using PHP (or whatever backend language is responsible for inserting data) and update it when neccessary.
To count the number of lines (LOC) in a string you can use substr_count which will count the number of occurrences of the second parameter string inside of the first one.
In the below example we will count how many times \n is present inside $data.
substr_count ($data, "\n") + 1; // since one \n creates two lines
To count the number of SLOC you can use preg_split to filter off all empty lines (lines of length 0 or only containing white spaces)
count (preg_split ("/\n\s*/", $data))

To answer your question, the following query should be able to count the number of lines in a field. However its a bit hacky and i doubt it will scale well! All depends on your use case i guess.
SELECT
SUM( LENGTH(fieldname) - LENGTH(REPLACE(fieldname, '\n', ''))+1)
FROM tablename
Hope that helps.

Related

What is a workaround for colour coding in wordwrap()

So Minecraft uses section signs (§) for colour coding so for example, light green is §a (a is the color code id for green). An important note to remember is that these are VISUALLY ignored in-game. I'm using wordwrap() to make text look centred however these section signs get in the way of that because they're visually not there yet still considered as characters by the function itself.
Here's my attempt: if you take a look, I tried to count the number of occurrences the section sign was found and multiplied it by two for the colour code character. I later then realized that this is inefficient because this affects the entire line of code and not just a specific bit. This basically means that this would make the length of other colour coded lines look odd since they have more or less colour coding in them. I also tried a rather dumb alternative where I'd use constants but I quickly realized that wasn't going to work. Let me know if anything is unclear. Thanks in advance.
$line = "§r§7This is the §eAuction House§7! In the §eAuction House§7, you can sell and purchase items from other Luriders who have auctioned their items. The §eAuction House §7is a great way to make some cash by simply selling items that other players might be interested in buying."
public static function itemLineOptimizer(string $line, int $width = 40)
{
$width += substr_count($line, '§') * 2;
return wordwrap($line, $width, "\n");
}
Console Output:
string(281) "§r§7This is the §eAuction House§7! In the §eAuction
House§7, you can sell and purchase items from other
Luriders who have auctioned their items. The §eAuction
House §7is a great way to make some cash by simply
selling items that other players might be interested in
buying."
In-Game Output:
In-Game Output
No where near as efficient as IMSoP's approach, but it is an alternative method I wanted to share. So what I did was I replaced section signs, removed them, wordwrapped, then added them back to their correct places. A bit complicated at first look but it's quite simple. Every line has its details commented.
function itemLineOptimizer(string $line, int $width = 40)
{
$line = str_replace("§", "&", $line); // Since section signs aren't just one-byte, we're going to make our lives easier and replace them with another one-byte symbol, I went with "&"
$colourCoding = array(); // Straightforward
$split = str_split($line); // Splitting the line into an array per character
foreach ($split as $key => $char){ // for every character has a $key (position) and the character itself: $char
if($char === "&") { // Check if it's a section sign / symbol chosen
array_push($colourCoding, [$key, $split[$key + 1]]); // add to $colourCoding an element which includes an array consisting of the position of the sign and the colour which the character at the position after
unset($split[$key]); // remove sign
unset($split[$key + 1]); // remove colour
}
}
// Now we've removed all colour coding from the line and saved it in $colourCoding
$bland = wordwrap(implode("", $split), $width, "\n"); // $bland is the now colourless wordwrapped line
foreach ($colourCoding as $array){ // Lastly we add the section signs back in their positions
$key = $array[0]; // position
$colour = $array[1]; // colour
$lineBreak = substr_count($bland, "§"); // Check for section signs already inside this line: they interfere with future loops since the correct position is different
$bland = substr_replace($bland, "§".$colour, $key + $lineBreak, 0); // Adding the colour coding back back to its correct position
}
return $bland; // Straightforward
}
$line = "§r§7This is the §eAuction House§7! In the §eAuction House§7, you can sell and purchase items from other Luriders who have auctioned their items. The §eAuction House §7is a great way to make some cash by simply selling items that other players might be interested in buying.";
var_dump(wordwrap($line, 40), itemLineOptimizer($line, 40));
One way to approach this which I though might be interesting is to take the internal implementation of wordwrap, and adapt it to our needs.
So I found the definition in the source, and in particular the special-case algorithm for handling a single-character line-break character which is all we need here, and saves us understanding all the other modes.
It works by copying the string, and then walking through it character by character, tracking when it last saw a space, and when it last saw or inserted a newline character. It then over-writes spaces with newline characters in place, without having to touch the rest of the string.
I first translated that literally into PHP (mostly a case of adding $ in front of each variable, and removing some special type handling macros), giving this:
function my_word_wrap($text, $linelength)
{
$newtext = $text;
$breakchar = "\n";
$laststart = $lastspace = 0;
$string_length = strlen($text);
for ($current = 0; $current < $string_length; $current++) {
if ( $text[$current] == $breakchar ) {
$laststart = $lastspace = $current + 1;
}
elseif ( $text[$current] == ' ' ) {
if ($current - $laststart >= $linelength) {
$newtext[$current] = $breakchar;
$laststart = $current + 1;
}
$lastspace = $current;
}
elseif ($current - $laststart >= $linelength && $laststart != $lastspace) {
$newtext[$lastspace] = $breakchar;
$laststart = $lastspace + 1;
}
}
return $newtext;
}
Two of those if statements include this condition which tracks how many characters we've seen since the last line break: $current - $laststart >= $linelength. What we could do is subtract from that the number of invisible characters we've seen, so they don't contribute to the "width" of lines: $current - $laststart - $invisibles >= $linelength.
Next, we need to detect section signs. My immediate guess was to use $text[$current] == '§', but that doesn't work because we're working in byte offsets, and § is not a single byte. Assuming UTF-8, it's specifically the pair of bytes which in hexadecimal are C2 A7, so we need to test the current and next character for that pair: $text[$current] == "\xC2" && $text[$current+1] == "\xA7".
Now we can detect the invisible characters, we can increment our $invisibles counter. Since § is two bytes, and the following character is also invisible, we want to increment the counter by three, and also move the $current pointer an extra two steps:
elseif ( $text[$current] == "\xC2" && $text[$current+1] == "\xA7" ) {
$invisibles += 3;
$current += 2;
}
Finally, we need to reset the $invisibles counter whenever we insert a newline, or see an existing one - in other words, everywhere we reset $laststart.
So, the final result looks like this:
function special_word_wrap($text, $linelength)
{
$newtext = $text;
$breakchar = "\n";
$laststart = $lastspace = $invisibles = 0;
$string_length = strlen($text);
for ($current = 0; $current < $string_length; $current++) {
if ( $text[$current] == $breakchar ) {
$laststart = $lastspace = $current + 1;
$invisibles = 0;
}
elseif ( $text[$current] == ' ' ) {
if ($current - $laststart - $invisibles >= $linelength) {
$newtext[$current] = $breakchar;
$laststart = $current + 1;
$invisibles = 0;
}
$lastspace = $current;
}
elseif ( $text[$current] == "\xC2" && $text[$current+1] == "\xA7" ) {
$invisibles += 3;
$current += 2;
}
elseif ($current - $laststart - $invisibles >= $linelength && $laststart != $lastspace) {
$newtext[$lastspace] = $breakchar;
$laststart = $lastspace + 1;
$invisibles = 0;
}
}
return $newtext;
}
Here's a live demo of it in action with your sample input.
Not the most elegant, and probably not the most efficient way to do it, but I enjoyed the exercise, even if it's not what you were hoping for. :)

Time complexity of an algorithm: find length of a longest palindromic substring

I've written a small PHP function to find a length of a longest palindromic substring of a string. To avoid many loops I've used a recursion.
The idea behind algorithm is, to loop through an array and for each center (including centers between characters and on a character), recursively check left and right caret values for equality. Iteration for a particular center ends when characters are not equal or one of the carets is out of the array (word) range.
Questions:
1) Could you please write a math calculations which should be used to explain time complexity of this algorithm? In my understanding its O(n^2), but I'm struggling to confirm that with a detailed calculations.
2) What do you think about this solution, any improvement suggestions (considering it was written in 45 mins just for practice)? Are there better approaches from the time complexity perspective?
To simplify the example I've dropped some input checks (more in comments).
Thanks guys, cheers.
<?php
/**
* Find length of the longest palindromic substring of a string.
*
* O(n^2)
* questions by developer
* 1) Is the solution meant to be case sensitive? (no)
* 2) Do phrase palindromes need to be taken into account? (no)
* 3) What about punctuation? (no)
*/
$input = 'tttabcbarabb';
$input2 = 'taat';
$input3 = 'aaaaaa';
$input4 = 'ccc';
$input5 = 'bbbb';
$input6 = 'axvfdaaaaagdgre';
$input7 = 'adsasdabcgeeegcbgtrhtyjtj';
function getLenRecursive($l, $r, $word)
{
if ($word === null || strlen($word) === 0) {
return 0;
}
if ($l < 0 || !isset($word[$r]) || $word[$l] != $word[$r]) {
$longest = ($r - 1) - ($l + 1) + 1;
return !$longest ? 1 : $longest;
}
--$l;
++$r;
return getLenRecursive($l, $r, $word);
}
function getLongestPalSubstrLength($inp)
{
if ($inp === null || strlen($inp) === 0) {
return 0;
}
$longestLength = 1;
for ($i = 0; $i <= strlen($inp); $i++) {
$l = $i - 1;
$r = $i + 1;
$length = getLenRecursive($l, $r, $inp); # around char
if ($i > 0) {
$length2 = getLenRecursive($l, $i, $inp); # around center
$longerOne = $length > $length2 ? $length : $length2;
} else {
$longerOne = $length;
}
$longestLength = $longerOne > $longestLength ? $longerOne : $longestLength;
}
return $longestLength;
}
echo 'expected: 5, got: ';
var_dump(getLongestPalSubstrLength($input));
echo 'expected: 4, got: ';
var_dump(getLongestPalSubstrLength($input2));
echo 'expected: 6, got: ';
var_dump(getLongestPalSubstrLength($input3));
echo 'expected: 3, got: ';
var_dump(getLongestPalSubstrLength($input4));
echo 'expected: 4, got: ';
var_dump(getLongestPalSubstrLength($input5));
echo 'expected: 5, got: ';
var_dump(getLongestPalSubstrLength($input6));
echo 'expected: 9, got: ';
var_dump(getLongestPalSubstrLength($input7));
Your code doesn't really need to be recursive. A simple while loop would do just fine.
Yes, complexity is O(N^2). You have N options for selecting the middle point. The number of recursion steps goes from 1 to N/2. The sum of all that is 2 * (N/2) * (n/2 + 1) /2 and that is O(N^2).
For code review, I wouldn't do recursion here since it's fairly straightforward and you don't need the stack at all. I would replace it with a while loop (still in a separate function, to make the code more readable).

combinations without duplicates using php

I need all possible combinations in math sense (without duplicates) where n=30 and k=18
function subcombi($arr, $arr_size, $count)
{
$combi_arr = array();
if ($count > 1) {
for ($i = $count - 1; $i < $arr_size; $i=$i+1) {
$highest_index_elem_arr = array($i => $arr[$i]);
foreach (subcombi($arr, $i, $count - 1) as $subcombi_arr)
{
$combi_arr[] = $subcombi_arr + $highest_index_elem_arr;
}
}
} else {
for ($i = $count - 1; $i < $arr_size; $i=$i+1) {
$combi_arr[] = array($i => $arr[$i]);
}
}
return $combi_arr;
}
function combinations($arr, $count)
{
if ( !(0 <= $count && $count <= count($arr))) {
return false;
}
return $count ? subcombi($arr, count($arr), $count) : array();
}
$numeri="01.02.03.04.05.06.07.08.09.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30";
$numeri_ar=explode(".",$numeri);
$numeri_ar=array_unique($numeri_ar);
for ($combx = 2; $combx < 19; $combx++)
{
$combi_arr = combinations($numeri_ar, $combx);
}
print_r($combi_arr);
It works but it terminates with an out of memory error, of course, number of combinations is too large.
Now I do not need exactly all the combinations. I need only a few of them.
I'll explain.
I need this work for a statistical study over Italian lotto.
I have the lotto archive in this format saved in $archivio array
...
35.88.86.03.54
70.72.45.18.09
55.49.35.30.43
15.52.49.41.72
74.26.54.77.90
33.14.56.42.11
08.79.41.01.52
82.33.32.83.43
...
A full archive is available here
https://pastebin.com/tut6kFXf
newer extractions are on top.
I tried (unsuccessfully) to modify the function to do this
for each 18 numbers combination found by the function combinations, the function should check if there are min. 3 numbers in one of the first 30 rows of $archivio. If "yes", the combination must not be saved in combination array, this combination has no statistical value for my need. If "no", the combination must be saved in combination array, this combination has great statistical value for my need.
In this way the total combinations will be no more than some hundred or thousand and I will avoid the out of memory and I'll have what I need.
The script time will be surely long but there should not be out of memory using the way above.
Anyone is able to help me in this ?
Thank you

Random generator returning endless duplicates

I am trying to create a random string which will be used as a short reference number. I have spent the last couple of days trying to get this to work but it seems to get to around 32766 records and then it continues with endless duplicates. I need at minimum 200,000 variations.
The code below is a very simple mockup to explain what happens. The code should be syntaxed according to 1a-x1y2z (example) which should give a lot more results than 32k
I have a feeling it may be related to memory but not sure. Any ideas?
<?php
function createReference() {
$num = rand(1, 9);
$alpha = substr(str_shuffle("abcdefghijklmnopqrstuvwxyz"), 0, 1);
$char = '0123456789abcdefghijklmnopqrstuvwxyz';
$charLength = strlen($char);
$rand = '';
for ($i = 0; $i < 6; $i++) {
$rand .= $char[rand(0, $charLength - 1)];
}
return $num . $alpha . "-" . $rand;
}
$codes = [];
for ($i = 1; $i <= 200000; $i++) {
$code = createReference();
while (in_array($code, $codes) == true) {
echo 'Duplicate: ' . $code . '<br />';
$code = createReference();
}
$codes[] = $code;
echo $i . ": " . $code . "<br />";
}
exit;
?>
UPDATE
So I am beginning to wonder if this is not something with our WAMP setup (Bitnami) as our local machine gets to exactly 1024 records before it starts duplicating. By removing 1 character from the string above (instead of 6 in the for loop I make it 5) it gets to exactly 32768 records.
I uploaded the script to our centos server and had no duplicates.
What in our enviroment could cause such a behaviour?
The code looks overly complex to me. Let's assume for the moment you really want to create n unique strings each based on a single random value (rand/mt_rand/something between INT_MIN,INT_MAX).
You can start by decoupling the generation of the random values from the encoding (there seems to be nothing in the code that makes a string dependant on any previous state - excpt for the uniqueness). Comparing integers is quite a bit faster than comparing arbitrary strings.
mt_rand() returns anything between INT_MIN and INT_MAX, using 32bit integers (could be 64bit as well, depends on how php has been compiled) that gives ~232 elements. You want to pick 200k, let's make it 400k, that's ~ a 1/10000 of the value range. It's therefore reasonable to assume everything goes well with the uniqueness...and then check at a later time. and add more values if a collision occured. Again much faster than checking in_array in each iteration of the loop.
Once you have enough values, you can encode/convert them to a format you wish. I don't know whether the <digit><character>-<something> format is mandatory but assume it is not -> base_convert()
<?php
function unqiueRandomValues($n) {
$values = array();
while( count($values) < $n ) {
for($i=count($values);$i<$n; $i++) {
$values[] = mt_rand();
}
$values = array_unique($values);
}
return $values;
}
function createReferences($n) {
return array_map(
function($e) {
return base_convert($e, 10, 36);
},
unqiueRandomValues($n)
);
}
$start = microtime(true);
$references = createReferences(400000);
$end = microtime(true);
echo count($references), ' ', count(array_unique($references)), ' ', $end-$start, ' ', $references[0];
prints e.g. 400000 400000 3.3981630802155 f3plox on my i7-4770. (The $end-$start part is constantly between 3.2 and 3.4)
Using base_convert() there can be strings like li10, which can be quite annoying to decipher if you have to manually type the string.

Wrongly asked or am I stupid?

There's a blog post comment on codinghorror.com by Paul Jungwirth which includes a little programming task:
You have the numbers 123456789, in that order. Between each number, you must insert either nothing, a plus sign, or a multiplication sign, so that the resulting expression equals 2001. Write a program that prints all solutions. (There are two.)
Bored, I thought, I'd have a go, but I'll be damned if I can get a result for 2001. I think the code below is sound and I reckon that there are zero solutions that result in 2001. According to my code, there are two solutions for 2002. Am I right or am I wrong?
/**
* Take the numbers 123456789 and form expressions by inserting one of ''
* (empty string), '+' or '*' between each number.
* Find (2) solutions such that the expression evaluates to the number 2001
*/
$input = array(1,2,3,4,5,6,7,8,9);
// an array of strings representing 8 digit, base 3 numbers
$ops = array();
$numOps = sizeof($input)-1; // always 8
$mask = str_repeat('0', $numOps); // mask of 8 zeros for padding
// generate the ops array
$limit = pow(3, $numOps) -1;
for ($i = 0; $i <= $limit; $i++) {
$s = (string) $i;
$s = base_convert($s, 10, 3);
$ops[] = substr($mask, 0, $numOps - strlen($s)) . $s;
}
// for each element in the ops array, generate an expression by inserting
// '', '*' or '+' between the numbers in $input. e.g. element 11111111 will
// result in 1+2+3+4+5+6+7+8+9
$limit = sizeof($ops);
$stringResult = null;
$numericResult = null;
for ($i = 0; $i < $limit; $i++) {
$l = $numOps;
$stringResult = '';
$numericResult = 0;
for ($j = 0; $j <= $l; $j++) {
$stringResult .= (string) $input[$j];
switch (substr($ops[$i], $j, 1)) {
case '0':
break;
case '1':
$stringResult .= '+';
break;
case '2':
$stringResult .= '*';
break;
default :
}
}
// evaluate the expression
// split the expression into smaller ones to be added together
$temp = explode('+', $stringResult);
$additionElems = array();
foreach ($temp as $subExpressions)
{
// split each of those into ones to be multiplied together
$multplicationElems = explode('*', $subExpressions);
$working = 1;
foreach ($multplicationElems as $operand) {
$working *= $operand;
}
$additionElems[] = $working;
}
$numericResult = 0;
foreach($additionElems as $operand)
{
$numericResult += $operand;
}
if ($numericResult == 2001) {
echo "{$stringResult}\n";
}
}
Further down the same page you linked to.... =)
"Paul Jungwirth wrote:
You have the numbers 123456789, in
that order. Between each number, you
must insert either nothing, a plus
sign, or a multiplication sign, so
that the resulting expression equals
2001. Write a program that prints all solutions. (There are two.)
I think you meant 2002, not 2001. :)
(Just correcting for anyone else like
me who obsessively tries to solve
little "practice" problems like this
one, and then hit Google when their
result doesn't match the stated
answer. ;) Damn, some of those Perl
examples are ugly.)"
The number is 2002.
Recursive solution takes eleven lines of JavaScript (excluding string expression evaluation, which is a standard JavaScript function, however it would probably take another ten or so lines of code to roll your own for this specific scenario):
function combine (digit,exp) {
if (digit > 9) {
if (eval(exp) == 2002) alert(exp+'=2002');
return;
}
combine(digit+1,exp+'+'+digit);
combine(digit+1,exp+'*'+digit);
combine(digit+1,exp+digit);
return;
}
combine(2,'1');

Categories