PHP Huffman Decode Algorithm

PHP Huffman Decode Algorithm - php

I applied for a job recently and got sent a hackerrank exam with a couple of questions.One of them was a huffman decoding algorithm. There is a similar problem available here which explains the formatting alot better then I can.
The actual task was to take two arguments and return the decoded string.
The first argument is the codes, which is a string array like:
[
"a 00",
"b 101",
"c 0111",
"[newline] 1001"
]
Which is like: single character, two tabs, huffman code.
The newline was specified as being in this format due to the way that hacker rank is set up.
The second argument is a string to decode using the codes. For example:
101000111 = bac
This is my solution:
function decode($codes, $encoded) {
$returnString = '';
$codeArray = array();
foreach($codes as $code) {
sscanf($code, "%s\t\t%s", $letter, $code);
if ($letter == "[newline]")
$letter = "\n";
$codeArray[$code] = $letter;
}
print_r($codeArray);
$numbers = str_split($encoded);
$searchCode = '';
foreach ($numbers as $number) {
$searchCode .= $number;
if (isset($codeArray[$searchCode])) {
$returnString .= $codeArray[$searchCode];
$searchCode = '';
}
}
return $returnString;
}
It passed the two initial tests but there were another five hidden tests which it did not pass and gave no feedback on.
I realize that this solution would not pass if the character was a white space so I tried a less optimal solution that used substr to get the first character and regex matching to get the number but this still passed the first two and failed the hidden five. I tried function in the hacker rank platform with white-space as input and the sandboxed environment could not handle it anyway so I reverted to the above solution as it was more elegant.
I tried the code with special characters, characters from other languages, codes of various sizes and it always returned the desired solution.
I am just frustrated that I could not find the cases that caused this to fail as I found this to be an elegant solution. I would love some feedback both on why this could fail given that there is no white-space and also any feedback on performance increases.

Your basic approach is sound. Since a Huffman code is a prefix code, i.e. no code is a prefix of another, then if your search finds a match, then that must be the code. The second half of your code would work with any proper Huffman code and any message encoded using it.
Some comments. First, the example you provide is not a Huffman code, since the prefixes 010, 0110, 1000, and 11 are not present. Huffman codes are complete, whereas this prefix code is not.
This brings up a second issue, which is that you do not detect this error. You should be checking to see if $searchCode is empty after the end of your loop. If it is not, then the code was not complete, or a code ended in the middle. Either way, the message is corrupt with respect to the provided prefix code. Did the question specify what to do with errors?
The only real issue I would expect with this code is that you did not decode the code description generally enough. Did the question say there were always two tabs, or did you conclude that? Perhaps it was just any amount of space and tabs. Where there other character encodings you neeed to convert like [newline]? I presume you in fact did need to convert them, if one of the examples that worked contained one. Did it? Otherwise, maybe you weren't supposed to convert.

I had the same question for an Coding Challenge. with some modification as the input was a List with (a 111101,b 110010,[newline] 111111 ....)
I took a different approach to solve it,using hashmap but still i too had only 2 sample test case passed.
below is my code:
public static String decode(List<String> codes, String encoded) {
// Write your code here
String result = "";
String buildvalue ="";
HashMap <String,String> codeMap= new HashMap<String,String>();
for(int i=0;i<codes.size();i++){
String S= codes.get(i);
String[] splitedData = S.split("\\s+");
String value=splitedData[0];
String key=(splitedData[1].trim());
codeMap.put(key, value);
}
for(int j=0;j<encoded.length();j++){
buildvalue+=Character.toString(encoded.charAt(j));
if(codeMap.containsKey(buildvalue)){
if(codeMap.get(buildvalue).contains("[newline]")){
result+="\n";
buildvalue="";
}
else{
result+=codeMap.get(buildvalue);
buildvalue="";
}
}
}
return result.toString();
}
}

Related

PHP Questions. Loops or If statement?

I am trying to learn PHP while I write a basic application. I want a process whereby old words get put into an array $oldWords = array(); so all $words, that have been used get inserted using array_push(oldWords, $words).
Every time the code is executed, I want a process that finds a new word from $wordList = array(...). However, I don't want to select any words that have already been used or are in $oldWords.
Right now I'm thinking about how I would go about this. I've been considering finding a new word via $wordChooser = rand (1, $totalWords); I've been thinking of using an if/else statement, but the problem is if array_search($word, $doneWords) finds a word, then I would need to renew the word and check it again.
This process seems extremely inefficient, and I'm considering a loop function but, which one, and what would be a good way to solve the issue?
Thanks

I'm a bit confused, PHP dies at the end of the execution of the script. However you are generating this array, could you also not at the same time generate what words haven't been used from word list? (The array_diff from all words to used words).
Or else, if there's another reason I'm missing, why can't you just use a loop and quickly find the first word in $wordList that's not in $oldWord in O(n)?
function generate_new_word() {
foreach ($wordList as $word) {
if (in_array($word, $oldWords)) {
return $word; //Word hasn't been used
}
}
return null; //All words have been used
}
Or, just do an array difference (less efficient though, since best case is it has to go through the entire array, while for the above it only has to go to the first word)
EDIT: For random
$newWordArray = array_diff($allWords, $oldWords); //List of all words in allWords that are not in oldWords
$randomNewWord = array_rand($newWordArray, 1);//Will get a new word, if there are any
Or unless you're interested in making your own datatype, the best case for this could possibly be in O(log(n))

Parsing ridiculous json string in php

I am querying a restful web service (clickbank) I am able to send and receive my requests but (embarrassingly enough), I am having trouble parsing the results to make them readable. When I select XML as the return format what I get back is a 97,000 character long string(yay var_dump).... with no spaces in it... and no delimiter (such as ',' or '/', or even '<', or '>'), separating the values. So, I have selected JSON as the return format. I have been able to decode this herculean string using json_decode and I have deciphered that what I am getting back is an array of 100 objects, each object has 30 vars (ala get_object_vars), but some of these vars are themselves arrays of objects. Forgive my ignorance but any ideas on how I can parse this so that it's in the realm of readable? By the way, I am currently trying my hand at PHP (as that is what we use at the shop where I work). I am mildly retarded when it comes to using Eclipse PDT so any suggestions would be welcome....
P.S. I have the following function that has been helpful in determining what the hay is going on but it still doesn't separate things out like I want
function getInfo($datum) {
switch (gettype($datum)) {
case "object":
$GLOBALS['counts']++;
//var_dump($datum);
echo "<hr>";
//$members = get_object_vars($datum);
//getInfo($members);
break;
case "array":
foreach($datum as $v) {
getInfo($v);
}
//var_dump($datum);
break;
default:
echo "<div>$datum</div>";
}
}
This is a bit of a side question here: down at the bottom of my snippet (see the default condition of switch statement), $datum WAS part of an array (before it was passed back into the function at the final depth of recursion here). Is there a way to elaborate that echo statement so that it shows whatever it's key was? I'm trying to word this in the least confusing way possible, but if that value (probably a string) is the value of some index of an associative array, but without me having to specify any of the possible names of the arrays? (think in_array but without the $haystack argument)

How To Get The Unique Name Count With PHP?

Let's say I have text file Data.txt with:
26||jim||1990
31||Tanya||1942
19||Bruce||1612
8||Jim||1994
12||Brian||1988
56||Susan||2201
and it keeps going.
It has many different names in column 2.
Please tell me, how do I get the count of unique names, and how many times each name appears in the file using PHP?
I have tried:
$counts = array_count_values($item[1]);
echo $counts;
after exploding ||, but it does not work.
The result should be like:
jim-2,
tanya-1,
and so on.
Thanks for any help...

Read in each line, explode using the delimiter (in this case ||), and add it to an array if it does not already exist. If it does, increment the count.
I won't write the code for you, but here a few pointers:
fread reads in a line
explode will split the line based on a delimiter
use in_array to check if the name has been found before, and to determine whether you need to add the name to the array or just increment the count.
Edit:
Following Jon's advice, you can make it even easier for you.
Read in line-by-line, explode by delimiter and dump all the names into an array (don't worry about checking if it already exists). After you're done, use array_count_values to get every unique name and its frequency.

Here's my take on this:
Use file to read the data file, producing an array where each element corresponds to a line in the input.
Use array_filter with trim as the filter function to remove blank lines from this array. This takes advantage that trim returns a string having removed whitespace from both ends of its argument, leaving the empty string if the argument was all whitespace to begin with. The empty string converts to boolean false -- thus making array_filter disregard lines that are all whitespace.
Use array_map with a callback that involves calling explode to split each array element (line of text) into three parts and returning the second of these. This will produce an array where each element is just a name.
Use array_map again with strtoupper as the callback to convert all names to uppercase so that "jim" and "JIM" will count as the same in the next step.
Finally, use array_count_values to get the count of occurrences for each name.
Code, taking things slowly:
function extract_name($line) {
// The -1 parameter (available as of PHP 5.1.0) makes explode return all elements
// but the last one. We want to do this so that the element we are interested in
// (the second) is actually the last in the returned array, enabling us to pull it
// out with end(). This might seem strange here, but see below.
$parts = explode('||', $line, -1);
return end($parts);
}
$lines = file('data.txt'); // #1
$lines = array_filter($lines, 'trim'); // #2
$names = array_map('extract_name', $lines); // #3
$names = array_map('strtoupper', $names); // #4
$counts = array_count_values($names); // #5
print_r($counts); // to see the results
There is a reason I chose to do this in steps where each steps involves a function call on the result of the previous step -- that it's actually possible to do it in just one line:
$counts = array_count_values(
array_map(function($line){return strtoupper(end(explode('||', $line, -1)));},
array_filter(file('data.txt'), 'trim')));
print_r($counts);
See it in action.
I should mention that this might not be the "best" way to solve the problem in the sense that if your input file is huge (in the ballpark of a few million lines) this approach will consume a lot of memory because it's reading all the input in memory at once. However, it's certainly convenient and unless you know that the input is going to be that large there's no point in making life harder.
Note: Senior-level PHP developers might have noticed that I 'm violating strict standards here by feeding the result of explode to a function that accepts its argument by reference. That's valid criticism, but in my defense I am trying to keep the code as short as possible. In production it would be indeed better to use $a = explode(...); return $a[1]; although there will be no difference as regards the result.

While I do feel that this website's purpose is to answer questions and not do homework assignments, I don't acknowledge the assumption that you are doing your homework, since that fact has not been provided. I personally learned how to program by example. We all learn our own ways, so here is what I would do if I were to attempt to answer your question as accurately as possible, based on the information you have provided.
<?php
$unique_name_count = 0;
$names = array();
$filename = 'Data.txt';
$pointer = fopen($filename,'r');
$contents = fread($pointer,filesize($filename));
fclose($pointer);
$lines = explode("\n",$contents);
foreach($lines as $line)
{
$split_str = explode('|',$line);
if(isset($split_str[2]))
{
$name = strtolower($split_str[2]);
if(!in_array($name,$names))
{
$names[] = $name;
$unique_name_count++;
}
}
}
echo $unique_name_count.' unique name'.(count($unique_name_count) == 1 ? '' : 's').' found in '.$filename."\n";
?>

php/as3 regex to split multiple json in one

For example I have this string of 2 json objects:
{"glossary": {"title": "example glossary"}, "aaa": "1212"}{"adada": "faka"}
I want to split it in the array for PHP and Actionscript 3
Array (
[0] => '{"glossary": {"title": "example glossary"}',
[1] => '{"adada": "faka"}'
)
What is the best method to do it.
Edit:
I don't need answer how to parse json. To simplify I need to split
{...{..}....}{....}{........}
into
{...{..}....}
{....}
{........}

either you modify a JSON parser to do it, as Amargosh suggested, or you can make a simple algorithm to do it for you, that skipps through strings and comments and keeps track of open braces. when no open braces are left, then you have a complete value. repeat it until you're out of input.
however my suggestion is to try to solve the problem by talking to whoever is responsible for that output and convince him to generate valid JSON, which is [{"glossary": {"title": "example glossary"}, "aaa": "1212"},{"adada": "faka"}]
edit: to separate the JSON objects, you need to prefix every JSON object with a length (4 byte should suffice). then you read off the socket until you have the right amount of chars. the next characters will again be the length of the next object. it is imperative that you do this, because TCP works in packets. Not only can you have multiple JSON-objects in one single packet, but you can have one JSON-object split between two packets.
Also, a couple of advises:
do not use PHP for socket servers. it's not made for that. have a look at Haxe, specifically the neko backend.
do not write this kind of stuff on your own unless you really need to. it's boring and dumb work. there are virtually millions of solutions for socket servers. you should also have a look at Haxe remoting that allows transparent communication between e.g. a flash client and a neko socket server. also, please have a look at smartfox and red5.
edit2: you're underestimating the problem and your approach isn't good. you should build a robust solution, so the day you want to send arrays over the wire you won't have a total breakdown, because your "splitter" breaks, or your JSON parser is fed incomplete objects, because only half the object is read. what you want to do can be easily done: split the input using "}{", append "}" to any element but the last and prepend "{" to any element but the first. Nonetheless I heavily suggest you refrain from such an approach, because you will regret it at some point. if you really think you should do thinks like these on your own, then try to at least do them right.

Sorry for posting on an old question but I just ran into this question when googling for a similar problem and I think I have a pretty easy little solution for the specific problem that was defined in the question.
// Convert original data to valid JSON array
$json = '['.str_replace('}{', '},{', $originalData).']';
$array = json_decode($json, true);
$arrayOfJsonStrings = array();
foreach ($array as $data) {
$arrayOfJsonStrings[] = json_encode($data);
}
var_dump($arrayOfJsonStrings);
I think this should work as long as your JSON data doesn't contain strings that might include '}{'.

Regex can't handle it. If you don't actually need a JSON parser, writing a simple parsing function should do it.
Something like this would do:
function splitJSONString($json) {
$objects = array();
$depth = 0;
$current = '';
for ($i = 0; $char = substr($json, $i, 1); $i++) {
$current .= $char;
if ($char == '{') {
$depth += 1;
} else if ($char == '}' && $depth > 0) {
$depth -= 1;
if ($depth == 0) {
array_push($objects, $current);
$current = '';
}
}
}
return $objects;
}

Regex is not the right tool for this - use a JSON Parser

Regex Match PHP Comment

Ive been trying to match PHP comments using regex.
//([^<]+)\r\n
Thats what ive got but it doesn't really work.
Ive also tried
//([^<]+)\r
//([^<]+)\n
//([^<]+)
...to no avail

In what program are you coding this regex? Your final example is a good sanity check if you're worried that the newline chars aren't working. (I have no idea why you don't allow less-than in your comment; I'm assuming that's specific to your application.)
Try
//[^<]+
and see if that works. As Draemon says, you might have to escape the diagonals. You might also have to escape the parentheses. I can't tell if you know this, but parentheses are often used to enclose capturing groups. Finally, check whether there is indeed at least one character after the double slashes.

To match comments, you have to think there are two types of comments in PHP 5 :
comments which start by // and go to the end of the line
comments that start by /* and go to */
Considering you have these two lines first :
$filePath = '/home/squale/developpement/astralblog/website/library/HTMLPurifier.php';
$str = file_get_contents($filePath);
You could match the first ones with :
$matches_slashslash = array();
if (preg_match_all('#//(.*)$#m', $str, $matches_slashslash)) {
var_dump($matches_slashslash[1]);
}
And the second ones with :
$matches_slashstar = array();
if (preg_match_all('#/\*(.*?)\*/#sm', $str, $matches_slashstar)) {
var_dump($matches_slashstar[1]);
}
But you will probably get into troubles with '//' in the middle of string (what about heredoc syntax, btw, did you think about that one ? ), or "toggle comments" like this :
/*
echo 'a';
/*/
echo 'b';
//*/
(Just add a slash at the begining to "toggle" the two blocks, if you don't know the trick)
So... Quite hard to detect comments with only regex...
Another way would be to use the PHP Tokenizer, which, obviously, knows how to parse PHP code and comments.
For references, see :
token_get_all
List of Parser Tokens
With that, you would have to use the tokenizer on your string of PHP code, iterate on all the tokens you get as a result, and detect which ones are comments.
Something like this would probably do :
$tokens = token_get_all($str);
foreach ($tokens as $token) {
if ($token[0] == T_COMMENT
|| $token[0] == T_DOC_COMMENT) {
// This is a comment ;-)
var_dump($token);
}
}
And, as output, you'll get a list of stuff like this :
array
0 => int 366
1 => string '/** Version of HTML Purifier */' (length=31)
2 => int 57
or this :
array
0 => int 365
1 => string '// :TODO: make the config merge in, instead of replace
' (length=55)
2 => int 117
(You "just" might to strip the // and /* */, but that's up to you ; at least, you have extracted the comments ^^ )
If you really want to detect comments without any kind of strange error due to "strange" syntax, I suppose this would be the way to go ;-)

You probably need to escape the "//":
\/\/([^<]+)

This will match comments in PHP (both /* */ and // format)
/(\/\*).*?(\*\/)|(\/\/).*?(\n)/s
To get all matches, use preg_match_all to get array of matches.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.