I am having problems extracting the layer contents from a .tmx (Tiled) file.
I would like to get the complete uncompressed data in PHP and make a little image of it.
Getting the header information like width, height and so on is no problem - SimpleXML does its job there. But somehow decompressing the tile layer data is not working.
The data itself is stored as a base64- and gzip-encoded string (something like H4sIAAAAAAAAC+3bORKAIBQEUVzuf2YTTSwEA/gL00EnJvJQsAjcSyk7EU3v+Jn3OI), but I am having problems even getting usable base64-decoded data: it just gives me weird characters, and when I reopened the map in Tiled and saved it as "base64 uncompressed", the result was just an empty string (without gzip decompression, of course).
I already searched the web and found how exactly the data is compressed (GitHub article). It seems like I have to use the gzinflate() function instead of all the others (e.g. gzuncompress()), but that is not working for me either.
The code I have now is the following:
<?php
// Get the raw XML data
$map_xml = new SimpleXMLElement(file_get_contents("map.tmx"));
$data = (string) $map_xml->layer[0]->data; // I would make a loop here
$content = gzinflate(base64_decode($data)); // gives me an error
var_dump($content); // results in nothing
?>
After some more research I found out that I should use a zlib filter (php.net article).
Now I was really confused and didn't know what to pick, so I asked Google again and found this: Compressing with Java, Decompressing with PHP. According to the answer there, I have to crop out the header before using the base64 and gzip methods.
Now my questions: do I have to crop out the header first? If yes, how do I do that?
If not, how can I get the uncompressed data?
I really hope that someone can help me out here!
PHP's gzinflate() and gzuncompress() are, as another answer here notes, incorrectly named. However, we can take advantage of gzinflate(), which accepts raw deflate data. The gzip header is 10 bytes long and can be stripped off using substr(). Using your example above, I tried this:
$base64content = "H4sIAAAAAAAAC+3bORKAIBQEUVzuf2YTTSwEA/gL00EnJvJQsAjcSyk7EU3v+Jn3OI";
$compressed = substr( base64_decode($base64content), 10);
$content = gzinflate($compressed);
This gives you a string of raw bytes. Your TMX layer consists mostly of GIDs 0, 2, and 3, so you'll only see unprintable characters if you print it out. To get useful data, you'll need to call ord() on the individual characters:
$chars = str_split($content);
$values = array();
foreach ($chars as $char) {
    $values[] = ord($char);
}
var_dump(implode(',', $values)); // roughly the CSV tile data, except each 4-byte GID shows up as four numbers
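If you need the full GIDs rather than single bytes, a small variation (a sketch, assuming the standard TMX layout where each tile is a 32-bit little-endian integer and the high bits are flip flags) reads them with unpack() instead:
$gids = array_values(unpack('V*', $content)); // 'V' = unsigned 32-bit little-endian
// Mask off Tiled's flip flags in the high bits to get the plain tile IDs.
$gids = array_map(function ($gid) { return $gid & 0x1FFFFFFF; }, $gids);
echo implode(',', $gids);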
Hope that helps.
Wow, these PHP functions are horribly named. Some background first.
There are three formats you are likely to encounter or be able to produce. They are:
Raw deflate, which is data compressed to the deflate format with no header or trailer, defined in RFC 1951.
zlib, which is raw deflate data wrapped in a compact zlib wrapper: a two-byte header and a four-byte Adler-32 check value as the trailer, defined in RFC 1950.
gzip, which is raw deflate data wrapped in a gzip header and trailer. The header is at least ten bytes and can be longer if it contains a file name, comments, and/or an extra field; the eight-byte trailer holds a four-byte CRC-32 and the uncompressed length modulo 2^32. This wrapper is defined in RFC 1952, and it is the data you will find in a file with the suffix .gz.
The PHP functions gzdeflate() and gzinflate() create and decode the raw deflate format. The PHP functions gzcompress() and gzuncompress() create and decode the zlib format. None of these functions should have "gz" in the name, since none of them handle the gzip format! This will forever be confusing to PHP coders trying to create or decode gzip-formatted data.
There seem to be (but the documentation is not clear if they are always there) PHP functions gzencode() and gzdecode() which, if I am reading the terse documentation correctly, by default create and decode the gzip format. gzencode() also has an option to produce the zlib format, and I suspect that gzdecode() will attempt to automatically detect the gzip or zlib format and decode accordingly. (That is a capability that is part of the actual zlib library that all of these functions use.)
The documentation for zlib_encode() and zlib_decode() is incomplete (where those pages admit: "This function is currently not documented; only its argument list is available"), so it is difficult to tell what they do. There is an undocumented encoding string parameter for zlib_encode() that presumably would allow you to select one of the three formats, if you knew what to put in the string. There is no encoding parameter for zlib_decode(), so perhaps it tries to auto-detect among the three formats.
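To make those pairings concrete, here is a quick sketch (assuming a PHP build where the zlib extension provides gzencode() and gzdecode()):
$data = "hello hello hello";
$raw  = gzdeflate($data);   // raw deflate, RFC 1951
$zlib = gzcompress($data);  // zlib wrapper, RFC 1950
$gzip = gzencode($data);    // gzip wrapper, RFC 1952
var_dump(gzinflate($raw) === $data);     // bool(true)
var_dump(gzuncompress($zlib) === $data); // bool(true)
var_dump(gzdecode($gzip) === $data);     // bool(true)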
I know this is old now, but I've literally spent all day playing with this code.
It's been really picky about what I do. However, here's a quick function to turn TMX files into an array of IDs for each tile on each layer.
Credits go to the other answerers who helped me piece together where I was going wrong.
<?php
function getLayer($getLayerName = '')
{
    $xml = simplexml_load_file('level.tmx');
    $values = array();
    foreach ($xml->layer as $child) {
        $name = $child->attributes()->name;
        if (!empty($getLayerName) && $name != $getLayerName) {
            continue;
        }
        // Strip the 10-byte gzip header, then inflate the raw deflate data
        $data = gzinflate(substr(base64_decode(trim($child->data)), 10));
        $chars = str_split($data);
        $i = 0;
        foreach ($chars as $char) {
            $charID = ord($char);
            // Each GID is four bytes little-endian; I'm only interested in the
            // tile IDs, so I keep the low byte (fine as long as GIDs stay < 256)
            if ($i % 4 == 0) {
                $values[(string) $name][] = $charID;
            }
            $i++;
        }
    }
    return $values;
}

print_r(getLayer());
// or you could use getLayer('LayerName') to get a single layer!
?>
On my example 3x3 map, with only one tile image, I get the following:
Array
(
[floor] => Array
(
[0] => 1
[1] => 1
[2] => 1
[3] => 1
[4] => 1
[5] => 1
[6] => 1
[7] => 1
[8] => 1
)
[layer2] => Array
(
[0] => 0
[1] => 0
[2] => 1
[3] => 0
[4] => 1
[5] => 0
[6] => 1
[7] => 1
[8] => 0
)
)
Hopefully this function proves handy for anyone out there who needs it.
I am trying to decompress blocks of data which were compressed with zlib. The author notes that to decompress them I must use inflateInit and inflate with Z_SYNC_FLUSH. I am sure this must work, because it works in PHP this way:
$temp = substr($temp, 2, -4);
$temp[0] = chr(ord($temp[0]) | 1);
$temp = gzinflate($temp);
but I have tried many methods to decompress this in C++, and every time they fail.
Here is one of them :
#include <fstream>
#include <zlib.h>

// (assumed setup: "is" is the input stream the real code reads from)
std::ifstream is("blocks.bin", std::ios::binary);

char compressedblockbuffer[3371];
char uncompressedblockbuffer[8192];
is.read(compressedblockbuffer, 3371);
z_stream strm;
strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
strm.avail_in = 3371;
strm.next_in = (Bytef *)compressedblockbuffer;
strm.avail_out = 8192;
strm.next_out = (Bytef *)uncompressedblockbuffer;
inflateInit(&strm);
inflate(&strm, Z_SYNC_FLUSH);
inflateEnd(&strm);
It's not the full code, just an example to show the problem; that's why I hard-coded the already-known sizes.
I use the latest zlib release, so maybe something has changed in zlib's inflate since 2003-2004?
The result: uncompressedblockbuffer contains '\0' at indexes 2, 3, and 4 (and many others), so if I print it to the console I only see the first two characters.
If gzinflate() in PHP works on the data, then your code won't. gzinflate() expects raw deflate data. Your code is looking for zlib-wrapped deflate data. If you want to decode raw deflate data, you need to use inflateInit2(&strm, -15) instead.
Your call to inflate() is likely returning an error that you are not checking for. You need to always check the return codes of the zlib routines, or for that matter any function that has the potential to return an error.
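For reference, the format relationship can be demonstrated entirely in PHP (a sketch, separate from your C++ code): gzcompress() produces zlib-wrapped data, and gzinflate() only accepts it once the 2-byte header and 4-byte Adler-32 trailer are stripped, much like the snippet in the question does:
$zlib = gzcompress("hello");
var_dump(@gzinflate($zlib));               // bool(false): zlib wrapper still present
var_dump(gzinflate(substr($zlib, 2, -4))); // string(5) "hello": raw deflate decodes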
What kind of data are you decompressing? Many binary formats are perfectly accepting of NUL bytes in their data, since it just reads as a value of 0. For example, inside of image data in many formats, it'd just represent a value of 0 in either that channel or pixel (depending on data size). Not to mention, binary formats don't necessarily read as bytes. A NUL byte may actually be a part of a 2- or 4-byte value.
This is the problem with trying to read binary data as a character string. Binary data needn't follow the rules of text. This is why usually the data boundary is a separate size value, because it can't terminate on NUL values like text.
If you have the original uncompressed data for comparison, either load that data into memory and compare the data, or save the decompressed data to a file and use a diff tool to do a binary comparison of the files.
When I read the EXIF data from a raw file with exif_read_data() a lot of the data gets corrupted. Or so I think.
The file I'm trying to read is a DNG Raw file from a Pentax K-x camera.
Here is a demo: http://server.patrikelfstrom.se/exif/?file=_IGP6211.DNG
(I've added a standard JPEG from a Canon EOS 1000D as comparison)
I get no errors on this site, and it seems to include data that exif_read_data() doesn't return: http://regex.info/exif.cgi
And the corrupt data I'm talking about is: ...”¯/ѳf/ÇZ/íÔ.ƒ.9:./<ñ.TÛ¨.zâh!o†!™˜...
And: UndefinedTag:0xC65A
The server is running PHP version 5.5.3
Just because the data isn't human readable doesn't mean it's garbage.
Those values that you're seeing are private EXIF fields which are left up to the implementer to determine. They could be binary data, they could be text, they could be anything. This listing can help you determine what some of those values are.
For example, tag 0xC634 is DNGPrivateData which is data specifically for programs that deal with DNG files.
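If the goal is just to inspect the EXIF array without the binary noise, one approach (a sketch; it assumes the mbstring extension is available) is to hex-encode every string value that is not valid text:
$exif = exif_read_data($file); // $file = path to your DNG/JPEG
foreach ($exif as $key => $value) {
    // Private/maker-note tags are often raw binary; show them as hex instead.
    if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
        $exif[$key] = bin2hex($value);
    }
}
print_r($exif);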
You can map the undefined tags to what they most likely are using this file:
https://github.com/peterhudec/image-metadata-cruncher/blob/master/includes/exif-mapping.php
It looks like your script is dying on 0xc634 => 'SR2Private'.
Looking here http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Pentax.html it looks like it is used to store information about the flash on the camera? I don't know for sure, but it probably is not important information, and probably not meant to be viewed in text format.
I would probably just make a list of the keys it seems to die on, loop through the EXIF data, see if the key starts with UndefinedTag: and either rename it to the mapped name, or unset those items:
$bad_keys = array('0xc634', /* ...other corrupt-looking tags... */ '0xc723');
foreach ($exif as $key => $value) {
    if (strtolower(substr($key, 0, 13)) == 'undefinedtag:') {
        // use the file with the map of undefined tags:
        // either change the key, or unset it if it's one that seems to be corrupt
        if (in_array(strtolower(substr($key, 13)), $bad_keys)) {
            unset($exif[$key]);
        }
    }
}
I'm trying to help my dad out -- he gave me an export from a scheduling application at his work. We are trying to see if we can import it into a mysql database so he/co-workers can collaborate online with it.
I've tried a number of different methods but none seems to work right -- and this is not my area of specialty.
Export can be seen here: http://roikingon.com/export.txt
Any help / advice on how to go about parsing this would be greatly appreciated!
Thanks !!
I've made an attempt to write a (somewhat dynamic) fixed-width-column parser. Take a look: http://codepad.org/oAiKD0e7 (it's too long for SO, but it's mostly just "data").
What I've noticed
Text data is left-aligned with padding on the right, like "hello___" (_ = space)
Numerical data is right-aligned with padding on the left, like "___42"
If you want to use my code, there's still some work to do:
The record types 12.x have a variable column count (after some static columns); you'd have to implement another "handler" for them
Some of my widths are most probably wrong. I think there is a system (like numbers being 4 characters long and text 8 characters long, with some variations for special cases). Someone with domain knowledge and more than one sample file could figure out the columns.
Getting the raw data out is only the first step; you have to map it to some useful model and write that model to the database. A stripped-down sketch of the core fixed-width reading is below.
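The reader itself can stay very small. For example (the field names and widths below are invented; the real ones have to come from inspecting the export):
// Hypothetical column layout for one record type; widths are guesses.
$layout = array('type' => 5, 'id' => 10, 'flag' => 2, 'name' => 20);

function parseFixedWidth($line, $layout) {
    $row = array();
    $offset = 0;
    foreach ($layout as $field => $width) {
        // Text is left-aligned and numbers right-aligned, so trim both sides.
        $row[$field] = trim(substr($line, $offset, $width));
        $offset += $width;
    }
    return $row;
}
// e.g. call parseFixedWidth($line, $layout) for each line read with fgets()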
With that file structure you're basically in need of reverse engineering a proprietary format. Yes, it is space delimited but the format does not follow any kind of standard like CSV, YAML etc. It is completely proprietary with what seems to be a header and separate section with headers of their own.
I think your best bet is to try and see if there's some other type of export that can be done such as Excel or XML and working from there. If there isn't then see if there's an html output of some kind that can be screen scraped, and pasted into Excel and seeing what you get.
Due to everything I mentioned above it will be VERY difficult to massage the file in its current form into something that can be sensibly imported into a database. (Note that from the file structure a number of tables would be needed.)
You can use preg_split() with a regular expression (one or more spaces).
I gave it a try. There doesn't seem to be much structure to your data.
$data = "12.1 0 1144713 751 17 Y 8 517 526 537 542 550 556 561 567 17 ";
$arr = preg_split("/ +/", $data);
print_r($arr);
Array
(
[0] => 12.1
[1] => 0
[2] => 1144713
[3] => 751
[4] => 17
[5] => Y
[6] => 8
[7] => 517
[8] => 526
[9] => 537
[10] => 542
[11] => 550
[12] => 556
[13] => 561
[14] => 567
[15] => 17
[16] =>
)
Try preg_split("/ +/", $data), which splits the line on one or more spaces; then you will have a nice array that you can process. But looking at your data, there is no obvious structure, so you will have to know which array element corresponds to which piece of data.
Good luck.
Open it with Excel and save it as comma-delimited (CSV), treating consecutive delimiters as one, or not. That will give you a comma-separated file that is much easier to import into MySQL.
EDIT:
The answer that says to use preg_split() on "/ +/" is giving you essentially the same thing as I just did above.
The question is what to do after that, then.
Have you determined yet how many "row types" there are? Once you've determined that and defined their characteristics it will be a lot easier to write some code to go through it.
If you save it as CSV, you can use PHP's fgetcsv() function and related functions. For each row, you would check its type and perform operations depending on the type.
I noticed that your data rows could possibly be divided on whether or not the first column's data contains a "." so here's an example of how you might loop through the file.
while ($row = fgetcsv($file_handle)) {
    if (strpos($row[0], '.') === false) {
        // do something
    } else {
        // do something else
    }
}
"do something" would be something like "CREATE TABLE table_$row[0]" or "INSERT INTO table" etc.
OK, and here's another observation:
Your file is really like multiple files glued together; it contains multiple formats. Notice that the rows starting with "4" next have a four-letter company abbreviation followed by the full company name. One of them is "caco". If you search for "caco", you find it in multiple "tables" within the file.
I also notice "smuwtfa" (days of the week) sprinkled around.
Use clues like that to determine the logic of how to treat each row.
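Putting those clues together, a dispatch loop could look something like this (handleCompanyRow() and handleScheduleRow() are hypothetical handlers, and the record-type values are just the ones visible in the sample):
$handle = fopen('export.txt', 'r');
while (($line = fgets($handle)) !== false) {
    $fields = preg_split('/ +/', trim($line));
    switch ($fields[0]) {
        case '4':    // company rows: abbreviation plus full name
            handleCompanyRow($fields);
            break;
        case '12.1': // schedule rows with a variable column count
            handleScheduleRow($fields);
            break;
        default:
            // unknown record type: log it and move on
            break;
    }
}
fclose($handle);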
I need to parse a large XML file (>1 GB) located on an FTP server. I have an FTP connection acquired with ftp_connect(). (I use this connection for other FTP-related actions.)
I know XMLReader is preferred for large XML files, but it only accepts a URI, so I assume a stream wrapper will be required. And the only FTP function I know of that lets me retrieve just a small part of the file is ftp_nb_fget() in combination with ftp_nb_continue().
However, I do not know how I should put all of this together to make sure that a minimum amount of memory is used.
It looks like you may need to build on top of the low-level XML parser bits.
In particular, you can use xml_parse to process XML one chunk of the XML string at a time, after calling the various xml_set_* functions with callbacks to handle elements, character data, namespaces, entities, and so on. Those callbacks will be triggered whenever the parser detects that it has enough data to do so, which should mean that you can process the file as you read it in arbitrarily-sized chunks from the FTP site.
Proof of concept using CLI and xml_set_default_handler, which will get called for everything that doesn't have a specific handler:
php > $p = xml_parser_create('utf-8');
php > xml_set_default_handler($p, function() { print_r(func_get_args()); });
php > xml_parse($p, '<a');
php > xml_parse($p, '>');
php > xml_parse($p, 'Foo<b>Bar</b>Baz');
Array
(
[0] => Resource id #3
[1] => <a>
)
Array
(
[0] => Resource id #3
[1] => Foo
)
Array
(
[0] => Resource id #3
[1] => <b>
)
Array
(
[0] => Resource id #3
[1] => Bar
)
Array
(
[0] => Resource id #3
[1] => </b>
)
php > xml_parse($p, '</a>');
Array
(
[0] => Resource id #3
[1] => Baz
)
Array
(
[0] => Resource id #3
[1] => </a>
)
php >
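Wiring this up to FTP is then just a matter of feeding the parser chunks as they arrive. A minimal sketch using the ftp:// stream wrapper (URL and credentials are placeholders) rather than the ftp_nb_* functions:
$parser = xml_parser_create('UTF-8');
xml_set_default_handler($parser, function ($p, $data) {
    // react to tags / character data here
});

$fp = fopen('ftp://user:pass@example.com/huge.xml', 'rb');
while (!feof($fp)) {
    $chunk = fread($fp, 8192); // only 8 KB of the file is in memory at a time
    if (xml_parse($parser, $chunk, feof($fp)) === 0) {
        die(xml_error_string(xml_get_error_code($parser)));
    }
}
fclose($fp);
xml_parser_free($parser);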
This will depend on the schema of your XML file. But if it's similar to RSS in that it's really just a long list of items (all encapsulated in a wrapper tag), then what I've done is to split out the individual sections and parse them as individual DOMDocuments:
$buffer = '';
while ($line = getLineFromFtp()) {
    $buffer .= $line;
    if (strpos($line, '</item>') !== false) {
        parseBuffer($buffer);
        $buffer = '';
    }
}
That's pseudo code, but it's a lightweight way of handling a specific type of XML file without building your own XMLReader. You'd of course need to check for opening tags as well, to ensure that the buffer is always a valid XML fragment.
Note that this won't work with all XML types. But if it fits, it's an easy and clean way of doing it while keeping your memory footprint as low as possible... A possible parseBuffer() is sketched below.
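For completeness, parseBuffer() could be as simple as this (a sketch; it assumes each buffer holds exactly one complete <item> element):
function parseBuffer($buffer) {
    $doc = new DOMDocument();
    $doc->loadXML($buffer); // each fragment is treated as a tiny standalone document
    foreach ($doc->getElementsByTagName('title') as $title) {
        echo $title->textContent, "\n";
    }
}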
Hmm, I never tried that with FTP, but setting the Stream Context can be done with
libxml_set_streams_context — Set the streams context for the next libxml document load or write
Then just pass the FTP URI to XMLReader::open().
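Roughly like this (credentials, filename, and element name are placeholders):
$context = stream_context_create(array('ftp' => array('overwrite' => false)));
libxml_set_streams_context($context);

$reader = new XMLReader();
$reader->open('ftp://user:pass@example.com/huge.xml');
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'item') {
        // process one element at a time; memory use stays flat
    }
}
$reader->close();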
EDIT: Note that you can use the stream context for other actions as well. If you are uploading files, you can probably use the same stream context in combination with file_put_contents(), so you don't necessarily need any of the ftp_* functions at all.
Similar questions:
Some characters in CSV file are not read during PHP fgetcsv(),
fgetcsv() ignores special characters when they are at the beginning of line
My application has a form where users can upload a CSV file (its 5 internal users have always uploaded a valid file: comma-delimited, quoted, records ending with LF), and the file is then imported into a database using PHP:
$fhandle = fopen($uploaded_file, 'r');
while ($row = fgetcsv($fhandle, 0, ',', '"', '\\')) {
    print_r($row);
    // further code not relevant as the data is already corrupt at this point
}
For reasons I cannot change, the users are uploading the file encoded in the Windows-1250 charset - a single-byte, 8-bit character encoding.
The problem: some (not all!) characters beyond 127 ("extended ASCII") are dropped by fgetcsv(). Example data:
"15","Ústav"
"420","Špičák"
"7","Tmaň"
becomes
Array (
0 => 15
1 => "stav"
)
Array (
0 => 420
1 => "pičák"
)
Array (
0 => 7
1 => "Tma"
)
(Note that č is kept, but Ú is dropped)
The documentation for fgetcsv says that "since 4.3.5 fgetcsv() is now binary safe", but looks like it isn't. Am I doing something wrong, or is this function broken and I should look for a different way to parse CSV?
It turns out that I didn't read the documentation well enough - fgetcsv() is only somewhat binary-safe. It is safe for plain ASCII < 127, but the documentation also says:
Note:
Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
In other words, fgetcsv() tries to be binary-safe, but it's actually not (because it's also messing with the charset at the same time), and it will probably mangle the data it reads (as this setting is not configured in php.ini, but rather read from $LANG).
I've sidestepped the issue by reading the lines with fgets (which works on bytes, not characters) and using a CSV function from the comment in the docs to parse them into an array:
$fhandle = fopen($uploaded_file, 'r');
while ($raw_row = fgets($fhandle)) { // fgets() is actually binary safe
    $row = csvstring_to_array($raw_row, ',', '"', "\n");
    // $row is now read correctly
}
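An alternative that is sometimes suggested (untested here, and locale names vary between systems) is to switch the locale to a matching single-byte charset before calling fgetcsv(), since the function consults the locale when deciding how to treat bytes:
setlocale(LC_CTYPE, 'cs_CZ.CP1250'); // hypothetical locale name; check what your system provides
$fhandle = fopen($uploaded_file, 'r');
while ($row = fgetcsv($fhandle, 0, ',', '"', '\\')) {
    print_r($row); // bytes >= 128 should now survive
}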