I've played around with pack() for longer than I'd like to admit, and accidentally stumbled across data structure alignment. Is there a "good" way to create C structures from within PHP and account for these extra padding bytes?
I know that you can use "x" to inject NULL bytes, but how do you determine where these go and how many to use programmatically?
Along the same lines, which of these methods is better for modeling a char foo[10] array?
$struct = pack('a9x', $string);
// vs.
$struct = pack('a10', substr($string, 0, 9));
I believe they both accomplish the same thing, but I'm not sure of any potential pitfalls.
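One way to check whether the two forms really agree is to compare their output for short, exact-length and over-long input. The sketch below uses Python's struct module as a stand-in for PHP's pack(), on the assumption that Python's "9s"/"10s" and "x" codes behave like PHP's 'a9'/'a10' and 'x' (NUL-pad short input, truncate long input). The helper names are made up for illustration.

```python
import struct

def pack_a9x(data: bytes) -> bytes:
    # Counterpart of pack('a9x', $string): 9 data bytes, then one explicit NUL.
    return struct.pack("9sx", data)

def pack_a10(data: bytes) -> bytes:
    # Counterpart of pack('a10', substr($string, 0, 9)): cut to 9 bytes, pad to 10.
    return struct.pack("10s", data[:9])

for sample in (b"", b"short", b"exactly9!", b"much longer than ten bytes"):
    # Both forms should yield the same 10 bytes, always ending in a NUL terminator.
    assert pack_a9x(sample) == pack_a10(sample)
    assert len(pack_a9x(sample)) == 10
    assert pack_a9x(sample).endswith(b"\x00")
```

Under that assumption the two forms are byte-for-byte identical, so the choice is mostly about which one states the intent (nine characters plus a terminator) more clearly.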
What you are trying to do is achievable, but highly dangerous. Let me explain:
PHP code is largely independent of the hardware/OS/C runtime it runs on
(Portable) C source code mostly is, too
The compilation/linking/loading process of a C binary definitely is not (especially if you leave the padding/packing issues to the compiler)
Now, what you are trying to do in PHP is a partial reimplementation of exactly this compilation/linking/loading process, but without a reliable mechanism to detect and adapt to the environment.
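To make the risk concrete, here is a minimal sketch of the padding rule a compiler commonly applies. It assumes "natural" alignment: each field starts at a multiple of its own alignment, and the struct size is rounded up to the largest alignment. Neither that rule nor the field sizes below are guaranteed for any particular compiler or platform, which is exactly the problem. The function and field names are hypothetical.

```python
def struct_layout(fields):
    """fields: list of (name, size, alignment). Returns (offsets, total_size)."""
    offset = 0
    largest_alignment = 1
    offsets = {}
    for name, size, alignment in fields:
        # Padding bytes needed so this field starts on a multiple of its alignment.
        offset += (-offset) % alignment
        offsets[name] = offset
        offset += size
        largest_alignment = max(largest_alignment, alignment)
    # Tail padding so that arrays of the struct keep every element aligned.
    offset += (-offset) % largest_alignment
    return offsets, offset

# Assumed layout for: struct { char c; int i; short s; } with a 4-byte int.
offsets, total_size = struct_layout([("c", 1, 1), ("i", 4, 4), ("s", 2, 2)])
print(offsets, total_size)
```

The gaps between consecutive offsets are where "x" bytes would have to go in a pack() format string. If the real compiler uses different sizes, alignments or packing pragmas, the numbers are silently wrong.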
I recommend one of two IMHO more reliable ways to do this:
Write a PHP extension (this is rather easy) that does the structure-dependent work for you and includes the headers from your C program. If you compile it on the target system (or one with an equivalent environment), you will not run into the troubles above.
Choose another, higher-level mechanism to communicate: we have successfully used SysV messages for a very similar requirement, but there are plenty of ways to go.
I am running Windows 7 - 64 bit, with the latest XAMPP version that has a 32-bit PHP version.
While testing http://php.net/manual/en/function.fseek.php#112647
on a very big file (bigger than PHP_INT_MAX, 2147483647), I became pretty sure that consecutive fseeks are summed up before being executed on the file pointer.
I have two questions:
Can this summing be prevented by reasonable means (or only with the workaround mentioned in the link above)?
Is this aggregation happening in PHP (as I assume, though I don't know where in PHP) or in Windows 7?
Answering myself: two workarounds using multiple seeks didn't work on my system. Instead, they put the file pointer at various positions below PHP_INT_MAX. (32-bit PHP can only seek up to PHP_INT_MAX + 8192. Reading on from there is still possible, but I don't know how far.)
Therefore the question is obsolete for my specific case, since 32-bit PHP can only seek up to PHP_INT_MAX + 8192, whatever you do. I'm leaving the question up because two people upvoted it and might be interested in a general answer.
I filed a bug report here:
https://bugs.php.net/bug.php?id=69213
Result: With a 64-bit PHP build it might work, but I didn't try it.
It doesn't. It actually does something even dumber. Here's a snippet from the PHP source code:
switch(whence) {
case SEEK_CUR:
offset = stream->position + offset;
whence = SEEK_SET;
break;
}
This is in the guts of the implementation for PHP's fseek. What's happening here is: if you tell PHP to seek from the current position, it translates that to an "equivalent" seek from the start of the file. This only works when that offset computation doesn't overflow; if it does, well, offset is a signed integer, so that's undefined behavior.
And, okay, this is there because PHP buffers streams internally, so they need to do something. But it doesn't have to be this.
You're probably best off trying to do your work in a language that actually does what you tell it to.
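To illustrate the overflow described above: the sketch below emulates a 32-bit signed addition with two's-complement wraparound. That is only an assumption about what a given build does. In C, signed overflow is undefined, so wrapping is just one commonly observed outcome. The function name is made up.

```python
INT32_MAX = 2**31 - 1

def add_wrapping_int32(a: int, b: int) -> int:
    # Keep the low 32 bits, then reinterpret them as a signed value.
    total = (a + b) & 0xFFFFFFFF
    return total - 2**32 if total > INT32_MAX else total

position = INT32_MAX   # stream already sits at the 32-bit limit
requested = 8192       # fseek($fp, 8192, SEEK_CUR)
absolute = add_wrapping_int32(position, requested)
print(absolute)        # the "equivalent" SEEK_SET offset is now negative
```

Under this assumption, a relative seek that is perfectly reasonable on its own becomes a negative absolute offset once it is rewritten as SEEK_SET.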
If aggregation were to happen it would likely have to be as an opcode optimization or would have to occur at the low level via a buffer.
I can answer at the low level. fseek() in PHP is implemented using PHP streams. It is declared in ext/standard/file.h and defined in the corresponding .c file. Its implementation calls php_stream_seek(), which calls through to _php_stream_seek() in streams.c. The low-level implementation of this is handled through the plain streams wrapper, in which case seek calls through to either zend_seek or zend_fseek, which in turn just map to the platform's 32-bit or 64-bit C seek calls (for example _seeki64).
So... if any aggregation happens, it would seem to have to be in the opcode optimizations or even further below in the OS or hardware. Hard disks implement out-of-order fetching to reduce head seek distances, and filesystem buffering systems might be able to eliminate seeks that have no side effects. If you are concerned about disk read time, the first automatically handles this. If you are concerned with perhaps thrashing memory (seeking great distances unnecessarily in the buffer), you might consider another approach. See: http://www.cs.iit.edu/~cs561/cs450/disksched/disksched.html for more info on how disks avoid wasting seek time.
I hope this helps.
I'm creating a web application that does some very heavy floating point arithmetic, and lots of it! I've been reading a lot and have read that you can write C (and C++) functions and call them from within PHP. Would I notice a speed increase by doing so?
I would like to do it this way even if it's only a second difference, unless it's actually slower.
It all depends on the actual number of calculations you are doing. If you have thousands of calculations to do, then it will certainly be worthwhile to write an extension to handle them for you. In particular, if you have a lot of data, this is where PHP really fails: its memory manager can't handle a lot of objects or large arrays (based on experience working with such data).
If the algorithm isn't too difficult you may wish to write it in PHP first anyway. This gives you a good reference speed but more importantly it'll help define exactly what API you need to implement in a module.
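A reference measurement can be as simple as the sketch below. It is written in Python only for illustration; the same pattern works in PHP with microtime(true). The arithmetic inside reference_calculation is a hypothetical placeholder, not the asker's actual formula.

```python
import math
import time

def reference_calculation(values):
    # Hypothetical stand-in for the real arithmetic: a few float operations per input.
    total = 0.0
    for v in values:
        total += math.sqrt(v * v + 1.0) / (v + 2.0)
    return total

def average_seconds_per_call(func, argument, repeats=1000):
    # Average over many calls so timer resolution does not dominate the result.
    start = time.perf_counter()
    for _ in range(repeats):
        func(argument)
    return (time.perf_counter() - start) / repeats

inputs = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
seconds_per_call = average_seconds_per_call(reference_calculation, inputs)
print(f"{seconds_per_call * 1e6:.2f} microseconds per call")
```

Multiplying the per-call time by the number of calls per page load gives a rough idea of whether an extension could save anything noticeable.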
Update to "75-100 calculations with 6 numbers".
If you are doing this only once per page load I'd suspect it won't be a significant part of the overall load time (depends what else you do of course). If you are calling this function many times then yes, even 75 ops might be slow -- however since you use only 6 variables perhaps their optimizer will do a good job (whereas with 100 variables it's pretty much guaranteed not to).
Check out SWIG.
SWIG is a way to build modules for PHP (and other languages) from your C sources rather easily.
Can anybody give me an introduction to programming efficiently in PHP, so that my program produces its results using as little memory as possible?
Based on how I read your question, I think you may be barking up the wrong tree with PHP. It was never designed for a low memory overhead.
If you just want to be as efficient as possible, then look at the other answers. Remember that every single variable costs a fair bit of memory, so use only what you have to, and let the garbage collector work. Make sure that you only declare variables in a local scope so they can get GC'd when the program leaves that scope. Objects will be more expensive than scalar variables. But the biggest common abuse I see is multiple copies of data. If you have a large array, operate directly on it rather than copying it (it may be less CPU efficient, but it should be more memory efficient).
If you are looking to run it in a low memory environment, I'd suggest finding a different language to use. PHP is nice because it manages everything for you (with respect to variables). But that type coercion and flexibility comes at a price (speed and memory usage). Each variable requires a lot of metadata stored with it. So a 4-byte int (32 bit) that would take 4 bytes to store in C will likely take more than 64 bytes in PHP (because of all of the "tracking" information associated with it, such as type, name, scoping information, etc.). That overhead is normally seen as OK, since PHP was not designed for large memory loads. So it's a trade-off: more memory used for easier programming. But if you have tight memory constraints, I'd suggest moving to a different language...
It's difficult to give advice with so little information on what you're trying to do and why memory utilization is a problem. In the common scenarios (web servers that serve many requests), memory is not a limiting factor and it's preferable to serve the requests as fast as possible, even if this means sacrificing memory for speed.
However, the following general guidelines apply:
Unset your variables as soon as you don't need them. In a well-written program this won't have a big impact, however, as variables going out of scope have the same effect.
In long-running scripts with lots of variables with circular references, and if using PHP 5.3, try calling the garbage collector explicitly at certain points.
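The circular-reference point can be illustrated with a language-neutral sketch. The example below uses Python's gc module as an analogue of PHP 5.3's cycle collector (gc_collect_cycles() on the PHP side). The Node class is hypothetical, and the two collectors are not identical in detail.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

# Two objects that reference each other: reference counting alone never frees them.
a, b = Node(), Node()
a.other, b.other = b, a

probe = weakref.ref(a)   # lets us observe whether the object is still alive
del a, b                 # like unset(): the names are gone, but the cycle remains

gc.collect()             # explicit cycle collection
collected = probe() is None
print(collected)
```

In a short web request this rarely matters. It becomes relevant in long-running scripts that keep building such cycles.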
First of all: Don't try to optimize memory usage by using references. PHP is smart enough not to copy the contents of a variable if you do something like this:
$array = array(1,2,3,4,5,);
$var = $array;
PHP will only copy the contents of the variable when you write to it. Using references all the time because you think they will save you from copying the variable content can often backfire ;)
But I think your question is hard to answer unless you are more precise.
For example, if you are working with files, it can be advisable not to always file_get_contents() the whole file, but to use the f(open|...) functions to load only small parts of the file at once, or even to skip whole chunks.
Or, if you are working with strings, use functions that return a string offset instead of the rest of a string (e.g. strcspn instead of strpbrk) when possible.
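The file advice above amounts to keeping only one small buffer in memory at a time. A minimal sketch of that pattern follows, written in Python for illustration; in PHP the same loop would use fopen(), fread() and fclose(). The helper names and the 4096-byte chunk size are arbitrary choices.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=8192):
    # Yield the file piece by piece; at most chunk_size bytes are held at once.
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk

def count_newlines(path, chunk_size=8192):
    return sum(chunk.count(b"\n") for chunk in read_in_chunks(path, chunk_size))

# Build a throwaway sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"line\n" * 10000)
    sample_path = tmp.name

newline_total = count_newlines(sample_path, chunk_size=4096)
os.remove(sample_path)
print(newline_total)
```

Peak memory then depends on the chunk size rather than on the file size.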
I have a data table with 600,000 records that is around 25 megabytes large. It is indexed by a 4 byte key.
Is there a way to find a row in such dataset quickly with PHP without resorting to MySQL?
The website in question is mostly static with minor PHP code and no database dependencies and therefore fast. I would like to add this data without having to use MySQL if possible.
In C++ I would memory map the file and do a binary search in it. Is there a way to do something similar in PHP?
PHP (at least 5.3) should already be optimized to use mmap if it's available and it is likely advantageous. Therefore, you can use the same strategy you say you would use with C++:
Open a stream with fopen
Move around for your binary search with fseek and fread
EDIT: actually, it seems to use mmap only in some other circumstances like file_get_contents. It shouldn't matter, but you can also try file_get_contents.
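The fopen/fseek/fread strategy can be sketched as follows, here in Python for illustration. The record layout is an assumption: fixed-size records sorted by key, each starting with a 4-byte big-endian unsigned key followed by a 12-byte payload. The real table will have its own layout, and the names are made up.

```python
import io
import struct

KEY_FORMAT = ">I"        # assumed: 4-byte big-endian unsigned key
KEY_SIZE = 4
PAYLOAD_SIZE = 12        # assumed fixed payload width
RECORD_SIZE = KEY_SIZE + PAYLOAD_SIZE

def find_record(handle, record_count, key):
    """Binary search over sorted fixed-size records using seek + read."""
    low, high = 0, record_count - 1
    while low <= high:
        middle = (low + high) // 2
        handle.seek(middle * RECORD_SIZE)
        record = handle.read(RECORD_SIZE)
        (middle_key,) = struct.unpack(KEY_FORMAT, record[:KEY_SIZE])
        if middle_key == key:
            return record[KEY_SIZE:]
        if middle_key < key:
            low = middle + 1
        else:
            high = middle - 1
    return None

# In-memory stand-in for the data file, with keys 0, 10, 20, ..., 990.
keys = range(0, 1000, 10)
table = io.BytesIO(b"".join(
    struct.pack(KEY_FORMAT, k) + f"row-{k}".encode().ljust(PAYLOAD_SIZE, b"\x00")
    for k in keys))

hit = find_record(table, len(keys), 430)
miss = find_record(table, len(keys), 435)
```

With 600,000 records this needs at most 20 seek/read pairs per lookup, and the operating system's file cache typically keeps the frequently touched blocks in memory.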
I would suggest memcachedb or something similar. If you are going to handle this entirely in PHP the script will have to read the entire file/datastruct for each request. It's not possible to do this in reasonable time dynamically.
In C++, would you stop and start the application each time a user wanted to view the file in a different way, therefore loading and unloading the file? Probably not, but that is how PHP differs from a long-running application and from application programming languages.
PHP has tools to help you deal with the environment teardown/buildup. These tools are the database and/or keyed caching utilities like memcache. Use the right tool for the right job.
I would like to implement Singular Value Decomposition (SVD) in PHP. I know that there are several external libraries which could do this for me. But I have two questions concerning PHP, though:
1) Do you think it's possible and/or reasonable to code the SVD in PHP?
2) If (1) is yes: Can you help me to code it in PHP?
I've already coded some parts of the SVD myself. Here's the code, in which I've commented on the course of action. Some parts of this code aren't completely correct.
It would be great if you could help me. Thank you very much in advance!
SVD-python
is a very clear, parsimonious implementation of the SVD.
It's practically pseudocode and should be fairly easy to understand
and compare/draw on for your PHP implementation, even if you don't know much Python.
That said, as others have mentioned, I wouldn't expect to be able to do very heavy-duty LSA with a PHP implementation on what sounds like a pretty limited web host.
Cheers
Edit:
The module above doesn't do anything all by itself, but there is an example included in the
opening comments. Assuming you downloaded the python module, and it was accessible (e.g. in the same folder), you
could implement a trivial example as follows:
#!/usr/bin/python
import svd
import math
a = [[22.,10., 2., 3., 7.],
[14., 7.,10., 0., 8.],
[-1.,13.,-1.,-11., 3.],
[-3.,-2.,13., -2., 4.],
[ 9., 8., 1., -2., 4.],
[ 9., 1.,-7., 5.,-1.],
[ 2.,-6., 6., 5., 1.],
[ 4., 5., 0., -2., 2.]]
u,w,vt = svd.svd(a)
print(w)
Here 'w' contains your list of singular values.
Of course this only gets you part of the way to latent semantic analysis and its relatives.
You usually want to reduce the number of singular values, then employ some appropriate distance
metric to measure the similarity between your documents, or words, or documents and words, etc.
The cosine of the angle between your resultant vectors is pretty popular.
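The cosine measure itself is small enough to sketch. This is a plain-Python version with no dependencies; the function name and the choice to return 0.0 for a zero-length vector are my own assumptions, not part of any library.

```python
import math

def cosine_similarity(u, v):
    # cos(angle) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Values near 1 mean the two reduced vectors point the same way, and values near 0 mean they are unrelated.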
Latent Semantic Mapping (pdf)
is by far the clearest, most concise and informative paper I've read on the remaining steps you
need to work out following the SVD.
Edit2: also note that if you're working with very large term-document matrices (I'm assuming this
is what you are doing) it is almost certainly going to be far more efficient to perform the decomposition
in an offline mode, and then perform only the comparisons in a live fashion in response to requests.
While svd-python is great for learning, svdlibc is more what you would want for such heavy
computation.
Finally, as mentioned in the Bellegarda paper above, remember that you don't have to recompute the
SVD every single time you get a new document or request. Depending on what you are trying to do, you could
probably get away with performing the SVD once every week or so, in an offline mode, on a local machine,
and then uploading the results (size/bandwidth concerns notwithstanding).
Anyway, good luck!
Be careful when you say "I don't care what the time limits are". SVD is an O(N^3) operation (or O(MN^2) if it's a rectangular M*N matrix), which means that you could very easily end up in a situation where your problem takes a very long time. If the 100*100 case takes one minute, the 1000*1000 case would take 10^3 minutes, or nearly 17 hours (and probably worse, realistically, as you're likely to be out of cache). With something like PHP, the prefactor (the number multiplying the N^3 in order to calculate the required FLOP count) could be very, very large.
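The scaling estimate above is easy to reproduce for other sizes. The sketch assumes pure N^3 growth and ignores cache effects, so treat the result as a lower bound; the function name is hypothetical.

```python
def scaled_minutes(base_minutes, base_n, new_n, exponent=3):
    # Running time grows with (new_n / base_n) ** exponent for an O(N^exponent) algorithm.
    return base_minutes * (new_n / base_n) ** exponent

minutes = scaled_minutes(1.0, 100, 1000)   # 100x100 takes one minute
hours = minutes / 60.0
print(f"{minutes:.0f} minutes, about {hours:.1f} hours")
```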
Having said that, of course it's possible to code it in PHP -- the language has the required data structures and operations.
I know this is an old Q, but here's my 2-bits:
1) A true SVD is much slower than the calculus-inspired approximations used, e.g., in the Netflix prize. See: http://www.sifter.org/~simon/journal/20061211.html
There's an implementation (in C) here:
http://www.timelydevelopment.com/demos/NetflixPrize.aspx
2) C would be faster but PHP can certainly do it.
PHP Architect author Cal Evans: "PHP is a web scripting language... [but] I’ve used PHP as a scripting language for writing the DOS equivalent of BATCH files or the Linux equivalent of shell scripts. I’ve found that most of what I need to do can be accomplished from within PHP. There is even a project to allow you to build desktop applications via PHP, the PHP-GTK project."
Regarding question 1: It definitely is possible. Whether it's reasonable depends on your scenario: How big are your matrices? How often do you intend to run the code? Is it run in a web site or from the command line?
If you do care about speed, I would suggest writing a simple extension that wraps calls to the GNU Scientific Library.
Yes, it's possible, but implementing SVD in PHP isn't the optimal approach. As you can see here, PHP is slower than C and also slower than C++, so it may be better to implement it in one of those languages and call the resulting program from PHP to get your results. You can find an implementation of the algorithm here, which you can use as a guide.
To call the external program, you can use:
The exec() Function
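The system function is quite useful and powerful, but one of the biggest problems with it is that all resulting text from the program goes directly to the output stream. There will be situations where you might like to format the resulting text and display it in some different way, or not display it at all. exec() covers these cases: it sends nothing to the output stream, returns the last line of the command's output, and can optionally collect every output line in an array.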
The system() Function
The system function in PHP takes a string argument with the command to execute as well as any arguments you wish passed to that command. This function executes the specified command, and dumps any resulting text to the output stream (either the HTTP output in a web server situation, or the console if you are running PHP as a command line tool). The return of this function is the last line of output from the program, if it emits text output.
The passthru() Function
One fascinating function that PHP provides similar to those we have seen so far is the passthru function. This function, like the others, executes the program you tell it to. However, it then proceeds to immediately send the raw output from this program to the output stream with which PHP is currently working (i.e. either HTTP in a web server scenario, or the shell in a command line version of PHP).
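The underlying pattern (run an external program, capture its text, and decide yourself what to display) looks much the same in any scripting language. The sketch below is Python's subprocess equivalent of the exec() style, not PHP code. The child command is a placeholder that stands in for a compiled C/C++ SVD program.

```python
import subprocess
import sys

# Placeholder child process; in practice this would be the compiled SVD binary.
child_command = [sys.executable, "-c", "print('line 1'); print('line 2')"]

# Capture the output instead of letting it go straight to our own output stream.
completed = subprocess.run(child_command, capture_output=True, text=True, check=True)

lines = completed.stdout.splitlines()
last_line = lines[-1] if lines else ""
print(last_line)
```

Capturing the output first gives you the chance to parse the numbers before anything is sent to the browser, which is the reason to prefer the exec() style over system() or passthru() here.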
Yes, it is perfectly possible to implement this in PHP.
I don't know what a reasonable time frame for execution would be or how large a matrix it can handle.
I would probably have to implement the algorithm to get a rough idea.
Yes I can help you code it. But why do you need help? Doesn't the code you wrote work?
Just as an aside: what version of PHP do you use?