i wrote something that i tend to call a "parser" (no idea if that's the correct term)
all it currently does is grab a local lua file that has a lot of similarity with a json and remove / replace all contents that keep it from being an actual json.
process is something like:
search through all lua files with glob() / file_get_contents()
remove/replace (preg_replace/str_replace) everything from the file that keeps it from being a json
use json_decode() on the final thing
grab all info that I need and add to database
display data
I used php for all these steps
These lua files can be retrieved from a guy that maintains it on github so I thought it would be a smart idea to just parse them right from the raw url.
Ideally I would need github to send me some kind of clue that the guy who maintains it sent an update and then I start parsing in order to automate that process.
Can this be done? Is there a better way to do this? Any pointers are welcome
Related
I have a huge list of URL's from a client which I need to run through so i can get content from the pages. This content is in different tags within the page.
I am looking to create an automated service to do this which I can leave running to complete.
I want the automated process to load each page and get the content from particular html tags, then process some this content to ensure the html is correct.
If possible I want to generate one XML or JSON file, but I can settle for an XML or JSON file per page.
What is the best way to do this, preferably something I can run off a mac or a linux server.
The list of URL's are to an external site.
Is there something I can already use or an example somewhere which will help me.
Thanks
This is a perfect application of BeautifulSoup, IMHO. Here is a tutorial on a similar process. It is certainly a headstart.
Scrapy is an excellent framework for spidering and scraping.
I think you'll find it will involve a little more learning overhead based on the Requests + Beautiful Soup or LXML tutorial mentioned by tim-cook in his answer. However if you're writing a lot of scraping / parsing logic it should guide toward a pretty well-factored (readiable, maintainable) codebase.
So, if it's a one-off run I'd go with Beautiful Soup + Requests. If it'll be re-used, extended and maintained over time then Scrapy would be my pick.
I have a python script running on a cron job, once a minute, which builds some data into a dictionary. I am writing that data into a simple, space separated, text file and reading that into PHP whenever the webpage is requested.
What is correct way to do pass this information? What I am doing now certainly feels clunky since I am rebuilding essentially the same structure in PHP. How do I just write my python dictionary (perhaps an order dictionary is needed) to disk so that PHP can read it directly into a PHP array?
I also worry a bit about whether this method of transfer is kinda archaic. Should I be using a database instead, e.g. SQLlite? I worry that someone will load the page as the file is being updated which might cause some problems.
I just need something light and simple. Don't want to overcomplicate it. Thanks in advance.
Any language can read JSON, as JSON is simply a way to format data as text, which can easily be parsed by any program. To write the dictionary as a json file, you do this:
import json
myfile = open('data.json','w+')
#create your dict
json.dump(mydict,myfile)
myfile.close()
now to read it in PHP:
$data = json_decode(file_get_contents("data.json"));
I don't know whether PHP handles json data, but if so, that's probably the way to go. See the json module documentation for more...
And in fact, PHP does handle json data: http://php.net/manual/en/book.json.php
This should be a relatively simple but robust way to transfer the data.
I am using a Javascript JFlot Graph to monitor a list of .rrd files I have on my server. I hard coded all the .rrd file href links into the JavaScript code of of the Graph in a drop down menu. I then realized it was bad practice to hard code them in.
Here comes the problem.
I want to write some code that each time I open the Graph, would check the server or the page, for all the .rrd files, extract them as href links, and place them inside 'something' so that the JavaScript 'drop down menu' for the Graph could read it.
I researched there was two solutions to do this, 1. Using client-side programing via JQuery and Ajax (I am not cross domain coding) 2. Server-Side via PHP and Json.
I figured Server-Side was the better option, for the understanding that it's easier and that the code wont need to download anything, regardless if it would only take a few seconds.
I just could not figure out the solution to this problem yet. I hope I explained my problem well and any advice or code is welcome, including any good practices to abide by or whether to use option 1 by extracting it from the directories on the page or 2 collecting the data from the server.
Thank You for your time and consideration I truly appreciate it.
As long as all the files are in one directory, you can generate the list pretty easily using PHP's glob function:
$files = glob($storage_dir . DIRECTORY_SEPARATOR . '*.rrd');
It returns an array where each entry is a file name. You can easily convert this to JSON and send it back using:
$response = json_encode($files);
This is a string (see below, after the dashed line) in a database.inf file for a free program I downloaded that lists some websites.
The file is plain text as you can see , but there is a string after it that looks base64 encoded (due to the end chars of ==). But b64_decoding it gives giberish.
I wanted to decode it so I could add to the list of sites it had (the program lists a bunch of sites and data about them which I can read in the GUI) and to do that I need to decode this, add to it, and re-encode it.
I think the program uses .net since I think the .net library was required on install, but I know nothing of the original source language.
I am using php to figure out if there is a simple way to read this. I have tried using unpack, binhex, base_convert, etc as I suspect the file is binary at some level, but I am lost.
Nothing illegal, just wanting to know what it is and if I can add a few things to it to make it more useful for me.
here is the file - any ideas how to decode and recode this for playing with?
Site List
file size: 62139
db version: 13
generated: 2010-04-27 11:53:40
eJztPWmT27iVnze/gjVVk56pXXXz1BW3XW2PPXaNr/iIa69SUSIkMaZIhaQstys/fgHwEIiDBEjQk6Q2lUraeuAD8PDwbgAPtkl6eBoZsX8At1enDKRXRvGfL350gj/9uEmBn4MVAqFGP14Z+f0R3FoP//CA+Rb9ddXj26OfZYJ+EeicpEHrt/5V/2/XA75FDTjzlf7WvlL/NgWXnlW/BQc/jPjzxaAfMUzU7+Xr7m+pj7cVZ/C63pa8Iewa/e+299fbMM3yuv8+fa8wCs5Cy/W9EmwLztfU51Eb2SKZoUe9v458gmq9+l4hFDyiS/W9Ei0Z52tO5/W8+VQ3aGwipvcjIRGUMPmnfN8XE40qCFKABSaFpgQg2ggGARtwB9D55SbM77lfIkCxGIIvsxw2460jBrRxwbfwyJdVEFB2eSERTaTstP4r2OQsG+RhHoEfL7+XOGy6d9yObKaKwE/zJo4eCFYNDD0QhJsIXHD0RHAZRV8EzF6WRQCXkU/ECnDBIUSwB37A8oEsgg3kuF2S3rOCtASwCNq1H4ECCYVCuTT4mRlDUwH2QNDUgT1HcFGDfUewIobQjaBVGdIYkMqoV0I8h2gIgqZK7DmCi1bsOYKVcB25CFp1I38djAY6wXqeokg8Enk8qEWSXg0eT4GHGFFPPMk5BmkHn8rgaVoOA+ZVStCaTgPoo2U8eBzD6bNdHUHci38Ycwi14Xk2F0C7aCpZh3Vv1BAQQ1BFUDDdAATk9PsjuBiW/WjQGIUqglPamACBAJpBH9+9JIwFsbkE27GaXhoBXkZiHKoIzmCdhTnH9VBEsKrHoIoAfd0gZB8i8pexAnQpGMhBySndsIKmAnDsJc6GXocJX1RBQJfJViwkgaEfAmIMqgjI0fcbwTo55cINLYOg0BsXPL1GQGrnnkQMQA65hvRWxQgoDAHINlxbBQHS8JiHSUz6nswQiHZXvRCUVGzi4SNpV98NDCoIstPh4BPeR8scWkdA4FFEkOVJo38JBBSGz+AehSWzKwZDBekwe8s5EHj6IVhdMHQioN3AJM5BnHPWocQtsSFXTSTqCPAc1klAhWLUEAyaAmpGzKIfghx87UuDQqYMQFBFNGoMiggu1JcmImP5VpGD8BKW4AcV2udQtV2FZCqAiwCOIQeHYwSBxotfbq/uChTGL362Xyd+GhhRsgtjOzutD2EOO7g+7o9XD//wbw+yI9iEfrRCmB5KffbgpvENxLGN/F328O/P/axo//e7U54U3/z9wU0Bhc0wDNk+D2+4aC/wS+MMpA+DZHM6QIa83oH8aQTQn9nj+9eQVj9BYtd5qZ//2/zfa0ymGhX6usaFsicduOrMC4sLf13j2oZxmO0f3tQzwCKYnEbZAlEYt0HzsvkfkA1g+xTswiyH/gKmVBbu4tOxaNiAXDAXQrpjanW08jI149b4oQzU/fCn/4H/6cRQKRkKB6knJDHhbUahqTaYJApob9IYahNUEgVUDjSKWl/88Kd6ZUoCXygeJDEoludwX466sZQ1nPokSpLPcKtb9UZ7f9psoEeGgi33xtsELm7QRFJ/ATGVDnXRcfmPolsSQjQkMTx8C2IDTb3Z510QoC65X5C8KMVjDSeTomsjlSizPGU+Uhc6ztYmEdWpVbmh6cS2paQXiahIHMlgiVqwRNJYSmpbAkRVGkkGVZHd4eNBCR4ZHDgrxUeB81IyOIr8FB9JkaJCO51mdCTpUSx2s/fjHXhom+Z8YjoT2zKsxdKyl5YHBT3R4A8PbioF/JBWxjdhHICvaKc+Ovo7cIsVBKt8uc10KFs+Xo62rTb+R6g3jZdkM0IkSCpm1GBV8qQ1TC8Tm43GxNfK9YRZZUz+u54VnKrx5pQ3W5NzbqhwipwNjc4o88s/rRrhc5AC4z45GRs/NooGRgzORmUVkEgsrjTmLWsFHGImNPOBPS0FKqJNK3l++FcebRaHxyPIh9kgNTK+QXOxRGRsAGohDlBCwv+XNwb+E9os1eIbe7iv1wBUjNFqEHB+t5vYwqwDj82RdLOJucCSbrq05kvPVJB0aGdf413EdzCoBkLpdhdFydl4/uHVS0LQ8aUbjbGPF1FERAc7EBLuA7Gzy6FDnXpN2JCPHjW2PyN9ur/hOBH4o+Kn1EdbBH8FZc4BHNYgvX0Nzv/+Cv85RHoMkhp15EZsj3du6ktUlyd0UERSWjIUGMpIyCe4w5Pzdf0Zcl6uwzgGKeJQUmKgf0t0IuVHcUSPijlOKuAhpuUlJT3MuMTjoTdaH1ten+2t046vckOsJsF5me/FvRQSWQa+BO0xB19JWcRYT1gLw7KgObw05wp6ohhndiMyhRtwCRuYrxooLH00w8dLdGmgFfu224ptCPpq8NU63ARJp5ynvxkizT8MkebvdQSKnoXsACrDbRuKzDMVGfmRjR2S27+ua+8e61ttgqTC9CSJoZl8GI7wGTK2Xw9XBC/9VjSyeuCpBhXwOEx2qX/cs4b7RVRKSjBvYi8M21ma9tK25SXYVRs2c2nPl46CPCzDgR9SP4CG2iY5CEPzlya9pSL+86afOMQHSLTYydIh9uZwOzx56PxAD+oLgF4PEeSETXcgMKCHIOPiW1WU/qbdEuc04ojkWnxr8Mt1uOVaQvhknq93EL9RwTDYYJfwHzQY1MMVhS4L+G64YkBNH8uhURGmlr10vKXlKpiD5+OkLWhAgnWEDJr4+shB2EyDGDzLRQpsPGI/OEA5eEyyfAIdd4nMITnLR/4Gbe2WIGH970omvit/MJ4lqfFhH2bGe+jfEB+ywlNiE6HxDwsvHjZwowRhDt0cc5iXrsPBLhyYwVJByTlW3IzmdOk6CpuxULjGT2s/MPb5IfqZsyOZNv+61ohena9f4Uhzg2UhT92ZLW1LnhtevH/6y+vrJN2xTFCDdIjkgp6TykjhMwSRrwlj44/G+yJ3clfnTYYkq/TUkGC9KS3W6VkX/nsp6itgPOGVGAiqRZoI+WUjjAR/gpnKuIuNu80mOcU5j7bD5H5lXw6zO1uqRiQjr63WJoFjoG4gTkMNc7oPYRBEYAXdlRxuMArZ9wveQn/z6Mc07SXI3Vq9IV9OMLwQ4JgmX8J401UJ0IkG2i/MOqiVNRz3SUwPQ6KQwf/KlFN8h6i0DneFI117mkn6wvUbNVRKSnaG0qa2Sji8zfnR5flUYptE2Fdf+hqrLv0VZYAV9O7Umit5rVj9a2gVpZ8OrYxYZVC/5/i0wMAITL6KwvgzOgvUP+OUrvApT95oWK0qwPb4/kXw01WIwtXICUueQtfMuvp5UOp0oPLlcmdPiVNwp6geTkHgFIhsHYh0yGR/aKIRSo0NLiUYppb9lZSKUgrmm0vbWbrTVvkbhKkB0d9enc/na0j5z0kcQoZNY5DX8XckH/jgBzel7EBc9PCVn34GeRjvDD8OjAxEEfo7ieH/AeMc5nvDN7KDH0VGkhr75ACM9SmDsAxa1RgBRLRLkl0EHk6ga1b+CX/0I/DVx78Vf1Fh6n2eH5c3N/wh3hQG/Ko04K+Rl94azx6EzVYTpNXlHT3laHFiqVVicUQIyy7uxLQxu7hLh/WJIYswnOIf/G9o5RgeIQFN7rgrIcxau+xSW3PPnBGrrexHE4tIjuimqPMpolXXm+1hQJkUa9KPVy3VyPmozq0jVbRPwfb2Bzib2pCA3z7CLFjWdK7uk1NaQTO51JFokFQ/sjtRFgdr9IiQ1LYQxvD7pqSInI1Ffq9S80WWaykIA2VnXlMllQZfXOOxCP1VCxIGjqQu7yGci0Vfh99Y8dwAUQK6rJV9HH5jZPSUldGeNTOtVhlNSGaRaGgMh3Jo2tPOvbCoq2fUUhzEIoq4ZYodMTZhKEqlCLzExbWAiYsopBENEl8ljuFByQpRq0U+kkWTAT+FLaqAMbT3GhuHBTc3z3sMv0SJqf3jcWwcZ+bNZ90mLdszFd1W0aPKyBR3DK8uqvdJB26FlBJ3D6rdu2tVqTLFf+Eu9vNTyqfIqNxcWSXNQipKC+REDRVPERQ1VlK87E3nU132Oj24MjX6KABZfnvJPP7DJkqFxrrSxIYVeMkd4WodnFSKtxVDnf6lZ6jrYBZzrFL1aFaADoKE6xOK36zIm9V0H9ZqdKTjiAcPr86SturiiGFJxuoEXHk5T3e6UX3REOrVMfLZ6pN2Kfv/FW/aPBRvYs9wScV8abGH5HjqqQyKTzZJmvoRo59YcK2gdFzTgfVZGac3nuA+GCXnsErOtheeaxNaToeiBpUtVoRJGVJw4PppUZmtxhvciUyEbu5NvXkHLSxMCw+Ro5MWMUhqC5QgQfNn/TN/DZK76sSyRFjSsuYzT+OsCb2Zhpu9yFCrYKOtvPEO9iBDAddzGk6/RgJ8gYLGF1GgBo5Ggr+gHmTEgOWac1RHqJUGh/tS5q1PYcQz2XkN9NPi1b1RycXHRT9SFHHMhauTK2r/FP4CNijJsUuioEETcRP9VPml6uJX2IVRY6p+ldk39tS1zREo5KbgmIZxzqUNCRxRabjGu6IfhhAmJ2o4s217JBG69UOhr1vBxhOhz2APDAVsDis4c9ObjkOBLAY5SlqKqEDCR+SI10UvUrLDMq2FTkMCzTZOcnD0g+vySG1FhMvPIxgSBW6ZKU89+B9X85RPx9M6gq4+nNw2acy6AdE/8Y/G2wI/USLbLQ8905nqXvUqDRAleS4OehHg8WTB46ITGW6YzSxzoZkS2xSAKqA7CcAxYWs5+E30U+QZ7OeiMlE/MkRxp/OF9i2Shrt9dT8RQw8aqJ8S71APFSmkbG3L9izdiuLo536cQ1syO0VFRUulGr748JcGTTra6ifRW9whMkFRj/DPzHiBi29wgRXHS+eEomeOO7VnI6lXHG0SKtcaOp5qvUNdyFgZlu0ubO2CxY8i6BsgsxLE4JQijcJSg99mBJpE0VOioxdYvUlQZr5wF7oZBHyrHRJGAVMw/YR4+l+17m3xSni6Z+o4C61uCcrAX2++3dSapRrOxTkRNhlPHSu5apY7s2Y6iUKIiA2I2GgnBRuPDE9gDzI+GnRVLVMnBa7a+xN3ZZmoK9vDZ6ndpSMXXa62G9bpcXIWCmyywXgyGyv+18mZIT2nxtW25wvt7lCtn04b1vChgeOx33vUhYwKt73ZwtYdYSvn+dlPD8IoYw0cjQi/oR6kAkbuwprqjpOgaxn/dgIZMlKuT58pY68BG8O4KzuQmf5s6lnaI+1lOGxSRlJZH4CFj+AGFJ1UcVYZYixmnjOSResHX+AgTqlQLjQajCcb7qpuZMwW6BU5nnYNbU2SYx4ewm/FxZCku9MkTEvDEQhkTd4QvUlEnzuPTggLOFom1nr9Pldnu4YJdba5NNkLolv48VAdIamP2lP82GgwHj/WR1nkMkLWCOGKKvcHcpSXF6YGL+ARs4NFJ1J7E7rf2oMWdRKQVyxAA8cjw19C2UoBbzGf6SZCqZW2JxCBgKECAx2tYuIZ7kKKDObMm+q244LT4RCCbHcKA5Cz0QcWrJ8QH/bwu6If41fUkQwxFtO5N9ettML4S5iFaygW16xRSwP1E+JF1cOrx3cyNHBm5szVLSfzvZ9nh/s43OxZK4YGjsEMsAfj1b3xGvUhRQUbunj6qVBVCp2TNGLlAwc+zsaovv+EupGSElNvrt3TgdMtJeImSWNONojXYByCEIVmsWw9heNYuhkEneGr8htMiJIG6qcEvo2/23jl88fMHKligCkr4cBGsyd+ffPyF5mAtePazlglEwdo2SXHJAo5J3l5LUa0uOt+pIooZtPFQmuMskqKnsG6FAwsSTjwETKm754+NT6BdSU4pOgxm831kwO/5wGCyy1arIXBNhjDysC9iOsLeEkec6rf9IS+FhRcqJA6Cr9A37Ccdp3A4ASUuj8YI8JU9Wq8xN1KyF2e2IF62TV10/Ds7w9ihmKg+qnz6e75q4kSK808z/S66r77yl8/Pol92gt0zEgH6kMmLWTPp+58rMx6GGcoTSJMTxDwEQvXXhS9tGwUHl2mtjt3RmKQjTj0sxk57POEE/HhZSqcETRxHd3yo4M49FUC+RSQfEaOCnRBlHJli97CstpnPZ2YHn4TxFracolLZF9wdASdqhE3G06JZr2WyjaYtO8AdWocQbqF3QuVBQc+fP5vC6RiY4PHDPbcnC/aC+DVpr+FIyZWuTF1DkzDssO2Brn2UpUSpgudM93LTs6NTgJQMD3srjBjG/rl3rRdB6rN+LN/hhPa+2WRLkpskIvNBw+f928l3qp4l5fp4B6AWkxnHcEa9RWvHG6opUAsdMdrqJLCk7CAfsWYZQoTPHPecYpDffKHe7E5TMF0TfzVvUy9GO/Qn23PdS7+8Logc7F00Du0UqT+GzR0w/yeECIsyQVtdJH+zwX6hqiV8kJcqGQcnSqG2Fq1ESEMAjVbfIdCPZYm3GIpx5yJhfGFSWboIkSLvYiWR5Mo8FEEND0USXD6HhgWrJ8aL3+5I/ooeKVinH6uvOWZC7GeJgllzXg3RnLzkX6aRGHsC7cSr8EIWcmyF6XjJ9bUk6LHdOnA/y7kNtN60xkQErQZYUM9ftLvoKJltVXPk5RBkkaOU8opx+DrSShhauB4wuU16kLGk7Es1xbXYJI0gBvGk9M9+LwNMt3F2p7fRJfmwVIEm/nCjcIlxXwxd3UavGQAIzkcw6guNuLGOJpNRqyNvnQk5/qYltgSarCIs/TkBMgO1dhUS4/Ef3JgmUTUSBeb/IrwX8xD2IPxCnYhQ5Pp3LQX7fHi/iZKNQGhhUI2GNFAqb6VqmOeOmJyXHjEW3rO0mZfJuKGz4tLoi8SgrZPeA108canArdSbMRFx5BGchiz2M/Fl2ZcoKOxw3vchSEZM+Pdm7bwxDZ9g0MgbeSkyD5Zr+87DRFhK1288hx10C/hb9mLxdTSHVYqJzspbkAE8Y53246okeaQw+Q9gf93ia0GAByFhggNHOFGDdiDkvvrebOZ3E5xob6Vk6U130+S7UVeMrUx4mYj3jRivNkq2WmOYzVvYGwhkLc0lZIwAeaGA+/EBLeFrs1SvryFK4deg7MwMv099osfJPgCvWr9W1y8tpYjaKKyu74RvelUcl85ztLp3lenz9fr1IcuzBpOoUEbBqKLTR4jrOjGhZ1avmpmubO5tSCPSHPuxmSGfUOvSvN2TJbZqucnoQqXOL9YC5jsGAYg3VCnF/lgQT4A3U143bnLMKInyScleWPPLdsy20vc1XdZvgenzxIJ0JZ2Q0iBqjU//tYv++k4ZmuMrR9BCmue5x83IIJJFxeaZjdtiaBPFR6pU3nzmWe77RuGGRxxhqjaOYUYl9s5tolOEXEekW/XWAnvjCML5l98zL+nhHfZhNes6Zd765W9NJcYEfssWXXK6nd5wrP9XZL2y4TJWcXgXCVuO98pa8fU/oiZyrWuJZks4VMWRq8nKQQof4dXAs9aHgjUgaX7kS2ll60slNCRiLxW70t9jpMz5KIdyKAxSR2Cb2nTlA6Py4a/VQ3f44ZydQPedDZvF5wt42hKzqGPTpVojIFP+BVIylro/pfko3umfchPrzks71Nvn0lcle9NzJlhLZauJVtqhp4P3KWodoI9JULBmvzwEgJ/LYBy94pMZ2YHD3D6pF955L3P0EoGp3ubVLgBfv01TU4x97JNGt4kB6qdM57iFv1MKNtcLOwOc1wwEuFl9TzKWCaijDNbmtLmODSjwm+UFcaC+MbE86KBlH01c8xGgYWSPcEO6FIR1PqM+IfE8IPA8GOj2tX3yck4nLIc9Uq8JqD+QCpu9l2fppcgxiP8rvhtYZJImSIcbMVra1mBb5PfHtNkGyKztvOV8WdhFBnQhTG2IYiCzFgD9EpGnhhVS8M34NgMRKP/QE8lwj8MqAv4T4+rGDz0avS0KzTdY6/nGnuogLZymNTeL8ZCwptDZ03eZ4c0PO8TvsN+gTXFxEe4uJ/27M1lPB9zbi08zjMrtXiwZeQDOZSbR2Fwe2f9k7yCKN7m7JwclX3Nfm7L7uMwRsIyNQKQQ2rUm3kPYmMThZvPxnv0SAu6p+6ev3+5zGfhwjZXJnO82QP/CB3ufZKh+you0WoMKH9NwZcQnOGCfAk3IMNPwAdIeJGcOhRRk62foI/wMbbnxZeqtoDpTht3vHAWf+iI6+W/3EFHvYZVx/snuDgDA7MsTGJJSwOvI8pIdIcvAHqa8pJfICufWVCT1k8x/BIA9nN/7cfs9cs8kYKumeqwudjui5+ouI4KSTx76XbfC4NyLMckiVA86QtcwKbfxoU2CfOphBjobsi3sK2Ur2YvPIvz+KycEcYdVfnQ0z/wm1WcZefPROVVKD6G+lWoYcEbuTd55PW9ZVjmEp1R6t6qB7gF/CrBgxK5IcWaogbUc9m4Ve0r3RXtpG4GXszdRUccVjSGm80unKwhNxbfXB/ZUIKYPlAloepACa86gUvHPj9a/yqIWJ+Pk64gPYFCppoRDn/Bf2Sx4XTXWNEQsHtwc4T6QylW2cAhGaWsDIF3lemPCq7yfQgNCUgIeYsBBcWK67bkEgto2ML8Pw1ssi2CKpUAefa0+W6SYA3obrtixmI29dylK0cHaz6fC+lAA5t0QFAxHXinR1zT7q1XeOMpUwVFOLCPasm/5vrSBRBZ/4wBMzXqVVJuyFO0ERlk9WYs0PR/fjD/WgVhBwQ8v+aDIq9oDDix1x1T70Cj5b1AiKd8t7DjOcOBb/DBflqyIj3zNfwN0DMswt8APUMbENkmidXwyXBO63PpJVRyrvskBtSTjlIfbv2vVz0+Qwqxz3eohNyP7zlrLPW5jlcbEZulNBLZwR/Y9IqiXWstzfnSlCuKq+RmmEV+LKy2JsACS67weIyf1n5gIKn7c+fVCxij1LGvKXTQzPZLsvtUeE0Ofp6GX5mS6gZk6HSDySuMS+qiuOl0Pm2/B6zH8eow9uMNmFTpxknksxekixo1LaBnRSujSowad3FgvPTPxrsydymXFlvYjt0W1ey0i0Sj/ad0u1snVHrfjx5J20JCVC1uuHIqWUcmWefjxZIPLEtiwy8h63gHuXgGufsRZDG6D/6uwBhu07p4pUzBfwrjIDlf1x+uk+D+OoxjkCLUSqNmu7GVumlO6V/q8WVfT9ZqLYdG6QFmdNBsaUkeVi1PXYYb4YHMAqRHvUNkMlX+C2vutFSx91Ts8b3Qw6dgVAI/vlc6VOzC0VtOd5yD6vQGWt3tfq6CfyoWVTKlXoWxSYtNpeqbwt5Ex+3V7M3yeWvBre8EmFtkofJytTdz3Mapq9LeuItQ5g7JLNLgaK9wJMZF1+nwIlbKirVd7SgsibN0F7LnUMEXkN4HvnjX8BpQGSmqhdx9g1OnsS4C0vN6V01BFbLSlKNHtkmSY/U0SpMSTVCTBu8R7M+yD55YrjdrXvAhmH6zy5tqAyhTwF3ai6Ut9zgDCj5P3JCdPwmgiuPevPlt4r74IBcgXkwtTtmT0o4kh0JHDwfuRa7Jtg3TLP/RKP6BaChjhHAxyZofG5+R7fAnsEvSe5kYRF/7nAk+qZnkjKWjZjpvWkKcCvIPcru3tLtLefAVQGeQsamr+tcmn/8Z/Sz18LU7h//t3t91P9+jOgHSxYHaWu4Ki3wPinOxRSEdTSAW3KQUOl1TnHs18I2+UifoZ47tSKSO2L6pkmmV/JGLLn2xJF/USsEhBOnlEBJtwPAaNOnytmiheHDc9UyZvCave/pATrvx2UKluaxF4Uw9oTFBwZq0gUCVU2kLy5pyMmyEHpEPKlHjujkkwQn9jeo90Uhu0bnU1d1mk5zQK16/71Ed1HzVP/vWOdfiKO6qPoErGXxSouEfkyMqycVTHxKTyvBzo6qGs0pEAl08qz8XpS8RhblBR8BDxFY9Ax8NdLbmCAiSSTN8SlnpPohjmgjvpKpgXJfzLQTKnbA2p5wjgn28zWo8OGKQnOOBdi2bkqIO3LTustO6Mj5XvVJrKG6ozb2doVsNJC9IIa929ONA9BikuBmVBFG9yXEOdZQpcYhWPIBKga+qEq5BbLD1/Q1vCfu7ORgjV/4o+TsYzaDUQoFhUEwMo1hHp5T1Y+jziyrcOlc80ZztEyg9henYGsqPjj1HYKnaV3QxokQQgukXiaQoxAXX1352fARlw61m0SRf2kI4qPRZ3MvCK62XN1tacqGSKs+2D3f7CL9YQHsE3Bb8k6bG86qNzOKhBzssicXjDgDfkDBpOgRDAtFf83LrGkbPzYtQ7IEfQJYaXKG0Gn5MFSJZ36NgbwdjyiDCQrVL5sqNCbOkEJOkwEaICnnNrJaSxEZ4tJZA1c3RAkKyHPRgxYZoU0ooV+asypv2VqLKHgUBM8e3RIoiDv8H/FOXyg==
In all likelihood they created this string in such a way that you couldn't change it.
This isn't that they haven't thought about whether or not they want the data to be changed, they have specifically sought to obfuscate it to make it harder to change, which suggests they don't want you to do it.
Given that you are using some else's code, you should carefully check what license covers your use of the code and whether it permits you to make the modification. Once you've done that, you should approach the originator of the code to ask them how to make the change, if you feel you are entitled to.
My guess is that you have a script that contains this string. Check if you have eval() function calling this string to be base64_decode (ed). Change the eval to print. Then, execute it, redirecting the output to a file for later reading.
kevin#server:~# php suspicious_script.php > out.php
You should be able to see what's going on.
Pse forgive what is most likely a stupid question. I've successfully managed to follow the simplehtmldom examples and get data that I want off one webpage.
I want to be able to set the function to go through all html pages in a directory and extract the data. I've googled and googled but now I'm confused as I had in my ignorant state thought I could (in some way) use PHP to form an array of the filenames in the directory but I'm struggling with this.
Also it seems that a lot of the examples I've seen are using curl. Please can someone tell me how it should be done. THere are a significant number of files. I've tried concatenating them but this only works with doing this through an html editor - using cat -> doesn't work.
You probably want to use glob('some/directory/*.html'); (manual page) to get a list of all the files as an array. Then iterate over that and use the DOM stuff for each filename.
You only need curl if you're pulling the HTML from another web server, if these are stored on your web server you want glob().
Assuming the parser you talk about is working ok, you should build a simple www-spider. Look at all the links in a webpage and build a list of "links-to-scan". And scan each of those pages...
You should take care of circular references though.