HTML to csv file in php - php

I would like to read an HTML file and convert the table content inside it to a CSV file using PHP.
In a directory named HTML, I have a list of HTML files, say:
KMC_Doctors_list_A.html
KMC_Doctors_list_B.html
KMC_Doctors_list_C.html
....
KMC_Doctors_list_Z.html
I would like to read each of these HTML files and write its table content to a CSV file.
Can anyone help me with this?

Get ready for some reading: fputcsv will write rows in CSV format, but extracting the table data is the harder part. You need to understand regular expressions; preg_match and preg_match_all will be very useful in the process. There's no quick way to turn HTML into CSV.
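As an alternative to regular expressions, here is a minimal sketch that uses PHP's bundled DOM extension instead (a swapped-in technique, not what the answer above describes). The function names htmlTablesToRows and writeRowsToCsv are made up for illustration, and the sketch assumes the files sit in a directory called HTML, each holding simple tables without nested tables or colspan/rowspan handling.

```php
<?php
// Sketch: collect the text of every table row in an HTML string.
function htmlTablesToRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only direct <td>/<th> children of the row.
            if ($cell instanceof DOMElement && in_array($cell->tagName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Write the rows to a CSV file; fputcsv handles quoting and escaping.
function writeRowsToCsv(array $rows, string $csvPath): void
{
    $out = fopen($csvPath, 'w');
    foreach ($rows as $row) {
        fputcsv($out, $row, ',', '"', '\\');
    }
    fclose($out);
}

// Assumed layout: HTML/KMC_Doctors_list_A.html ... HTML/KMC_Doctors_list_Z.html
foreach (glob('HTML/KMC_Doctors_list_*.html') ?: [] as $htmlFile) {
    $rows = htmlTablesToRows(file_get_contents($htmlFile));
    writeRowsToCsv($rows, basename($htmlFile, '.html') . '.csv');
}
```

Each input file produces a CSV with the same base name in the current directory. If a file contains several tables, their rows end up in the same CSV one after another.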

Related

Extract XML from .prt file using PHP but file becomes unreadable when opened with PHP

I have a .prt (CAD Design File) that I need to extract some XML from using PHP. When I view this file directly in the browser, I can see the XML along with some unreadable areas. However, when I open it using PHP to get the XML I need from it, the file becomes mostly unreadable and the XML is nowhere to be found, as if the file were encrypted.
This is an example of what the .prt file looks like when opened directly in the browser: File in Browser
This is an example of what the file looks like when opened using PHP: Using PHP
This is how I am trying to open the file with PHP:
$handle = fopen("thePart.prt", "rb");
$contents = trim(stream_get_contents($handle));
fclose($handle);
//echo out contents to see what happens
echo $contents;
If I could get this file to open without doing what it is doing, I can get the XML out of it myself. How do I fix the issue that I am having? Thank you very much in advance.
Real Answer
Turns out that there was no problem at all with the code. The browser was just interpreting the XML tags as HTML and so the data was not displayed (PHP by default sets a content type of text/html). When viewing the source code, the XML was plain and visible. The XML can also be seen without viewing the source by setting the content type of the php file:
header('Content-Type: text/plain');
This way, the browser will just display the XML as it is, without attempting to parse it as HTML first.
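Putting the question's code and this header together, a minimal sketch could look like the following. The helper name readAsPlainText is hypothetical; it only sends the header when no output has gone out yet, and it leaves the echo to the caller.

```php
<?php
// Sketch: read a file and ask the browser to show it as plain text.
function readAsPlainText(string $path): string
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Headers must be sent before any output, so check first.
    if (!headers_sent()) {
        header('Content-Type: text/plain');
    }
    return $contents;
}
```

Calling echo readAsPlainText('thePart.prt'); at the top of the script would then show the XML tags literally instead of having the browser swallow them.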
Initial Answer
Just a guess here, but it might be that you're opening the file in binary mode (the "rb" in your first line of code). Try opening it as a plain text file (use "r" instead of "rb").
More likely, it's an encoding issue where PHP is trying to decode a UTF-8 file as ASCII, for instance. Since you are opening a binary file (CAD Design File is binary with a little XML, I'm assuming), PHP might be getting confused while trying to detect the encoding of the file. I would need a copy of the file to know for sure.
Try comparing the result of mb_detect_encoding:
mb_detect_encoding($contents)
and the actual encoding of the XML data within the .prt file. If they are different, that's how you know that PHP is using the wrong encoding. In that case, use mb_convert_encoding to convert from PHP's detected encoding to that of the XML data.

file_get_content and file_put_content to include php code? [duplicate]

I'm using the PHP function file_get_contents to parse a PHP file, but it seems that as soon as it reads the PHP tags, file_get_contents malfunctions.
I checked the function with a normal text file and it works perfectly. But if it finds PHP tags, even in a text file, the file is only half read. How can I get the full contents?
Is the file local? Or are you trying to get a remote file? How did you check that the content is not read? Echoing it to a browser might trick you because of the < char in <?php
Use htmlspecialchars or <pre> to view the whole text. Or just look at the source of the page.
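A minimal sketch of that suggestion, assuming the file is local. The function name readForDisplay is made up for illustration.

```php
<?php
// Sketch: read a file and escape it so a browser shows PHP tags literally.
function readForDisplay(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // htmlspecialchars turns < and > into &lt; and &gt;, so "<?php" is
    // displayed instead of being treated as an unknown tag and hidden.
    return '<pre>' . htmlspecialchars($raw) . '</pre>';
}
```

The string returned by file_get_contents is complete either way; only the way the browser renders it changes.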

How to convert Yaml to csv and vice versa

I have an application and I want to translate it.
I want to externalise the translation, and the people who translate my application want an Excel spreadsheet.
I have my messages in YAML format, but I don't know how to extract them into a CSV file or a spreadsheet.
The second step is to turn the translated CSV file back into a YAML file.
Do you have any ideas?
Help please
I guess you are trying to convert the YAML files containing translations. These files are basically key/value pairs stored like this:
button.ok.value: Ok
button.ok.tooltip: Commits the action
Of course YAML can be more complicated, but if you have something like this, just replace the ':' char with ',' then save the file as CSV (or change the extension), open it using Excel. Then you can save it as xls format or whatever format you want.
If your file uses hierarchical nodes like:
button:
  ok:
    value: Ok
    tooltip: Commits the action
Then you might want to write some script to iterate over the values (that's a tree traversal) and write the values to a file.
You should provide an example of what your YAML looks like, since it's a very flexible format.
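The tree traversal could be sketched roughly like this, assuming the YAML has already been parsed into a nested PHP array (for example by the PECL yaml extension's yaml_parse_file). The function names are made up for illustration, and the sketch assumes keys do not themselves contain dots.

```php
<?php
// Sketch: flatten a nested message tree into dotted keys,
// e.g. ['button' => ['ok' => ['value' => 'Ok']]] becomes ['button.ok.value' => 'Ok'].
function flattenMessages(array $tree, string $prefix = ''): array
{
    $flat = [];
    foreach ($tree as $key => $value) {
        $path = $prefix === '' ? (string) $key : $prefix . '.' . $key;
        if (is_array($value)) {
            $flat += flattenMessages($value, $path);
        } else {
            $flat[$path] = $value;
        }
    }
    return $flat;
}

// Reverse step: rebuild the nested tree from dotted keys.
function unflattenMessages(array $flat): array
{
    $tree = [];
    foreach ($flat as $path => $value) {
        $node = &$tree;
        foreach (explode('.', (string) $path) as $part) {
            if (!isset($node[$part]) || !is_array($node[$part])) {
                $node[$part] = [];
            }
            $node = &$node[$part];
        }
        $node = $value;
        unset($node); // drop the reference before the next key
    }
    return $tree;
}

// Write one "key,value" row per message.
function writeMessagesCsv(array $flat, string $csvPath): void
{
    $out = fopen($csvPath, 'w');
    foreach ($flat as $key => $value) {
        fputcsv($out, [$key, $value], ',', '"', '\\');
    }
    fclose($out);
}
```

For the way back, each CSV row read with fgetcsv gives a key and a translated value; collecting those into an array and passing it to unflattenMessages yields a tree that a YAML writer can dump.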
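$csv = fopen("out.csv", "w"); // fputcsv needs a file handle, not a file name
// Assumes in.yml is a flat "key: value" map; nested nodes must be flattened first.
foreach (yaml_parse_file("in.yml") as $key => $value) {
    fputcsv($csv, [$key, $value]); // one key,value row per message
}
fclose($csv);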
Requires http://pecl.php.net/package/yaml

Read the content of a PDF with PHP?

I need to read certain parts of a complex PDF. I searched the net and some say FPDF is good, but it can't read PDFs, it can only write them. Is there a library out there that can extract certain content from a given PDF?
If not, what's a good way to read certain parts of a given PDF?
Thanks!
I see two solutions here:
Convert your PDF file into something else first: text or HTML.
Use a library to read it. Bad news here: most of them are written in Java.
https://whatisprymas.wordpress.com/2010/04/28/lucene-how-to-index-pdf-files/
What about this?
http://www.phpclasses.org/package/702-PHP-Searches-pdf-documents-for-text.html
PS: I haven't tested this class, I just read the description.
$result = pdf2text ('sample.pdf');
echo "<pre>$result</pre>";
How to get “clean” text (source code of pdf2text):
http://webcheatsheet.com/php/reading_clean_text_from_pdf.php

How to save the content into the file with neat alignment?

I have a textbox into which I have loaded an XML file.
After editing the XML content and saving it back to the XML file, the content is not in the right format.
When I load it again, it is no longer laid out as XML.
How can I save the content into the file with neat alignment?
Please help me.
For example, I need to save it like the following:
<section>
  <value>a</value>
  <value>b</value>
</section>
But after saving, it looks like this:
<section><value>a</value><value>b</value></section>
Thanks,Praveen J
As Gordon says, your question makes no sense: the XML fragment is still "well-formed" (though it's far from complete), so it is in the right format.
Do you mean you want to preserve the format it was submitted in? In that case, output it using <pre>...</pre> tags. OTOH there are standard tools out there which will format XML according to specific standards, e.g. geshi.
C.
I think the issue is that it doesn't preserve whitespace, so opening the XML file later shows it all on a single line as opposed to spaced/tabbed as originally created.
You can try setting the CSS property white-space: pre on your textarea. Alternatively, you can try adding the attribute/value pair wrap="hard" to your textarea declaration. Both methods should preserve whitespace.
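Another option, not mentioned in the answers above, is to re-indent the XML on the server before writing it to the file. Here is a minimal sketch using PHP's DOM extension; the function name prettyPrintXml is made up for illustration, and it assumes the submitted text is well-formed XML.

```php
<?php
// Sketch: re-indent a well-formed XML string before saving it.
function prettyPrintXml(string $xml): string
{
    $doc = new DOMDocument();
    // Drop the existing whitespace-only nodes so formatOutput can re-indent.
    $doc->preserveWhiteSpace = false;
    $doc->formatOutput = true;
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    return $doc->saveXML();
}
```

The result can then be written with file_put_contents. Note that saveXML also puts an XML declaration line at the top of the output.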
