I want to use Drupal for building a Genealogy application. The difficulty, I see, is in allowing users to upload a gedcom file and for it to be parsed and then from that data, various Drupal nodes would be created. Nodes in Drupal are content items. So, I'd have individuals and families as content types and each would have various events as fields that would come from the gedcom file.
The gedcom file is a generic format for genealogy information. Different desktop applications take the data and convert it to a proprietary format. What I am not sure about how to accomplish is to give the end user a form to upload their gedcom file and then have it create nodes, aka content items, with Drupal. I can find open source code to parse a gedcom file, using python, or perl, or other languages. So, I could use one of these libraries to create from the gedcom file output in JSON, XML, CSV, etc.
Drupal is written in PHP and another detail of the challenge is that for the end user, I don't want to ask him/her to find that file created in step one (where step one is the parse and convert of the gedcom file) and upload that into Drupal. I'd like to somehow make this happen in one step as far as the end user sees. Somehow I would need a way to trigger Drupal to import the data after it is converted into JSON, or XML or CSV.
I am not personally familiar with Drupal. But I do know there already is a fairly advanced program written for Drupal called Family Tree that is a project at Drupal.org.
You might want to check that project out, or even better, contribute to it and improve it.
Disclaimer: I have no connection at all to the project, and I've never even tried the module. I just happen to be aware of it.
You'll need create a specific module (or search for an existent one) that gets the gedcom file and generates the nodes. It would be easy for the user (just one file upload).
There have been some genealogy websites built on Drupal. The best one I've seen is https://ancestry.sandes.uk/, which was announced at Functional Drupal 7 Family History Website in the Genealogy Discussion Group. I've launched a project to build a multi-user, multi-family-tree website on Drupal at www.greatalbum.net, based on Drupal7+OG. I discuss it here and here.
To be able to upload a GEDCOM file, you will need a GEDCOM module to do the work. I found this GEDCOM Sandbox Module based on D8, but it's still very early stages, and the code is currently missing. I've contacted the author, skybow, and he's going to restore the code. I will probably hire someone to develop the module for the GreatAlbum site, but it will be on D7 for now.
Related
This is just a speculative idea for a client who has a lot of PDF files.
Algolia say in their FAQs that to search PDF files you first need to extract the text from the file. How would you go about this?
The way I envisage the a system working would be:
Client uploads PDF via CMS
CMS calls some service / program to
extract the text
Algolia indexes the extracted and it's somehow
linked to the original PDF
It would need to be an automated system as the client shouldn't have to tell it to index.
It would be built in PHP, probably Laravel running on Ubuntu.
What software / service could do the text extraction from the PDFs and is any magic needed to 'link' this with the PDF file?
I'm also happy to have suggestions on other search services which may handle this.
Fortunately, text extraction from pdf's is a subject that has been covered multiple times. On the command line, you could use pdftotext (available on Linux or Mac) or in your code a library as Apache Tika (for which you can find a PHP wrapper).
To avoid having too much noise in your records, I'd recommend you to then split the text and create one record per paragraph. You can then use Algolia's distinct feature to deduplicate the results.
You should already have the links to your files somewhere, just store them in your records and then, in your front-end you'll easily be able to create links to them using for instance autocomplete.js or instantsearch.js .
For anyone still looking for a solution, I put together a GitHub repository that does exactly that: https://github.com/PDFTron/pdftron-document-search.
The text extraction happens client-side as the user uploads the document using React + Firebase + Algolia.
You can check out a quick video walking you through the sample app: https://youtu.be/IQATnzHTp7Q.
Let me know if you have any questions.
I've used a couple of days to think of a best practice to generate a PDF, which end users can customize the layout for themselves. The PDF output needs to be saved on the server or sent back to the PHP file so the PHP file can save it, and the PHP file needs to know that it went OK.
I thought the best way to do this was to use XML, XSLT and Apache Cocoon. But I'm not sure if this is possible or if it's a good idea since I can't find any information of people doing anything similar. It cannot be an uncommon problem.
The idea came when I read about Cocoon converting XML through XSLT to PDF:
http://cocoon.apache.org/2.1/howto/howto-html-pdf-publishing.html
and being able to take in variables:
http://old.nabble.com/how-to-access-post-parameters-from-sitemap-td31478752.html
This is what I had in mind:
A php file gets called by a user, the php file generates a source XML file with a specific name
The php file then makes a request to Cocoon (on the same web server) to apply the user defined XSLT on the XML file. A parameter will be needed here to know which XSLT to apply.
The request is handled by the PHP file and then saved as a PDF on the server, and can later be mailed away.
Will this work at all? Is there a better way to handle this?
The core problem is that the users need to be able to customize the layout on the PDFs themselves, and I need the server to save the PDF and to mail it later on. The users will use it for order confirmations, invoices, etc. And I wouldn't like to hard code the layout for each user.
I've had some good results in the past by setting up JasperReports Server and creating reports using iReport Designer. They're both available in F/OSS ("community") editions, though you can pay for support and value-adds if you need those things.
This was a good solution for us, since we could access it via the Java API for our Java system, and via SOAP for our PHP system. The GUI designer made tweaking reports very easy for non-technical business staff too.
I use webkithtml2pdf to generate my PDF:s. Just create a document with HTML and CSS for printing like you would usually do, the run it through the converter.
It works great for generating things like invoices. You can use SVG for logos and illustrations, and they will look great in print since they are vector based. Even rounded corners with dotted outlines works perfectly.
A minor gotcha is that the input html must have th htm or html file name suffix, so you can't use the default tempfile functions.
I'm familiar with HTML, CSS, and some PHP and Javascript. I've made several fairly complicated websites for which I've acted as webmaster, manually adding all content in HTML.
I'm about to put in a proposal for my first outside client at a larger business. They have an IT person that would be responsible for updating the website that I create for them.
My question is what to do about content management. I've looked into things like Drupal, but they seem overly complex for this kind of situation, with a single person adding updates of things like text, images, and PDFs.
What would you recommend as the next step above the simple way of manually uploading files and editing HTML like I'm used to? Something like a MySQL database and PHP calls? Would I then store all the images in the database as well?
I guess I'm just trying to figure out what's most common at a medium-sized business. I appreciate any guidance you can offer!
Nathaniel
My company has built large scale projects and medium scale as well. What we like to do is setup a outer page with navigation and an inside page that the client has control of by a control panel with FCK Editor or TinyMCE.
So essentially we have a wrapper page (in our case a MasterPage but in PHP you would use an include or a index.php with a query string to pull the content) and then we drop in HTML content from the database.
That database is populated by the client in their control panel. FCK Editor allows them to upload images and manage links, etc.
For our bigger clients we get very specific in our control panel allowing them to add videos, PDF attachments, blog entries, FAQ content, etc.
Some examples we have are http://pspwllc.com and http://needsontime.com and http://nwacasa.org
Drupal can be bit complex at first but if you stick with the basic modules - it is great for website content management.You can write your own mini content management system - store text and images(MySQL blob format) in MySQL.It will be couple of PHP admin pages and a good render() function responsible for page rendering.
Also have a look at wordpress, it is much easier than drupal. It is less powerful but it may serve your needs. You will NOT need to configure modules like FCKeditor, with it bcoz they come inbuilt. Anybody will be able to edit the content easily. Do note that wordpress is not just for blogs, you can create different kinds of websites with it. Another choice is Joomla, it is also simpler than drupal. But, wordpress is the simplest.
We currently use PDForm to grab a blank pdf file (no values, just form fields and text) and list the form fields. We then query our database for the values which match those field names and create a pdf file with the newly populated data which the user can download from our site. The thing is PDForm is about $5,000 per machine and we are migrating servers. We want an alternative which is actively supported and recommended by the community.
I know Zend is working on a PDF manipulation extension, but we need something quick. I have done testing with PDFtk but the last update for that project was in 2006 and it now seems dead. It would be fine as it is open source, however it seems to be causing errors with certain files that seem to be generated with PDFPenPro (our pdf form creator).
Another solution I thought up was why not just use iText and write a java wrapper which accepts command line input, so that PHP can call it with passthru() or exec(). There are other applications that will work should we completely rewrite our code but we do not want to do that.
What we need.
The ability for PHP to receive the PDF form field names.
PHP to then either create and FDF file (then merge it with the PDF) or send a string to a command line application which will populate the fields with values from our database.
The user can then download the newly created PDF file with the populated form fields.
Am I moving in the write direction by creating a java command line application that will use iText to parse and create the PDF files specified by PHP or does anyone know of any cost effective alternatives?
TCPDF seems to have the most robust feature set that I have seen so far.
Thanks, d2burke, for the tip on TCPDF. I'm not trying to do quite as much as the OP, but the software packages available to accomplish any kind of pdf generation are in the $2k to $3k range. TCPDF is php based, open source and the guy developing it is very supportive.
Always donate to these guys! Where in the world would web development be without it?
So since none of the above solutions would work since TCPDF doesn't work with forms the way we are wanting and since PDFlib converts the form fields to blocks we decided to create a command line wrapper for iText which will grab the form field names from the PDF and then populate them based on the database values.
I don't know if another product whose license costs range from $1k -> $3k could be considered "cost effective", but PDFlib work quite nicely. And if you don't need the PPS functionality, it does get cheaper.
Just today I've started using Drupal for a site I'm designing/developing. For my own site http://jwm-art.net I wrote a user-unfriendly CMS in PHP. My brief experience with Drupal is making me want to convert from the CMS I wrote. A CMS whose sole method (other than comments) of automatically publishing content is by logging in via SSH and using NANO to create a plain text file in a format like so*:
head<<END_HEAD
title = Audio
keywords= open,source,audio,sequencing,sampling,synthesis
descr = Music, noise, and audio, created by James W. Morris.
parent = home
END_HEAD
main<<END_MAIN
text<<END_TEXT
Digital music, noise, and audio made exclusively with
#=xlink=http://www.linux-sound.org#:Linux Audio Software#_=#.
END_TEXT
image=gfb#--#;Accompanying image for penonpaper-c#right
ilink=audio_2008
br=
ilink=audio_2007
br=
ilink=audio_2006
END_MAIN
info=text<<END_TEXT
I've been making PC based music since the early nineties -
fortunately most of it only exists as tape recordings.
END_TEXT
( http://jwm-art.net/dark.php?p=audio - There's just over 400 pages on there. )
*The jounal-entry form which takes some of the work out of it, has mysteriously broken. And it still required SSH access to copy the file to the main dat dir and to check I had actually remembered the format correctly and the code hadn't mis-formatted anything (which it always does).
I don't want to drop all the old content (just some), but how much work would be involved in converting it, factoring into account I've been using Drupal for a day, have not written any PHP for a couple of years, and have zero knowledge of SQL?
How would I map the abstraction in the text file above so that a user can select these elements in the page-publishing mechanism to create a page?
How might a team of developers tackle this? How do-able is it for one guy in his spare time?
You would parse the text with PHP and use the Drupal API to save it as a node object.
http://api.drupal.org/api/function/node_save
See this similar issue, programmatically creating Drupal nodes:
recipe for adding Drupal node records
Drupal 5: CCK fields in custom content type
Essentially, you create the $node object and assign values. node_save($node) will do the rest of the work for you, a Drupal function that creates the content record and lets other modules add data if need be.
You could also employ XML RPC services, if that's possible on your setup.
Since you have not written any PHP for a long time, and you are probably in a hurry, I suggest you this approach:
Download and install this Drupal module: http://drupal.org/project/node_import
This module imports data - nodes, users, taxonomy entries etc.- into Drupal from CVS files.
read its documentations and spend some time to learn how to use it.
Convert your blog into CVS files. unfortunately, I cannot help you much on this, because your blog entries have a complex structure. I think writing a code that converts it into CVS files takes same time as creating CVS files manually.
Use Node Import module to import data into your new website.
Of course some issues will remain that you have to do them manually; like creating menus etc.