How to import a large (custom) XML file into WordPress

I am new to WordPress and have tried a couple of plugins to import an XML file. The file is quite large and I am unable to import it. Any tutorials or suggestions?
Edit: the format of the XML file is shown below.
<articles>
<article id="1240xxxx" timestamp="April 27, 2009, 8:26 am" published="1">
<title>Theme Parks in Tenerife</title>
<pageName>theme-parks-in-tenerife-408</pageName>
<imageFile>blogthemeparkstenerife.jpg</imageFile>
<imageAlt>Theme Parks in Tenerife</imageAlt>
<content>
<p>Anyone taking a
holiday in Tenerife.
</p>
</content>
<summary>Theme Parks in Tenerife offer an alternative to the traditional beach holiday, providing entertainment for families.</summary>
<tags>
<tag>tenerife</tag>
<tag>holidays</tag>
<tag>parks</tag>
<tag>beaches</tag>
<tag>island</tag>
</tags>
</article>

So the markup is incompatible with the WordPress import function. That leaves two other options:
Transform the XML into WordPress-compatible XML using XSLT.
Import the XML directly into the MySQL database. This might require transforming the XML to SQL with XSLT.
Both require some understanding of WordPress internals. In the first case you will need to learn the WordPress export markup; in the second case you will need to learn the database schema for WordPress posts (and tags and categories). In both cases you need to learn some XSLT, but that is a really valuable tool anyway.
A 'last resort' option would be to parse the XML and script the submission of the articles. The viability of this option depends on the ways you can automate article submission in WordPress. I know there is a way to submit articles by email, but I don't know how well that supports tags and categories.
These might not be the prettiest options, and you might be stuck anyway. But this is the least I could do.
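The "parse the XML and script the submission" idea can be sketched as follows. This is only a minimal sketch in Python (not XSLT or PHP), assuming the file is well-formed (the excerpt in the question lacks the closing </articles> tag) and uses the element names shown in the question. The function names iter_articles and iter_articles_from_string and the dictionary keys are made up for illustration. It reads the file incrementally, so a large file does not have to fit in memory at once.

```python
import io
import xml.etree.ElementTree as ET


def iter_articles(source):
    """Yield one dict per <article>; source is a file path or a binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "article":
            continue
        content = elem.find("content")
        body = ""
        if content is not None:
            # Keep the inner markup of <content> (the <p> elements) as an HTML string.
            body = (content.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in content
            )
        yield {
            "id": elem.get("id"),
            "published": elem.get("published") == "1",
            "title": elem.findtext("title", default=""),
            "slug": elem.findtext("pageName", default=""),
            "summary": elem.findtext("summary", default=""),
            "content": body.strip(),
            "tags": [t.text for t in elem.iter("tag") if t.text],
        }
        # Drop the children of the article that was just processed to save memory.
        elem.clear()


def iter_articles_from_string(xml_text):
    """Convenience wrapper for trying the parser on a small XML string."""
    return iter_articles(io.BytesIO(xml_text.encode("utf-8")))
```

Each yielded dictionary could then be handed to whatever submission mechanism is available, or written out as WordPress export markup or SQL.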

You can parse the XML into a different format using a scripting language. I would use jQuery for familiarity (a good guide is here: http://net.tutsplus.com/tutorials/javascript-ajax/use-jquery-to-retrieve-data-from-an-xml-file/).
You could then use that method to create an HTML document that this import plugin can use: http://wordpress.org/extend/plugins/import-html-pages/
Or you could render the XML into a format that one of the CSV importers can read.
If the file is massive your browser or script might struggle, but it should be fine.
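The CSV route could look roughly like this. It is a sketch only, written in Python rather than jQuery so that it can run outside the browser. It assumes the element names from the question; the column names and the function names articles_to_csv and xml_text_to_csv_text are hypothetical and would need to match whatever the chosen CSV importer expects.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column names; rename them to whatever the chosen CSV importer expects.
COLUMNS = ["title", "slug", "summary", "content", "tags"]


def articles_to_csv(xml_source, csv_file):
    """Stream <article> elements from xml_source and write one CSV row per article."""
    writer = csv.writer(csv_file)
    writer.writerow(COLUMNS)
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "article":
            continue
        content = elem.find("content")
        body = ""
        if content is not None:
            # Serialise the child elements (<p> and so on) back into an HTML string.
            body = (content.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in content
            )
        tags = ",".join(t.text for t in elem.iter("tag") if t.text)
        writer.writerow([
            elem.findtext("title", default=""),
            elem.findtext("pageName", default=""),
            elem.findtext("summary", default=""),
            body.strip(),
            tags,
        ])
        count += 1
        elem.clear()  # release the article that was just written
    return count


def xml_text_to_csv_text(xml_text):
    """Convenience wrapper for small inputs: XML string in, CSV string out."""
    out = io.StringIO()
    articles_to_csv(io.BytesIO(xml_text.encode("utf-8")), out)
    return out.getvalue()
```

For a real file you would pass open("articles.xml", "rb") and open("articles.csv", "w", newline="", encoding="utf-8") to articles_to_csv; both file names are placeholders.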

Use the WP All Import plugin for WordPress.
WP All Import v3 was just released and it supports huge files (100 MB and up).
Free version: http://wordpress.org/extend/plugins/wp-all-import/ (good enough unless you need to import to Custom Post Types or download images)
Paid version: http://www.wpallimport.com/

Related

Online user guide plus PDF download

At the moment I maintain my user guide in Microsoft Word 2003 and convert it to a PDF file, which can be downloaded from the website and is also included by the product installer.
I would like to move to a mechanism that achieves the following:
Generates PDF file with clickable TOC and front page
Generates HTML5 compliant output per chapter/section but without HTML skeleton
Generates JSON TOC for user guide (chapter/section outline)
I would like to package the PDF file with the distributed product.
I would like to create some simple PHP scripts that generate HTML pages with a context sensitive TOC (showing sections of current chapter) plus showing the relevant documentation.
I have no issues with developing the PHP scripts to achieve this, but I would like to know how I can generate the above outputs. I would preferably like to type documentation using an off-the-shelf GUI. I am happy to write XSLT2 stylesheets to perform any necessary conversions.
To give people an idea of what I am after:
Current PDF manual: http://rotorz.com/tilesystem/user-guide.pdf
API documentation which is generated using custom XSLT2 stylesheets into a bunch of "incomplete" HTML files, with a JSON TOC which is then brought together by PHP: http://rotorz.com/tilesystem/api
As you navigate through my API documentation you will notice that the TOC on the left is context sensitive. I would like my user guide to work in a similar way.
Is there a free alternative to Prince: http://www.princexml.com/ for paged media CSS?

After spending quite some time reading into lots of variations I have come across a potential solution...
Create a very simple "static" CMS using PHP and http://aloha-editor.org for my WYSIWYG editor. Possibly using https://github.com/chillitom/CefSharp to embed the editor straight into a more relevant GUI.
Convert the HTML5 pages into PDF using "wkhtmltoxdoc" with custom cover, header and footer .html files. It also generates a TOC page automatically.
"wkhtmltoxdoc" also generates an XML TOC which can easily be converted to JSON.
I am still experimenting with "wkhtmltoxdoc" but it seems pretty good! Unless of course there is an even easier solution...
ADDED:
It seems that my TOC file will need to be a mixture of manually written and automatically generated. Something along the lines of the Eclipse TOC schema will suffice where a simple XSLT stylesheet can automatically fill in the blanks by grabbing H1-6 tags plus adding unique identifiers for hash links.
This TOC can thus be consumed by XSLT2 stylesheets and then finally converted to JSON for consumption by PHP scripts.
Mock-up extract for my existing documentation:
<?xml version="1.0" encoding="UTF-8"?>
<toc>
<topic label="Introduction" href="introduction.html"/>
<topic label="Getting Started">
<topic label="Installation" href="getting-started/installation.html"/>
<topic label="User Interface" href="getting-started/ui/index.html">
<topic label="Menu Commands" href="getting-started/ui/menu-commands.html"/>
<topic label="Tile System Panel" href="getting-started/ui/tile-system-panel.html"/>
<topic label="Brush Designer" href="getting-started/ui/brush-designer.html"/>
</topic>
<topic label="User Preferences" href="getting-started/user-preferences.html"/>
</topic>
<topic label="Creating a Tile System" href="creating-a-tile-system">
<!-- ... -->
</topic>
</toc>
Reference to Eclipse documentation:
http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.platform.doc.isv%2Freference%2Fextension-points%2Forg_eclipse_help_toc.html
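The final "convert the TOC to JSON" step for the mock-up above could be sketched like this. It uses Python instead of XSLT2, purely as an illustration. The JSON shape (the keys "label", "href" and "topics") and the function names topic_to_dict and toc_to_json are assumptions, not part of the Eclipse schema.

```python
import json
import xml.etree.ElementTree as ET


def topic_to_dict(topic):
    """Convert one <topic> element, and its nested topics, into a plain dict."""
    node = {"label": topic.get("label")}
    if topic.get("href") is not None:
        node["href"] = topic.get("href")
    children = [topic_to_dict(child) for child in topic.findall("topic")]
    if children:
        node["topics"] = children
    return node


def toc_to_json(xml_text):
    """Return the <toc> document as a JSON string holding a list of top-level topics."""
    root = ET.fromstring(xml_text)
    return json.dumps([topic_to_dict(t) for t in root.findall("topic")], indent=2)
```

A PHP script could then load the result with json_decode and walk the nested "topics" arrays to render the context-sensitive TOC.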

After a lot of research and experimentation I have decided to use DITA (Darwin Information Typing Architecture). For me the nicest thing about DITA is that it is topic based which makes the documentation modular and reusable.
The DITA schema is relatively simple and good XML editors provide useful insight into the available elements and attributes.
DITA documents can be combined for a given purpose using DITA maps. For example, one might choose to distribute a "Quick Start Guide" which contains a minimal amount of information, whilst a full-blown "User Guide" contains far more detail. The beauty is that the same information can be reused for both documents; plus the documents can be output to a number of delivery formats:
XHTML (single file or chunked)
PDF
Docbook
The process of transforming the output into the delivery format is easily handled using the DITA Open Toolkit (aka DITA-OT). This toolkit is available from: http://dita-ot.sourceforge.net which is installed simply by extracting the provided archive. The toolkit can be accessed easily by running startcmd.bat (on Windows) or startcmd.sh (Unix-like systems).
Customising and branding PDF output is not an easy task. Customizing XHTML output is significantly easier but still requires knowledge of XSL transforms. Customisations can be made by creating a plugin and placing it within the plugins folder of DITA-OT. One thing that I would like to emphasise is that once customisations have been made you must invoke ant -f integrator.xml before changes will become apparent. Lack of this knowledge caused me a lot of confusion!
The generated XHTML files are very simple (which is great!) because this makes them easy to customise. Adding the HTML5 DOCTYPE is not so easy though, but for my purposes this really doesn't matter, seeing as my PHP scripts only care about what's inside <body>.
At first I could not find any good WYSIWYG editors, but XML Mind seems to be a really good WYSIWYG editor that is also really easy to use. I suspect that it wouldn't be too hard to create a basic web-based solution using something like the Aloha Editor (http://aloha-editor.org).
Whilst it seems rather difficult to customise the PDF output, it seems quite easy to generate all documentation into a single XHTML page which can then be formatted using CSS, and then finally converted using wkhtmltopdf. I haven't decided on my solution yet, but at least this is a viable option for those who are unable (or don't have the time to) customise the XSL:FO stylesheets of DITA-OT.
ADDED: After some searching I found another open source alternative to DITA-OT called "Ditac", which seems a lot easier to use and produces far nicer output. The tool is made by the creators of "XML Mind". Whilst the tool is command line based, those who use "XML Mind" can benefit from a feature-rich GUI:
http://www.xmlmind.com/ditac/
Note: I left my previous answer because it may be of use to others.

Importing XML to WordPress

My client is using WordPress as a CMS but wants to deliver their posts (in their case, properties) through an XML feed.
Is it possible to get the info from that XML feed and import it into WordPress as posts?
This is what my feed looks like: http://vrds.nl/test.xml
Hoping for help!

There are quite a few plugins that do this. Perhaps one of them fits what you want to do.
http://codex.wordpress.org/Importing_Content

My company made a plugin that can do exactly this.
It is called WP All Import - http://wordpress.org/extend/plugins/wp-all-import/ - it can import XML in any format, download images (something you'll probably need if you are importing properties), and works with any theme.

WordPress plugin: XML files or database tables, or both?

OK guys, so I am 50% of the way through creating a "content manager" plugin for WordPress (mainly for the internal benefit of the company I work for) that can create custom post types, taxonomies and meta boxes with a pretty interface.
At the moment I'm using XML files created through PHP to parse and hold the data relating to "post types", "taxonomies" and "meta boxes". The main reason I went down the XML road was so I could allow users to export to an XML file and import it on another WordPress install. Simple.
Now I'm not so sure, though. Is it too server-heavy to have the plugin recursing through directories every time to initialise the post types, taxonomies and meta boxes? Would I be better served by creating three database tables and simply generating the XML from there when I need to import or export?
Would love to hear your opinions!

I would go with the database solution. As the XML file grows in size, the parsing will take more and more time, because the whole file is read every time.
In a database, you can select only the values you need and don't need to parse the whole document every time.
Also, implementing an XML import/export from the values stored in the database shouldn't be much of a problem.
But if you have very tiny XML files (less than 100 characters, say) and they don't grow much, you'll have to decide whether it's worth the time to change to a database.

How would the conversion of a custom CMS using a text-file-based database to Drupal be tackled?

Just today I've started using Drupal for a site I'm designing/developing. For my own site http://jwm-art.net I wrote a user-unfriendly CMS in PHP. My brief experience with Drupal is making me want to convert from the CMS I wrote. Its sole method (other than comments) of publishing content is logging in via SSH and using nano to create a plain text file in a format like so*:
head<<END_HEAD
title = Audio
keywords= open,source,audio,sequencing,sampling,synthesis
descr = Music, noise, and audio, created by James W. Morris.
parent = home
END_HEAD
main<<END_MAIN
text<<END_TEXT
Digital music, noise, and audio made exclusively with
#=xlink=http://www.linux-sound.org#:Linux Audio Software#_=#.
END_TEXT
image=gfb#--#;Accompanying image for penonpaper-c#right
ilink=audio_2008
br=
ilink=audio_2007
br=
ilink=audio_2006
END_MAIN
info=text<<END_TEXT
I've been making PC based music since the early nineties -
fortunately most of it only exists as tape recordings.
END_TEXT
( http://jwm-art.net/dark.php?p=audio - There's just over 400 pages on there. )
*The journal-entry form, which takes some of the work out of it, has mysteriously broken. And it still required SSH access to copy the file to the main dat dir and to check that I had actually remembered the format correctly and that the code hadn't mis-formatted anything (which it always does).
I don't want to drop all the old content (just some), but how much work would be involved in converting it, factoring into account I've been using Drupal for a day, have not written any PHP for a couple of years, and have zero knowledge of SQL?
How would I map the abstraction in the text file above so that a user can select these elements in the page-publishing mechanism to create a page?
How might a team of developers tackle this? How do-able is it for one guy in his spare time?

You would parse the text with PHP and use the Drupal API to save it as a node object.
http://api.drupal.org/api/function/node_save
See this similar issue, programmatically creating Drupal nodes:
recipe for adding Drupal node records
Drupal 5: CCK fields in custom content type
Essentially, you create the $node object and assign values. node_save($node) will do the rest of the work for you, a Drupal function that creates the content record and lets other modules add data if need be.
You could also employ XML RPC services, if that's possible on your setup.
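The parsing step could be sketched roughly as below. It is written in Python only to illustrate the logic (the same approach carries over to PHP) and it assumes the format is exactly what the sample in the question shows: name<<MARKER opens a block, a line consisting of MARKER closes it, "text" blocks hold raw text, and every other line is key=value. The function names parse_items and parse_page are made up for this example.

```python
import re
from collections import deque

# Matches "head<<END_HEAD" and "info=text<<END_TEXT"; the optional prefix names the item.
BLOCK_START = re.compile(r"^(?:(\w+)\s*=\s*)?(\w+)<<(\w+)$")


def parse_items(lines, end_marker=None):
    """Consume lines from the deque until end_marker; return ordered (name, value) pairs."""
    items = []
    while lines:
        line = lines.popleft().strip()
        if line == end_marker:
            break
        if not line:
            continue
        match = BLOCK_START.match(line)
        if match:
            label, kind, marker = match.groups()
            if kind == "text":
                # Text blocks are kept verbatim, including the inline #=...# markup.
                raw = []
                while lines and lines[0].strip() != marker:
                    raw.append(lines.popleft())
                if lines:
                    lines.popleft()  # discard the closing marker
                value = "\n".join(raw)
            else:
                # Any other block (head, main) contains further items.
                value = parse_items(lines, marker)
            items.append((label or kind, value))
        elif "=" in line:
            key, _, value = line.partition("=")
            items.append((key.strip(), value.strip()))
    return items


def parse_page(text):
    """Parse one page file into a nested list of (name, value) pairs."""
    return parse_items(deque(text.splitlines()))
```

The head items would map naturally onto node properties such as the title, while the main items would still need converting from the custom inline markup to HTML before being assigned to the node body.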

Since you have not written any PHP for a long time, and you are probably in a hurry, I suggest this approach:
Download and install this Drupal module: http://drupal.org/project/node_import
This module imports data (nodes, users, taxonomy entries etc.) into Drupal from CSV files.
Read its documentation and spend some time learning how to use it.
Convert your blog into CSV files. Unfortunately, I cannot help you much with this, because your blog entries have a complex structure. I think writing code that converts them into CSV files would take about as long as creating the CSV files manually.
Use the Node Import module to import the data into your new website.
Of course, some things will remain that you have to do manually, like creating menus.

Drupal Views: Generate xml file

Is there a Views plugin that I can use to generate an XML file? I would like something that lets me choose which fields go into the XML and how they appear (as a tag or as an attribute of the parent tag).
For example: I have a content type Picture that has three fields: title, size and dimensions. I would like to create a view that could generate something like this:
<pictures>
<picture size="1000" dimensions="10x10">
<title>
title
</title>
</picture>
<picture size="1000" dimensions="10x10">
<title>
title
</title>
</picture>
...
</pictures>
If nothing is already implemented, what should I implement? I thought about implementing a display plugin, a style plugin, a row plugin and a field handler. Am I wrong?
I would rather not do it with templates because I can't think of a way to make it reusable with templates.

A custom style plugin is definitely capable of doing this; I whipped one up to output Atom feeds instead of RSS. You might find a bit of luck starting with the Views Bonus Pack or Views Datasource. Both attempt to provide XML and other output formats for Views data, though the latter was a Google Summer of Code project and hasn't been updated recently. Definitely a potential starting point, though.
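Whatever plugin does the rendering, its core job is the field-to-tag or field-to-attribute mapping from the question. Here is a minimal sketch of that mapping, written in Python outside Drupal and not using the Views API at all. The function name rows_to_xml and its parameters are made up for illustration.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, root_tag, row_tag, attribute_fields):
    """Render rows (dicts of field -> value) as XML.

    Fields listed in attribute_fields become attributes of the row element;
    every other field becomes a child element.
    """
    root = ET.Element(root_tag)
    for row in rows:
        attrs = {k: str(v) for k, v in row.items() if k in attribute_fields}
        item = ET.SubElement(root, row_tag, attrs)
        for key, value in row.items():
            if key not in attribute_fields:
                ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Calling rows_to_xml(rows, "pictures", "picture", {"size", "dimensions"}) with one dict per Picture node gives the structure from the question, with title as a child tag. In a style plugin the set of attribute fields would come from the plugin's options form.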

You might want to look at implementing another theme for XML or using the Services module. Some details about it (from its project page):
A standardized solution for building API's so that external clients can communicate with Drupal. Out of the box it aims to support anything Drupal Core supports and provides a code level API for other modules to expose their features and functionality. It provide Drupal plugins that allow others to create their own authentication mechanisms, request formats, and response formats.
Also see:
http://cmsproducer.com/generate-how-to-drupal-node-XML-XHTML

In Drupal 8 the Services module is now part of core (RESTful Web Services). This will allow you to provide any entity in xml or json. Also with views.
Read more here: https://drupalize.me/blog/201401/introduction-restful-web-services-drupal-8

There is a somewhat old description of this process on the Drupal forums. It references Drupal 4.7 and 5.x. I suspect the steps for 5.x would be same technique if not same code for Drupal 6.

If you use Drupal 7 or a later version, you can use the Views Data Export module to export as XML, XLS and other formats:
https://www.drupal.org/project/views_data_export
