Just an update to a recent post (Office 2007 vs. OpenOffice vs. Novell OpenOffice). I just installed Office 2007 SP1. Previously, Sun's converter didn't support Office 2007; it turns out this was because you could save as ODF, but not open an ODF document in Word 2007. Now you can open an OpenOffice document in Word 2007. I tried this myself using the same test document as before. I opened the document in Word, saved it as an OpenOffice document, closed Word, and opened the document in OpenOffice to check the formatting. I then closed OpenOffice and reopened the document in Word. The formatting was retained throughout. Very cool.
Sun's official stance on the download page is still "support for Microsoft Office 2007 is planned for one of the next releases." But it looks like you can start using it now with more than just a "hello world" test document.
When I save as .odt from Word, Word sometimes crashes. It's intermittent, and I'm always able to save on the second attempt. I just started using this converter, so if I come across any recurring or other issues, or find out about any more updates, I'll post them.
Friday, December 14, 2007
Monday, December 10, 2007
ODF - Hello World with XQuery
This is the "How to create the simple hello-world document for OpenOffice using XQuery" example. I'm using MarkLogic Server for this of course. If you're interested you can download a free copy using a Community License here. I'm also using OpenOffice 2.3.1, available here.
An OpenOffice document is just a zip file. It's actually a .jar file, but we don't care about that right now. We can unzip it and extract the separate XML parts, just how we would with any zip file. In fact, in Windows, you can just change the extension of your OpenOffice document from .odt to .zip, right-click, select "Extract All", then take a look at the files in the folder.
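If you'd rather peek inside the package from a script than from Windows Explorer, here's a minimal sketch using Python's standard zipfile module. It isn't part of the MarkLogic example; the function names list_parts and read_part are my own, and the demo builds a throwaway package in memory rather than assuming a real .odt on disk.

```python
import io
import zipfile


def list_parts(odt_source):
    """Return the names of the parts stored in an .odt package.

    odt_source may be a file path or a seekable file-like object.
    """
    with zipfile.ZipFile(odt_source) as package:
        return package.namelist()


def read_part(odt_source, part_name):
    """Return one part of the package decoded as UTF-8 text."""
    with zipfile.ZipFile(odt_source) as package:
        return package.read(part_name).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical stand-in for a real document: a tiny package built in memory.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", "<placeholder/>")
    print(list_parts(buffer))
    print(read_part(buffer, "content.xml"))
```

To try it on a real document, pass a path such as list_parts("hello-world.odt") instead of the in-memory buffer.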
Similarly, we can create an OpenOffice file by creating the required parts and zipping them up into a package. When we say an ODF or OpenOffice document, we're usually referring to the collection of XML documents that make up the .odt you use with OpenOffice. ODF stands for OpenDocument Format, and is the name given to the XML that OpenOffice uses to create your documents.
This should all sound very familiar. It's very similar to what I've posted on Office Open XML and the .docx file in Word: we often say an Office 2007 / Word document, but mean the collection of XML files that make up the .docx.
The minimal .odt document has just 2 parts: content.xml and META-INF/manifest.xml.
We place the main text and body of our document in content.xml, and list the assorted files that compose the document in manifest.xml. I'll explore the other files in future posts. They have to do with styling your document, meta-information about your document (created by, created date, etc.), images, and so on. Since we aren't using any other files in this example, this document will have zero formatting and no meta-information associated with it.
Ok, place the following in a file named openODF.xqy under /Docs of your MarkLogic install. You can then evaluate it by opening your browser and navigating to http://localhost:8000/openODF.xqy. Your test document will open directly in OpenOffice Writer. You can then mess with the XQuery and XML to create other types of documents. Good times!
Note: To keep the code readable I had to split a couple of nodes across lines. I was able to cut-and-paste this into a .xqy and evaluate it with no problems, but I mention it in case you run into any issues.
(: Zip the manifest and the content part into a single .odt package. :)
define function generate-odt(
  $docmanifest as node(),
  $content as node()
) as binary()
{
  (: The zip manifest names each part, in the same order as $parts. :)
  let $manifest :=
    <parts xmlns="xdmp:zip">
      <part>META-INF/manifest.xml</part>
      <part>content.xml</part>
    </parts>
  let $parts := ($docmanifest, $content)
  return
    xdmp:zip-create($manifest, $parts)
}

(: META-INF/manifest.xml: lists the files that make up the document. :)
let $docmanifest :=
  <manifest:manifest
      xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
    <manifest:file-entry
        manifest:media-type="application/vnd.oasis.opendocument.text"
        manifest:full-path="/"/>
    <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="content.xml"/>
  </manifest:manifest>

(: content.xml: the text and body of the document. :)
let $content :=
  <office:document-content
      xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
      xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
      office:version="1.1">
    <office:body>
      <office:text>
        <text:p text:style-name="Standard">
          <text:s text:c="5"/>Hello World! This is my first paragraph.</text:p>
        <text:p text:style-name="Standard"/>
        <text:p text:style-name="Standard">
          <text:s text:c="5"/>This is another paragraph.</text:p>
      </office:text>
    </office:body>
  </office:document-content>

let $package := generate-odt($docmanifest, $content)
let $filename := "hello-world.odt"
let $disposition := concat("attachment; filename=""", $filename, """")

(: Send the package back as a download that opens in OpenOffice Writer. :)
let $x := xdmp:add-response-header("Content-Disposition", $disposition)
let $x := xdmp:set-response-content-type("application/vnd.oasis.opendocument.text")
return
  $package
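If you don't have MarkLogic handy, the same two-part package can be sketched with Python's standard zipfile module. This is not the MarkLogic approach above, just a rough equivalent for experimenting: generate_odt, MANIFEST, and CONTENT are names I chose to mirror the XQuery, and I've only included the same two parts the XQuery example uses.

```python
import io
import zipfile

# Same manifest as the XQuery example: the package itself plus content.xml.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
  <manifest:file-entry
      manifest:media-type="application/vnd.oasis.opendocument.text"
      manifest:full-path="/"/>
  <manifest:file-entry manifest:media-type="text/xml"
      manifest:full-path="content.xml"/>
</manifest:manifest>
"""

# Same body as the XQuery example: two paragraphs separated by an empty one.
CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.1">
  <office:body>
    <office:text>
      <text:p text:style-name="Standard"><text:s text:c="5"/>Hello World! This is my first paragraph.</text:p>
      <text:p text:style-name="Standard"/>
      <text:p text:style-name="Standard"><text:s text:c="5"/>This is another paragraph.</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def generate_odt(manifest_xml, content_xml):
    """Zip the two parts into an in-memory package and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("META-INF/manifest.xml", manifest_xml)
        package.writestr("content.xml", content_xml)
    return buffer.getvalue()


if __name__ == "__main__":
    # Write the package to disk under the same filename the XQuery uses.
    with open("hello-world.odt", "wb") as out:
        out.write(generate_odt(MANIFEST, CONTENT))
```

Running the script writes hello-world.odt to the current directory instead of streaming it to the browser.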
For those interested in the ODF format, there's a free book, OpenDocument Essentials, as well as the ODF specification.
The content.xml in a nutshell: <office:document-content> is our root element.
Its first children are optional and can be <office:scripts>, <office:font-face-decls>, and <office:automatic-styles>.
We don't see those here; we'll examine them more in the future. The only required element is <office:body>, and this is where the magic happens. Its first child element tells us what type of document we're actually dealing with; we have the choice of:
<office:text>
<office:drawing>
<office:presentation>
<office:spreadsheet>
<office:chart>
<office:image>
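Since the first child of <office:body> identifies the document type, a script can check it directly. Here's a small sketch using Python's standard ElementTree; the function name document_kind and the sample string are mine, purely for illustration.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hypothetical stand-in for the content.xml part of a real package.
SAMPLE_CONTENT = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:p>Hello World!</text:p>'
    '</office:text></office:body>'
    '</office:document-content>'
)


def document_kind(content_xml):
    """Return the local name of the first child of <office:body>."""
    root = ET.fromstring(content_xml)
    body = root.find(f"{{{OFFICE_NS}}}body")
    if body is None or len(body) == 0:
        raise ValueError("content.xml has no populated <office:body>")
    # ElementTree writes tags as "{namespace}local"; keep only the local part.
    return body[0].tag.split("}", 1)[1]


if __name__ == "__main__":
    print(document_kind(SAMPLE_CONTENT))  # prints "text"
```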
We're dealing with text. From there we see its child element <text:p>, which signifies a paragraph. Now, the only thing funky above is the use of <text:s>, which signifies whitespace; its text:c attribute gives the number of spaces. There are a couple of pages on how to handle whitespace in the Essentials book. When you opened the document, you might not have noticed, but each paragraph was indented 5 spaces. You can safely remove the <text:s> node for the example above.
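To see what <text:s> does to the text, here's a rough Python sketch that flattens a paragraph to plain text and expands each <text:s> into the number of spaces given by text:c. The function name paragraph_text is my own, and treating a missing text:c as a single space is my reading of the ODF specification, so double-check it there.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The first paragraph from the example above, as a standalone fragment.
SAMPLE_PARAGRAPH = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' text:style-name="Standard">'
    '<text:s text:c="5"/>Hello World! This is my first paragraph.'
    '</text:p>'
)


def paragraph_text(element):
    """Return the plain text of an element, expanding <text:s> into spaces."""
    pieces = [element.text or ""]
    for child in element:
        if child.tag == f"{{{TEXT_NS}}}s":
            # Assumption: a missing text:c attribute means one space.
            count = int(child.get(f"{{{TEXT_NS}}}c", "1"))
            pieces.append(" " * count)
        else:
            # Recurse so nested inline elements keep their text.
            pieces.append(paragraph_text(child))
        # Text that follows the child inside the same parent.
        pieces.append(child.tail or "")
    return "".join(pieces)


if __name__ == "__main__":
    paragraph = ET.fromstring(SAMPLE_PARAGRAPH)
    print(repr(paragraph_text(paragraph)))
```

The result starts with five literal spaces, which is the indent you see when the document opens in Writer.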
Ok, so it's a little more than just a HelloWorld example, but we're not really interested in a one-paragraph, one-word document. For more fun, we can start extracting OpenOffice documents and inserting the pieces into our XML Server. It's all just XML at the end of the day, and I actually think it's fun to dissect these formats and then transform them into whatever I want. So with ODF and Office Open XML documents in my server, I can write queries to find what I'm looking for and then deliver the content in any requested format. Sweet!
Sunday, December 9, 2007
Office 2007 vs. OpenOffice vs. Novell OpenOffice
Ok, first things first. I work for the Mighty MarkLogic in San Carlos. I'm currently doing a series of blog posts on OOXML and MarkLogic Server for our developer network. Today's post came about because my sole focus for the last few months has been OOXML, but I've been a big fan of OpenOffice for quite some time. I thought I'd repeat some of the examples I'm doing for OOXML with ODF. I have no allegiance to any office productivity application; my allegiance, if to anything, is to XML and XQuery.
Now that you know this: The opinions and posts on this blog are mine and mine alone and don't necessarily reflect the opinions of MarkLogic Corp. If they like it, I'll let 'em claim it, but if not, they were out of town when I wrote this, capiche?
The following covers my experience testing OpenOffice 2.3.1, Novell's OpenOffice 2.3, Novell's OpenOffice.OpenXML translator 1.0.0-2, and the Sun ODF Converter 1.1 for Microsoft Office to see how easily they interoperate with a .docx file and a .odt file.
- Office 2007 Professional out-of-the-box will not open a .odt document, nor save as .odt.
But we can save as .odt if we install and use Sun's ODF Converter.
- OpenOffice 2.3.1 out-of-the-box will not open a .docx file, nor save as a .docx.
- Novell's 2.3 OpenOffice out-of-the-box will not open a .docx file, nor save as a .docx.
But we can save as .docx and open from a .docx if we install their translator.
Novell has an OpenXML translator available that allows their version of OpenOffice to open and save .docx files. Their version of OpenOffice comes as an ISO, so to install it you'll need Daemon Tools or to burn it to disc. Once installed, you can add the converter by opening OpenOffice and going to Tools -> Extensions. Click 'Add' in the dialog box and add the add-in by browsing to its location. Click OK. Restart OO and you'll now have the option to open and save as .docx. Ok, not too simple for an ordinary user, but it works. I think you'd need to be kind of a geek to even know that Novell has their own version of OpenOffice or the translator, or how to install them, but it's cool that it's there.
I have that same sample document I described above that originated as a .docx. I opened it in Novell's OpenOffice, and the formatting was off. It lost the indentations for paragraphs, and it shifted a table halfway down the second page.
I then saved the .docx as a .odt in Word 2007. I opened it in Novell's OpenOffice, and it looked perfect. I then saved it as .docx from within Novell's OpenOffice and closed NOO. When I re-opened it, the formatting was off again, similar to the original document. When I opened it in Word 2007, it had lost the indentation, and the bottom half of the still-shifted table was missing.
For simple "Hello World" documents, the formatting was fine when I saved as .docx and re-opened in Office 2007 or Novell's OpenOffice, but for other documents, it appears Novell's OpenOffice OOXML support still needs some work.
I just started reading OASIS OpenDocument Essentials. I hope to post some examples soon.
Labels: Novell OpenOffice, ODF, Office 2007, OOXML, OpenOffice

