Stencila Converter


#1

We have been doing quite a lot of work around Stencila Converter recently.
The Converter is one of the key components enabling users to collaborate without having to give up their preferred editing format.

The Converter leverages the awesome power of Pandoc to convert between formats, including the Journal Article Tag Suite (JATS) format that Stencila supports. Hamish Mackenzie contributed a JATS reader to Pandoc (included in the 2.0.6 release) and Nokome Bentley improved the existing Pandoc JATS writer.

One of the decisions we have to make when polishing up the Converter is which of Pandoc’s Markdown extensions to enable or disable. There is probably no silver bullet, but we want the Converter to output documents in a Markdown flavor that suits most users.

We’d be keen to hear your opinions!
You can either comment under this post directly or file an issue in the Stencila Convert repository.


#2

One vote for tex_math_dollars, so that we can use $ and $$, which is a widely supported syntax.
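
For reference, Pandoc switches an extension on or off by appending `+name` or `-name` to the format name. Below is a minimal sketch of composing such a format string; `pandoc_format` is a hypothetical helper used only for illustration, not something taken from Stencila or Pandoc.

```python
def pandoc_format(base, enable=(), disable=()):
    """Compose a Pandoc format string such as 'markdown+tex_math_dollars'.

    `pandoc_format` is a hypothetical helper; only the +ext/-ext syntax
    comes from Pandoc itself.
    """
    parts = [base]
    parts += ["+" + name for name in enable]
    parts += ["-" + name for name in disable]
    return "".join(parts)


# Value that could be passed to pandoc's --from or --to option.
fmt = pandoc_format("markdown", enable=["tex_math_dollars"])
print(fmt)
```

The resulting string would be used as in `pandoc --to <format string>`; how the Converter would expose this choice is left open here.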


#3

Thank you! I filed an issue, https://github.com/stencila/convert/issues/27 - please feel free to comment there as well.


#4

I posted the following issue also on the repository: https://github.com/stencila/cli/issues/71

I’ve followed the installation instructions for macOS, moving the stencila executable to the /Applications folder. However, when I open it by double-clicking, it automatically executes an exit command. Here’s the transcript:

dhcp-10-249-27-139:~ memyselfandi$ /Applications/stencila ; exit;

   stencila 0.30.1 

   USAGE

     stencila <command> [options]

   COMMANDS

     convert [input] [output]      Convert files or folders to other formats
     setup                         Setup required software dependencies     
     help <command>                Display help for a specific command      

   GLOBAL OPTIONS

     -h, --help         Display help                                      
     -V, --version      Display version                                   
     --no-color         Disable colors                                    
     --quiet            Quiet mode - only displays warn and error messages
     -v, --verbose      Verbose mode - will also output debug messages    

logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.

[Process completed]

Am I missing something here or is this a bug? thanks


#5

Hi @lollopus,

Thanks for your post and sorry for the problems. I had a look at the issue you filed (thanks for that as well!) and I think two things are getting tangled up here. That is a problem with our documentation, which should make it clearer that, at the moment, Stencila consists of different components which can work independently and are not always connected through a common interface.

The problem you describe above occurs, I assume, when you try to use the Stencila Converters? At the moment the easiest way to use them is through the Stencila Command Line Interface (CLI), which I suppose you installed following these instructions? To use it, you need to run the commands in your terminal as described - at the moment, double-clicking only runs a bare stencila command, which displays help on how to use it in the terminal (which is the output you get).

I hope I understood correctly what your problem is - if not, sorry!

We have some examples which you can try converting. You could try downloading them, unzipping, then navigating to the directory containing the hello-world folder and trying some conversions (e.g. stencila convert hello-world/hello-world.md hello-world/hello-world.docx).
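
If several target formats are to be tried in one go, the same command can be driven from a script. This is only a sketch: it assumes the stencila executable is on your PATH and uses nothing beyond the `stencila convert [input] [output]` form from the CLI help; the function names are made up for illustration.

```python
import subprocess


def convert_commands(source, targets):
    """Return one `stencila convert <input> <output>` command per target."""
    return [["stencila", "convert", source, target] for target in targets]


def run_conversions(source, targets):
    """Run each conversion in turn and collect the exit code per output file."""
    results = {}
    for command in convert_commands(source, targets):
        # Requires the stencila executable to be on PATH.
        results[command[-1]] = subprocess.call(command)
    return results


if __name__ == "__main__":
    # Only prints the command; call run_conversions() to actually convert.
    print(convert_commands(
        "hello-world/hello-world.md",
        ["hello-world/hello-world.docx"],
    ))
```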

Hope this helps!


#6

Hi @Aleksandra and thanks for your feedback!

I did follow the instructions but (on macOS High Sierra) had to issue
me$ PATH=$PATH:/Applications
before Terminal could see the “stencila” command without using its full path. This might need to be specified in the installation instructions (btw, this will work only for the current Terminal session, I think).

Once I got over this initial obstacle I tested converting the hello-world.md example, but again encountered problems:

  1. MS Word for Mac (latest version) throws an error when attempting to open the file converted to ~/Desktop/output.docx
  2. Converting to PDF, I get errors in Terminal:
    Error converting "/Users/lorenzocangiano/Downloads/examples-master/hello-world/README.md" to "/Users/lorenzocangiano/Desktop/output.pdf": Error calling Pandoc: message: pdflatex not found. Please select a different --pdf-engine or install pdflatex...
    Admittedly I didn’t attempt to install pdflatex.

No need to apologize on your part :slight_smile: As an eLife author and supporter I am enthusiastic about the concepts driving Texture and Stencila development. I look forward to helping out as a beta tester and, in fact, would love to dispense with Word, Endnote, etc. for my next manuscripts.

A suggestion I have is to make it extra clear on your websites that this software is in full development and not quite ready for daily use in research, but is publicly released to encourage feedback on desirable new features, etc. The current homepage of Stencila, for instance, is very bold and attractive (something which definitely caught my attention) and may lead one to expect a smooth experience and an immediately viable alternative to “word processor and spreadsheet interfaces that you and your colleagues are already used to”. This, I think, would be counterproductive and should be avoided.

Lorenzo


#7

Hi Lorenzo, thank you for the quick reply and all the feedback. This is really useful and we shall act upon it :slight_smile:

Re the installation on Mac - yes, what you did updated the PATH only for the current session. It actually may be a better practice to copy the stencila executable file to /usr/local/bin which should be in your PATH already. I will update the documentation accordingly.
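
A quick way to check whether a folder is already on the PATH, and whether the `stencila` command resolves at all, is sketched below. It assumes Python 3 is available; `dir_on_path` is a hypothetical helper name.

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return wanted in entries


print(dir_on_path("/usr/local/bin"))
# Full path of the executable if it can be found, otherwise None.
print(shutil.which("stencila"))
```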

Re the MS Word error - any chance you can copy-paste what it says (if it says anything) or take a screenshot? If you have a moment, maybe file it as an issue. Alternatively, I can do it and attribute it to you.

Re the PDF - indeed, that will require pdflatex (this is more of a Pandoc thing), but you’re right, the documentation should be more specific about it.
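
Until the documentation covers this, a small preflight check can tell you whether a LaTeX engine is installed before you attempt a PDF conversion. This is a sketch assuming Python 3; the engine names are values Pandoc accepts for its `--pdf-engine` option, but whether the Stencila CLI lets you pick one is not something this thread confirms.

```python
import shutil

# Engine names accepted by pandoc's --pdf-engine option (LaTeX-based subset).
CANDIDATE_ENGINES = ("pdflatex", "xelatex", "lualatex")


def first_available_engine(candidates=CANDIDATE_ENGINES, which=shutil.which):
    """Return the first engine found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name):
            return name
    return None


engine = first_available_engine()
if engine is None:
    print("No LaTeX engine found; PDF output will fail until one is installed.")
else:
    print("Found a PDF engine: " + engine)
```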

Very good point about the website - @nokome put a lot of work into it and looks like he did a great job! But sure, a short disclaimer may be useful.

Please stay tuned and keep sending us your comments!


#8

@Aleksandra As you recommended I opened an issue for the docx conversion error. Unfortunately Word says nothing about the exact problem with the file. Let me know if you want me to attempt other conversions on my system…

As an aside, this morning several of my posts were flagged as SPAM! Here’s a screenshot of what I’m presented with:

I would be surprised if they were perceived as offensive or off-topic by someone in the community (in that case I’d like to know why). I suspect this might be a server-initiated action against a new user posting too frequently, since even my reply to the welcome bot was tagged as spam!


#9

As an aside, this morning several of my posts were flagged as SPAM!

@lollopus: yes, your posts were automatically flagged by the Discourse forum software we use, with the message “This new user tried to create multiple posts with links to the same domain (github.com). See the newuser_spam_host_threshold site setting.”. I have unflagged them now. Apologies for the inconvenience.

A suggestion I have is to make it extra clear on your websites that this software is in full development and not quite ready for daily use in research, but is publicly released to encourage feedback on desirable new features, etc.

I agree that it needs to be much clearer on the front page that the software is in early beta. We also need to have a dedicated Roadmap page.


#10

RE: the converters on Windows.
I get many errors, despite having a full TeX Live installation and the latest pandoc. I see a reference to a “customized pandoc”, but it’s only available for Linux, apparently. I would like to see a Windows binary for the customized pandoc; perhaps that would solve my never-ending “! Missing # inserted in alignment preamble.” errors.
Here’s one output:

 ?  Error converting "TM.xlsx" to "TM.pdf": Error calling Pandoc:
  message: Error producing PDF.
! Missing # inserted in alignment preamble.
<to be read again>
                   \cr
l.62 \begin{longtable}[]{@{}@{}}


  args: --from json --output TM.pdf --data-dir=C:\Users\burque505\AppData\Roaming\Stencila\data\pandoc
  content: {
  "pandoc-api-version": [
    1,
    17,
    5
  ],
  "meta": {
    "name": {
      "t": "MetaString",
      "c": "Monkeys"
    }
  },
  "blocks": [
    {
      "t": "Table",
      "c": [
        [],
        [],
        [],
        [],
        [
          [
            [
              {
                "t": "Plain",
                "c": [
                  {
                    "t": "Str",
                    "c": "# Monkeys"
                  }
                ]
              }
            ],
            [
              {
                "t": "Plain",
                "c": [
                  {
                    "t": "Str",
                    "c": "Tame?"
                  }
                ]
              }
            ]
          ],
          [
            [
              {
                "t": "Plain",
                "c": [
                  {
                    "t": "Str",
                    "c": "10"
                  }
                ]
              }
            ],
            [
              {
                "t": "Plain",
                "c": [
                  {
                    "t": "Str",
                    "c": "Partly"
                  }
                ]
              }
            ]
          ],
          [
            [
              {
                "t": "Plain",
                "c": [
                  {
                    "t": "Str",
                    "c": "30"
                  }
                ]
              }
            ],
            [
              {
                "t": "Plain",
                "c": [
                  {
                    "t": "Str",
                    "c": "Completely"
                  }
                ]
              }
            ]
          ],
          [
            [
              {
                "t": "Plain",
                "c": [
                  {
                    "t": "Str",
                    "c": "28"
                  }
                ]
              }
            ],
            [
              {
                "t": "Plain",
                "c": [
                  {
                    "t": "Str",
                    "c": "Utterly intransigent"
                  }
                ]
              }
            ]
          ]
        ]
      ]
    }
  ]
}
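
One possible reading of the dump above (an assumption, not something confirmed in this thread): the Table block carries four rows of two cells each, but its alignment and width lists are both empty, which would match the empty column specification in `\begin{longtable}[]{@{}@{}}`. The sketch below pads those lists so that there is one entry per column. It assumes the pandoc-api-version 1.17 layout, where a Table's "c" is [caption, alignments, widths, headers, rows]; `fill_column_specs` is a hypothetical helper, not part of Stencila or Pandoc.

```python
def fill_column_specs(doc):
    """Pad alignments/widths of every top-level Table to match its widest row.

    Assumes the pandoc-api-version 1.17 layout, where a Table's "c" is
    [caption, alignments, widths, headers, rows].
    """
    for block in doc.get("blocks", []):
        if block.get("t") != "Table":
            continue
        caption, aligns, widths, headers, rows = block["c"]
        columns = max([len(headers)] + [len(row) for row in rows])
        # One alignment and one relative width (0 = automatic) per column;
        # extend() mutates the lists stored inside the document in place.
        aligns.extend({"t": "AlignDefault"} for _ in range(columns - len(aligns)))
        widths.extend(0 for _ in range(columns - len(widths)))
    return doc


def cell(text):
    """Build a table cell holding a single plain string."""
    return [{"t": "Plain", "c": [{"t": "Str", "c": text}]}]


doc = {"blocks": [{"t": "Table", "c": [[], [], [], [], [[cell("10"), cell("Partly")]]]}]}
fill_column_specs(doc)
print(doc["blocks"][0]["c"][1])
```

If this reading is right, the patched JSON could be piped back into pandoc with `--from json`; if the error persists, the cause lies elsewhere.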

"Missing # inserted in alignment preamble" - xlsx to pdf


#11

I’m following up on this because I found a good conversion process.
(For really good chart results, save the .xlsx as .ods from Excel, not from LibreOffice (LO), and use the first command below or modify the wrapper below accordingly.) Microsoft actually seems to support the format better than LO!

LibreOffice running headless converts xlsx or ods to pdf pretty well. For one file at a time:

soffice --headless --convert-to pdf YourTestFile.ods

For most files this works fine, without intermediate conversion to .ods:

soffice --headless --convert-to pdf YourTestFile.xlsx

Using ‘headless’ appears to be optional on Windows.

On Windows, to convert files in batches I needed to use a Python wrapper for unoconv, which calls LO. All instances of LO should be closed before running it.
wrapperPortrait.py (modified from a post elsewhere):

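from __future__ import print_function

import fnmatch
import os

# Collect every .xlsx file below the current working directory.
matches = []
for root, dirnames, filenames in os.walk(os.getcwd()):
    for filename in fnmatch.filter(filenames, '*.xlsx'):
        matches.append(os.path.join(root, filename))

if matches:
    # Quote each path so that names containing spaces reach unoconv intact.
    quoted = " ".join('"{}"'.format(path) for path in matches)
    os.system("unoconv.py -f pdf -P PaperOrientation=portrait " + quoted)
else:
    print("No .xlsx files found.")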

Might this LO/unoconv combination be worth considering for the Stencila Converters? Of course it would require installing LO, and even the portable .paf version is huge.

Unoconv has a pretty wide range of formats it can convert. Details at its GitHub page.

Regards,
burque505


#12

@Aleksandra, I’m uploading the small test .xlsx file I mentioned. Please keep in mind that I get substantially the same error message with .ods files as well. Again, this results from straight conversion from .xlsx to .pdf. Intermediate conversion to markdown improves results, but has no support for charts.

Edit: Well, I guess I’m not uploading it. “Sorry, new users cannot upload attachments.” Guess I’ll have to wait until I’m an “old user.” :slight_smile:

Regards,
burque505


#13

So sorry about the issues you’re experiencing! The forum software (Discourse) has quite sophisticated levels of anti-spam security. I have changed the settings and hopefully it should work now. Sorry again!


#14

Sorry, I didn’t get a notification that there was an answer! Here’s one of the small .xlsx files that generate the error:
TM.xlsx (5.1 KB)
But really, even an Excel file with two columns and one record, just text in all cells, will generate the error.
Regards,
burque505


#15

Hello @burque505
Many thanks! We will investigate. We are at the moment focusing on the Dockter tool but will be back onto the Converters in one of the next development sprints. In the meantime, come and join us on the chat as well.
We will be in touch!
Cheers,
Aleksandra