Adding tests for unformatted HTML output


4.2 years ago by

Germany

Hi,

I am currently adapting FastQC version 0.11.2 as a Galaxy tool, because we want to include it in our workflow. Since only an older version of the tool is currently available on the Galaxy Toolshed, we thought of making ours public. Unfortunately, writing tests that compare output files has proven difficult for the new version: it generates its HTML output as a single line of text (the ~300 KB ASCII document contains no newline characters). While it might be possible to use regular expressions for the date and time stamps in the output, it seems difficult to provide a good test under these circumstances.

One possibility would be to include the .txt output of FastQC and compare that to a sample instead. Does anyone with more experience in Galaxy testing have other suggestions?
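A minimal Python sketch of the .txt idea, reducing the report to one pass/warn/fail status per module so that filenames and other run-specific values are ignored. It assumes the text report marks each module with a line of the form `>>Module name<TAB>status` and closes it with `>>END_MODULE`; the function name `module_statuses` is hypothetical.

```python
def module_statuses(report_text: str) -> dict:
    """Map each module name to its status (assumed '>>Name<TAB>status' lines)."""
    statuses = {}
    for line in report_text.splitlines():
        # Module headers start with '>>'; '>>END_MODULE' only closes a module.
        if line.startswith(">>") and line != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name] = status
    return statuses


# Made-up sample in the assumed layout, not real FastQC output.
sample = (
    "##FastQC\t0.11.2\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Filename\treads.fastq\n"
    ">>END_MODULE\n"
    ">>Per base sequence quality\tfail\n"
    ">>END_MODULE\n"
)
print(module_statuses(sample))
```

Comparing two such dictionaries would only flag a test when a module's status changes, not when a filename or date does.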

Thank you,

html fastqc test • 990 views


modified 4.2 years ago by fubar ♦ 1.1k • written 4.2 years ago by Philipp Rentzsch • 0

4.2 years ago by

Bjoern Gruening ♦ 5.1k

Germany

Bjoern Gruening ♦ 5.1k wrote:

Hi Philipp!

Thanks very much for updating the wrapper! The old fastqc wrapper is located here, including some test cases:

https://github.com/galaxyproject/tools-devteam/blob/master/tools/fastqc/rgFastQC.xml

Hopefully this will help you! Could you please create a PR against this repository? If you want, we can merge it and replace the old wrapper with the new one.

Thanks again,

Bjoern

written 4.2 years ago by Bjoern Gruening ♦ 5.1k

Hi Björn,

This is actually pretty much what I was doing (based on the IUC package, which should be very similar to your package, and including some changes from tmcgowan from the main testtoolshed).

Thanks, Philipp

written 4.1 years ago by Philipp Rentzsch • 0

4.2 years ago by

fubar ♦ 1.1k

Australia

fubar ♦ 1.1k wrote:

The current version Bjoern refers to uses test syntax that tolerates dates, filenames and other details that change in the HTML output each time the test is rerun, by allowing up to 100 lines to differ between the saved and the generated HTML. The trick is the lines_diff parameter in the test.
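To illustrate the idea behind a line-difference tolerance, here is a minimal Python sketch. It is not Galaxy's actual implementation; the function names and the way changed lines are counted (each removed and each added line counts once) are assumptions made for illustration.

```python
import difflib


def count_differing_lines(expected: str, actual: str) -> int:
    """Count lines removed from `expected` plus lines added in `actual`."""
    diff = list(
        difflib.unified_diff(
            expected.splitlines(), actual.splitlines(), lineterm="", n=0
        )
    )
    # diff[0] and diff[1] are the '---'/'+++' file headers; hunk markers
    # start with '@@', so only real changes start with '+' or '-'.
    return sum(1 for line in diff[2:] if line.startswith(("+", "-")))


def within_tolerance(expected: str, actual: str, lines_diff: int) -> bool:
    """True if the two texts differ by at most `lines_diff` lines."""
    return count_differing_lines(expected, actual) <= lines_diff


# Made-up multi-line reports that differ only in a date line.
saved = "<html>\n<p>Mon 1 Jan</p>\n<p>ok</p>\n</html>"
fresh = "<html>\n<p>Tue 2 Jan</p>\n<p>ok</p>\n</html>"
print(count_differing_lines(saved, fresh))
```

With one element per line, a changed date touches only that line. If the whole report sits on a single line, the same count can no longer tell a changed date from a completely different report.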

There are some complexities in the companion Python wrapper, which munges the HTML directory/link structure that FastQC generates so that it works properly as a Galaxy composite object HTML page. I noticed in the update notes that the file/directory structure of the HTML output may have changed, so some changes will probably be needed. The tool_dependencies link and some references to version numbers will need updating, but otherwise it should be fairly straightforward and, as Bjoern has said, pull requests are welcome!

modified 4.2 years ago • written 4.2 years ago by fubar ♦ 1.1k

Yes, I have seen that. Unfortunately, the output of the new version is a single line, so lines_diff="100" does not work anymore. I thought about inserting a newline character between each '><' pair, but did not want to change the output too much.
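The newline idea can be sketched in a few lines of Python. This is only an illustration with a hypothetical helper name, and it assumes that '><' never occurs inside text, script or attribute content where a line break would change the meaning.

```python
def split_tags_onto_lines(html: str) -> str:
    """Insert a newline between every adjacent '><' pair."""
    return html.replace("><", ">\n<")


# Made-up single-line document standing in for the real report.
one_line = "<html><body><p>Mon 1 Jan</p><p>ok</p></body></html>"
split = split_tags_onto_lines(one_line)
print(split)
print(len(split.splitlines()))
```

The change is reversible as long as the original contains no '>' directly followed by a newline and '<', since replacing ">\n<" with "><" restores the input.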

written 4.1 years ago by Philipp Rentzsch • 0

I just took a closer look at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/RELEASE_NOTES.txt

Looks like the improvements require more work than I'd hoped.

If the images are embedded, then lines_diff will need to allow for every base64 line. Not good.

Maybe the zip file contents are more tractable for creating the Html file output?

If so, then appending \n to <b> or <p> might give deterministic and countable lines to test.

Otherwise, perhaps regexps for some headings or something similar?
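A regexp check on headings could look roughly like the Python sketch below. It assumes that the report puts module names inside heading tags (h1 to h6), possibly after an icon or other nested tag; that markup, the sample string and the helper name `missing_headings` are assumptions, not taken from actual FastQC output.

```python
import re

# Module names expected in the report; extend as needed.
EXPECTED_HEADINGS = ["Basic Statistics", "Per base sequence quality"]


def missing_headings(html: str, headings=EXPECTED_HEADINGS) -> list:
    """Return the expected headings that do not appear in a heading tag."""
    missing = []
    for heading in headings:
        # Opening heading tag, optional whitespace or nested tags, then the text.
        pattern = r"<h[1-6][^>]*>(?:\s|<[^>]+>)*" + re.escape(heading)
        if not re.search(pattern, html):
            missing.append(heading)
    return missing


# Made-up single-line sample in the assumed markup.
sample_html = (
    '<h2 id="M0"><img src="x" alt="[OK]"/>Basic Statistics</h2>'
    "<h2>Per base sequence quality</h2>"
)
print(missing_headings(sample_html))
```

Because the check does not depend on line breaks, dates or embedded images, it works on single-line output, at the cost of testing much less of the report than a full file comparison.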

I've taken the liberty of starting https://trello.com/c/2gbudf66

modified 4.1 years ago • written 4.1 years ago by fubar ♦ 1.1k
