OiO.lk Blog

Apache Tika: Converting Excel to HTML in Kotlin


I want to write an application that converts Excel spreadsheets to HTML and preserves their styling. Apache Tika is the best free solution I have found so far. I tested some conversions on the command line, for example:

java -jar tika-app-2.9.2.jar --html spreadsheet.xlsx > output.html

and the output looks great: I get an HTML file with plenty of structure (e.g. table tags, tr tags, etc.).

I am having some trouble replicating this behavior in code, using Kotlin with Tika packages. The closest I have gotten is the code below, which parses a spreadsheet and outputs the content as a string.

import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.ParseContext
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser
import org.apache.tika.sax.BodyContentHandler
import java.io.File
import java.io.FileInputStream


fun tikaHelper(inputFilePath: String) {
    // Handler that collects only the plain-text body content
    val handler = BodyContentHandler()
    val metadata = Metadata()
    val pcontext = ParseContext()

    // Parse the .xlsx with the OOXML parser; `use` closes the stream afterwards
    val msofficeparser = OOXMLParser()
    FileInputStream(File(inputFilePath)).use { inputstream ->
        msofficeparser.parse(inputstream, handler, metadata, pcontext)
    }
    println("Contents of the document: $handler")
}

fun main() {
    val inputFilePath = "./spreadsheet.xlsx"
    tikaHelper(inputFilePath)
}

This only outputs the data as a plain-text string. How can I mimic the behavior of the --html CLI option?
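A minimal sketch of one likely fix, assuming Tika 2.x (the version used with the CLI above) and keeping the same OOXMLParser: swap BodyContentHandler, which keeps only the text of the body, for org.apache.tika.sax.ToHTMLContentHandler, which serializes the parser's XHTML events as HTML markup. The function name xlsxToHtml is hypothetical.

```kotlin
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.ParseContext
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser
import org.apache.tika.sax.ToHTMLContentHandler
import java.io.File

// Hypothetical helper: parses an .xlsx file and returns Tika's HTML rendering.
// ToHTMLContentHandler writes the SAX events (html, body, table, tr, td and so on)
// out as markup, whereas BodyContentHandler keeps only the character data.
fun xlsxToHtml(inputFilePath: String): String {
    val handler = ToHTMLContentHandler()
    val metadata = Metadata()
    val context = ParseContext()

    // `use` closes the input stream once parsing finishes
    File(inputFilePath).inputStream().use { stream ->
        OOXMLParser().parse(stream, handler, metadata, context)
    }
    // The no-arg constructor buffers the markup in memory; toString() returns it
    return handler.toString()
}
```

Calling println(xlsxToHtml("./spreadsheet.xlsx")) from the existing main() should then print markup instead of plain text. A side benefit for large sheets: the no-argument BodyContentHandler caps its output at 100,000 characters, while ToHTMLContentHandler applies no such limit.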

I tried following various docs online, such as https://www.tutorialspoint.com/tika/tika_extracting_ms_office_files.htm, but they only get me as far as spreadsheet → string. I need to go from spreadsheet → HTML.
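To go straight from a spreadsheet to an HTML file, here is a sketch of a rough programmatic counterpart to java -jar tika-app-2.9.2.jar --html spreadsheet.xlsx > output.html, assuming tika-parsers-standard-package 2.x is on the classpath. It uses AutoDetectParser, which picks the parser from the file's content (so .xls files should work as well as .xlsx), and the two-argument ToHTMLContentHandler constructor, which streams the markup into the output file. convertToHtmlFile is a hypothetical name, and the result is not guaranteed to be byte-for-byte identical to the CLI output.

```kotlin
import org.apache.tika.io.TikaInputStream
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.AutoDetectParser
import org.apache.tika.parser.ParseContext
import org.apache.tika.sax.ToHTMLContentHandler
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper: converts a Tika-supported document (here, a spreadsheet)
// into an HTML file, roughly like the CLI's --html option.
fun convertToHtmlFile(input: Path, output: Path) {
    val parser = AutoDetectParser()
    val metadata = Metadata()
    val context = ParseContext()

    // get(path, metadata) also records the file name and length in the
    // metadata, which helps type detection
    TikaInputStream.get(input, metadata).use { stream ->
        Files.newOutputStream(output).use { out ->
            // Markup is written to `out` as UTF-8 while the parser emits SAX events
            val handler = ToHTMLContentHandler(out, "UTF-8")
            parser.parse(stream, handler, metadata, context)
        }
    }
}
```

For example, convertToHtmlFile(Paths.get("./spreadsheet.xlsx"), Paths.get("./output.html")) should produce a file comparable to the CLI's. As with the CLI, expect structural markup (tables, rows, and cells) rather than Excel's fonts, colors, or borders.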