Apache Tika + J2Html + Bootstrap + Java
I was taking a look at Apache Tika today, and it gave me a chance to brush up on my HTML and regex.
I used a library called J2Html, which made my HTML output much easier to handle. Apart from this, I used Bootstrap to make my table look neater.
I ended up with an HTML table whose columns are as follows:
1. File Name, along with a link to the file.
2. File Type String
3. Metadata Table: all metadata elements pulled out by Tika.
4. Categories: basically tags. These are derived by analysing the content of the files. Right now this is very rudimentary, but I will improve it soon.
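The categorisation code itself isn't shown here. As a hypothetical sketch only (not the code behind the table above), one rudimentary way to tag a file is to run a few regular expressions over the text Tika extracts and attach a category whenever its pattern matches. The class name `ContentCategorizer`, the category names, and the patterns are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class ContentCategorizer {

    // Hypothetical rules: a category is attached when its pattern is found
    // anywhere in the text (case-insensitive, matched on word boundaries).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Finance", Pattern.compile("\\b(invoice|payment|amount due)\\b", Pattern.CASE_INSENSITIVE));
        RULES.put("Resume", Pattern.compile("\\b(curriculum vitae|work experience|education)\\b", Pattern.CASE_INSENSITIVE));
        RULES.put("Java", Pattern.compile("\\b(public class|import java)\\b", Pattern.CASE_INSENSITIVE));
    }

    // Returns every category whose pattern matches the text, in rule order.
    public static List<String> categorize(String text) {
        List<String> categories = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(text).find()) {
                categories.add(rule.getKey());
            }
        }
        return categories;
    }

    public static void main(String[] args) {
        String text = "Invoice #42: payment is due. Work experience attached.";
        System.out.println(categorize(text)); // prints [Finance, Resume]
    }
}
```

In this sketch the input would be the text returned by handler.toString() after parsing, e.g. `ContentCategorizer.categorize(handler.toString())`. A keyword list like this is crude, but it is easy to extend one rule at a time.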
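Below is the code that is used to parse a file:

    import java.io.FileInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.sax.BodyContentHandler;

    Parser parser = new AutoDetectParser();                // detects the file type and delegates to a matching parser
    BodyContentHandler handler = new BodyContentHandler(); // collects the body text (default limit: 100,000 characters; pass -1 for no limit)
    Metadata metadata = new Metadata();                    // filled with the file's metadata during parsing
    ParseContext context = new ParseContext();
    try (FileInputStream inputStream = new FileInputStream(file)) {
        parser.parse(inputStream, handler, metadata, context);
    }

Once the above code has executed, handler.toString() returns the textual content of the file.

Below is the code that builds the table and writes it out:

    import static j2html.TagCreator.*;
    import java.awt.Desktop;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.PrintWriter;

    String page = html().with(
            head().with(
                    title(pageTitle),
                    link().withRel("stylesheet").withHref("Stylesheet.css"),
                    link().withRel("stylesheet").withHref("http://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"),
                    script().withSrc("https://ajax.googleapis.com/ajax/libs/jquery/1.12.2/jquery.min.js"),
                    script().withSrc("http://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js")),
            body().with(
                    // all visible content, including the container div, belongs inside <body>
                    div().withClass("container").with(h2(headerTitle)),
                    div().withId("header").with(h1(pageTitle)),
                    // table-inverse / thead-inverse are Bootstrap 4 classes; the 3.3.6 CSS above does not define them
                    table()
                            .withClass("extra-table-padding table table-responsive table-inverse table-hover table-bordered")
                            .with(thead().withClass("thead-inverse")
                                    .with(tr().with(th("File Name"), th("Type"), th("Metadata"), th("Category"))))
                            .with(tbody().with(rows)), // rows: one <tr> per parsed file
                    footer().withClass("footer").with(label(footerTitle))))
            .render();

    File htmlFile = new File("G:/Dropbox/Innominds/TikaSample/Results.html");
    try {
        System.out.println(location.getFile());
        // closing the writer flushes it, so the file is complete before the browser opens it
        try (PrintWriter out = new PrintWriter(htmlFile)) {
            out.println(page);
        }
        Desktop.getDesktop().browse(htmlFile.toURI());
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        System.out.println("FileNotFoundException - " + e.getMessage());
    } catch (IOException e) {
        e.printStackTrace();
        System.out.println("IOException - " + e.getMessage());
    }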