Pencil Down

Hello all

This is going to be one of my last posts under the GSoC 2013 tag. The Google Summer of Code program is coming to an end, with today being the ‘strict pencil down’ date. My project was about improving the textLayer of PDF.js.

My work started with fixing the canvas methods so that they generate complete details about the text (position, height, width, angle and so on) using the formulas given in the project proposal. After that I worked on the textLayer code so that it uses this information to place the textLayer divs above the canvas. I worked on the vertical text issue separately; it needed a few more adjustments once the canvas and the textLayer overlapped correctly.

Then I started working on the generator code, where I needed to implement various parser operators to get the text position, angle and direction. This was the most important and most challenging part of the project, as the code had to be built from scratch. I had to read a lot of documentation to find the formulas to use, and debug test PDFs to make sure all the required operators were implemented. I had to manually run tests on all the PDF documents in master/test/pdfs/ to make sure the final code was good to go. It was a big relief when the final code for this issue landed, as it will improve textLayer rendering and make parsing easier. Thanks to this patch, many lines of code can now be removed from the canvas part of PDF.js.

The project is still in progress, as I was not able to get to the issue of separate text lines ending up on one single line when copied and pasted. Because of the patch described above, though, resolving this issue won't be hard now.

I have published the code on my gh-pages. You can find it through my GitHub account at https://github.com/SSk123 or view the published code directly at http://ssk123.github.io/pdf.js/

Finally, I want to thank my mentors Yury, Brendan and Bill from the bottom of my heart. If it was not for you guys, I would not have made these contributions. I really admire the patience with which my mentors have assisted and guided me. You guys are really the coolest people I have talked to, and it was such a pleasure and an honor to work and learn under you. I look forward to making more contributions to PDF.js and to Mozilla as a whole.

Signing off .. 🙂

 

Text Extraction Code for the TextLayer

Hello all

Building the TextLayer through the extract code required implementing certain formulas. As mentioned in the previous post, Td, TD, Q, q, cm, BT and a few other operators needed to be implemented to do the job. The formulas for a few of these operators can be seen below.

BT operator

[Image: formula for the BT operator]

TD and Td operators

[Image: formulas for the TD and Td operators]

Tm and T* operators

[Image: formulas for the Tm and T* operators]

Q, q and cm operators

[Image: formulas for the Q, q and cm operators]
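For readers who cannot see the formula images, here is a minimal sketch of how these operators are described in PDF 32000-1:2008. It is not the code that landed in PDF.js; the class name TextState and all of its method names are made up for illustration. Matrices are written as [a, b, c, d, e, f], and the text state is reduced to the leading only.

```javascript
// A matrix [a, b, c, d, e, f] stands for the 3x3 matrix
// | a b 0 |
// | c d 0 |
// | e f 1 |
const IDENTITY = [1, 0, 0, 1, 0, 0];

// Returns m1 x m2, using the row-vector convention of the PDF specification.
function multiply(m1, m2) {
  return [
    m1[0] * m2[0] + m1[1] * m2[2],
    m1[0] * m2[1] + m1[1] * m2[3],
    m1[2] * m2[0] + m1[3] * m2[2],
    m1[2] * m2[1] + m1[3] * m2[3],
    m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
    m1[4] * m2[1] + m1[5] * m2[3] + m2[5],
  ];
}

// Hypothetical helper class, not part of PDF.js.
class TextState {
  constructor() {
    this.ctm = IDENTITY.slice();        // current transformation matrix
    this.leading = 0;                   // text leading (Tl)
    this.textMatrix = IDENTITY.slice(); // Tm
    this.lineMatrix = IDENTITY.slice(); // Tlm
    this.stack = [];                    // saved graphics states
  }
  // q: push a copy of the graphics state.
  save() {
    this.stack.push({ ctm: this.ctm.slice(), leading: this.leading });
  }
  // Q: pop the most recently saved graphics state.
  restore() {
    const saved = this.stack.pop();
    if (saved) {
      this.ctm = saved.ctm;
      this.leading = saved.leading;
    }
  }
  // cm: CTM' = m x CTM.
  concat(m) {
    this.ctm = multiply(m, this.ctm);
  }
  // BT: reset the text matrix and the text line matrix to identity.
  beginText() {
    this.textMatrix = IDENTITY.slice();
    this.lineMatrix = IDENTITY.slice();
  }
  // Td: Tm = Tlm = translate(tx, ty) x Tlm.
  moveText(tx, ty) {
    this.lineMatrix = multiply([1, 0, 0, 1, tx, ty], this.lineMatrix);
    this.textMatrix = this.lineMatrix.slice();
  }
  // TD: same as Td, but also sets the leading to -ty.
  moveTextSetLeading(tx, ty) {
    this.leading = -ty;
    this.moveText(tx, ty);
  }
  // Tm: replace (not concatenate) the text matrix and text line matrix.
  setTextMatrix(a, b, c, d, e, f) {
    this.textMatrix = [a, b, c, d, e, f];
    this.lineMatrix = [a, b, c, d, e, f];
  }
  // T*: move to the next line, equivalent to "0 -Tl Td".
  nextLine() {
    this.moveText(0, -this.leading);
  }
}

const state = new TextState();
state.beginText();
state.moveTextSetLeading(72, -14); // leading becomes 14
state.nextLine();                  // moves down by one more leading
console.log(state.textMatrix);
```

Keeping Tm and Tlm outside of the saved state is deliberate in this sketch: the specification treats them as belonging to the text object, while the leading is part of the graphics state that q and Q save and restore.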

The position of the rendered text is calculated with a formula in matrix form, which is given below.

[Image: text space formula]

The above formulas lay the groundwork for how the text parser operators will work.
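As a rough illustration of the matrix formula, PDF 32000-1:2008 defines a text rendering matrix as the product of a small matrix built from the font size, horizontal scaling and text rise, the text matrix Tm, and the current transformation matrix. The sketch below follows that definition; the function and parameter names are my own and are not taken from PDF.js.

```javascript
// Returns m1 x m2 for matrices written as [a, b, c, d, e, f]
// (row-vector convention, as used in the PDF specification).
function multiply(m1, m2) {
  return [
    m1[0] * m2[0] + m1[1] * m2[2],
    m1[0] * m2[1] + m1[1] * m2[3],
    m1[2] * m2[0] + m1[3] * m2[2],
    m1[2] * m2[1] + m1[3] * m2[3],
    m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
    m1[4] * m2[1] + m1[5] * m2[3] + m2[5],
  ];
}

// Trm = [Tfs * Th, 0, 0, Tfs, 0, Trise] x Tm x CTM
// fontSize is Tfs, horizontalScale is Th (1 means 100%), rise is Trise.
// The parameter object is a hypothetical shape chosen for this example.
function textRenderingMatrix(params, textMatrix, ctm) {
  const { fontSize, horizontalScale = 1, rise = 0 } = params;
  const fontMatrix = [fontSize * horizontalScale, 0, 0, fontSize, 0, rise];
  return multiply(multiply(fontMatrix, textMatrix), ctm);
}

// 12pt text placed at (72, 712) on a page that is scaled by 2.
const trm = textRenderingMatrix(
  { fontSize: 12 },
  [1, 0, 0, 1, 72, 712],
  [2, 0, 0, 2, 0, 0]
);
console.log(trm); // [24, 0, 0, 24, 144, 1424]
```

The last two entries of the result give the origin of the text in device space, and the first four carry the scale and rotation that the textLayer has to reproduce.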

The implementation and details of the code base can be seen here.

Till then happy coding  🙂

Extract Code working on TextLayer

Hello all

As per the timeline for my project, the next job is to implement the TextLayer by applying the transform matrix in the getTextContent code, that is, the extract code.

Before getting further into it, I would like to show how a PDF document looks to the parser that parses it. It's really amazing how many lines of code lie behind the document you see in a viewer.

[Image: pdftk]

A PDF document can be uncompressed using the pdftk toolkit. Download and install the latest version of pdftk from pdftk-download.

The command for uncompressing a PDF document in the terminal is:

pdftk doc.pdf output doc.unc.pdf uncompress

Our job is to implement operators such as TD, Td, T*, cm, q, Q and Tm in our getTextContent code, so that we get the transformation matrix from the parser.
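To give an idea of what the parser sees, an uncompressed content stream is a sequence of operands followed by the operator that consumes them. The sketch below groups such a stream into operations. It is heavily simplified (no arrays, dictionaries, hex strings or nested parentheses), the sample stream is invented for this example, and parseOperations is a hypothetical helper rather than the PDF.js parser.

```javascript
// Splits a simplified content stream into { op, args } records.
// Operands come first; the first non-operand token is the operator.
function parseOperations(stream) {
  // A literal string in parentheses, or any run of non-whitespace characters.
  const tokenPattern = /\((?:\\.|[^\\()])*\)|\S+/g;
  const numberPattern = /^[+-]?(?:\d+\.?\d*|\.\d+)$/;
  const operations = [];
  let args = [];
  for (const token of stream.match(tokenPattern) || []) {
    if (numberPattern.test(token)) {
      args.push(Number(token));
    } else if (token.startsWith("/") || token.startsWith("(")) {
      args.push(token); // name or literal string operand, kept verbatim
    } else {
      operations.push({ op: token, args });
      args = [];
    }
  }
  return operations;
}

// Invented sample stream, similar in shape to what pdftk's output contains.
const sample =
  "q 1 0 0 1 50 700 cm BT /F1 12 Tf 14 TL 0 0 Td " +
  "(Hello world) Tj T* (Second line) Tj ET Q";

for (const { op, args } of parseOperations(sample)) {
  console.log(op, args);
}
```

Once the stream is in this shape, each operator name can be dispatched to the code that updates the matrices.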

The behaviour of each operator was taken from the PDF reference, PDF 32000-1:2008.

The implementation of the transform matrix is still in progress. The diff file and other details will be added to my blog shortly.

Till then Happy Coding  🙂

 

 

Vertical Text Fix for PDF.js

Hello all. I have been well behind schedule in posting updates about my work on my current GSoC project. My sincere apologies.

After my mentors and I established a good interaction between the canvas and the div textLayer floating over it, we realized we had missed one small fact: there was still a small portion of the textLayer which was not in its place, namely the rotated/vertical text.

So our next job was to edit the canvas function createTextGeometry so that it handles text at different angles. Initially we were focused on fixing only CJK (Chinese/Japanese/Korean) text, which is written vertically, so the job was just to rotate the text by 90 degrees. Then we created a sample PDF document with text at angles of 45, 20 and 90 degrees. Our job became to create a fix which works not just for CJK text but for all sorts of rotated text, and which keeps the textLayer aligned with the canvas even when the page is rotated by a multiple of 90 degrees.

So the work was to position the text exactly as it appears on the canvas. The diff for this problem can be found here.
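One common way to recover the angle of a run of text is to read it off its transform matrix. The sketch below assumes a matrix in the [a, b, c, d, e, f] form and an angle measured counter-clockwise; it is not the code from createTextGeometry, and the function names are invented for this example.

```javascript
// Angle of the text baseline in degrees, from a matrix [a, b, c, d, e, f].
// The vector (a, b) is where the x axis of text space ends up.
function textAngleDegrees(m) {
  return (Math.atan2(m[1], m[0]) * 180) / Math.PI;
}

// Height of one text-space unit after the transform, from the vector (c, d).
function textHeight(m) {
  return Math.hypot(m[2], m[3]);
}

// Builds a matrix that scales by `size` and rotates by `degrees`,
// used here only to produce sample input.
function scaledRotation(size, degrees) {
  const r = (degrees * Math.PI) / 180;
  const cos = Math.cos(r);
  const sin = Math.sin(r);
  return [size * cos, size * sin, -size * sin, size * cos, 0, 0];
}

for (const angle of [20, 45, 90]) {
  const m = scaledRotation(12, angle);
  console.log(angle, textAngleDegrees(m).toFixed(2), textHeight(m).toFixed(2));
}
```

Because the angle comes from the matrix rather than from a special case for vertical writing, the same calculation covers CJK text and arbitrarily rotated text.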

Our next job is to fix the textLayer so that it is built by the parser as it reads the PDF document.

I will post the details in the next blog post.

Till then Happy Coding 🙂

 

Resolving the messed up PDF while rotation

Hello, and sorry: I have been a little behind schedule in updating my progress on my GSoC project. As mentioned in the previous blog post, my mentors and I decided to take up one of the major issues first, so I started working on fixing the rotation issue #2095.

The text in a PDF is painted on the canvas of PDF.js, but in order to let you select and copy text from the PDF, including rotated text, the PDF.js team came up with a great, innovative idea. They created textLayer div elements that float over the canvas. This textLayer makes the job easier. As you might guess, creating an entirely separate textLayer is a lot of work, but the end result is amazing.

PDF.js has already made lives easier 🙂 , but there is this issue where the mounted textLayer does not float quite where it should, and that is where I am coming to the rescue 😉 .

The messed-up text rotation happens because the textLayer does not overlap the canvas exactly, and things go really bad when we start rotating the canvas.

Below is a small sample of our tragedy.

[Screenshot: issue 2095, before]

Well, I have been busy with the code, trying to find out why things are not working out between the canvas and the textLayer, and I found that our ‘dear’ canvas has not been truthful enough to the ‘poor’ textLayer, but I must say ‘It was strictly platonic’ 😛

The canvas function needed to be extended with angle handling so that rotation and rotated text are treated as expected. After that, the textLayer needs to be informed of those values, and the textLayer has to be transformed accordingly.
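As a rough sketch of that last step, the values reported by the canvas can be turned into CSS for a textLayer div. The geometry object used here (x, y, fontHeight, angle) is a hypothetical shape chosen for this example and not the real PDF.js structure, and the angle is assumed to be in degrees, counter-clockwise.

```javascript
// Builds a plain style object for one textLayer div.
// `geom` is a hypothetical shape: { x, y, fontHeight, angle }.
function textDivStyle(geom) {
  const style = {
    position: "absolute",
    left: geom.x + "px",
    top: geom.y + "px",
    fontSize: geom.fontHeight + "px",
  };
  if (geom.angle !== 0) {
    // CSS rotates clockwise for positive angles, so the sign is flipped.
    style.transform = "rotate(" + -geom.angle + "deg)";
    // Rotate around the top-left corner so the start of the text stays put.
    style.transformOrigin = "0% 0%";
  }
  return style;
}

console.log(textDivStyle({ x: 100, y: 40, fontHeight: 12, angle: 90 }));
```

In a browser, the result could be copied onto an element with Object.assign(div.style, textDivStyle(geom)). Unrotated text gets no transform at all, which keeps the common case as simple as it was before.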

Well this week I sent a pull request consisting of around 26 commits, which I squashed to 1 commit (so that it looks good :-P), and the textLayer is looking pretty good for rotated text now. Here is a sample of the same.

[Screenshot: issue 2095, after]

The pull request with the diff file can be seen here –> DIFF FILE

For the rest of this week, Yury (my coolest mentor) has asked me to learn how to run the tests and how to add new ones to test_manifest.json. So that is probably the goal for now, before the pull request can be merged into master.

To sum up, I will tell you about my mentor Yury: he is such a great guy and has been so supportive. He's one of the coolest Mozilla folks (along with my other mentors). I am not a developer and I did not have much experience with the language in the past, but Yury has always been patient and ready to help.

I am hoping for a great learning experience working with him.

Till then Happy Coding 🙂

GSoC Community Bonding is ON !!

Hello all,

First of all, let me officially say it: ‘I am a SoCian’ 🙂 Congrats to all the other 20 members who got selected for GSoC and are helping Mozilla with their projects. The list of selected Mozilla student helpers is here.

The first half of June is dedicated by GSoC to the Community Bonding period, the time to get to know your mentors and discuss the project with them. My project is improving the text selection and rotation in PDF.js. As the name suggests, it is a JavaScript project. My mentors Yury, Bill and Brendan have discussed and sorted out a plan for how to go about the project.

As my project proposal didn’t include a project timeline, we decided to make one for starters and came up with this.

We have decided to work on the major issues first, that is, the text rotation issue #2095.

Hope all goes well 🙂 All the best to all participating in GSoC2013.

Linting Js files for PDF.js

I was trying to run node make lint to lint the JavaScript files of PDF.js and got an error saying ‘jshint not installed’. I went to the Mozilla PDF.js contributing wiki page

https://github.com/mozilla/pdf.js/wiki/Contributing#-4-run-lint-and-testing

which explains how to run lint and the tests for PDF.js.

The steps involve installing Syntastic if you are a VIM user, and since I am a Linux lover, VIM is my favorite editor.

I faced a few problems while installing Syntastic by following the wiki page, so I did a little Google searching and came up with a solution to fix the issue.

-> To install Syntastic for VIM you first need to install pathogen.vim

  • Open your Terminal and type the command
    mkdir -p ~/.vim/autoload ~/.vim/bundle && \
    curl -LSso ~/.vim/autoload/pathogen.vim https://tpo.pe/pathogen.vim

    Create a ~/.vimrc file if you don’t have one and add the following line to it:

    execute pathogen#infect()
  • Save and quit (:wq in the VIM command line)

-> Now you can install Syntastic as a pathogen bundle

  • Go to  the directory

$ cd ~/.vim/bundle

  • Then clone Syntastic using the following command in the Terminal
$ git clone https://github.com/scrooloose/syntastic.git

-> Close all open VIM editors and open VIM again in the Terminal

  • Type :Helptags in the command line

If you get an error, go through the above steps again.

Those were the steps to install Syntastic. Now we can look at how to install jshint.

If you have node.js properly installed on your system, you can install jshint directly with the following command

$ npm install jshint

And with that we are DONE.
