Tuesday, 25 February 2014

Read the content of a PDF using iTextSharp

How do you read the content of a PDF file?
Here is a short snippet that extracts the text of the first page.

using System;
using System.Text;
using iTextSharp.text.pdf.parser;

namespace TestApplication
{
    public partial class WebForm1 : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
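            string path = @"C:\Users\Ajay\Downloads\Anish jesani.pdf";

            iTextSharp.text.pdf.PdfReader pr = new iTextSharp.text.pdf.PdfReader(path);
            try
            {
                ITextExtractionStrategy pes = new SimpleTextExtractionStrategy();
                int pageCnt = pr.NumberOfPages; // total number of pages
                string str = PdfTextExtractor.GetTextFromPage(pr, 1, pes); // 1 = page number (pages are 1-based)

                // Round-trip the text through the system default code page and re-encode it as UTF-8.
                // Characters the default code page cannot represent are lost in this step.
                str = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default,
                    Encoding.UTF8, Encoding.Default.GetBytes(str)));
            }
            finally
            {
                pr.Close(); // release the underlying file
            }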
        }
    }
}

This requires a reference to iTextSharp.dll in the project.
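
The snippet above reads only page 1, although it also fetches the page count. Here is a minimal sketch of reading every page, assuming iTextSharp 5.x is referenced. The class name PdfTextReader and the method name ReadAllText are made up for this example and are not part of iTextSharp.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp.
public static class PdfTextReader
{
    // Returns the text of all pages, one page after another.
    public static string ReadAllText(string path)
    {
        StringBuilder text = new StringBuilder();
        PdfReader reader = new PdfReader(path);
        try
        {
            // Page numbers in iTextSharp start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text from earlier pages is not carried over.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        finally
        {
            reader.Close(); // release the underlying file
        }
        return text.ToString();
    }
}
```

It could be called from Page_Load as string all = PdfTextReader.ReadAllText(path); with the same path as above.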
