How to convert a web page to a PDF using iText.Net and a WebBrowser object

by Heathesh 12. August 2010 06:09

Disclaimer: This is a hack. It populates the PDF with screen captures taken using the WebBrowser object. I did it in my spare time for fun, and I would not advise using it in a production environment without thorough testing and nerves of steel.

I've noticed a few websites that charge a fee to convert a web page to a PDF, so I was curious whether there was an open source alternative. I know there are libraries to create PDFs, but none of my googling turned up one that would convert a web page or HTML to PDF. Having played around with the iText.Net (http://sourceforge.net/projects/itextdotnet/) libraries before, I wanted to see whether I could use them to do this for free.

To begin with, you need to download the relevant libraries from http://sourceforge.net/projects/itextdotnet/. For convenience, I've made the DLLs I used available here (zipped, 2,816 KB download):

http://heathesh.com/ftp/itextnet.zip

I'm using Visual Studio 2008 for this and have not tried it with Visual Studio 2010. In Visual Studio 2008, create a Console Application project. The WebBrowser control used below has to run on a single-threaded apartment (STA) thread, so running this code from a web page or web service requires more "hacking". See my previous post, "Run a single-threaded apartment method with parameters that returns a value within a web service", for what is needed to achieve that.
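
A minimal sketch of one common way to get an STA thread from code that is not already running on one: start a dedicated thread, set its apartment state to STA before starting it, and wait for it to finish. This is not necessarily the approach from the post mentioned above; the StaRunner class and its Run method are hypothetical names, and the sketch assumes the caller can afford to block until the work completes.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a function on a new single-threaded apartment (STA)
// thread, waits for it, and returns its result. Any exception thrown by the
// function is rethrown on the calling thread, wrapped as the InnerException.
public static class StaRunner
{
    public static T Run<T>(Func<T> func)
    {
        T result = default(T);
        Exception error = null;

        Thread thread = new Thread(() =>
        {
            try
            {
                result = func();
            }
            catch (Exception ex)
            {
                // Capture the exception so it can be surfaced on the caller's thread
                error = ex;
            }
        });

        // Must be called before Start(); the WebBrowser control requires STA
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join(); // block until the work has finished

        if (error != null)
            throw new InvalidOperationException("The STA work item failed.", error);

        return result;
    }
}
```

In this tutorial the call might look like StaRunner.Run(() => convertWebPageToPdf(url)). Apartment states other than the default are a Windows feature, which is fine here because the WebBrowser control is Windows-only anyway.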

Once your project has been created, add all the DLLs from the ZIP file as references to your project. Also add references to the following .Net assemblies:

System.Drawing
System.Windows.Forms

Now add the following using directives to your console app's Program.cs file:

//NOTE: Do not add "using System.Drawing;" as its types (e.g. Image, Rectangle) conflict with classes in the PDF libraries
using System; //needed for the [STAThread] attribute
using System.IO;
using System.Windows.Forms;
using com.lowagie.text;
using com.lowagie.text.pdf;
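
If typing System.Drawing in full becomes tedious, a using alias directive is one option: it gives the namespace a short name without importing any of its type names, so Image and Rectangle on their own still resolve to the iText classes. A small sketch; the alias name Drawing and the ScreenshotBounds class are illustrative choices, not part of the original code:

```csharp
// The alias imports no type names, so it cannot clash with com.lowagie.text types
using Drawing = System.Drawing;

public static class ScreenshotBounds
{
    // Returns the rectangle covering a bitmap of the given size, starting at the top-left corner
    public static Drawing.Rectangle Full(int width, int height)
    {
        return new Drawing.Rectangle(0, 0, width, height);
    }
}
```

With such an alias in place, System.Drawing.Bitmap in the methods below could be written as Drawing.Bitmap.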


Next, add the following method to your code. It loads a WebBrowser control, navigates to the specified URL and takes a screenshot of the web page.

        /// <summary>
        /// Generate the screenshot image for the specified URL
        /// </summary>
        /// <param name="url">The URL of the web page to capture</param>
        /// <param name="width">The width of the capture in pixels, or -1 to use the page's full width</param>
        /// <param name="height">The height of the capture in pixels, or -1 to use the page's full height</param>
        /// <returns>A bitmap of the rendered page; the caller is responsible for disposing it</returns>
        public static System.Drawing.Bitmap generateScreenshotImage(string url, int width, int height)
        {
            // Load the webpage into a WebBrowser control
            using (WebBrowser webBrowser = new WebBrowser())
            {
                //disable the scroll bars and suppress script errors, then navigate to the url
                webBrowser.ScrollBarsEnabled = false;
                webBrowser.ScriptErrorsSuppressed = true;
                webBrowser.Navigate(url);

                //wait for the page to load, pumping window messages so the control can process the navigation
                while (webBrowser.ReadyState != WebBrowserReadyState.Complete) { Application.DoEvents(); }

                // Set the width of the WebBrowser control (-1 = the web page's full width)
                if (width == -1)
                    webBrowser.Width = webBrowser.Document.Body.ScrollRectangle.Width;
                else
                    webBrowser.Width = width;

                // Set the height of the WebBrowser control (-1 = the web page's full height)
                if (height == -1)
                    webBrowser.Height = webBrowser.Document.Body.ScrollRectangle.Height;
                else
                    webBrowser.Height = height;

                // Get a Bitmap representation of the webpage as it's rendered in the WebBrowser control
                System.Drawing.Bitmap bitmap = new System.Drawing.Bitmap(webBrowser.Width, webBrowser.Height);
                webBrowser.DrawToBitmap(bitmap, new System.Drawing.Rectangle(0, 0, webBrowser.Width, webBrowser.Height));
                return bitmap;
            }
        }
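
The wait loop above can spin indefinitely if the page never reaches the Complete state, and it keeps a CPU core busy while it waits. A hedged sketch of a bounded wait: WaitHelper.WaitUntil is a hypothetical helper that repeatedly calls a pump action (in the method above, that would be Application.DoEvents) until a condition becomes true or a timeout elapses, sleeping briefly between checks.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class WaitHelper
{
    // Hypothetical helper: calls pump() until condition() returns true or the timeout
    // elapses. Returns true if the condition was met, false if it timed out.
    public static bool WaitUntil(Func<bool> condition, Action pump, TimeSpan timeout)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (!condition())
        {
            if (stopwatch.Elapsed >= timeout)
                return false;

            pump();           // e.g. Application.DoEvents() so the WebBrowser can process messages
            Thread.Sleep(10); // yield briefly instead of busy-spinning
        }
        return true;
    }
}
```

In generateScreenshotImage the loop could then become WaitHelper.WaitUntil(() => webBrowser.ReadyState == WebBrowserReadyState.Complete, Application.DoEvents, TimeSpan.FromSeconds(30)), with the caller deciding what to do (throw, or return null) when it returns false.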

Okay... so we've now got a method to generate a screenshot of the web page. Next we need to convert the bitmap it returns into a PNG-encoded byte array (byte[]), so add the following method to your code:

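        /// <summary>
        /// Gets the PNG-encoded byte array for the specified bitmap
        /// </summary>
        /// <param name="bitmap">The bitmap to encode</param>
        /// <returns>The PNG bytes of the bitmap</returns>
        private static byte[] getBytesForBitmap(System.Drawing.Bitmap bitmap)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                bitmap.Save(memoryStream, System.Drawing.Imaging.ImageFormat.Png);

                //use ToArray rather than GetBuffer: GetBuffer returns the whole internal
                //buffer, which can include unused trailing bytes after the PNG data
                return memoryStream.ToArray();
            }
        }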
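
Note that MemoryStream.GetBuffer returns the stream's entire internal buffer, which is normally larger than the data written, while MemoryStream.ToArray copies only the bytes actually written (and still works after the stream has been closed). When the exact bytes matter, as with image or PDF data, ToArray is the safer choice. A small demonstration; the BufferDemo class is illustrative only:

```csharp
using System.IO;

public static class BufferDemo
{
    // Writes five bytes to a MemoryStream and returns both views of its contents
    public static void Compare(out byte[] buffer, out byte[] written)
    {
        MemoryStream stream = new MemoryStream();
        stream.Write(new byte[] { 1, 2, 3, 4, 5 }, 0, 5);

        buffer = stream.GetBuffer(); // the whole internal buffer, including unused capacity
        stream.Close();
        written = stream.ToArray();  // only the 5 bytes written; valid even after Close()
    }
}
```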

We're almost done. We now need methods that take the bitmap created above and build the actual PDF. For this I created the following four methods; the code comments should be self-explanatory:

        /// <summary>
        /// Converts the specified url to a pdf file
        /// </summary>
        /// <param name="url">The URL of the web page to convert</param>
        /// <param name="fileName">The path of the PDF file to create (overwritten if it already exists)</param>
        private static void convertUrlToPdf(string url, string fileName)
        {
            byte[] pdfBytes = convertWebPageToPdf(url);

            //write the PDF bytes to disk, creating or overwriting the file
            File.WriteAllBytes(fileName, pdfBytes);
        }

        /// <summary>
        /// Converts the specified url to a PDF byte array
        /// </summary>
        /// <param name="url">The URL of the web page to convert</param>
        /// <returns>The bytes of the generated PDF</returns>
        private static byte[] convertWebPageToPdf(string url)
        {
            // step 1: creation of a document-object
            Document document = new Document();

            // step 2:
            // we create a writer that listens to the document
            // and directs a PDF-stream to an in-memory stream
            MemoryStream memoryStream = new MemoryStream();
            PdfWriter.getInstance(document, memoryStream);

            // step 3: we open the document
            document.open();

            // step 4: we add the content, a 1020 pixel wide screenshot of the full web page
            using (System.Drawing.Bitmap screenshot = generateScreenshotImage(url, 1020, -1))
            {
                //if the screenshot is taller than one page (1500 pixels), split it into
                //several images, otherwise just add the image
                if (screenshot.Height > 1500)
                    separatePages(document, screenshot);
                else
                    addImage(document, screenshot);
            }

            // step 5: we close the document
            document.close();

            //return the byte[] of the pdf; ToArray returns only the bytes written
            //(unlike GetBuffer) and also works if the stream has already been closed
            return memoryStream.ToArray();
        }

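        /// <summary>
        /// Add the image to the PDF document
        /// </summary>
        /// <param name="document">The open PDF document to add the image to</param>
        /// <param name="screenshot">The bitmap to add</param>
        private static void addImage(Document document, System.Drawing.Bitmap screenshot)
        {
            Image png = Image.getInstance(getBytesForBitmap(screenshot));

            //scale to 50%: a 1020 x 1500 pixel slice then measures 510 x 750 points,
            //which fits inside an A4 page with the default 36-point margins
            png.scalePercent(50);
            document.add(png);
        }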
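
The 1020 pixel width, 1500 pixel slice height and 50% scale fit together if you assume iText's default page, A4 (595 x 842 points) with 36-point margins, and an image whose size in points equals its size in pixels before scaling. At 50%, a 1020 x 1500 slice becomes 510 x 750 points, inside the 523 x 770 point area between the margins. A hypothetical helper (PageFit is not part of iText) that checks this arithmetic:

```csharp
public static class PageFit
{
    // Assumed defaults: A4 page (595 x 842 points) with 36-point margins on every side
    const float PageWidth = 595f, PageHeight = 842f, Margin = 36f;

    // Returns true if an image of the given pixel size, scaled by scalePercent
    // (treating 1 pixel as 1 point before scaling), fits inside the margins
    public static bool Fits(int pixelWidth, int pixelHeight, float scalePercent)
    {
        float scale = scalePercent / 100f;
        float usableWidth = PageWidth - 2 * Margin;   // 523 points
        float usableHeight = PageHeight - 2 * Margin; // 770 points
        return pixelWidth * scale <= usableWidth && pixelHeight * scale <= usableHeight;
    }
}
```

For example, a 1600 pixel slice at 50% would be 800 points tall and overflow the page, so changing the slice height or scale means revisiting both numbers together.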

        /// <summary>
        /// Separates the bitmap into page-sized slices and adds each one to the PDF document
        /// </summary>
        /// <param name="document">The open PDF document to add the slices to</param>
        /// <param name="screenshot">The full-page bitmap to split</param>
        private static void separatePages(Document document, System.Drawing.Bitmap screenshot)
        {
            int remainder = screenshot.Height % 1500;
            int pages = screenshot.Height / 1500 + (remainder > 0 ? 1 : 0);
            int y = 0;
            int height = 1500;

            for (int i = 0; i < pages; i++)
            {
                //if this is the last page, and we have a remainder, we need to adjust the height accordingly
                if (i == pages - 1 && remainder > 0)
                    height = screenshot.Height - y;

                //copy this slice, using the screenshot's own width and pixel format
                using (System.Drawing.Bitmap pageBitmap = screenshot.Clone(new System.Drawing.Rectangle(0, y, screenshot.Width, height), screenshot.PixelFormat))
                {
                    //add the image
                    addImage(document, pageBitmap);

                    //increment the height counter to move to the next page
                    y += 1500;
                }
            }
        }
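
The slicing arithmetic in separatePages can be checked on its own, away from GDI+ and iText. This sketch (the PageSlicer class and Slice struct are hypothetical) computes the top offset and height of each slice for a given image height, written as a single loop that produces the same slices as the page-count-plus-remainder logic above: full slices, with a shorter final slice for any remainder.

```csharp
using System.Collections.Generic;

public struct Slice
{
    public int Top;    // y offset of the slice in the full image, in pixels
    public int Height; // height of the slice, in pixels

    public Slice(int top, int height) { Top = top; Height = height; }
}

public static class PageSlicer
{
    // Splits an image of totalHeight pixels into slices of at most sliceHeight pixels
    public static List<Slice> Split(int totalHeight, int sliceHeight)
    {
        List<Slice> slices = new List<Slice>();
        for (int top = 0; top < totalHeight; top += sliceHeight)
        {
            // the last slice takes whatever height is left over
            int height = System.Math.Min(sliceHeight, totalHeight - top);
            slices.Add(new Slice(top, height));
        }
        return slices;
    }
}
```

For a 3200 pixel tall screenshot this gives three slices: 0 to 1500, 1500 to 3000, and a final 200 pixel slice starting at 3000.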

Okay, so we've now got everything we need. In our Main method we simply need to call the convertUrlToPdf method with a URL and a PDF file name, and it will generate a PDF of the website for us. There is just one catch: the WebBrowser control can only run on a single-threaded apartment (STA) thread, so we need to decorate our Main method with the STAThread attribute:

        [STAThread] //run in single-threaded apartment state
        static void Main(string[] args)
        {
            convertUrlToPdf("http://iservice.co.za", "iservice.pdf");
        }


That's basically it. Run the code and you'll find iservice.pdf in your output folder (bin\Debug or bin\Release, depending on your build configuration).

Happy PDFing!
