How can I download PDF files with HttpRequest

My program reads an ODT file and downloads the links it contains; these links point to PDF files hosted on an intranet. The problem is that instead of 128 downloaded PDFs, I get 128 files whose names match what I expect but which have no extension and are all 18 KB in size.

My question is: why do I get files without an extension, as in the screenshot, instead of PDF files? Is it a redirect problem? I also tried the DownloadFile method and got the same result.

The System.Diagnostics.Process.Start(link); method works, but I can't rename the files because the program only opens the links and doesn't download them itself (the browser does the downloading).

PS: I'm on .NET 3.5 and Visual Studio 2010.

Screenshot of the downloaded files: https://snipboard.io/nwcdM1.jpg

I expected the same names, but with a .pdf extension and sizes larger than 18 KB.

Here is my code:

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;

namespace PrepareDocForExternalUse
{
    class Program
    {
        static void Main(string[] args)
        {
            // Prompts the user for the absolute path to an ODT file
            Console.WriteLine("Please enter the absolute path of an ODT file:");
            string odtFilePath = Console.ReadLine();

            // Read the contents of the ODT file
            byte[] content = File.ReadAllBytes(odtFilePath);
            MemoryStream ms = new MemoryStream();
            ms.Write(content, 0, content.Length);
            ZipFile zf = new ZipFile(ms);
            zf.UseZip64 = UseZip64.Off;
            zf.IsStreamOwner = false;
            ZipEntry entry = zf.GetEntry("content.xml");
            Stream s = zf.GetInputStream(entry);

            // Convert stream to string
            StreamReader reader = new StreamReader(s);
            string contentXml = reader.ReadToEnd();

            // Search for all links that start with "applnet.test.fr"
            string pattern = @"http://applnet\.test\.fr/GetContenu/Download\.aspx\?p1=.*?;p2=.*?;p5=.*?;p6=NOPUB";
            Regex regex = new Regex(pattern);
            MatchCollection matches = regex.Matches(contentXml);

            Directory.CreateDirectory(Path.GetDirectoryName(odtFilePath));

            // Process each link found
            foreach (Match match in matches)
            {
                string link = match.Value;
                string[] parts = link.Split(new string[] { "aspx?" }, StringSplitOptions.None);
                string queryString = parts[parts.Length - 1];


                // Download the corresponding intranet document
                string folderName = Path.GetFileNameWithoutExtension(odtFilePath);
                string subFolderName = "PJ - " + folderName;
                string fileName = queryString;
                string localFilePath = "C:/PiecesJointes/" + fileName;
                string onlineFilePath = "https://com.test.fr/files/test/test/" + queryString;

                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);
                request.AllowAutoRedirect = false;
                request.Method = "GET";
                request.ContentType = "application/pdf";
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                Stream stream = response.GetResponseStream();
                byte[] buffer = new byte[4096];
                int bytesRead = 0;
                FileStream fileStream = new FileStream(localFilePath, FileMode.Create);

                do
                {
                    bytesRead = stream.Read(buffer, 0, buffer.Length);
                    fileStream.Write(buffer, 0, bytesRead);
                } while (bytesRead > 0);

                fileStream.Close();
                response.Close();


                // Replace the link with the path of the downloaded document
                string newLink = localFilePath.Replace("\\", "/");
                contentXml = contentXml.Replace(link, onlineFilePath);
            }

            // Updates the content.xml in the initial ZIP file
            byte[] contentXmlBytes = System.Text.Encoding.UTF8.GetBytes(contentXml);
            ms = new MemoryStream();
            zf.BeginUpdate();

            // Add updated content to ZIP file
            ZipOutputStream zos = new ZipOutputStream(ms);
            zos.UseZip64 = UseZip64.Off;
            zos.IsStreamOwner = false;

            // Add entry for content.xml file
            zos.PutNextEntry(new ZipEntry(entry.Name));
            StreamUtils.Copy(new MemoryStream(contentXmlBytes), zos, new byte[4096]);

            // Processes each entry from the original ODT file
            foreach (ZipEntry origEntry in zf)
            {
                // Ignore the entry for the content.xml file because it has already been added
                if (origEntry.Name == entry.Name) continue;

                // Add entry to new ZIP file
                zos.PutNextEntry(new ZipEntry(origEntry.Name));
                StreamUtils.Copy(zf.GetInputStream(origEntry), zos, new byte[4096]);
            }

            zos.Close();

            // Finish updating the ZIP file
            zf.CommitUpdate();
            zf.Close();


            // Renames and saves the updated ODT file
            Guid g = Guid.NewGuid();
            string updatedFilePath = Path.Combine(Path.GetDirectoryName(odtFilePath), g + "_" + Path.GetFileName(odtFilePath));
            using (FileStream stream = new FileStream(updatedFilePath, FileMode.Create))
            {
                ms.Position = 0;
                ms.WriteTo(stream);
            }
            Console.WriteLine("The ODT file has been successfully updated and saved as: " + updatedFilePath);
            Console.ReadLine();
        }
    }
}

asked Apr 20, 2023 at 13:02 by Gekidow (edited Apr 20, 2023 at 13:04)

"i'm on .NET 3.5 and Visual Studio 2010" - oof. Anyway, at fileName = queryString you don't set a .pdf extension, so why would you expect one to appear? And what do you see if you open the files with a text editor? Probably some HTML telling you you need to log in.

– CodeCaster, commented Apr 20, 2023 at 13:11

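The suspicion in the comment above can also be checked programmatically rather than with a text editor. The following is a minimal sketch, not part of the original program: it tests whether a downloaded byte array starts with the PDF signature "%PDF-" or looks like an HTML page. The names DownloadSniffer, LooksLikePdf and LooksLikeHtml are made up for illustration, and the HTML test is only a rough heuristic.

```csharp
using System;
using System.Text;

static class DownloadSniffer
{
    // A PDF file begins with the ASCII bytes "%PDF-".
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] magic = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }

    // Rough heuristic (an assumption, not a real HTML parser): an HTML page,
    // such as a login form, usually starts with '<' after optional whitespace.
    public static bool LooksLikeHtml(byte[] data)
    {
        if (data == null) return false;
        int length = Math.Min(data.Length, 256);
        string head = Encoding.ASCII.GetString(data, 0, length).TrimStart();
        return head.StartsWith("<", StringComparison.Ordinal);
    }

    static void Main()
    {
        // In-memory samples standing in for downloaded files.
        byte[] pdf = Encoding.ASCII.GetBytes("%PDF-1.4\n...");
        byte[] html = Encoding.ASCII.GetBytes("<!DOCTYPE html><html><body>Please log in</body></html>");

        Console.WriteLine(LooksLikePdf(pdf));   // True
        Console.WriteLine(LooksLikePdf(html));  // False
        Console.WriteLine(LooksLikeHtml(html)); // True
    }
}
```

In the question's program, the same checks could be run on File.ReadAllBytes(localFilePath) after each download to spot files that are really HTML pages.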
"Probably some HTML telling you you need to log in." Exactly. and how can I get the extension of the file downloaded ? because there is PDF files and docx files that can be downloaded

– Gekidow, commented Apr 20, 2023 at 13:16

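On the question of finding the extension when both PDF and DOCX files are served: one possible approach, assuming the intranet server sends a Content-Disposition header with a file name or at least a meaningful Content-Type (the question does not confirm either), is to derive the extension from those headers. The sketch below is illustrative only; ExtensionGuesser and GuessExtension are hypothetical names, and the media-type table covers just a few common cases.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExtensionGuesser
{
    // Returns an extension such as ".pdf", or "" when nothing can be inferred.
    public static string GuessExtension(string contentDisposition, string contentType)
    {
        // 1. Prefer the file name suggested by the server, e.g.
        //    Content-Disposition: attachment; filename="report.pdf"
        if (!string.IsNullOrEmpty(contentDisposition))
        {
            Match m = Regex.Match(contentDisposition,
                "filename\\s*=\\s*\"?([^\";]+)\"?", RegexOptions.IgnoreCase);
            if (m.Success)
            {
                string name = m.Groups[1].Value.Trim();
                int dot = name.LastIndexOf('.');
                if (dot >= 0 && dot < name.Length - 1)
                {
                    return name.Substring(dot).ToLowerInvariant();
                }
            }
        }

        // 2. Fall back to the media type, ignoring parameters such as "; charset=utf-8".
        if (!string.IsNullOrEmpty(contentType))
        {
            string mediaType = contentType.Split(';')[0].Trim().ToLowerInvariant();
            switch (mediaType)
            {
                case "application/pdf":
                    return ".pdf";
                case "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
                    return ".docx";
                case "application/msword":
                    return ".doc";
            }
        }

        return "";
    }

    static void Main()
    {
        // Sample header values; real ones would come from the HTTP response.
        Console.WriteLine(GuessExtension("attachment; filename=\"report.PDF\"", "application/octet-stream")); // .pdf
        Console.WriteLine(GuessExtension(null, "application/pdf"));                                           // .pdf
        Console.WriteLine(GuessExtension(null, "text/html; charset=utf-8") == "");                            // True
    }
}
```

With an HttpWebResponse, the two inputs would be response.Headers["Content-Disposition"] and response.ContentType. An empty result for a text/html response is itself a hint that the server returned a page rather than a document.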
Then modify your code so that it logs in.

– CodeCaster, commented Apr 20, 2023 at 13:17

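How the program should log in depends on how the intranet authenticates users, which the question does not say. The sketch below assumes Windows integrated authentication, where the server accepts the identity of the account running the program; if the site uses a login form instead, the credentials would have to be posted first and the same CookieContainer reused for the downloads. AuthenticatedDownload, CreateRequest and Download are hypothetical names, and the query-string values in Main are invented placeholders.

```csharp
using System;
using System.IO;
using System.Net;

static class AuthenticatedDownload
{
    // Builds a GET request that sends the current Windows credentials
    // and keeps cookies in the supplied container.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // Assumption: the server accepts the Windows identity running this program.
        request.UseDefaultCredentials = true;
        // Follow redirects so the final document is fetched, not the redirect response.
        request.AllowAutoRedirect = true;
        // Share cookies across requests so a session survives between downloads.
        request.CookieContainer = cookies;
        // Accept states what we would like back; Content-Type describes a
        // request body, which a GET does not have.
        request.Accept = "application/pdf, */*";
        return request;
    }

    // Sends the request and copies the response body to a local file.
    public static void Download(string url, string localFilePath, CookieContainer cookies)
    {
        HttpWebRequest request = CreateRequest(url, cookies);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (FileStream output = new FileStream(localFilePath, FileMode.Create))
        {
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, bytesRead);
            }
        }
    }

    static void Main()
    {
        // Only builds the request; nothing is sent over the network here.
        CookieContainer cookies = new CookieContainer();
        HttpWebRequest request = CreateRequest(
            "http://applnet.test.fr/GetContenu/Download.aspx?p1=1;p2=2;p5=3;p6=NOPUB", cookies);
        Console.WriteLine(request.UseDefaultCredentials);
        Console.WriteLine(request.AllowAutoRedirect);
    }
}
```

Creating one CookieContainer before the loop over the links and passing it to every Download call lets all 128 requests share the same session.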
No, a console application has no connection to your browser.

– CodeCaster, commented Apr 20, 2023 at 13:51

No, the HttpClient/WebClient/HttpWebRequest are application components that talk HTTP. A browser also talks HTTP, but one does not depend on the other.

– CodeCaster, commented Apr 20, 2023 at 14:03
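Because the program speaks HTTP on its own, one way to see what the server actually answers is to look at the status code and headers of each response. The sketch below only classifies values that would be read from an HttpWebResponse (StatusCode, ContentType and Headers["Location"]); ResponseDiagnosis and Describe are hypothetical names, and "/Login.aspx" is an invented example path, not something taken from the question.

```csharp
using System;
using System.Net;

static class ResponseDiagnosis
{
    // Turns the key facts of an HTTP response into a short human-readable hint.
    public static string Describe(HttpStatusCode status, string contentType, string location)
    {
        int code = (int)status;

        if (code >= 300 && code < 400)
        {
            return "Redirect to " + (location ?? "(no Location header)") + ", possibly a login page";
        }
        if (status == HttpStatusCode.Unauthorized || status == HttpStatusCode.Forbidden)
        {
            return "Access denied (" + code + "): the request was not authenticated or not authorized";
        }
        if (contentType != null &&
            contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
        {
            return "HTML page returned instead of a document, possibly a login or error page";
        }
        return "Status " + code + ", content type " + (contentType ?? "(none)");
    }

    static void Main()
    {
        // Sample values; real ones would come from response.StatusCode,
        // response.ContentType and response.Headers["Location"].
        Console.WriteLine(Describe(HttpStatusCode.Found, "text/html", "/Login.aspx"));
        Console.WriteLine(Describe(HttpStatusCode.OK, "text/html; charset=utf-8", null));
        Console.WriteLine(Describe(HttpStatusCode.OK, "application/pdf", null));
    }
}
```

With AllowAutoRedirect set to false, as in the question's code, a redirect response is handed back to the caller rather than followed, so a small identical body for every link would be consistent with the first case. For 401 and 403 answers, HttpWebRequest.GetResponse throws a WebException, and the response is then available through its Response property.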
