
I am currently trying to get the last modified date of all files in a 128.5 GB folder containing multiple sub-directories and files. However, whenever the script runs, it uses almost all the memory on the server. (I assume this is because it's trying to fit all of the data in memory before outputting it to a .csv file.) Is there a way we could still output the data to a .csv file without using all the memory on the server? Please find my script below:

$results = Get-ChildItem -Force -Recurse -File -Path "C:\inetpub\wwwroot\" | Sort LastWriteTime -Descending | Select-Object FullName, LastWriteTime 

$results | Export-Csv "C:\Users\serveradmin\Documents\dates.csv" -notype 


  • How many files are there? Get-ChildItem is known to be slow in certain scenarios, and NTFS doesn't really perform too well with a large number of files. Commented Nov 30, 2021 at 10:42
  • Hi vonPryz, the site folder contains around 5,721,620 files. Commented Nov 30, 2021 at 10:47
  • Yes. You use a database, not five million files. A filesystem is not a database. Commented Nov 30, 2021 at 10:54
  • A file system is a database; the problem here is that the OP attempts to load a significant chunk of its metadata into memory when they don't need to. You wouldn't download the raw indices from your DB either :) Commented Nov 30, 2021 at 10:57
  • You might have to use GNU sort on a CSV that big. Also, PS 7 is more memory efficient. Commented Nov 30, 2021 at 17:58

2 Answers


PowerShell can be memory intensive and slow... so I wrote you a script in Python. I tested it on my Mac and it works a charm. I've left notes in the script; just amend the folder path to be scanned and where you want to save the CSV file. It will be faster and use less memory :o)

#Import Python Modules
import os, time
import pandas as pd

#Function to Scan Files (os.walk so sub-directories are included too)
def get_information(directory):
    file_list = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            a = os.stat(full_path)
            #[file, last access, last change (ctime), last modification]
            file_list.append([full_path,
                              time.ctime(a.st_atime),
                              time.ctime(a.st_ctime),
                              time.ctime(a.st_mtime)])
    return file_list

#Enter Folder Path To Be Scanned
flist = get_information("/Users/username/FolderName1/FolderName2/data")
#print(flist)

#Build DataFrame Table
df = pd.DataFrame(flist)

#Insert DataFrame Table Columns
df.columns = ['file name', 'last access time', 'last change time', 'last modification time']

#Print output as test
#print(df)

#Build Filepath for output
src_path = "/Users/username/FolderName1/"
csvfilename = "output.csv"
csvfile = src_path + csvfilename

#Export to CSV
df.to_csv(csvfile, index=False)
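
With several million files, the DataFrame above still holds every row in memory before the CSV is written. If memory is the main concern, one option is to stream each row straight into the CSV with the standard csv module instead. This is only a rough sketch of that idea, not the answer's tested script; dump_file_times is my own placeholder name and the paths are the same placeholders used above.

#Streaming variant (sketch): write each row as it is discovered, so memory use stays flat
import csv
import os
import time

def dump_file_times(directory, csv_path):  # hypothetical helper, not part of the original answer
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['file name', 'last access time',
                         'last change time', 'last modification time'])
        for root, dirs, files in os.walk(directory):
            for name in files:
                full_path = os.path.join(root, name)
                try:
                    a = os.stat(full_path)
                except OSError:
                    continue  # skip unreadable files rather than abort the scan
                writer.writerow([full_path,
                                 time.ctime(a.st_atime),
                                 time.ctime(a.st_ctime),
                                 time.ctime(a.st_mtime)])

dump_file_times("/Users/username/FolderName1/FolderName2/data",
                "/Users/username/FolderName1/output.csv")

The output is unsorted; sorting a file that large is easier done as a separate step afterwards, as in the other answer below.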



For what it's worth, I successfully did this with all 1.8 million files on my hard drive in about 8 minutes.

# 5 min: scan and export unsorted (the pipeline streams one object at a time into Export-Csv)
Get-ChildItem -Force -Recurse -File -ea 0 |
    Select-Object @{n='lastwritetime'; e={$_.LastWriteTime.ToString('yyyy MM dd HH mm ss')}}, FullName |
    Export-Csv sort.csv

# 3 min: sort the flat CSV afterwards
Import-Csv sort.csv | Sort-Object LastWriteTime -Descending | Export-Csv sort2.csv
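
If Import-Csv | Sort-Object still uses too much memory on a CSV with several million rows (one of the comments above suggests reaching for GNU sort), the sorting step can instead be done as an external merge sort. The sketch below is my own addition, not part of this answer: it assumes sort.csv was exported with -NoTypeInformation (or on PowerShell 7, where that is the default) so its first line is the header, and it sorts on the 'lastwritetime' column produced by the command above; external_sort_csv and chunk_size are names I made up for the example.

#Sketch: sort a CSV larger than memory by sorting chunks and merging them with heapq.merge
import csv
import heapq
import os
import tempfile

def _write_sorted_chunk(rows, fieldnames, key_field):
    # Sort one in-memory chunk (newest first) and spill it to a temporary CSV.
    rows.sort(key=lambda r: r[key_field], reverse=True)
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path

def _read_rows(path):
    with open(path, newline='', encoding='utf-8') as f:
        yield from csv.DictReader(f)

def external_sort_csv(src, dst, key_field='lastwritetime', chunk_size=200_000):
    chunk_paths, chunk = [], []
    with open(src, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            chunk.append(row)
            if len(chunk) >= chunk_size:
                chunk_paths.append(_write_sorted_chunk(chunk, fieldnames, key_field))
                chunk = []
    if chunk:
        chunk_paths.append(_write_sorted_chunk(chunk, fieldnames, key_field))

    # Merge the pre-sorted chunks lazily; only one row per chunk is held at a time.
    with open(dst, 'w', newline='', encoding='utf-8') as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        merged = heapq.merge(*(_read_rows(p) for p in chunk_paths),
                             key=lambda r: r[key_field], reverse=True)
        writer.writerows(merged)

    for p in chunk_paths:
        os.remove(p)

external_sort_csv('sort.csv', 'sort2.csv')

Because the 'yyyy MM dd HH mm ss' format sorts correctly as plain text, string comparison is enough here; peak memory is roughly one chunk plus one row per chunk during the merge.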

