
I am currently trying to get the last modified date of all files in a 128.5 GB folder containing multiple sub-directories and files. However, whenever the script runs, it uses almost all the memory on the server. (I assume this is because it's trying to fit all of the data in memory before outputting it to a .csv file.) Is there a way we could still output the data to a .csv file without using all the memory on the server? Please find my script below:

$results = Get-ChildItem -Force -Recurse -File -Path "C:\inetpub\wwwroot\" | Sort LastWriteTime -Descending | Select-Object FullName, LastWriteTime 

$results | Export-Csv "C:\Users\serveradmin\Documents\dates.csv" -notype 


  • How many files are there? Get-ChildItem is known to be slow in certain scenarios, and NTFS doesn't really perform too well with a large number of files. Commented Nov 30, 2021 at 10:42
  • Hi vonPryz, the site folder contains around 5,721,620 files. Commented Nov 30, 2021 at 10:47
  • Yes. You use a database, not five million files. A filesystem is not a database. Commented Nov 30, 2021 at 10:54
  • A file system is a database; the problem here is that the OP attempts to load a significant chunk of its metadata into memory when they don't need to. You wouldn't download the raw indices from your DB either :) Commented Nov 30, 2021 at 10:57
  • You might have to use GNU sort on a CSV that big. Also, PS 7 is more memory efficient. Commented Nov 30, 2021 at 17:58

2 Answers


PowerShell can be memory intensive and slow... so I wrote you a script in Python. I tested it on my Mac and it works a charm. I've left notes in the script; just amend the folder path to be scanned and where you want to save the CSV file. It will be faster and use less memory :o)

#Import Python Modules
import os, time
import pandas as pd

#Function to Scan Files (os.walk so sub-directories are included too)
def get_information(directory):
    file_list = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            a = os.stat(full_path)
            #[file, last access, last change (ctime), last modification]
            file_list.append([full_path,
                              time.ctime(a.st_atime),
                              time.ctime(a.st_ctime),
                              time.ctime(a.st_mtime)])
    return file_list

#Enter Folder Path To Be Scanned
flist = get_information("/Users/username/FolderName1/FolderName2/data")
#print(flist)

#Build DataFrame Table
df = pd.DataFrame(flist)

#Insert DataFrame Table Columns
df.columns = ['file name', 'last access time', 'last change time', 'last modification time']

#Print output as test
#print(df)

#Build Filepath for output
src_path = "/Users/username/FolderName1/"
csvfilename = "output.csv"
csvfile = src_path + csvfilename

#Export to CSV
df.to_csv(csvfile, index=False)
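
With several million files, the DataFrame above still holds every row in memory before the CSV is written. If memory is the main concern, one option is to stream each row straight into the CSV with the standard csv module instead. This is only a rough sketch of that idea, not the answer's tested script; dump_file_times is my own placeholder name and the paths are the same placeholders used above.

#Streaming variant (sketch): write each row as it is discovered, so memory use stays flat
import csv
import os
import time

def dump_file_times(directory, csv_path):  # hypothetical helper, not part of the original answer
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['file name', 'last access time',
                         'last change time', 'last modification time'])
        for root, dirs, files in os.walk(directory):
            for name in files:
                full_path = os.path.join(root, name)
                try:
                    a = os.stat(full_path)
                except OSError:
                    continue  # skip unreadable files rather than abort the scan
                writer.writerow([full_path,
                                 time.ctime(a.st_atime),
                                 time.ctime(a.st_ctime),
                                 time.ctime(a.st_mtime)])

dump_file_times("/Users/username/FolderName1/FolderName2/data",
                "/Users/username/FolderName1/output.csv")

The output is unsorted; sorting a file that large is easier done as a separate step afterwards, as in the other answer below.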



For what it's worth, I successfully did this with all 1.8 million files on my hard drive in about 8 minutes.

# 5 min: scan and export unsorted (the pipeline streams one object at a time into Export-Csv)
Get-ChildItem -Force -Recurse -File -ea 0 |
    Select-Object @{n='lastwritetime'; e={$_.LastWriteTime.ToString('yyyy MM dd HH mm ss')}}, FullName |
    Export-Csv sort.csv

# 3 min: sort the flat CSV afterwards
Import-Csv sort.csv | Sort-Object LastWriteTime -Descending | Export-Csv sort2.csv
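
If Import-Csv | Sort-Object still uses too much memory on a CSV with several million rows (one of the comments above suggests reaching for GNU sort), the sorting step can instead be done as an external merge sort. The sketch below is my own addition, not part of this answer: it assumes sort.csv was exported with -NoTypeInformation (or on PowerShell 7, where that is the default) so its first line is the header, and it sorts on the 'lastwritetime' column produced by the command above; external_sort_csv and chunk_size are names I made up for the example.

#Sketch: sort a CSV larger than memory by sorting chunks and merging them with heapq.merge
import csv
import heapq
import os
import tempfile

def _write_sorted_chunk(rows, fieldnames, key_field):
    # Sort one in-memory chunk (newest first) and spill it to a temporary CSV.
    rows.sort(key=lambda r: r[key_field], reverse=True)
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path

def _read_rows(path):
    with open(path, newline='', encoding='utf-8') as f:
        yield from csv.DictReader(f)

def external_sort_csv(src, dst, key_field='lastwritetime', chunk_size=200_000):
    chunk_paths, chunk = [], []
    with open(src, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            chunk.append(row)
            if len(chunk) >= chunk_size:
                chunk_paths.append(_write_sorted_chunk(chunk, fieldnames, key_field))
                chunk = []
    if chunk:
        chunk_paths.append(_write_sorted_chunk(chunk, fieldnames, key_field))

    # Merge the pre-sorted chunks lazily; only one row per chunk is held at a time.
    with open(dst, 'w', newline='', encoding='utf-8') as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        merged = heapq.merge(*(_read_rows(p) for p in chunk_paths),
                             key=lambda r: r[key_field], reverse=True)
        writer.writerows(merged)

    for p in chunk_paths:
        os.remove(p)

external_sort_csv('sort.csv', 'sort2.csv')

Because the 'yyyy MM dd HH mm ss' format sorts correctly as plain text, string comparison is enough here; peak memory is roughly one chunk plus one row per chunk during the merge.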

