Uploading PDF's to SharePoint Online

The problem...

I have something like 1200 PDfs, and the list is growing, of deliverables that we own the copywrite to that we want to put in a central place called SharePoint Online. This library will then get the benefit of being searchable, have associated metadata, and should free up time from a bunch of people for having to search for this information manually.

The solution...

This took a little while to get to, and some of the iterative steps that transpired between inception and solution are long since gone. That's OK, that's how the process worked. I had to laugh at myself a bit when I started searching for some solutions to the problems I ran into, and found my own posts on the topic...

Anyhow, on to the general program idea.

I don't know where all the content we were going to reference is, but our amazing team of office administrators do. They wound up collecting shortcuts (like, Windows document shortcuts) in a folder that referenced the files they were wanting to use. The shortcuts would be named a standard way (project number, deliverable type, and an index counter). This same identifier would go into a spreadsheet - along with the associated metadata, collected by our equally amazing records management person.

That left me needing to collect data from a spreadsheet (hello ImportExcel), and read data from a Shortcut .lnk file (Get-Shortcut from PoshFunctions.)

So - I'd read the content of the excel file, look for a corresponding shortcut file. If the shortcut target existed, watermark the PDF (using itext), then pack all this up and upload it to SharePoint Online using PnP Powershell.

Easy.

Tools / Resources used

Notes from the travels along the way

Most of this was straightforward. Outside of some variable definitions, my main function is pretty straightforward:

 1function Load-FinalReports() {
 2	[CmdletBinding(SupportsShouldProcess = $true)]
 3	param(
 4		# How many records to load
 5		[Parameter()]
 6		[Int]
 7		$HowManyToLoad = 100,
 8
 9		# Validate only. Do not post to SPO.
10		[Parameter()]
11		[Switch]
12		$ValidateOnly
13	)
14
15	$Metadata =
16	$ShortcutsPath =
17	$SPOUploadList = join-path $PSScriptRoot "ReportsAddedToSPO.xml"
18	$URL =
19	$Library = "Final Reports"
20
21	# Load content from sheet
22	$SheetData = Get-FinalReportSheetContent -XLSFilename $Metadata -SheetName "Ready to Add"
23
24	# Get shortcut target information
25	$RecordsToUpload, $RecordsWithErrors = Validate-SheetLinks -SheetInfo $SheetData -ShortcutsPath $ShortcutsPath
26
27	# Show errors. These are Marilyn's problems.
28	write-warning "Records with errors:"
29	$RecordsWithErrors | select-object ID, ErrorType | Out-String | Write-Output
30
31	if (!$ValidateOnly) {
32		# import the data we think we've uploaded to SPO
33		$LoadedToSPO = [System.Collections.ArrayList]@()
34		if (test-path $SPOUploadList) {
35			$LoadedToSPO = [System.Collections.ArrayList]@(import-clixml -Path $SPOUploadList)
36		}
37
38		# Remove the data we've already uploaded from our list of content to upload
39		$RemoveFromDataToLoad = [System.Collections.ArrayList]@()
40		foreach ($Record in $RecordsToUpload) {
41			if ($Record.ID -in $LoadedToSPO.ID) {
42				write-verbose "$($Record.ID) has already been uplaoded?"
43				$RemoveFromDataToLoad.Add($Record) | out-null
44			}
45		}
46		foreach ($Record in $RemoveFromDataToLoad) {
47			$RecordsToUpload.Remove($Record)
48		}
49
50		# Do the uploading.
51		Connect-PnPOnline -ClientId $_CLIENT_ID -Url $URL -Tenant $_TENANT -CertificatePath $_CERTIFICATE
52
53		$Counter = 0
54		foreach ($Record in $RecordsToUpload) {
55			$Counter++
56			if ($Counter -gt $HowManyToLoad) {
57				break
58			}
59			Write-Progress -Activity "Loading to SharePoint" -Status "Record $($Counter) of $($RecordsToUpload.Count)" -PercentComplete (100 * $Counter / $RecordsToUpload.Count)
60			$LoadState = Add-SPORecord -RecordToLoad $Record -Library $Library
61			if ($LoadState) {
62				$LoadedToSPO.Add($Record) | out-null
63			}
64		}
65		Write-Progress -Activity "Loading to SharePoint" -Status "Complete" -Completed
66		$LoadedToSPO | Export-Clixml -Path $SPOUploadList -WhatIf:$false
67	}
68}
69
70Load-FinalReports -HowManyToLoad 5000

A few parameters exist: -HowManyToLoad should be pretty clear - it limits how many records I am importing at a given time. This only applies when the records are getting loaded to SharePoint Online, and not when collecting data from the XLS or shortcuts.

The function Get-FinalReportSheetContent does what it says on the tin, except in the function, I collect a hash of the XLS file and save it. On subsequent runs, I'll compare the saved hash against the current file hash, and if it's mismatched, I reload data from the XLS file. If it's the same, I instead load the content from an existing XML file (that I write to disk at the end of the function). This saves some time during import in the event that the data remains unchanged.

Validate-SheetLinks also does what it says on the tin, but it returns two arrays - successful data and error data. Successful data is the information passed in from Get-FinalReportSheetContent with added properties (the target path for the file to be uploaded), while the error data generates a list of IDs that either have a problem with the shortcut, or the target file is not a PDF. Both these error conditions require manual intervention.

This exercise taught me that I can return multiple values from a function in PowerShell.

We do some checks and manipulation of the array of data to be uploaded (if we think the file has already been uploaded, we'll skip it), then connect to SharePoint Online using PnP, and an AzureAD Application, and start uploading content.

This is where stuff gets spicy.

Taxonomy from a Term Store

We decided early on to put several terms into the site's Term Store instead of an individual's library taxonomy terms. This would (should?) enable some common data between different libraries within the same site.

This poses a small challenge when adding content. The formatting of a value for the term store is Term Group|Termset|Term - so we have our group is "Project Data", Termset is "Author" and the Term is each individual author. The metadata in the spreadsheet has this information as comma separated, so a small function was written to handle this conversion:

 1function Fix-AuthorsForTermStore($TextString) {
 2	$Authors = $null
 3	if ($TextString.Count -gt 0) {
 4		$Authors = $TextString -split ','
 5		$Authors = $Authors.Trim()
 6		$Authors = $Authors | ForEach-Object { if ($_.Length -gt 0) { "Project Data|Authors|$($_)" } }
 7	} else {
 8		$Authors = $null
 9	}
10	return $Authors
11}

Also to note, that if a term was missing from the term store, the record would still get created in SharePoint Online, but the insertion would generate a warning. I'm sure there's a way to capture these errors, but I opted for the manual method (read the console output).

Improperly formatted metadata

In the early stages, we assumed that some of the metadata would be a single line. This turned out not to be true and, during the record insertion, would cause all the metadata to fail. The file would load, but all the field (except for the uploaded file) would be blank. This surfaced in a few places (single line / multiline), but also in an early bug in the Fix-AuthorsForTermStore function where the term would sometime return blank instead of null.

Watermarking

Initial Setup

This turned out taking a lot more time and energy than expected, but the output turned out to be right on point. I'm using the iText7 C# libraries to do this work for me, which needs a bit of work get running in PowerShell (see Nuget in PowerShell).

So.. I bring in all the libraries using something like this:

1Install-Package -Name itext7 -ProviderName nuget -RequiredVersion 7.1.17 -Scope CurrentUser -Destination ./lib -SkipDependencies
2Install-Package -Name Portable.BouncyCastle -ProviderName nuget -RequiredVersion 1.8.9.0 -Destination ./lib -SkipDependencies
3Install-Package -Name Common.Logging -ProviderName nuget -RequiredVersion 3.4.1.0 -Destination ./lib -SkipDependencies
4Install-Package -Name Common.Logging.Core -ProviderName nuget -RequiredVersion 3.4.1.0 -Destination ./lib -SkipDependencies

I'm staying in the 7.1 branch of iText7, since this allows me to use the .NET 4.0 compiled DLL's. 7.2+ is compiled for 6.5 and above (I think).

The result here is that all the libraries I need are in the .\lib folder. Next, I need to import them. The old iText libraries were a few DLLs, but in newer versions all the pieces are split into multiple parts. Importing all the DLL's can be accelerated with something like:

1$DLLFiles = get-childitem -path (join-path $PSScriptRoot "lib") -Filter "*.dll" -Recurse  | Where-Object { $_.DirectoryName -like "*net40*" -or $_.DirectoryName -like "*net45*" }
2foreach ($DLLFile in $DLLFiles ) {
3	try {
4		Add-Type -Path $DLLFile.FullName
5		# Write-Host $DLLFile.FullName
6	} catch {
7		# write-warning $DLLFile.FullName
8	}
9}

Making it easier to use in PowerShell

So, that brings all the libraries into the environment, but I would then need to reference the fully qualified name of the library - like iText.Kernel.Pdf.PdfReader. This was a bit tedious, but a Reddit post helped solve this. By using the ..using operator (which I didn't know existed), I am able to reference PdfReader without the full namespace:

 1using namespace iText.IO.Font;
 2using namespace iText.IO.Font.Constants;
 3using namespace iText.Kernel.Colors;
 4using namespace iText.Kernel.Font;
 5using namespace iText.Kernel.Pdf;
 6using namespace iText.Kernel.Pdf.Canvas;
 7using namespace iText.Kernel.Pdf.Extgstate;
 8using namespace iText.Layout;
 9using namespace iText.Layout.Element;
10using namespace iText.Layout.Properties;
11using namespace iText.Commons;

This has to exist at the top of my script, but otherwise does the trick. As an alternative, you can add custom accelerators:

1$Accelerators = [Psobject].Assembly.GetType('System.Management.Automation.TypeAccelerators')
2$Accelerators::Add("Rectangle", "iText.Kernel.Geom.Rectangle")
3$Accelerators::Add("BadPasswordException", "iText.Kernel.Crypto.BadPasswordException")

Applying the watermark

The iText site has some great examples of watermarking, found here: https://kb.itextpdf.com/home/it7kb/examples/watermark-examples. This was the foundation of the script that I wrote, with some modifications.

I wanted the watermark to be corner to corner, instead of printed horizontally along the page. I also wanted the font to be "as big as possible". Reading https://kb.itextpdf.com/home/it7kb/faq/how-to-choose-the-optimal-size-for-a-font was a bit unclear on find the font size, but wound up being computed like so:

1$fontSize = 1000 * [math]::Sqrt([math]::pow($pageSize.GetWidth(),2) +[math]::pow($pageSize.GetHeight(),2))*.8 / $font.GetWidth($textString)
2[Paragraph]$Paragraph = [Paragraph]::new($textString)
3$Paragraph.SetFont($Font).SetFontSize($fontSize)
4"Font size is $($fontSize)"

Simply put, I'm finding how long the angled line is Pythagorean theorem, multiplying by 1000 then by 0.8 (so it's 80% maximum widdth), then dividing it by $font.GetWidth($textstring) (which has already been defined with a font face).

Dealing with passwords

Some of the PDFs may have had passwords on them; turns ou the iText reader object had a method to just.. ignore passwords:

1[PdfReader]$PDFInput = [PdfReader]::new($inputFile)
2$PDFInput.SetUnethicalReading($true)

Which is really quite nice, and makes having to handle the password (which may be different...) really nice. At the tail end, I do want to re-password protect the file with the original restrictions:

1$enc = [system.Text.Encoding]::UTF8
2
3[WriterProperties]$writerProperties = [WriterProperties]::new()
4$writerProperties.SetStandardEncryption($null, $enc.GetBytes("supersecurepassword"), [EncryptionConstants]::ALLOW_PRINTING -bor [EncryptionConstants]::ALLOW_MODIFY_ANNOTATIONS -bor [EncryptionConstants]::ALLOW_SCREENREADERS, [EncryptionConstants]::ENCRYPTION_AES_128)
5
6[Pdfwriter]$PDFOutput = [PdfWriter]::new($outputFile, $writerProperties)

Note how all the encryption options are bitwise OR'd together.

All wrapped up

The function, and required setup, can be found at the end of this post!

Final Notes

During testing, I accidentally unset the textString parameter. This generated an error somewhere around line 68 - on the canvasWatermark3:

1Exception calling "ShowTextAligned" with "7" argument(s): "Object reference not set to an instance of an object."
2At Watermark PDF\test.ps1:68 char:4
3+             $canvasWatermark3.ShowTextAligned($Paragraph, $x, $y, $i, ...

This was a bit of red herring - the actual issue is that $Paragraph is undefined, not $canvasWatermark3.

Final script

All said and done, the final script takes the following shape. It isn't perfect (some variables in weird places, some stuff not a variable that should be, etc), but it did the trick.

  1using namespace iText.IO.Font;
  2using namespace iText.IO.Font.Constants;
  3using namespace iText.Kernel.Colors;
  4using namespace iText.Kernel.Font;
  5using namespace iText.Kernel.Pdf;
  6using namespace iText.Kernel.Pdf.Canvas;
  7using namespace iText.Kernel.Pdf.Extgstate;
  8using namespace iText.Layout;
  9using namespace iText.Layout.Element;
 10using namespace iText.Layout.Properties;
 11using namespace iText.Commons;
 12
 13#Requires -Modules ImportExcel, PoshFunctions
 14
 15$_CLIENT_ID =
 16$_TENANT =
 17$_CERTIFICATE = join-path $PSScriptRoot "Certificate for connecting to SPO.pfx"
 18
 19# Load library files
 20$DLLFiles = get-childitem -path (join-path $PSScriptRoot "lib") -Filter "*.dll" -Recurse  | Where-Object { $_.DirectoryName -like "*net40*" -or $_.DirectoryName -like "*net45*" }
 21foreach ($DLLFile in $DLLFiles ) {
 22	try {
 23		Add-Type -Path $DLLFile.FullName
 24		# Write-Host $DLLFile.FullName
 25	} catch {
 26		# write-warning $DLLFile.FullName
 27	}
 28}
 29
 30# Type accellerators so I don't have to wreite the fully qualified name each time.
 31$Accelerators = [Psobject].Assembly.GetType('System.Management.Automation.TypeAccelerators')
 32$Accelerators::Add("Rectangle", "iText.Kernel.Geom.Rectangle")
 33$Accelerators::Add("BadPasswordException", "iText.Kernel.Crypto.BadPasswordException")
 34
 35function Set-Watermark($inputFile, $outputFile, $textString = "For Internal Use Only") {
 36	$Opacity = 0.25
 37
 38	$enc = [system.Text.Encoding]::UTF8
 39
 40	[WriterProperties]$writerProperties = [WriterProperties]::new()
 41	$writerProperties.SetStandardEncryption($null, $enc.GetBytes("LibrarY07"), [EncryptionConstants]::ALLOW_PRINTING -bor [EncryptionConstants]::ALLOW_MODIFY_ANNOTATIONS -bor [EncryptionConstants]::ALLOW_SCREENREADERS, [EncryptionConstants]::ENCRYPTION_AES_128)
 42
 43	[PdfReader]$PDFInput = [PdfReader]::new($inputFile)
 44	$PDFInput.SetUnethicalReading($true)
 45
 46	try {
 47		[Pdfwriter]$PDFOutput = [PdfWriter]::new($outputFile, $writerProperties)
 48
 49		[PdfDocument]$pdfDoc = [PdfDocument]::new($PDFInput, $PDFOutput)
 50
 51		[PdfFont]$Font = [PdfFontFactory]::CreateFont([FontProgramFactory]::CreateFont([StandardFonts]::HELVETICA))
 52
 53		[PdfExtGState]$gs1 = [PdfExtGState]::new()
 54		$gs1.SetFillOpacity($Opacity)
 55
 56		for ($i = 1; $i -le $pdfDoc.GetNumberOfPages(); $i++) {
 57			[PdfPage]$pdfPage = $pdfDoc.GetPage($i);
 58			[Rectangle]$pageSize = $pdfPage.GetPageSizeWithRotation();
 59
 60			# When "true": in case the page has a rotation, then new content will be automatically rotated in the
 61			# opposite direction. On the rotated page this would look as if new content ignores page rotation.
 62			$pdfPage.SetIgnorePageRotationForContent($true);
 63
 64			$x = ($pageSize.GetLeft() + $pageSize.GetRight()) / 2;
 65			$y = ($pageSize.GetTop() + $pageSize.GetBottom()) / 2;
 66			[PdfCanvas]$over = [PdfCanvas]::new($pdfDoc.GetPage($i));
 67			$over.SaveState();
 68			$over.SetExtGState($gs1);
 69
 70			$fontSize = 1000 * [math]::Sqrt([math]::pow($pageSize.GetWidth(), 2) + [math]::pow($pageSize.GetHeight(), 2)) * .8 / $font.GetWidth($textString)
 71			[Paragraph]$Paragraph = [Paragraph]::new($textString)
 72			$Paragraph.SetFont($Font).SetFontSize($fontSize)
 73			"Font size is $($fontSize)"
 74
 75			[Canvas]$canvasWatermark3 = [Canvas]::new($Over, $PageSize)
 76			$canvasWatermark3.ShowTextAligned($Paragraph, $x, $y, $i, [TextAlignment]::CENTER, [VerticalAlignment]::TOP, -45);
 77			$canvasWatermark3.Close()
 78
 79			$over.RestoreState();
 80		}
 81		$pdfDoc.Close()
 82	} catch [BadPasswordException] {
 83		Write-Warning "Incorrect password specified for file."
 84	} catch [System.IO.IOException] {
 85		write-warning "Could not open the output file for writing. Is it open in Adobe?"
 86
 87	} finally {
 88
 89		$PDFInput.Close()
 90		try { $PDFOutput.Close() } catch {}
 91		try { $pdfDoc.Close() } catch {}
 92	}
 93}
 94
 95function Fix-AuthorsForTermStore($TextString) {
 96	$Authors = $null
 97	if ($TextString.Count -gt 0) {
 98		$Authors = $TextString -split ','
 99		$Authors = $Authors.Trim()
100		$Authors = $Authors | ForEach-Object { if ($_.Length -gt 0) { "Project Data|Authors|$($_)" } }
101	} else {
102		$Authors = $null
103	}
104	return $Authors
105}
106
107function Get-FinalReportSheetContent {
108	[CmdletBinding()]
109	param (
110		# XLS File
111		[Parameter(Mandatory)]
112		[String]
113		$XLSFilename,
114
115		# Sheet Name
116		[Parameter(Mandatory)]
117		[String]
118		$SheetName,
119
120		# Force
121		[Parameter()]
122		[Switch]
123		$Force
124	)
125
126	process {
127		$XLSFile = get-item $XLSFilename
128
129		$CacheFileMD5 = join-path $PSScriptRoot "$($XLSFile.BaseName).cacheMD5.txt"
130		$CacheFile = join-path $PSScriptRoot "$($XLSFile.BaseName).cache.xml"
131
132		if ($Force) {
133			remove-item $CacheFileMD5
134			remove-item $CacheFile
135		}
136
137		# Get the MD5 of the current file
138		$CurrentFileMD5 = (Get-FileHash -Path $XLSFilename).Hash
139		$CacheMD5 = get-content -Path $CacheFileMD5 -ErrorAction SilentlyContinue
140
141		$XLSContent = [System.Collections.ArrayList]@()
142
143		# Check that the MD5's match.
144		if ($CurrentFileMD5 -eq $CacheMD5) {
145			$XLSContent = Import-Clixml $CacheFile
146		} else {
147			# MD5 do not match. Remove the old cache file
148			# This may not exist already. That's OK.
149			Remove-Item $CacheFile -ErrorAction SilentlyContinue
150
151			$Counter = 0
152			$XLSData = import-excel $XLSFile.FullName -WorksheetName $SheetName
153			$FilteredData = $XLSData | Where-Object { $null -ne $_."ID" }
154
155			foreach ($Record in $FilteredData) {
156				$Counter++
157				Write-Progress -Activity "Importing records from XLS" -Status "Record $($Counter) of $($FilteredData.Count)" -PercentComplete (100 * $Counter / $FilteredData.Count)
158				$UniqueIDs = [System.Collections.ArrayList]@(($Record.'ID' -split ',').Trim())
159				foreach ($uniqueID  in $uniqueIDs) {
160					$RecordToLoad = $Record | ConvertTo-Json | ConvertFrom-Json
161					$XLSContent.Add($RecordToLoad) | out-null
162				}
163			}
164			Write-Progress -Activity "Importing records from XLS" -Status "Ready" -Completed
165
166			# Save the cache file
167			$XLSContent | Export-Clixml -Path $CacheFile
168			$CurrentFileMD5 | Set-Content -Path $CacheFileMD5 -Force
169		}
170
171		return $XLSContent
172	}
173}
174
175function Validate-SheetLinks {
176	[CmdletBinding()]
177	param (
178		# Sheet Information
179		[Parameter(Mandatory)]
180		[Object[]]
181		$SheetInfo,
182
183		# Shortcuts Path - Folder containing shortcuts
184		[Parameter(Mandatory)]
185		[String]
186		$ShortcutsPath
187	)
188
189	process {
190		$ShortcutsWithProblems = [System.Collections.ArrayList]@()
191		$FilesReadyToUpload = [System.Collections.ArrayList]@()
192		$Counter = 0
193		foreach ($Record in $SheetInfo) {
194			$Counter++
195			Write-Progress -Activity "Validating Shortcuts" -Status "Record $($Counter) of $($SheetInfo.Count)" -PercentComplete (100 * $Counter / $SheetInfo.Count)
196			$ShortCutFile = Join-Path $ShortcutsPath "$($Record.ID).lnk"
197
198			# Test if the file exists
199			$FoundFile = test-path $ShortCutFile
200			if (!$FoundFile) {
201				$Record | Add-Member -MemberType NoteProperty -Name "ErrorType" -Value "Shortcut target does not exist"
202				$ShortcutsWithProblems.Add($Record) | out-null
203				continue
204			}
205
206			# Test if the file is a PDF
207			$FileObject = get-item ((Get-Shortcut -Path $ShortCutFile).TargetPath)
208			if ($FileObject.Extension -ne ".pdf") {
209				$Record | Add-Member -MemberType NoteProperty -Name "ErrorType" -Value "Shortcut target is not PDF"
210				$ShortcutsWithProblems.Add($Record) | out-null
211				continue
212			}
213
214			$Record | Add-Member -MemberType NoteProperty -Name TargetFilePath -Value ((Get-Shortcut -Path $ShortCutFile).TargetPath)
215			$FilesReadyToUpload.Add($Record) | Out-Null
216		}
217		Write-Progress -Activity "Validating Shortcuts" -Status "Done" -Completed
218
219		return $FilesReadyToUpload, $ShortcutsWithProblems
220	}
221}
222
223function Add-SPORecord {
224	[CmdletBinding(SupportsShouldProcess = $true)]
225	param (
226		# Record
227		[Parameter(Mandatory)]
228		[Object]
229		$RecordToLoad,
230
231		# Library
232		[Parameter(Mandatory)]
233		[String]
234		$Library
235	)
236
237	process {
238		$AuthorsArray = Fix-AuthorsForTermStore ($RecordToLoad."Authors")
239		$ReviewersArray = Fix-AuthorsForTermStore ($RecordToLoad."Reviewers")
240
241		if ($PSCmdlet.ShouldProcess($RecordToLoad.ID, "Add")) {
242			$ValuesHashMap = @{
243				Project              = $RecordToLoad."Project";
244				Job                  = $RecordToLoad."Job";
245				Year                 = $RecordToLoad."Year";
246				_ExtendedDescription = $RecordToLoad."Report Title";
247				Client               = $RecordToLoad.Client;
248				Date                 = $RecordToLoad."Month/Day";
249				PDF                  = (Get-Culture).TextInfo.ToTitleCase($RecordToLoad.'PDF');
250				ReportType           = $RecordToLoad."Report Type";
251				Notes                = $RecordToLoad.Note;
252				Notice               = $RecordToLoad.Notice;
253				ExecSummary          = $RecordToLoad."Exec Summary"
254			}
255			if ($null -ne $AuthorsArray) {
256				$ValuesHashMap."Authors" = @($AuthorsArray)
257			}
258			if ($null -ne $ReviewersArray) {
259				$ValuesHashMap."Reviewers" = @($ReviewersArray)
260			}
261
262			WRite-verbose $RecordToLoad.TargetFilePath
263			write-host "ID: $($RecordToLoad.ID)"
264			$ValuesHashMap | Out-String | write-verbose
265
266			$SourceFileObject = get-item $RecordToLoad.TargetFilePath
267			$WatermarkedFilePath = join-path ($env:TEMP) $SourceFileObject.Name
268			Set-Watermark -inputFile $RecordToLoad.TargetFilePath -outputFile $WatermarkedFilePath
269
270			$AddFileResult = Add-PnPFile -Path $WatermarkedFilePath -Folder $Library -Values $ValuesHashMap
271
272			Remove-item $WatermarkedFilePath
273
274			return $AddFileResult
275		}
276		return $null
277	}
278}
279
280function Load-FinalReports() {
281	[CmdletBinding(SupportsShouldProcess = $true)] # In function header
282	param(
283		# How many records to load
284		[Parameter()]
285		[Int]
286		$HowManyToLoad = 100,
287
288		# Validate only. Do not post to SPO.
289		[Parameter()]
290		[Switch]
291		$ValidateOnly
292	)
293
294	$Metadata = Join-Path $PSScriptRoot '.\list.xlsx'
295	$ShortcutsPath =
296	$SPOUploadList = join-path $PSScriptRoot "ReportsAddedToSPO.xml"
297	$URL =
298	$Library = "Final Reports"
299
300	# Load content from sheet
301	$SheetData = Get-FinalReportSheetContent -XLSFilename $Metadata -SheetName "Ready to Add"
302
303	# Get shortcut target information
304	$RecordsToUpload, $RecordsWithErrors = Validate-SheetLinks -SheetInfo $SheetData -ShortcutsPath $ShortcutsPath
305
306	# Show errors. These are Marilyn's problems.
307	write-warning "Records with errors:"
308	$RecordsWithErrors | select-object ID, ErrorType | Out-String | Write-Output
309
310	if (!$ValidateOnly) {
311		# import the data we think we've uploaded to SPO
312		$LoadedToSPO = [System.Collections.ArrayList]@()
313		if (test-path $SPOUploadList) {
314			$LoadedToSPO = [System.Collections.ArrayList]@(import-clixml -Path $SPOUploadList)
315		}
316
317		# Remove the data we've already uploaded from our list of content to upload
318		$RemoveFromDataToLoad = [System.Collections.ArrayList]@()
319		foreach ($Record in $RecordsToUpload) {
320			if ($Record.ID -in $LoadedToSPO.ID) {
321				write-verbose "$($Record.ID) has already been uplaoded?"
322				$RemoveFromDataToLoad.Add($Record) | out-null
323			}
324		}
325		foreach ($Record in $RemoveFromDataToLoad) {
326			$RecordsToUpload.Remove($Record)
327		}
328
329		# Do the uploading.
330		Connect-PnPOnline -ClientId $_CLIENT_ID -Url $URL -Tenant $_TENANT -CertificatePath $_CERTIFICATE
331		$Counter = 0
332		foreach ($Record in $RecordsToUpload) {
333			$Counter++
334			if ($Counter -gt $HowManyToLoad) {
335				break
336			}
337			Write-Progress -Activity "Loading to SharePoint" -Status "Record $($Counter) of $($RecordsToUpload.Count)" -PercentComplete (100 * $Counter / $RecordsToUpload.Count)
338			$LoadState = Add-SPORecord -RecordToLoad $Record -Library $Library
339			if ($LoadState) {
340				$LoadedToSPO.Add($Record) | out-null
341			}
342		}
343		Write-Progress -Activity "Loading to SharePoint" -Status "Complete" -Completed
344		$LoadedToSPO | Export-Clixml -Path $SPOUploadList -WhatIf:$false
345	}
346}
347
348Load-FinalReports -HowManyToLoad 5000