SSIS: Control Flow Using a FTP Server File Modification Date

SSIS Control Flow FTP File Modification Date

The Scenario, or Where Did My FTP File’s Modification Date Go?

You have this ETL process where, like it or not, you have to pull a source file from a FTP server. You’ve decided you want to add a little intelligence to the process. Maybe you only want to reprocess the file if it’s changed since the last time you loaded it. Maybe you want to make sure it’s changing regularly to confirm that whatever external process that’s creating the file and placing it on the FTP server for you to grab is still running like it’s supposed to.

So you use your SSIS ETL FTP task to download the file. Once downloaded, you start creating some conditional events in your ETL control logic based on the file’s dates. You notice as you look more closely at the file that the modified date is the date your ETL process pulls the file over, not the file’s original modified date. Well, that’s not going to help you with the modified date control logic you were thinking of implementing.

A Solution For Getting a FTP File’s Modification Date

Like any good developer, you start thinking of different approaches to solve the FTP file date problem. If you have a zipped or otherwise archived compressed container file, you can usually get the original dates of the files contained within when you extract them. But what if the file you’re grabbing isn’t in an archived format?

What you’d really like to get is the modified date of the file from the FTP server itself. If you could get the date without downloading the file, even better, because you could make a decision if you even needed to re-download the file again in your ETL control logic. The good news is, you CAN access FTP file dates in this way! Let’s walk through how this can be accomplished and a scenario of how you might use this in an SSIS ETL package.

Let’s start by looking at this sample package, which you can download and follow along with here. In this sample package an HTML version of the book Alice in Wonderland is download from Project Gutenberg.

SSIS FTP File Modification Date 1-1

The first step of this package prepares the connection manager for the FTP command. It is detailed in my article on setting SSIS FTP Connection manager connection and credential info from a variable.

In the second step, we check if the file exists on the FTP server and get its modification date. Have a look at the code below that performs these operations:


public void Main() {
    // Get FTP Connection String from package variable.
    string FTPConnS = Dts.Variables["FTPConnStr"].Value.ToString();

    // Set up a Name Value collection to hold each of the FTP Connection parts.
    NameValueCollection FTPConnColl = new NameValueCollection();

    // Populate the Name Value collection from the Connection String variable.
    string[] aryStrings = FTPConnS.Split(new char[] { ';' }, 4);
    string[] nameAndValue;
    foreach (string s in aryStrings) {
        nameAndValue = s.Split(new char[] { '=' }, 2);
        FTPConnColl.Add(nameAndValue[0], nameAndValue[1]);
    }

    // Grab FTP connection and file info from the FTP Name Value Collection.
    string FTPServer = FTPConnColl["Server"];
    string FTPPort = FTPConnColl["Port"];
    string FTPUser = FTPConnColl["User"];
    string FTPPwd = FTPConnColl["Pwd"];

    // Grab the filename from a SSIS variable
    string FTPRemoteFilePath = Dts.Variables["FTPRemoteFilePath"].Value.ToString(); ;

    // Default file exists to false, and only set to true later if it's determined that the file actually exists.
    Dts.Variables["FTPRemoteFileExists"].Value = false;

    // Set up a FTP Web Request to grab the modified date of the file off of the FTP server
    WebRequest FDRequest = FtpWebRequest.Create(string.Format("ftp://{0}:{1}@{2}:{3}/{4}", FTPUser, FTPPwd, FTPServer, FTPPort, FTPRemoteFilePath));
    FDRequest.Proxy = null;
    FDRequest.Method = WebRequestMethods.Ftp.GetDateTimestamp;
    using (FtpWebResponse FDateResponse = (FtpWebResponse)FDRequest.GetResponse()) {
        Dts.Variables["FTPRemoteFileExists"].Value = true;
        Dts.Variables["FTPRemoteFileDate"].Value = FDateResponse.LastModified;
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}

You may notice the code at the beginning of this segment to be similar to that in my SSIS FTP Connection Manager article mentioned earlier, where FTP connection and credential information is parsed out of a single variable where it is stored. After we set up our connection information we proceed with creating a WebRequest. With this web request we will be requesting the date time stamp of a specific file from the FTP server. The GetDateTimestamp command is available on almost every FTP server, which makes it great not only for picking up the last modified date of a file, but also more generally as a way to check for file existence.

In our next step in this example, we check to see if we already have a local copy of the file and if so get its modification date. We can use the following code to make this local file check:


	String FTPLocalFile = Dts.Variables["FTPLocalFilePath"].Value.ToString() + Dts.Variables["FTPLocalFileName"].Value.ToString();

	// Test if local file exists.
	if (System.IO.File.Exists(FTPLocalFile)) {
		Dts.Variables["FTPLocalFileExists"].Value = true;
		Dts.Variables["FTPLocalFileDate"].Value = System.IO.File.GetLastWriteTime(FTPLocalFile);
	}

Now, armed with information about both the remote file on the FTP server and the local file, we can make a decision on if we need to download the file or not. We use a precedence constraint in the package to make this decision. In this case, we are only going to download the file if either a local version of the file does not exist or if a local file exists but the file out on the FTP server is newer. This precedence constraint would look like this:

@[User::FTPLocalFileExists]  == false || ( @[User::FTPLocalFileExists]  == true && @[User::FTPLocalFileDate]  < @[User::FTPRemoteFileDate] )

Finally, if all of the required conditions are met, we download the file from the FTP server.

Conclusion

In this article you learned a method and walked through an example for getting a file’s modification date from a FTP server before downloading the file and using that date in your SSIS package control flow to make decisions about what to do with the file. As you can see, a few lines of .NET code in a SSIS script task can be a powerful ally when designing control logic for your packages. Feel free to let me know if you have any questions or comments on the material covered here.

Leave a Reply

Your email address will not be published. Required fields are marked *