Parsing Excel to a List of Objects and a List of Objects to workbooks

Home / Parsing Excel to a List of Objects and a List of Objects to workbooks

Recently, I needed to be able to parse Excel files and to output Excel files. However, I found that my old code from previous blog posts were now problematic:

https://long2know.com/2016/05/parsing-excel-to-a-list-of-objects/
https://long2know.com/2016/06/create-an-excel-template-from-objects/

Those old implementations used EPPlus which is now a commercial product. I wanted something open source.

After a bit of searching, I found that Microsoft offers an OpenXml nuget library called DocumentFormat.OpenXml. It offers all of the functionality needed to deal with Excel, although it is a little clunky and has some peculiar behaviors when compared to EPPlus. Where the differences lie is in being more aware of the fact that you’re dealing with Xml and handling data-type conversions in slightly different manners.

Jumping right in, I knew I wanted a simple service that I could inject. I created one with (2) methods. One parses a list to Excel and the other parses Excel to a list. I kept the options relatively light.

public interface IExcelParseService
{
    /// <summary>
    /// Parse a stream containing an Excel document to <see cref="List{T}" />
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="stream"></param>
    /// <param name="defaultStringNullValue"></param>
    /// <param name="map"></param>
    /// <returns></returns>
    Task<List<T>> ParseToList<T>(Stream stream, string defaultStringNullValue = "N/A", Dictionary<string, string> map = null) where T : new();

    /// <summary>
    /// Parse a <see cref="List{T}" /> to an Excel workbook
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="list"></param>
    /// <param name="map"></param>
    /// <returns></returns>
    Task<Stream> ParseToWorkBook<T>(List<T> list, List<string> excludedProps = null, string defaultStringNullValue = "N/A", Dictionary<string, string> map = null) where T : new();
}

I also wanted a small mapping-type object so I could track information about the data being parsed. This would allows tracking columns when the data is being transformed and a bit of metadata about the column.

        public class ExcelMap
        {
            public string Name { get; set; }
            public string MappedTo { get; set; }
            public int Index { get; set; }
            public int ColIndex { get; set; }
            public string CellReference { get; set; }
        }

Examining the ParseToWorkbook first, it’s really taking advantage of the library to create the Excel document, create a workbook within the document, and then iterate over every TType item in our List<T> to create a row per item with each public property, that is not an IEnumerable (other than string), to columnar represenations. The library is a little odd is a little odd in how it uses “shared” data for strings and such to which the cell values are written and then the workbook will reference shared data. However, DateTime and Boolean are a little different and written directly as CellValues. You can see this in the full method below.

In some ways, this seems is pretty clunky and you’ll see a reference to a method called InsertSharedStringItem that handles writing to this SharedStringTable. Additionally, adding styles requires adding a CssStyle sheet and you’ll see this as a helper method too. Other helper methods handle getting cell references, getting cell indexes (based on the alphanumeric references), providing date time formats, etc.

/// <summary>
/// Parse a <see cref="List{T}" /> to an Excel workbook
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="list"></param>
/// <param name="map"></param>
/// <returns></returns>
public async Task<Stream> ParseToWorkBook<T>(List<T> list, List<string> excludedProps = null, string defaultStringNullValue = "N/A", Dictionary<string, string> map = null) where T : new()
{
    try
    {
        // Create a SpreadsheetDocument based on a stream.
        var stream = new MemoryStream();
        var document = await Task.Run(() => SpreadsheetDocument.Create(stream, SpreadsheetDocumentType.Workbook, false));

        // Get the workbook
        var workbookPart = document.AddWorkbookPart();
        var workbook = new Workbook();
        var fileVersion = new FileVersion { ApplicationName = "Microsoft Office Excel" };
        workbookPart.Workbook = workbook;

        // Add our bold style sheet
        AddStyleSheet(document);

        // Create the worksheet and its data
        var worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
        var worksheet = new Worksheet();
        worksheetPart.Worksheet = worksheet;

        // Freeze the first row with SheetView
        var sheetViews = new SheetViews();
        var sheetView = new SheetView() { TabSelected = true, WorkbookViewId = (UInt32Value)0U };

        var pane = new Pane()
        {
            TopLeftCell = "A2",
            VerticalSplit = 1D,
            ActivePane = PaneValues.BottomLeft,
            State = PaneStateValues.Frozen
        };
        var selection = new Selection() { Pane = PaneValues.BottomLeft };
        sheetView.Append(pane);
        sheetView.Append(selection);
        sheetViews.Append(sheetView);
        worksheet.Append(sheetViews);

        // Append SheetData
        var sheetData = new SheetData();
        worksheet.AppendChild(sheetData);

        // Create a sheets collection
        workbook.Append(fileVersion);
        var sheets = workbook.AppendChild(new Sheets());

        // Created a new named sheet
        var sheet = new Sheet()
        {
            Id = document.WorkbookPart.GetIdOfPart(worksheetPart),
            SheetId = 1,
            Name = "Sheet 1"
        };

        // Append the sheet
        sheets.Append(sheet);

        //// Append the sheet
        //var worksheet = workbookPart.WorksheetParts.First().Worksheet;

        // Get the properties of the class to which we are mapping. For now, only support primitives
        var props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Select(prop =>
            {
                var displayAttribute = (DisplayAttribute)prop.GetCustomAttributes(typeof(DisplayAttribute), false).FirstOrDefault();
                return new
                {
                    Name = prop.Name,
                    DisplayName = displayAttribute == null ? prop.Name : displayAttribute.Name,
                    Order = displayAttribute == null || !displayAttribute.GetOrder().HasValue ? 999 : displayAttribute.Order,
                    PropertyInfo = prop,
                    PropertyType = prop.PropertyType,
                    HasDisplayName = displayAttribute != null
                };
            })
        .Where(prop => !(excludedProps?.Contains(prop.Name) ?? false) && !string.IsNullOrWhiteSpace(prop.DisplayName))
        .ToList();

        // Create the SharedStringTablePart
        var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault() ??
            workbookPart.AddNewPart<SharedStringTablePart>();

        // Keep track of our row index
        uint rowIndex = 0;

        // Create the header row
        var headerRow = new Row() { RowIndex = ++rowIndex };

        // Add the header columns using the string table
        var colIndex = 0;
        foreach (var prop in props)
        {
            var strIndex = InsertSharedStringItem(prop.DisplayName, stringTable);
            // Create the cell and ensure CellReference is set; otherwise, parsing is goofy
            var cell = new Cell
            {
                DataType = CellValues.SharedString,
                CellValue = new CellValue($"{strIndex}"),
                StyleIndex = Convert.ToUInt32(1),
                CellReference = new StringValue(GetCellReference(colIndex++))
            };

            headerRow.AppendChild(cell);
        }

        sheetData.AppendChild(headerRow);

        var numericTypes = new List<Type>() {
        typeof(int?), typeof(int), typeof(short?), typeof(short), typeof(decimal?), typeof(decimal),
        typeof(double?), typeof(double), typeof(DateTime?), typeof(DateTime)
    };

        var dateTypes = new List<Type>() { typeof(DateTime?), typeof(DateTime) };
        var booleanTypes = new List<Type>() { typeof(bool?), typeof(bool) };

        // Now let's add the values
        foreach (T item in list)
        {
            var row = new Row() { RowIndex = ++rowIndex };

            colIndex = 0;
            foreach (var prop in props)
            {
                // Note that we could move this up to the prop creation
                var cellDataType = CellValues.SharedString;
                var isNumeric = numericTypes.Contains(prop.PropertyType);

                if (dateTypes.Contains(prop.PropertyType))
                {
                    cellDataType = CellValues.Date;
                }
                else if (booleanTypes.Contains(prop.PropertyType))
                {
                    cellDataType = CellValues.Boolean;
                }

                // Create the cell and ensure CellReference is set; otherwise, parsing is goofy
                var cell = new Cell
                {
                    CellReference = new StringValue(GetCellReference(colIndex++, (int)(row.RowIndex.Value - 1)))
                };

                var cellValue = prop.PropertyInfo.GetValue(item, null);

                switch (cellDataType)
                {
                    case CellValues.SharedString:
                        {
                            var strValue = $"{cellValue}";
                            strValue = string.IsNullOrWhiteSpace(strValue) ? defaultStringNullValue : strValue;
                            cell.DataType = CellValues.SharedString;
                            var strIndex = InsertSharedStringItem(strValue, stringTable);
                            cell.CellValue = new CellValue($"{strIndex}");
                            break;
                        }
                    case CellValues.Date:
                        {
                            cell.DataType = CellValues.Number;
                            cell.StyleIndex = 4;
                            if (DateTime.TryParse($"{cellValue}", out DateTime dateValue))
                            {
                                cell.CellValue = new CellValue($"{dateValue.ToOADate()}");
                            }
                            break;
                        }
                    case CellValues.Boolean:
                        {
                            cell.DataType = CellValues.Boolean;
                            if (bool.TryParse($"{cellValue}", out bool boolValue))
                            {
                                var boolStrVal = boolValue ? "1" : "0";
                                cell.CellValue = new CellValue(boolStrVal);
                            }
                            break;
                        }
                    default:
                        {
                            cell.DataType = CellValues.Number;
                            cell.CellValue = new CellValue($"{cellValue}");
                            break;
                        }
                }

                row.AppendChild(cell);
            }

            sheetData.AppendChild(row);
        }

        // Save the workbook and close the document            
        worksheetPart.Worksheet.Save();
        workbookPart.Workbook.Save();
        document.Close();

        // Move the stream to the beginning
        stream.Seek(0, SeekOrigin.Begin);
        return stream;
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error parsing list to Excel workbook");
        throw;
    }
}

The next method is the ParseToList (List<T>) method. It’s basically the reverse of the previous method. However, we have to take care for parsing cell values to the appropriate C# property methods. We use a bit of reflection and TryParse/ChangeType methods to perform conversions.

/// <summary>
/// Parse a stream containing an Excel document to <see cref="List{T}" />
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="stream"></param>
/// <param name="defaultStringNullValue"></param>
/// <param name="map"></param>
/// <returns></returns>
public async Task<List<T>> ParseToList<T>(Stream stream, string defaultStringNullValue = "N/A", Dictionary<string, string> map = null) where T : new()
{
    try
    {
        // Open a SpreadsheetDocument based on a stream.
        var document = await Task.Run(() => SpreadsheetDocument.Open(stream, false));

        // Get the first worksheet
        var workbookPart = document.WorkbookPart;
        var worksheet = workbookPart.WorksheetParts.First().Worksheet;
        var rows = worksheet.Descendants<Row>();
        var cols = worksheet.Descendants<Column>();

        // Get the properties of the class to which we are mapping
        var props = typeof(T).GetProperties()
            .Select(prop =>
            {
                var displayAttribute = (DisplayAttribute)prop.GetCustomAttributes(typeof(DisplayAttribute), false).FirstOrDefault();
                return new
                {
                    Name = prop.Name,
                    DisplayName = displayAttribute == null ? prop.Name : displayAttribute.Name,
                    Order = displayAttribute == null || !displayAttribute.GetOrder().HasValue ? 999 : displayAttribute.Order,
                    PropertyInfo = prop,
                    PropertyType = prop.PropertyType,
                    HasDisplayName = displayAttribute != null
                };
            })
        .Where(prop => !string.IsNullOrWhiteSpace(prop.DisplayName))
        .ToList();

        var firstRow = rows.FirstOrDefault();
        var firstRowCells = firstRow.Descendants<Cell>();

        var columns = new List<ExcelMap>();
        var retList = new List<T>();

        // Assume first row has column names
        for (var colIndex = 0; colIndex < firstRowCells.Count(); colIndex++)
        {
            var colCell = firstRowCells.ElementAt(colIndex);
            var cellRef = colCell.CellReference ?? GetCellReference(colIndex);
            var cellValue = GetCellValue(colCell, workbookPart);

            if (!string.IsNullOrWhiteSpace(cellValue))
            {
                columns.Add(new ExcelMap()
                {
                    Name = cellValue,
                    MappedTo = map == null || map.Count == 0 ?
                        cellValue :
                        map.ContainsKey(cellValue) ? map[cellValue] : string.Empty,
                    Index = colIndex,
                    ColIndex = GetColumnIndex(cellRef) ?? colIndex,
                    CellReference = cellRef.ToString()
                });
            }
        }

        var rowCount = rows.Count();

        for (var rowIndex = 1; rowIndex < rowCount; rowIndex++)
        {
            var row = rows.ElementAt(rowIndex);
            var rowCells = row.Descendants<Cell>().ToList();
            var item = new T();

            columns.ForEach(column =>
            {
                var colIndex = GetColumnIndex(column.CellReference) ?? column.Index;

                var cell = rowCells.FirstOrDefault(x => GetColumnIndex(x.CellReference) == colIndex);
                //var cell = rowCells[colIndex];
                var valueStr = cell != null ? GetCellValue(cell, workbookPart) : string.Empty;

                // In case Display attributes are used, and no map is passed in, we'll handle checking
                // for both prop Name and DisplayName
                var prop = string.IsNullOrWhiteSpace(column.MappedTo) ?
                    null :
                    props
                        .First(p => p.Name.Contains(column.MappedTo) ||
                            (p.HasDisplayName && string.Equals(p.DisplayName?.Trim(), column.MappedTo?.Trim(), StringComparison.OrdinalIgnoreCase)
                        ));

                if (prop != null)
                {
                    var propertyType = prop.PropertyType;
                    object parsedValue = null;

                    if (propertyType == typeof(int?) || propertyType == typeof(int))
                    {
                        if (!int.TryParse(valueStr, out int val))
                        {
                            val = default;
                        }

                        parsedValue = val;
                    }
                    else if (propertyType == typeof(short?) || propertyType == typeof(short))
                    {
                        if (!short.TryParse(valueStr, out short val))
                            val = default;
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(long?) || propertyType == typeof(long))
                    {
                        if (!long.TryParse(valueStr, out long val))
                            val = default;
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(decimal?) || propertyType == typeof(decimal))
                    {
                        if (!decimal.TryParse(valueStr, out decimal val))
                            val = default;
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(double?) || propertyType == typeof(double))
                    {
                        if (!double.TryParse(valueStr, out double val))
                            val = default;
                        parsedValue = val;
                    }
                    else if (propertyType == typeof(DateTime?) || propertyType == typeof(DateTime))
                    {
                        if (DateTime.TryParse(valueStr, out DateTime val))
                        {
                            parsedValue = val;
                        }
                        else if (propertyType == typeof(DateTime))
                        {
                            parsedValue = default(DateTime);
                        }
                    }
                    else if (propertyType.IsEnum)
                    {
                        try
                        {
                            parsedValue = Enum.ToObject(propertyType, int.Parse(valueStr));
                        }
                        catch
                        {
                            parsedValue = Enum.ToObject(propertyType, 0);
                        }
                    }
                    else if (propertyType == typeof(string))
                    {
                        parsedValue = string.Equals(defaultStringNullValue, valueStr?.Trim(), StringComparison.OrdinalIgnoreCase) ?
                            null : valueStr;
                    }
                    else
                    {
                        try
                        {
                            parsedValue = Convert.ChangeType(valueStr, propertyType);
                        }
                        catch
                        {
                            parsedValue = valueStr;
                        }
                    }

                    try
                    {
                        prop.PropertyInfo.SetValue(item, parsedValue);
                    }
                    catch (Exception ex)
                    {
                        // Indicate parsing error on row?
                    }
                }
            });

            retList.Add(item);
        }

        return retList;
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error parsing Excel file");
        throw;
    }
}

Below are all of the helper methods referenced. I consider their function is relatively obvious.

private static int? GetColumnIndex(string cellReference)
{
    if (string.IsNullOrWhiteSpace(cellReference)) { return null; }

    //remove digits
    string columnReference = Regex.Replace(cellReference.ToUpper(), @"[\d]", string.Empty);

    int columnNumber = -1;
    int mulitplier = 1;

    //working from the end of the letters take the ASCII code less 64 (so A = 1, B =2...etc)
    //then multiply that number by our multiplier (which starts at 1)
    //multiply our multiplier by 26 as there are 26 letters
    foreach (char c in columnReference.ToCharArray().Reverse())
    {
        columnNumber += mulitplier * ((int)c - 64);
        mulitplier *= 26;
    }

    //the result is zero based so return columnnumber + 1 for a 1 based answer
    //this will match Excel's COLUMN function
    return columnNumber + 1;
}

private static string GetCellReference(int columnIndex, int rowIndex = 0, bool zeroBased = true)
{
    string columnName = "";
    int columnNumber = zeroBased ? (columnIndex + 1) : columnIndex;
    int rowNumber = zeroBased ? (rowIndex + 1) : rowIndex;

    while (columnNumber > 0)
    {
        int modulo = (columnNumber - 1) % 26;
        columnName = Convert.ToChar('A' + modulo) + columnName;
        columnNumber = (columnNumber - modulo) / 26;
    }

    return $"{columnName}{rowNumber}";
}

private static string GetCellValue(Cell cell, WorkbookPart workbookPart)
{
    var cellValue = cell.InnerText;

    // Move this to a shared method
    if (cell.DataType != null)
    {
        switch (cell.DataType.Value)
        {
            case CellValues.SharedString:
                {
                    var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                    cellValue = stringTable != null ?
                        stringTable.SharedStringTable.ElementAt(int.Parse(cellValue)).InnerText :
                        cellValue;
                    break;
                }
            case CellValues.Boolean:
                {
                    cellValue = string.Equals("0", cellValue) ?
                        "False" : "True";
                    break;
                }
        }
    }
    else if (cell.StyleIndex != null)
    {
        // Using StyleIndex to detect dates
        CellFormat cellFormat = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats.ChildElements[int.Parse(cell.StyleIndex.InnerText)] as CellFormat;
        if (cellFormat != null)
        {
            var dateFormat = GetDateTimeFormat(cellFormat.NumberFormatId.Value);
            if (!string.IsNullOrWhiteSpace(dateFormat))
            {
                // Dates are stored as epoch offsets
                if (double.TryParse(cellValue, out double val))
                {
                    cellValue = DateTime.FromOADate(val).ToString(dateFormat);
                }
            }
        }
    }

    return cellValue;
}

// Given text and a SharedStringTablePart, creates a SharedStringItem with the specified text 
// and inserts it into the SharedStringTablePart. If the item already exists, returns its index.
private static int InsertSharedStringItem(string text, SharedStringTablePart shareStringPart)
{
    // If the part does not contain a SharedStringTable, create one.
    if (shareStringPart.SharedStringTable == null)
    {
        shareStringPart.SharedStringTable = new SharedStringTable();
    }

    int i = 0;

    // Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
    foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
    {
        if (string.Equals(item.InnerText, text, StringComparison.Ordinal))
        {
            return i;
        }

        i++;
    }

    // The text does not exist in the part. Create the SharedStringItem and return its index.
    shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(text)));
    shareStringPart.SharedStringTable.Save();

    return i;
}

private static WorkbookStylesPart AddStyleSheet(SpreadsheetDocument spreadsheet)
{
    WorkbookStylesPart stylesheet = spreadsheet.WorkbookPart.AddNewPart<WorkbookStylesPart>();

    Stylesheet workbookstylesheet = new Stylesheet();

    Font font0 = new Font();         // Default font

    Font font1 = new Font();         // Bold font
    Bold bold = new Bold();
    font1.Append(bold);

    Fonts fonts = new Fonts();      // <APENDING Fonts>
    fonts.Append(font0);
    fonts.Append(font1);

    // <Fills>
    Fill fill0 = new Fill();        // Default fill

    Fills fills = new();      // <APENDING Fills>
    fills.Append(fill0);

    // <Borders>
    Border border0 = new();     // Default border

    Borders borders = new();    // <APENDING Borders>
    borders.Append(border0);

    // CellFormats
    CellFormat cellformat0 = new() { FontId = 0, FillId = 0, BorderId = 0 }; // Default style : Mandatory | Style ID = 0
    CellFormat cellformat1 = new CellFormat() { FontId = 1 };  // Style with Bold text ; Style ID = 1

    // Append CellFormats
    CellFormats cellformats = new CellFormats();
    cellformats.Append(cellformat0);
    cellformats.Append(cellformat1);

    // Append FONTS, FILLS , BORDERS & CellFormats to stylesheet <Preserve the ORDER>
    workbookstylesheet.Append(fonts);
    workbookstylesheet.Append(fills);
    workbookstylesheet.Append(borders);
    workbookstylesheet.Append(cellformats);

    // Finalize
    stylesheet.Stylesheet = workbookstylesheet;
    stylesheet.Stylesheet.Save();

    return stylesheet;
}

//// https://msdn.microsoft.com/en-GB/library/documentformat.openxml.spreadsheet.numberingformat(v=office.14).aspx
private readonly static Dictionary<uint, string> DateFormatDictionary = new()
{
    [14] = "dd/MM/yyyy",
    [15] = "d-MMM-yy",
    [16] = "d-MMM",
    [17] = "MMM-yy",
    [18] = "h:mm AM/PM",
    [19] = "h:mm:ss AM/PM",
    [20] = "h:mm",
    [21] = "h:mm:ss",
    [22] = "M/d/yy h:mm",
    [30] = "M/d/yy",
    [34] = "yyyy-MM-dd",
    [45] = "mm:ss",
    [46] = "[h]:mm:ss",
    [47] = "mmss.0",
    [51] = "MM-dd",
    [52] = "yyyy-MM-dd",
    [53] = "yyyy-MM-dd",
    [55] = "yyyy-MM-dd",
    [56] = "yyyy-MM-dd",
    [58] = "MM-dd",
    [165] = "M/d/yy",
    [166] = "dd MMMM yyyy",
    [167] = "dd/MM/yyyy",
    [168] = "dd/MM/yy",
    [169] = "d.M.yy",
    [170] = "yyyy-MM-dd",
    [171] = "dd MMMM yyyy",
    [172] = "d MMMM yyyy",
    [173] = "M/d",
    [174] = "M/d/yy",
    [175] = "MM/dd/yy",
    [176] = "d-MMM",
    [177] = "d-MMM-yy",
    [178] = "dd-MMM-yy",
    [179] = "MMM-yy",
    [180] = "MMMM-yy",
    [181] = "MMMM d, yyyy",
    [182] = "M/d/yy hh:mm t",
    [183] = "M/d/y HH:mm",
    [184] = "MMM",
    [185] = "MMM-dd",
    [186] = "M/d/yyyy",
    [187] = "d-MMM-yyyy"
};

private static string GetDateTimeFormat(UInt32Value numberFormatId)
{
    return DateFormatDictionary.ContainsKey(numberFormatId) ? DateFormatDictionary[numberFormatId] : string.Empty;
}

And that’s it. It’s a lot of code to digest, but it’s very similar to the old code that utilized EPPlus.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.