Let’s face it: Microsoft Excel is the most widely used data analysis tool in existence. And since 2013, its reach has grown dramatically with the introduction of Power Query, a data preparation and transformation tool that lets users build analysis and reporting pipelines from a wide variety of sources. And while Power Query’s built-in functionality is quite… err… powerful, advanced users will quickly find the need to develop custom solutions. In this note, I keep a constantly growing collection of Power Query custom functions that I use across most of my projects.

M formula language

At the heart of Power Query lies a formula language called M. It’s a functional language specifically crafted for data transformation tasks. While Power Query provides a user-friendly interface for data manipulation, M empowers users with the ability to customize and extend these transformations beyond the built-in tools.
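To give a flavor of the language, here is a tiny, self-contained example (the data is made up): a let expression that builds a table, filters it, and adds a computed column, which is the shape that almost every Power Query pipeline takes.

let
    // each step is a named expression; later steps refer to earlier ones
    Source = Table.FromRecords({
        [Product = "A", Sales = 120],
        [Product = "B", Sales = 80]
    }),
    // keep only rows above a threshold
    Filtered = Table.SelectRows(Source, each [Sales] > 100),
    // add a derived column
    WithTax = Table.AddColumn(Filtered, "SalesWithTax", each [Sales] * 1.2)
in
    WithTax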

Importing custom M functions

To import custom M functions into Power Query, you can follow these steps:

  1. Save your custom functions as an M query file (.pq) or as a Power BI template (.pbit) if you’re working in Power BI (see the example after this list for what such a function might look like).

  2. In Excel, go to the Data tab and click on “Get Data” > “From File” > “From Workbook.” In Power BI, go to the “Home” tab and click on “Transform Data.”

  3. Browse to the location where you saved your custom function file and select it. Click “Import” or “Load” to bring it into the Power Query Editor.

  4. Once imported, your custom function will appear as a query in the Queries pane of the Power Query Editor. You can then use it like any other function within Power Query.

  5. (Optional) You can organize your custom functions by creating a separate folder or group within the Queries pane and moving your function query into it. This helps keep your workspace tidy and makes it easier to locate your functions when needed.
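For reference, here is what a minimal custom function looks like, whether stored in a .pq file or pasted directly into a blank query through the Advanced Editor. The function itself is a hypothetical example; note that inside the Advanced Editor you omit the leading “=” that appears in the snippets below.

(x as number, optional y as number) as number =>
let
    // add two numbers, treating a missing second argument as 1
    Result = x + (if y = null then 1 else y)
in
    Result

Once the query is renamed (say, to AddNumbers), it can be called from any other query as AddNumbers(2, 3).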

Custom Functions

Combining Tables

The ability to combine data tables is a critical piece of Power Query functionality, but I find the built-in process quite sub-optimal. Its biggest drawback is that it forces you to explicitly list the names of the columns being combined, which makes your pipelines inflexible and prone to breakage. The function below achieves the same task in a much more flexible way: it does not expect a list of columns to combine, so if your data source gains or loses a column, your pipeline will keep working without any issues (at least none related to combining tables).

Function “CombineTables”:

= (TableColumn as list, optional SourceNameColumn as list) =>
let
  // if no source-name column is supplied, pass the list of tables through unchanged
  AddIDs =
    if SourceNameColumn = null
    then
      TableColumn
    else
      let
        // pair every table with its corresponding source name
        ZipNames =
          List.Zip({TableColumn, SourceNameColumn}),
        // add a "Source" column holding the table's origin
        AddColumnFunction =
          (ListIn as list) =>
          Table.AddColumn(ListIn{0}, "Source", each ListIn{1}),
        AddColumns =
          List.Transform(ZipNames, each AddColumnFunction(_))
      in
        AddColumns,
  // row-wise concatenation of all tables in the list
  Combine = Table.Combine(AddIDs)
in
  Combine

Usage:

= CombineTables(#"Previous Step"[#"Column with Tables"], #"Previous Step"[#"Optional Column with Source name"])

To use this function, first create a table containing (at least) two columns: a column holding the Tables to combine, and an optional “Source” column that describes the origin of each specific Table (e.g. the name of the file from which it was extracted). Such tables are generated by default by many of Power Query’s built-in import procedures, including “Import from Folder”. The function simply takes references to these two columns and outputs a row-wise concatenation of the tables, as in the sketch below.
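For instance, here is a minimal sketch of an “Import from Folder” style pipeline that ends in a CombineTables call; the folder path and the use of Excel.Workbook to parse each file are assumptions for illustration.

let
    // list every file in the folder (hypothetical path)
    Source = Folder.Files("C:\Data\Reports"),
    // parse each workbook and keep its first table
    Parsed = Table.AddColumn(Source, "Data", each Excel.Workbook([Content]){0}[Data]),
    // combine everything, tagging each row with the file it came from
    Combined = CombineTables(Parsed[Data], Parsed[Name])
in
    Combined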

Remove Empty Columns

Sometimes, when importing or combining a messy set of tables, you end up with a set of completely empty columns. While you can hunt them down and delete them manually, why not delegate the task to a function? Just point it at a table from a previous step and, poof, they are gone!

Function “RemoveEmptyCols”:

= (InTable as table) =>
let
    // build a two-column table: Column1 holds the column names, Column2 the column values (as lists)
    Reorganized = Table.FromColumns({Table.ColumnNames(InTable), Table.ToColumns(InTable)}),
    // flag the columns whose values are all null
    #"Added Custom" = Table.AddColumn(Reorganized, "IsNull", each if List.NonNullCount(List.Distinct([Column2])) = 0 then "Yes" else "No"),
    // keep only the non-empty columns
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([IsNull] = "No")),
    // select the surviving columns from the original table
    #"Clean Table" = Table.SelectColumns(InTable, Table.Column(#"Filtered Rows", "Column1"))
in
    #"Clean Table"

Usage:

= RemoveEmptyCols(#"Previous Step")

Get tidy with Trim

When dealing with messy strings you often end up with leading, trailing, and repeated white spaces. Power Query does ship with Text.Trim, but that only strips characters from the ends of a string; unlike Excel’s TRIM function, it won’t collapse runs of repeated separators inside it. Good thing there is a formula for that.

Function “TrimAll”:

= (text as text, optional char_to_trim as text) =>
let
    // default to trimming spaces when no character is specified
    char = if char_to_trim = null then " " else char_to_trim,
    // split on the target character; consecutive occurrences yield empty strings
    split = Text.Split(text, char),
    // discard the empty fragments
    removeblanks = List.Select(split, each _ <> ""),
    // reassemble with single separators; leading and trailing occurrences are gone
    result = Text.Combine(removeblanks, char)
in
    result

Usage:

= Table.TransformColumns(#"Previous Step", {{"Column", each TrimAll(_, "_")}})

The second argument is optional and defines a custom target for trimming; if left unspecified, the function defaults to trimming spaces.
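For example, with underscores as the target (a made-up input, just to show the behavior), repeated inner separators are collapsed and the outer ones removed:

= TrimAll("__foo__bar__", "_")  // returns "foo_bar"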

Get a value from an Excel named range

When using Power Query from Excel, you sometimes want to pull a specific value from the spreadsheet into Power Query (e.g., the path to a file that you want to grab). While it is preferable to use Parameters to define such dynamic elements of your pipeline, it is often hard to resist the familiar simplicity of changing a value in a spreadsheet cell. All you have to do is define such a cell as a named range (rangeName in our example), and then use the following function.

Function “GetNamedValue”:

(rangeName as text) =>
    // look up the named range in the current workbook and return the value of its first cell
    Excel.CurrentWorkbook(){[Name=rangeName]}[Content]{0}[Column1]

Usage:

=GetNamedValue("rangeName")
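The returned value can feed straight into a data source call. As a sketch (the FilePath named range and the CSV source are assumptions for illustration):

= Csv.Document(File.Contents(GetNamedValue("FilePath")))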

Wrap-up

Custom M functions are invaluable additions to any Power Query user’s toolkit. By harnessing the flexibility and extensibility of M, you can tailor your data transformation workflows to your precise requirements, and make your pipelines more efficient, scalable, and maintainable in the process.