Using RegEx to extract MIME parts from a Microsoft Graph API email stream

In this article I will demonstrate how a Regular Expression can be used to extract all MIME content references from within an HTML Stream.

Introduction

In the previous article we looked at how the base64 encoded version of an embedded MIME Image can be extracted from the Microsoft Graph. In this article we will start to look at how we are going to  automate the solving of that problem by identifying all the MIME encoded images from within the graph API HTML stream.

Regular expressions

RegEx is mentally challenging to most of us, and that is why some beautiful people created Stackoverflow and Google. I was able to find this solution to a similar problem of extracting tag attributes from an HTML string.

Modifying this slightly for my needs in context of the graph I was able to come up with the following:

var content = data.body.content;
var cids = content.match(/cid["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)?/g);

and this will take an HTML stream with different kinds of MIME reference and return them all as an array.

Here is the sample HTML string with just the images highlightedc1

and here’s the result of the test in firebug logging the cids array

c2

 

Conclusion

In this article we have seen that with a simple Regular expression we can extract the Image src attributes relating to the MIME parts within the MS Graph API feed.

Caveat: This assumes a lot about the structure of the API and that is will continue to conform to this structure.