In this article I will demonstrate how a regular expression can be used to extract all MIME content references from an HTML stream.
Introduction
In the previous article we looked at how the base64-encoded version of an embedded MIME image can be extracted from the Microsoft Graph. In this article we will start to automate that process by identifying all the MIME-encoded images within the Graph API HTML stream.
Regular expressions
RegEx is mentally challenging to most of us, and that is why some beautiful people created Stack Overflow and Google. I was able to find a solution to a similar problem: extracting tag attributes from an HTML string.
Modifying it slightly for my needs in the context of the Graph, I was able to come up with the following:
var content = data.body.content;
var cids = content.match(/cid["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)?/g);
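This takes an HTML stream containing different kinds of MIME reference (double-quoted, single-quoted or unquoted attribute values) and returns them all as an array. Note that match returns null rather than an empty array when the stream contains no references.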
Here is the sample HTML string with just the images highlighted:
And here is the result of the test in Firebug, logging the cids array:
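The same test can be reproduced without the Graph with a minimal sketch like the one below. The names CID_PATTERN, extractCids and sampleHtml are my own for illustration, and the sample markup is made up rather than taken from a real Graph response. It is only meant to show how the expression behaves with each quoting style.

```javascript
// Illustrative sketch: CID_PATTERN, extractCids and sampleHtml are hypothetical
// names, and the markup below is invented rather than real Graph output.
const CID_PATTERN = /cid["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)?/g;

function extractCids(html) {
  // With the g flag, match() returns null when nothing matches,
  // so fall back to an empty array to keep callers simple.
  return html.match(CID_PATTERN) || [];
}

const sampleHtml =
  '<p>Hello</p>' +
  '<img src="cid:image001.png@01D1" alt="logo">' + // double-quoted value
  "<img src='cid:image002.jpg@01D1'>" +            // single-quoted value
  '<img src=cid:image003.gif width=10>';           // unquoted value

console.log(extractCids(sampleHtml));
// → [ 'cid:image001.png@01D1', 'cid:image002.jpg@01D1', 'cid:image003.gif' ]
```

Because the pattern keys on the literal text cid, it would also match those three letters inside ordinary words (for example "acid") in the body text, so the results may need filtering on a real message.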
Conclusion
In this article we have seen that a simple regular expression can extract the cid references from the image src attributes relating to the MIME parts within the MS Graph API feed.
Caveat: This assumes a lot about the structure of the API response, and that it will continue to conform to this structure.