Generative AI in Investment Management: Navigating the jagged frontier of AI

Key takeouts

Corporates are succeeding with practical implementation of Gen AI, despite high-profile failures and a current “trough of disillusionment”.
AI tools significantly impact workplace productivity, with BCG’s study showing dramatic improvements in task quality and a notable “levelling effect” that helps less skilled workers catch up to their more experienced colleagues when embracing AI tools.
The future of AI in investment management demands new skills and adaptability. Investment professionals must learn to balance AI usage – knowing when to use it, what to use it for, and how much to rely on it – while maintaining rigorous standards and transparency.
Tools like our MAISY demonstrate that carefully designed, domain-specific AI can enhance investment research and decision-making when combined with proper oversight and validation.

Overview

From suggesting inappropriate pizza toppings to wiping billions off tech giants’ market caps, Generative AI’s early stumbles have been as spectacular as its promises. After a year of hands-on experience with STANLIB Multi-Asset’s bespoke AI tool MAISY (Multi-Asset AI System), we’ve gained unique insights into both the pitfalls and potential of this transformative technology. Our journey reflects the broader industry’s evolution: initial excitement, inevitable setbacks, and ultimately, the emergence of practical, value-adding tools. We believe the key to success lies not in perfect AI, but in understanding its limitations while leveraging its strengths.

The early public failures of Generative AI look inevitable once one understands the hype cycle of new technologies. Yet for those brave enough to start, practical applications are meeting corporate expectations. Within STANLIB Multi-Asset, one such development is our private research chatbot MAISY. Unlike off-the-shelf models, this AI system is fed daily with curated data, including our meeting transcripts, and is designed to enhance research productivity.

The benefits of integrating AI into investment research include the ability to summarize large amounts of data, generate ideas, and provide counterarguments to test insights. We also share the challenges of our journey, such as ensuring responsible AI, data security, and transparency.

We reflect on this transformative journey and how the investment management industry will work in the future: developing skills on the ‘what’, ‘when’ and ‘how much’ of AI usage; emphasizing the importance of human judgment and keeping humans in the loop; and the need for continuous learning and adaptation as the technologies evolve.

Oops-a-daisy

ChatGPT bursting onto the scene in November 2022 captured all our imaginations about what was possible with Generative AI.

But no-one likes a smart-alec, and so the public failures of this new technology have ranged from tragic to comical. To name a few: Air Canada was held liable for its chatbot giving a bereaved passenger bad advice; a NZ meal planner app suggested a chlorine gas recipe; Microsoft Travel suggested a charity food bank as a tourist hotspot to visit; McDonald’s retracted its AI ordering system from IBM after odd orders like bacon ice cream; and Google suggested glue as a pizza topping (at least the AI said to make sure the glue was non-toxic).[i] Some of these errors have had real financial impact, such as the factual errors at Bard’s big reveal launch wiping $100bn off Google/Alphabet’s market value and forcing a rebrand.[ii]

Welcome to the trough

So, is it rubbish? The enthusiasts will point to continuous improvement in models, widening adoption, guard rails preventing inappropriate content, and progress with text to video generation. The sceptics will point to the failures, potential diseconomies of scale, and high costs in processing and energy. Certainly, we are seeing dialling back of expectations, but this is to be expected.

In Figure 1, Gartner’s Hype Cycle illustrates how perceptions of new technologies change. Initially, too much is expected from a new technology as it is hyped up. There is then an inevitable shake-out before we settle into a higher level of productivity. Generative AI is going through such “growing pains”.

Source: A visualisation of Gartner’s Hype Cycle, a methodology for describing how new technologies, and the perceptions of them, change as they emerge.

https://en.wikipedia.org/wiki/Gartner_hype_cycle

Nevertheless, apart from initial missteps, there are clear signs of traction by corporates. Morgan Stanley commissioned a survey on Gen AI adoption and found around 50% of projects met corporate expectations and another 40% exceeded expectations. For those companies that had not taken the plunge, the initial hurdles were data security and fear of brand damage.[iii]

Boston Consulting Group deliberately tested the impact of AI access (GPT-4) on their work, across 18 realistic work tasks, with 7% of their workforce taking part.[iv] Figure 2 shows their dramatic findings. For tasks where AI would benefit (dubbed “inside the frontier”) there were substantial improvements in quality. The big surprise was AI’s ability to be a great skill leveller. The less skilled had a 43% improvement, catching up to their top-skilled colleagues, and their work surpassed that of the non-augmented. There is a great incentive for knowledge workers to embrace AI for upskilling and productivity benefits.

Don’t be disillusioned

We believed that this would be a transformative technology, and that the best way to assess it was to take the plunge. At STANLIB Multi-Asset, our remit is to look across asset classes and so our process lends itself to top-down approaches. We appreciated Generative AI’s ability to summarise large amounts of textual data to increase information consumption, and to play a supporting role in idea generation and validity testing.

Standard Large Language Models (LLMs) can be limited to their training dates and to generally available content on the web. A big hurdle in creating informed, high-quality investment answers is that you need both timeous investment data and access to your own private research data. A generalised public model is not good enough. Fortunately, there is an approach to do exactly that, called Retrieval Augmented Generation, or RAG for short. This allows one to boost a query to a private LLM with one’s own up-to-date investment information to get a better (augmented) answer.

RAGing against the machine?

The end result is a private research chatbot, MAISY, for our Multi-Asset investment team to increase research productivity. First, with a RAG approach one is curating one’s own research pool of data to draw from. We gather not only independent investment research, but also our own internal insights and even transcripts of investment meetings at team and wider business level. These transcripts are valuable as they capture investment intuition and experience plus any “off the wall” musings. This gives richer answers than a general off-the-shelf LLM could provide. Secondly, LLMs’ multibillion-parameter models have a substantial deal of embedded knowledge, so you do not have to explain investment jargon. For example, the model knows that Fed means the US Federal Reserve Central Bank, and what the ECB is. LLMs also allow you substantial flexibility in output, whether you want to tabulate, contrast approaches, summarise, list or do a SWOT analysis! The tool can also play a helpful devil’s advocate in the team, presenting counterarguments to a held view or listing disadvantages (with the added benefit that no-one’s feelings are hurt). We wanted this as it aligns with our team’s values of seeking out alternative perspectives and approaches. This ability to take the opposite side of a view is an input leading to more robust investment decisions, which benefits our clients and not just our culture.

But it is not all smooth sailing!

Generative AI wants to Generate. Getting an answer does not guarantee that it is good; it could be pure fiction. One needs to ensure one’s model has sufficient penalty against hallucinations. We would rather the system say ‘I don’t know’ than fabricate research.

We had many challenges to overcome and at times it felt like we spent a little time in the trough of disillusionment ourselves. We journeyed on the path out by ensuring a sufficient quantum of data through automation, establishing tight data security, embedding a modular approach so we can adapt to new models given the pace of change in this space, and testing and then more testing! In our testing we noticed that sometimes even simple things like dates and conventions like “last quarter” can trip up some simpler models.

Adopting AI is not just a journey for an investment team but for a corporate itself. As we mentioned earlier, Morgan Stanley found hurdles of data security and brand concerns stymied adoption. We found it is important to bring the company along as you build these systems. Data security must be addressed as part of the design. Running your own private LLM means there is no data leakage back to an AI web service provider. Secure user access, encryption and going through POPIA risk assessments are all pit stops on this road.

This journey should be paved with Responsible AI principles covering ethical and governance considerations beyond normal design principles such as accuracy and secure IT design. Standard Bank has developed such a group policy to ensure transparency in how decisions are made, accountability for outcomes, and a commitment to enhancing rather than replacing human judgment.

One of the key challenges in implementing these principles is addressing AI’s notorious ‘black box’ reputation, which is particularly crucial in investment research where decisions need a reasonable and adequate basis. We tackle this head-on by revealing which source documents went into informing the generative AI answer. This visibility allows the investment professional to quickly and directly access the underlying information sources and dig deeper. This, we believe, will make us more productive and focused on idea generation.

Hiking up the slope of enlightenment

We are not alone in our thinking. Citi looked at the investment industry and sees three waves in Gen AI:

The first is internal models to improve productivity and operational efficiency.
The second is AI in client-facing functions to help client communications and recommendations (but still with human oversight).
The third is investment co-pilots in fund management. This work will not happen overnight (nor for some time) as, apart from better technology, it may require organisational change and consolidating client data.

There will also need to be personal change as we grow to build the investment professionals of tomorrow. The Boston Consulting Group team found there are some tasks where AI can be helpful, and some where it can be a hindrance (some reading, business experience or a simple search might be better).[v] Learning how to navigate that frontier between the two will be a new skill: developing not only the “what”, but also the “when” of AI usage.

Observing a fellow quantitative investment professional working on code, one might see them jumping between AI and their own coding many times throughout the day, acting more like a “Cyborg”, a hybrid in which human and machine are indistinguishable. An appreciation of the limitations of AI and specialist coding knowledge is still required, but this switching approach is effective as coding little parcels can be more straightforward, and AI code can be tested. In years past, one would be querying textbooks, then digitally scrolling through product documentation or wading through coding websites like Stack Overflow or blogs. Now generative AI can not only teach but also draft sample code, explain how it works, or help debug existing code or error messages. Generative AI is accelerating learning and execution.

Client-facing investment specialists might forge a cleaner line between AI and human tasks: being focused in person when meeting or calling clients, but then relying on AI horsepower for administrative tasks like first drafts of meeting summaries and action lists later in the day. In this case they act more like a Centaur, the mythical half-person, half-horse, strategically deciding which approach is best for the job at hand. Another skill will be “how much” AI, as studies have shown that over-relying on AI can lead to lazy and poor decision making.[vi] Getting this balance right is important to our team, to avoid crowding toward a consensus view – in our world differentiated or variant perceptions are how we generate excess returns.

The models themselves are also due to change the incline of this slope of enlightenment. Continuous development and technological change are shifting the goal posts. There is much excitement around multi-modal models, i.e. models that can handle multiple types of input such as text and pictures. So, in the future, models will interpret not only the written text but also the charts and figures accompanying investment research. It might take some time for this to be useful, as we can expect general models, given their training data, to be better at recognising media celebrities and day-to-day objects than economic charts! Being trained with the right domain knowledge will always be important for successful AI work.

Another innovation is shifting to models that check…models. Sequoia Capital notes the rise of reasoning models that check the answers that are generated.[vii] In the lingo of Daniel Kahneman’s Thinking, Fast and Slow, we will see models that generate answers fast, and then a reasoning layer that will stop, think and evaluate. This will lead to a shift from training-time compute to inference-time compute. This can help assess results but risks creating unnecessary overhead, as the extra work is not needed in every case. For example, with factual answers it is not useful (there is more certainty on the one answer) – but it could help when more judgement is necessary. The challenge is that in domains of logic the ‘right’ answer is easier to assess, but this is not so in unstructured, creative work.

Apple researchers found that ChatGPT’s stated success in standard grade-school maths tests collapsed by up to 65% when they varied the objects (e.g. asking about adding up kiwis not apples) and values (e.g. 12 not 4), and added irrelevant clauses (e.g. some kiwis are smaller than others).[viii] This shows there is still some way to go from pattern recognition to formal logical reasoning.

Gen AI is just one type of machine learning tool. It should not be the only hammer in the toolbox. Sometimes a predictive machine learning model can be better, providing more focused and clearer results. Gen AI models are also good at problems of correlation but not of calculation. Like any tool, Gen AI shines brightest when used for what it does best – finding patterns and connections – rather than being made to solve every problem.

Lessons we take for the future

Take the plunge.

Get started with AI. If you are sceptical, then that is good. At best not every answer is 100% correct, and at worst you can have a great laugh when it suggests things like glue on pizza.

Testing never ends.

One cannot be complacent about testing. Large Language Model development is not simple. Model versions can show inconsistency over time.[ix]

Trust but verify.

Even in controlled RAG models there remains a small chance of hallucination. Humans also need to be vigilant about bias and ethics in responses, not just accuracy. We do not know how much toxic social media these models have been trained/poisoned on!

Evolve as models evolve.

One can see from ranking tables that the position of best model is always in flux. See Chatbot Arena, where people judge the responses.[x] The way we work will also need to evolve.

Human in the loop.

Be clear about what should be under your control. AI can trip up, so ensuring human evaluation is fundamental.

Conclusion

Futurist Roy Amara put this so well – “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.” Our year with MAISY has taught us exactly this. Generative AI hasn’t revolutionized investment management overnight, and MAISY is certainly not making investment decisions for us! But as a helpful ‘team player’, it is steadily transforming and refining how we conduct research, challenge our assumptions, and enhance our decision-making process. The technology’s early stumbles have been valuable lessons in themselves. As we continue our journey up the slope of enlightenment, we are neither AI evangelists nor sceptics, but pragmatic adopters focused on one goal: harnessing this powerful tool to deliver better outcomes for our clients. The future of investment management isn’t about AI replacing human judgment – it’s about finding the sweet spot where technology and human expertise work in concert.

The Technical Bit

This section outlines the ingredients that make our Gen AI tool MAISY (Multi-Asset AI System) work. The user logs into, and only sees, a web front-end, but there is quite a bit going on under the hood.

Data, it all starts with Data. We build a large repository of documents and mp4 meeting recordings. We dubbed this collection the “Wishing Well”. Data is swept into this daily. Curating this set of data is part of the richness beyond a general model. We think about data that is helpful to our investment style and process. For example, using meeting transcripts allows us to capture the team’s investment insights and off the wall exchanges beyond the written word.

Vector database. To later search this knowledge base one needs a good AI librarian. Vector databases store your data based on meaning and context. They convert documents into mathematical representations (vectors) that capture their essence, making it possible to find relevant information even when the wording isn’t exactly the same. For example, if you ask “What’s the outlook for US interest rates?”, it will look at documents covering “Fed policy”, “monetary easing”, and “future path of Federal funds rate” even though none of these phrases matches your initial query. They are connected by meaning (close enough in vector space).

We update this daily with our latest documents so the database gets richer over time.
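To make “close enough in vector space” concrete, here is a minimal sketch of similarity search using cosine similarity. It is an illustration under stated assumptions and not MAISY’s actual implementation: the three-dimensional vectors, the document names and the `top_k` helper are all made up, whereas a real system would get vectors with hundreds of dimensions from an embedding model and store them in a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the ids of the k documents whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings (assumption: hand-written, not from a real model).
index = {
    "fed_policy_note": [0.9, 0.1, 0.0],
    "monetary_easing_memo": [0.7, 0.5, 0.1],
    "sa_retail_sales": [0.1, 0.2, 0.9],
}

# Stands in for the embedded question "What's the outlook for US interest rates?"
query_vec = [0.85, 0.2, 0.05]

results = top_k(query_vec, index, k=2)
print(results)
```

The two rate-related documents rank highest even though no keywords are compared at all; only the direction of the vectors matters.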

Retrieval model. This is an important semantic layer that gathers the most relevant documents to help answer the initial query. This model evolved through substantial testing. For example, we removed duplicate information to get a good spread of sources and worked to get the most recent views.
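The two refinements mentioned above, removing duplicates and favouring recent views, could be sketched roughly as follows. This is one possible approach and not a description of our production retrieval model: the `Chunk` fields, the exact-text duplicate check and the 90-day half-life are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    source: str        # hypothetical file name
    text: str
    published: date
    similarity: float  # score from the vector search, between 0 and 1

def refine(chunks, today, half_life_days=90, limit=3):
    """Drop repeated text (keeping the newest copy), then rank by
    similarity discounted for age."""
    seen = set()
    unique = []
    # Visit newest first so the most recent copy of a duplicate survives.
    for chunk in sorted(chunks, key=lambda c: c.published, reverse=True):
        key = " ".join(chunk.text.lower().split())  # normalise case and spacing
        if key in seen:
            continue
        seen.add(key)
        unique.append(chunk)

    def score(chunk):
        age_days = max((today - chunk.published).days, 0)
        # Assumption: relevance halves every `half_life_days`.
        return chunk.similarity * 0.5 ** (age_days / half_life_days)

    return sorted(unique, key=score, reverse=True)[:limit]

chunks = [
    Chunk("broker_a.pdf", "Fed expected to cut rates twice.", date(2024, 9, 1), 0.90),
    Chunk("newsletter.pdf", "Fed  expected to cut rates twice.", date(2024, 10, 1), 0.88),
    Chunk("team_meeting.txt", "Team sees slower easing path.", date(2024, 10, 10), 0.80),
    Chunk("old_note.pdf", "Fed on hold for the year.", date(2023, 10, 15), 0.95),
]

refined = refine(chunks, today=date(2024, 10, 15))
print([c.source for c in refined])
```

Note how the year-old note drops to the bottom despite having the highest raw similarity, and the older copy of the duplicated sentence is discarded.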

Private LLM. We host an open source model in the cloud that the augmented query will be channelled to. This runs on a GPU in the cloud to provide a speedy response. The user’s initial query, plus some prompting guidelines and the relevant chunks of documents are then sent securely to the LLM.
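As an illustration of how the user’s query, the prompting guidelines and the retrieved chunks might be combined into one augmented prompt, consider the sketch below. The guideline wording, the `build_prompt` helper and the excerpt format are assumptions made for this example; the actual prompt sent to our LLM is not reproduced here. The instruction to answer “I don’t know” reflects the preference described earlier for admitting ignorance over fabricating research.

```python
# Hypothetical guidelines (assumption: wording invented for this sketch).
GUIDELINES = (
    "Answer using only the numbered excerpts below. "
    "Cite excerpt numbers for each claim. "
    "If the excerpts do not contain the answer, reply 'I don't know'."
)

def build_prompt(question, excerpts):
    """Assemble guidelines, retrieved excerpts and the question into one prompt.

    excerpts: list of (source_name, text) tuples from the retrieval step.
    """
    lines = [GUIDELINES, "", "Excerpts:"]
    for number, (source, text) in enumerate(excerpts, start=1):
        lines.append(f"[{number}] ({source}) {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)

prompt = build_prompt(
    "What's the outlook for US interest rates?",
    [
        ("newsletter.pdf", "Fed expected to cut rates twice."),
        ("team_meeting.txt", "Team sees slower easing path."),
    ],
)
print(prompt)
```

Numbering the excerpts and naming their sources is what later lets the front-end show the user which documents sit behind an answer.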

Web-frontend. We have a secure web portal where users can log in and post questions to MAISY. So from a simple line of typing, the request is mapped using the vector embeddings to gather a set of pertinent and timeous documents to best answer the question. This set is refined before passing to the LLM to get a boosted response back to the user. The benefit is that these answers are backed by our actual research rather than generic knowledge. We provide transparency by allowing users to view the list of relevant documents behind their query. We also provide SharePoint links so our investment professionals can drill down to do further research.

This is a field that is changing rapidly. We designed MAISY to have a modular nature, allowing us to change the elements as technology develops. For example, we can test and swop out the LLM or switch to new vector embedding protocols.
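One common way to achieve this kind of modularity is to have the rest of the system depend on a small interface rather than on a specific model. The sketch below shows the idea; the `LanguageModel` protocol, the two stand-in model classes and `ResearchAssistant` are hypothetical names invented for illustration and are not MAISY’s actual components.

```python
from typing import Protocol

class LanguageModel(Protocol):
    """Anything with a `complete` method can be plugged in as the LLM."""
    def complete(self, prompt: str) -> str: ...

class ModelA:
    # Stand-in for one hosted model (assumption: a real class would call it).
    def complete(self, prompt: str) -> str:
        return f"[model-a] {prompt}"

class ModelB:
    # Stand-in for a newer model under test.
    def complete(self, prompt: str) -> str:
        return f"[model-b] {prompt}"

class ResearchAssistant:
    """Depends only on the interface, so the model can be swapped freely."""
    def __init__(self, model: LanguageModel):
        self.model = model

    def ask(self, question: str) -> str:
        return self.model.complete(question)

assistant = ResearchAssistant(ModelA())
first = assistant.ask("Summarise the Fed outlook")

# Swapping the model is a one-line change; nothing else needs to know.
assistant.model = ModelB()
second = assistant.ask("Summarise the Fed outlook")
print(first)
print(second)
```

The same pattern would apply to the embedding step or the vector store: each sits behind its own small interface so it can be tested and replaced independently.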

References:

[i] See lists of further AI errors here https://originality.ai/blog/ai-hallucination-factual-error-problems or https://tech.co/news/list-ai-failures-mistakes-errors.

[ii] Bard launch error (BBC, “Google’s Bard AI bot mistake wipes $100bn off shares”, Feb 2023) https://www.bbc.com/news/business-64576225.

[iii] Morgan Stanley, GenAI Adoption – 2 Years In, 7 October 2024.

[iv] Boston Consulting Group paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321

[v] For more on the Jagged frontier idea see https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the-jagged and their paper here https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321

[vi] Asleep at the Wheel paper https://www.almendron.com/tribuna/wp-content/uploads/2023/09/falling-asleep-at-the-whee.pdf

[vii] See more here https://www.sequoiacap.com/article/generative-ais-act-o1/

[viii] Apple’s paper “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models”, https://arxiv.org/abs/2410.05229. Hat tip to Macrostrategy.

[ix] Stanford & Berkeley researchers found ChatGPT performance on standard tasks changed within a short period. https://arxiv.org/pdf/2307.09009

[x] ChatBot Arena https://lmarena.ai/