@senspond


LangChain - Transforming LLM Responses with an Output Parser

Posted: 2024-01-03 (Wed) 10:27

Updated: 2024-01-03 (Wed) 10:27


This post covers how to transform LLM responses with an Output Parser in the Python-based LangChain framework.

What is an Output Parser?

https://python.langchain.com/docs/modules/model_io/output_parsers/

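Output Parsers are part of LangChain's Model I/O module and are used to transform a language model's output. They are useful when, for example, you need to store an LLM's output in a database and must first convert it into a specific data structure.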

SimpleJsonOutputParser

from langchain.output_parsers.json import SimpleJsonOutputParser
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI()
# Ask the model to answer as a JSON object with a single `answer` key.
message = chat.predict("""
Return a JSON object with an `answer` key that answers the following question: 
What are you? """)

# The raw response is a plain string.
print(message, type(message))
jp = SimpleJsonOutputParser()

# After parsing, the result is a Python dict.
result = jp.parse(message)
print(result, type(result))

result['answer']

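Following the SimpleJsonOutputParser reference on the official site, I wrote a prompt asking for a response as a JSON-formatted string with answer as the key.

Passing the response through the output parser converts it into a dictionary, a native Python data structure.

So what happens if the string is not in a form the output parser can process?

In my run, the parser returned nothing at all. In other words, when you apply an output parser to the result of an LLM prompt, you only get a result when the output is actually in the expected format.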
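To make the failure case concrete, here is a minimal sketch that uses only Python's standard json module as a stand-in for a JSON output parser. The helper name parse_json_or_none is hypothetical, and returning None on failure is an assumption chosen to mirror the behavior described above. It is not LangChain's implementation, and the behavior of SimpleJsonOutputParser may differ between LangChain versions (some may raise an exception instead).

```python
import json
from typing import Any, Optional


def parse_json_or_none(text: str) -> Optional[Any]:
    """Hypothetical stand-in: return parsed JSON, or None if text is not JSON."""
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        # Assumption: signal "could not parse" with None instead of raising.
        return None


good = parse_json_or_none('{"answer": "I am an AI language model."}')
bad = parse_json_or_none("I am an AI language model.")

print(good["answer"])  # the value stored under the `answer` key
print(bad)             # None, because the input was not JSON
```

Either way, code that consumes the parser's result should be prepared for the case where the model did not follow the requested format.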

Creating a Custom Output Parser

Suppose the response we get back from the LLM is a message whose items are separated by commas (",").

from langchain.schema import BaseOutputParser

# Implement the parser by subclassing BaseOutputParser
class CommaOutputParser(BaseOutputParser):
    
    """ The parse method must be defined. Without it, instantiation fails with:
        Can't instantiate abstract class CommaOutputParser with abstract method parse
    """
    def parse(self, text):
        # Split on commas, then strip surrounding whitespace from each item
        items = text.strip().split(",")
        return list(map(str.strip, items))

As shown above, you can define a custom parser by subclassing BaseOutputParser.

p = CommaOutputParser()
p.parse("banana ,apple,donut")

Result: ['banana', 'apple', 'donut']
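One edge case worth noting: with the parse logic above, a trailing comma or an empty response produces empty strings in the list (an empty input yields ['']). The following is a sketch of a stricter variant written as a plain function so it runs without LangChain. The name split_comma_list is hypothetical, and dropping empty items is an assumption about what a caller would want, not something the original parser does.

```python
from typing import List


def split_comma_list(text: str) -> List[str]:
    """Split on commas, strip whitespace, and drop empty items."""
    items = (item.strip() for item in text.split(","))
    return [item for item in items if item]


print(split_comma_list("banana ,apple,donut"))    # ['banana', 'apple', 'donut']
print(split_comma_list("banana, apple, donut,"))  # ['banana', 'apple', 'donut']
print(split_comma_list(""))                       # []
```

The same body could be used inside the parse method of a BaseOutputParser subclass if that behavior is preferred.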

Processing OpenAI Model Responses with a Custom Output Parser

OpenAI Playground

System prompt

You are a comma separated list generating machine.
Everything you are asked will be answered with a list of max 10 in lowercase.
DO NOT reply with anything else

With this system prompt, the model answers the user's question with a list of up to 10 items separated by commas (",").

LangChain Example

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages([
    ("system", """You are a comma separated list generating machine. 
     Everything you are asked will be answered with a list of max {max_items} in lowercase. 
     DO NOT reply with anything else""")
    ,
    ("human", "{question}")
])
prompt = template.format_messages(max_items = 50, question = "What are the Planets?")

chat = ChatOpenAI(temperature=0.7)

result = chat.predict_messages(prompt)
print(result)

In the previous post I mentioned that a prompt template lets you put variables in a prompt; here max_items and question are used as variables.
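As a rough illustration of what filling in those variables amounts to, here is a sketch that uses only Python's built-in str.format. The names SYSTEM_TEMPLATE, HUMAN_TEMPLATE, and format_messages are hypothetical stand-ins, and this is not how ChatPromptTemplate is implemented; it only shows the substitution idea.

```python
SYSTEM_TEMPLATE = (
    "You are a comma separated list generating machine. "
    "Everything you are asked will be answered with a list of max {max_items} in lowercase. "
    "DO NOT reply with anything else"
)
HUMAN_TEMPLATE = "{question}"


def format_messages(**variables):
    """Fill the placeholders and return (role, text) pairs."""
    # str.format ignores keyword arguments that a template does not use.
    return [
        ("system", SYSTEM_TEMPLATE.format(**variables)),
        ("human", HUMAN_TEMPLATE.format(**variables)),
    ]


messages = format_messages(max_items=50, question="What are the Planets?")
for role, text in messages:
    print(role, ":", text)
```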

Result: content='mercury, venus, earth, mars, jupiter, saturn, uranus, neptune'

Now let's process this response with the CommaOutputParser defined above.

p = CommaOutputParser()
p.parse(result.content)

Result: ['mercury', 'venus', 'earth', 'mars', 'jupiter', 'saturn', 'uranus', 'neptune']

LCEL (LangChain Expression Language) Example

The code above is not all that concise, though. For this reason LangChain provides LCEL, an expression language that lets you write the same thing more simply.

With the | operator, you can chain a prompt, a model, and an output parser together.

Usage: prompt or template | model | outputparser

chain = template | chat | CommaOutputParser()

This single line defines a chain that fills in the template, sends it to the chat model, and transforms the response with CommaOutputParser.

chain.invoke({
    "max_items" :  50,
    "question" : "What are colors?"
})

You can then call the chain with the invoke method, passing the template's parameters as a dictionary as shown above.
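To give a feel for how chaining with | can work, here is a toy sketch in plain Python that overloads the __or__ operator. The Step class and the three stand-in steps are hypothetical and make no API call; this illustrates the general idea of piping one step's output into the next, not LangChain's actual implementation.

```python
from typing import Any, Callable


class Step:
    """Hypothetical wrapper that lets plain functions be chained with `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stand-ins for the template, the model, and the parser (assumed for illustration).
template_step = Step(lambda v: f"List up to {v['max_items']} items: {v['question']}")
fake_model = Step(lambda prompt: "red, green , blue")  # canned response
parser_step = Step(lambda text: [item.strip() for item in text.split(",")])

toy_chain = template_step | fake_model | parser_step
result = toy_chain.invoke({"max_items": 50, "question": "What are colors?"})
print(result)  # ['red', 'green', 'blue']
```

The dictionary passed to invoke plays the same role as in the LangChain example: it supplies the values for the template's variables, and each later step only sees the previous step's output.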

I plan to cover LCEL (LangChain Expression Language) in more detail in a later post.
