- Cross-platform support:
- Mac
- Windows
- Linux
- Easy to use, just replace
base_urlwithhttp://localhost:9988after launching to start debugging; - Captures complete requests, including the “scene of the accident” when network errors occur;
- Quickly search and view request information using
request_idandchatcmpl_id; - One-click export of BadCase structured reporting data, helping to enhance Kimi’s model capabilities;
Installation Methods
Using the go Command to Install
If you have the go toolchain installed, you can run the following command to install MoonPalace:
$GOPATH/bin/ directory. Run the moonpalace command to check if it has been installed successfully:
moonpalace binary file, try adding the $GOPATH/bin/ directory to your $PATH environment variable.
Downloading from the Releases Page
You can download the precompiled binary (executable) files from the Releases page:- moonpalace-linux
- moonpalace-macos-amd64 for Intel-based Macs
- moonpalace-macos-arm64 for Apple Silicon-based Macs
- moonpalace-windows.exe
$PATH environment variable. Rename it to moonpalace and then grant it executable permissions.
Usage
Starting the Service
Use the following command to start the MoonPalace proxy server:--port parameter specifying the local port that MoonPalace will listen on. The default value is 9988. When MoonPalace starts successfully, it will output:
base_url with the displayed address. If you are using the default port, set base_url=http://127.0.0.1:9988/v1. If you are using a custom port, replace base_url with the displayed address.
Additionally, if you want to always use a debugging api_key during debugging, you can use the --key parameter when starting MoonPalace to set a default api_key for MoonPalace. This way, you don’t have to manually set the api_key in each request. MoonPalace will automatically add the api_key you set with --key when requesting the Kimi API.
If you have correctly set base_url and successfully called the Kimi API, MoonPalace will output the following information:
stderr to a file).
Note: In the logs, the value of the Msh-Request-Id field in the Response Headers corresponds to the --requestid parameter in the Search Request and Export Request sections below. The id in the Response corresponds to the --chatcmpl parameter, and last_insert_id corresponds to the --id parameter.
max_tokens value.
Enabling Repeated Content Output Detection
MoonPalace offers a feature to detect repeated content output from the Kimi large language model. Repeated content output refers to the model continuously outputting a specific word, sentence, or blank character without stopping before reaching themax_tokens limit. This can lead to additional Token costs when using more expensive models like moonshot-v1-128k. Therefore, MoonPalace provides the --detect-repeat option to enable repeated content output detection, as shown below:
--detect-repeat option, MoonPalace will interrupt the output of the Kimi large language model and log the following message when it detects repeated content:
--detect-repeat option only interrupts the output in streaming mode (stream=True). It does not apply to non-streaming output.
You can adjust MoonPalace’s blocking behavior using the --repeat-threshold and --repeat-min-length parameters:
- The
--repeat-thresholdparameter sets MoonPalace’s tolerance for repeated content. A higher threshold means lower tolerance, and repeated content will be blocked more quickly. The range is 0 threshold 1. - The
--repeat-min-lengthparameter sets the minimum number of characters before MoonPalace starts detecting repeated content. For example, —repeat-min-length=100 means that repeated content detection will only start when the output exceeds 100 UTF-8 characters.
Enabling Forced Streaming Output
MoonPalace provides the--force-stream option to force all /v1/chat/completions requests to use streaming output mode:
stream field in the request parameters to True. When receiving a response, it will automatically determine the response format based on whether the caller has set stream:
- If the caller has set
stream=True, the response will be returned in streaming format without any special handling by MoonPalace. - If the caller has not set
streamor has setstream=False, MoonPalace will concatenate all the streaming data chunks into a complete completion structure and return it to the caller after receiving all the data chunks.
--force-stream option will not affect the Kimi API response content you receive. You can still use your original code logic to debug and run your program. In other words, enabling the --force-stream option will not change or break anything. You can safely enable this option.
Why provide this option?
We initially hypothesize that common network connection errors and timeouts (Connection Error/Timeout) occur because, in non-streaming request scenarios (stream=False), intermediate gateways or proxy servers may have set read_header_timeout or read_timeout. This can cause the gateway or proxy server to disconnect while the Kimi API server is still assembling the response (since no response, or even the response header, has been received), resulting in Connection Error/Timeout. We added the--force-streamparameter to MoonPalace. When starting withmoonpalace start --force-stream, MoonPalace converts all non-streaming requests (stream=False or unset) to streaming requests. After receiving all data chunks, it assembles them into a complete completion response structure and returns it to the caller. For the caller, you can still use the non-streaming API as before. However, after MoonPalace’s conversion, it can reduce Connection Error/Timeout issues to some extent because MoonPalace has already established a connection with the Kimi API server and started receiving streaming data chunks.
Retrieving Requests
After MoonPalace is started, all requests routed through MoonPalace are recorded in an sqlite database located at$HOME/.moonpalace/moonpalace.sqlite. You can directly connect to the MoonPalace database to query the specific content of the requests, or you can use the MoonPalace command-line tool to query the requests:
list command to view the content of the most recent requests. By default, it displays fields that are easy to search, such as id/chatcmpl/request_id, as well as status/server_timing/requested_at for checking the request status. If you want to view a specific request, you can use the inspect command to retrieve it:
inspect command does not print the body of the request and response. If you want to print the body, you can use the following command:
Exporting Requests
If you find that a request does not meet your expectations, or if you want to report a request to Moonshot AI (whether it’s a Good Case or a Bad Case, we welcome both), you can use theexport command to export a specific request:
id/chatcmpl/requestid is the same as in the inspect command, used to retrieve a specific request. The --good/--bad options are used to mark the request as a Good Case or a Bad Case. The --tag option is used to add relevant tags to the request. For example, in the example above, we assume that the request is related to the Python programming language, so we add two tags: code and python. The --directory option specifies the path to the directory where the exported file will be saved.
The content of the successfully exported file is: