Why are the results from the API different from those from the Kimi large language model?
The API and the Kimi large language model use the same underlying model. If you notice discrepancies in the output, try modifying the System Prompt. Additionally, the Kimi large language model includes tools such as a calculator, which are not provided by default in the API; users need to assemble these tools themselves.

Does the Kimi API have the "web surfing" feature of the Kimi large language model?
The web search tool is not built into the API by default; you can implement it yourself through tool calls (tool_calls). For details, see our guide:
Using the Kimi API for Tool Calls
If you seek assistance from the open-source community, you can refer to the following open-source projects:
If you are looking for services provided by professional vendors, the following options are available:
The content returned by the Kimi API is incomplete or truncated
If you find that the content returned by the Kimi API is incomplete, truncated, or shorter than expected, first check the value of the choice.finish_reason field in the response. If this value is length, the number of tokens generated by the model exceeded the max_tokens parameter in the request. In this case, the Kimi API only returns content within the max_tokens limit and discards the excess, producing the "incomplete content" or "truncated content" described above.
When encountering finish_reason=length, if you want the Kimi large language model to continue generating content from where it left off, you can use the Partial Mode provided by the Kimi API. For detailed documentation, please refer to:
Using the Partial Mode Feature of the Kimi API
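As a rough sketch of what a continuation request might look like under Partial Mode (see the linked guide for the authoritative details; the message shapes below are assumptions based on it):

```python
# Sketch of a Partial Mode continuation request. The "partial" flag on the
# last assistant message follows the Partial Mode guide linked above; treat
# the exact field names here as assumptions.
truncated = "The quick brown fox jumps over"  # output cut off at max_tokens

messages = [
    {"role": "user", "content": "Write a short story about a fox."},
    # Re-send the truncated output as the final assistant message and mark it
    # partial, so the model continues from where it left off instead of
    # starting over.
    {"role": "assistant", "content": truncated, "partial": True},
]
```

Sending these messages in a new /v1/chat/completions request asks the model to append to the truncated text rather than regenerate it.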
To avoid finish_reason=length, we recommend increasing the value of max_tokens. Our best practice suggestion is: use the estimate-token-count API to calculate the number of tokens in the input content, then subtract this number from the maximum number of tokens supported by the Kimi large language model (for example, 32k tokens for the moonshot-v1-32k model), and use the result as the max_tokens value for the current request. The maximum value of max_tokens is 32k.
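The recommended arithmetic can be sketched as a small helper (the function name is ours; prompt_tokens would come from the estimate-token-count API):

```python
def max_tokens_for(prompt_tokens: int, context_window: int = 32 * 1024,
                   cap: int = 32 * 1024) -> int:
    """Compute a max_tokens value: the model's context window minus the
    prompt's tokens, clamped to the API's 32k ceiling. (Helper name is
    ours, not part of the API.)"""
    return max(0, min(context_window - prompt_tokens, cap))

# e.g. a 5,000-token prompt on moonshot-v1-32k leaves 27,768 tokens for output
print(max_tokens_for(5000))  # -> 27768
```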
What is the output length of the Kimi large language model?
- For the moonshot-v1-8k model, the maximum output length is 8 * 1024 - prompt_tokens;
- For the moonshot-v1-32k model, the maximum output length is 32 * 1024 - prompt_tokens;
- For the moonshot-v1-128k model, the maximum output length is 128 * 1024 - prompt_tokens.
How many Chinese characters does the Kimi large language model support?
- The moonshot-v1-8k model supports approximately 15,000 Chinese characters;
- The moonshot-v1-32k model supports approximately 60,000 Chinese characters;
- The moonshot-v1-128k model supports approximately 200,000 Chinese characters.
Inaccurate file content extraction or inability to recognize images
We offer file upload and parsing services for various file formats. For text files, we extract the text content; for image files, we use OCR to recognize the text in the image; for PDF documents, if the PDF contains only images, we use OCR to extract text from those images, otherwise we extract only the text content. Note that for images we only use OCR to extract text, so if your image does not contain any text, parsing will fail with an error. For a complete list of supported file formats, please refer to: File Interface

When using the files interface, I want to reference file content using file_id

We currently do not support referencing file content by its file_id.
Error content_filter: The request was rejected because it was considered high risk
The input to the Kimi API or the output from the Kimi large language model contains unsafe or sensitive content. Note: The content generated by the Kimi large language model may also contain unsafe or sensitive content, which can lead to the content_filter error.
Connection-related errors
If you frequently encounter errors such as Connection Error or Connection Time Out while using the Kimi API, please check the following in order:
- Whether your program code or the SDK you are using has a default timeout setting;
- Whether you are using any type of proxy server and check the network and timeout settings of the proxy server;
- Whether stream=True is enabled. When stream output is not enabled, the time the Kimi large language model needs to generate content can exceed the timeout settings of an intermediate gateway. Typically, some gateway applications determine whether a request is alive by checking whether a status_code and header have been received from the server. Without stream=True, the Kimi server waits for the model to finish generating content before sending the header, and while waiting for the header, some gateways may close connections that have been open too long, resulting in connection-related errors.
We recommend enabling stream output stream=True to minimize connection-related errors.
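When stream=True is enabled, content arrives as incremental chunks that you concatenate yourself. A minimal sketch of that consumption loop, with chunks modeled as plain dicts mirroring the OpenAI-style delta format (an assumption; real chunks come from your SDK):

```python
# Sketch of consuming a streamed response. Real chunks come from the SDK with
# stream=True; here we model them as plain dicts with the same nesting.
def collect_stream(chunks) -> str:
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # Intermediate chunks carry incremental "content"; the final one may not.
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

fake_chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": [{"delta": {}}]},  # final chunk: no content field
]
print(collect_stream(fake_chunks))  # -> Hello, world
```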
The TPM and RPM limits shown in the error message do not match my account Tier level
If you encounter a rate_limit_reached_error while using the Kimi API, first confirm that you are using the correct api_key for your account. In most cases, a mismatch between the reported TPM/RPM limits and your expectations is caused by using an incorrect api_key, such as mistakenly using an api_key provided by another user, or mixing up api_keys when you have multiple accounts.
Error model_not_found

Make sure you have correctly set base_url=https://api.moonshot.ai/v1 in your SDK. The model_not_found error usually occurs because the base_url value is not set when using the OpenAI SDK; as a result, requests are sent to the OpenAI server, and OpenAI returns the model_not_found error.
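To see why a missing base_url leads to model_not_found, consider how the final endpoint URL is composed (a simplified sketch; the helper is ours, and https://api.openai.com/v1 is the OpenAI SDK's default base):

```python
# Why a missing base_url sends requests to OpenAI: the SDK joins its base_url
# with the endpoint path. Simplified illustration; endpoint() is our helper.
DEFAULT_BASE = "https://api.openai.com/v1"   # OpenAI SDK's fallback base_url
KIMI_BASE = "https://api.moonshot.ai/v1"     # what you should configure

def endpoint(base_url: str) -> str:
    return base_url.rstrip("/") + "/chat/completions"

print(endpoint(DEFAULT_BASE))  # requests go to OpenAI -> model_not_found
print(endpoint(KIMI_BASE))     # requests reach the Kimi API
```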
Numerical Calculation Errors in the Kimi Large Language Model
Due to the uncertainty in the generation process of the Kimi large language model, it may produce calculation errors of varying degrees when performing numerical computations. We recommend using tool calls (tool_calls) to provide the Kimi large language model with calculator functionality. For more information on tool calls (tool_calls), you can refer to our guide on Using the Kimi API for Tool Calls (tool_calls).
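As a sketch, a calculator tool could be declared and dispatched like this. The tool schema follows the OpenAI-compatible tool_calls format; the tool name, the dispatcher, and the toy expression evaluator are all our own illustrative choices:

```python
import ast
import json
import operator

# Tool declaration in the OpenAI-compatible tool_calls format (names are ours).
CALCULATOR_TOOL = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression, e.g. '1 + 2 * 3'.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    # Safely evaluate +, -, *, / over numbers without using eval().
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("unsupported expression")

def run_tool_call(name: str, arguments: str):
    # Dispatch a tool call requested by the model; arguments arrive as a JSON string.
    if name == "calculator":
        expr = json.loads(arguments)["expression"]
        return _eval(ast.parse(expr, mode="eval"))
    raise ValueError(f"unknown tool: {name}")

print(run_tool_call("calculator", '{"expression": "1 + 2 * 3"}'))  # -> 7
```

In a real exchange you would pass CALCULATOR_TOOL in the tools parameter, execute run_tool_call for each tool_call the model returns, and send the result back as a tool-role message.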
The Kimi Large Language Model Cannot Answer Today's Date
The Kimi large language model cannot access highly time-sensitive information such as the current date. However, you can provide this information to the model through the system prompt, for example by including the current date in the system message.
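A minimal sketch in Python (the prompt wording is ours):

```python
from datetime import date

# Inject today's date into the system prompt so the model can answer
# date-related questions.
today = date.today().isoformat()  # e.g. "2024-05-01"
messages = [
    {"role": "system",
     "content": f"You are Kimi, an AI assistant. Today's date is {today}."},
    {"role": "user", "content": "What is today's date?"},
]
print(messages[0]["content"])
```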
How to Handle Errors Without Using an SDK
In some cases, you may need to interface with the Kimi API directly (instead of using the OpenAI SDK). When doing so, you need to determine your subsequent processing logic based on the status returned by the API. Typically, an HTTP status code of 200 indicates a successful request, while 4xx and 5xx status codes indicate a failed request; for failed requests, error information is provided in JSON format.
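The branching described above can be sketched as follows (the helper name is ours, and the exact shape of the JSON error body is an assumption):

```python
import json

def handle_response(status_code: int, body: str):
    """Branch on the HTTP status: 200 means success; 4xx/5xx carry a JSON
    error body. (Helper name and error-body shape are assumptions.)"""
    if status_code == 200:
        return json.loads(body)  # normal completion payload
    payload = json.loads(body)
    # Surface the error so callers can decide whether to retry or abort.
    err = payload.get("error", payload)
    raise RuntimeError(f"HTTP {status_code}: {err}")

ok = handle_response(200, '{"choices": [{"message": {"content": "hi"}}]}')
print(ok["choices"][0]["message"]["content"])  # -> hi

try:
    handle_response(429, '{"error": {"type": "rate_limit_reached_error"}}')
except RuntimeError as e:
    print(e)
```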
Why Do Some Requests Respond Quickly While Others Respond Slowly When the Prompt Is Similar?
If you find that some requests respond quickly (e.g., in just 3 seconds) while others with similar prompts respond slowly (e.g., taking up to 20 seconds), it is usually because the Kimi large language model generated a different number of tokens. Generally, the total response time of the Kimi API is proportional to the number of tokens generated: the more tokens generated, the longer the complete response takes. Note that the number of generated tokens only affects the time to the complete response (i.e., after the last token is generated). You can set stream=True and observe the time to first token (TTFT); under normal circumstances, when prompt lengths are similar, the first-token response time does not vary significantly.
I Set max_tokens=2000 to Have Kimi Output 2000 Characters, but the Output Is Less Than 2000 Characters
The max_tokens parameter means: When calling /v1/chat/completions, it specifies the maximum number of tokens the model is allowed to generate. When the number of tokens already generated by the model exceeds the set max_tokens, the model will stop generating the next token.
The purpose of max_tokens is:
- To help the caller determine which model to use (for example, when prompt_tokens + max_tokens ≤ 8 * 1024, you can choose the moonshot-v1-8k model);
- To prevent the Kimi model from generating excessive unexpected content in certain unexpected situations, which could lead to additional cost consumption (for example, the Kimi model repeatedly outputting blank characters).
max_tokens does not indicate how many tokens the Kimi large language model will output. In other words, max_tokens will not be used as part of the prompt input to the Kimi large language model. If you want the model to output a specific number of characters, you can refer to the following general solutions:
- For occasions where the output content should be within 1000 characters:
- Specify the number of characters in the prompt to the Kimi large language model;
- Manually or programmatically check if the output character count meets expectations. If not, in the second round of conversation, indicate to the Kimi large language model that the "character count is too high" or "character count is too low" to generate a new round of content.
- For occasions where the output content should be more than 1000 characters or even more:
- Try to break down the expected output content into several parts by structure or chapter and create a template, using placeholders to mark the positions where you want the Kimi large language model to output content;
- Have the Kimi large language model fill in each placeholder of the template one by one, and finally assemble the complete long text.
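The template-and-placeholder approach can be sketched like this (generate() stands in for a real per-section chat-completion call; all names here are ours):

```python
# Sketch of the template-and-placeholder approach for long outputs:
# break the document into sections, fill each placeholder separately,
# then assemble the full text.
TEMPLATE = "Introduction:\n{intro}\n\nBody:\n{body}\n\nConclusion:\n{conclusion}"

def generate(section: str) -> str:
    # Placeholder for a per-section model call; returns canned text here.
    return f"<model-written {section}>"

sections = {name: generate(name) for name in ("intro", "body", "conclusion")}
long_text = TEMPLATE.format(**sections)  # assemble the complete long text
print(long_text)
```

Because each section is a separate request, each stays well under the per-request output limit while the assembled document can be arbitrarily long.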
I Made Only One Request in a Minute, but Triggered the "Your account reached max request" Error
Typically, the SDK provided by OpenAI includes a retry mechanism:
Certain errors are automatically retried 2 times by default, with a short exponential backoff. Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default.

This retry mechanism automatically retries up to 2 times (for a total of 3 requests) when an error is encountered. Generally speaking, in cases of unstable network conditions or other situations that cause request errors, the OpenAI SDK can amplify a single request into 2 to 3 requests, all of which count towards your RPM (requests per minute) limit. Note: for users on a tier0 account using the OpenAI SDK, the default retry mechanism means a single erroneous request can exhaust the entire RPM quota.
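To see how retries multiply the request count, here is a minimal retry-with-backoff sketch similar in spirit to the SDK's behavior (not its actual code; all names are ours):

```python
import time

def with_retries(call, max_retries: int = 2, base_delay: float = 0.0):
    """Invoke call(); on ConnectionError, retry up to max_retries times with
    exponential backoff. 2 retries means up to 3 total requests."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

attempts = []
def flaky():
    attempts.append(1)          # each call here counts against your RPM
    raise ConnectionError("boom")

try:
    with_retries(flaky)
except ConnectionError:
    pass
print(len(attempts))  # -> 3: one logical request became three
```

If this amplification is a problem, the OpenAI SDK lets you lower or disable retries via its max_retries client option.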
To Facilitate Transmission, I Used base64 Encoding for My Text Content
Please do not do this. Encoding your files with base64 will result in a huge consumption of tokens. If your file type is supported by our /v1/files file interface, you can simply upload the file and extract its content using the file interface.
For binary or other encoded file formats, the Kimi large language model currently cannot parse the content, so please do not add it to the context.
Why Can't I Use a Key Created on the platform.kimi.com Platform on the platform.kimi.ai Platform?
The Kimi Open Platform officially provides two platforms: platform.kimi.com is recommended for mainland China, and platform.kimi.ai for international use. Accounts and keys on the two platforms are completely independent and cannot be used interchangeably. If you use a key from the wrong platform, you will receive a 401 invalid_authentication_error; if you see a 401 error, please first check whether your key belongs to the other platform.
- Mainland China open platform base_url: https://api.moonshot.cn/v1
- International open platform base_url: https://api.moonshot.ai/v1