Where Should You Put TikTok Keywords: Caption, On-Screen Text, or Spoken Audio?
Put the viewer’s question where a person can understand it—not everywhere a keyword can technically fit.
By Trytagly Editorial Team · Updated
The short answer
If you are optimizing a TikTok for search, use the target phrase, or a close natural variation, in three useful places: make the spoken opening identify the problem, make the on-screen title clarify the promise, and make the post caption add context. Then use a small set of relevant hashtags to classify the finished video. The wording does not need to be identical in every location.
Repeating “how to clean a cast iron skillet” word for word in the voiceover, text overlay, post caption, subtitles, and hashtags does not make the answer better. A clearer video might say, “Here is how I clean cast iron without stripping the seasoning,” show “Clean cast iron without damaging the seasoning” on screen, and use the post caption to name the method’s limits.
The goal is query-to-proof alignment: the words set an accurate expectation, and the video visibly delivers the answer.
First, separate the four text and audio layers
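Creators often call two different things a “caption”: the subtitles that transcribe speech and the description below the video. That overlap makes TikTok SEO advice confusing, so it helps to name each layer separately.
Spoken audio is what the creator says or narrates. It should make the subject understandable even when the opening is conversational.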
On-screen text is a title, label, step, or callout added to the video. It can summarize the question before the demonstration begins.
Creator captions or auto-generated captions are subtitles that transcribe speech. TikTok says creators can review and edit them. Their primary value is accessibility and an accurate reading experience.
The post caption is the description below the video. It can carry the method, audience, conditions, or next step that would crowd the video itself.
Hashtags sit inside the post caption, but they have a narrower role: they label the topic or community. They should support the meaning already present in the video rather than introduce an unrelated trend.
What TikTok actually confirms about search
TikTok’s public explanation of recommendations says search results may be influenced by user interactions, user information, and content information. For search, that content information includes how well the content matches the specific query, as well as the hashtags and the sound used. TikTok does not publish a universal formula showing the weight of each element.
There is also evidence that TikTok can technically interpret more than a post description. TikTok’s Content Suite, an advertiser tool for finding relevant organic creator content, says it identifies brand relevance through captions, hashtags, voiceover using speech recognition, and text within a video using optical character recognition.
That is useful evidence of TikTok’s content-understanding capabilities. It is not proof that natural search assigns a fixed ranking weight to each field. TikTok has not publicly confirmed common third-party claims such as “the first three seconds carry a specific percentage of SEO value,” “on-screen text ranks four times better,” or “a phrase must appear in exactly three places.”
Make every layer accurate and helpful, then measure the search information available in your account. Do not build a workflow around an invented weighting table.
Give each placement one clear job
Spoken opening: identify the viewer’s problem. A search-focused opening should tell the right viewer that the video addresses their question. “Here is how I clean cast iron without removing the seasoning” closely mirrors a likely query. “If your cast iron still feels greasy after rinsing, start with this cleaning step” uses natural speech while naming the object and problem. Either can work if the next shot begins proving the promise.
Avoid a long mystery hook that hides the subject, such as “You have been doing this wrong your whole life.” It may create curiosity, but a searcher cannot tell whether the video is about cookware, laundry, or car maintenance.
On-screen title: make the promise scannable. “Clean cast iron without stripping the seasoning” clarifies the video before or alongside the audio. Keep the title away from interface controls, give it enough contrast, and leave it on screen long enough to read. Those are usability decisions, not secret ranking tricks.
Do not turn the first frame into a keyword list. “Cast iron cleaning / skillet care / pan cleaning hack / cookware tips” gives the viewer four labels and no clear promise.
Speech captions: correct meaning-changing errors. Review names, technical terms, measurements, product models, and words that change the safety or meaning of an instruction. “Do use soap” and “don’t use soap” are not small transcription differences.
TikTok’s accessibility instructions say eligible uploaded videos can receive auto-generated captions that creators can edit or remove. Creator captions also allow line-by-line review during editing. Correct captions because viewers deserve an accurate version. A clean transcript may make the subject easier for TikTok systems to interpret, but TikTok does not promise a search-position increase for each correction.
Post caption: add the missing context. For the cast iron example, a useful caption is: “A simple cast iron cleaning routine for stuck-on food that preserves the existing seasoning. This covers a normally seasoned skillet, not rust removal or full restoration.” It names the subject naturally and prevents the video from promising more than it delivers.
Hashtags: classify the finished answer. Choose them after the video, hook, and caption are settled. A useful set might include #castironcare, #castironskillet, #kitchentips, and #cookwarecare if each matches the demonstration. Do not add #easyrecipe, #homeorganization, or a trending food tag just because neighboring videos receive attention.
- Use spoken audio to identify the problem in natural language.
- Use on-screen text to make the promise easy to scan.
- Use speech captions to represent the actual instruction accurately.
- Use the post caption for audience, method, and limitations.
- Use hashtags as accurate supporting labels.
A repeatable query-to-proof workflow
Start with one complete viewer question, not a pile of keywords: “How do I clean stuck food from cast iron without damaging the seasoning?” If one focused video cannot answer it, narrow it before filming. “Everything about cast iron” is a category, not a usable brief.
- Define the visible proof. List what the viewer must see: the starting condition, the action, and the final result. Search wording cannot compensate for missing proof.
- Draft the spoken hook and screen title together. They should agree without sounding duplicated. Let speech carry personality and the screen title carry scan-friendly clarity.
- Record, then review the transcript. Fix substantive mistakes and break lines where they are easy to follow. Do not force keyword repetitions into subtitles that were never spoken.
- Write a context caption. Add the key limitation, audience, or method instead of merely copying the on-screen title.
- Add relevant hashtags, then read the opening, screen title, post caption, and hashtags as one package.
- Run a mismatch check: do all elements describe the same problem, does the video show the promised answer, and did any hashtag introduce a different audience or intent?
After publishing, review the Search analytics in Creator Search Insights. Record the target question, date, opening, format, and search information visible to your account. Compare several related posts instead of judging one result.
If a video receives little search visibility, placement is only one possible cause. The query may have weak audience fit, the opening may lose viewers, or the demonstration may not answer the question. Treat analytics as feedback on the complete post.
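Part of the mismatch check in this workflow can be roughed out before publishing as a shared-vocabulary test. The Python sketch below is only an illustration: PostPackage, mismatch_check, the stopword list, and the extra_topic_words parameter are hypothetical names invented here, and the word-overlap heuristic is an editorial aid, not a description of how TikTok evaluates a post. It cannot tell whether the video shows the promised answer, which still needs a human review.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass, field

# Hypothetical stopword list (an assumption); extend it for your own niche.
STOPWORDS = {
    "a", "an", "the", "and", "or", "to", "of", "for", "in", "on", "is",
    "i", "how", "here", "this", "that", "my", "your", "with", "without", "not",
}


@dataclass
class PostPackage:
    """The text layers of one post, as drafted before publishing."""
    spoken_hook: str
    screen_title: str
    post_caption: str
    hashtags: list[str] = field(default_factory=list)


def content_words(text: str) -> set[str]:
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def mismatch_check(post: PostPackage, extra_topic_words: Iterable[str] = ()) -> dict:
    """Report words shared by hook, title, and caption, plus hashtags to review.

    extra_topic_words lets the creator approve broader labels (for example
    "kitchen") that are relevant but never appear in the three text layers.
    """
    shared = (
        content_words(post.spoken_hook)
        & content_words(post.screen_title)
        & content_words(post.post_caption)
    )
    topic = shared | {w.lower() for w in extra_topic_words}
    # A hashtag is flagged when it contains none of the topic words.
    # Simple substring matching: a prompt for human review, not a verdict.
    stray = [tag for tag in post.hashtags
             if not any(word in tag.lower() for word in topic)]
    return {"shared_topic_words": sorted(shared), "stray_hashtags": stray}


if __name__ == "__main__":
    post = PostPackage(
        spoken_hook="Here is how I clean cast iron without stripping the seasoning",
        screen_title="Clean cast iron without damaging the seasoning",
        post_caption=(
            "A simple cast iron cleaning routine for stuck-on food that "
            "preserves the existing seasoning."
        ),
        hashtags=["#castironcare", "#kitchentips", "#easyrecipe"],
    )
    print(mismatch_check(post, extra_topic_words={"kitchen", "cookware"}))
```

An empty shared_topic_words list suggests the hook, title, and caption are describing different problems. A flagged hashtag is a candidate for a second look rather than automatic removal, since matching is exact-word and substring-based and will miss synonyms.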
The practical rule
Say what problem the video solves. Show that promise in readable language. Correct speech captions so they reflect what was actually said. Use the post caption for context and hashtags for accurate classification. Then judge the finished package by whether a real searcher gets the answer they expected.
That approach is less dramatic than an algorithm hack. It is also more durable, easier to repeat, and less likely to turn a useful video into keyword-shaped noise.
Frequently asked questions
- Do I need to say my exact TikTok keyword out loud? No official TikTok document requires an exact-match phrase. Say the topic clearly in natural language and make sure the video answers the same intent.
- Should the same keyword appear in spoken audio, on-screen text, and the post caption? The three layers should agree, but they do not need identical wording. Use each layer for a distinct communication job.
- Do auto-generated captions help TikTok SEO? TikTok confirms that auto-generated captions use speech recognition and can be edited, but it does not publicly promise a specific organic search boost from enabling or correcting them. Use accurate captions first for accessibility and comprehension.
- Does TikTok read text inside a video? TikTok says its Content Suite can use optical character recognition to find brand relevance in text within organic videos. That shows the platform has this capability, but it does not reveal the precise ranking weight of on-screen text in normal search results.
- How early should I show the search phrase? Show the topic early enough that a viewer immediately understands the promise. Third-party guides variously prescribe one, three, or five seconds, but TikTok’s public organic-search documentation does not state a universal deadline.
- Are hashtags or caption keywords more important for TikTok search? TikTok lists both query-content matching and hashtags among information that may influence search. It does not publish a universal comparison of their weights.