I'm publicly launching something today that started as a side project: ReadToMe, an iPhone app that turns paper books and other printed text into audio.

Originally this was a Christmas present for my fiancée, who loves books but has an eye problem that makes it hard for her to read more than a few pages at a time. She mostly listens to audiobooks while following along with the paper book, but some books aren't available in audiobook or even e-book form, and all of the existing apps we tried were surprisingly bad at scanning paper books into audio: they make lots of mistakes, include footnotes and page numbers, etc., in a way that really degrades the experience.

Being an AI-oriented engineer by training, I had a crack at solving the problem myself, and was pleasantly surprised at how well the proof of concept worked. I then had some time free while shutting down my previous company (Mezli, YC W21), during which I polished up the app to the point you see it at now.

The way it works:

On the front end, it's a SwiftUI app (mostly written by ChatGPT!) that consists mostly of a document scanner (VNDocumentCameraViewController) and a custom-built audio player.

The back end is more complex: book photos are first sent to an OCR API, then some custom code I wrote does a first pass at stitching together and correcting the results. Then, the corrected OCR results are sent to GPT-3.5-turbo for further post-processing and re-stitching, and finally to a text-to-speech API for conversion to audio.

The hardest part of this process was actually getting the GPT calls right. I ended up writing a custom LLM eval framework for making sure the LLM wasn't making edits relative to the true text of the book.

A few issues remain, which I'll work on fixing if the app gets a significant amount of traction, including:

1) It can take multiple minutes to get audio back from a scan, especially if it's on the longer side (10+ pages). I'll be able to bring this down by spinning up dedicated servers for the OCR and TTS back-end.

2) The LLM sometimes does TOO good of a job at correcting "mistakes" in book text. This issue crops up particularly often when an author deliberately uses improper grammar, e.g. in dialogue.

The app is priced at $9.99/month for up to 250 pages/month right now, which I estimate will just about cover the costs of API calls. I'll be bringing the price point down as the pricing of the required AI APIs comes down. There's also a 3-day free trial if you want to try it out.

If you do find this useful, or know somebody who might, I'd appreciate you giving it a try or letting them know! And please let me know if you have any feedback, including issues or feature requests.
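
For anyone curious what a check for unwanted LLM edits might look like, here is a minimal sketch in Python. It is not the actual eval framework behind the app, whose details aren't given here. It assumes you have a hand-verified reference transcription of a page and the LLM's post-processed output for the same page. The names `tokenize`, `edit_report`, `passes` and the 0.99 threshold are all hypothetical. It diffs the two texts word by word with the standard library's `difflib` and lists every span the LLM changed.

```python
import difflib
import re


def tokenize(text):
    """Split text into whitespace-separated tokens, so line breaks and
    spacing differences between the two texts are ignored."""
    return re.findall(r"\S+", text)


def edit_report(reference, candidate):
    """Compare LLM output (candidate) against trusted text (reference).

    Returns (similarity, edits). similarity is difflib's ratio in [0, 1];
    edits lists (op, reference_words, candidate_words) for each changed span.
    """
    ref_words = tokenize(reference)
    cand_words = tokenize(candidate)
    # autojunk=False keeps difflib from discarding very frequent words
    # on long inputs, which would distort the diff.
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    edits = [
        (op, ref_words[i1:i2], cand_words[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
    return matcher.ratio(), edits


def passes(reference, candidate, min_similarity=0.99):
    """Hypothetical pass/fail gate: the threshold is an arbitrary assumption."""
    similarity, _ = edit_report(reference, candidate)
    return similarity >= min_similarity


if __name__ == "__main__":
    reference = 'He said, "I ain\'t never seen nothing like it."'
    candidate = 'He said, "I have never seen anything like it."'
    similarity, edits = edit_report(reference, candidate)
    for op, ref_span, cand_span in edits:
        print(op, ref_span, "->", cand_span)
    print("pass" if passes(reference, candidate) else "fail")
```

The example pair mirrors the deliberately ungrammatical dialogue case from issue 2: the "corrected" output is flagged because two words differ from the reference. A word-level diff like this only catches changes relative to a known-good text, so it is useful for evaluating prompts on a test set of pages rather than for checking live scans.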
Users expressed concerns about the $9.99 price point, considering it too high, especially for the limited page access offered. There's interest in high-quality text-to-speech (TTS) for various formats, with some users suggesting alternatives like Narakeet or native iOS solutions, while others criticize the robotic sound of native TTS. Questions about onboard text recognition, speech synthesis APIs, and potential copyright issues were raised. Positive feedback includes excitement for the product, its features, and suggestions for marketing strategies. Users are also looking forward to a demo and further development updates.
Users criticized the product for its limited use case, particularly for audiobooks, and its high cost relative to the perceived value. The 250-page monthly limit was seen as too restrictive, and there were concerns about price and competition. The quality of text-to-speech (TTS) was deemed poor, with native TTS sounding robotic and the setup being complex. Users were disappointed that the app does not integrate native iOS TTS and that alternatives such as ElevenLabs had not been considered. Legal issues regarding copyright and sharing, as well as the lack of language and translation options, were also points of contention.