Ep 022: Evidence of Attempted Posting

Functional Design in Clojure

Content provided by Christoph Neumann and Nate Jones.

Duration: 33:52


Christoph questions his attempts to post to Twitter.

This week, we continue to dig into the "Twitter problem": we want to post to Twitter on a schedule.
"Writing code to help out with laziness."
Start with data to keep track of: inside (our data) and outside (Twitter data)
"Data from a foreign land."
We need to determine our "working view" of Twitter's data.
What is in our data? For each "scheduled tweet":
- Text to post: the "status"
- Timestamp of when to post
Timestamps are nice
- Milliseconds since the epoch
- Universal instant
- Allows the client to localize
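As a minimal sketch of that data, a scheduled tweet can be a plain map holding the status text and an epoch-millisecond timestamp. The key names `:status` and `:post-at` and the helper `localize` are assumptions for illustration, not names from the episode.

```clojure
;; Hypothetical shape for one scheduled tweet (key names are assumptions).
(def scheduled-tweet
  {:status  "New episode is out!"
   :post-at 1546300800000}) ; milliseconds since the epoch, a universal instant

(defn localize
  "Render an epoch-millisecond timestamp in the given time zone.
  Localizing is left to the client; the stored value never changes."
  [epoch-ms zone-id]
  (-> (java.time.Instant/ofEpochMilli epoch-ms)
      (.atZone (java.time.ZoneId/of zone-id))
      str))

(localize (:post-at scheduled-tweet) "America/Denver")
(localize (:post-at scheduled-tweet) "Europe/Berlin")
```

The same stored number renders differently for each client, which is why the instant, not a local time string, is what gets written down.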
How do we know a scheduled tweet has been posted? A "posted?" boolean?
Boolean says, "Yes! It has been posted somewhere on the Internet."
Correlating identifiers are more useful than a Boolean.
The tweet ID is a correlating identifier. We can use it to look up all of Twitter's data about it.
"We don't need to store all of Twitter in our database."
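One way to picture this, assuming the Twitter ID is stored under a `:twitter/id` key (a hypothetical name): the "posted?" answer becomes something we derive from the ID rather than a separate flag.

```clojure
;; Sketch only: the key names and the ID value are made up for illustration.
(def scheduled-tweet
  {:status  "New episode is out!"
   :post-at 1546300800000})

(defn mark-posted
  "Record the ID Twitter gave us for this tweet."
  [tweet twitter-id]
  (assoc tweet :twitter/id twitter-id))

(defn posted?
  "Derived from the correlating identifier; no separate boolean to keep in sync."
  [tweet]
  (some? (:twitter/id tweet)))

(posted? scheduled-tweet)                        ; false
(posted? (mark-posted scheduled-tweet "12345"))  ; true
```

The ID answers the yes/no question and also says where to look for everything else Twitter knows about the tweet.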
What is the story you need to tell about what happened?
- A record of all the attempts allows us to tell a story about what happened.
- Useful to have the timestamp of when our application posted it.
Make a separate log for attempts.
- Attempting to post is a separate concern from what to post.
- Don't complicate the scheduled tweet information by embedding the log.
"Once you have all the data, it allows you to ask new questions you didn't originally think of."
Clojure makes it easy to work with a large tree of data that came from an external source. We don't have to care about the structure of that data. We can just write it down.
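A rough sketch of a separate attempt log, assuming each entry points back to the scheduled tweet by a local ID and stores Twitter's response as a `pr-str` string. The entry keys, the `record-attempt` helper, and the sample response are assumptions; a real response is much larger.

```clojure
(require '[clojure.edn :as edn])

(defn record-attempt
  "Append one attempt to the log. The response is written down as-is
  with pr-str; we do not need to model its structure up front."
  [log local-id now-ms response]
  (conj log {:local/id     local-id
             :attempted-at now-ms
             :response     (pr-str response)}))

;; A made-up, heavily trimmed response map.
(def sample-response
  {:id_str "12345" :text "New episode is out!" :entities {:urls []}})

(def attempt-log
  (record-attempt [] 1 1546300800000 sample-response))

;; Later, when a new question comes up, read the data back and dig in.
(-> attempt-log first :response edn/read-string (get-in [:entities :urls]))
```

The scheduled tweet itself stays small, and the log keeps everything needed to tell the story afterwards.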
Simply attempt to post the next scheduled tweet that does not have a Twitter ID recorded.
If it fails, just record the attempt, and go back to sleep.
"Handle the brick in front of you, and if you keep doing that, you'll eventually build the wall."
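The worker step described above might look roughly like this. The state shape, the key names, and the `post!` argument (a function that posts a status and returns Twitter's response map, or throws) are all hypothetical; filtering on the due time is also an assumption.

```clojure
(defn next-unposted
  "Earliest scheduled tweet that is due and has no Twitter ID recorded."
  [tweets now-ms]
  (->> (vals tweets)
       (remove :twitter/id)
       (filter #(<= (:post-at %) now-ms))
       (sort-by :post-at)
       first))

(defn attempt-next
  "One wake-up of the worker: try the next tweet, record the attempt,
  and return the new state. On failure nothing else happens; the next
  wake-up will simply find the same tweet again."
  [state now-ms post!]
  (if-let [tweet (next-unposted (:tweets state) now-ms)]
    (let [id      (:local/id tweet)
          result  (try
                    {:ok (post! (:status tweet))}
                    (catch Exception e
                      {:error (.getMessage e)}))
          attempt {:local/id id :attempted-at now-ms :result (pr-str result)}]
      (cond-> (update state :attempts conj attempt)
        (:ok result) (assoc-in [:tweets id :twitter/id]
                               (get-in result [:ok :id_str]))))
    state))

(def state
  {:tweets   {1 {:local/id 1 :status "Hello" :post-at 1000}
              2 {:local/id 2 :status "World" :post-at 2000}}
   :attempts []})

;; A failing post! still leaves a record of the attempt.
(attempt-next state 1500 (fn [_] (throw (ex-info "timeout" {}))))
```

There is no retry loop here on purpose: each wake-up handles one brick and then goes back to sleep.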
What if we don't hear the success response from Twitter, but it did get posted?
Idea: Try to detect if a tweet has already been posted.
If we can uniquely identify something by its content, we can know two things are the same without having a common ID.
Problem: Twitter can alter the contents.
Idea: fuzzy "measure of similarity" between our recent tweets and the next scheduled tweet.
We can record the fuzzy match in our attempt log too!
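The episode does not name a particular similarity measure. As one possible sketch, Jaccard similarity over lowercased words gives a score between 0 and 1 that tolerates small changes such as a rewritten link. The function names and example texts are assumptions.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

(defn tokens
  "Lowercased set of whitespace-separated words."
  [text]
  (->> (str/split (str/lower-case text) #"\s+")
       (remove str/blank?)
       set))

(defn similarity
  "Jaccard similarity: shared words divided by all distinct words."
  [a b]
  (let [ta    (tokens a)
        tb    (tokens b)
        union (set/union ta tb)]
    (if (empty? union)
      1.0
      (double (/ (count (set/intersection ta tb))
                 (count union))))))

;; Same text except for the link, which the service may rewrite.
(similarity "New episode: https://example.com/ep22"
            "New episode: https://t.co/abc123")
;; => 0.5
```

Where to draw the line between "same" and "different" is a judgment call; recording the score in the attempt log makes it possible to revisit that threshold later.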
If we can correlate by contents, we could even identify when we manually post in advance.
As soon as you can determine equality by the substance of the thing itself, you can have more than one writer.
How "recent" is "recent"? Is it 100? Is it 200? Is it 500?
Even better, fetch all the tweets since the last ID we recorded.
- we know we're seeing all of the tweets
- can scan each of those for a match (in the case of a manual post)
- know when the tweet stream ends, so we can know a posting is still needed
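The scan above can be sketched as a pure function over whatever was fetched. Here `fetched` stands in for the result of a hypothetical "all tweets since the last recorded ID" call, and `same?` is any matching predicate (exact equality in this example, a fuzzy match in practice). All names are assumptions.

```clojure
(defn find-existing
  "First fetched tweet whose text matches the scheduled status, or nil."
  [fetched status same?]
  (first (filter #(same? (:twitter/status %) status) fetched)))

(defn reconcile
  "Decide what to do with a scheduled tweet given every tweet since the
  last recorded ID. Because the list is complete, no match means the
  tweet has not been posted by anyone yet."
  [scheduled fetched same?]
  (if-let [match (find-existing fetched (:status scheduled) same?)]
    {:action :record-id :twitter/id (:twitter/id match)}
    {:action :post}))

(def fetched
  [{:twitter/id "101" :twitter/status "Unrelated thought"}
   {:twitter/id "102" :twitter/status "New episode is out!"}])

(reconcile {:status "New episode is out!"} fetched =)
;; => {:action :record-id, :twitter/id "102"}

(reconcile {:status "Something not yet posted"} fetched =)
;; => {:action :post}
```

A manual post made ahead of schedule shows up as a match, so the worker records its ID instead of posting a duplicate.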
The worker will get there eventually. Can just give up on an error. No complex retry and recovery logic.
With more than one writer, we can still have a race condition. Ultimately, Twitter has to deal with deduplication to avoid a double post in a short interval.

Message Queue discussion:

Namespacing in a map is really useful
A flat, namespaced map is easier to traverse than a nested map.
One use for namespaces: indicate the origin of the data
E.g., :twitter/id, :twitter/status vs. :local/id, :local/text
You see the namespace in your code, so it makes the data origin very visible.
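A small comparison of the two shapes, using the keys from the example above. The values and the `from-origin` helper are made up for illustration.

```clojure
;; Nested: the origin is a level of structure you have to walk through.
(def nested
  {:local   {:id 7 :text "New episode is out!"}
   :twitter {:id "12345" :status "New episode is out!"}})

;; Flat and namespaced: the origin travels with each key.
(def flat
  {:local/id       7
   :local/text     "New episode is out!"
   :twitter/id     "12345"
   :twitter/status "New episode is out!"})

(get-in nested [:twitter :id]) ; => "12345"
(:twitter/id flat)             ; => "12345"

(defn from-origin
  "Keep only the entries whose key has the given namespace."
  [m origin]
  (into {} (filter (fn [[k _]] (= origin (namespace k)))) m))

(from-origin flat "twitter")
;; => {:twitter/id "12345", :twitter/status "New episode is out!"}
```

With the flat map, every place the code touches `:twitter/id` says where that value came from, and the two IDs cannot be confused.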

Related episodes:

Schemas for data. "Internal" vs "external" data.
- 007: Input Overflow
Creating a "big bag of data" and asking it questions.

Related projects:

Clojure in this episode:

pr-str
